LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 33587 - clang fails to compile linux: invalid size for constraints (integrated-as)
Summary: clang fails to compile linux: invalid size for constraints (integrated-as)
Status: RESOLVED WONTFIX
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86 (show other bugs)
Version: 6.0
Hardware: PC Linux
: P normal
Assignee: Unassigned LLVM Bugs
URL: https://github.com/ClangBuiltLinux/li...
Keywords:
Depends on:
Blocks: 4068
  Show dependency tree
 
Reported: 2017-06-25 13:53 PDT by Dmitry Golovin
Modified: 2021-09-06 14:05 PDT (History)
15 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Golovin 2017-06-25 13:53:20 PDT
The following errors are generated when compiling linux on x86 with clang with integrated-as enabled: "invalid output size for constraint '=q'" and "invalid input size for constraint 'qi'". These constraints are defined in macros percpu_from_op and percpu_to_op from arch/x86/include/asm/percpu.h:

    asm(op "b %1,"__percpu_arg(0)
        : "+m" (var)
        : "qi" ((pto_T__)(val)));

    asm(op "b "__percpu_arg(1)",%0"
        : "=q" (pfo_ret__)
        : "m" (var));


Latest LLVM source and Linux source were used.

Example error when compiling arch/x86/kernel/cpu/intel.c:

arch/x86/kernel/cpu/intel.c:97:2: error: invalid input size for constraint 'qi'
        this_cpu_or(msr_misc_features_shadow,
        ^
./include/linux/percpu-defs.h:498:32: note: expanded from macro 'this_cpu_or'
#define this_cpu_or(pcp, val)           __pcpu_size_call(this_cpu_or_, pcp, val)
                                        ^
./include/linux/percpu-defs.h:364:11: note: expanded from macro '__pcpu_size_call'
                case 1: stem##1(variable, __VA_ARGS__);break;           \
                        ^
<scratch space>:263:1: note: expanded from here
this_cpu_or_1
^
./arch/x86/include/asm/percpu.h:425:34: note: expanded from macro 'this_cpu_or_1'
#define this_cpu_or_1(pcp, val)         percpu_to_op("or", (pcp), val)
                                        ^
./arch/x86/include/asm/percpu.h:101:15: note: expanded from macro 'percpu_to_op'
                    : "qi" ((pto_T__)(val)));           \
                            ^
arch/x86/kernel/cpu/intel.c:97:2: error: invalid input size for constraint 'qi'
./include/linux/percpu-defs.h:498:32: note: expanded from macro 'this_cpu_or'
#define this_cpu_or(pcp, val)           __pcpu_size_call(this_cpu_or_, pcp, val)
                                        ^
./include/linux/percpu-defs.h:365:11: note: expanded from macro '__pcpu_size_call'
                case 2: stem##2(variable, __VA_ARGS__);break;           \
                        ^
<scratch space>:264:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:426:34: note: expanded from macro 'this_cpu_or_2'
#define this_cpu_or_2(pcp, val)         percpu_to_op("or", (pcp), val)
                                        ^
./arch/x86/include/asm/percpu.h:101:15: note: expanded from macro 'percpu_to_op'
                    : "qi" ((pto_T__)(val)));           \
                            ^
arch/x86/kernel/cpu/intel.c:97:2: error: invalid input size for constraint 'qi'
./include/linux/percpu-defs.h:498:32: note: expanded from macro 'this_cpu_or'
#define this_cpu_or(pcp, val)           __pcpu_size_call(this_cpu_or_, pcp, val)
                                        ^
./include/linux/percpu-defs.h:366:11: note: expanded from macro '__pcpu_size_call'
                case 4: stem##4(variable, __VA_ARGS__);break;           \
                        ^
<scratch space>:264:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:427:34: note: expanded from macro 'this_cpu_or_4'
#define this_cpu_or_4(pcp, val)         percpu_to_op("or", (pcp), val)
                                        ^
./arch/x86/include/asm/percpu.h:101:15: note: expanded from macro 'percpu_to_op'
                    : "qi" ((pto_T__)(val)));           \
                            ^
arch/x86/kernel/cpu/intel.c:505:2: error: invalid input size for constraint 'qi'
        this_cpu_write(msr_misc_features_shadow, 0);
        ^
./include/linux/percpu-defs.h:495:34: note: expanded from macro 'this_cpu_write'
#define this_cpu_write(pcp, val)        __pcpu_size_call(this_cpu_write_, pcp, val)
                                        ^
./include/linux/percpu-defs.h:364:11: note: expanded from macro '__pcpu_size_call'
                case 1: stem##1(variable, __VA_ARGS__);break;           \
                        ^
<scratch space>:61:1: note: expanded from here
this_cpu_write_1
^
./arch/x86/include/asm/percpu.h:416:36: note: expanded from macro 'this_cpu_write_1'
#define this_cpu_write_1(pcp, val)      percpu_to_op("mov", (pcp), val)
                                        ^
./arch/x86/include/asm/percpu.h:101:15: note: expanded from macro 'percpu_to_op'
                    : "qi" ((pto_T__)(val)));           \
                            ^
arch/x86/kernel/cpu/intel.c:505:2: error: invalid input size for constraint 'qi'
./include/linux/percpu-defs.h:495:34: note: expanded from macro 'this_cpu_write'
#define this_cpu_write(pcp, val)        __pcpu_size_call(this_cpu_write_, pcp, val)
                                        ^
./include/linux/percpu-defs.h:365:11: note: expanded from macro '__pcpu_size_call'
                case 2: stem##2(variable, __VA_ARGS__);break;           \
                        ^
<scratch space>:62:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:417:36: note: expanded from macro 'this_cpu_write_2'
#define this_cpu_write_2(pcp, val)      percpu_to_op("mov", (pcp), val)
                                        ^
./arch/x86/include/asm/percpu.h:101:15: note: expanded from macro 'percpu_to_op'
                    : "qi" ((pto_T__)(val)));           \
                            ^
arch/x86/kernel/cpu/intel.c:505:2: error: invalid input size for constraint 'qi'
./include/linux/percpu-defs.h:495:34: note: expanded from macro 'this_cpu_write'
#define this_cpu_write(pcp, val)        __pcpu_size_call(this_cpu_write_, pcp, val)
                                        ^
./include/linux/percpu-defs.h:366:11: note: expanded from macro '__pcpu_size_call'
                case 4: stem##4(variable, __VA_ARGS__);break;           \
                        ^
<scratch space>:62:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:418:36: note: expanded from macro 'this_cpu_write_4'
#define this_cpu_write_4(pcp, val)      percpu_to_op("mov", (pcp), val)
                                        ^
./arch/x86/include/asm/percpu.h:101:15: note: expanded from macro 'percpu_to_op'
                    : "qi" ((pto_T__)(val)));           \
                            ^
arch/x86/kernel/cpu/intel.c:511:8: error: invalid output size for constraint '=q'
        msr = this_cpu_read(msr_misc_features_shadow);
              ^
./include/linux/percpu-defs.h:494:29: note: expanded from macro 'this_cpu_read'
#define this_cpu_read(pcp)              __pcpu_size_call_return(this_cpu_read_, pcp)
                                        ^
./include/linux/percpu-defs.h:308:23: note: expanded from macro '__pcpu_size_call_return'
        case 1: pscr_ret__ = stem##1(variable); break;                  \
                             ^
<scratch space>:62:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:413:31: note: expanded from macro 'this_cpu_read_1'
#define this_cpu_read_1(pcp)            percpu_from_op("mov", pcp)
                                        ^
./arch/x86/include/asm/percpu.h:188:15: note: expanded from macro 'percpu_from_op'
                    : "=q" (pfo_ret__)                  \
                            ^
arch/x86/kernel/cpu/intel.c:511:8: error: invalid output size for constraint '=q'
./include/linux/percpu-defs.h:494:29: note: expanded from macro 'this_cpu_read'
#define this_cpu_read(pcp)              __pcpu_size_call_return(this_cpu_read_, pcp)
                                        ^
./include/linux/percpu-defs.h:309:23: note: expanded from macro '__pcpu_size_call_return'
        case 2: pscr_ret__ = stem##2(variable); break;                  \
                             ^
<scratch space>:62:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:414:31: note: expanded from macro 'this_cpu_read_2'
#define this_cpu_read_2(pcp)            percpu_from_op("mov", pcp)
                                        ^
./arch/x86/include/asm/percpu.h:188:15: note: expanded from macro 'percpu_from_op'
                    : "=q" (pfo_ret__)                  \
                            ^
arch/x86/kernel/cpu/intel.c:511:8: error: invalid output size for constraint '=q'
./include/linux/percpu-defs.h:494:29: note: expanded from macro 'this_cpu_read'
#define this_cpu_read(pcp)              __pcpu_size_call_return(this_cpu_read_, pcp)
                                        ^
./include/linux/percpu-defs.h:310:23: note: expanded from macro '__pcpu_size_call_return'
        case 4: pscr_ret__ = stem##4(variable); break;                  \
                             ^
<scratch space>:62:1: note: expanded from here
"0"
^
./arch/x86/include/asm/percpu.h:415:31: note: expanded from macro 'this_cpu_read_4'
#define this_cpu_read_4(pcp)            percpu_from_op("mov", pcp)
                                        ^
./arch/x86/include/asm/percpu.h:188:15: note: expanded from macro 'percpu_from_op'
                    : "=q" (pfo_ret__)                  \
                            ^
9 errors generated.

This blocks the meta-bug 4068.
Comment 1 dwmw2@infradead.org 2018-02-14 08:15:34 PST
Interestingly, while this fails to compile:

$ cat q.c
long long foo(void)
{
	long long ret;
	asm ("movb $5, %0" : "=q" (ret));
	return ret;
}
$ ~/git/llvm6/bin/clang -o- -S -O2 q.c -m32
q.c:4:29: error: invalid output size for constraint '=q'
        asm ("movb $5, %0" : "=q" (ret));
                                   ^


... this version works (and shouldn't):

$ cat q.c
long long foo(void)
{
	long long ret;
	asm ("movl $5, %0" : "=r" (ret));
	return ret;
}
$ ~/git/llvm6/bin/clang -o- -S -O2 q.c -m32
	.text
	.file	"q.c"
	.globl	foo                     # -- Begin function foo
	.p2align	4, 0x90
	.type	foo,@function
foo:                                    # @foo
# %bb.0:
	#APP
	movl	$5, %eax
	#NO_APP
	retl
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
                                        # -- End function

	.ident	"clang version 6.0.0 (trunk 321709) (llvm/trunk 321711)"
	.section	".note.GNU-stack","",@progbits
Comment 2 dwmw2@infradead.org 2018-02-14 08:23:43 PST
The offending code in the kernel actually looks more like this:

long long foo(void)
{
	long long ret;
	switch (sizeof(ret)) {
	case 1:
		asm ("movb $5, %0" : "=q" (ret));
		break;
	case 4:
		asm ("movl $4, %0" : "=r" (ret));
		break;
	case 8:
		/* Do something more complex, but ultimately correct */
		break;
	}
	return ret;
}

It also fails, apparently because LLVM objects to the asm with the "=q" constraint before ever realising that it wouldn't have to emit it anyway.

Interestingly, if the datatype of 'ret' is just "long", the above code does compile. Because for a "long", LLVM evidently *does* manage to optimise away that asm statement before attempting to validate it? 

Some consistency here would be useful. Don't make me do this:
http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/1b756b72f6b7fe3c239d782e64c4d2253aeda0f9
Comment 3 Dmitry Golovin 2018-02-14 08:38:03 PST
I was recently contacted by an anonymous person regarding this bug and this person claimed to compile a functional Linux kernel with only minor changes. Quote:

> I had been getting the error for much older clang and kernel,
> but I solved the problem for my purposes by changing
> instances of 'qi' and ‎'=q' in arch/x86/include/asm/percpu.h
> to 'ri' and '=r', respectively. Then I used make with an
> --ignore-errors flag. I got a successful vmlinux‎ build,
> which is all I needed because I just required a llvm bc
> module of the entire kernel.

I don't know if this information is useful or up to date, just publishing it here.
Comment 4 dwmw2@infradead.org 2018-02-14 11:02:38 PST
> I had been getting the error for much older clang and kernel,
> but I solved the problem for my purposes by changing
> instances of 'qi' and ‎'=q' in arch/x86/include/asm/percpu.h
> to 'ri' and '=r', respectively.

Hm yes, that does seem to work with both clang and gcc. Will see how this works:
http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/9278627b7371b4511789f69d199f14a151fc4122
Comment 5 dwmw2@infradead.org 2018-02-14 13:28:46 PST
Hm, kernel 0-day testing suggests that it doesn't work with GCC in all cases:

   mm/vmstat.c: In function '__inc_zone_state':
>> mm/vmstat.c:392:1: error: unsupported size for integer register
    }
    ^
   mm/vmstat.c: Assembler messages:
>> mm/vmstat.c:385: Error: bad register name `%sil'
Comment 6 Hans Wennborg 2018-02-26 07:23:33 PST
I don't think we can block the 6.0 release on this.
Comment 7 Reid Kleckner 2018-02-26 15:44:11 PST
Akira added this check in r218064 in 2014.

This seems like something that might be better implemented in Linux with __attribute__((overloadable)) or _Generic. Otherwise we will have to delay diagnosing invalid inline asm constraints until after optimization, and this generally provides a worse user experience.
Comment 8 Akira Hatanaka 2018-02-27 18:14:02 PST
It looks like X86_32TargetInfo::validateOperandSize is missing the check for constraint 'r'.

Based on the code I see that is generated with -O0, I don't think silently accepting "=q" and "=r" (without any changes to the backend code generation) would be correct.

$ clang -o- -S -O0 q.c -m32

_foo:                                   ## @foo
	.cfi_startproc
## %bb.0:                               ## %entry
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset %ebp, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register %ebp
	subl	$8, %esp
	## InlineAsm Start
	movl	$5, %eax
	## InlineAsm End
	movl	%ecx, -4(%ebp) ### is %ecx initialized?
	movl	%eax, -8(%ebp)
	movl	-8(%ebp), %eax
	movl	-4(%ebp), %edx
	addl	$8, %esp
	popl	%ebp
	retl
Comment 9 dwmw2@infradead.org 2018-02-27 23:26:38 PST
It's OK to fail if you were going to *output* the asm in question. That includes my first q.c, as well as the kernel code in comment #2 if built with -O0.

It's less OK to fail when you weren't going to have to emit it anyway, which is the case I really care about.

If you're going to break *that*, why would we have any confidence that you aren't also going to break well-known constructs such as

 if (optimisable_to_zero_at_compile_time)
    __undefined_func();

An LTO-capable compiler would be within its rights to break that, right? Again by performing the validity check *before* the optimisation which makes it go away. And we'd be *just* as unhappy if you broke that, as we are with the asm thing ;)
Comment 10 Akira Hatanaka 2018-02-28 10:18:55 PST
The validity check is done early in the front-end partly because the backend (as it stands today) cannot handle inline-asm instructions that have constraints that are not valid for the input or output operands (and that was what motivated r218064). So unlike the undefined behavior case you mentioned, we can't just let the front-end ignore the invalid inline-asm statement and let the compiler crash in the backend.

I still think it's valuable to inform users that there is something wrong in the inline-asm statement, so I don't think we want to just drop the diagnostic completely. Maybe there should be a command line option (that is turned off by default) that tells the compiler to ignore invalid inline-asm statements and keep on going, but in that case we would have to teach the backend to handle invalid constraints or drop the invalid inline-asm in the front-end (the IR for the inline-asm statement will not be emitted).
Comment 11 Reid Kleckner 2018-02-28 11:08:59 PST
(In reply to Akira Hatanaka from comment #10)
> I still think it's valuable to inform users that there is something wrong in
> the inline-asm statement, so I don't think we want to just drop the
> diagnostic completely. Maybe there should be a command line option (that is
> turned off by default) that tells the compiler to ignore invalid inline-asm
> statements and keep on going, but in that case we would have to teach the
> backend to handle invalid constraints or drop the invalid inline-asm in the
> front-end (the IR for the inline-asm statement will not be emitted).

We should always do what we can to make LLVM's backend more robust so that it can report proper errors instead of crashing, but it has less information about the constraint locations, so it's always going to give a lower quality diagnostic. Therefore I think we want to stick with the early validation.

We could add a flag to disable it, but flags suck. We could also throw it under the bus of -fno-integrated-as, if Linux already uses that.

If Linux ends up making source changes here, I'd recommend _Generic, since that should work with GCC, too.
Comment 12 dwmw2@infradead.org 2018-02-28 11:31:56 PST
I'm happy to make source changes. I had been pondering this:
https://lkml.org/lkml/2018/2/9/529

However, I wasn't sure that was going to be sufficient — since I've just made you notice that you *ought* to be erroring on some of the other asm statements in there too.
Comment 13 Nick Desaulniers 2018-03-06 11:13:29 PST
> We could also throw it under the bus of -fno-integrated-as, if Linux already uses that.

We already do:
https://elixir.bootlin.com/linux/v4.16-rc4/source/Makefile#L734
https://elixir.bootlin.com/linux/v4.16-rc4/source/Makefile#L746

> I'd recommend _Generic

Isn't that -std=c11 ?  The kernel uses -std=gnu89, for now.
https://elixir.bootlin.com/linux/v4.16-rc4/source/Makefile#L422
That's a whole other topic for another day.

> I'm happy to make source changes. I had been pondering this: https://lkml.org/lkml/2018/2/9/529

That patch seems fine.  While it's not great, compromises will need to be made to support additional compiler coverage.  My contacts at Intel are complaining about this very error blocking them from adding support for Clang to 0-day bot, so I'm happy the sooner we can resolve this.

> Maybe there should be a command line option (that is turned off by default) that tells the compiler to ignore invalid inline-asm statements and keep on going, but in that case we would have to teach the backend to handle invalid constraints or drop the invalid inline-asm in the front-end (the IR for the inline-asm statement will not be emitted).

We could immediately use that, then clean up our code should we cause compiler crashes.

>Interestingly, if the datatype of 'ret' is just "long", the above code does compile. Because for a "long", LLVM evidently *does* manage to optimise away that asm statement before attempting to validate it? 
> Some consistency here would be useful.
> since I've just made you notice that you *ought* to be erroring on some of the other asm statements in there too.

Seems fair.
Comment 14 dwmw2@infradead.org 2018-03-13 02:46:17 PDT
(In reply to Reid Kleckner from comment #7)
> Akira added this check in r218064 in 2014.
> 
> This seems like something that might be better implemented in Linux with
> __attribute__((overloadable)) or _Generic. Otherwise we will have to delay
> diagnosing invalid inline asm constraints until after optimization, and this
> generally provides a worse user experience.

Strictly speaking, you don't have to delay *diagnosing* invalid asm; you only need to delay *aborting*. You can effectively turn it into asm("#error you screwed up if we're ever emitting this...") in the no-integrated-as case, perhaps?

I am open to all suggestions, really. But you *should* make the "=q" case and the "=r" case consistent; one of those barfs on a uint64_t and the other doesn't, which is clearly wrong. And *that* means my existing hack patch isn't going to work since it only works around the issue for "=q".

If we can move straight to a "fix" rather than me upstreaming a Linux patch which may well break before a real fix comes along, that would be best. Or do you recommend that I expand my hack in Linux to cover all the cases? I can't guarantee that would be accepted...
Comment 15 Nick Desaulniers 2018-03-13 09:22:07 PDT
> Or do you recommend that I expand my hack in Linux to cover all the cases?

Where are the other occurrences? I was aware of just the one.
Comment 16 dwmw2@infradead.org 2018-03-13 12:49:28 PDT
(In reply to Nick Desaulniers from comment #15)
> Where are the other occurrences? I was aware of just the one.

This one fails (even when tautologically optimised out):

	uint64_t ret;
	asm ("movb $5, %0" : "=q" (ret));

If that's going to, then this one *should* fail, and I half expect it to start now that I keep pointing it out:

	uint64_t ret;
	asm ("movb $5, %0" : "=r" (ret));

It is the latter which I'm referring to when I say "other occurrences". I don't want to paper over only the first, and then find that the second starts failing too.
Comment 17 Nick Desaulniers 2018-03-19 10:25:39 PDT
Looks like another case of this in __raw_cmpxchg() in arch/x86/include/asm/cmpxchg.h was just reported upstream in linux-next.
Comment 18 SedatDilek 2018-05-06 02:41:10 PDT
As far as I understand the percpu issue was fixed in Linus tree by...

commit 22636f8c9511245cb3c8412039f1dd95afb3aa59
"x86/asm: Add instruction suffixes to bitops"

Nevertheless, I was hitting the __raw_cmpxchg() problem in Linux v4.7-rc3+, too.

Personally, I am using binutils/ld (here: version 2.30) and clang-7 (here: version 1:7~svn330207-1~exp1+0~20180417201234.1709~1.gbp6fb10d) from <apt.llvm.org> to compile my Linux-kernels on Debian/testing AMD64.

Regards,
Sedat
Comment 19 SedatDilek 2018-07-23 07:22:31 PDT
Jan Beulich writes in [1]:

"...Short of the clang folks being able to point
out a suitable compiler level workaround, did anyone consider replacing the
expressions with the casts to u64 by invocations of arch_cmpxchg64() /
arch_cmpxchg64_local()? Later code in asm-generic/atomic-instrumented.h
suggests these symbols are required to be defined anyway.

The only other option I see is to break out the __X86_CASE_Q cases into
separate macros (evaluating to nothing or BUILD_BUG_ON(1) for 32-bit).
The definition of __X86_CASE_Q for 32-bit is bogus anyway - the
comment saying "sizeof will never return -1" is meaningless, because for
the comparison with the expression in switch() the constant from the case
label is converted to the type of that expression (i.e. size_t) anyway, i.e.
the value compared against is (size_t)-1, which is a value the compiler
can't prove that it won't be returned by sizeof() (despite that being
extremely unlikely)."

Can someone look at this?

[1] https://www.spinics.net/lists/linux-mm/msg153655.html
Comment 20 SedatDilek 2018-07-23 07:36:01 PDT
When forcing to build with -O0 instead of default -O2 I see:

./include/asm-generic/atomic-instrumented.h:390:3: error: array size is negative
                BUILD_BUG_ON(sizeof(unsigned long) != 8);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:66:52: note: expanded from macro 'BUILD_BUG_ON'
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
                                                   ^~~~~~~~~~~~~~~~~~~

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/atomic-instrumented.h#n390
Comment 21 Nick Desaulniers 2018-11-04 18:27:14 PST
(In reply to Reid Kleckner from comment #11)
> (In reply to Akira Hatanaka from comment #10)
> > I still think it's valuable to inform users that there is something wrong in
> > the inline-asm statement, so I don't think we want to just drop the
> > diagnostic completely. Maybe there should be a command line option (that is
> > turned off by default) that tells the compiler to ignore invalid inline-asm
> > statements and keep on going, but in that case we would have to teach the
> > backend to handle invalid constraints or drop the invalid inline-asm in the
> > front-end (the IR for the inline-asm statement will not be emitted).
> We could add a flag to disable it, but flags suck. We could also throw it
> under the bus of -fno-integrated-as, if Linux already uses that.

(In reply to dwmw2@infradead.org from comment #14)
> Strictly speaking, you don't have to delay *diagnosing* invalid asm; you
> only need to delay *aborting*. You can effectively turn it into asm("#error
> you screwed up if we're ever emitting this...") in the no-integrated-as
> case, perhaps?

I've been thinking about this issue more since testing Alexander Ivchenko's asm goto llvm+clang patches.  It seems that the Linux kernel has multiple instances of code that is only semantically valid AFTER inlining has been performed, which tightly couples the codebase to GCC.  So literally code that's only valid in GCC at -O2 (not even at -O0, which is not used in the kernel though, AFAIK).

These instances are typically extended inline asm statements in always_inline static functions, whose constraints are only valid after inlining.  This can be hacked around by converting these always_inline static functions to macros, which throws out type safety (unless you get clever with __builtin_types_compatible_p :^) ), and that there ends up be multiple nested static always_inline functions for which multiple functions must then become converted to macros to make the inline asm constraints semantically valid.  In the end, I think I had to convert maybe 5 functions roughly 20 lines on average per function in the x86 kernel submodule to macros to get code using asm-goto to compile in Clang.

Going back to David's suggestion in comment #14, wouldn't disabling inline asm constraint checking when -fno-integrated-as is set be a possible solution? Akira's comment #10 talks about the backend, but for the kernel, we don't care (yet) because we use -fno-integrated-as. (I would like to pursue replacing GAS with Clang someday, but I think there's a long tail of missing compatibility there).

There's little point to try to improve user experience when they use -fno-integrated-as, as they're leaving Clang land. Users of -fno-integrated-as should understand that is the bargain they're signing up for.

I think it's reasonable to disable inline asm constraint semantic analysis when using -fno-integrated-as in Clang, instead deferring to the external assembler to report diagnostics.  Thoughts?
Comment 22 Nick Desaulniers 2020-05-12 13:28:25 PDT
In response to comment #16 and comment #8, I have https://reviews.llvm.org/D79804.

We're actively discussing this case on LKML right now, as it's the last blocker for 32b x86 Linux kernel builds w/ Clang.
https://lore.kernel.org/lkml/20200504230309.237398-1-ndesaulniers@google.com/T/#u
(see thread index at bottom).

I'm inclined to accept this difference between GCC and Clang, modify the kernel code, and close this as wont fix.  If someday we have an MLIR dialect used in Clang, we can maybe revisit performing semantic analysis such as inline asm constrain validation after an early dead code elimination.
Comment 23 Paul Menzel 2021-09-06 14:01:53 PDT
For reference, do you have the Linux commits “working around” this issue?
Comment 24 Paul Menzel 2021-09-06 14:05:28 PDT
The two commits below have a link tag with this LLVM bug URL:

*   a9c640ac96e1 (x86/boot: Restrict header scope to make Clang happy), https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9c640ac96e19b3966357ec9bb586edd2e1e74de
*   158807de5822 (x86/uaccess: Make __get_user_size() Clang compliant on 32-bit), https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=158807de5822d1079e162a3762956fd743dd483e