When compiling the following test.c (-S -O), the assembly generated by clang ignores that eax is already used inside the asm for something other than the parameter, and recklessly puts the function parameter into eax as well, so that after the subl %eax,%eax there will be a page fault when accessing (%eax). GCC instead chooses ebx for the parameter, as it sees that eax is being used for something different.

test.c
-----------------------------------------------
int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile("incl %0; " \
			 "1: ;" \
			 "subl %%eax,%%eax; " \
			 "btsl $31,%0; jnc 2f; " \
			 "decl %0; " \
			 "movl $1,%%eax;" \
			 "2: ;" \
			 : "+m" (*p), "=a"(ret) \
			 : : "cx", "dx");
	return (ret);
}

CCC/clang output:
-----------------------------------------------
	.globl	atomic_intr_cond_try
	.type	atomic_intr_cond_try,@function
atomic_intr_cond_try:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$4, %esp
	movl	8(%ebp), %eax
#APP
	incl (%eax); 1: ;subl %eax,%eax; btsl $31,(%eax); jnc 2f; decl (%eax); movl $1,%eax;2: ;
#NO_APP
	movl	%eax, -4(%ebp)
	addl	$4, %esp
	popl	%ebp
	ret

GCC output:
-----------------------------------------------
	.globl	atomic_intr_cond_try
	.type	atomic_intr_cond_try, @function
atomic_intr_cond_try:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	movl	8(%ebp), %ebx
#APP
	incl (%ebx); 1: ;subl %eax,%eax; btsl $31,(%ebx); jnc 2f; decl (%ebx); movl $1,%eax;2: ;
#NO_APP
	popl	%ebx
	leave
	ret
Clang is correct here, and you just happen to be getting *lucky* with GCC (it will do the same thing as clang in some cases). The compiler doesn't scan your inline asm string to see that you're explicitly using EAX; you have to tell it that you are. The compiler is allowed to reuse an output register for inputs. Please see the GCC manual on this, in particular the discussion of early clobbers and clobber lists. What application does this code come from?
Sorry Chris, my bad. Now that you mention clobber lists I remember. I haven't used gcc inline assembly in a few years. The code is from DragonFly BSD; I'll push for a fix later today. Thanks, Alex
Great, I appreciate you pushing the fix upstream!
No, the bug is valid: try to set eax as a clobber with gcc and you will get a nice error. You don't need to list eax as a clobber because the "=a" constraint already means [1] "use the eax register as the write operand". The correct code should be something like [2], where %1 is used instead of hardcoding %eax. But clang still miscompiles it, assigning both %0 and %1 to eax.

[1] http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Machine-Constraints.html#Machine-Constraints

[2]
int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile("incl %0; " \
			 "1: ;" \
			 "subl %1,%1; " \
			 "btsl $31,%0; jnc 2f; " \
			 "decl %0; " \
			 "movl $1,%1;" \
			 "2: ;" \
			 : "+m" (*p), "=a"(ret) \
			 : : "cx", "dx");
	return (ret);
}
Updated output of gcc and ccc with Matthieu's suggested code:

CCC/clang output:
-----------------------------------------------
	.globl	atomic_intr_cond_try
	.type	atomic_intr_cond_try,@function
atomic_intr_cond_try:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$4, %esp
	movl	8(%ebp), %eax
#APP
	incl (%eax); 1: ;subl %eax,%eax; btsl $31,(%eax); jnc 2f; decl (%eax); movl $1,%eax;2: ;
#NO_APP
	movl	%eax, -4(%ebp)
	addl	$4, %esp
	popl	%ebp
	ret

GCC output:
-----------------------------------------------
	.globl	atomic_intr_cond_try
	.type	atomic_intr_cond_try, @function
atomic_intr_cond_try:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	movl	8(%ebp), %ebx
#APP
	incl (%ebx); 1: ;subl %eax,%eax; btsl $31,(%ebx); jnc 2f; decl (%ebx); movl $1,%eax;2: ;
#NO_APP
	popl	%ebx
	leave
	ret
I'm not sure what you mean here. You seem to think that use of EAX as an output prevents it from being used as an input. This is clearly not the case:

int test(int x) {
	int ret;
	__asm __volatile("foo %0 %1 "
			 : "=a"(ret)
			 : "r"(x)
			 : "cx", "dx");
	return (ret);
}

GCC compiles this to:

...
	foo %eax %eax
...
Ah, the issue here is that the mem constraint is both an input *and* an output, and because it is an output, it should conflict with eax. Here's a reduced testcase:

int test(int *x) {
	int ret;
	__asm __volatile("foo %0 %1 "
			 : "+m"(*x), "=a"(ret)
			 : : "cx", "dx");
	return (ret);
}

We compile it to:

_test:
	movl	4(%esp), %eax
	## InlineAsm Start
	foo (%eax) %eax
	## InlineAsm End
	ret
Note that it happens for register constraints other than "a" as well:

int test(int *x) {
	int ret;
	__asm __volatile("foo %0 %1 "
			 : "+m"(*x), "=r"(ret)
			 : );
	return (ret);
}

It seems the register allocator doesn't see the two output constraints.
Ok, there are deeper problems here. Consider this:

int test(int *x) {
	int ret;
	__asm __volatile("foo %0 %1 %2 "
			 : "+m"(*x), "=a"(ret)
			 : : "cx", "dx");
	return (ret);
}

We compile this at -O0 to:

	foo (%eax) %eax (%esi)

Here we aren't even constraining the input and output mem to be the same! If we look at the .ll file, we see this asm:

%asmtmp = call i32 asm sideeffect "foo $0 $1 $2 ", "=*m,={ax},*m,~{dirflag},~{fpsr},~{flags},~{dx},~{cx}"(i32* %1, i32* %2) nounwind ; <i32> [#uses=1]

Indeed, the two mems are represented as distinct pointers with nothing that ties them together.
I just took a quick look; this is actually a general issue with output memory constraints:

int test(int *x) {
	int ret;
	__asm __volatile("foo %0 %1 "
			 : "=m"(*x), "=a"(ret)
			 : : "cx", "dx");
	return (ret);
}

Relevant bit of output:

#APP
	foo (%eax) %eax
#NO_APP

Clearly not going to work...
I think that the right fix for this has to happen at both the LLVM IR level and the MachineInstr level: a given mem operand needs to know whether it is input or output. At RA time, this would cause the registers that are part of the mem to be live in or live out, and properly conflict as needed.
(In reply to comment #11)
> I think that the right fix for this has to happen at both the llvm IR level and
> the machineinstr level: a given mem operand needs to know if it is input or
> output. At RA time, this would cause the registers that are part of the mem to
> be live in or live out, and properly conflict as needed.

I think we already keep track of it properly at the IR level. Or is there some obvious issue I'm missing?
I hate to crash the party, but this inline asm is invalid from the start, as Chris already noted. The "m" constraint and its modifiers refer to the memory location, as input and/or output, NOT to the means of getting "there". The "m" constraint could generate a constant address:

	btsl $31, 0xdeadbeef

or a PIC address:

	btsl $31, 0x1234(%ebx)

or a stack reference:

	btsl $31, -8(%esp)

or, as in this case, a temporary pointer which has to be held in a register. This is transparent to the inline asm, as requested by the "m" constraint. Any input or output semantics refer to the memory location.

That the temporary pointer can alias with the other register is "normal" given how the GCC inline asm model works (the asm is treated as "atomic": outputs are created while inputs (even virtual ones) are consumed in one step; refer to the gcc manual). That it does not happen with gcc is a mere implementation detail, maybe a heuristic which chooses another register as long as one is "easily available". And this is also wanted: a

	movl (%eax), %eax

is legal and can be the right thing if the pointer in %eax is not needed any more afterwards, to reduce register pressure. The last example from Chris is also fine in this respect. Only if the processor model said that a register read (even if only to fetch an address from it) cannot be written in the same instruction (pipeline model, whatever) would this be illegal. How foo creates two outputs without an input is another matter.

So this inline asm fails to explicitly pass in the pointer in a register, even though the pointer is needed several times after the outputs are "generated" (or better said, touched). *Additionally* you need the "m" constraint to tell the compiler "Hey, I also modified memory, don't cache it around this inline asm". All you can say about this is that it is inconvenient, but that's the way the inline asm model works.
so you could write it as:

int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile(
		"incl (%2)\n"
		"1:\n\t"
		"subl %1, %1\n\t"
		"btsl $31, (%2)\n\t"
		"jnc 2f\n\t"
		"decl (%2)\n\t"
		"movl $1,%1\n"
		"2:"
		: /* %0 */ "=m" (*p),
		  /* %1 */ "=r" (ret),
		  /* %2 */ "=r" (p)
		: /* %3 */ "m" (*p),
		  /* %4 */ "2" (p));
	return ret;
}

(If the compiler fails to put ret into %eax right from the start, then...)

Note that I try not to use the + modifier, because it is buggy in some older gcc (the 3 series) and always a source of surprise.

And if you do not need a real 1 as return value but a -1 would do, you can use a sbb like this, and even get away with reusing the reg:

int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile(
		"incl %0\n"
		"btsl $31, %0\n\t"
		"jnc 2f\n\t"
		"decl %0\n"
		"2:\n\t"
		"sbb %1,%1\n"
		: /* %0 */ "=m" (*p),
		  /* %1 */ "=r" (ret)
		: /* %2 */ "m" (*p));
	return ret;
}

Greetings
Sorry, I didn't see this one earlier for some reason. #13 is right: the registers involved in constructing a memory address are *input* registers, not outputs, and can conflict with outputs. The "+" refers to the memory location, not to any registers involved in computing its address. As #13 points out, the generated code for #10 is OK and could be desirable in some circumstances. The example in #9 is botched somehow; there is no second memory operand.

I think the best fix for the original code is to mark EAX as earlyclobber by writing it "=&a"(ret), which prevents it from being used as an input register elsewhere. See the gcc documentation. That works in LLVM TOT (although it's possible it didn't when this PR was filed).