LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 3812 - improper handling of +m constraints
Summary: improper handling of +m constraints
Status: RESOLVED INVALID
Alias: None
Product: tools
Classification: Unclassified
Component: llvm-gcc
Version: trunk
Hardware: PC other
Importance: P normal
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-15 04:48 PDT by Alex Hornung
Modified: 2010-03-06 13:58 PST
6 users

See Also:
Fixed By Commit(s):


Attachments

Description Alex Hornung 2009-03-15 04:48:45 PDT
When compiling the following test.c (-S -O), the assembly generated by clang ignores that eax is being used for something other than the parameter and recklessly puts the function parameter into eax, so that after the subl %eax, %eax there will be a page fault when accessing (%eax). GCC instead chooses ebx for the parameter, as it sees that eax is being used for something different.
 

test.c
-----------------------------------------------
int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile("incl %0; " \
			 "1: ;" \
			 "subl %%eax,%%eax; " \
			 "btsl $31,%0; jnc 2f; " \
			 "decl %0; " \
			 "movl $1,%%eax;" \
			 "2: ;" \
			 : "+m" (*p), "=a"(ret) \
			 : : "cx", "dx");
	return (ret);
}




CCC/clang output:
-----------------------------------------------
.globl	atomic_intr_cond_try
	.type	atomic_intr_cond_try,@function
atomic_intr_cond_try:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$4, %esp
	movl	8(%ebp), %eax
#APP
	incl (%eax); 1: ;subl %eax,%eax; btsl $31,(%eax); jnc 2f; decl (%eax); movl $1,%eax;2: ;
#NO_APP
	movl	%eax, -4(%ebp)
	addl	$4, %esp
	popl	%ebp
	ret





GCC output:
-----------------------------------------------
.globl atomic_intr_cond_try
	.type	atomic_intr_cond_try, @function
atomic_intr_cond_try:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	movl	8(%ebp), %ebx
#APP
	incl (%ebx); 1: ;subl %eax,%eax; btsl $31,(%ebx); jnc 2f; decl (%ebx); movl $1,%eax;2: ;
#NO_APP
	popl	%ebx
	leave
	ret
Comment 1 Chris Lattner 2009-03-16 13:33:52 PDT
Clang is correct here, and you just happen to be getting *lucky* with GCC (it will do the same thing as clang in some cases).  The compiler doesn't scan your inline asm string to see that you're explicitly using EAX, you have to tell it that it does.  The compiler is allowed to reuse an output register for inputs.  Please see the GCC manual on this, in particular the discussion of early clobbers and clobber lists.

What application does this code come from?
Comment 2 Alex Hornung 2009-03-16 14:07:43 PDT
Sorry Chris,
my bad. Now that you mention clobber lists I remember; I haven't used gcc inline assembly in a few years.
The code is from DragonFly BSD. I'll push for a fix later today.

Thanks,
Alex
Comment 3 Chris Lattner 2009-03-16 14:44:10 PDT
Great, I appreciate you pushing the fix upstream!
Comment 4 Matthieu castet 2009-03-16 15:21:04 PDT
No, the bug is valid: try to set eax as a clobber with gcc and you will get a nice error.

You don't need to set eax as a clobber, because the "=a" constraint means [1] "use the eax register as a write operand".

The correct code should be something like [2], where %1 is used instead of hardcoding %eax.

But clang still miscompiles it, assigning both %0 and %1 to eax.




[1] http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Machine-Constraints.html#Machine-Constraints

[2]
int
atomic_intr_cond_try(int *p)
{
        int ret;

        __asm __volatile("incl %0; " \
                         "1: ;" \
                         "subl %1,%1; " \
                         "btsl $31,%0; jnc 2f; " \
                         "decl %0; " \
                         "movl $1,%1;" \
                         "2: ;" \
                         : "+m" (*p), "=a"(ret) \
                         : : "cx", "dx");
        return (ret);
}
Comment 5 Alex Hornung 2009-03-16 15:43:47 PDT
Updated output of gcc and ccc with Matthieu's suggested code:

CCC/clang output:
-----------------------------------------------
.globl  atomic_intr_cond_try
        .type   atomic_intr_cond_try,@function
atomic_intr_cond_try:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movl    8(%ebp), %eax
        #APP
        incl (%eax); 1: ;subl %eax,%eax; btsl $31,(%eax); jnc 2f; decl (%eax); movl $1,%eax;2: ;
        #NO_APP
        movl    %eax, -4(%ebp)
        addl    $4, %esp
        popl    %ebp
        ret





GCC output:
-----------------------------------------------
.globl atomic_intr_cond_try
        .type   atomic_intr_cond_try, @function
atomic_intr_cond_try:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    8(%ebp), %ebx
#APP
        incl (%ebx); 1: ;subl %eax,%eax; btsl $31,(%ebx); jnc 2f; decl (%ebx); movl $1,%eax;2: ;
#NO_APP
        popl    %ebx
        leave
        ret
Comment 6 Chris Lattner 2009-03-17 17:12:30 PDT
I'm not sure what you mean here.  You seem to think that use of EAX as an output prevents it from being used as an input.  This is clearly not the case:

int
test(int x)
{
        int ret;

        __asm __volatile("foo %0 %1 " 
                         : "=a"(ret) 
                         : "r"(x) : "cx", "dx");
        return (ret);
}

GCC compiles this to: 

...
	foo %eax %eax 
...
Comment 7 Chris Lattner 2009-03-17 17:14:29 PDT
Ah, the issue here is that the mem constraint is both an input *and* an output, and because it is an output, it should conflict with eax.  Here's a reduced testcase:

int test(int *x) {
        int ret;
        __asm __volatile("foo %0 %1 " 
                         : "+m"(*x), "=a"(ret) 
                         : : "cx", "dx");
        return (ret);
}

We compile it to:

_test:
	movl	4(%esp), %eax
	## InlineAsm Start
	foo (%eax) %eax 
	## InlineAsm End
	ret


Comment 8 Matthieu castet 2009-03-17 17:32:42 PDT
Note that it happens for register constraints other than "a":

int test(int *x) {
    int ret;
    __asm __volatile("foo %0 %1 "
            : "+m"(*x), "=r"(ret)
            :  );
    return (ret);
}


it seems the register allocator doesn't see the two output constraints
Comment 9 Chris Lattner 2009-03-17 18:27:20 PDT
Ok, there are deeper problems here.  Consider this:

int
test(int *x)
{
        int ret;

        __asm __volatile("foo %0 %1 %2 " 
                         : "+m"(*x), "=a"(ret) 
                         : : "cx", "dx");
        return (ret);
}

We compile this at -O0 to:
	foo (%eax) %eax (%esi) 

Here we aren't even constraining the input and output mem to be the same!

If we look at the .ll file, we see this asm:

	%asmtmp = call i32 asm sideeffect "foo $0 $1 $2 ", "=*m,={ax},*m,~{dirflag},~{fpsr},~{flags},~{dx},~{cx}"(i32* %1, i32* %2) nounwind		; <i32> [#uses=1]

indeed the two mems are represented as distinct pointers with nothing that ties them together.
Comment 10 Eli Friedman 2009-04-30 20:16:19 PDT
I just took a quick look; this is actually a general issue with output memory constraints:

int
test(int *x)
{
        int ret;

        __asm __volatile("foo %0 %1 "
                         : "=m"(*x), "=a"(ret) 
                         : : "cx", "dx");
        return (ret);
}

Relevant bit of output:
	#APP
	foo (%eax) %eax 
	#NO_APP

Clearly not going to work...
Comment 11 Chris Lattner 2009-05-04 01:37:02 PDT
I think that the right fix for this has to happen at both the llvm IR level and the machineinstr level: a given mem operand needs to know if it is input or output.  At RA time, this would cause the registers that are part of the mem to be live in or live out, and properly conflict as needed.
Comment 12 Eli Friedman 2009-05-04 16:27:13 PDT
(In reply to comment #11)
> I think that the right fix for this has to happen at both the llvm IR level and
> the machineinstr level: a given mem operand needs to know if it is input or
> output.  At RA time, this would cause the registers that are part of the mem to
> be live in or live out, and properly conflict as needed.

I think we already keep track of it properly at the IR level.  Or is there some obvious issue I'm missing?
Comment 13 Jan Seiffert 2009-09-04 17:14:51 PDT
I hate to crash the party, but this inline asm is invalid from the start, as Chris already noted.

The m constraint and its modifiers refer to the memory location, as input and/or output, NOT to the means of getting "there".

The m constraint could generate a constant address:
btsl $31, 0xdeadbeef
or a pic address:
btsl $31, 0x1234(%ebx)
or a stack reference:
btsl $31, -8(%esp)
or, as in this case, a tmp pointer which has to be stored in a register.
This is transparent to the inline asm, as requested by the m constraint.
Any input or output semantics refer to the memory location.

That the tmp pointer can alias with the other register is "normal" due to how the GCC inline asm model works (the asm is seen as "atomic": outputs are created while inputs (even virtual ones) are consumed in one step; refer to the gcc manual).
That it does not happen with gcc here is a mere implementation detail, maybe a heuristic which chooses another register as long as one is "easily available".

And this is also wanted: a
movl (%eax), %eax
is legal and is the right thing if the pointer in %eax is not needed any more afterwards, since it reduces register pressure.

Also the last example from Chris is fine in this respect. Only if the processor model said that a register which is read (even if only to read an address from it) cannot be written in the same instruction (pipeline model, whatever) would this be illegal. How foo creates 2 outputs without an input is another matter.

So this inline asm fails to explicitly pass the pointer in a register, given that the pointer is needed several times even after the outputs are "generated" (or better said, touched). *Additionally* you need the m constraint to tell the compiler "Hey, I also modified memory, don't cache it around this inline asm".

All you can say about this is that it is inconvenient, but that's the way the inline asm model works.

So you could write it as:
int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile(
		"incl	(%2)\n"
		"1:\n\t"
		"subl	%1, %1\n\t"
		"btsl	$31, (%2)\n\t"
		"jnc	2f\n\t"
		"decl	(%2)\n\t"
		"movl	$1,%1\n"
		"2:"
	: /* %0 */ "=m" (*p),
	  /* %1 */ "=r" (ret),
	  /* %2 */ "=r" (p)
	: /* %3 */ "m" (*p),
	  /* %4 */ "2" (p));

	return ret;
}

(If the compiler fails to put ret into %eax right from the start, then...)
Note that I try not to use the + modifier, because it is buggy in some older gcc (the 3 series) and always a source of surprise.
And if you do not need a literal 1 as the return value but a -1 would do, you can use an sbb like this, and even get away with reusing the reg:
int
atomic_intr_cond_try(int *p)
{
	int ret;

	__asm __volatile(
		"incl	%0\n"
		"btsl	$31, %0\n\t"
		"jnc	2f\n\t"
		"decl	%0\n"
		"2:\n\t"
		"sbb	%1,%1\n"
	: /* %0 */ "=m" (*p),
	  /* %1 */ "=r" (ret)
	: /* %2 */ "m" (*p));

	return ret;
}


Greetings
Comment 14 Dale Johannesen 2009-09-04 18:03:22 PDT
Sorry, I didn't see this one earlier for some reason.

#13 is right: the registers involved in constructing a memory address are *input* registers, not outputs, and they can conflict with outputs.  The "+" refers to the memory location, not to any registers involved in computing its address.  As #13 points out, the generated code in #10 is OK and could be desirable in some circumstances.

The example in #9 is botched somehow; there is no second memory operand.

I think the best fix for the original code is to mark EAX as earlyclobber by writing it "=&a"(ret), which prevents it from being used as an input register elsewhere.  See the gcc documentation.  That works in LLVM TOT (although it's possible it didn't when this PR was filed).