LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit new LLVM bugs.

Bug 24545 - [x86] Silly code generation for _addcarry_u32/_addcarry_u64
Summary: [x86] Silly code generation for _addcarry_u32/_addcarry_u64
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86
Version: trunk
Hardware: PC Linux
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-22 00:20 PDT by Melissa
Modified: 2019-01-27 02:37 PST
CC List: 8 users

See Also:
Fixed By Commit(s): 348727, 352289, 352291


Description Melissa 2015-08-22 00:20:28 PDT
The x86 intrinsics _addcarry_u32 and _addcarry_u64 generate silly code.  For example, the following function performs a double-width addition as two chained 64-bit adds (the XOR is just to make the output clearer):

	u64 testcarry(u64 a, u64 b, u64 c, u64 d)
	{
		u64 result0, result1;
		_addcarry_u64(_addcarry_u64(0, a, c, &result0), b, d, &result1);
		return result0 ^ result1;
	}

This is the code generated with -O1, -O2 and -O3:

	xor	eax, eax
	add	al, -1
	adc	rdi, rdx
	mov	qword ptr [rsp - 8], rdi
	setb	al
	add	al, -1
	adc	rsi, rcx
	mov	qword ptr [rsp - 16], rsi
	xor	rsi, qword ptr [rsp - 8]
	mov	rax, rsi
	ret

The first silliness is that _addcarry_u64 does not take advantage of a compile-time constant 0 as the first (carry-in) parameter.  Instead of "adc", it should just use "add".

The second silliness is the use of a byte register ("al" above) to store the carry flag, followed by "add al, -1" to put it back into CF.

The third silliness is that _addcarry_u64 is taking its pointer parameter too literally; it shouldn't be storing values to memory at all.

Instead, the code should be something like this:

	add	rdx, rdi
	mov	rax, rdx
	adc	rcx, rsi
	xor	rax, rcx
	ret

Naturally, for something this simple, I'd use unsigned __int128, but this came up in large number math.
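
For context, here is a minimal sketch of the large-number pattern these intrinsics are meant for (the add256 name and the u64 typedef are illustrative, not part of the test case above); ideally the whole chain lowers to a single "add" followed by "adc"s, with the carry kept live in EFLAGS and nothing spilled to memory:

	#include <immintrin.h>
	typedef unsigned long long u64;

	/* Add two 256-bit integers held as four 64-bit limbs each. */
	void add256(u64 dst[4], const u64 a[4], const u64 b[4])
	{
		unsigned char c = 0;	/* carry-in of the lowest limb is a constant 0 */
		c = _addcarry_u64(c, a[0], b[0], &dst[0]);
		c = _addcarry_u64(c, a[1], b[1], &dst[1]);
		c = _addcarry_u64(c, a[2], b[2], &dst[2]);
		c = _addcarry_u64(c, a[3], b[3], &dst[3]);
		/* c is the carry out of the top limb (ignored here). */
	}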

Cross-filed with GCC (with different generated code since it's a different compiler =^-^=):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67317
Comment 1 deadalnix 2017-02-05 07:49:39 PST
I have been working on a series of patches to improve the situation. Reviews are welcome.

https://reviews.llvm.org/D29541
https://reviews.llvm.org/D29528
https://reviews.llvm.org/D29524
https://reviews.llvm.org/D29443
Comment 2 Arthur O'Dwyer 2018-11-10 22:25:28 PST
https://godbolt.org/z/JsSJR8
With Clang trunk as of 2018-11-11, this simple "addcarry" now correctly generates optimal code, which is great:

    void badadd(u64& a, const u64& b)
    {
        _addcarry_u64(0, a, b, &a);
    }

    movq    (%rsi), %rax
    addq    %rax, (%rdi)
    ret

But the equivalent function for "subborrow" generates suboptimal code:

    void badsub(u64& a, const u64& b)
    {
        _subborrow_u64(0, a, b, &a);
    }

    movq    (%rdi), %rax
    subq    (%rsi), %rax
    movq    %rax, (%rdi)
    ret

"badsub" SHOULD generate the optimal code of

    movq    (%rsi), %rax
    subq    %rax, (%rdi)
    ret
Comment 3 Arthur O'Dwyer 2018-11-10 22:50:22 PST
Both of these trivial functions generate suboptimal code:
https://godbolt.org/z/L4nbS2

    void plusequal(u64 *p, u64 x) {
        _addcarry_u64(0, *p, x, p);
    }

    void minusequal(u64 *p, u64 x) {
        _subborrow_u64(0, *p, x, p);
    }

"plusequal" generates

    xorl %eax, %eax
    addb $-1, %al
    adcq %rsi, (%rdi)
    retq

instead of

    addq %rsi, (%rdi)
    retq

"minusequal" generates

    xorl %eax, %eax
    addb $-1, %al
    sbbq %rsi, (%rdi)
    retq

instead of

    subq %rsi, (%rdi)
    retq
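
For reference, with the carry/borrow-in fixed at the constant 0 the two calls above are semantically plain read-modify-write operations, which is why a single memory-destination add/sub is the expected codegen; the extra "xorl %eax, %eax" / "addb $-1, %al" pair only re-materializes that known-zero carry into EFLAGS before the adcq/sbbq. A minimal sketch (the *_equiv names are illustrative):

    void plusequal_equiv(u64 *p, u64 x)  { *p += x; }    // same effect as _addcarry_u64(0, *p, x, p)
    void minusequal_equiv(u64 *p, u64 x) { *p -= x; }    // same effect as _subborrow_u64(0, *p, x, p)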
Comment 4 Simon Pilgrim 2018-12-09 05:08:59 PST
None of these look great with clang trunk at the moment (Arthur - your godbolt link in Comment #2 was using GCC, unfortunately...).

https://godbolt.org/z/8rfUWk

Adding Craig as he's been cleaning up the X86 EFLAGS support recently.
Comment 5 Craig Topper 2018-12-09 10:05:14 PST
I made a fix for the case in Comment 3 in r348727. I wouldn't be surprised if there are more bugs out there.
Comment 6 Arthur O'Dwyer 2018-12-11 09:01:05 PST
@Craig, that's awesome! I confirm that I'm seeing perfect codegen in more cases than I used to. :)

Were you inviting me to submit more suboptimal cases? Here's another one where I'm trying to generate a carry-in to _subborrow_u64 and it's not quite getting it:
https://godbolt.org/z/Km0CAX

    bool less_than_ideal(u64 xlo, u64 xhi, u64 *y) {
        bool cf = (xlo < y[0]);
        return _subborrow_u64(cf, xhi, y[1], &xhi);
    }

Should generate:

    cmpq (%rdx), %rdi
    sbbq 8(%rdx), %rsi
    setb %al
    retq

Currently generates:

    cmpq %rdi, (%rdx)
    seta %al
    addb $-1, %al
    sbbq 8(%rdx), %rsi
    setb %al
    retq
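
For what it's worth, here is a minimal check (function names illustrative, not from the bug) that the reduced case above is just the borrow-out of the 128-bit subtraction {xhi:xlo} - {y[1]:y[0]}, i.e. an unsigned 128-bit "less than". That is also why the ideal sequence is correct: "cmpq (%rdx), %rdi" computes xlo - y[0] and leaves the low-limb borrow in CF, which the following sbbq consumes directly.

    #include <assert.h>
    #include <stdbool.h>
    #include <immintrin.h>
    typedef unsigned long long u64;

    // Same computation as less_than_ideal above.
    static bool less_than_intrin(u64 xlo, u64 xhi, const u64 *y)
    {
        u64 scratch;
        bool cf = (xlo < y[0]);
        return _subborrow_u64(cf, xhi, y[1], &scratch);
    }

    // Reference: compare {xhi:xlo} and {y[1]:y[0]} as unsigned 128-bit values.
    static bool less_than_ref(u64 xlo, u64 xhi, const u64 *y)
    {
        unsigned __int128 x = ((unsigned __int128)xhi << 64) | xlo;
        unsigned __int128 v = ((unsigned __int128)y[1] << 64) | y[0];
        return x < v;
    }

    int main(void)
    {
        u64 y[2] = { 5, 7 };
        assert(less_than_intrin(3, 7, y) == less_than_ref(3, 7, y));  // differs only in the low limb
        assert(less_than_intrin(9, 6, y) == less_than_ref(9, 6, y));  // differs in the high limb
        assert(less_than_intrin(5, 7, y) == less_than_ref(5, 7, y));  // equal, so not less
        return 0;
    }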
Comment 7 Simon Pilgrim 2019-01-25 11:03:40 PST
This last one seems to be due to us not being able to commute compares to get the necessary X86::COND_B.

https://godbolt.org/z/ZrzIfD

Craig, this looks like we can do this in combineCarryThroughADD?
Comment 8 Simon Pilgrim 2019-01-26 05:09:25 PST
(In reply to Simon Pilgrim from comment #7)
> This last one seems to be due to us not being able to commute compares to
> get the necessary X86::COND_B.
> 
> https://godbolt.org/z/ZrzIfD
> 
> Craig, this looks like we can do this in combineCarryThroughADD?

Candidate Patch: https://reviews.llvm.org/D57281
Comment 9 Simon Pilgrim 2019-01-27 02:37:20 PST
Fixed [Comment #6] in rL352289 (with a follow-up in rL352291).