LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 31754 - replacing variable by a constant breaks add-with-carry (adc) and subtract-with-borrow (sbb) optimization
Summary: replacing variable by a constant breaks add-with-carry (adc) and subtract-wit...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Common Code Generator Code (show other bugs)
Version: trunk
Hardware: PC Linux
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
: 38590 (view as bug list)
Depends on: 40486
Blocks:
  Show dependency tree
 
Reported: 2017-01-25 06:06 PST by Vincent Lefevre
Modified: 2019-02-24 07:46 PST (History)
7 users (show)

See Also:
Fixed By Commit(s): 352210, 354746


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vincent Lefevre 2017-01-25 06:06:18 PST
In some C code, the generated x86_64 code gets worse when I replace a variable by a constant. Example:

typedef unsigned long T;

void sub (T *p, T u, T v, T w)
{
  p[0] = w - v;
  p[1] = u - (w < v);
}

void sub0 (T *p, T u, T v)
{
  T w = 0;
  p[0] = w - v;
  p[1] = u - (w < v);
}

void sub1 (T *p, T u, T v)
{
  T w = 1;
  p[0] = w - v;
  p[1] = u - (w < v);
}

sub0 and sub1 are like sub, except that w is 0 and 1 respectively.

With Clang 3.9.1 under Debian/unstable, I get from "clang-3.9 -O3 -S":

sub:                                    # @sub
        .cfi_startproc
# BB#0:
        subq    %rdx, %rcx
        movq    %rcx, (%rdi)
        sbbq    $0, %rsi
        movq    %rsi, 8(%rdi)
        retq

sub0:                                   # @sub0
        .cfi_startproc
# BB#0:
        movq    %rdx, %rax
        negq    %rax
        movq    %rax, (%rdi)
        cmpq    $1, %rdx
        adcq    $-1, %rsi
        movq    %rsi, 8(%rdi)
        retq

sub1:                                   # @sub1
        .cfi_startproc
# BB#0:
        movl    $1, %eax
        subq    %rdx, %rax
        movq    %rax, (%rdi)
        xorl    %eax, %eax
        cmpq    $1, %rdx
        seta    %al
        subq    %rax, %rsi
        movq    %rsi, 8(%rdi)
        retq

For sub0 and sub1, I would expect

        sbbq    $0, %rsi

like for sub.
Comment 1 Vincent Lefevre 2017-01-25 06:23:15 PST
Same issue with add-with-carry (adc):

typedef unsigned long T;

void add (T *p, T u, T v, T w)
{
  p[0] = v + w;
  p[1] = u + (p[0] < v);
}

void add1 (T *p, T u, T v)
{
  T w = 1;
  p[0] = v + w;
  p[1] = u + (p[0] < v);
}

gives:

add:                                    # @add
        .cfi_startproc
# BB#0:
        addq    %rcx, %rdx
        movq    %rdx, (%rdi)
        adcq    $0, %rsi
        movq    %rsi, 8(%rdi)
        retq

add1:                                   # @add1
        .cfi_startproc
# BB#0:
        leaq    1(%rdx), %rax
        movq    %rax, (%rdi)
        xorl    %eax, %eax
        cmpq    $-1, %rdx
        sete    %al
        addq    %rsi, %rax
        movq    %rax, 8(%rdi)
        retq
Comment 2 Ilya Lesokhin 2018-10-23 22:13:10 PDT
*** Bug 38590 has been marked as a duplicate of this bug. ***
Comment 3 Arthur O'Dwyer 2018-12-20 18:23:51 PST
I believe I'm seeing basically this same issue. Here's my C++ test case:
https://godbolt.org/z/eZWlsG

----

#include <x86intrin.h>
using u64 = unsigned long long;

template<u64 K>
void test(u64& alo, u64& ahi)
{
    u64 blo = K;
    u64 bhi = 0;
    bool cf = (alo += blo) < blo;
    _addcarry_u64(cf, ahi, bhi, &ahi);
}

template void test<0>(u64&, u64&);  // mildly awful
template void test<1>(u64&, u64&);  // awful
template void test<2>(u64&, u64&);  // awesome
template void test<3>(u64&, u64&);  // awesome

-----

Clang trunk does really awesome on test<N> for most N (see codegen below).
It generates a redundant store for test<0>.
The code for test<1> is just bizarre. I think what's happening on test<1> is that the optimization that turns (x < y) into "add/adc" is fighting against an optimization that turns (x < 1) into (x != 0).
Similar glitches happen at N=0x8000000000000000 and N=(-1).

test<1> is the most important case, because it corresponds to big integer "operator++". It would be nice to get the optimal codegen for this case.

test<0>:
  addq $0, (%rsi)
  retq
test<1>:
  addq $1, (%rdi)
  sete %al
  addb $-1, %al
  adcq $0, (%rsi)
  retq
test<2>:
  addq $2, (%rdi)
  adcq $0, (%rsi)
  retq
test<3>:
  addq $3, (%rdi)
  adcq $0, (%rsi)
  retq
test<4>:
  addq $4, (%rdi)
  adcq $0, (%rsi)
  retq
Comment 4 Simon Pilgrim 2019-01-27 02:41:04 PST
rL352210 helps with test<0>
Comment 5 Sanjay Patel 2019-01-31 08:13:07 PST
Another piece of the puzzle:
https://reviews.llvm.org/D57516
Comment 6 Sanjay Patel 2019-02-04 07:09:47 PST
https://reviews.llvm.org/D57637 / https://reviews.llvm.org/rL352984
...handles at least part of the problem with x86-specific changes. 

And things should be better generically with:
https://reviews.llvm.org/rL352998
https://reviews.llvm.org/rL353001

But none of these addresses the original subtraction overflow bugs from the problem description. I think we need to add the 'usubo' sibling to CGP for at least part of that, but it's not a simple extension because the pattern in IR for subtraction overflow is not the same as addition overflow (see bug 40487)
Comment 7 Sanjay Patel 2019-02-05 14:32:02 PST
CGP 'usubo' patch for review:
https://reviews.llvm.org/D57789
Comment 8 Sanjay Patel 2019-02-18 17:41:59 PST
The 'sub1' example from the description should be improved after:
https://reviews.llvm.org/rL354298

_sub1:
	movl	$1, %eax
	subq	%rdx, %rax
	movq	%rax, (%rdi)
	sbbq	$0, %rsi
	movq	%rsi, 8(%rdi)
	retq

Any way to improve on that still?
Comment 9 Sanjay Patel 2019-02-20 13:26:03 PST
The 'sub0' case should be fixed here:
https://reviews.llvm.org/rL354519
Comment 10 Simon Pilgrim 2019-02-24 07:33:16 PST
https://godbolt.org/z/vMyJNI

Of Vincent's original sub/add cases only the add1 looks like its still having problems.
Comment 11 Sanjay Patel 2019-02-24 07:41:46 PST
(In reply to Simon Pilgrim from comment #10)
> https://godbolt.org/z/vMyJNI
> 
> Of Vincent's original sub/add cases only the add1 looks like its still
> having problems.

That's the last problem that I was tracking for this report too.
Should be fixed after:
https://reviews.llvm.org/rL354746

Feel free to reopen and/or file new bugs (like bug 40825) for missing pieces.