In some C code, the generated x86_64 code gets worse when I replace a variable by a constant. Example: typedef unsigned long T; void sub (T *p, T u, T v, T w) { p[0] = w - v; p[1] = u - (w < v); } void sub0 (T *p, T u, T v) { T w = 0; p[0] = w - v; p[1] = u - (w < v); } void sub1 (T *p, T u, T v) { T w = 1; p[0] = w - v; p[1] = u - (w < v); } sub0 and sub1 are like sub, except that w is 0 and 1 respectively. With Clang 3.9.1 under Debian/unstable, I get from "clang-3.9 -O3 -S": sub: # @sub .cfi_startproc # BB#0: subq %rdx, %rcx movq %rcx, (%rdi) sbbq $0, %rsi movq %rsi, 8(%rdi) retq sub0: # @sub0 .cfi_startproc # BB#0: movq %rdx, %rax negq %rax movq %rax, (%rdi) cmpq $1, %rdx adcq $-1, %rsi movq %rsi, 8(%rdi) retq sub1: # @sub1 .cfi_startproc # BB#0: movl $1, %eax subq %rdx, %rax movq %rax, (%rdi) xorl %eax, %eax cmpq $1, %rdx seta %al subq %rax, %rsi movq %rsi, 8(%rdi) retq For sub0 and sub1, I would expect sbbq $0, %rsi like for sub.
Same issue with add-with-carry (adc): typedef unsigned long T; void add (T *p, T u, T v, T w) { p[0] = v + w; p[1] = u + (p[0] < v); } void add1 (T *p, T u, T v) { T w = 1; p[0] = v + w; p[1] = u + (p[0] < v); } gives: add: # @add .cfi_startproc # BB#0: addq %rcx, %rdx movq %rdx, (%rdi) adcq $0, %rsi movq %rsi, 8(%rdi) retq add1: # @add1 .cfi_startproc # BB#0: leaq 1(%rdx), %rax movq %rax, (%rdi) xorl %eax, %eax cmpq $-1, %rdx sete %al addq %rsi, %rax movq %rax, 8(%rdi) retq
*** Bug 38590 has been marked as a duplicate of this bug. ***
I believe I'm seeing basically this same issue. Here's my C++ test case: https://godbolt.org/z/eZWlsG ---- #include <x86intrin.h> using u64 = unsigned long long; template<u64 K> void test(u64& alo, u64& ahi) { u64 blo = K; u64 bhi = 0; bool cf = (alo += blo) < blo; _addcarry_u64(cf, ahi, bhi, &ahi); } template void test<0>(u64&, u64&); // mildly awful template void test<1>(u64&, u64&); // awful template void test<2>(u64&, u64&); // awesome template void test<3>(u64&, u64&); // awesome ----- Clang trunk does really awesome on test<N> for most N (see codegen below). It generates a redundant store for test<0>. The code for test<1> is just bizarre. I think what's happening on test<1> is that the optimization that turns (x < y) into "add/adc" is fighting against an optimization that turns (x < 1) into (x != 0). Similar glitches happen at N=0x8000000000000000 and N=(-1). test<1> is the most important case, because it corresponds to big integer "operator++". It would be nice to get the optimal codegen for this case. test<0>: addq $0, (%rsi) retq test<1>: addq $1, (%rdi) sete %al addb $-1, %al adcq $0, (%rsi) retq test<2>: addq $2, (%rdi) adcq $0, (%rsi) retq test<3>: addq $3, (%rdi) adcq $0, (%rsi) retq test<4>: addq $4, (%rdi) adcq $0, (%rsi) retq
rL352210 helps with test<0>
Another piece of the puzzle: https://reviews.llvm.org/D57516
https://reviews.llvm.org/D57637 / https://reviews.llvm.org/rL352984 ...handles at least part of the problem with x86-specific changes. And things should be better generically with: https://reviews.llvm.org/rL352998 https://reviews.llvm.org/rL353001 But none of these addresses the original subtraction overflow bugs from the problem description. I think we need to add the 'usubo' sibling to CGP for at least part of that, but it's not a simple extension because the pattern in IR for subtraction overflow is not the same as addition overflow (see bug 40487)
CGP 'usubo' patch for review: https://reviews.llvm.org/D57789
The 'sub1' example from the description should be improved after: https://reviews.llvm.org/rL354298 _sub1: movl $1, %eax subq %rdx, %rax movq %rax, (%rdi) sbbq $0, %rsi movq %rsi, 8(%rdi) retq Any way to improve on that still?
The 'sub0' case should be fixed here: https://reviews.llvm.org/rL354519
https://godbolt.org/z/vMyJNI Of Vincent's original sub/add cases only the add1 looks like its still having problems.
(In reply to Simon Pilgrim from comment #10) > https://godbolt.org/z/vMyJNI > > Of Vincent's original sub/add cases only the add1 looks like its still > having problems. That's the last problem that I was tracking for this report too. Should be fixed after: https://reviews.llvm.org/rL354746 Feel free to reopen and/or file new bugs (like bug 40825) for missing pieces.