Reviewing the addcarry cases from [Bug #31754 Comment #3]: https://godbolt.org/z/eZWlsG

The IR demonstrates that much of the problem is that while CombineUAddWithOverflow can match the ult/ugt compares of the add result, it can't handle the ult/ugt canonicalizations (against 1/-1/signbit/etc.) that produce an icmp eq/ne instead:

define void @_Z4testILy1EEvRyS0_(i64*, i64*) {
  %3 = load i64, i64* %0, align 8
  %4 = add i64 %3, 1
  store i64 %4, i64* %0, align 8
  %5 = icmp eq i64 %4, 0      ; <---- fails to match uaddo
  %6 = zext i1 %5 to i8
  %7 = load i64, i64* %1, align 8
  %8 = tail call { i8, i64 } @llvm.x86.addcarry.64(i8 %6, i64 %7, i64 0) #2
  %9 = extractvalue { i8, i64 } %8, 1
  store i64 %9, i64* %1, align 8
  ret void
}

define void @_Z4testILy2EEvRyS0_(i64*, i64*) {
  %3 = load i64, i64* %0, align 8
  %4 = add i64 %3, 2
  store i64 %4, i64* %0, align 8
  %5 = icmp ult i64 %4, 2     ; <---- matches uaddo
  %6 = zext i1 %5 to i8
  %7 = load i64, i64* %1, align 8
  %8 = tail call { i8, i64 } @llvm.x86.addcarry.64(i8 %6, i64 %7, i64 0) #2
  %9 = extractvalue { i8, i64 } %8, 1
  store i64 %9, i64* %1, align 8
  ret void
}

declare { i8, i64 } @llvm.x86.addcarry.64(i8, i64, i64)
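For reference, here is a hand-written sketch (my reconstruction, not actual CodeGenPrepare output) of the shape produced when the matcher does fire: the add and the carry compare collapse into a single uadd.with.overflow call, and the i1 overflow result feeds the addcarry chain directly. The eq-0 form in the first function above would ideally be rewritten into this same shape:

define void @sketch(i64* %p, i64* %q) {
  %x = load i64, i64* %p, align 8
  %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 1)
  %sum = extractvalue { i64, i1 } %uadd, 0      ; wrapped add result
  %ov = extractvalue { i64, i1 } %uadd, 1       ; carry out of the add
  store i64 %sum, i64* %p, align 8
  %c = zext i1 %ov to i8
  %y = load i64, i64* %q, align 8
  %r = tail call { i8, i64 } @llvm.x86.addcarry.64(i8 %c, i64 %y, i64 0)
  %lo = extractvalue { i8, i64 } %r, 1
  store i64 %lo, i64* %q, align 8
  ret void
}

declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
declare { i8, i64 } @llvm.x86.addcarry.64(i8, i64, i64)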
Added tests at rL352316
I'm wondering if we should fix this via IR canonicalization rather than pattern-matching proliferation. For historical/codegen reasons, we have favored squashing IR select into math/logic as a canonicalization. But given that cmp/select may have profile metadata attached, I don't think that's the right choice so early in the optimization pipeline. We've already reversed several of the obvious cases (where the math/logic sequence is longer) after improving the backend. This one is instruction-count neutral:
https://rise4fun.com/Alive/r9s

  %z = zext i1 %x to i32
  %s = sub i32 %y, %z
=>
  %ym1 = sub i32 %y, 1
  %s = select i1 %x, i32 %ym1, i32 %y
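To make the metadata concern concrete, here is a hand-written example (not taken from the bug; the weights are hypothetical): the select form can carry branch weights via !prof, but the zext/sub form has nowhere to keep them, so folding the select into math this early silently drops that profile information:

define i32 @keep_profile(i1 %x, i32 %y) {
  %ym1 = sub i32 %y, 1
  ; the !prof weights survive only as long as the select exists
  %s = select i1 %x, i32 %ym1, i32 %y, !prof !0
  ret i32 %s
}

!0 = !{!"branch_weights", i32 1, i32 1000}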
This was raised during triage of the issues in [Bug #31754] - if those can be fixed without needing this fix, then I'm happy for this bug to be discarded.
(In reply to Simon Pilgrim from comment #3)
> This was raised during triage of the issues in [Bug #31754] - if those can
> be fixed without needing this fix then I'm happy for this bug to be
> discarded.

Reference: https://reviews.llvm.org/D57453

So yes, there are independent holes raised by each. I'm still inclined to try to solve the problems generally because every time I've tried a tailored fix for a bug, its cousins come out of the woodwork.

But that said, there is a simple fix for the matcher for the most important sub-problem in bug 31754. I'll post that soon.
(In reply to Sanjay Patel from comment #4)
> ...there is a simple fix for the matcher for the most important
> sub-problem in bug 31754. I'll post that soon.

https://reviews.llvm.org/D57516
The x86-specific IR improvement is here:
https://reviews.llvm.org/rL352870

The matcher improvement (D57516) is likely ok, but it did surface a minor x86 codegen regression, partly addressed here:
https://reviews.llvm.org/D57547
...and there's an open question about Hexagon codegen too.
We should get the +1 case now:
https://reviews.llvm.org/rL352998
And if that somehow escapes, then we have DAG matching to catch it too:
https://reviews.llvm.org/rL352984

The adds of SIGNED_MIN_VAL and -1 are harder (reduced from the previous godbolt link):
https://godbolt.org/z/1T3CWx

define void @_Z4testILj2147483648EEvRjS0_(i32* dereferenceable(4), i32* dereferenceable(4)) {
  %3 = load i32, i32* %0, align 4
  %4 = xor i32 %3, -2147483648
  store i32 %4, i32* %0, align 4
  %5 = lshr i32 %3, 31
  %6 = trunc i32 %5 to i8
  %7 = load i32, i32* %1, align 4
  %8 = tail call { i8, i32 } @llvm.x86.addcarry.32(i8 %6, i32 %7, i32 0) #2
  %9 = extractvalue { i8, i32 } %8, 1
  store i32 %9, i32* %1, align 4
  ret void
}

define void @_Z4testILj4294967295EEvRjS0_(i32* dereferenceable(4), i32* dereferenceable(4)) {
  %3 = load i32, i32* %0, align 4
  %4 = add i32 %3, -1
  store i32 %4, i32* %0, align 4
  %5 = icmp ne i32 %3, 0
  %6 = zext i1 %5 to i8
  %7 = load i32, i32* %1, align 4
  %8 = tail call { i8, i32 } @llvm.x86.addcarry.32(i8 %6, i32 %7, i32 0) #2
  %9 = extractvalue { i8, i32 } %8, 1
  store i32 %9, i32* %1, align 4
  ret void
}

declare { i8, i32 } @llvm.x86.addcarry.32(i8, i32, i32) #1

---------------------------------------------------------------------------

For -1, we don't have a cmp that depends on the add, so we can't just tack another pattern onto the matcher. With 0x80000000, there's no add or cmp to match at all; we have an independent xor/lshr pair. Doing this in CGP is a lot of extra work for highly unlikely patterns. It's possible that this is easier to recognize in the DAG, but I still doubt the effort is worth the payoff.

I'm still looking at the general IR improvements, but that's held up by a preliminary fold in the DAG:
https://reviews.llvm.org/D57401

So unless there's evidence that adding those 2 huge constants and checking for overflow is important, I think we shouldn't worry about those 2 cases.
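For what it's worth, here is a hand-written guess (reconstructed, not taken from the godbolt link) at what the 0x80000000 case looks like before instcombine runs, which is why nothing survives for the matcher: the add of the sign bit becomes an xor, and the unsigned carry compare of the sum becomes a sign-bit test of the original value (the lshr/trunc seen above), so neither an add nor a cmp reaches CGP:

define void @signbit_sketch(i32* %p, i32* %q) {
  %x = load i32, i32* %p, align 4
  %sum = add i32 %x, -2147483648            ; instcombine: xor i32 %x, -2147483648
  store i32 %sum, i32* %p, align 4
  %carry = icmp ult i32 %sum, -2147483648   ; instcombine: sign-bit test of %x (lshr %x, 31)
  %c = zext i1 %carry to i8
  %y = load i32, i32* %q, align 4
  %r = tail call { i8, i32 } @llvm.x86.addcarry.32(i8 %c, i32 %y, i32 0)
  %lo = extractvalue { i8, i32 } %r, 1
  store i32 %lo, i32* %q, align 4
  ret void
}

declare { i8, i32 } @llvm.x86.addcarry.32(i8, i32, i32)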
Sanjay, thanks, that new patch is awesome!

I think the 0x80000000 case is just a curiosity and I won't agitate for it. The 0xFFFFFFFF case is important, though, because it's what would be used by "BigInt::operator--()".
https://github.com/Quuxplusone/WideIntProofOfConcept/commit/f8e692731b093f48797e411d6685e9106b9c9e35

Your patch improved my "preinc" and "postinc" cases, but it didn't touch my equally important "predec" and "postdec" cases. I think (but am not 100% sure) that if you fixed the 0xFFFFFFFF case, it would improve the "predec" and "postdec" cases.
https://godbolt.org/z/XVB4hY
(In reply to Arthur O'Dwyer from comment #8)
> Sanjay, thanks, that new patch is awesome!
> I think the 0x80000000 case is just a curiosity and I won't agitate for it.
> The 0xFFFFFFFF case is important, though, because it's what would be used by
> "BigInt::operator--()".
> https://github.com/Quuxplusone/WideIntProofOfConcept/commit/f8e692731b093f48797e411d6685e9106b9c9e35
> Your patch improved my "preinc" and "postinc" cases, but it didn't touch my
> equally important "predec" and "postdec" cases. I think (but am not 100%
> sure) that if you fixed the 0xFFFFFFFF case, it would improve the "predec"
> and "postdec" cases.
> https://godbolt.org/z/XVB4hY

Thanks for the info. I'm working on a CGP patch to form unsigned subtract with overflow, and I'm hoping that will work with the -1 case. Subtract is a bit more difficult to match as noted in bug 40487, but I'll try to get that done soon.
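As a sketch of where that's headed, here is a hand-written example (assumed shape, not output of the pending patch) of the usubo form such a CGP match might produce for the decrement case: the add of -1 and the zero test collapse into a single usub.with.overflow, mirroring the uaddo handling above:

define void @dec_sketch(i64* %lo, i64* %hi) {
  %x = load i64, i64* %lo, align 8
  %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %x, i64 1)
  %dif = extractvalue { i64, i1 } %usub, 0      ; x - 1
  %borrow = extractvalue { i64, i1 } %usub, 1   ; true iff x was 0
  store i64 %dif, i64* %lo, align 8
  %h = load i64, i64* %hi, align 8
  %b = zext i1 %borrow to i64
  %hnew = sub i64 %h, %b                        ; propagate the borrow into the high word
  store i64 %hnew, i64* %hi, align 8
  ret void
}

declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64)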
(In reply to Arthur O'Dwyer from comment #8)
> https://godbolt.org/z/XVB4hY

Arthur - can you verify that we are getting those cases now in your original apps? I think we are still missing a couple of add patterns from bug 31754, but getting closer...
Sanjay: I confirm that I now see perfect codegen for Uint128::decrement()!

If you're still looking for work ;) I see improvement, but not perfection, in Uint128::negate(). Here's an example.
https://godbolt.org/z/BGRoQ4

    xorl    %eax, %eax
    xorl    %ecx, %ecx
    subq    (%rdi), %rcx
    sbbq    8(%rdi), %rax
    movq    %rcx, (%rdi)
    movq    %rax, 8(%rdi)
    retq

I'm not sure if GCC's codegen is perfect, or if there's a way to use "sbbq" to save an instruction here somehow.

    negq    (%rdi)
    adcq    $0, 8(%rdi)
    negq    8(%rdi)
    ret
(In reply to Arthur O'Dwyer from comment #11)
> Sanjay: I confirm that I now see perfect codegen for Uint128::decrement()!

Great - thanks for checking that!

> If you're still looking for work ;) I see improvement, but not perfection,
> in Uint128::negate(). Here's an example.
> https://godbolt.org/z/BGRoQ4
>     xorl    %eax, %eax
>     xorl    %ecx, %ecx
>     subq    (%rdi), %rcx
>     sbbq    8(%rdi), %rax
>     movq    %rcx, (%rdi)
>     movq    %rax, 8(%rdi)
>     retq
> I'm not sure if GCC's codegen is perfect, or if there's a way to use "sbbq"
> to save an instruction here somehow.
>     negq    (%rdi)
>     adcq    $0, 8(%rdi)
>     negq    8(%rdi)
>     ret

Hmmm...those 2 tests are independent of what we've done so far:

1. In the uint128 case, we're not forming the overflow intrinsic in IR (CGP) because that's not a legal type (native type for x86).

2. In the uint64 case, we are forming the overflow intrinsic now (so it's better than before), but the x86 lowering needs to be improved.

Since they end up with identical asm, I'm hoping that is a single x86-specific backend fix in lowering X86ISD::SUB to 'neg'.

Do you mind filing a new bug report with those examples? It's hard to keep all of these problems organized with the overlap.
Filed https://bugs.llvm.org/show_bug.cgi?id=40825 for uint128 negation.
We should have all of the common cases after:
https://reviews.llvm.org/rL354746

I filed bug 40842 to track the +0x8000... case because I think the root cause is different on that one.