https://godbolt.org/z/Z2afe8

    #include <stdint.h>
    struct u128 { uint64_t x, y; };
    u128 shl2(uint64_t rdi, uint64_t rdx, int n) {
        if (n & 64) rdx = rdi;
    #if WORSE
        if (!(n & 64)) rdi = 0;
    #else
        if (0 == (n & 64)) rdi = 0;
    #endif
        return {rdi,rdx};
    }

Without -DWORSE, Clang generates perfect codegen:

    andl $64, %edx
    xorl %eax, %eax
    cmovneq %rdi, %rsi
    cmovneq %rdi, %rax
    movq %rsi, %rdx
    retq

But with -DWORSE, Clang generates a redundant "shr" instruction:

    andl $64, %edx
    xorl %eax, %eax
    shrl $6, %edx       // USELESS INSTRUCTION!
    cmovneq %rdi, %rsi
    cmovneq %rdi, %rax
    movq %rsi, %rdx
    retq

This is surprising, given that the only difference is using (0 == x) versus (!x). I would expect these to have equivalent codegen.
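To back up the claim that the two spellings mean the same thing, here is a minimal C++ sketch that splits the two `#if WORSE` branches of shl2 into separate functions and checks them against each other for every n in [0, 256). Bit 6 (the value 64) takes both states in that range, and the other bits of n do not affect either branch. The names shl2_not, shl2_eq and spellings_agree are hypothetical and exist only for this illustration.

```cpp
#include <cassert>  // assert(), used in the usage example below
#include <cstdint>

// Same layout as the u128 struct in the test case.
struct u128 { uint64_t x, y; };

// Hypothetical name: the -DWORSE spelling, using !(n & 64).
u128 shl2_not(uint64_t rdi, uint64_t rdx, int n) {
    if (n & 64) rdx = rdi;
    if (!(n & 64)) rdi = 0;
    return {rdi, rdx};
}

// Hypothetical name: the default spelling, using 0 == (n & 64).
u128 shl2_eq(uint64_t rdi, uint64_t rdx, int n) {
    if (n & 64) rdx = rdi;
    if (0 == (n & 64)) rdi = 0;
    return {rdi, rdx};
}

// Returns true if both spellings produce identical results for all n in [0, 256).
bool spellings_agree(uint64_t rdi, uint64_t rdx) {
    for (int n = 0; n < 256; ++n) {
        u128 a = shl2_not(rdi, rdx, n);
        u128 b = shl2_eq(rdi, rdx, n);
        if (a.x != b.x || a.y != b.y) return false;
    }
    return true;
}
```

For example, `assert(spellings_agree(0x1234, 0xABCD))` holds. Since the source-level semantics are identical, any codegen difference has to come from how the two spellings are lowered.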
Sorry, cut-and-paste error there. The "perfect codegen" without -DWORSE is actually:

    xorl %eax, %eax
    testb $64, %dl
    cmovneq %rdi, %rsi
    cmovneq %rdi, %rax
    movq %rsi, %rdx
    retq

(That is, where -DWORSE produces "and, xor, shr", the optimal codegen produces "xor, test".)
This comes down to:

    define dso_local { i64, i64 } @_Z4shl2mmi(i64, i64, i32) local_unnamed_addr #0 {
      %4 = and i32 %2, 64
      %5 = icmp eq i32 %4, 0
      %6 = select i1 %5, i64 %1, i64 %0
      %7 = select i1 %5, i64 0, i64 %0
      %8 = insertvalue { i64, i64 } undef, i64 %7, 0
      %9 = insertvalue { i64, i64 } %8, i64 %6, 1
      ret { i64, i64 } %9
    }

vs

    define dso_local { i64, i64 } @_Z4shl2mmi(i64, i64, i32) local_unnamed_addr #0 {
      %4 = and i32 %2, 64
      %5 = icmp ne i32 %4, 0
      %6 = select i1 %5, i64 %0, i64 %1
      %7 = select i1 %5, i64 %0, i64 0
      %8 = insertvalue { i64, i64 } undef, i64 %7, 0
      %9 = insertvalue { i64, i64 } %8, i64 %6, 1
      ret { i64, i64 } %9
    }
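The two IR listings differ only in the icmp predicate (eq vs ne) and in the order of the operands of both selects that use %5. A minimal C++ sketch that transcribes each listing line by line (the function names are hypothetical) shows that inverting the predicate preserves meaning only when every select using %5 has its operands swapped:

```cpp
#include <cassert>  // assert(), used in the usage example below
#include <cstdint>
#include <utility>

// Hypothetical transcription of the first listing (icmp eq).
std::pair<uint64_t, uint64_t> ir_eq_form(uint64_t a0, uint64_t a1, uint32_t a2) {
    uint32_t v4 = a2 & 64;        // %4 = and i32 %2, 64
    bool v5 = (v4 == 0);          // %5 = icmp eq i32 %4, 0
    uint64_t v6 = v5 ? a1 : a0;   // %6 = select i1 %5, i64 %1, i64 %0
    uint64_t v7 = v5 ? 0 : a0;    // %7 = select i1 %5, i64 0, i64 %0
    return {v7, v6};              // { %7, %6 }
}

// Hypothetical transcription of the second listing (icmp ne, select arms swapped).
std::pair<uint64_t, uint64_t> ir_ne_form(uint64_t a0, uint64_t a1, uint32_t a2) {
    uint32_t v4 = a2 & 64;        // %4 = and i32 %2, 64
    bool v5 = (v4 != 0);          // %5 = icmp ne i32 %4, 0
    uint64_t v6 = v5 ? a0 : a1;   // %6 = select i1 %5, i64 %0, i64 %1
    uint64_t v7 = v5 ? a0 : 0;    // %7 = select i1 %5, i64 %0, i64 0
    return {v7, v6};              // { %7, %6 }
}
```

For any inputs, `ir_eq_form(a0, a1, n) == ir_ne_form(a0, a1, n)`. Because %5 feeds two selects, canonicalizing one form into the other means rewriting all of its users, not just the compare.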
Technically, this is an IR canonicalization problem: if we have functionally equivalent code, then there should be one unique/minimal IR form for it. But that breaks down here because the icmp has multiple uses (both selects), so we don't try to convert the predicate to "eq". The easier solution will probably be to add a backend transform that recognizes the and+shift and turns it into "test".
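Why the and+shift can become a single test: both cmovneq instructions read only the zero flag (ZF). Once the value has been masked with 64, shifting it right by 6 gives zero exactly when the masked value was zero, so ZF is the same as what `testb $64, %dl` produces. The minimal C++ sketch below models only ZF for each sequence; the helper names are hypothetical, and this is not how an LLVM backend pattern would actually be written.

```cpp
#include <cassert>  // assert(), used in the usage example below
#include <cstdint>

// Hypothetical helper: ZF after "andl $64, %edx; shrl $6, %edx".
// A shift with a nonzero count sets ZF from its result.
bool zf_and_shr(uint32_t edx) {
    uint32_t t = edx & 64;  // andl $64, %edx
    t >>= 6;                // shrl $6, %edx
    return t == 0;          // ZF = 1 iff the result is zero
}

// Hypothetical helper: ZF after "testb $64, %dl".
// test ANDs its operands, sets flags, and discards the result.
bool zf_test(uint32_t edx) {
    uint8_t dl = static_cast<uint8_t>(edx);  // low byte of %edx
    return (dl & 64) == 0;                   // ZF = 1 iff bit 6 is clear
}
```

For example, `zf_and_shr(v) == zf_test(v)` for every 32-bit v, because bit 6 lives in the low byte and the shift preserves "zero vs nonzero". Since cmovne moves exactly when ZF is 0, the two sequences are interchangeable as inputs to the cmovs.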
This should be fixed as of r349385.
Resolving - fixed by rL349385