As reported in https://bugs.freebsd.org/237515 and https://github.com/michaelrsweet/htmldoc/issues/349, the HTMLDOC program compiled with clang 8.0 hangs (and eventually causes a stack overflow) when processing a PDF. Bisection shows this is a regression caused by https://reviews.llvm.org/rL345814 ("[IndVars] Smart hard uses detection"). I attempted to minimize the test case as much as possible; the following is a small self-contained program showing the issue: ======================================================================== #include <stdio.h> __attribute__((__noinline__)) void parse_pre(const char *dataptr) { int col; char line[1024]; char *lineptr; for (col = 0, lineptr = line; *dataptr != '\0'; dataptr++) { printf("before inner loop: col=%d, diff=%td\n", col, lineptr - line); do { printf("inside inner loop: col=%d, diff=%td\n", col, lineptr - line); *lineptr++ = ' '; col++; } while (col & 7); } } int main(void) { parse_pre("a"); return 0; } ======================================================================== What appears to happen is that IndVars pulls the (col & 7) test out of the inner loop, but it precalculates the iteration count incorrectly, causing that inner loop to never exit. It will keep on overwriting the line[] buffer until it crashes. With clang trunk r345813, the inner loop looks like: .LBB0_3: # %do.body # Parent Loop BB0_2 Depth=1 # => This Inner Loop Header: Depth=2 #DEBUG_VALUE: parse_pre:lineptr <- [DW_OP_constu 48, DW_OP_minus] [$rbp+0] #DEBUG_VALUE: parse_pre:dataptr <- [DW_OP_constu 56, DW_OP_minus] [$rbp+0] #DEBUG_VALUE: parse_pre:col <- $r15d #DEBUG_VALUE: parse_pre:col <- $ebx #DEBUG_VALUE: parse_pre:lineptr <- $r13 .loc 1 11 68 is_stmt 1 # ps-pdf-min.c:11:68 movq %r13, %rdx subq %r12, %rdx .loc 1 11 7 is_stmt 0 # ps-pdf-min.c:11:7 movl $.L.str.1, %edi xorl %eax, %eax movl %ebx, %esi callq printf .loc 1 12 18 is_stmt 1 # ps-pdf-min.c:12:18 movb $32, (%r13) .loc 1 12 15 is_stmt 0 # ps-pdf-min.c:12:15 addq $1, %r13 .Ltmp9: .loc 1 13 10 is_stmt 1 # ps-pdf-min.c:13:10 addl $1, %ebx .Ltmp10: #DEBUG_VALUE: parse_pre:col <- $ebx .loc 1 14 5 # ps-pdf-min.c:14:5 testb $7, %bl jne .LBB0_3 E.g., %ebx contains 'col', is increased by 'addl $1, %ebx', and is then tested with 'testb $7, %bl'. This works correctly. With clang trunk r345814, the inner loop becomes: .loc 1 10 5 is_stmt 1 # ps-pdf-min.c:10:5 movl %r12d, %r13d negl %r13d andl $7, %r13d leaq (%rbx,%r13), %rax .Ltmp6: #DEBUG_VALUE: parse_pre:lineptr <- [DW_OP_constu 48, DW_OP_minus] [$rbp+0] .loc 1 0 5 is_stmt 0 # ps-pdf-min.c:0:5 movq %rax, -48(%rbp) # 8-byte Spill .Ltmp7: .p2align 4, 0x90 .LBB0_3: # %do.body # Parent Loop BB0_2 Depth=1 # => This Inner Loop Header: Depth=2 #DEBUG_VALUE: parse_pre:lineptr <- [DW_OP_constu 48, DW_OP_minus] [$rbp+0] #DEBUG_VALUE: parse_pre:col <- $r12d #DEBUG_VALUE: parse_pre:dataptr <- $r14 #DEBUG_VALUE: parse_pre:col <- $r12d #DEBUG_VALUE: parse_pre:lineptr <- $rbx .loc 1 11 68 is_stmt 1 # ps-pdf-min.c:11:68 movq %rbx, %rdx subq %r15, %rdx .loc 1 11 7 is_stmt 0 # ps-pdf-min.c:11:7 movl $.L.str.1, %edi xorl %eax, %eax movl %r12d, %esi callq printf .loc 1 12 18 is_stmt 1 # ps-pdf-min.c:12:18 movb $32, (%rbx) .loc 1 12 15 is_stmt 0 # ps-pdf-min.c:12:15 addq $1, %rbx .Ltmp8: .loc 1 13 10 is_stmt 1 # ps-pdf-min.c:13:10 addl $1, %r12d .Ltmp9: #DEBUG_VALUE: parse_pre:col <- $r12d .loc 1 14 5 # ps-pdf-min.c:14:5 addq $-1, %r13 jne .LBB0_3 Here, %r12d contains 'col', which is first stored in %r13d, negated, then anded with $7. Unfortunately, on the first loop, 'col' is zero, so negation doesn't do anything, and neither does 'andl $7'. So at the end of the loop, %r13 is decremented to -1, and the loop effectively executes 2^64 times... :-/
Note that r345814 was reverted, but reapplied later in https://reviews.llvm.org/rL346397, and that still causes the same problem as describe in this bug.
Adding Nikita since the fix for bug 31181 may be something of a similar nature (it looks like an off-by-one, somehow).
This is very likely to be existing latent miscompile being exposed by an innocent change. The next step here is that someone needs to take the C example, and isolate the IR snippet being miscompiled, and the pass in question. My educated guess would be IndVarSimplify, possibly in LFTR, but that's simply intuition. p.s. Nikita just fixed a long latent LFTR issue, so a retest on ToT is probably warranted as a starting step. We're also theorizing another variant of the same for pointer induction variables which is *not* yet fixed.
IR trace and diff for what indvars does specifically: https://gist.github.com/nikic/bbe73ddd399cd1926e2eb11d851a0288 Haven't looked yet whether what it does is right.
I skimmed the IR that Nikita posted. There's a suspicious use of an inbounds gep following an LFTR rewrite. I haven't studied it enough to say it's definitely incorrect, but it really looks a lot like the gep inbounds bug we'd theorized about.
Just took another look at this. SCEV computes the BE count as (-1 + (-1 * (trunc i32 %tmp1.07 to i3))) which is then incremented to (-1 * (trunc i32 %tmp1.07 to i3)) for post-inc form and then the limit is computed to be %0 = trunc i32 %tmp1.07 to i3 %1 = mul i3 %0, -1 %2 = zext i3 %1 to i64 %lftr.limit = getelementptr i8, i8* %tmp3.08, i64 %2 We should be doing something like zext(X-1)+1 here, which is (in this case) not the same as X. LFTR has a bunch of logic around this stuff, will need a closer look.
If we're working on integer IVs, then target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" define void @test(i32 %start, i32* %p) { entry: br label %loop loop: %i = phi i32 [ %start, %entry ], [ %i.inc, %loop ] %i2 = phi i32 [ 0, %entry ], [ %i2.inc, %loop ] %i.inc = add nuw i32 %i, 1 %i2.inc = add nuw i32 %i2, 1 store volatile i32 %i2.inc, i32* %p %and = and i32 %i.inc, 7 %cmp = icmp eq i32 %and, 0 br i1 %cmp, label %end, label %loop end: ret void } is correctly transformed to: target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" define void @test(i32 %start, i32* %p) { entry: %0 = trunc i32 %start to i3 %1 = sub i3 0, %0 br label %loop loop: ; preds = %loop, %entry %i2 = phi i32 [ 0, %entry ], [ %i2.inc, %loop ] %i2.inc = add nuw nsw i32 %i2, 1 store volatile i32 %i2.inc, i32* %p %lftr.wideiv = trunc i32 %i2.inc to i3 %exitcond = icmp eq i3 %lftr.wideiv, %1 br i1 %exitcond, label %end, label %loop end: ; preds = %loop ret void } Note that the exit condition is evaluated with the truncated i3 IV. If LFTR switches to a pointer IV @data = global [256 x i8] zeroinitializer define void @test2(i32 %start) { entry: br label %loop loop: %i = phi i32 [ %start, %entry ], [ %i.inc, %loop ] %p = phi i8* [ getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), %entry ], [ %p.inc, %loop ] %i.inc = add nuw i32 %i, 1 %p.inc = getelementptr inbounds i8, i8* %p, i64 1 store volatile i8 0, i8* %p.inc %and = and i32 %i.inc, 7 %cmp = icmp eq i32 %and, 0 br i1 %cmp, label %end, label %loop end: ret void } then we get: define void @test2(i32 %start) { entry: %0 = trunc i32 %start to i3 %1 = sub i3 0, %0 %2 = zext i3 %1 to i64 %lftr.limit = getelementptr i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), i64 %2 br label %loop loop: ; preds = %loop, %entry %p = phi i8* [ getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), %entry ], [ %p.inc, %loop ] %p.inc = getelementptr inbounds i8, i8* %p, i64 1 store volatile i8 0, i8* %p.inc %exitcond = icmp eq i8* %p.inc, %lftr.limit br i1 %exitcond, label %end, label %loop end: ; preds = %loop ret void } And this transformation is incorrect.
https://reviews.llvm.org/D63686
(In reply to Nikita Popov from comment #8) > https://reviews.llvm.org/D63686 Hmm, https://reviews.llvm.org/rL363613 (for https://reviews.llvm.org/D62939) already seems to fix my original test case... @reames, was that intentional?
(In reply to Dimitry Andric from comment #9) > (In reply to Nikita Popov from comment #8) > > https://reviews.llvm.org/D63686 > > Hmm, https://reviews.llvm.org/rL363613 (for https://reviews.llvm.org/D62939) > already seems to fix my original test case... @reames, was that intentional? This is a fix for an unrelated issue, but which also happens to prevent use of post-inc LFTR for your case. Without testing, I'd expect that the problem would still exist with some minor code variations, such as `*++lineptr` instead of `*lineptr++`.
(In reply to Nikita Popov from comment #10) > (In reply to Dimitry Andric from comment #9) > > (In reply to Nikita Popov from comment #8) > > > https://reviews.llvm.org/D63686 > > > > Hmm, https://reviews.llvm.org/rL363613 (for https://reviews.llvm.org/D62939) > > already seems to fix my original test case... @reames, was that intentional? > > This is a fix for an unrelated issue, but which also happens to prevent use > of post-inc LFTR for your case. Without testing, I'd expect that the problem > would still exist with some minor code variations, such as `*++lineptr` > instead of `*lineptr++`. Indeed: $ diff -u ps-pdf-min.c ps-pdf-pre.c --- ps-pdf-min.c 2019-05-23 19:06:49.665832000 +0000 +++ ps-pdf-pre.c 2019-06-23 21:02:39.577863000 +0000 @@ -9,7 +9,7 @@ printf("before inner loop: col=%d, diff=%td\n", col, lineptr - line); do { printf("inside inner loop: col=%d, diff=%td\n", col, lineptr - line); - *lineptr++ = ' '; + *++lineptr = ' '; col++; } while (col & 7); } $ ~/ins/llvm-trunk-r363613/bin/clang -O2 ps-pdf-pre.c -o ps-pdf-pre $ ./ps-pdf-pre before inner loop: col=0, diff=0 inside inner loop: col=0, diff=0 inside inner loop: col=1, diff=1 inside inner loop: col=2, diff=2 inside inner loop: col=3, diff=3 inside inner loop: col=4, diff=4 inside inner loop: col=5, diff=5 inside inner loop: col=6, diff=6 inside inner loop: col=7, diff=7 inside inner loop: col=8, diff=8 inside inner loop: col=9, diff=9 [...] inside inner loop: col=3663, diff=3663 Segmentation fault (core dumped) I'll go try out your https://reviews.llvm.org/D63686 now, thanks.
Fixed by https://reviews.llvm.org/rL364709.