LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 41998 - After r345814, IndVars regression causing infinite loop
Summary: After r345814, IndVars regression causing infinite loop
Status: RESOLVED FIXED
Alias: None
Product: new-bugs
Classification: Unclassified
Component: new bugs (show other bugs)
Version: 8.0
Hardware: PC All
: P normal
Assignee: Nikita Popov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-23 12:54 PDT by Dimitry Andric
Modified: 2019-06-29 09:30 PDT (History)
8 users (show)

See Also:
Fixed By Commit(s): 364709


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dimitry Andric 2019-05-23 12:54:03 PDT
As reported in https://bugs.freebsd.org/237515 and
https://github.com/michaelrsweet/htmldoc/issues/349, the HTMLDOC program
compiled with clang 8.0 hangs (and eventually causes a stack overflow)
when processing a PDF.

Bisection shows this is a regression caused by
https://reviews.llvm.org/rL345814 ("[IndVars] Smart hard uses
detection").

I attempted to minimize the test case as much as possible; the following
is a small self-contained program showing the issue:

========================================================================
#include <stdio.h>

__attribute__((__noinline__)) void parse_pre(const char *dataptr) {
  int col;
  char line[1024];
  char *lineptr;

  for (col = 0, lineptr = line; *dataptr != '\0'; dataptr++) {
    printf("before inner loop: col=%d, diff=%td\n", col, lineptr - line);
    do {
      printf("inside inner loop: col=%d, diff=%td\n", col, lineptr - line);
      *lineptr++ = ' ';
      col++;
    } while (col & 7);
  }
}

int main(void) {
  parse_pre("a");

  return 0;
}
========================================================================

What appears to happen is that IndVars pulls the (col & 7) test out of
the inner loop, but it precalculates the iteration count incorrectly,
causing that inner loop to never exit.  It will keep on overwriting the
line[] buffer until it crashes.

With clang trunk r345813, the inner loop looks like:

.LBB0_3:                                # %do.body
                                        #   Parent Loop BB0_2 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        #DEBUG_VALUE: parse_pre:lineptr <- [DW_OP_constu 48, DW_OP_minus] [$rbp+0]
        #DEBUG_VALUE: parse_pre:dataptr <- [DW_OP_constu 56, DW_OP_minus] [$rbp+0]
        #DEBUG_VALUE: parse_pre:col <- $r15d
        #DEBUG_VALUE: parse_pre:col <- $ebx
        #DEBUG_VALUE: parse_pre:lineptr <- $r13
        .loc    1 11 68 is_stmt 1       # ps-pdf-min.c:11:68
        movq    %r13, %rdx
        subq    %r12, %rdx
        .loc    1 11 7 is_stmt 0        # ps-pdf-min.c:11:7
        movl    $.L.str.1, %edi
        xorl    %eax, %eax
        movl    %ebx, %esi
        callq   printf
        .loc    1 12 18 is_stmt 1       # ps-pdf-min.c:12:18
        movb    $32, (%r13)
        .loc    1 12 15 is_stmt 0       # ps-pdf-min.c:12:15
        addq    $1, %r13
.Ltmp9:
        .loc    1 13 10 is_stmt 1       # ps-pdf-min.c:13:10
        addl    $1, %ebx
.Ltmp10:
        #DEBUG_VALUE: parse_pre:col <- $ebx
        .loc    1 14 5                  # ps-pdf-min.c:14:5
        testb   $7, %bl
        jne     .LBB0_3

E.g., %ebx contains 'col', is increased by 'addl $1, %ebx', and is then
tested with 'testb $7, %bl'.  This works correctly.

With clang trunk r345814, the inner loop becomes:

        .loc    1 10 5 is_stmt 1        # ps-pdf-min.c:10:5
        movl    %r12d, %r13d
        negl    %r13d
        andl    $7, %r13d
        leaq    (%rbx,%r13), %rax
.Ltmp6:
        #DEBUG_VALUE: parse_pre:lineptr <- [DW_OP_constu 48, DW_OP_minus] [$rbp+0]
        .loc    1 0 5 is_stmt 0         # ps-pdf-min.c:0:5
        movq    %rax, -48(%rbp)         # 8-byte Spill
.Ltmp7:
        .p2align        4, 0x90
.LBB0_3:                                # %do.body
                                        #   Parent Loop BB0_2 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        #DEBUG_VALUE: parse_pre:lineptr <- [DW_OP_constu 48, DW_OP_minus] [$rbp+0]
        #DEBUG_VALUE: parse_pre:col <- $r12d
        #DEBUG_VALUE: parse_pre:dataptr <- $r14
        #DEBUG_VALUE: parse_pre:col <- $r12d
        #DEBUG_VALUE: parse_pre:lineptr <- $rbx
        .loc    1 11 68 is_stmt 1       # ps-pdf-min.c:11:68
        movq    %rbx, %rdx
        subq    %r15, %rdx
        .loc    1 11 7 is_stmt 0        # ps-pdf-min.c:11:7
        movl    $.L.str.1, %edi
        xorl    %eax, %eax
        movl    %r12d, %esi
        callq   printf
        .loc    1 12 18 is_stmt 1       # ps-pdf-min.c:12:18
        movb    $32, (%rbx)
        .loc    1 12 15 is_stmt 0       # ps-pdf-min.c:12:15
        addq    $1, %rbx
.Ltmp8:
        .loc    1 13 10 is_stmt 1       # ps-pdf-min.c:13:10
        addl    $1, %r12d
.Ltmp9:
        #DEBUG_VALUE: parse_pre:col <- $r12d
        .loc    1 14 5                  # ps-pdf-min.c:14:5
        addq    $-1, %r13
        jne     .LBB0_3

Here, %r12d contains 'col', which is first stored in %r13d, negated,
then anded with $7.  Unfortunately, on the first loop, 'col' is zero, so
negation doesn't do anything, and neither does 'andl $7'.  So at the end
of the loop, %r13 is decremented to -1, and the loop effectively
executes 2^64 times... :-/
Comment 1 Dimitry Andric 2019-05-23 13:00:51 PDT
Note that r345814 was reverted, but reapplied later in https://reviews.llvm.org/rL346397, and that still causes the same problem as describe in this bug.
Comment 2 Dimitry Andric 2019-06-04 14:11:59 PDT
Adding Nikita since the fix for bug 31181 may be something of a similar nature (it looks like an off-by-one, somehow).
Comment 3 listmail 2019-06-04 14:41:00 PDT
This is very likely to be existing latent miscompile being exposed by an innocent change.  The next step here is that someone needs to take the C example, and isolate the IR snippet being miscompiled, and the pass in question.  My educated guess would be IndVarSimplify, possibly in LFTR, but that's simply intuition.  

p.s. Nikita just fixed a long latent LFTR issue, so a retest on ToT is probably warranted as a starting step.  We're also theorizing another variant of the same for pointer induction variables which is *not* yet fixed.
Comment 4 Nikita Popov 2019-06-04 14:45:20 PDT
IR trace and diff for what indvars does specifically: https://gist.github.com/nikic/bbe73ddd399cd1926e2eb11d851a0288

Haven't looked yet whether what it does is right.
Comment 5 listmail 2019-06-04 16:07:02 PDT
I skimmed the IR that Nikita posted.  There's a suspicious use of an inbounds gep following an LFTR rewrite.  I haven't studied it enough to say it's definitely incorrect, but it really looks a lot like the gep inbounds bug we'd theorized about.
Comment 6 Nikita Popov 2019-06-16 12:27:38 PDT
Just took another look at this. SCEV computes the BE count as

    (-1 + (-1 * (trunc i32 %tmp1.07 to i3)))

which is then incremented to

    (-1 * (trunc i32 %tmp1.07 to i3))

for post-inc form and then the limit is computed to be

    %0 = trunc i32 %tmp1.07 to i3
    %1 = mul i3 %0, -1
    %2 = zext i3 %1 to i64
    %lftr.limit = getelementptr i8, i8* %tmp3.08, i64 %2

We should be doing something like zext(X-1)+1 here, which is (in this case) not the same as X. LFTR has a bunch of logic around this stuff, will need a closer look.
Comment 7 Nikita Popov 2019-06-16 13:47:51 PDT
If we're working on integer IVs, then

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
define void @test(i32 %start, i32* %p) {
entry:
  br label %loop

loop:
  %i = phi i32 [ %start, %entry ], [ %i.inc, %loop ]
  %i2 = phi i32 [ 0, %entry ], [ %i2.inc, %loop ] 
  %i.inc = add nuw i32 %i, 1
  %i2.inc = add nuw i32 %i2, 1
  store volatile i32 %i2.inc, i32* %p
  %and = and i32 %i.inc, 7
  %cmp = icmp eq i32 %and, 0
  br i1 %cmp, label %end, label %loop

end:
  ret void
}

is correctly transformed to:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
define void @test(i32 %start, i32* %p) {
entry:
  %0 = trunc i32 %start to i3
  %1 = sub i3 0, %0
  br label %loop

loop:                                             ; preds = %loop, %entry
  %i2 = phi i32 [ 0, %entry ], [ %i2.inc, %loop ]
  %i2.inc = add nuw nsw i32 %i2, 1
  store volatile i32 %i2.inc, i32* %p
  %lftr.wideiv = trunc i32 %i2.inc to i3
  %exitcond = icmp eq i3 %lftr.wideiv, %1
  br i1 %exitcond, label %end, label %loop

end:                                              ; preds = %loop
  ret void
}

Note that the exit condition is evaluated with the truncated i3 IV.

If LFTR switches to a pointer IV

@data = global [256 x i8] zeroinitializer
define void @test2(i32 %start) {
entry:
  br label %loop

loop:
  %i = phi i32 [ %start, %entry ], [ %i.inc, %loop ]
  %p = phi i8* [ getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), %entry ], [ %p.inc, %loop ]
  %i.inc = add nuw i32 %i, 1
  %p.inc = getelementptr inbounds i8, i8* %p, i64 1
  store volatile i8 0, i8* %p.inc 
  %and = and i32 %i.inc, 7
  %cmp = icmp eq i32 %and, 0
  br i1 %cmp, label %end, label %loop

end:
  ret void
}

then we get:

define void @test2(i32 %start) {
entry:
  %0 = trunc i32 %start to i3
  %1 = sub i3 0, %0
  %2 = zext i3 %1 to i64
  %lftr.limit = getelementptr i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), i64 %2
  br label %loop

loop:                                             ; preds = %loop, %entry
  %p = phi i8* [ getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), %entry ], [ %p.inc, %loop ]
  %p.inc = getelementptr inbounds i8, i8* %p, i64 1
  store volatile i8 0, i8* %p.inc
  %exitcond = icmp eq i8* %p.inc, %lftr.limit
  br i1 %exitcond, label %end, label %loop

end:                                              ; preds = %loop
  ret void
}

And this transformation is incorrect.
Comment 8 Nikita Popov 2019-06-22 03:10:29 PDT
https://reviews.llvm.org/D63686
Comment 9 Dimitry Andric 2019-06-23 13:13:17 PDT
(In reply to Nikita Popov from comment #8)
> https://reviews.llvm.org/D63686

Hmm, https://reviews.llvm.org/rL363613 (for https://reviews.llvm.org/D62939) already seems to fix my original test case... @reames, was that intentional?
Comment 10 Nikita Popov 2019-06-23 13:45:11 PDT
(In reply to Dimitry Andric from comment #9)
> (In reply to Nikita Popov from comment #8)
> > https://reviews.llvm.org/D63686
> 
> Hmm, https://reviews.llvm.org/rL363613 (for https://reviews.llvm.org/D62939)
> already seems to fix my original test case... @reames, was that intentional?

This is a fix for an unrelated issue, but which also happens to prevent use of post-inc LFTR for your case. Without testing, I'd expect that the problem would still exist with some minor code variations, such as `*++lineptr` instead of `*lineptr++`.
Comment 11 Dimitry Andric 2019-06-23 14:05:24 PDT
(In reply to Nikita Popov from comment #10)
> (In reply to Dimitry Andric from comment #9)
> > (In reply to Nikita Popov from comment #8)
> > > https://reviews.llvm.org/D63686
> > 
> > Hmm, https://reviews.llvm.org/rL363613 (for https://reviews.llvm.org/D62939)
> > already seems to fix my original test case... @reames, was that intentional?
> 
> This is a fix for an unrelated issue, but which also happens to prevent use
> of post-inc LFTR for your case. Without testing, I'd expect that the problem
> would still exist with some minor code variations, such as `*++lineptr`
> instead of `*lineptr++`.

Indeed:

$ diff -u ps-pdf-min.c ps-pdf-pre.c
--- ps-pdf-min.c        2019-05-23 19:06:49.665832000 +0000
+++ ps-pdf-pre.c        2019-06-23 21:02:39.577863000 +0000
@@ -9,7 +9,7 @@
     printf("before inner loop: col=%d, diff=%td\n", col, lineptr - line);
     do {
       printf("inside inner loop: col=%d, diff=%td\n", col, lineptr - line);
-      *lineptr++ = ' ';
+      *++lineptr = ' ';
       col++;
     } while (col & 7);
   }

$ ~/ins/llvm-trunk-r363613/bin/clang -O2 ps-pdf-pre.c -o ps-pdf-pre

$ ./ps-pdf-pre
before inner loop: col=0, diff=0
inside inner loop: col=0, diff=0
inside inner loop: col=1, diff=1
inside inner loop: col=2, diff=2
inside inner loop: col=3, diff=3
inside inner loop: col=4, diff=4
inside inner loop: col=5, diff=5
inside inner loop: col=6, diff=6
inside inner loop: col=7, diff=7
inside inner loop: col=8, diff=8
inside inner loop: col=9, diff=9
[...]
inside inner loop: col=3663, diff=3663
Segmentation fault (core dumped)

I'll go try out your https://reviews.llvm.org/D63686 now, thanks.
Comment 12 Nikita Popov 2019-06-29 09:30:04 PDT
Fixed by https://reviews.llvm.org/rL364709.