Created attachment 22447 [details] A somewhat reduced ll test case A test case (lv-bug.c) like this seems to miscompile when using -fvectorize: #include <stdint.h> #include <stdio.h> int y = 0; int b = 1; int d = 1; int main() { #pragma clang loop vectorize_width(4) for (int i = 0; i < 3; ++i) { b = (y == 0) ? d : (d / y); } if (b == 1) printf("GOOD!\n"); else printf("BAD!\n"); } When compiled+executed (for x86_64 GNU/Linux) using build-all/bin/clang -O1 lv-bug.c && ./a.out the result is "GOOD!" When compiled+executed using build-all/bin/clang -O1 lv-bug.c -fvectorize && ./a.out the result is "BAD!" Also see comments in the attached lv-fold-tail-by-masking-bug.ll lit test case. Afaict the problem is related to the "fold tail by masking" that takes place (since we have a small trip count that even is smaller than the VF, but the problem is also seen if we for example increase the trip count to 9). Somehow the presence of the conditional sdiv (not executed for y==0) inside the loop fools LV to vectorize the loop??? (if changing the sdiv to for example xor I see remarks such as "loop not vectorized: Cannot fold tail by masking in the presence of live outs.") I've tried lots of old (nightly) downstream builds of clang, and it started to fail around October 2018. So I do not think that it is a new bug. The ll-reproducer indicates that the problem exist in opt 8.0.0 (when testing on https://godbolt.org/z/uaz-P9).
Whewn looking at foo2 in the attached lv-fold-tail-by-masking-bug.ll test case we have a phi like this inside the loop: %cond = phi i64 [ 55, %cond.false ], [ 77, %for.body ] The VPlan says N2 [label = "cond.end:\n" + "EMIT %vp58968 = icmp ule %vp46400 %vp58880\l" + "EMIT %vp60136 = not %vp58064\l" + "EMIT %vp60264 = and %vp60136 %vp58968\l" + "EMIT %vp61944 = and %vp58064 %vp58968\l" + "BLEND %cond = 55/%vp60264 77/%vp61944\l" ] and we end up with a "blend" %1 = icmp ule <4 x i32> %induction, <i32 2, i32 2, i32 2, i32 2> %2 = xor <4 x i1> %broadcast.splat2, <i1 true, i1 true, i1 true, i1 true> %3 = and <4 x i1> %2, %1 %4 = and <4 x i1> %broadcast.splat2, %1 %predphi = select <4 x i1> %4, <4 x i64> <i64 77, i64 77, i64 77, i64 77>, <4 x i64> <i64 55, i64 55, i64 55, i64 55> ... middle.block: ; preds = %vector.body %6 = extractelement <4 x i64> %predphi, i32 3 That is wrong. Could have worked if taking the result from element 2 in the middle.block (since the fourth iteration should be masked?). I don't know much about this really, but a patch like below may solve (or hide) the problem: (I find that comment about "Add them to the list of allowed exits." a little bit suspicious since the code didn't add the instruction to AllowedExits) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp index 21e75d55a8c..21d7e55fa38 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp @@ -592,6 +592,7 @@ bool LoopVectorizationLegality::canVectorizeInstrs() { if (BB != Header) { // Non-header phi nodes that have outside uses can be vectorized. Add // them to the list of allowed exits. + AllowedExit.insert(&I); // Unsafe cyclic dependencies with header phis are identified during // legalization for reduction, induction and first order // recurrences.
Added some more people as CC (that probably knows a lot more than me about this).
Yes, good catch! Fold-tail currently supports reduction last-vector-value live-out's, but has yet to support last-scalar-value live-outs, including non-header phi's; and it relies on AllowedExit in order to detect them and bail out. This seems to be an over-simplification of rL339703. Would you like to contribute the fix?
(In reply to ayal.zaks from comment #3) > Yes, good catch! Fold-tail currently supports reduction last-vector-value > live-out's, but has yet to support last-scalar-value live-outs, including > non-header phi's; and it relies on AllowedExit in order to detect them and > bail out. This seems to be an over-simplification of rL339703. Would you > like to contribute the fix? I'm not that confortable with this piece of code and I do not really know what to write in the commit msg (if we want to explain what is going on). I can of course put up a patch with the oneliner and some kind of reproducer, but wouldn't mind if someone with more knowledge about the vectorizer would do it. Could this have larger impact (also disabling vectorization for "valid" situations)? Do you think there is a need for more precise test cases?
(In reply to bjorn.a.pettersson from comment #4) > (In reply to ayal.zaks from comment #3) > > Yes, good catch! Fold-tail currently supports reduction last-vector-value > > live-out's, but has yet to support last-scalar-value live-outs, including > > non-header phi's; and it relies on AllowedExit in order to detect them and > > bail out. This seems to be an over-simplification of rL339703. Would you > > like to contribute the fix? > > I'm not that confortable with this piece of code and I do not really know > what to write in the commit msg (if we want to explain what is going on). I > can of course put up a patch with the oneliner and some kind of reproducer, > but wouldn't mind if someone with more knowledge about the vectorizer would > do it. IMO it would be perfectly fine to put up a patch. It's straight forward to provide feedback directly via Phabricator and refine the patch. I am planning on looking into lifting the restriction on such PHI nodes in the coming week, but I am not entirely sure if I will find time. But I think it would be good to restrict the legality checks in the meantime. > > Could this have larger impact (also disabling vectorization for "valid" > situations)? I think we only check AllowedExits when folding the tail, so nothing else should be impacted. The allowed cases (supported reductions and PHIs with only users in the loop body) are explicitly checked for. So we should not miss valid cases. I guess this could also be verified by building the test suite with -Os with and without the patch. The generated code should be the same. > > Do you think there is a need for more precise test cases? Not sure what you mean with a more precise test case? The attached one looks good, as long as we just run loop-vectorize on it. I guess a loop where the PHI operands are not loop invariant would be good to have additionally.
Thanks for all the feedback. I'll upload my fix for review!
Patch for review: https://reviews.llvm.org/D67074 (precommit of test case in https://reviews.llvm.org/D67072 to get history of what it looked like when it miscompiled)
Landed in r370721 (with precommit of test case in r370720). Should this be added to meta tickets for 8.0.1: https://bugs.llvm.org/show_bug.cgi?id=41221 9.0.0: https://bugs.llvm.org/show_bug.cgi?id=42474 (I could not see any meta for 9.0.1 yet, but maybe this is too late for 9.0.0) I changed "Importance" to "release blocker" since it was a miscompile. (Not sure exactly how that field usually is used.)
(In reply to bjorn.a.pettersson from comment #8) > Landed in r370721 (with precommit of test case in r370720). > > Should this be added to meta tickets for > 8.0.1: https://bugs.llvm.org/show_bug.cgi?id=41221 > 9.0.0: https://bugs.llvm.org/show_bug.cgi?id=42474 > > (I could not see any meta for 9.0.1 yet, but maybe this is too late for > 9.0.0) I've put it on my watch-list. If we do another release candidate for 9.0.0, we should probably pick this up, otherwise I think it will have to wait for 9.0.1 in which case I'll add it once there's a tracking bug.
Merged to release_90 in r371044.