Created attachment 21600 [details] LLVM IR reproducer PR41081 described an issue where an assertion is hit: HT.TopLevelMap[ThisEntry->getKey()] == ThisEntry && "Scope imbalance!"); I've attached the LLVM IR from the original ticket. To reproduce: opt -early-cse locale-reduce.ll -S -o - I've 'fixed' the original bug by removing this pass from the arm backend pipeline as I have no experience with this pass.
This is now hitting http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable. Reduced: target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64" target triple = "armv7-unknown-linux-android" define void @f(i32 %span_left, i32 %clip_left) { entry: %cmp = icmp sgt i32 %clip_left, %span_left %sub = sub nsw i32 %clip_left, %span_left %cond = select i1 %cmp, i32 %sub, i32 0 %cmp83292 = icmp slt i32 %cond, undef %0 = sub i32 %clip_left, %span_left %1 = select i1 %cmp, i32 %0, i32 0 unreachable } (Not completely sure it's the same issue as the original testcase, but it's at least the same assertion.)
Hmm, actually, it's possible I didn't reduce the issue correctly; my original testcase hits a different assertion. Looking a bit more.
Figured out what's happening. So when EarlyCSE takes an instruction, and replaces it with another, it does the following: if (auto *I = dyn_cast<Instruction>(V)) I->andIRFlags(Inst); This is fine on its own, but there's a problem: this changes the hash of instructions that depend on I. In particular, matchDecomposedSelectPattern depends on the nsw flags of its operands. This, of course, confuses the hash tables, and leads to multiple different assertions, depending on the exact testcase. Not sure what the best solution is. Should we stop trying to pattern-match selects? Can we restrict matchDecomposedSelectPattern to patterns which don't require nsw? Can we rehash the relevant hashtable keys?
Another option could be to skip the CSE rather than modify the available value when the redundant value has flags not present on the available value. Maybe even specifically when the flag in question is nsw, or specifically when it's a flag we've used in matching a min/max pattern, or something along those lines.
...and of course I got that backwards. I meant when the available value has flags not present on the redundant value.
Skipping the transform on incompatible flags sounds good to me. Does this only affect selects though?
I'm not really familiar with EarlyCSE yet, but is there a reason why we can't remove all users of the instruction being changed from AvailableValues and add them back once the change is completed (but now with newly computed hash)?
Here's a patch draft to limit the select matching if the operands have flags. This avoids the assert on the attached test and the example in comment 1. There's 1 regression on a test in llvm/test/Transforms/EarlyCSE/commute.ll with this patch, but I'd guess that's not a common occurrence. Ideally, we could change the hashing as suggested in comment 7, but I haven't tried to implement that. --------------------------------------------------------------------------- diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp index cd163c84c61..21ed48ac615 100644 --- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp +++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp @@ -152,10 +152,20 @@ static bool matchSelectWithOptionalNotCond(Value *V, Value *&Cond, Value *&A, std::swap(A, B); } - // Set flavor if we find a match, or set it to unknown otherwise; in - // either case, return true to indicate that this is a select we can - // process. - if (auto *CmpI = dyn_cast<ICmpInst>(Cond)) + // Try to match a cmp+select idiom (min/max/abs). Exclude patterns that may + // require instruction flags (nsw, FMF, etc.) on the operands because we try + // to intersect (drop) those flags as part of hashing/value numbering. + auto hasNoFlags = [](Value *Val) { + if (auto *OBO = dyn_cast<OverflowingBinaryOperator>(Val)) + return !OBO->hasNoUnsignedWrap() && !OBO->hasNoSignedWrap(); + if (auto *PEO = dyn_cast<PossiblyExactOperator>(Val)) + return !PEO->isExact(); + if (auto *FPMO = dyn_cast<FPMathOperator>(Val)) + return FPMO->getFastMathFlags().none(); + return true; + }; + auto *CmpI = dyn_cast<ICmpInst>(Cond); + if (CmpI && hasNoFlags(A) && hasNoFlags(B)) Flavor = matchDecomposedSelectPattern(CmpI, A, B, A, B).Flavor; else Flavor = SPF_UNKNOWN;
I'm not particularly invested in any particular approach here, but one thing that may complicate the implementation is figuring out how far the flags propagate. For example, there's an nsw check in matchMinMax, which is called from matchSelectPattern, but matchDecomposedSelectPattern may have already looked through a cast before calling matchSelectPattern. Same thing with isKnownNegation, though it looks like we're always calling that with NeedNSW false, so maybe that one's ok? These are just the first couple places I could spot the select pattern matching depending on flags, the higher-level point is that it's tricky and error-prone looking through the code to figure out what all operations may have relevant flags, and presumably the set could change over time with updates to ValueTracking. Looking at the suggestions on this thread in light of that: - stop pattern-matching selects is a big hammer that wouldn't be complicated by the issue of how far the flags propagate - restrict matchDecomposedSelectPattern to not depend on flags would probably imply threading that flag through the ValueTracking logic, which seems messy - rehashing the relevant instructions is complicated because the relevant instructions aren't just the first-level users of the instruction with the flags - skip cse when flag modification would change hash has the same issue that it's hard to identify the relevant instructions - limit select matching within CSE to avoid matching when it would rely on flags also has the issue of knowing how far back to look for flags TBH, I'm not liking any of those implications. One way to avoid this complexity would be to disallow CSE across differently-flagged instructions (or at least in the case where the available instruction has flags that aren't present on the redundant instruction). That would essentially be (partially) undoing the optimization side of r267111...
Just avoiding CSE when flags mismatch will disable optimization for cases like this: %1 = add nuw %a, %b %2 = add %a, %b << we can't add this value to AvailableValues since for AvailableValues it's equal to %1 %3 = add %a, %b << here we won't know about %2 %4 = add %a, %b << here we won't know about %2 and %3 since we can't add them to AvailableValues Solution that seems the cleanest to me: 1. Mix flags into hash. 2. (optional) Add a flag into SimpleValue called like "IgnoreFlags" that disables mixing flags into hash. Then, when we look for a substitute for an instruction: first we try to find exactly the same instruction, second we look for the same instruction minus flags. This will handle the case mentioned above, though it won't cover cases like this: %1 = add nuw %a, %b %2 = add nuw nsw %a, %b << we won't recognize %1 as a suitable replacement for %2 This issue can be solved by replacing step (2) with following: When we add an instruction to the AvailableValues, we should add it several times with different hashes: every hash should represent an instruction this one can replace. For example, for "add %a, %b" we should add it with hashes of: add %a, %b add nuw %a, %b add nsw %a, %b add nuw nsw %a, %b Though for fast-math flags this will be messy.
(In reply to Joseph Tremoulet from comment #9) > Looking at the suggestions on this thread in light of that: > - stop pattern-matching selects is a big hammer that wouldn't be > complicated by the issue of how far the flags propagate > - restrict matchDecomposedSelectPattern to not depend on flags would > probably imply threading that flag through the ValueTracking logic, which > seems messy There's another quick-fix variant that would solve the shown bugs that lives between the 2 options above: stop using ValueTracking's matchSelectPattern to recognize min/max/abs, but implement basic pattern matching in EarlyCSE for those instead. InstCombine has gotten better at canonicalizing these patterns, so I doubt there would be much real-world downside. Dirty patch that removes the ValueTracking call (I can clean this up to not use the ValueTracking names and add basic 'abs' matching if we decide this is the way forward): diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp index cd163c84c61..177118a75e4 100644 --- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp +++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp @@ -155,10 +155,22 @@ static bool matchSelectWithOptionalNotCond(Value *V, Value *&Cond, Value *&A, // Set flavor if we find a match, or set it to unknown otherwise; in // either case, return true to indicate that this is a select we can // process. - if (auto *CmpI = dyn_cast<ICmpInst>(Cond)) - Flavor = matchDecomposedSelectPattern(CmpI, A, B, A, B).Flavor; - else - Flavor = SPF_UNKNOWN; + CmpInst::Predicate Pred; + if (!match(Cond, m_ICmp(Pred, m_Specific(A), m_Specific(B)))) { + if (!match(Cond, m_ICmp(Pred, m_Specific(B), m_Specific(A)))) { + Flavor = SPF_UNKNOWN; + return true; + } + Pred = ICmpInst::getSwappedPredicate(Pred); + } + + switch (Pred) { + case CmpInst::ICMP_UGT: Flavor = SPF_UMAX; break; + case CmpInst::ICMP_ULT: Flavor = SPF_UMIN; break; + case CmpInst::ICMP_SGT: Flavor = SPF_SMAX; break; + case CmpInst::ICMP_SLT: Flavor = SPF_SMIN; break; + default: Flavor = SPF_UNKNOWN; break; + } return true; }
(In reply to Sanjay Patel from comment #11) > stop using ValueTracking's matchSelectPattern > to recognize min/max/abs, but implement basic pattern matching in EarlyCSE > for those instead. SGTM, assuming the basic pattern matching stays pretty simple (it would be a shame to wind up duplicating a significant chunk of the value-numbering stuff in EarlyCSE), especially if we can verify that the perf impact is negligible.
Patch proposal that extends the diff from comment 11: https://reviews.llvm.org/D74285
Should be fixed with: https://reviews.llvm.org/rGb8ebc11f0320
*** Bug 45206 has been marked as a duplicate of this bug. ***