New user self-registration is disabled due to spam. For an account please email bugs-admin@lists.llvm.org with your e-mail address and full name.

Bug 41083 - Assertion in ScopedHashTable during EarlyCSE
Summary: Assertion in ScopedHashTable during EarlyCSE
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Scalar Optimizations (show other bugs)
Version: trunk
Hardware: PC Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
: 45206 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-03-15 06:43 PDT by Sam Parker
Modified: 2020-03-18 02:22 PDT (History)
7 users (show)

See Also:
Fixed By Commit(s): b8ebc11f0320


Attachments
LLVM IR reproducer (6.61 KB, text/plain)
2019-03-15 06:43 PDT, Sam Parker
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Parker 2019-03-15 06:43:22 PDT
Created attachment 21600 [details]
LLVM IR reproducer

PR41081 described an issue where an assertion is hit:
HT.TopLevelMap[ThisEntry->getKey()] == ThisEntry && "Scope imbalance!");

I've attached the LLVM IR from the original ticket. To reproduce:

opt -early-cse locale-reduce.ll -S -o -

I've 'fixed' the original bug by removing this pass from the arm backend pipeline as I have no experience with this pass.
Comment 1 Eli Friedman 2020-01-21 18:20:19 PST
This is now hitting http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable.  Reduced:

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7-unknown-linux-android"

define void @f(i32 %span_left, i32 %clip_left) {
entry:
  %cmp = icmp sgt i32 %clip_left, %span_left
  %sub = sub nsw i32 %clip_left, %span_left
  %cond = select i1 %cmp, i32 %sub, i32 0
  %cmp83292 = icmp slt i32 %cond, undef
  %0 = sub i32 %clip_left, %span_left
  %1 = select i1 %cmp, i32 %0, i32 0
  unreachable
}

(Not completely sure it's the same issue as the original testcase, but it's at least the same assertion.)
Comment 2 Eli Friedman 2020-01-21 18:23:44 PST
Hmm, actually, it's possible I didn't reduce the issue correctly; my original testcase hits a different assertion.  Looking a bit more.
Comment 3 Eli Friedman 2020-01-21 19:28:05 PST
Figured out what's happening.

So when EarlyCSE takes an instruction, and replaces it with another, it does the following:

        if (auto *I = dyn_cast<Instruction>(V))
          I->andIRFlags(Inst);

This is fine on its own, but there's a problem: this changes the hash of instructions that depend on I.  In particular, matchDecomposedSelectPattern depends on the nsw flags of its operands.  This, of course, confuses the hash tables, and leads to multiple different assertions, depending on the exact testcase.

Not sure what the best solution is.  Should we stop trying to pattern-match selects?  Can we restrict matchDecomposedSelectPattern to patterns which don't require nsw?  Can we rehash the relevant hashtable keys?
Comment 4 Joseph Tremoulet 2020-01-22 07:38:41 PST
Another option could be to skip the CSE rather than modify the available value when the redundant value has flags not present on the available value.  Maybe even specifically when the flag in question is nsw, or specifically when it's a flag we've used in matching a min/max pattern, or something along those lines.
Comment 5 Joseph Tremoulet 2020-01-22 07:39:42 PST
...and of course I got that backwards.  I meant when the available value has flags not present on the redundant value.
Comment 6 Sam Parker 2020-02-06 04:00:36 PST
Skipping the transform on incompatible flags sounds good to me. Does this only affect selects though?
Comment 7 Daniil Suchkov 2020-02-06 04:05:26 PST
I'm not really familiar with EarlyCSE yet, but is there a reason why we can't remove all users of the instruction being changed from AvailableValues and add them back once the change is completed (but now with newly computed hash)?
Comment 8 Sanjay Patel 2020-02-06 08:12:23 PST
Here's a patch draft to limit the select matching if the operands have flags. This avoids the assert on the attached test and the example in comment 1. There's 1 regression on a test in llvm/test/Transforms/EarlyCSE/commute.ll with this patch, but I'd guess that's not a common occurrence.

Ideally, we could change the hashing as suggested in comment 7, but I haven't tried to implement that.
---------------------------------------------------------------------------

diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
index cd163c84c61..21ed48ac615 100644
--- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
+++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
@@ -152,10 +152,20 @@ static bool matchSelectWithOptionalNotCond(Value *V, Value *&Cond, Value *&A,
     std::swap(A, B);
   }
 
-  // Set flavor if we find a match, or set it to unknown otherwise; in
-  // either case, return true to indicate that this is a select we can
-  // process.
-  if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+  // Try to match a cmp+select idiom (min/max/abs). Exclude patterns that may
+  // require instruction flags (nsw, FMF, etc.) on the operands because we try
+  // to intersect (drop) those flags as part of hashing/value numbering.
+  auto hasNoFlags = [](Value *Val) {
+    if (auto *OBO = dyn_cast<OverflowingBinaryOperator>(Val))
+      return !OBO->hasNoUnsignedWrap() && !OBO->hasNoSignedWrap();
+    if (auto *PEO = dyn_cast<PossiblyExactOperator>(Val))
+      return !PEO->isExact();
+    if (auto *FPMO = dyn_cast<FPMathOperator>(Val))
+      return FPMO->getFastMathFlags().none();
+    return true;
+  };
+  auto *CmpI = dyn_cast<ICmpInst>(Cond);
+  if (CmpI && hasNoFlags(A) && hasNoFlags(B))
     Flavor = matchDecomposedSelectPattern(CmpI, A, B, A, B).Flavor;
   else
     Flavor = SPF_UNKNOWN;
Comment 9 Joseph Tremoulet 2020-02-06 09:12:19 PST
I'm not particularly invested in any particular approach here, but one thing that may complicate the implementation is figuring out how far the flags propagate.  For example, there's an nsw check in matchMinMax, which is called from matchSelectPattern, but matchDecomposedSelectPattern may have already looked through a cast before calling matchSelectPattern.  Same thing with isKnownNegation, though it looks like we're always calling that with NeedNSW false, so maybe that one's ok?  These are just the first couple places I could spot the select pattern matching depending on flags, the higher-level point is that it's tricky and error-prone looking through the code to figure out what all operations may have relevant flags, and presumably the set could change over time with updates to ValueTracking.

Looking at the suggestions on this thread in light of that:
 - stop pattern-matching selects is a big hammer that wouldn't be complicated by the issue of how far the flags propagate
 - restrict matchDecomposedSelectPattern to not depend on flags would probably imply threading that flag through the ValueTracking logic, which seems messy
 - rehashing the relevant instructions is complicated because the relevant instructions aren't just the first-level users of the instruction with the flags
 - skip cse when flag modification would change hash has the same issue that it's hard to identify the relevant instructions
 - limit select matching within CSE to avoid matching when it would rely on flags also has the issue of knowing how far back to look for flags


TBH, I'm not liking any of those implications.
One way to avoid this complexity would be to disallow CSE across differently-flagged instructions (or at least in the case where the available instruction has flags that aren't present on the redundant instruction).  That would essentially be (partially) undoing the optimization side of r267111...
Comment 10 Daniil Suchkov 2020-02-07 01:05:54 PST
Just avoiding CSE when flags mismatch will disable optimization for cases like this:
%1 = add nuw %a, %b
%2 = add %a, %b << we can't add this value to AvailableValues since for AvailableValues it's equal to %1
%3 = add %a, %b << here we won't know about %2
%4 = add %a, %b << here we won't know about %2 and %3 since we can't add them to AvailableValues

Solution that seems the cleanest to me:
1. Mix flags into hash.
2. (optional) Add a flag into SimpleValue called like "IgnoreFlags" that disables mixing flags into hash. Then, when we look for a substitute for an instruction: first we try to find exactly the same instruction, second we look for the same instruction minus flags.
This will handle the case mentioned above, though it won't cover cases like this:
%1 = add nuw %a, %b
%2 = add nuw nsw %a, %b << we won't recognize %1 as a suitable replacement for %2

This issue can be solved by replacing step (2) with following:
When we add an instruction to the AvailableValues, we should add it several times with different hashes: every hash should represent an instruction this one can replace. For example, for "add %a, %b" we should add it with hashes of:
add %a, %b
add nuw %a, %b
add nsw %a, %b
add nuw nsw %a, %b

Though for fast-math flags this will be messy.
Comment 11 Sanjay Patel 2020-02-07 08:09:03 PST
(In reply to Joseph Tremoulet from comment #9)
> Looking at the suggestions on this thread in light of that:
>  - stop pattern-matching selects is a big hammer that wouldn't be
> complicated by the issue of how far the flags propagate
>  - restrict matchDecomposedSelectPattern to not depend on flags would
> probably imply threading that flag through the ValueTracking logic, which
> seems messy

There's another quick-fix variant that would solve the shown bugs that lives between the 2 options above: stop using ValueTracking's matchSelectPattern to recognize min/max/abs, but implement basic pattern matching in EarlyCSE for those instead. InstCombine has gotten better at canonicalizing these patterns, so I doubt there would be much real-world downside.

Dirty patch that removes the ValueTracking call (I can clean this up to not use the ValueTracking names and add basic 'abs' matching if we decide this is the way forward):

diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
index cd163c84c61..177118a75e4 100644
--- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
+++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
@@ -155,10 +155,22 @@ static bool matchSelectWithOptionalNotCond(Value *V, Value *&Cond, Value *&A,
   // Set flavor if we find a match, or set it to unknown otherwise; in
   // either case, return true to indicate that this is a select we can
   // process.
-  if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
-    Flavor = matchDecomposedSelectPattern(CmpI, A, B, A, B).Flavor;
-  else
-    Flavor = SPF_UNKNOWN;
+  CmpInst::Predicate Pred;
+  if (!match(Cond, m_ICmp(Pred, m_Specific(A), m_Specific(B)))) {
+    if (!match(Cond, m_ICmp(Pred, m_Specific(B), m_Specific(A)))) {
+      Flavor = SPF_UNKNOWN;
+      return true;
+    }
+    Pred = ICmpInst::getSwappedPredicate(Pred);
+  }
+
+  switch (Pred) {
+  case CmpInst::ICMP_UGT: Flavor = SPF_UMAX; break;
+  case CmpInst::ICMP_ULT: Flavor = SPF_UMIN; break;
+  case CmpInst::ICMP_SGT: Flavor = SPF_SMAX; break;
+  case CmpInst::ICMP_SLT: Flavor = SPF_SMIN; break;
+  default: Flavor = SPF_UNKNOWN; break;
+  }
 
   return true;
 }
Comment 12 Joseph Tremoulet 2020-02-07 11:33:59 PST
(In reply to Sanjay Patel from comment #11)
> stop using ValueTracking's matchSelectPattern
> to recognize min/max/abs, but implement basic pattern matching in EarlyCSE
> for those instead.

SGTM, assuming the basic pattern matching stays pretty simple (it would be a shame to wind up duplicating a significant chunk of the value-numbering stuff in EarlyCSE), especially if we can verify that the perf impact is negligible.
Comment 13 Sanjay Patel 2020-02-08 11:05:14 PST
Patch proposal that extends the diff from comment 11:
https://reviews.llvm.org/D74285
Comment 14 Sanjay Patel 2020-02-11 08:15:23 PST
Should be fixed with:
https://reviews.llvm.org/rGb8ebc11f0320
Comment 15 Hans Wennborg 2020-03-18 02:22:36 PDT
*** Bug 45206 has been marked as a duplicate of this bug. ***