LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 31274 - cost models should allow something more than an instruction as an input
Summary: cost models should allow something more than an instruction as an input
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Transformation Utilities (show other bugs)
Version: trunk
Hardware: PC All
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-05 11:53 PST by Sanjay Patel
Modified: 2021-06-04 05:50 PDT (History)
4 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sanjay Patel 2016-12-05 11:53:10 PST
Filing a bug to keep track of a suggestion that has come up a few times recently:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107489.html
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106879.html

Here's a umax example to illustrate:

$ cat costmodel_patterns.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

define i32 @max(i32* nocapture readonly %x, i32 %N) #0 {
entry:
  %cmp11 = icmp eq i32 %N, 0
  br i1 %cmp11, label %for.cond.cleanup, label %for.body.preheader

for.body.preheader:   
  %wide.trip.count = zext i32 %N to i64
  br label %for.body

for.cond.cleanup.loopexit:
  br label %for.cond.cleanup

for.cond.cleanup:        
  %ret.0.lcssa = phi i32 [ 0, %entry ], [ %.ret.0, %for.cond.cleanup.loopexit ]
  ret i32 %ret.0.lcssa

for.body:               
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
  %ret.012 = phi i32 [ %.ret.0, %for.body ], [ 0, %for.body.preheader ]
  %arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1 = icmp ugt i32 %0, %ret.012
  %.ret.0 = select i1 %cmp1, i32 %0, i32 %ret.012
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}

attributes #0 = { "target-features"="+avx" }

-----------------------------------------------------------------------------

This is IR for a target that has AVX, therefore, 'umax' is a single and simple 
instruction with expected throughput of 1 inst / cycle:
$ ./opt -loop-vectorize costmodel_patterns.ll -S | ./llc -o -  |grep max
...	
	vpmaxud	%xmm4, %xmm1, %xmm1
...

The cost model interface, however, is limited to providing costs for individual IR instructions. For example:
  /// \returns The expected cost of compare and select instructions.
  int getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
                         Type *CondTy = nullptr) const;


That means we see something like this:

$ ./opt -cost-model -analyze costmodel_patterns.ll -S
Printing analysis 'Cost Model Analysis' for function 'max':
...
Cost Model: Found an estimated cost of 1 for instruction:   %cmp1 = icmp ugt i32 %0, %ret.012
Cost Model: Found an estimated cost of 1 for instruction:   %.ret.0 = select i1 %cmp1, i32 %0, i32 %ret.012

...so we calculate a cost of '2' for a max idiom that should have a cost of '1' (both in terms of machine instruction count and throughput).
Comment 1 Sanjay Patel 2016-12-05 12:07:26 PST
(In reply to comment #0)
> $ ./opt -cost-model -analyze costmodel_patterns.ll -S
> Printing analysis 'Cost Model Analysis' for function 'max':
> ...
> Cost Model: Found an estimated cost of 1 for instruction:   %cmp1 = icmp ugt
> i32 %0, %ret.012
> Cost Model: Found an estimated cost of 1 for instruction:   %.ret.0 = select
> i1 %cmp1, i32 %0, i32 %ret.012
> 
> ...so we calculate a cost of '2' for a max idiom that should have a cost of
> '1' (both in terms of machine instruction count and throughput).

Screwed that paste up: this probably actually makes sense for the scalar case (there's no x86 scalar max instruction for that).

It's the vector case that shows the problem:

$ ./opt -loop-vectorize costmodel_patterns.ll -S -debug
...
LV: Found an estimated cost of 1 for VF 4 For instruction:   %.ret.0 = select i1 %cmp1, i32 %0, i32 %ret.012
LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1

For an AVX target, we'd expect max to have cost = 1 for VF 4.
Comment 2 Sanjay Patel 2018-05-14 07:21:42 PDT
https://reviews.llvm.org/D46276
Comment 3 Sanjay Patel 2018-05-14 11:21:33 PDT
(In reply to Sanjay Patel from comment #2)
> https://reviews.llvm.org/D46276

That's 1 IR instruction mapping to multiple machine instructions. What we want here is multiple IR instructions mapping to 1 machine instruction. 

That's also needed for bug 37426 if we don't have a rotate intrinsic.
Comment 4 Sanjay Patel 2018-08-26 06:25:04 PDT
(In reply to Sanjay Patel from comment #3)
> (In reply to Sanjay Patel from comment #2)
> 
> That's also needed for bug 37426 if we don't have a rotate intrinsic.

Side note to this bug, but we do have rotate intrinsics now following a discussion started in bug 37387:
https://reviews.llvm.org/rL337221 (IR intrinsics)
https://reviews.llvm.org/rL340141 (C/C++ builtins)

We don't canonicalize IR to that form yet. Adding backend (for vector types), vectorizer, and cost model support are prerequisites to completing that mission.