Filing a bug to keep track of a suggestion that has come up a few times recently: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107489.html http://lists.llvm.org/pipermail/llvm-dev/2016-November/106879.html Here's a umax example to illustrate: $ cat costmodel_patterns.ll target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx10.12.0" define i32 @max(i32* nocapture readonly %x, i32 %N) #0 { entry: %cmp11 = icmp eq i32 %N, 0 br i1 %cmp11, label %for.cond.cleanup, label %for.body.preheader for.body.preheader: %wide.trip.count = zext i32 %N to i64 br label %for.body for.cond.cleanup.loopexit: br label %for.cond.cleanup for.cond.cleanup: %ret.0.lcssa = phi i32 [ 0, %entry ], [ %.ret.0, %for.cond.cleanup.loopexit ] ret i32 %ret.0.lcssa for.body: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ] %ret.012 = phi i32 [ %.ret.0, %for.body ], [ 0, %for.body.preheader ] %arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv %0 = load i32, i32* %arrayidx, align 4 %cmp1 = icmp ugt i32 %0, %ret.012 %.ret.0 = select i1 %cmp1, i32 %0, i32 %ret.012 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body } attributes #0 = { "target-features"="+avx" } ----------------------------------------------------------------------------- This is IR for a target that has AVX, therefore, 'umax' is a single and simple instruction with expected throughput of 1 inst / cycle: $ ./opt -loop-vectorize costmodel_patterns.ll -S | ./llc -o - |grep max ... vpmaxud %xmm4, %xmm1, %xmm1 ... The cost model interface, however, is limited to providing costs for individual IR instructions. For example: /// \returns The expected cost of compare and select instructions. int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy = nullptr) const; That means we see something like this: $ ./opt -cost-model -analyze costmodel_patterns.ll -S Printing analysis 'Cost Model Analysis' for function 'max': ... Cost Model: Found an estimated cost of 1 for instruction: %cmp1 = icmp ugt i32 %0, %ret.012 Cost Model: Found an estimated cost of 1 for instruction: %.ret.0 = select i1 %cmp1, i32 %0, i32 %ret.012 ...so we calculate a cost of '2' for a max idiom that should have a cost of '1' (both in terms of machine instruction count and throughput).
(In reply to comment #0) > $ ./opt -cost-model -analyze costmodel_patterns.ll -S > Printing analysis 'Cost Model Analysis' for function 'max': > ... > Cost Model: Found an estimated cost of 1 for instruction: %cmp1 = icmp ugt > i32 %0, %ret.012 > Cost Model: Found an estimated cost of 1 for instruction: %.ret.0 = select > i1 %cmp1, i32 %0, i32 %ret.012 > > ...so we calculate a cost of '2' for a max idiom that should have a cost of > '1' (both in terms of machine instruction count and throughput). Screwed that paste up: this probably actually makes sense for the scalar case (there's no x86 scalar max instruction for that). It's the vector case that shows the problem: $ ./opt -loop-vectorize costmodel_patterns.ll -S -debug ... LV: Found an estimated cost of 1 for VF 4 For instruction: %.ret.0 = select i1 %cmp1, i32 %0, i32 %ret.012 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 For an AVX target, we'd expect max to have cost = 1 for VF 4.
https://reviews.llvm.org/D46276
(In reply to Sanjay Patel from comment #2) > https://reviews.llvm.org/D46276 That's 1 IR instruction mapping to multiple machine instructions. What we want here is multiple IR instructions mapping to 1 machine instruction. That's also needed for bug 37426 if we don't have a rotate intrinsic.
(In reply to Sanjay Patel from comment #3) > (In reply to Sanjay Patel from comment #2) > > That's also needed for bug 37426 if we don't have a rotate intrinsic. Side note to this bug, but we do have rotate intrinsics now following a discussion started in bug 37387: https://reviews.llvm.org/rL337221 (IR intrinsics) https://reviews.llvm.org/rL340141 (C/C++ builtins) We don't canonicalize IR to that form yet. Adding backend (for vector types), vectorizer, and cost model support are prerequisites to completing that mission.