David Sherwood 6998f8ae2d [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns true.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that only the following cases occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
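
As a rough illustration of the change described above, here is a minimal,
self-contained toy model. It is not the LoopVectorize code: ToyInst, Opcode,
toyBaseCost, oldCost and newCost are hypothetical names invented for this
sketch, and the per-copy costs (1 for an add, 0 for a pointer bitcast) are
assumptions standing in for the real TTI queries.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class Opcode { GetElementPtr, BitCast, Add };

struct ToyInst {
  Opcode Op;
  bool ScalarAfterVectorization; // stand-in for isScalarAfterVectorization(I, VF)
  bool UserNeedsScalarizing;     // stand-in for "some user needs scalarising"
};

// Assumed cost of a single copy of the instruction (stand-in for a TTI query).
unsigned toyBaseCost(Opcode Op) { return Op == Opcode::Add ? 1 : 0; }

// Old scheme: scale the cost by VF when the instruction stays scalar.
unsigned oldCost(const ToyInst &I, unsigned VF) {
  unsigned N = I.ScalarAfterVectorization ? VF : 1;
  return N * toyBaseCost(I.Op);
}

// New scheme: N is assumed to be 1. GEPs cost 0 here because their cost is
// accounted for as part of the load/store cost. For other instructions that
// stay scalar at a vector VF, assert that no user needs scalarising.
unsigned newCost(const ToyInst &I, unsigned VF) {
  if (I.Op == Opcode::GetElementPtr)
    return 0;
  if (VF > 1 && I.ScalarAfterVectorization) {
    assert(!I.UserNeedsScalarizing && "unexpected scalarised user");
  }
  return toyBaseCost(I.Op);
}
```

In this toy model a scalar induction update (an add) at VF 2 costs 2 under the
old scheme and 1 under the new one, which mirrors the single test that needed
fixing.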

Only one test required fixing, and I believe the original cost for the scalar
add instruction was wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case where a pointer PHI feeds directly
into a store that will be scalarised, since this was previously untested.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-28 13:41:07 +01:00


; REQUIRES: asserts
; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -S -o - < %s 2>&1 | FileCheck %s
%struct.foo = type { i32, i64 }
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %0 = bitcast i64* %b to i32*
; The bitcast below will be scalarized due to the predication in the loop. Bitcasts
; between pointer types should be treated as free, despite the scalarization.
define void @foo(%struct.foo* noalias nocapture %in, i32* noalias nocapture readnone %out, i64 %n) {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %if.end
  %i.012 = phi i64 [ %inc, %if.end ], [ 0, %entry ]
  %b = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 1
  %0 = bitcast i64* %b to i32*
  %a = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 0
  %1 = load i32, i32* %a, align 8
  %tobool.not = icmp eq i32 %1, 0
  br i1 %tobool.not, label %if.end, label %land.lhs.true

land.lhs.true:                                    ; preds = %for.body
  %2 = load i32, i32* %0, align 4
  %cmp2 = icmp sgt i32 %2, 0
  br i1 %cmp2, label %if.then, label %if.end

if.then:                                          ; preds = %land.lhs.true
  %sub = add nsw i32 %2, -1
  store i32 %sub, i32* %0, align 4
  br label %if.end

if.end:                                           ; preds = %if.then, %land.lhs.true, %for.body
  %inc = add nuw nsw i64 %i.012, 1
  %exitcond.not = icmp eq i64 %inc, %n
  br i1 %exitcond.not, label %for.end, label %for.body

for.end:                                          ; preds = %if.end
  ret void
}