- Split from #165532. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.
API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
- AddressSpace, CostKind);
+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+ CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the
same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost
queries to the same attributes-based entry point.
FCmp instructions have both a predicate and fast-math flags. Introduce a
new FCmp kind, that combines both to model this correctly in the current
system.
This should be NFC modulo VPlan printing which now includes the correct
fast-math flags.
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.
This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.
Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did
the same for VPIRMetadata.
Should be NFC w.r.t. to the generated IR.
PR: https://github.com/llvm/llvm-project/pull/168450
Update VPlan to populate VPIRMetadata during VPInstruction construction
and use it when creating widened recipes, instead of constructing
VPIRMetadata from the underlying IR instruction each time.
This centralizes VPIRMetadata in VPInstructions and ensures metadata is
consistently available throughout VPlan transformations.
PR: https://github.com/llvm/llvm-project/pull/167253
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.
While at it, record VPIRFlags in VPWidenInductionRecipe.
Update VPInstruction constructor to delegate to constructor with more
comprehensive checking and validation.
This required updating some unit tests, to make sure the constructed
VPInstructions are valid.
Currently, in-loop reductions for AnyOf and FindIV are not supported.
They were implicitly blocked. This happened because
RecurrenceDescriptor::getReductionOpChain could not detect their
recurrence chain. The reason is that RecurrenceDescriptor::getOpcode was
set to Instruction::Or, but the recurrence chains of AnyOf and FindIV do
not actually contain an Instruction::Or.
This patch explicitly disables in-loop reductions for AnyOf and FindIV
instead of relying on getReductionOpChain to implicitly prevent them.
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399)
has the property that if the AVL (%cnt) is less than or equal to VF
(%max_lanes) then the return value is just AVL.
This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds
`ExplicitVectorLength` to
`VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once
dead.
This means that VPExpressions will now be constructed for
VPPartialReductionRecipe's when the loop has tail-folding predication.
Note that control-flow (if/else) predication is not yet handled
for partial reductions, because of the way partial reductions
are recognised and built up.
Split off from #158690. Currently if an instruction needs predicated due
to tail folding, it will also have a predicated discount applied to it
in multiple places.
This is likely inaccurate because we can expect a tail folded
instruction to be executed on every iteration bar the last.
This fixes it by checking if the instruction/block was originally
predicated, and in doing so prevents vectorization with tail folding
where we would have had to scalarize the memory op anyway.
On llvm-test-suite this causes 4 loops in total to no longer be
vectorized with -O3 on arm64-apple-darwin, and there's no observable
performance impact.
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.
If the condition is a single-scalar, we can simply use a scalar
condition.
PR: https://github.com/llvm/llvm-project/pull/165506
BranchOnCount and BranchOnCond do not read memory, but cannot be moved.
Mark them as having side-effects, but not reading/writing memory, which
more accurately models that above. This allows removing some special
checking for branches both in the current code and future patches.
Update VPInstruction constructor to accept VPIRMetadata between the
Flags and DebugLoc parameters. This allows metadata to be passed during
construction rather than assigned afterward.
Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::comptueCost.
In particular, it supports construction SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.
Note that the SCEV expression is valid VF = 1: we only support
construction AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to a VF specific one, by rewriting the AddRecs to ones with
the appropriate step.
Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.
Another thing to note is that we construct SCEV expression purely by
looking at the operation of the recipe and its translated operands, w/o
accessing the underlying IR (the exception being getting the source
element type for GEPs).
PR: https://github.com/llvm/llvm-project/pull/161276
Factor out common code to determine legality of hoisting and sinking.
The patch has the side-effect of fixing an underlying bug, where a
load/store pair is reordered.
Add an member Alignment to VPWidenMemoryRecipe to store memory alignment
directly in the recipe. Update constructors, clone(), and relevant
methods to use this stored alignment instead of querying the IR
instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be
constructed without relying on the original IR instruction in the
future.
When the legacy cost model scalarizes loads that are used as addresses
for other loads and stores, it looks to phi nodes, if they are direct
address operands of loads/stores. Match this behavior in
isUsedByLoadStoreAddress, to fix a divergence between legacy and
VPlan-based cost model.
Follow up on 7c4f188 ([LV] Support multiplies by constants when forming
scaled reductions), introducing m_APInt, and improving code around
canConstantBeExtended: we change canConstantBeExtended to take an APInt.
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.
Fixes https://github.com/llvm/llvm-project/issues/162498.
PR: https://github.com/llvm/llvm-project/pull/163056
Replication is currently not supported for scalable VFs. Make sure
VPReplicateRecipe::computeCost returns an invalid cost early, for
scalable VFs if the recipe is not a single-scalar.
Note that this moves the existing invalid-costs.ll out of the AArch64
subdirectory, as it does not use a target triple.
Fixes https://github.com/llvm/llvm-project/issues/160792.
VPBlendRecipes are introduced as part of if-conversion, potentially adding
a def-use chain from a load used in a compare to another load/store. In
the scalar IR, there is no connection via def-use chains, so the legacy
cost model won't consider the load used by memory operation.
Skipping blends brings the VPlan-based cost-computation in line with the
legacy cost model after https://github.com/llvm/llvm-project/pull/162157.
Currently there's a crash when trying to construct VPExpressionRecipes
for a mul (ext, ext), if the multiply has outside users; the mul will be
cloned to serve its external users, but the extends won't get cloned and
will stay connected to users outside the loop (the cloned multiply).
To fix this, process recipes in reverse order. This ensures that we
visit bundled users before their operands, properly ensuring that the
extends for the external user are cloned as well.
This reverts commit f80c0baf058dbdc5 and 94eade61a02ae5.
Recommit a small fix for targets using prefersVectorizedAddressing.
Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.
There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't each for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.
PR: https://github.com/llvm/llvm-project/pull/160053
If a load is scalarized because it is used by a load/store address, the
legacy cost model does not pass ScalarEvolution to getAddressComputationCost.
Match the behavior in VPReplicateRecipe::computeCost.
This reverts commit f61be4352592639a0903e67a9b5d3ec664ad4d23.
Recommit a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another
memory operation, which fixes
https://github.com/llvm/llvm-project/issues/161404.
Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.
There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't each for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.
PR: https://github.com/llvm/llvm-project/pull/160053
We can create partial reductions for multiplies with constants, if the
constant is small enough to be extended from source to destination type
w/o changing the value.
This only handles constant on the right side of a multiply, relying on
other passes to canonicalize the input.
Alive2 Proofs: https://alive2.llvm.org/ce/z/iWRMr6
PR: https://github.com/llvm/llvm-project/pull/161092
The VPExpressionRecipe class uses a set to store its bundled recipes. If
repeated recipes are bundled then the duplicates will be lost, causing
the following recipes to not be at the expected place in the set.
When printing a reduce.add(mul(ext, ext)) bundle, for example, if the
extends are the same then the 3rd element of the set will be the
reduction, rather than the expected mul, causing a cast error. With this
change, the recipes are at the expected index in the set.
Fixes#156464
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.
There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't each for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.
PR: https://github.com/llvm/llvm-project/pull/160053