Update the verifier to first check if the number of incoming values
matches the number of predecessors, before using
incoming_values_and_blocks. We unfortunately need also check here, as
this may be called before verifyPhiRecipes runs.
Also update the verifier unit tests, to actually fail for the expected
recipes.
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.
I think the issue is isolated to code that evaluates the resulting
AddRec at BTC, just using it to compute the distance between accesses
should still be fine; if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straight-forward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads.
Fixes https://github.com/llvm/llvm-project/issues/160912.
PR: https://github.com/llvm/llvm-project/pull/161445
This change aims to avoid inserting a freeze instruction between the
load and bitcast when scalarizing extend-extract. This is particularly
useful in combination with
https://github.com/llvm/llvm-project/pull/164682, which can then
potentially further scalarize, provided there is no freeze.
alive2 proof: https://alive2.llvm.org/ce/z/W-GD88
If the parent node is non-schedulable (only externally used instructions), and at least one instruction has multiple uses and used in the binop, such copyable node should be created. Otherwise, it may contain wrong def-use chain model, which cannot be effective detected.
Fixes#166035
Currently in optimizeMaskToEVL we convert every widened load, store or
reduction to a VP predicated recipe with EVL, regardless of whether or
not it uses the header mask.
So currently we have to be careful when working on other parts VPlan to
make sure that the EVL transform doesn't break or transform something
incorrectly, because it's not a semantics preserving transform.
Forgetting to do so has caused miscompiles before, like the case that
was fixed in #113667
This PR rewrites it to work in terms of pattern matching, so it now only
converts a recipe to a VP predicated recipe if it is exactly masked with
the header mask.
After this the transform should be a true optimisation and not change
any semantics, so it shouldn't miscompile things if other parts of VPlan
change.
This fixes#152541, and allows us to move addExplicitVectorLength into
tryToBuildVPlanWithVPRecipes in #153144
It also splits out the load/store transforms into separate patterns for
reversed and non-reversed, which should make #146525 easier to implement
and reason about.
BranchOnCount and BranchOnCond do not read memory, but cannot be moved.
Mark them as having side-effects, but not reading/writing memory, which
more accurately models that above. This allows removing some special
checking for branches both in the current code and future patches.
Update VPInstruction constructor to accept VPIRMetadata between the
Flags and DebugLoc parameters. This allows metadata to be passed during
construction rather than assigned afterward.
If the laternate operation is more stricter than the main operation, we
cannot rely on the analysis of the main operation. In such case, better
to avoid doing the analysis at all, since it may affect the overall
result and lead to incorrect optimization
Fixes#165878
Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::comptueCost.
In particular, it supports construction SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.
Note that the SCEV expression is valid VF = 1: we only support
construction AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to a VF specific one, by rewriting the AddRecs to ones with
the appropriate step.
Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.
Another thing to note is that we construct SCEV expression purely by
looking at the operation of the recipe and its translated operands, w/o
accessing the underlying IR (the exception being getting the source
element type for GEPs).
PR: https://github.com/llvm/llvm-project/pull/161276
VPWidenCanonicalIV and VPBlend recipes are created by VPPredicator, and
VPCanonicalIVPHI and VPInstruction recipes are created by
VPlanConstruction. WidenPHIs are never created.
Refine logic to scalarize interleave group member: only skip
scalarization overhead for member being used as addresses. For others,
use the regular scalar memory op cost.
This currently doesn't trigger in practice as far as I could find, but
fixes a potential divergence between VPlan- and legacy cost models.
It fixes a concrete divergence with a follow-up patch,
https://github.com/llvm/llvm-project/pull/161276.
If the gather/buildvector node has the match and this matching node has
a scheduled copyable parent, and the parent node of the original node
has a last instruction, which is non-schedulable and is part of the
schedule copyable parent, such matching node should be excluded as
non-matching, since it produces wrong def-use chain.
Fixes#165435
This follows similar reasoning as 45ce88758d24
(https://github.com/llvm/llvm-project/pull/159556):
LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.
LV creates SCEV and memory runtime checks early on and then disconnects
the blocks temporarily. The patch fixes a mis-compile, where previously
LCSSA construction during SCEV expand may replace uses in currently
unreachable SCEV/memory check blocks.
Fixes https://github.com/llvm/llvm-project/issues/162512
PR: https://github.com/llvm/llvm-project/pull/165505
Need to re-check the instruction with the non-schedulable parent, only
if this parent has a user phi node (i.e. it is used only outside the
block) and the user instruction has unique parent instruction.
Fixes issue reported in 20675ee67d (commitcomment-168863594)
A reduction (including partial reductions) with a multiply of a constant
value can be bundled by first converting it from `reduce.add(mul(ext,
const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to
extend the constant.
This PR adds such bundling by first truncating the constant to the
source type of the other extend, then extending it to the destination
type of the extend. The first truncate is necessary so that the types of
each extend's operand are then the same, and the call to
canConstantBeExtended proves that the extend following a truncate is
safe to do. The truncate is removed by optimisations.
This is a stacked PR, 1a and 1b can be merged in any order:
1a. https://github.com/llvm/llvm-project/pull/147302
1b. https://github.com/llvm/llvm-project/pull/163175
2. -> https://github.com/llvm/llvm-project/pull/162503
Factor out common code to determine legality of hoisting and sinking.
The patch has the side-effect of fixing an underlying bug, where a
load/store pair is reordered.
Add an member Alignment to VPWidenMemoryRecipe to store memory alignment
directly in the recipe. Update constructors, clone(), and relevant
methods to use this stored alignment instead of querying the IR
instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be
constructed without relying on the original IR instruction in the
future.
If the instructions in the node do not require scheduling and used
outside basic block only, still need to check, if their operands are
non-inst too. Such nodes should be emitted in the beginning of the
block.
Fixes#165151
InstSimplifyFolder can fold binary intrinsics, so take the opportunity
to unify code with getOpcodeOrIntrinsicID, and handle the case. The
additional handling of WidenGEP is non-functional, as the GEP is
simplified before it is widened, as the included test shows.
When an VF is specified via a loop hint, it will be clamped to a safe
VF or ignored if it is found to be unsafe. This is not the case for
user-specified interleave counts, which can lead to loops such as
the following with a memory dependence being vectorised with
interleaving:
```
#pragma clang loop interleave_count(4)
for (int i = 4; i < LEN; i++)
b[i] = b[i - 4] + a[i];
```
According to [1], loop hints are ignored if they are not safe to apply.
This patch adds a check to prevent vectorisation with interleaving if
isSafeForAnyVectorWidth() returns false. This is already checked in
selectInterleaveCount().
[1]
https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
8d29d09309 exposed a crash due to incorrectly trying to handle masked
interleave recipes. For now, the current code does not support masked
interleave recipes. Bail out for them.
An assertion failed when Polly was registering for the pass manager
which assumed that there would be only Polly passes. Since this does not
need to be the case, re-apply with the assert removed.
Includes a non-Polly change to trigger the premerge CI to trigger
check-llvm which failed for 0b9a7b80c0674c5c6f746139912111bea7eae63b,
but pre-merge did not catch.
If the parent node is non-schedulable and it includes several copies of
the same instruction, its operand might be replaced by the copyable
nodes in multiple children nodes, and if the instruction is commutative,
they can be used in different operands. The compiler shall consider this
opportunity, taking into account that non-copyable children are
scheduled only ones for the same parent instruction.
Fixes#164242
Move narrowInterleaveGroups to to general VPlan optimization stage.
To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.
If such a VF is found, the original VPlan is split into 2:
a) a new clone which contains all VFs of Plan, except VFToOptimize, and
b) the original Plan with VFToOptimize as single VF.
The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.
Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.
One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz
PR: https://github.com/llvm/llvm-project/pull/149706