4527 Commits

Author SHA1 Message Date
Florian Hahn
05e1b5340b
[VPlan] Model FOR resume value extraction in VPlan. (#93396)
This patch uses the ExtractFromEnd VPInstruction opcode
to extract the value of a FOR to be used as resume value for the ph in
the scalar loop.

It adds a new live-out that temporarily wraps the FOR phi in the scalar
loop. fixFixedOrderRecurrence will process live outs for fixed order
recurrence phis by creating a new phi node in the scalar preheader, 
using the generated value for the live-out as incoming value from the
middle block and the original start value as incoming value for the
other edge. Creation of the phi in the preheader, as well as updating
the phi in the scalar loop will also be moved to VPlan in the future,
eventually retiring fixFixedOrderRecurrence

Depends on https://github.com/llvm/llvm-project/pull/93395

PR: https://github.com/llvm/llvm-project/pull/93396
2024-06-05 11:18:06 +01:00
Florian Hahn
e949b54a5b
[LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (#93499)
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.

When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.

Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.

Depends on https://github.com/llvm/llvm-project/pull/93498

PR: https://github.com/llvm/llvm-project/pull/93499
2024-06-04 22:23:30 +01:00
Ramkumar Ramachandra
59cb55d384
VPlan: add missing case for LogicalAnd; fix crash (#93553)
VPTypeAnalysis::inferScalarTypeForRecipe is missing the case for
VPInstruction::LogicalAnd, due to which the test
vplan-incomplete-cases.ll crashes. Add this missing case, and move the
test in vplan-infer-not-or-type.ll to vplan-incomplete-cases.ll, showing
correct codegen for trip-counts 2 and 3.
2024-06-04 08:58:16 +01:00
Florian Hahn
07b330132c
[VPlan] Model FOR extract of exit value in VPlan. (#93395)
This patch introduces a new ExtractFromEnd VPInstruction opcode to
extract the value of a FOR for users outside the loop (i.e. in the
scalar loop's exits). This moves the first part of fixing first order
recurrences to VPlan, and removes some additional code to patch up
live-outs, which is now handled automatically.

The majority of test changes is due to changes in the order of which the
extracts are generated now. As we are now using VPTransformState to
generate the extracts, we may be able to re-use existing extracts in the
loop body in some cases. For scalable vectors, in some cases we now have
to compute the runtime VF twice, as each extract is now independent, but
those should be trivial to clean up for later passes (and in line with
other places in the code that also liberally re-compute runtime VFs).

PR: https://github.com/llvm/llvm-project/pull/93395
2024-06-03 20:20:30 +01:00
Florian Hahn
f7e63e8b46
[LV] Operands feeding pointers of interleave member pointers are free.
For interleave groups we only create a pointer for the start of the
interleave group, not all original loads/stores. Mark single-use ops
feeding interleave group mem ops as free when vectorizing.
2024-06-01 13:59:29 +01:00
Tyler Lanphear
d337c504ef
[SLP][NFCI] Address issues seen in downstream Coverity scan. (#93757)
- Prevent null dereference: if the Mask given to
  `ShuffleInstructionBuilder::adjustExtracts()` is empty or all-poison,
  then `VecBase` will be `nullptr` and the call to
  `castToScalarTyElem(VecBase)` will dereference it. Add an assert
  to guard against this.

- Prevent use of uninitialized scalar: in the unlikely event that
  `CandidateVFs` is empty, then `AnyProfitableGraph` will be
  uninitialized in `if` condition following the loop. (This seems like a
  false-positive, but I submitted this change anyways as initializing
  bools costs nothing and is generally good practice)
2024-05-31 18:34:23 -07:00
Florian Hahn
654cd94629
[VPlan] Unconditionally run optimizeForVFAndUF.
Now that the VPlan for the main vector loop gets cloned in the epilogue
vectorization code path, there optimizeForVFAndUF can be applied
unconditionally.
2024-05-31 06:32:49 -07:00
Florian Hahn
f38d84ce32
[VPlan] Use ir-bb prefix for VPIRBasicBlock.
Follow-up to adjust the names and tests after
https://github.com/llvm/llvm-project/pull/93398.
2024-05-30 17:43:40 -07:00
Florian Hahn
5785048321
[VPlan] Add VPIRBasicBlock, use to model pre-preheader. (#93398)
This patch adds a new special type of VPBasicBlock that wraps an
existing IR basic block. Recipes of the block get added before the
terminator of the wrapped IR basic block. Making it a subclass of
VPBasicBlock avoids duplicating various APIs to manage recipes in a
block, as well as makes sure the traversals filtering VPBasicBlocks
automatically apply as well.

Initially VPIRBasicBlock are only used for the pre-preheader (wrapping
the original preheader of the scalar loop).

As follow-up, this will be used to move more parts of the skeleton
inside VPlan, starting with the branch and condition in the middle
block.

Separated out of https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/93398
2024-05-30 11:23:32 -07:00
Ramkumar Ramachandra
43100766f2
LV: generalize profitability criterion over TC (#93300)
Generalize LoopVectorizationPlanner::isMoreProfitable smoothly across
the fixed-vector and scalable-vector cases, taking the trip-count into
account, and fixing logical pitfalls that arise from a lack of
generality.
2024-05-30 10:54:32 +01:00
Florian Hahn
1794046536
[VPlan] Move verifier to class to reduce need to pass via args. (NFC)
Move VPlan verification functions to avoid the need to pass VPDT across
multiple calls. This also allows easier extensions in the future.
2024-05-29 21:01:28 -07:00
David Green
93d8d74ae6
[VectorCombine] Remove requirement for Instructions in shuffleToIdentity (#93543)
This removes the check that both operands of the original shuffle are
instructions, which is a relic from a previous version that held more
variables as Instructions.
2024-05-29 09:36:53 +01:00
David Green
1c6746e2db
[VectorCombine] Add support for zext/sext/trunc to shuffleToIdentity (#92696)
This is one of the simple additions to shuffleToIdentity that help it
look through intermediate zext/sext instructions.
2024-05-29 08:56:41 +01:00
David Green
516a9f5183
[VectorCombine] Add Cmp and Select for shuffleToIdentity (#92794)
Other than some additional checks needed for compare predicates and
selects with scalar condition operands, these are relatively simple
additions to what already exists.
2024-05-28 13:10:19 +01:00
David Green
f53f2a8c92
[VectorCombine] Add constant splat handling for shuffleToIdentity (#92797)
This just adds splat constants, which can be treated like any other
splat which hopefully makes them very simple. It does not try to handle
more complex constant vectors yet, just the more common splats.
2024-05-28 10:55:57 +01:00
Florian Hahn
8b037862b6
[VPlan] Preserve DT (and SCEV) in VPlan-native path (#93287)
As a follow-up to b2f65e80, use the DTU to also update and preserve
the DT in the native path. This should also allow preserving SCEV in the
native path

PR: https://github.com/llvm/llvm-project/pull/93287
2024-05-27 17:03:53 -07:00
Nikita Popov
8cdecd4d3a
[IR] Add getelementptr nusw and nuw flags (#90824)
This implements the `nusw` and `nuw` flags for `getelementptr` as
proposed at
https://discourse.llvm.org/t/rfc-add-nusw-and-nuw-flags-for-getelementptr/78672.

The three possible flags are encapsulated in the new `GEPNoWrapFlags`
class. Currently this class has a ctor from bool, interpreted as the
InBounds flag. This ctor should be removed in the future, as code gets
migrated to handle all flags.

There are a few places annotated with `TODO(gep_nowrap)`, where I've had
to touch code but opted to not infer or precisely preserve the new
flags, so as to keep this as NFC as possible and make sure any changes
of that kind get test coverage when they are made.
2024-05-27 16:05:17 +02:00
Florian Hahn
83646590af
[VPlan] Remove unused Range arg from createWidenInductionRecipe (NFC).
The Range argument is not used by createWidenInductionRecipe; induction
classification applies across the whole range of VFs. Remove the
argument.
2024-05-25 09:11:36 +01:00
Shih-Po Hung
0338c55ea5
[LV, VPlan] Check if plan is compatible to EVL transform (#92092)
The transform updates all users of inductions to work based on EVL,
instead
of the VF directly. At the moment, widened inductions cannot be updated,
so
bail out if the plan contains any.
This patch introduces a check before applying EVL transform. If any
recipes in loop rely on RuntimeVF, the plan is discarded.
2024-05-25 08:22:49 +08:00
Ramkumar Ramachandra
bb0d29a72d
[LV] fix logical error in trunc cost (#91136)
In LoopVectorizationCostModel::getInstructionCost(), when the condition
canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source
type is computed as the smallest type of the source vector and the
destination vector, and the destination type is computed as the largest
type of the instruction and destination type. This is clearly a logical
error, as the original source vector type could be smaller than the
original destination vector type, and the trunc semantics are broken
because we're attempting to widen.

Fixes #47665.
2024-05-24 18:01:58 +01:00
Alexey Bataev
70a54bca6f
[SLP]Improve/fix extracts calculations for non-power-of-2 elements.
One of the previous patches introduced initial support for non-power-of-2
number of elements but some parts of the SLP vectorizer still were not
adjusted to handle the costs correctly. Patch fixes it by improving
analysis of the non-power-of-2 number of elements and fixes in the cost
of the extractelements instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/93213
2024-05-24 09:33:36 -04:00
Florian Hahn
b2f65e809e
[VPlan] Use DomTreeUpdater to automatically update DT for vector loop. (#92525)
Use DTU to queue DominatorTree updates directly when connecting basic
blocks during VPlan execution.

This simplifies DT updates and will also automatically allow updating
the DT for the VPlan-native path as additional benefit in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/92525
2024-05-24 10:10:12 +01:00
Alexey Bataev
554c47c8e9 [SLP]Fix undef poison vector values shuffles with poisonous vectors.
If trying to find vector value in shuffling of the extractelements and
one of the vector values is undef value, need to generate real mask value
for such vector and either undef vector, or incoming second vector, if
  non-poisonous.
2024-05-22 10:41:57 -07:00
Han-Kuan Chen
5b205956e1
[SLP] NFC. Reduce newTreeEntry usage. (#92994) 2024-05-22 23:26:33 +08:00
Alexey Bataev
30d484fa99 [SLP]Fix a crash when trying to convert masked gather nodes to strided.
Need to check if the loads node is masked gather. Only vectorized loads
can be converted to strided.
2024-05-22 08:08:56 -07:00
Florian Hahn
352dc7d4bb
[LV] Propagate PredicatedBBsAfterVectorization to predecessors.
This fixes some cases where predicated BBs where missed previously,
leading to under-estimating the cost of those blocks.
2024-05-21 10:27:32 +01:00
Ramkumar Ramachandra
e3fa7ee11d
VectorCombine: refactor foldShuffleToIdentity (NFC) (#92766)
Lift out the long lambdas into static functions, use C++ destructing
syntax, and fix other minor things to improve the readability of the
function.
2024-05-21 08:12:30 +01:00
Florian Hahn
82c5d350d2
[VPlan] Add commutative binary OR matcher, use in transform. (#92539)
Split off from https://github.com/llvm/llvm-project/pull/89386, this
extends the binary matcher to support matching commuative operations.
This is used for a new m_c_BinaryOr matcher, used in simplifyRecipe.

PR: https://github.com/llvm/llvm-project/pull/92539
2024-05-20 13:03:48 +01:00
Han-Kuan Chen
9f449c3427
[SLP] NFC. Use TreeEntry::getOperand if setOperandsInOrder is called (#92727)
already.
2024-05-20 18:46:30 +08:00
Florian Hahn
b050048d35
[VPlan] Simplify (X && Y) || (X && !Y) -> X. (#89386)
Simplify a common pattern generated for masks when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/89386
2024-05-19 15:45:23 +00:00
David Green
c3677e4522 [VectorCombine] Don't transform single shuffles in shuffleToIdentity
This will help in later patches where the checks for operands being
instructions is removed, and might help not remove unnecessary poison lanes.
2024-05-18 23:37:55 +01:00
Florian Hahn
577785c5ca
[VPlan] Remove unused removeLastOperand (NFC).
The last use of the function has been removed a while ago. Remove the
unused function.
2024-05-18 18:43:20 +01:00
Florian Hahn
1e7d047c71
[VPlan] Mark LoopInfo preserved in native-path as well (NFC).
LoopInfo is updated during VPlan execution now, so it will also be
updated correctly in the native path.
2024-05-17 12:18:01 +01:00
Pietro Ghiglio
83d9aa2768
[VPlan] Add scalar inferencing support for addrspace cast (#92107)
Fixes https://github.com/llvm/llvm-project/issues/91434

PR: https://github.com/llvm/llvm-project/pull/92107
2024-05-15 14:03:21 +01:00
Jay Foad
1650f1b3d7
Fix typo "indicies" (#92232) 2024-05-15 13:10:16 +01:00
Florian Hahn
d187005cad
[VPlan] Update VPBlendRecipe codegen for for first-lane only.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
2024-05-15 11:00:15 +01:00
Florian Hahn
67d840b60f
[VPlan] Relax over-aggressive assertion in VPTransformState::get().
There are cases where a vector value has some users that demand the
the single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.

Fixes https://github.com/llvm/llvm-project/issues/91883.
2024-05-14 19:10:49 +01:00
Florian Hahn
b1e99a699d
[LV] Drop redundant comment from createEdgeMask (NFC).
Follow-up to remove a redundant comment post-commit
https://github.com/llvm/llvm-project/pull/91897
2024-05-14 12:43:47 +01:00
Ramkumar Ramachandra
d7ef34bfe3
[LV] update comment following 63d8058 (NFC) (#91120)
Address a review comment post landing 63d8058 (LoopVectorize: guard
appending InstsToScalarize; fix bug) to update a comment.
2024-05-14 10:59:26 +01:00
Florian Hahn
632317e9ab
[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897)
Add a new opcode to mode non-poison propagating logical AND operations
used when generating edge masks. This follows the similar decision to
model Not as dedicated opcode as well, to improve clarity.

This also helps to simplify the matchers for
https://github.com/llvm/llvm-project/pull/89386.


PR: https://github.com/llvm/llvm-project/pull/91897
2024-05-14 09:42:49 +01:00
Florian Hahn
e122380445
[LV] Use VPBuilder to create Select (NFCI). 2024-05-13 20:44:39 +01:00
David Green
b7ed097f29
[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000)
This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
2024-05-12 20:31:11 +01:00
Florian Hahn
c3d2af0f4e
[VPlan] VPEVLBasedIVPHI is a VPSingleDefRecipe.
VPEVLBasedIVPHIRecipe inherits from VPSingleDefRecipe. Add
VPEVLBasedIVPHISC to VPSingleDefRecipe::classof to make isa/dyn_cast &
co work as expected.

Split off https://github.com/llvm/llvm-project/pull/67934.
2024-05-09 19:18:37 +01:00
Alexey Bataev
58a94b1d0a [SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type.
Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
2024-05-09 04:19:43 -07:00
Arthur Eubanks
2fb3774321 Revert "[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type."
This reverts commit 2475efa91d8b4fa8f1a2d16052cb6d14be7d5dc6.

Causes crashes, see comments on 2475efa91d.
2024-05-08 23:01:47 +00:00
Alexey Bataev
2475efa91d [SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type.
Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
2024-05-08 07:25:19 -07:00
Ramkumar Ramachandra
57b9c15227
VectorCombine: fix logical error after m_Trunc match (#91201)
The matcher m_Trunc() matches an Operator with a given Opcode, which
could either be an Instruction or ConstExpr.
VectorCombine::foldTruncFromReductions() incorrectly assumes that the
pattern matched is always an Instruction, and attempts a cast. Fix this.

Fixes #88796.
2024-05-08 09:47:55 +01:00
Florian Hahn
082c81ae4a
[LV] Properly extend versioned constant strides.
We only version unknown strides to 1. If the original type is i1, then
the sign of the extension matters. Properly extend the stride value
before replacing it.

Fixes https://github.com/llvm/llvm-project/issues/91369.
2024-05-07 21:31:42 +01:00
Alexey Bataev
f00f294130 [SLP]Fix PR91309: Do not consider SExt as always producing signed result.
Still need to do the full analysis of the signedness of the values
rather than rely on Instruction opcode, if the opcode is SExt. Still may
produce unsigned result.
2024-05-07 08:57:52 -07:00
Alexey Bataev
c144157f3d [SLP]Use last pointer instead of first for reversed strided stores.
Need to use the last address of the vectorized stores for the strided
stores, not the first one, to correctly store the data.
2024-05-06 10:16:28 -07:00