1996 Commits

Author SHA1 Message Date
Florian Hahn
6dda74cc51
[VPlan] Use createSelect in adjustRecipesForReductions (NFCI).
Simplify the code and rename Result->NewExitingVPV as suggested by
@ayalz in https://github.com/llvm/llvm-project/pull/70253.
2024-01-03 20:54:10 +00:00
Florian Hahn
f18536d642
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.

https://github.com/llvm/llvm-project/pull/72164
2024-01-01 19:51:15 +00:00
Florian Hahn
516cc98aff
[LV] Fix typo in comment (NFC). 2023-12-28 21:20:10 +00:00
Florian Hahn
a5891fa4d2
[VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)

This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.

The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.

Moved from Phabricator (https://reviews.llvm.org/D157322)
2023-12-08 18:30:30 +00:00
Graham Hunter
d0d5ef8133
[LV] Add support for linear arguments for vector function variants (#73941)
If we have vectorized variants of a function which take linear
parameters, we should be able to vectorize assuming the strides match.
2023-12-08 10:24:05 +00:00
Alexey Bataev
056367bb19
[LV]Support dropping of nneg flag for zext widencast recipes. (#74112)
Compiler crashes when the assertion triggered for zext nneg instruction,
that checks that the instruction cannot produce poison. Changed the base
class for widencast recipe to handle dropping nneg flag to avoid
compiler crash.
2023-12-05 09:17:23 -05:00
Florian Hahn
70535f5e60
[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
   special codegen for each supported instruction type like the IR based
   one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
   execution (See removed FIXME).

Depends on D149081.

Depends on D149079.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149903
2023-12-02 16:12:38 +00:00
Florian Hahn
cbf7b52a65
[VPlan] Properly update reduction live-out after placing select.
After inserting a select for the final value, update the VPlan def-use
chains. At the moment, the incorrect live-out doesn't cause a
mis-compile, as computing the final reduction value is not yet modeled
in VPlan.
2023-12-02 15:22:09 +00:00
Graham Hunter
104b7c624e
[LV] Add support for uniform parameters on vectorized function variants (#72891)
Parameters marked as uniform take a scalar value, assuming the value is
invariant in the scalar loop.
2023-11-28 15:01:32 +00:00
Florian Hahn
906f598263
[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC). 2023-11-23 16:59:50 +00:00
Matthias Braun
2404477219
LoopVectorize: Add better heuristic for vectorized epilogue skip test (#72589)
This is a follow-up to PR #72450 correcting the branch_weights used
for the test whether the vectorized epilogue loop should be skipped.
2023-11-20 11:02:27 -08:00
Graham Hunter
4d64a2bcd3
[LV] Refactor vector function variant selection to prepare for uniform args (#68879)
Parameters marked as uniform take a scalar value, assuming the value is
invariant in the scalar loop. In order to support this, we need to stop
asking for a vector function variant with a default shape assuming that all
arguments will become vector arguments, and instead consider all available
variants and their parameter types.
2023-11-20 13:30:03 +00:00
Florian Hahn
7fd021a092
[LV] Don't crash on vector masks during scalar VPReductionRecipe::exec.
VPReductionRecipe may be executed for scalar VFs. Make sure to access
part 0 of the condition, as it could be an active-lane-mask, which is a
vector <1 x i1>

Fixes https://github.com/llvm/llvm-project/issues/72720.
2023-11-18 21:52:22 +00:00
Florian Hahn
2a9aed1730
[LV] Retain mask-reversal comment as suggested after e5e71af.
Address post-commit comment to retain comment.
2023-11-18 20:53:23 +00:00
Florian Hahn
e5e71affb7
[LV] Reverse mask up front, not when creating vector pointer. (#72163)
Reverse mask early on when populating BlockInMask. This will enable
separating mask management and address computation from the memory
recipes in the future and is also needed to enable explicit unrolling in
VPlan.
2023-11-17 13:59:35 +00:00
Nikita Popov
de176d8c54
[SCEV][LV] Invalidate LCSSA exit phis more thoroughly (#69909)
This an alternative to #69886. The basic problem is that SCEV can look
through trivial LCSSA phis. When the phi node later becomes non-trivial,
we do invalidate it, but this doesn't catch uses that are not covered by
the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which
will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the
LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places
that add predecessors to exit blocks, such as loop unrolling and loop
peeling.

Fixes #69097.
Fixes #66616.
Fixes #63970.
2023-11-17 09:34:24 +01:00
Matthias Braun
a9cc6fc280
LoopVectorize: Set branch_weight for conditional branches (#72450)
Consistently add `branch_weights` metadata in any condition branch
created by `LoopVectorize.cpp`:
- Will only add metadata if the original loop-latch branch had metadata
assigned.
- Most checks should rarely trigger so I am using a 127:1 ratio.
- For the middle block we assume an equal distribution of modulo
results.
2023-11-16 11:33:46 -08:00
Graham Hunter
b070629c10
[LV] Increase max VF if vectorized function variants exist (#66639)
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
2023-11-13 10:27:10 +00:00
Florian Hahn
34c2dcd5ac
[VPlan] Move initial skeleton construction to createInitialVPlan. (NFC)
This patch moves creating the  middle VPBBs and an initial empty
vector loop region for the top-level loop to createInitialVPlan.

This consolidates code to create the initial VPlan skeleton and enables
adding other bits outside the main region during initial VPlan
construction. In particular, D150398 will add the exit check & branch to
the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158333
2023-11-12 13:00:44 +00:00
Florian Hahn
ed6f4994d8
[VPlan] Handle conditional ordered reductions with scalar VFs.
VPReductionRecipe::execute was not handling predicates for ordered
reduction with scalar VFs, which was causing a crash. Thsi patch adds
dedicated handling for scalar VFs when dealing with the condition.
The other operands are already handled in a similar fashion below.

Fixes #70988.
2023-11-11 12:55:40 +00:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Florian Hahn
fd82b5b287
[LV] Support recieps without underlying instr in collectPoisonGenRec.
Support recipes without underlying instruction in
collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null
the underlying value.

Fixes https://github.com/llvm/llvm-project/issues/70590.
2023-11-03 10:21:14 +00:00
Igor Kirillov
70904226e1
[LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588)
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.

With this patch, we can make more informed decisions for loops with low
trip counts, especially when leveraging Profile-Guided Optimization
(PGO) data.
2023-10-30 13:43:26 +00:00
Florian Hahn
b0b88643a1
[VPlan] Add initial anlysis to infer scalar type of VPValues. (#69013)
This patch adds initial type inferrence for VPValues. It infers the
scalar type of a VPValue, by bottom-up traversing through defining
recipes until root nodes with known types are reached (e.g. live-ins or
load recipes). The types are then propagated top down through
operations.

This is intended as building block for a VPlan-based cost model, which
will need access to type information for VPValues/recipes.

Initial testing is done by asserting the inferred type matches the type
of the result value generated for a widen and replicate recipes.
2023-10-27 14:38:28 +01:00
Florian Hahn
4f56d47d05
[VPlan] Make ExpandedSCEVs argument const (NFC).
The argument is only used in const contexts. Simplifies a follow-up
diff.
2023-10-22 12:31:55 +01:00
Florian Hahn
ca01f2af78
[LV] Enforce order of reductions with intermediate stores in VPlan (NFC)
Reductions with intermediate stores currently need to be fixed in order
of their intermediate stores. Instead of doing this at fixup time after
code has been generated, sort the reductions in adjustRecipesForReductions.

This makes the order explicit in VPlan and will enable removing
fixReductions with modeling computing the final reduction result in
VPlan, followed by also modeling the intermediate stores explicitly.
2023-10-21 21:26:52 +01:00
Lou Knauer
852bac4439
[VPlan] Support scalable vectors in outer-loop vectorization
This patch enables scalable vectors in the VPlan-native path.
If a vectorization factor is specified via loop vectorization hints,
that factor is used. If no vectorization factor is specified, but the
target preferes scalable vectorization, a scalable vectorization factor
is selected.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D157484
2023-10-20 23:17:35 +01:00
Florian Hahn
2ec7bba77b
Recommit "[VPlan] Insert Trunc/Exts for reductions directly in VPlan."
This reverts commit e4ea0997486000b460c4875a00301b73b3c0d6a7.

The recommit fixes a reported crash by adding a missing check to make
sure the cast recipes are only introduced when vectorizing.

Test coverage added in 3cac608fbd0811b2f5c59c6e13148162ccd8543e.
Original commit message:
   Update the code to create Trunc/Ext recipes directly in
    adjustRecipesForReductions instead of fixing it up later in
    fixReductions.

    This explicitly models the required conversions and also makes sure they
    are generated at the right place (instead of after the exit condition),
    hence the changes in a few tests.
2023-10-20 14:30:04 +01:00
Fangrui Song
e4ea099748 Revert "[VPlan] Insert Trunc/Exts for reductions directly in VPlan."
This reverts commit fd311126349b8fe1684d62154a9fa5a7bbb0b713.

There are two different crash reports on fd31112634
2023-10-18 23:25:31 -07:00
Florian Hahn
fd31112634
[VPlan] Insert Trunc/Exts for reductions directly in VPlan.
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.

This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
2023-10-17 19:17:40 +01:00
Yingwei Zheng
4718b4011f
[LV] Invalidate disposition of SCEV values after loop vectorization (#69230)
This PR fixes the assertion failure of `SE.verify()` after loop vectorization.
2023-10-17 03:49:39 +08:00
Florian Hahn
b1115f8cce
[LV] Use LatchVPBB directly instead of going through region (NFC).
Split off from D158333.
2023-10-13 20:08:31 +01:00
Rin
df8e0d057d
[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697)
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.

We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
2023-10-09 16:26:19 +01:00
Graham Hunter
3273ea40e5
[LV] Cache call vectorization decisions (#66521)
LoopVectorize currently queries VFDatabase repeatedly for each CI,
and each query to VFDatabase rescans all vector variants.

This patch instead makes a decision for each call once per VF based
on the cost of scalarization vs. function call to a vector variant
of the function vs. a vector intrinsic, then caches the decision
along with relevant info for use in planning and plan execution.
2023-10-09 11:23:19 +01:00
Florian Hahn
dae91f5dbc
[VPlan] Avoid VPTransformState::reset in fixReduction (NFCI).
There's no need to repeatedly query and reset the state for
LoopExitInstDef. This removes one of the last uses of
VPTransformState::reset, by use a vector to store and update the
results. No other code should try to retrieve the result from State
outside the fixReductionCall.
2023-10-07 23:24:24 +01:00
Rin
d3e4702c0f
[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543)
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
2023-10-05 10:24:30 +01:00
Florian Hahn
07e715953b
[VPlan] Check users of LoopExitInstDef in VPlan directly. (NFCI)
Instead of walking the IR def use chains of the generated code, adjust
the generated VPInstruction if needed and check its users in VPlan.
2023-10-03 20:42:15 +01:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic  (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
Florian Hahn
1a9e45080f
[VPBuilder] Add setInsertPoint version taking a recipe directly (NFC).
This helps to slightly simplify code when a recipe can be obtained
easily. Suggested in D158779.
2023-09-25 12:17:53 +01:00
Youngsuk Kim
e5026f0179 [llvm] Remove uses of Type::getPointerTo() (NFC)
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:

* Drop the call entirely if the sole purpose of it is to support a no-op
  bitcast (remove the no-op bitcast as well).

* Replace with `PointerType::get()`/`PointerType::getUnqual()`

This is a NFC cleanup effort.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D155232
2023-09-22 19:44:38 -04:00
Florian Hahn
f23246a0bb
[LV] Directly add fast-math flags to select recipe (NFC).
Now that VPInstruction can manage fast math flags via
VPRecipeWithIRFlags, use them directly to model the fast-math flags of
the select created for the final reduction value instead of adding them
late.
2023-09-21 11:05:55 +01:00
Florian Hahn
1a9358c090
[LV] Relax over-strict assertion for reduction exit value selects.
After f108c6c, (mul x, 1) is simplified to x, which can cause the select
for the final reduction value when tail-folding to use the reduction
value for both options. Relax the assertion to make sure this case is
allowed.

Note that the reduction is now redundant itself and could be further
simplified.

Fixes #66895.
2023-09-21 10:12:29 +01:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Florian Hahn
08de6508ab
[LV] Return debug loc directly from getDebugLocFromInstrOrOps (NFCI)
The return value of the function is only used to get the debug location.
Directly return the debug location, as this avoids an extra null
check in the caller.
2023-09-08 16:29:09 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Nuno Lopes
5a3fd5f3f5 [LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment 2023-09-03 16:35:05 +01:00
Florian Hahn
fd66195777
[VPlan] Manage compare predicates in VPRecipeWithIRFlags.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.

This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predixcate wasn't
printed properly for VPReplicateRecipes.

Discussed/split off from D150398.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158992
2023-09-02 21:45:24 +01:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00