1955 Commits

Author SHA1 Message Date
Florian Hahn
1a9358c090
[LV] Relax over-strict assertion for reduction exit value selects.
After f108c6c, (mul x, 1) is simplified to x, which can cause the select
for the final reduction value when tail-folding to use the reduction
value for both options. Relax the assertion to make sure this case is
allowed.

Note that the reduction is now redundant itself and could be further
simplified.

Fixes #66895.
2023-09-21 10:12:29 +01:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Florian Hahn
08de6508ab
[LV] Return debug loc directly from getDebugLocFromInstrOrOps (NFCI)
The return value of the function is only used to get the debug location.
Directly return the debug location, as this avoids an extra null
check in the caller.
2023-09-08 16:29:09 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Nuno Lopes
5a3fd5f3f5 [LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment 2023-09-03 16:35:05 +01:00
Florian Hahn
fd66195777
[VPlan] Manage compare predicates in VPRecipeWithIRFlags.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.

This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predixcate wasn't
printed properly for VPReplicateRecipes.

Discussed/split off from D150398.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158992
2023-09-02 21:45:24 +01:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Igor Kirillov
ac65fb8699 [LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions.
When a loop has multiple reductions, each with an intermediate invariant
store, the order in which those reductions are processed is not considered.
This can result in the invariant stores outside the loop not preserving the
original order.
This patch sorts VPReductionPHIRecipes by the order in which they have
stores in the original loop before running
`InnerLoopVectorizer::fixReduction` function, and it helps to maintain
the correct order of stores.

Fixes https://github.com/llvm/llvm-project/issues/64047

Differential Revision: https://reviews.llvm.org/D157631
2023-08-31 16:21:44 +00:00
Florian Hahn
96e83d3705
[LV] Use IRBuilder to create and optimize middle-block compare.
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158332
2023-08-29 11:42:18 +01:00
David Sherwood
c02184f286 [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop
Suppose we have a nested loop like this:

  void foo(int32_t *dst, int32_t *src, int m, int n) {
    for (int i = 0; i < m; i++) {
      for (int j = 0; j < n; j++) {
        dst[(i * n) + j] += src[(i * n) + j];
      }
    }
  }

We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then the cost of these checks becomes quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:

1. &dst[(i * n)] -> &dst[(i * n) + n]
2. &src[(i * n)] -> &src[(i * n) + n]

we can instead look at the expanded ranges:

1. &dst[0] -> &dst[((m - 1) * n) + n]
2. &src[0] -> &src[((m - 1) * n) + n]

which are outer-loop-invariant. As with many optimisations there
is a trade-off here, because there is a danger that using the
expanded ranges we may never enter the vectorised inner loop,
whereas with the smaller ranges we might enter at least once.

I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.

When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:

SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
...
GEOMEAN: +0.5%

I didn't investigate all the differences to see if they are
genuine or noise, but I know the x264 improvement is real because
it has some hot nested loops with low trip counts where I can
see this hoisting is beneficial.

Tests have been added here:

  Transforms/LoopVectorize/runtime-checks-hoist.ll

Differential Revision: https://reviews.llvm.org/D152366
2023-08-24 12:14:02 +00:00
Florian Hahn
26bb2da28b
[VPlan] Proactively create mask for tail-folding up-front (NFCI).
Split off mask creation for tail folding and proactively create the mask for
the header block.

This simplifies createBlockInMask.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157037
2023-08-23 21:36:16 +01:00
Kolya Panchenko
acbe886880 [LV] Vectorization remark for outerloop
Reviewed By: fhahn, ABataev

Differential Revision: https://reviews.llvm.org/D150696
2023-08-21 13:05:06 -04:00
Florian Hahn
57a6f6579c
[LV] Simplify condition for induction recipe insertion (NFCI).
Split off independent suggestion from D157037. This simplifies the
condition to decide if a recipe needs to be inserted to the header phi
section or simply appended.

The assertion has been updated to allow cases where the first non-phi
recipe is the end of the block, in which case inserting before this
point is equivalent to appending.
2023-08-21 15:58:07 +01:00
Florian Hahn
c34d049706
[LV] Re-use existing NewInsertionPoint variable for insertion (NFCI).
Split off independent suggestion from D157037.
2023-08-21 15:21:29 +01:00
Florian Hahn
56f5738d85
[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC).
All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations to other recipe
definitions in VPlanRecipes.cpp
2023-08-20 21:00:05 +01:00
Craig Topper
46eded75cd [LoopVectorize] Replace dyn_cast with isa to suppress an unused variable warning. NFC 2023-08-19 14:41:00 -07:00
Florian Hahn
622b611f23
[VPlan] Inline buildScalarSteps in single user (NFC).
Other users have been refactored, remove the uneeded function.
2023-08-19 17:02:31 +01:00
Florian Hahn
9ee4a740e3
[LV] Remove unused MiddleVPBB argument from addUsersInExitBlock (NFC).
The argument is no longer used, remove it.
2023-08-17 10:36:12 +01:00
Mel Chen
463e7cb892 [LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc
This commit refactors the implementation of VPReductionRecipe to use
reference instead of pointer for member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158058
2023-08-16 19:37:49 -07:00
Florian Hahn
00bc500830
[VPlan] Store FPBinOp directly in VPDerivedIVRecipe (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c:
Store operator only for floating point inductions (i.e. the binary op is
a FPMathOperator).
2023-08-14 21:45:19 +01:00
Florian Hahn
aacaf3d580
[VPlan] Simplify VPDerivedIV truncation handling (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
2023-08-14 17:33:10 +01:00
Florian Hahn
d32e68ae53
[docs] Graduate VectorizationPlan.rst from proposal.
VPlan has become an integral part of the inner loop vectorizer pipeline
that has been actively developed over the previous years. Let's move
VectorizationPlan.rst from the proposal stage to bring the docs in line
and to avoid confusion when reading the docs.

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D157593
2023-08-10 17:15:43 +01:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Florian Hahn
8a56179bcd
[VPlan] Store induction kind & binop directly in VPDerviedIVRecipe(NFC)
Limit the information stored in VPDerivedIVRecipe to the ingredients
really needed.
2023-08-10 10:57:32 +01:00
Florian Hahn
e6d5dcf84c
[LV] Pass kind and induction binop to emitTransformedIndex (NFC).
Explicitly pass InductionKind and InductionBinOp to
emitTransformedIndex. Only those values are needed from the induction
descriptor. This makes explicit what is needed for the function and
allows future use cases where the a full induction descriptor object is
not available.
2023-08-10 10:35:42 +01:00
Florian Hahn
698ae66092
[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC).
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157144
2023-08-08 20:13:11 +01:00
Florian Hahn
af635a5547
[VPlan] Model wrap flags directly, remove *NUW opcodes (NFC)
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.

D157144 will build on this and also model FMFs for VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157194
2023-08-08 12:12:30 +01:00
Florian Hahn
aac8acb115
[VPlan] Model masked assumes as replicate recipes, drop them (NFCI).
Replace ConditionalAssume set by treating conditional assumes like other
predicated instructions (i.e. create a VPReplicateRecipe with a mask)
and later remove any assume recipes with masks during VPlan cleanup.

This reduces coupling of VPlan construction and Legal by removing a
shared set between the 2 and results in a cleaner code structure
overall.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157034
2023-08-06 20:56:24 +01:00
Florian Hahn
a6d6730709
[LV] Split off code to optimize initial VPlan (NFC).
Split up tryToBuildVPlanWithVPRecipes into intial plan creation and
optimizations, by introducing a VPLanTransform::optimize helper.

Depends on D154640.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154644
2023-08-04 13:21:20 +01:00
Florian Hahn
39cf210450
[LV] Remove unnecessary std::move from tryToBuildVPlanWith.. (NFC).
Split off D154644.
2023-08-04 11:56:05 +01:00
Florian Hahn
c30099ef0b
[LV] Return null VPlanPtr instead of std::optional for tryToBuild (NFC)
Cleanup in preparation for D154644. This was suggested earlier and helps
to simplify the code with D154644.
2023-08-04 11:48:24 +01:00
Florian Hahn
deec9e7674
[VPlan] Move VPTransformState::get() to VPlan.cpp (NFC).
The last dependency of code defined in LoopVectorize.cpp has been
removed a while ago. Move VPTransformState::get() to VPlan.cpp where
other members are also defined.
2023-08-03 21:49:58 +01:00
Mel Chen
425e9e81a0 [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC)
Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D155786
2023-08-03 00:37:19 -07:00
Mel Chen
97cccdd9f3 [LV][NFC] Remove the redundant braces. 2023-08-02 20:45:04 -07:00
Florian Hahn
8ea274b46b
[VPlan] Fix in-loop reduction chains using VPlan def-use chains (NFCI)
Update adjustRecipesForReductions to directly use the VPlan def-use
chains for in-loop reductions to collect the reduction operations that
need adjusting.

This allows the removal of
 * ReductionChainMap
 * recording of recipes for instruction in the reduction chain
 * removes late uses of getVPValue
 * removes to need for removeVPValueFor.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D155845
2023-08-02 17:04:29 +01:00
Florian Hahn
d1d0e135a1
[LV] Move packScalarIntoVectorValue to VPTransformState (NFC).
This moves packScalarIntoVectorValue from ILV to the more approriate
VPTransformState.
2023-08-02 12:36:48 +01:00
Bjorn Pettersson
408cc94445 [LV][LSV][SLP] Drop some typed pointer bitcasts
Differential Revision: https://reviews.llvm.org/D156736
2023-08-02 12:08:37 +02:00
Florian Hahn
707359ecf5
Recommit "[LV] Re-use existing broadcast value for live-ins."
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.

Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
2023-08-01 15:54:02 +01:00
Florian Hahn
822c749aec
[LV] Shrink operands before creating new instr to force eval order.
Shrink operands before creating the new instruction to make sure the
same evaluation order is used on all platforms. This fixes buildbot
failures due to different argument evaluation order on different
systems.
2023-07-30 17:16:37 +01:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648ce73507f6f85c395de978af659d498.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Florian Hahn
eea9258648
[LV] Re-use existing broadcast value for live-ins.
When requesting a vector value for a live-in, we can re-use the
broadcast of the live-in of part 0 for parts > 0.
2023-07-24 11:50:47 +01:00
Florian Hahn
25d34215bb
[LV] Replace use of getMaxSafeDepDist with isSafeForAnyVector (NFC)
Replace the use of getMaxSafeDepDistBytes with the more direct
isSafeForAnyVector. This removes the need to define getMaxSafeDepDistBytes.
2023-07-21 22:05:50 +02:00
Florian Hahn
68746a8cea
[LV] Move all VPlan transforms after initial VPlan construction.
Reorder VPlan transforms slightly so they are all grouped together,
after disabling Value -> VPValue lookup. In terms of codegen impact,
this should be NFC modulo a small number of instruction reorderings.

Preparation to split up tryToBuildVPlanWithVPRecipes in a follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154640
2023-07-18 10:53:30 +01:00
Nikita Popov
94abecca6b [IVDescriptors] Remove typed pointer support (NFC)
This also removes the element type from the descriptor, as it is
always i8. The meaning of the step is now the same between
integers and pointers.
2023-07-12 15:48:29 +02:00
Florian Hahn
9259f41e62
[VPlan] Clear reduction flags directly as VPlanTransform.
After D150027, all relevant recipes should model their IR flags
directly. Instead of removing the flags after codegen as part of
fixReductions, drop poison generating flags directly from the recipes.

Depends on D150027.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150028
2023-07-09 21:11:51 +01:00
Florian Hahn
14ec3f4b06
[LV] Skip VFs > # iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 21:43:51 +01:00
Florian Hahn
aee851fd0e
Revert "[LV] Skip VFs < iterations remaining for epilogue vectorization."
This reverts commit 7cc0be01a0068946ea3613dc2cb45c81b0f45860.

The title of the commit is incorrect, revert to fix the commit message.
2023-07-07 21:41:24 +01:00
Florian Hahn
7cc0be01a0
[LV] Skip VFs < iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 20:33:42 +01:00