2056 Commits

Author SHA1 Message Date
Florian Hahn
faa8f582b9
[VPlan] Add printing test with fast-math flags.
Add missing test coverage for D150029.
2023-05-09 22:43:03 +01:00
Noah Goldstein
7770b0abfd [KnownBits] Improve KnownBits::rem(X, Y) in cases where we can deduce low-bits of output
The first `cttz(Y)` bits in `X` are translated 1-1 in the output.

Alive2 Links:
    https://alive2.llvm.org/ce/z/Qc47p7
    https://alive2.llvm.org/ce/z/19ut5H

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149421
2023-05-07 19:11:53 -05:00
Florian Hahn
e3afe0b89d
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.

This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149081
2023-05-05 13:20:16 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Florian Hahn
79692750d2
[LV] Use VPValue for SCEV expansion in fixupIVUsers.
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.

 It also fixes a crash casued by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147963
2023-05-04 09:25:59 +01:00
Philip Reames
cb3cb417a0 [LV] Refresh some auto-gen tests to reduce diff [nfc] 2023-05-01 13:55:11 -07:00
Philip Reames
30cdb2ac7e [LAA] Add command line flag to disable unit stride speculation
This is purely so that we can expose and work through downstream codegen issues.  My intention is to see if we can get this disabled by default, but that requires fixing a bunch of downstream issues first.
2023-05-01 10:49:51 -07:00
Yingwei Zheng
6d667d4b26
[InstCombine] Combine const GEP chains
This patch reverts rGae739aefd7473517d3f08b5c8d08a66c7f469198 to address performance regressions reported by our [CI](https://github.com/dtcxzyw/llvm-ci/issues/137) after rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b.

For example:
```
define ptr @const_gep_chain(ptr %p, i64 %a) {
    %p1 = getelementptr inbounds i8, ptr %p, i64 %a
    %p2 = getelementptr inbounds i8, ptr %p1, i64 1
    %p3 = getelementptr inbounds i8, ptr %p2, i64 2
    %p4 = getelementptr inbounds i8, ptr %p3, i64 3
    ret ptr %p4
}
```
The last three GEPs will not be folded since rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b.

I think it is appropriate to remove this code because there is no compile-time regression reported in our benchmarks.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149240
2023-05-02 00:28:39 +08:00
Noah Goldstein
d840391401 [ValueTracking] Add logic for isKnownNonZero(smin/smax X, Y)
For `smin` if either `X` or `Y` is negative, the result is non-zero.
For `smax` if either `X` or `Y` is strictly positive, the result is
non-zero.

For both if `X != 0` and `Y != 0` the result is non-zero.

Alive2 Link:
    https://alive2.llvm.org/ce/z/7yvbgN
    https://alive2.llvm.org/ce/z/zizbvq

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149417
2023-04-30 10:06:46 -05:00
Noah Goldstein
883daa7ac4 [ValueTracking] Add logic for isKnownNonZero(umax X, Y)
`(umax X, Y) != 0` -> `X != 0 || Y != 0`

Alive2 Link:
    https://alive2.llvm.org/ce/z/_Z9AUT

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149415
2023-04-30 10:06:46 -05:00
Mel Chen
de01dba7f2 [LV] Add tests for integer min max with index reduction pattern. (NFC)
The test case for signed max with index, include strict and non-strict
max.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D146718
2023-04-28 05:13:23 -07:00
Florian Hahn
07e5f57df4
[LV] Add tests for #60831.
Also contains an extra test mentioned in D144434.
2023-04-28 10:42:01 +01:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Florian Hahn
883eb88cae
[LV] Add extra uniformity tests with LSHR and AND.
Extra tests for D148841 based on the tests added in
95539186c82604f783.
2023-04-26 19:51:35 +01:00
Nikita Popov
1745341296 [LoopVectorize] Preserve SCEV
As far as I can tell, LoopVectorize preserves SCEV, mainly by dint
of forgetting the loop being vectorized. We should mark it as
preserved in the pass manager.

This is a very small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D149147
2023-04-26 09:43:54 +02:00
Vasileios Porpodas
95539186c8 [LV][NFC] Precommit test for a follow-up patch that introduces uniformity for a specific VF.
Differential Revision: https://reviews.llvm.org/D147734
2023-04-25 14:12:22 -07:00
Paul Osmialowski
9cf1881f8f [SCEV] Do not plant SCEV checks unnecessarily
The vectorisation analysis collects strides for loop invariant
pointers, which is wrong because they are not strided. We don't
need to generate SCEV checks (which are costly performancewise)
for such pointers, we just need to do the appropriate aliasing
checks.

This patch fixes the problem by changing getStrideFromPointer()
to treat loop invariant pointers as having no stride.

Originally proposed by David Sherwood with further suggestions
from Florian Hahn.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D146958
2023-04-25 21:47:14 +01:00
Paul Osmialowski
20d0f80dd3 [test] A test case for D146958
This commit introduces a test for the change introduced by the
`[SCEV] Do not plant SCEV checks unnecessarily` commit,
D146958.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D146974
2023-04-25 21:36:20 +01:00
Nikita Popov
d9dcd52d16 [LoopVectorize] Convert test to opaque pointers (NFC) 2023-04-25 15:28:01 +02:00
David Green
1869a9c225 [LV] Use the known trip count when costing non-tail folded VFs
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.

This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.

Differential Revision: https://reviews.llvm.org/D147720
2023-04-24 22:02:30 +01:00
Jay Foad
593e25ffae [Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.

Differential Revision: https://reviews.llvm.org/D148905
2023-04-24 13:42:08 +01:00
Jay Foad
3237497d01 [Vectorize] Pre-commit tests for D148905
Differential Revision: https://reviews.llvm.org/D149050
2023-04-24 13:42:08 +01:00
Nikita Popov
d003c01c30 [LV][IndVars] Move test to correct directory and regenerate (NFC)
For some reason, an IndVarSimplify test was in the LoopVectorize
directory.
2023-04-21 18:03:41 +02:00
Florian Hahn
6b8d19d2b5
Recommit "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.

The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.

Original commit message:
    Building on D142885 and D142589, retire the SinkAfter map from the
    recurrence handling code. It is replaced by checking whether it is
    possible to sink all users of a recurrence directly in VPlan. This
    results in simpler code overall and allows to handle additional cases
    (see the improvements in @test_crash).

    Depends on D142885.
    Depends on D142589.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D142886
2023-04-20 09:31:16 +01:00
Nikita Popov
2ec1d0f427 [InstCombine] Don't reassociate GEPs for loop invariance
Since D146813, LICM will reassociate GEPs to expose hoisting
opportunities itself. Don't perform this transform in InstCombine,
where it is fragile because it depends on an optional LoopInfo
analysis.
2023-04-18 12:17:07 +02:00
Nikita Popov
39ca70392b [LV] Regenerate test checks (NFC) 2023-04-18 11:52:36 +02:00
Graham Hunter
d8c49d2ac9 [LV][AArch64] Autogenerate checks for scalable-strict-fadd.ll (NFC)
Precommit for D145163.
2023-04-18 10:25:05 +01:00
Manoj Gupta
3d8ed8b519 Revert "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts commit 7fc0b3049df532fce726d1ff6869a9f6e3183780.

Causes a clang hang when building xz utils, github issue #62187.
2023-04-17 12:19:36 -07:00
Florian Hahn
9679b63500
[LV] Precommit test for D147963.
Reduced test case for #58811.
2023-04-17 16:19:13 +01:00
Florian Hahn
f555fd5d83
[LV] Regenreate check lines fr pr33706.ll
This avoids conflicts when regenerating check lines.
2023-04-17 13:49:49 +01:00
Florian Hahn
b0da998494
[LV] Extend recurrence test coverage for sinking memory instructions.
Extra coverage for D143604, D143605.
2023-04-17 13:08:15 +01:00
Florian Hahn
02369b75fd
[VPlan] Mark recurrence recipes as not having side-effects.
Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI
recipes to mayHaveSideEffects. They both don't have side-effects.
2023-04-17 12:30:52 +01:00
Florian Hahn
a0d667b89b
[LV] Add users to recurrence tests to make sure they are not removable.
This ensures VPlan-based DCE won't be able to remove the unused
recurrences.

It also adds a dedicated new test (@unused_recurrence) where an unused
recurrence can be removed.
2023-04-17 11:56:56 +01:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account because we could be targeting a
CPU where vscale > 1, and hence the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
Florian Hahn
83ab5708d1
[LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Florian Hahn
cdb48eefd0
[LV] Add exit users for recurrences in test, fix names.
Add users of exit values for recurrences to make sure exit value
generation will be checked in a follow-up change.

Also adjusts/fixes naming in the test.
2023-04-13 20:23:56 +01:00
Simon Pilgrim
aa754f7e0f [IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.

This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.

Differential Revision: https://reviews.llvm.org/D148221
2023-04-13 16:40:43 +01:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Sjoerd Meijer
d827865e9f Recommit "[AArch64][TTI] Cost model FADD/FSUB/FNEG""
Fixed two test cases that relied on Asserts, and added a fallthrough
annotation to the switch case.
2023-04-11 12:48:15 +01:00
Florian Hahn
f9d0b35d22
[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence.
This was suggested as independent cleanup in D147472.

This removes a redundant runtime VF computation when using scalable
vectors.
2023-04-10 21:25:12 +01:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Florian Hahn
b0e118bd77
[LV] Update tests checking VPlans to use patterns for VPValues.
This makes the tests more robust to changes in value numbering for
VPValues.
2023-04-09 20:32:09 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixedVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
David Sherwood
9278dd7b2b [LoopVectorize] Fix zext/sext cost calculations when types are shrunk
In getInstructionCost if we know a zext/sext is going to be shrunk
we should only be changing the destination type, and leave the
source type unchanged. For example, we may change a zext from

  zext <16 x i8> %a to <16 x i32>

to

  zext <16 x i8> %a to <16 x i16>

However, we were previously calculating the cost for doing

  zext <16 x i16> %a to <16 x i16>

which is incorrect.

Differential Revision: https://reviews.llvm.org/D147152
2023-04-06 08:52:25 +00:00
Nikita Popov
53f7f85703 [LoopVectorize] Convert some tests to opaque pointers (NFC) 2023-04-06 09:38:47 +02:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bailout on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there's multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Philip Reames
37646a2c28 [RISCV] Account for LMUL in memory op costs
Generally, the cost of a memory op will scale with the number of vector registers accessed. Machines might exist which have a narrow memory access than vector register width, but machines with a wider memory access width than vector register width seem unlikely.

I noticed this because we were preferring wide loads + deinterleaves on examples where the cost of a short gather (actually a strided load) would be better. Touching 8 vector registers instead of doing a 4 element gather is not a good tradeoff.

Differential Revision: https://reviews.llvm.org/D147470
2023-04-05 07:58:56 -07:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization, now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00
Florian Hahn
b5925a22c9
[LV] Add uses of recurrences in exit blocks in some tests.
This preserves the spirit of the tests even if a follow-up changes only
generates exit values for recurrences if they are actually used.
2023-04-04 21:19:29 +01:00