123 Commits

Author SHA1 Message Date
Niwin Anto
eaf0d82529
[LV] Disable fold tail by masking when IV is used outside (#81609)
When induction variable are used outside the loop body, tail folding
by masking mis-compiles, because for users outside of the loop the
final value of the induction is computed separately from the vector
loop.

Fixes https://github.com/llvm/llvm-project/issues/76069
Fixes https://github.com/llvm/llvm-project/issues/51677
2024-03-04 11:33:30 +00:00
Graham Hunter
b070629c10
[LV] Increase max VF if vectorized function variants exist (#66639)
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
2023-11-13 10:27:10 +00:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Florian Hahn
aac8acb115
[VPlan] Model masked assumes as replicate recipes, drop them (NFCI).
Replace ConditionalAssume set by treating conditional assumes like other
predicated instructions (i.e. create a VPReplicateRecipe with a mask)
and later remove any assume recipes with masks during VPlan cleanup.

This reduces coupling of VPlan construction and Legal by removing a
shared set between the 2 and results in a cleaner code structure
overall.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157034
2023-08-06 20:56:24 +01:00
Florian Hahn
e48b1e87a3
[LV] Split off invariance check from isUniform (NFCI).
After 572cfa3fde5433, isUniform now checks VF based uniformity instead of
just invariance as before.

As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.

This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccesAnalysis doesn't use the more
general isUniform.
2023-06-01 19:09:11 +01:00
Florian Hahn
572cfa3fde
[LV] Use SCEV for uniformity analysis across VF
This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D148841
2023-05-31 16:01:00 +01:00
Philip Reames
e41dce4d49 [LAA/LV] Simplify stride speculation logic [NFC] (try 2)
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert.  Specifically, I'd failed to strip off the integer cast off the SCEV before saving it in the map.  The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown.  The only case I can think of where that might change behavior would be a sext(i1 load).  I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.

Original commit message follows..

The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.

We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.

Differential Revision: https://reviews.llvm.org/D147750
2023-05-11 10:19:23 -07:00
Philip Reames
dc0d00c5fc Revert "[LAA/LV] Simplify stride speculation logic [NFC]"
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a.  Running this through broader testing after rebasing is revealing a crash.  Reverting while I investigate.
2023-05-11 09:26:35 -07:00
Philip Reames
d5b8401312 [LAA/LV] Simplify stride speculation logic [NFC]
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.

We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.

Differential Revision: https://reviews.llvm.org/D147750
2023-05-11 08:32:56 -07:00
Florian Hahn
6b8d19d2b5
Recommit "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.

The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.

Original commit message:
    Building on D142885 and D142589, retire the SinkAfter map from the
    recurrence handling code. It is replaced by checking whether it is
    possible to sink all users of a recurrence directly in VPlan. This
    results in simpler code overall and allows to handle additional cases
    (see the improvements in @test_crash).

    Depends on D142885.
    Depends on D142589.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D142886
2023-04-20 09:31:16 +01:00
Manoj Gupta
3d8ed8b519 Revert "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts commit 7fc0b3049df532fce726d1ff6869a9f6e3183780.

Causes a clang hang when building xz utils, github issue #62187.
2023-04-17 12:19:36 -07:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Philip Reames
92aae9e725 [LV] Remove a cover function with a single use [nfc]
And more importantly, move the fixme to the sole caller where it actually makes sense in context.
2023-04-06 08:27:57 -07:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bailout on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there's multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization, now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00
Kazu Hirata
c8f9555c4d [Transforms] Use *{Set,Map}::contains (NFC) 2023-03-14 00:24:30 -07:00
Graham Hunter
a180344589 [LV] Allow scalarization of function calls when masking is required
This patch adds support for scalarizing calls to a function when
there is a vector variant that cannot be used, either because there
isn't a masked variant or because the cost model indicated a VF
without a masked variant was better.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D134422
2023-03-03 15:26:04 +00:00
Florian Hahn
825e16969e
[LAA] Pass LoopAccessInfoManager instead of GetLAA function.
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.

Depends on D134608.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134609
2022-10-04 11:51:25 +01:00
Igor Kirillov
2d60d7ba1a [LoopVectorize][Fix] Crash when invariant store address is calculated inside loop
Fixes #57572

Generally LICM pass is responsible for sinking out code that calculates
invariant address inside loop as it only needed to be calculated once.
But in rare case it does not happen we will not be vectorizing the
loop.

Differential Revision: https://reviews.llvm.org/D133687
2022-09-28 10:33:50 +01:00
Philip Reames
f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Matt Arsenault
b609741958 LoopVectorize: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Kazu Hirata
56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Florian Hahn
b8709a9d03
[LV] Support fixed order recurrences.
If the incoming previous value of a fixed-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous, all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.

At the moment, fixed-order recurrences are modeled as a chain of
first-order recurrences.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D119661
2022-08-18 19:15:52 +01:00
Philip Reames
1436adae2c [LV-L] Add const and move method body out of line [nfc] 2022-08-18 11:10:19 -07:00
Philip Reames
c064d3f139 [LV] Use early continue to simplify code [nfc] 2022-08-18 10:31:55 -07:00
Kazu Hirata
50724716cd [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-14 12:51:58 -07:00
Nikita Popov
1ed8b29302 [LoopVectorizationLegality] Drop unused variable (NFC) 2022-07-06 10:43:39 +02:00
Nikita Popov
8ee913d83b [IR] Remove Constant::canTrap() (NFC)
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
2022-07-06 10:36:47 +02:00
Florian Hahn
644a965c1e
[LV] Vectorize cases with larger number of RT checks, execute only if profitable.
This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.

The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.

To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.

Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.

The performance impact on standard test-suites like SPEC2006/SPEC2006/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.

On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.

```
CFP2006/447.dealII/447.dealII    -1.9%
CINT2017rate/525.x264_r/525.x264_r    -2.2%
ASC_Sequoia/IRSmk/IRSmk       -9.2%
Rodinia/srad/srad     -36.1%
```

`size` regressions on AArch64 with -O3 are

```
MultiSource/Applications/hbd/hbd                 90256.00   106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000     240676.00   257268.00  6.9%
MultiSourc...enchmarks/mafft/pairlocalalign     472603.00   489131.00  3.5%
External/S...2017rate/525.x264_r/525.x264_r     613831.00   630343.00  2.7%
External/S...NT2006/464.h264ref/464.h264ref     818920.00   835448.00  2.0%
External/S...te/538.imagick_r/538.imagick_r    1994730.00  2027754.00  1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    1236471.00  1253015.00  1.3%
MultiSource/Applications/oggenc/oggenc         2108147.00  2124675.00  0.8%
External/S.../CFP2006/447.dealII/447.dealII    4742999.00  4759559.00  0.3%
External/S...rate/510.parest_r/510.parest_r   14206377.00 14239433.00  0.2%
```

Reviewed By: lebedev.ri, ebrevnov, dmgreen

Differential Revision: https://reviews.llvm.org/D109368
2022-07-04 15:11:39 +01:00
Malhar Jajoo
6bb40552f2 [LoopVectorize] Add support for invariant stores of ordered reductions
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D126772
2022-06-17 14:56:21 +01:00
Florian Hahn
d157019482
[VPlan] Remove unused native utilities incompatible with nested regions.
The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator
are all incompatible with modeling loops in VPlans as region without
explicit back-edges.

Those pieces are not actively used and only exercised by a few gtest
unit tests. They are at the moment blocking progress towards unifying
the native and inner-loop vectorizer paths in D121624 and D123005.

I think we should not block forward progress on unused pieces of code,
so this patch removes the utilities for now. The plan is to re-introduce
them as needed in a way that is compatible with the unified VPlan scheme
used in both the inner loop vectorizer and the native path.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D123017
2022-06-01 09:32:59 +01:00
Igor Kirillov
4e5e042d9a [LoopVectorize] Support reductions that store intermediary result
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.

Ordered fadd reductions are not yet supported.

Differential Revision: https://reviews.llvm.org/D110235
2022-05-03 10:12:30 +01:00
David Green
6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
David Green
9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Florian Hahn
46432a0088
[VPlan] Add VPWidenPointerInductionRecipe.
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.

Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121615
2022-03-24 14:58:45 +00:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Philip Reames
5ba115031d [PSE] Remove assumption that top level predicate is union from public interface [NFC*]
Note that this doesn't actually cause the top level predicate to become a non-union just yet.

The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
2022-02-10 16:14:52 -08:00
Paul Walker
7c68ed8892 [SVE] Reintroduce -scalable-vectorization=preferred as an alias to "on".
Some buildbots still rely on the experimental flag, so let's keep
it until everything has been migrated to the new "on by default"
state.
2021-12-21 12:54:04 +00:00
Sander de Smalen
b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Florian Hahn
978883d254
[VPlan] Add InductionDescriptor to VPWidenIntOrFpInduction. (NFC)
This allows easier access to the induction descriptor from VPlan,
without needing to go through Legal. VPReductionPHIRecipe already
contains a RecurrenceDescriptor in a similar fashion.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115111
2021-12-10 09:55:09 +00:00
Florian Hahn
d74a8a78ad
[LV] Mark various functions as const (NFC).
Make sure various accessors do not modify any state, in preparation for
D115111.
2021-12-09 10:51:29 +00:00
Kazu Hirata
4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Nikita Popov
45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Philip Reames
95346ba877 [LV] Enable vectorization of multiple exit loops w/computable exit counts
This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzeable and to dominate the latch.

The majority of work to support this was done in a set of previous patches. In particular,, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.

Differential Revision: https://reviews.llvm.org/D105817
2021-07-15 08:53:51 -07:00
Paul Walker
287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Simon Pilgrim
5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Joachim Meyer
4f01122c3f [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Kerry McLaughlin
9f76a85260 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Sander de Smalen
4f86aa650c [LV] Add -scalable-vectorization=<option> flag.
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.

Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.

The possible options are:
- Disabled:   Disables vectorization with scalable vectors.
- Enabled:    Vectorize loops using scalable vectors or fixed-width
              vectors, but favors fixed-width vectors when the cost
              is a tie.
- Preferred:  Like 'Enabled', but favoring scalable vectors when the
              cost-model is inconclusive.

Reviewed By: paulwalker-arm, vkmr

Differential Revision: https://reviews.llvm.org/D101945
2021-05-19 10:40:56 +01:00
Sander de Smalen
f82966d19a [LoopVectorizationLegality] NFC: Mark some interfaces as 'const'
This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired
and getSymbolicStrides as 'const'.
2021-05-14 11:53:54 +01:00