447 Commits

Author SHA1 Message Date
pvanhout
ae77aceba5 [Analysis] Remove DA & LegacyDA
UniformityAnalysis offers all of the same features and much more; there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538

- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
  - Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.

Reviewed By: foad, sameerds

Differential Revision: https://reviews.llvm.org/D148116
2023-04-17 09:01:22 +02:00
Simon Pilgrim
fb8038db73 [TTI] getExtendedReductionCost - replace std::optional<FastMathFlags> args with FastMathFlags
Followup to D148149 where it was noticed that the std::optional wrapper wasn't helping with anything (we can just use an empty FastMathFlags()).
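
In practice this means callers that previously passed std::nullopt now just pass a default-constructed value. A minimal sketch, assuming the current parameter order of the TTI hook:

```cpp
// Before: TTI.getExtendedReductionCost(Opcode, IsUnsigned, ResTy, Ty,
//                                      std::nullopt, CostKind);
// After: an empty FastMathFlags() means "no fast-math flags".
InstructionCost Cost = TTI.getExtendedReductionCost(
    Opcode, IsUnsigned, ResTy, Ty, FastMathFlags(), CostKind);
```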
2023-04-13 11:26:28 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correctly recognize fast/nnan fmax/fmin reductions that can avoid NaN handling, which will in turn allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
Michael Liao
72fc08a541 [InstCombine] Teach alloca replacement to handle addrspacecast
- As the address space cast may not be valid on a specific target,
  `addrspacecast` is not handled when an `alloca` is able to be replaced
  with the source of memcpy/memmove. This patch addresses that by
  querying a target hook on whether that address space cast is valid.
  For example, on most GPU targets, the cast from a global pointer to a
  generic pointer is valid.
- If that cast is allowed (by querying `isValidAddrSpaceCast`), the
  replacement is enhanced to handle that `addrspacecast` as well (see the
  sketch below).
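
A minimal sketch of the query described above, assuming the hook lives on
TargetTransformInfo as `isValidAddrSpaceCast(FromAS, ToAS)`; the variable
names and surrounding InstCombine plumbing are illustrative only:

```cpp
// Only fold the alloca away if the target says the cast is valid,
// e.g. global -> generic on most GPU targets.
unsigned FromAS = SrcPtrTy->getAddressSpace();
unsigned ToAS = DestPtrTy->getAddressSpace();
if (FromAS != ToAS && !TTI.isValidAddrSpaceCast(FromAS, ToAS))
  return nullptr; // bail out: keep the alloca and the copy
```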

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D147025
2023-04-11 11:47:37 -04:00
David Sherwood
b4089cfa2f [NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface
Given just how many arguments we pass to
preferPredicateOverEpilogue, and considering this list may
grow over time, I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class, so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.
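
A sketch of the resulting shape (member names assumed from the description
above; the struct simply bundles what used to be separate arguments):

```cpp
struct TailFoldingInfo {
  TargetLibraryInfo *TLI;
  LoopVectorizationLegality *LVL;
  InterleavedAccessInfo *IAI;
};

// The TTI interface then shrinks to a single pointer argument:
bool preferPredicateOverEpilogue(TailFoldingInfo *TFI) const;
```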

Differential Revision: https://reviews.llvm.org/D146127
2023-04-04 14:00:49 +00:00
Wang, Xin10
0bf75d8541 Handle the unexpected inputs for pass HardwareLoops
In the HardwareLoops pass, the function TryConvertLoop can crash on
malformed inputs. There are three cases.
At line 342 the compiler reads HWLoopInfo.CountType, which is only set
when the Bitwidth argument is given; if it is not, we crash.
The function isHardwareLoopCandidate dereferences CountType too.
The function InsertLoopDec dereferences LoopDecrement.
All of these can lead to a crash. This patch adds conditions so that the
pass is skipped when it encounters such unexpected inputs.

Reviewed By: samparker, fhahn

Differential Revision: https://reviews.llvm.org/D146277
2023-03-30 10:42:19 +08:00
David Sherwood
1c4fedfa35 [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail
Currently in LoopVectorize we avoid tail-folding if we can
prove the trip count is always a multiple of the maximum
fixed-width VF. This works because we know the vectoriser
only ever chooses a VF that is a power of 2. However, if
we are also considering scalable VFs then we conservatively
bail out of the optimisation because we don't know the value
of vscale, which could be an odd or prime number, etc.

This patch tries to enable the same optimisation for scalable
VFs by asking if vscale is known to be a power of 2. If so,
we can then query the maximum value of vscale and use the same
logic as we do for fixed-width VFs. I've also added a new TTI
hook called isVScaleKnownToBeAPowerOfTwo that does the same
thing as the existing TargetLowering hook.
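
Roughly, the check becomes something like the following sketch (not the
actual vectoriser code; `getMaxVScale` stands in for however the maximum
vscale value is queried):

```cpp
// Can we prove the trip count is a multiple of the largest possible
// number of elements processed per vector iteration?
bool NoScalarTail = false;
if (TTI.isVScaleKnownToBeAPowerOfTwo()) {
  if (std::optional<unsigned> MaxVScale = TTI.getMaxVScale()) {
    // MaxVF and *MaxVScale are both powers of 2, so their product is too.
    uint64_t MaxElts = MaxVF.getKnownMinValue() * *MaxVScale;
    NoScalarTail = (TripCount % MaxElts) == 0;
  }
}
```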

Differential Revision: https://reviews.llvm.org/D146199
2023-03-27 08:34:30 +00:00
Valery N Dmitriev
f9b438b519 [SLP] Outline GEP chain cost modeling into new TTI interface - NFCI.
Cost modeling for GEPs should actually be target dependent but is currently
done inside SLP in a target-independent way.
Sinking it into TTI enables target dependent implementation.
This patch adds new TTI interface and implementation of the basic functionality
trying to retain existing cost modeling.
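
If memory serves, the interface added is getPointersChainCost; the
declaration below is only a sketch and the exact parameter list is an
assumption:

```cpp
// Cost of computing a chain of GEPs (Ptrs) relative to a common Base.
InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
                                     const Value *Base,
                                     const PointersChainInfo &Info,
                                     TargetCostKind CostKind);
```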

Differential Revision: https://reviews.llvm.org/D144770
2023-03-14 14:01:34 -07:00
Sander de Smalen
c41b41eb11 [LoopVectorize] Use overflow-check analysis to improve tail-folding.
This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.

When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D142894
2023-03-01 14:17:58 +00:00
Luke Lau
b02b1e0ed6 [LV][NFC] Use ElementCount for getMaxInterleaveFactor
In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor.
This is based off of the approach used here: 8d36708507

The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch.
See https://reviews.llvm.org/D143723#4132349
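
The follow-up override would then presumably look something like this
sketch (the RISCVTTIImpl specifics and the fixed-width factor are
placeholders, not the actual patch):

```cpp
unsigned RISCVTTIImpl::getMaxInterleaveFactor(ElementCount VF) {
  // Report a factor of 1 (i.e. no interleaving) for scalable VFs.
  if (VF.isScalable())
    return 1;
  return 2; // placeholder fixed-width factor
}
```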

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144474
2023-02-22 10:15:05 +00:00
Simon Tatham
a8cd35c3b7 [LowerTypeTests] Support generating Armv6-M jump tables. (reland)
[Originally committed as f6ddf7781471b71243fa3c3ae7c93073f95c7dff;
reverted in bbef38352fbade9e014ec97d5991da5dee306da7 due to test
breakage; now relanded with the Arm tests conditioned on
`arm-registered-target`]

The LowerTypeTests pass emits a jump table in the form of an
`inlineasm` IR node containing a string representation of some
assembly. It tests the target triple to see what architecture it
should be generating assembly for. But that's not good enough for
`Triple::thumb`, because the 32-bit PC-relative `b.w` branch
instruction isn't available in all supported architecture versions. In
particular, Armv6-M doesn't support that instruction (although the
similar Armv8-M Baseline does).

Most of this patch is concerned with working out whether the
compilation target is Armv6-M or not, which I'm doing by going through
all the functions in the module, retrieving a TargetTransformInfo for
each one, and querying it via a new method I've added to check its
SubtargetInfo. If any function's TTI indicates that it's targeting an
architecture supporting B.W, then we assume we're also allowed to use
B.W in the jump table.
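
The scan over the module might look roughly like this (a sketch; the TTI
method name matches my reading of the patch, but `GetTTI` and the rest of
the plumbing are assumptions):

```cpp
// If any function targets an architecture with Thumb-2 B.W, assume the
// jump table may use B.W as well.
bool CanUseThumbBWJumpTable = false;
for (Function &F : M) {
  if (F.isDeclaration())
    continue;
  if (GetTTI(F).hasArmWideBranch(/*Thumb=*/true)) {
    CanUseThumbBWJumpTable = true;
    break;
  }
}
```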

The Armv6-M compatible jump table format requires a temporary
register, and therefore also has to use the stack in order to restore
that register.

Another consequence of this change is that jump tables on Arm/Thumb
are no longer always the same size. In particular, on an architecture
that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb
tables are different sizes from each other. As a consequence,
``getJumpTableEntrySize`` can no longer base its answer on the target
triple's architecture: it has to take into account the decision that
``selectJumpTableArmEncoding`` made, which meant I had to move that
function to an earlier point in the code and store its answer in the
``LowerTypeTestsModule`` class.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D143576
2023-02-20 10:46:47 +00:00
Simon Tatham
bbef38352f Revert "[LowerTypeTests] Support generating Armv6-M jump tables."
This reverts commit f6ddf7781471b71243fa3c3ae7c93073f95c7dff.

Eight buildbots reported that the two test files changed by that
commit had started failing. The buildbots in question all had in
common that they build with a very restricted `LLVM_TARGETS_TO_BUILD`,
such as only X86 or AArch64 or Hexagon. I didn't notice this before
commit because my own build has the full default set of targets, and
in that circumstance, the tests pass.

I assume the problem has something to do with the attempt to query
TargetTransformInfo: if you can't make a valid TTI for the target
triple then you can't ask it what kind of inline assembler you should
be emitting, and so `opt` without the Arm backend can't get the Arm
cases of these tests right.

I don't have time to fix this until next week, so I'll revert the
change for now to keep the buildbots happy.
2023-02-16 17:11:06 +00:00
Simon Tatham
f6ddf77814 [LowerTypeTests] Support generating Armv6-M jump tables.
The LowerTypeTests pass emits a jump table in the form of an
`inlineasm` IR node containing a string representation of some
assembly. It tests the target triple to see what architecture it
should be generating assembly for. But that's not good enough for
`Triple::thumb`, because the 32-bit PC-relative `b.w` branch
instruction isn't available in all supported architecture versions. In
particular, Armv6-M doesn't support that instruction (although the
similar Armv8-M Baseline does).

Most of this patch is concerned with working out whether the
compilation target is Armv6-M or not, which I'm doing by going through
all the functions in the module, retrieving a TargetTransformInfo for
each one, and querying it via a new method I've added to check its
SubtargetInfo. If any function's TTI indicates that it's targeting an
architecture supporting B.W, then we assume we're also allowed to use
B.W in the jump table.

The Armv6-M compatible jump table format requires a temporary
register, and therefore also has to use the stack in order to restore
that register.

Another consequence of this change is that jump tables on Arm/Thumb
are no longer always the same size. In particular, on an architecture
that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb
tables are different sizes from each other. As a consequence,
``getJumpTableEntrySize`` can no longer base its answer on the target
triple's architecture: it has to take into account the decision that
``selectJumpTableArmEncoding`` made, which meant I had to move that
function to an earlier point in the code and store its answer in the
``LowerTypeTestsModule`` class.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D143576
2023-02-16 15:34:49 +00:00
Sander de Smalen
005311399e [LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style.
This NFC (intended) patch has several small changes:
* It renames PredicationStyle to TailFoldingStyle.
* It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle()
* Simplifies some of its uses in the LoopVectorizer

Rationale: To my surprise PredicationStyle::None did not mean 'no
predication', but rather 'no active lane mask intrinsic', such that the
predicate is created using a splat + compare with stepvector. The enum is
also highly specific to tail folding, so it seems better to name this
around that feature, i.e. 'tail folding style'.

This also makes it more amenable to extend it to other tail folding styles,
such as the one added in D142109.
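
For reference, the renamed enum looks roughly like this (a sketch; the
enumerator set shown is from memory and may not be exhaustive):

```cpp
enum class TailFoldingStyle {
  None,               // no lane mask: splat + compare with stepvector
  Data,               // active lane mask predicates the data lanes
  DataAndControlFlow, // the lane mask also drives the loop's control flow
};

// Targets now express a preference instead of a boolean:
TailFoldingStyle Style = TTI.getPreferredTailFoldingStyle();
```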

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D142887
2023-02-03 14:59:57 +00:00
Serguei Katkov
6188929dfb [TTI][NFC] Introduce option to set predictable branch threshold
Currently the TargetTransformInfo::getPredictableBranchThreshold() method
returns the hardcoded value 99. This value affects the decision whether to
convert select instruction to branch or not in several passes:
SelectOptimize, CodeGenPrepare, SimplifyCFG.

It would be useful to make it possible to play with that threshold in order
to test select-optimize heuristics.

The option was originally introduced in TargetLoweringBase, but was
removed in revision 664d0c052c315 and not restored in the TTI.
Patch Author: aleksandr.popov
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D143060
2023-02-02 17:54:56 +07:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
The same is true of getShuffleCost() with respect to
getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(),
getExtractSubvectorOverhead(), and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
David Green
16a72a0f87 [AArch64] Enable the select optimize pass for AArch64
This enables the select optimize pass for out-of-order AArch64
cores. It is trying to solve a problem that is difficult for the
compiler to fix.
branch depends heavily on whether the branch is well predicted and the
amount of ILP in the loop (as well as other criteria like the core in
question and the relative performance of the branch predictor).  The
pass seems to do a decent job though, with the inner loop heuristics
being well implemented and doing a better job than I had expected in
general, even without PGO information.

I've been doing quite a bit of benchmarking. The headline numbers are
these for SPEC2017 on a Neoverse N1:
  500.perlbench_r   -0.12%
  502.gcc_r         0.02%
  505.mcf_r         6.02%
  520.omnetpp_r     0.32%
  523.xalancbmk_r   0.20%
  525.x264_r        0.02%
  531.deepsjeng_r   0.00%
  541.leela_r       -0.09%
  548.exchange2_r   0.00%
  557.xz_r          -0.20%

Running benchmarks with a combination of the llvm-test-suite plus
several versions of SPEC gave between a 0.2% and 0.4% geomean
improvement depending on the core/run. The instruction count went down
by 0.1% too, which is a good sign, but the results can be a little
noisy.  Some issues from other benchmarks I had run were improved in
rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary, well-predicted
branches will see an improvement, badly predicted branches may get
worse, and on average performance seems to be a little better overall.

This patch enables the pass for AArch64 under -O3 for cores that will
benefit from it, i.e. not in-order cores that do not fit into the "Assume
infinite resources that allow to fully exploit the available
instruction-level parallelism" cost model. It uses a subtarget feature
for specifying when the pass will be enabled, which I have enabled under
cpu=generic as the performance increases for out-of-order cores seem
larger than any decreases for in-order, which were minor.

Differential Revision: https://reviews.llvm.org/D138990
2022-12-03 16:08:58 +00:00
Krzysztof Parzyszek
86fe4dfdb6 TargetTransformInfo: convert Optional to std::optional
Recommit: added missing "#include <cstdint>".
2022-12-02 11:42:15 -08:00
Krzysztof Parzyszek
4e12d1836a Revert "TargetTransformInfo: convert Optional to std::optional"
This reverts commit b83711248cb12639e7ef7303cfbb4452b4067e85.

Some buildbots are failing.
2022-12-02 11:34:04 -08:00
Krzysztof Parzyszek
b83711248c TargetTransformInfo: convert Optional to std::optional 2022-12-02 11:27:12 -08:00
Krzysztof Parzyszek
26424c96c0 Attributes: convert Optional to std::optional 2022-12-02 08:15:45 -06:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
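
In caller terms the change looks roughly like this (a sketch against the
TTI variant of the hook; the surrounding names are illustrative):

```cpp
// Was: bool Fast. Now: 0 means slow; larger values mean faster, on a
// target-defined scale (the direct translation uses 0 and 1).
unsigned Fast = 0;
bool Allowed = TTI.allowsMisalignedMemoryAccesses(Ctx, BitWidth, AddrSpace,
                                                  Alignment, &Fast);
bool FastEnough = Allowed && Fast != 0; // direct translation of old 'true'
```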

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Philip Reames
269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or leaves a huge remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
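
Going by the description, the hook and its RISC-V override are about this
simple (a sketch, assuming the hook is named preferEpilogueVectorization):

```cpp
// Default in TargetTransformInfo: keep the existing behaviour.
bool preferEpilogueVectorization() const { return true; }

// RISC-V opts out of epilogue vectorization entirely:
bool RISCVTTIImpl::preferEpilogueVectorization() const { return false; }
```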

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
Shubham Narlawar
b920407cf5 [LICM] Disable thread-safety checks in single-thread model
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.

Fixes https://github.com/llvm/llvm-project/issues/50537.

Differential Revision: https://reviews.llvm.org/D130466
2022-10-10 16:51:16 +02:00
Simon Pilgrim
c94cbc343e Fix gcc warning about ambiguous if-else chain
Fixes warnings introduced by D111968
2022-09-23 14:36:28 +01:00
Simon Pilgrim
a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul-by-constant cost models handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.
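
The enum after this patch plausibly reads as follows (the exact values are
assumptions):

```cpp
enum OperandValueProperties {
  OP_None = 0,
  OP_PowerOf2 = 1,
  OP_NegatedPowerOf2 = 2, // e.g. a multiply by -8 can lower to shift+negate
};
```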

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Philip Reames
8c46881a53 [TTI] Recognize fp constants in getOperandInfo
We were recognizing vectors of floats, but not scalars.  That's a tad odd.
2022-09-21 14:34:34 -07:00
Matthias Gehre
c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove the new-pass-manager version of ExpandLargeDivRem because there is
no way yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Matthias Gehre
2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem pass to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.
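
Using the hook name from the later move above, a backend's opt-in
presumably looked something like this sketch:

```cpp
// Expand any udiv/sdiv/urem/srem wider than 128 bits via ExpandLargeDivRem.
unsigned X86TTIImpl::maxLegalDivRemBitWidth() const { return 128; }
```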

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Simon Pilgrim
e2d140e9c3 [TTI] Add isExpensiveToSpeculativelyExecute wrapper
CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if it's better to move an expensive instruction used in a select behind a branch instead.

This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so it's compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding an isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.
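
Going by that description, the wrapper's default behaviour is essentially
the following (a sketch, not the verbatim implementation):

```cpp
bool TargetTransformInfo::isExpensiveToSpeculativelyExecute(
    const Instruction *I) const {
  // The same check CGP used to perform inline; now a target can override it.
  return getInstructionCost(I, TCK_SizeAndLatency) >= TCC_Expensive;
}
```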
2022-09-03 13:12:22 +01:00
Philip Reames
c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames
104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
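
The container class in question, sketched (the two members match the API
as described; the helper methods are illustrative):

```cpp
struct OperandValueInfo {
  OperandValueKind Kind = OK_AnyValue;
  OperandValueProperties Properties = OP_None;

  bool isConstant() const {
    return Kind == OK_UniformConstantValue || Kind == OK_NonUniformConstantValue;
  }
  bool isPowerOf2() const { return Properties == OP_PowerOf2; }
};
```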

This is the change which motivated the whole sequence which preceded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possibly affected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Philip Reames
27d3321c4f [TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc]
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through the TTI implementation) allows use of the same properties in store costing as in arithmetic costing.
2022-08-22 11:26:31 -07:00
Philip Reames
274f86e7a6 [TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc]
This completes the client side transition to the OperandValueInfo version of this routine.  Backend TTI implementations still use the prior versions for now.
2022-08-22 11:06:32 -07:00
Philip Reames
c42a5f1cc2 [TTI] Migrate getOperandInfo to OperandValueInfo [nfc]
This is part of merging OperandValueKind and OperandValueProperties.
2022-08-22 10:19:02 -07:00
Philip Reames
5cd427106d [TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc]
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling.  We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.

This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works.  Target TTI implementations still use the split flags.  I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
2022-08-22 09:48:15 -07:00
Ting Wang
d2d77e050b [PowerPC][Coroutines] Add tail-call check with call information for coroutines
Fixes #56679.

Reviewed By: ChuanqiXu, shchenz

Differential Revision: https://reviews.llvm.org/D131953
2022-08-21 22:20:40 -04:00
Simon Pilgrim
5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Philip Reames
b0a2c48e9f [tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc] 2022-08-19 16:22:22 -07:00
Alexey Bataev
d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added an OperandValueKind OpdInfo parameter to the getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Simon Pilgrim
fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency; it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Simon Pilgrim
1d522a39f7 [TTI] Remove getInstructionThroughput cost helper.
Pulled out of D79483 - we can just as easily use getUserCost directly
2022-08-17 11:41:47 +01:00
Dinar Temirbulatov
cab6cd6834 [AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding.
After D121595 was committed, I noticed regressions associated with
vectorisation of small trip-count loops by tail folding with scalable
vectors. As a solution for those issues I propose to introduce a minimal
trip count threshold value.

Differential Revision: https://reviews.llvm.org/D130755
2022-08-09 22:10:17 +01:00
Fangrui Song
7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Mingming Liu
bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) The overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
David Sherwood
4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
jacquesguan
e38af7ba95 [LV] Refactor getExtendedAddReductionCost to support extended reductions other than Add.
Now the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm it covers the relevant cases, but other targets, for example RISCV, support other kinds of extended reduction, such as FAdd.

This patch does the following changes:
1, Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles the extended reduction with an additional Opcode input, and getMulAccReductionCost, which handles the MLA cases of the old getExtendedAddReductionCost (both sketched below).
2, Refactor getReductionPatternCost, adding some constraint conditions to make sure getMulAccReductionCost only handles the reduction of Add + Mul.
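
Sketched declarations of the two new APIs (parameter lists are assumed;
note that at this point FMF was still std::optional, which a later commit
above removes):

```cpp
// Extended reduction: reduce ext(Ty) into ResTy with Opcode (Add, FAdd, ...).
InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,
                                         Type *ResTy, VectorType *Ty,
                                         std::optional<FastMathFlags> FMF,
                                         TargetCostKind CostKind);

// Multiply-accumulate reduction: reduce.add(ext(A) * ext(B)).
InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy,
                                       VectorType *Ty,
                                       TargetCostKind CostKind);
```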

Differential Revision: https://reviews.llvm.org/D130868
2022-08-02 16:02:38 +08:00
Stanislav Mekhanoshin
0562cf442f Allow data prefetch into non-default address space
I am playing with the LoopDataPrefetch pass and found out that it
bails out on a pointer in a non-zero address space. This
patch adds a target callback to check whether an address space is to
be considered for prefetching. The default implementation still only
allows address space 0, so this is NFCI.
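
If I recall the patch correctly, the callback is shouldPrefetchAddressSpace,
with a default along these lines (both the name and the shape are
assumptions):

```cpp
// Default: only the default address space (0) is considered for prefetching.
bool shouldPrefetchAddressSpace(unsigned AS) const { return AS == 0; }
```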

This does not currently affect any known targets, but seems to be
generally useful for the future.

Differential Revision: https://reviews.llvm.org/D129795
2022-07-27 10:01:26 -07:00
Malhar Jajoo
41958f76d8 [Costmodel] Add "type-based-intrinsic-cost" cli option
This patch adds a command line flag to make it possible to test
the type-based cost-model analysis for intrinsics.

Differential Revision: https://reviews.llvm.org/D129109
2022-07-22 15:50:57 +01:00