885 Commits

Author SHA1 Message Date
Luke Lau
b02b1e0ed6 [LV][NFC] Use ElementCount for getMaxInterleaveFactor
In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor.
This is based off of the approach used here: 8d36708507

The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch.
See https://reviews.llvm.org/D143723#4132349

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144474
2023-02-22 10:15:05 +00:00
Kazu Hirata
a7baaab952 Use APInt::isZero instead of APInt::isNulLValue (NFC)
Note that APInt::isNullValue has been soft-deprecated in favor of
APInt::isZero.
2023-02-19 22:23:58 -08:00
Kazu Hirata
cbde2124f1 Use APInt::popcount instead of APInt::countPopulation (NFC)
This is for consistency with the C++20-style bit manipulation
functions in <bit>.
2023-02-19 11:29:12 -08:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Alexey Bataev
f698c21345 [X86][NFC]Move and rephrase the comment, NFC 2023-01-10 04:35:11 -08:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
Roman Lebedev
64d46e141c
[NFC][Costmodel][X86] Replication shuffle: AVX512F can promote i1 to i32.
As the added codegen test coverage shows,
there isn't that much difference between AVX512DQI and
baseline AVX512F codegen, DQI added `vpmovm2d`/`vpmovd2m`,
but with just the Foundation we can use `vpternlogd`/`vptestmd`
to do the same.
2022-12-13 21:21:07 +03:00
Roman Lebedev
ff5fcda430
[x86][Costmodel] AVX512VL: add missing costs for v8 i1<->i32 casts
This would come up as a regression in the follow-up Replication-of-i1 patch.

https://godbolt.org/z/fxr9Mzssr
2022-12-13 21:21:07 +03:00
Simon Pilgrim
f6a96bee51 [X86] X86TTIImpl::getIntImmCost - use APInt::isInt/isSignedInt directly
Avoid some getSExtValue()/getZExtValue() calls

Hopefully we can remove some of the getBitWidth() constraints as well, as many are just there as a proxy for legal types (albeit assuming x86_64).
2022-12-12 15:32:49 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Krzysztof Parzyszek
86fe4dfdb6 TargetTransformInfo: convert Optional to std::optional
Recommit: added missing "#include <cstdint>".
2022-12-02 11:42:15 -08:00
Krzysztof Parzyszek
4e12d1836a Revert "TargetTransformInfo: convert Optional to std::optional"
This reverts commit b83711248cb12639e7ef7303cfbb4452b4067e85.

Some buildbots are failing.
2022-12-02 11:34:04 -08:00
Krzysztof Parzyszek
b83711248c TargetTransformInfo: convert Optional to std::optional 2022-12-02 11:27:12 -08:00
Haohai Wen
1215e86a0e [CostModel][X86] Fix permute latency cost
Avx512 permute latency should be 3 instead of 1.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138427
2022-11-23 19:17:16 +08:00
Roman Lebedev
8e37b53360
[X86] Rewrite getScalarizationOverhead()
All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on an 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
  // Note that in general, the insertion starting at the beginning of a vector
  // isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as i can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF at one case i looked at,
but the subvector insertion cost is still dis-advising that.

The change for `Extract` is NFC, and is for consistency only,
i wanted to get rid of of that weird explicit discounting of insertion of 0'th element,
since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913
2022-11-15 21:07:12 +03:00
Simon Pilgrim
196f27bb56 [CostModel][X86] Add missing cost kinds for v2i64 icmp on SLM 2022-09-25 15:12:21 +01:00
Simon Pilgrim
faff990e9b [X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case
Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only

Confirmed with Intel SOM, Agner and instlatx64
2022-09-25 14:18:08 +01:00
Simon Pilgrim
a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Simon Pilgrim
98907f8685 [CostModel][X86] Tidyup sdiv/srem/udiv/urem by constant cost tables
Preparation for adding cost kinds handling

This is necessary to eventually unblock D111968
2022-09-22 20:46:33 +01:00
Simon Pilgrim
dc93202b44 [CostModel][X86] Remove duplicate ashr v4i64 cost table entry. NFCI. 2022-09-22 17:27:26 +01:00
Simon Pilgrim
e030be64d8 [CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates
This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad.

This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 11:24:11 +01:00
Simon Pilgrim
b2cd8118d0 [CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 10:19:23 +01:00
Simon Pilgrim
839ba13c3e [CostModel][X86] Add vbmi2 costs for funnelshift/rotate intrinsics
Add costs for the funnel shift instructions - fixes some discrepancies I was hitting with costs numbers from the 'cost-tables vs llvm-mca' script D103695
2022-09-21 13:48:22 +01:00
Simon Pilgrim
6b4d409f69 [CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 17:37:58 +01:00
Simon Pilgrim
135c9b2c4b [CostModel][X86] Add CostKinds handling for vector ctlz instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 16:44:09 +01:00
Simon Pilgrim
2538adde5c [CostModel][X86] Add CostKinds handling for cttz
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 15:57:03 +01:00
Simon Pilgrim
d90a42d64c [CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling
Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.
2022-09-19 14:06:33 +01:00
Simon Pilgrim
23cb1c42cd [CostModel][X86] Update throughput costs for CTLZ ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models)

Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first
2022-09-16 16:56:49 +01:00
Simon Pilgrim
f8fa04295f [CostModel][X86] Add CostKinds handling for vector integer comparisons
These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695 - the extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/1 now - the AVX512 are tricky as we still don't handle predicate results properly, so most of these were done by hand.
2022-09-16 13:03:41 +01:00
Simon Pilgrim
94620e4fc3 [CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts
These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 16:51:58 +01:00
Simon Pilgrim
0ec028fe10 [CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops
Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 14:05:30 +01:00
Simon Pilgrim
854a4595b6 [CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization
Corrects the shift by constant costs to better account for them being converted to multiples for lowering - which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).
2022-09-14 11:46:26 +01:00
Simon Pilgrim
40ab7875f8 [CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts
Fixes regression from a931dbfbd30754cf39897037a223eee60ae9e855
2022-09-14 11:18:23 +01:00
Matthias Gehre
c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Simon Pilgrim
20ad05f9b4 [CostModel][X86] Add CostKinds handling for abs ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695
2022-09-12 16:34:37 +01:00
Simon Pilgrim
bd0109f392 [CostModel][X86] Move AVX512/AVX2 uniform shift costs into the generic uniform cost tables
They shouldn't be happening after XOP shift costs - AVX2 shift supports takes preference over XOP for everything but vXi8 shifts - the improvement is pretty limited as it only affects bdver4 targets but it does help clean up a fraction of the messy shift cost logic....
2022-09-12 12:08:42 +01:00
Simon Pilgrim
a931dbfbd3 [CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table
We only need to handle the uniform cases early
2022-09-10 18:18:42 +01:00
Simon Pilgrim
10edf88458 [CostModel][X86] Update CTPOP costs
With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont
2022-09-10 17:57:20 +01:00
Simon Pilgrim
55b78e28d8 [CostModel][X86] Add missing i8 throughput cost 2022-09-09 10:58:51 +01:00
Simon Pilgrim
e74102a963 [CostModel][X86] Merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost
For the few non type based intrinsic cases we can just check for !isTypeBasedOnly() to access the args directly.

I don't think we have a need to keep getTypeBasedIntrinsicInstrCost in BasicTTIImpl.h any more and can do a similar merge there as well - but it's a messier refactor and will take a while.
2022-09-07 12:04:09 +01:00
Simon Pilgrim
648e182d92 [CostModel][X86] getIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers

This should allow us to merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost and remove all remaining references
2022-09-06 22:05:32 +01:00
Simon Pilgrim
10e0f3e948 [CostModel][X86] Add CostKinds handling for ctpop ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - its a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples.
2022-09-06 17:27:24 +01:00
Matthias Gehre
2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Simon Pilgrim
83552e8c72 [CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates
These require special handling to account for their expansion in lowering.

I'm trying very hard not to have to add predicate specific costs - but it might be inevitable.....
2022-09-06 12:05:22 +01:00
Simon Pilgrim
c1b5e36d74 [CostModel][X86] Add CostKinds handling for fcmp ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage.
2022-09-06 10:34:53 +01:00
Simon Pilgrim
8534f51474 [CostModel][X86] Add CostKinds handling for sqrt intrinsicc
This was achieved using the 'cost-tables vs llvm-mca' script from D103695

Some of the znver1/znver2 latency/throughput numbers were really weird (some copy+paste afaict) - I've used the numbers from the AMD SoG, which roughly match the 'worst case' range value from Agner
2022-09-04 18:39:21 +01:00
Simon Pilgrim
626a84db47 [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:59:08 +01:00
Simon Pilgrim
80d4b3a275 Revert rG06e73626cf0fc33b025a0f98f1eee4a302279982 "[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry"
Some arm buildbots are complaining about a phase ordering test failure in unsigned-multiply-overflow-check.ll - I guess this test needs making x86 specific first
2022-09-04 17:51:11 +01:00