845 Commits

Author SHA1 Message Date
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
Simon Pilgrim
f09f9bc0aa [X86] Add TODO to remove getGSScalarCost and use BaseT / getCommonMaskedMemoryOpCost directly
There are only a few differences in the use of AddressSpace and getScalarizationOverhead that need to be handled.
2024-04-05 13:16:27 +01:00
Simon Pilgrim
1b4c37fec2 [TTI][X86] getGSVectorCost/getGSScalarCost - add CostKind to the function arguments.
Initial refactor - only getGSScalarCost can actually use CostKind so far, and currently both are only ever set to TCK_RecipThroughput.
2024-04-05 11:15:46 +01:00
Simon Pilgrim
ed41249498 [CostModel][X86] Update AVX1 sext v4i1 -> v4i64 cost based off worst case llvm-mca numbers
We were using raw instruction count which overestimated the costs for #67803
2024-04-04 17:17:55 +01:00
Simon Pilgrim
3871eaba6b [CostModel][X86] Update AVX1 sext v8i1 -> v8i32 cost based off worst case llvm-mca numbers
We were using raw instruction count which overestimated the costs for #67803
2024-04-04 12:26:35 +01:00
Il-Capitano
308ed0233a
[Intrinsics] Make patchpoint.i64 generic on its return type (#85911)
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
2024-03-26 19:08:52 +05:30
Simon Pilgrim
ee5e027cc6 [X86] getShuffleCost - recognise concat_vector(X,Y) shuffle as InsertSubvector instead of PermuteTwoSrc
We don't have a concat_vector shuffle kind and improveShuffleKindFromMask won't alter the base type to match it as InsertSubvector.

But since this is how X86 will lower concat_vector anyhow, just recognise it explicitly.

Another step for #67803
2024-03-21 09:29:39 +00:00
Kolya Panchenko
889d99a50f
[TTI] Add alignment argument to TTI for compress/expand support (#83516)
Since `llvm.compressstore` and `llvm.expandload` do require memory
access, it's essential for some target to check if alignment is good to
be able to lower them to target-specific instructions
2024-03-05 20:33:56 -05:00
Nikita Popov
e84182af91
[X86][Inline] Skip inline asm in inlining target feature check (#83820)
When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.

Fixes https://github.com/llvm/llvm-project/issues/67054.
2024-03-05 14:21:33 +01:00
Simon Pilgrim
9978f6a10f [CostModel][X86] Reduce the extra costs for ICMP complex predicates when an operand is constant
In most cases, SETCC lowering will be able to simplify/commute the comparison by adjusting the constant.

TODO: We still need to adjust ExtraCost based on CostKind

Fixes #80122
2024-02-21 16:19:39 +00:00
Simon Pilgrim
a0869b14cd [CostModel][X86] Fix expanded CTPOP i8 costs
Updated to match #79989 / 9410019ac977141bc73aee19690b5896ded59219
2024-02-21 14:54:50 +00:00
Alexey Bataev
7bc079c852
[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for
extract subvector.

Many targets do not have cost for extractsubvector shuffle kind, but
have the costs for single source permute. If there are no costs
estimation for extractsubvector, better to switchto single source
permute for better cost estimation.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
2024-02-12 07:09:49 -05:00
HaohaiWen
4147b72301
[CostModel][X86] Fix fpext conversion cost for 16 elements (#76278)
The fpext conversion cost for 16 elements should be 4 from Znver4.
2024-01-09 09:05:11 +08:00
Simon Pilgrim
5842dfe34d [CostModel][X86] Update SSSE3/AVX1 BSWAP costs
Drop atom/slm costs from the default bswap costs, and update the avx1 latency costs based off latest codegen.

Based off analysis report from https://github.com/RKSimon/llvm-scripts/check_cost_tables.py

Fixes #62659
2024-01-02 12:06:48 +00:00
Alexey Bataev
5096501082 [SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)
SLP/TTI do not know about the cost estimation for addsub pattern,
supported by X86. Previously the support for pattern detection was added
(seeTTI::isLegalAltInstr), but the cost still did not estimated
properly.
2023-12-28 05:04:04 -08:00
Douglas Yung
fb981e6b4b Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)"
This reverts commit bc8c4bbd7973ab9527a78a20000aecde9bed652d.

Change is failing to build on several bots:
- https://lab.llvm.org/buildbot/#/builders/127/builds/60184
- https://lab.llvm.org/buildbot/#/builders/123/builds/23709
- https://lab.llvm.org/buildbot/#/builders/216/builds/32302
2023-12-27 23:52:04 -08:00
Alexey Bataev
bc8c4bbd79
[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)
SLP/TTI do not know about the cost estimation for addsub pattern,
supported by X86. Previously the support for pattern detection was added
(seeTTI::isLegalAltInstr), but the cost still did not estimated
properly.
2023-12-27 15:57:21 -05:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Fangrui Song
8e247b8f47 Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC 2023-10-27 00:30:41 -07:00
Phoebe Wang
58d4fe287e
[X86][EVEX512] Do not allow 512-bit memcpy without EVEX512 (#70420)
Solves crash mentioned in #65920.
2023-10-27 15:26:05 +08:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Simon Pilgrim
baecc9e997 [CostModel][X86] getShuffleCost - add fallback (to half vector) for bfloat vector shuffle costs
Add initial half/bfloat broadcast shuffles test coverage (more to follow)

Fixes #68117 - which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again......
2023-10-05 11:12:40 +01:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
Youngsuk Kim
e5026f0179 [llvm] Remove uses of Type::getPointerTo() (NFC)
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:

* Drop the call entirely if the sole purpose of it is to support a no-op
  bitcast (remove the no-op bitcast as well).

* Replace with `PointerType::get()`/`PointerType::getUnqual()`

This is a NFC cleanup effort.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D155232
2023-09-22 19:44:38 -04:00
Alexey Bataev
9a207578ac [TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().
It improves shuffle instructions estimation and improves vectorization
outcome.

Differential Revision: https://reviews.llvm.org/D157425
2023-08-18 13:47:01 -07:00
XinWang10
993bdb047c [X86]Support options -mno-gather -mno-scatter
Gather instructions could lead to security issues, details please refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html.
This supported options -mno-gather and -mno-scatter, which could avoid generating gather/scatter instructions in backend except using intrinsics or inline asms.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D157680
2023-08-17 23:02:25 -07:00
Simon Pilgrim
bbfdb8cc2d [CostModel][X86] Add scalar rotate-by-immediate costs
As noted on #63980 rotate by immediate amounts is much cheaper than variable amounts.

This still needs to be expanded to vector rotate cases, and we need to add reasonable funnel-shift costs as well (very tricky as there's a huge range in CPU behaviour for these).
2023-07-27 16:54:30 +01:00
Simon Pilgrim
9da119a6a6 [X86] getIntImmCostInst - avoid repeating getNumOperands() in for-loop (style). NFC. 2023-07-23 15:49:33 +01:00
Simon Pilgrim
1ebc965116 [X86] getIntImmCostInst - silence static analyzer overflow warning. NFCI.
Use the divideCeil uint64_t return type directly
2023-07-23 15:49:33 +01:00
Phoebe Wang
f11526b091 [X86][BF16] Do not scalarize masked load for BF16 when we have AVX512BF16
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155952
2023-07-22 18:16:49 +08:00
Phoebe Wang
fbae3d1d3c Revert "[X86][BF16] Do not scalarize masked load for BF16 when we have BWI"
This reverts commit ca1c05208ed35ba72869c65ad773b2cca4bbd360.

It caused Buildbot fail: https://lab.llvm.org/buildbot#builders/220/builds/24870
2023-07-21 23:29:11 +08:00
Phoebe Wang
ca1c05208e [X86][BF16] Do not scalarize masked load for BF16 when we have BWI
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155952
2023-07-21 23:18:54 +08:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the costmodelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compare and
selects. The getMinMaxReductionCost have changed to take a Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
2023-07-04 15:02:30 +01:00
Luke Lau
a68dcd09e8 [TTI] Use users of GEP to guess access type in getGEPCost
Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, a0(42)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, noticeably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Differential Revision: https://reviews.llvm.org/D149889
2023-06-29 13:44:37 +01:00
Simon Pilgrim
595a74391d [CostModel][X86] Tweak SSE2 v2i64 multiply costs based off D46276 script
It looks like we were trying to account for SLM costs, which are actually handled separately

Fixes #62969
2023-06-14 11:06:15 +01:00
Simon Pilgrim
e64f9140c5 [TTI][X86] Recognise PMULUDQ costs for vXi64 multiplies
Addresses part of Issue #62969 - if the upper 32-bits of the vXi64 elements are known to be zero, then a multiply simplifies to a single (fast) PMULUDQ instruction

We still have the problem that minRequiredElementSize can't determine that the upper bits are zero for the test case from Issue #62969 - I'll take a look at that next.
2023-06-14 10:34:02 +01:00
Luke Lau
c27a0b21c5 [SLP][RISCV] Account for offset folding in getPointersChainCost
For a GEP in a pointer chain, if:
1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and
   can be folded into the addressing mode of the using load/store

Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.

In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).

Also note that 2) is currently an assumption, and could be modelled more
accurately.

This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.

For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D149654
2023-05-22 13:55:30 +01:00
ManuelJBrito
d22edb9794 [IR][NFC] Change UndefMaskElem to PoisonMaskElem
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.

Differential Revision: https://reviews.llvm.org/D149256
2023-04-27 18:01:54 +01:00
Simon Pilgrim
aca5f9aeea [CostModel][X86] getMemoryOpCost - increase cost of sub-32-bit vector load/stores
For 8-bit/16-bit vector loads/stores we scalarize and transfer to/from the vector unit, or use the (usually slow) PINSR/PEXTR instructions.

Fixes #59867
2023-04-23 21:48:25 +01:00
Simon Pilgrim
fed28ada47 [CostModel][X86] Add i64 MUL latency/codesize/size-latency cost estimates 2023-04-21 15:55:22 +01:00
Simon Pilgrim
ceccc59aac [CostModel][X86] Add i32 MUL latency/codesize/size-latency cost estimates 2023-04-21 15:42:41 +01:00
Simon Pilgrim
3e9d046bfc [CostModel][X86] Improve i16 and vXi16 MUL costs
Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates
2023-04-21 15:42:40 +01:00
Simon Pilgrim
4060042384 [CostModel][X86] Improve i8 and vXi8 MUL costs
We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) which diverged from the costs from llvm-mcaonce we extended beyond legal types

Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates

Helps address some of the regressions identified in D148806
2023-04-20 19:38:51 +01:00