232 Commits

Author SHA1 Message Date
Shih-Po Hung
3d985a6f1b
[RISCV][TTI] Scale the cost of Select with LMUL (#88098)
Use the Val type to estimate the instruction cost for SelectInst.
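
The idea of scaling a cost with LMUL can be sketched as follows. This is only an illustration under simple assumptions (whole-register LMUL, a known VLEN); the helper names `registersNeeded` and `lmulScaledCost` are hypothetical and are not the TTI functions touched by this commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of whole vector registers (LMUL >= 1) that a
// vector of NumElts elements, each EltBits wide, occupies when one register
// holds VLenBits bits.
uint64_t registersNeeded(uint64_t NumElts, uint64_t EltBits,
                         uint64_t VLenBits) {
  uint64_t TotalBits = NumElts * EltBits;
  uint64_t Regs = (TotalBits + VLenBits - 1) / VLenBits; // round up
  return Regs == 0 ? 1 : Regs;
}

// Hypothetical cost: a per-register base cost multiplied by the register
// count, so wider types cost proportionally more.
uint64_t lmulScaledCost(uint64_t BaseCost, uint64_t NumElts, uint64_t EltBits,
                        uint64_t VLenBits) {
  return BaseCost * registersNeeded(NumElts, EltBits, VLenBits);
}
```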
2024-04-10 14:18:15 +08:00
Shih-Po Hung
ee52add6cb
[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931)
This patch uses the argument type to infer the LMUL cost for the index
generation, add, and comparison.
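
The three costed steps map onto how the mask is defined: lane `i` is active when `Base + i` is less than the trip count. Below is a scalar model of that, assuming the add does not overflow; `activeLaneMask` is a hypothetical name for this sketch, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of an active lane mask over VF lanes (illustration only).
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t N, size_t VF) {
  std::vector<bool> Mask(VF);
  for (size_t I = 0; I < VF; ++I) {
    uint64_t Step = I;            // index generation (step vector)
    uint64_t Index = Base + Step; // add the base index
    Mask[I] = Index < N;          // unsigned compare against the trip count
  }
  return Mask;
}
```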
2024-04-10 10:08:33 +08:00
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. Giving the
Loop Vectorizer a way to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions, lets these architectures make better use of their
predication capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlanTransforms transformation to replace the wide header
predicate compare with EVL and updates codegen for loads/stores to use VP
store/load with EVL.

Another important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.
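
A scalar model of an EVL-driven loop may help show why no tail mask is needed. In this sketch `getVectorLength` is a hypothetical stand-in that simply returns `min(Remaining, VF)`; a real target may legally choose a different value, and `VF` is assumed to be non-zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for get_vector_length (assumption: min of the
// remaining element count and the vectorization factor).
size_t getVectorLength(size_t Remaining, size_t VF) {
  return std::min(Remaining, VF);
}

// Adds 1 to every element, handling up to VF elements per iteration.
// The last iteration just runs with a smaller EVL instead of a mask.
// Returns the number of loop iterations executed.
size_t addOneWithEVL(std::vector<int> &Data, size_t VF) {
  size_t Iterations = 0;
  for (size_t I = 0; I < Data.size();) {
    size_t EVL = getVectorLength(Data.size() - I, VF);
    for (size_t Lane = 0; Lane < EVL; ++Lane) // models VP load, add, VP store
      Data[I + Lane] += 1;
    I += EVL;
    ++Iterations;
  }
  return Iterations;
}
```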

Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
Shih-Po Hung
97523e5321
[RISCV][TTI] Scale the cost of intrinsic stepvector with LMUL (#87301)
Use the return type to measure the LMUL size for latency/throughput cost
2024-04-04 08:30:15 +08:00
Shih-Po Hung
d7a43a00fe
[RISCV][TTI] Scale the cost of trunc/fptrunc/fpext with LMUL (#87101)
Use the destination data type to measure the LMUL size for
latency/throughput cost
2024-04-02 09:30:51 +08:00
Shih-Po Hung
84f24c2daf
[RISCV][TTI] Scale the cost of intrinsic umin/umax/smin/smax with LMUL (#87245)
Use the return type to measure the LMUL size for throughput/latency cost
2024-04-02 09:26:27 +08:00
Shih-Po Hung
c7954ca312
Recommit "[RISCV] Refine cost on Min/Max reduction (#79402)" (#86480)
This is recommitted as the test and fix for
llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697
2024-04-01 14:44:10 +08:00
ShihPo Hung
aa2d5d5413 Recommit "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)"
Changes in Recommit:
  Add an additional check on sign/zero extend to the same type.

Original message:
  Use the destination data type to measure the LMUL size for
  latency/throughput cost
2024-03-26 23:41:16 -07:00
Jianjian Guan
05a7b22a01
[RISCV] Add areInlineCompatible for riscv target (#86639)
Inline a callee if its target-features are a subset of the caller's
target-features.
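
The subset test can be sketched with plain bit sets. The `FeatureBits` alias, its width, and `isFeatureSubset` are assumptions made for this illustration, not the types used by the patch.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical feature set: one bit per target feature.
using FeatureBits = std::bitset<64>;

// True when every feature enabled for the callee is also enabled for the
// caller, i.e. the callee's features are a subset of the caller's.
bool isFeatureSubset(const FeatureBits &Callee, const FeatureBits &Caller) {
  return (Callee & Caller) == Callee;
}
```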
2024-03-27 14:16:03 +08:00
ShihPo Hung
da3e58e74a Revert "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)"
This reverts commit 7545c635729a2055a429c5decd26a619a8d6e74b as it's
failing on the Linux bots.
2024-03-26 21:47:32 -07:00
Shih-Po Hung
7545c63572
[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)
Use the destination data type to measure the LMUL size for
latency/throughput cost
2024-03-27 10:58:17 +08:00
Craig Topper
2fbc40d36d [RISCV] Split compound if statement to fix a crash.
We're not allowed to call getELEN when the vector extension
is not enabled. If we're looking at a vector type, isTypeLegal would
only return true if the vector extensions are enabled. So early out
for non-vector types before we call isTypeLegal and getELEN.
2024-03-26 11:53:17 -07:00
ShihPo Hung
5dc0c75aab [RISCV][TTI] Fix missing return in the end of function 2024-03-25 23:32:18 -07:00
Shih-Po Hung
817f453aa5
[RISCV][TTI] Refactor getCastInstrCost to exit early (#86619)
To reduce the indentation by using early returns, this patch hoists the
returns for illegal types and non-vector types earlier.

It should mostly be an NFC.
2024-03-26 14:15:40 +08:00
Shih-Po Hung
3cb024198f
[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum (#80697)
The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs
if any element of the vector is a NaN.
Following #79402, the patch adds the cost for NaN check (vmfne + vcpop)
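
A scalar model of the NaN-propagating reduction shows where the extra check comes from. This is a sketch only: `reduceFMaximum` is a hypothetical name, and the ordering of signed zeros is ignored here.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Scalar model of a NaN-propagating maximum reduction (illustration only).
double reduceFMaximum(const std::vector<double> &V) {
  bool AnyNaN = false;
  double Max = -std::numeric_limits<double>::infinity();
  for (double X : V) {
    AnyNaN |= std::isnan(X); // models the NaN check (x != x, then count)
    if (X > Max)             // this comparison is false when X is NaN
      Max = X;
  }
  return AnyNaN ? std::numeric_limits<double>::quiet_NaN() : Max;
}
```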
2024-03-25 17:17:36 +08:00
Kolya Panchenko
aa68e2814d
[RISCV] Support llvm.masked.compressstore intrinsic (#83457)
The changeset enables lowering of `llvm.masked.compressstore(%data,
%ptr, %mask)` for fixed vector types on RVV into:
```
%0 = vcompress %data, %mask, %vl
%new_vl = vcpop %mask, %vl
vse %0, %ptr, %1, %new_vl
```
Such lowering is only possible when `%data` fits into the available LMULs;
otherwise `llvm.masked.compressstore` is scalarized by the
`ScalarizeMaskedMemIntrin` pass.
Even though section `15.8` of the RVV spec provides an alternative sequence
for compressstore, `vcompress + vcpop` should be a proper
canonical form to lower `llvm.masked.compressstore`. If the RISC-V target
finds the sequence from `15.8` better, a peephole optimization can
transform `vcompress + vcpop` into that sequence.
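
The effect of the three-step sequence can be modeled in scalar code. This is a sketch, not the lowering itself: `compressStore` is a hypothetical helper, and `Mask` is assumed to have at least as many entries as `Data`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of compressstore: active elements are written contiguously
// starting at Ptr. Returns the number of elements stored.
size_t compressStore(const std::vector<int> &Data,
                     const std::vector<bool> &Mask, int *Ptr) {
  std::vector<int> Packed; // models vcompress: pack the active elements
  size_t NewVL = 0;        // models vcpop: count the active lanes
  for (size_t I = 0; I < Data.size(); ++I) {
    if (Mask[I]) {
      Packed.push_back(Data[I]);
      ++NewVL;
    }
  }
  for (size_t I = 0; I < NewVL; ++I) // models the store with the new VL
    Ptr[I] = Packed[I];
  return NewVL;
}
```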
2024-03-13 15:18:51 -04:00
Visoiu Mistrih Francis
eceb24c439
[RISCV] Hoist immediate addresses from loads/stores (#83644)
In case of loads/stores from an immediate address, avoid rematerializing
the constant for every block and allow consthoist to hoist it to the
entry block.
2024-03-05 22:41:56 -08:00
Shih-Po Hung
fb67dce1cb
[RISCV] Fix crash when unrolling loop containing vector instructions (#83384)
When MVT is not a vector type, TCK_CodeSize should return an invalid
cost. This patch adds a check in the beginning to make sure all cost
kinds return invalid costs consistently.

Before this patch, TCK_CodeSize returned a valid cost on scalar MVT but
other cost kinds didn't.

This fixes issue #83294, where a loop contains vector instructions
and MVT is scalar after type legalization when the vector extension is
not enabled.
2024-03-02 12:33:55 +08:00
Shih-Po Hung
6ee9c8afbc
[RISCV][CostModel] Updates reduction and shuffle cost (#77342)
- Make `andi` cost 1 in SK_Broadcast
- Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL
2024-02-29 15:41:19 +08:00
Philip Reames
f037e709ca
[RISCV][TTI] Cost a subvector extract at a register boundary with exact vlen (#82405)
If we have exact vlen knowledge, we can figure out which indices
correspond to register boundaries. Our lowering uses this knowledge to
replace the vslidedown.vi with a sub-register extract. Our costs can
reflect that as well.
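
As a rough sketch of the boundary test, an extract index lands on a register boundary when its bit offset is a multiple of VLEN. The helper name is hypothetical, and the real costing may apply further conditions (for example on the subvector size) that are not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: does element Index of a vector with EltBits-wide
// elements start exactly at a vector register boundary, given an exactly
// known VLEN of VLenBits?
bool startsAtRegisterBoundary(uint64_t Index, uint64_t EltBits,
                              uint64_t VLenBits) {
  return (Index * EltBits) % VLenBits == 0;
}
```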

This is another piece split off from
https://github.com/llvm/llvm-project/pull/80164

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2024-02-21 07:56:08 -08:00
Philip Reames
2549c24142 Reapply "[RISCV][TTI] Extract subvector at index zero is free (#81751)"
This reverts commit 834d11c21541c8bf92ef598c1171e8163b69e8c7 which was
a revert of my 3a626937b1b652e3c87cd0050df9c24cc5127d3b.  I had failed
to rebase after new tests added overnight by
fc0b67e1d79d1f199687f8f06d619984d9520230.

Original commit message follows:

Extracting a subvector at index zero corresponds to a type conversion and
possibly a subregister operation. We will not emit a vslidedown. As such,
it is free.

As an aside, it looks like we're not passing an index in for cases where
the subvec type is scalable. For at least index zero, we probably should be.

Revert "Revert "[RISCV][TTI] Extract subvector at index zero is free (#81751)""
2024-02-15 16:51:15 -08:00
Craig Topper
834d11c215 Revert "[RISCV][TTI] Extract subvector at index zero is free (#81751)"
This reverts commit 3a626937b1b652e3c87cd0050df9c24cc5127d3b.

Causes tests added by fc0b67e1d79d1f199687f8f06d619984d9520230 to fail.
2024-02-15 12:51:23 -08:00
Philip Reames
3a626937b1
[RISCV][TTI] Extract subvector at index zero is free (#81751)
Extracting a subvector at index zero corresponds to a type conversion and
possibly a subregister operation. We will not emit a vslidedown. As
such, it is free.

As an aside, it looks like we're not passing an index in for cases where
the subvec type is scalable. For at least index zero, we probably should
be.
2024-02-15 07:43:50 -08:00
Philip Reames
59e559067b
Revert "[RISCV] Refine cost on Min/Max reduction" (#80340)
Reverts llvm/llvm-project#79402. Crash reported. On closer inspection,
this patch does not handle Intrinsic::maximum and Intrinsic::minimum.
2024-02-01 13:09:07 -08:00
Alexey Bataev
8ad14b6d90
[TTI]Add support for strided loads/stores.
Added basic legality check and cost estimation functions for strided loads and stores.

These interfaces will be built upon in https://github.com/llvm/llvm-project/pull/80310.

Reviewers: preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/80329
2024-02-01 16:07:38 -05:00
Shih-Po Hung
2800448f88
[RISCV] Refine cost on Min/Max reduction (#79402)
This patch is split off from #77342, and follows #79103

- Correct the CodeSize cost, which left out 1 instruction. The 3 comes
from {VMV.S, ReductionOp, VMV.X}
- Add SplitCost which chains a series of VMAX/VMIN/... which scales with
LMUL.
- Use MVT to estimate VL.
2024-01-30 16:47:32 +08:00
Shih-Po Hung
bf716fb716
[RISCV] Refine cost on Min/Max reduction with i1 type (#79401)
It is split off from #77342.
InstCombine transforms min/max reductions with i1 into arithmetic
reductions, so this patch reuses the cost logic in the arithmetic
reduction cost function.
2024-01-26 19:35:27 +08:00
Shih-Po Hung
84be954cb2
[RISCV][CostModel] Refine Arithmetic reduction costs (#79103)
This patch is split off from #77342

- Correct the CodeSize cost, which left out 1 instruction. The 3 comes
from {VMV.S, ReductionOp, VMV.X}
- Add SplitCost.
  Unordered reductions chain a series of VADD/VFADD/... which scales with
  LMUL.
  Ordered reductions chain a series of VFREDOSUMs.
- Use MVT to estimate VL.
2024-01-25 10:49:44 +08:00
Shih-Po Hung
7e63940f69
[RISCV][CostModel] Make VMV_S_X and VMV_X_S cost independent of LMUL (#78739)
Following #77963, instructions like VMV_S_X/VMV_X_S
handle a single element, so the cost doesn't scale with LMUL.
2024-01-23 11:00:19 +08:00
Philip Reames
8bf624af47 [RISCV] Key VectorIntrinsicCostTable by SEW [nfc-ish]
Previously, we'd keyed the table by the vector type, but we were actually assigning the same cost for all the types with a common element type.  Unless we'd missed an entry, this means that effectively we were performing an SEW lookup.

Restructure the table to make this SEW dependence more explicit, and in the process greatly reduce the size of the table.
2024-01-18 17:10:56 -08:00
Philip Reames
2663d2cb9c
[RISCV] Adjust select shuffle cost to reflect mask creation cost (#77963)
This is inspired by
https://github.com/llvm/llvm-project/pull/77342#pullrequestreview-1814673242,
and is split off of same with some differences in style.

A select is a vmerge.vv with the additional cost of materializing the
bitmask vector in a vreg. All masks fit within a single vector register
(e8 + m8 is the worst case), and thus our worst case cost should be
roughly 3 (2 scalar to produce the address, one vector load op). Given
most shuffles are small, and the mask will be instead produced by
LUI/ADDI + vmv.s.x or ADDI + vmv.s.x, using 2 as the default seems quite
reasonable. At worst, we're not going to be off by much.

The prior lowering scaled the cost of the bitmask with LMUL, which I
don't understand. At m1 it did use the same base cost of 2. (@lukel97
You wrote the original code here, anything I'm missing here?)
2024-01-18 10:24:47 -08:00
Luke Lau
a348397a1c
[RISCV] Don't scale cost by LMUL for TCK_CodeSize in getMemoryOpCost (#78407) 2024-01-17 21:41:35 +07:00
Shih-Po Hung
475890cd2e
[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for CostKind (#76793)
Instruction cost for CodeSize and Latency/RecipThroughput can be very
different. Considering the diversity of CostKind and vendor-specific
cost, and how they are spread across various TTI functions, it's
becoming quite a challenge to handle. This patch adds an interface
getRISCVInstructionCost to address it.
2024-01-04 21:04:36 +08:00
Vitaly Buka
9c39d9bb49
Revert "[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for Cost… (#73651)" (#76536)
Fails on bots https://lab.llvm.org/buildbot/#/builders/5/builds/39629

Issue #76535

This reverts commit 3e75dece919511e4a2edada82d783304cc14a9cd.
2023-12-28 13:30:56 -08:00
Shih-Po Hung
3e75dece91
[RISCV][CostModel] Add getRISCVInstructionCost() to TTI for Cost… (#73651)
…Kind

Instruction cost for CodeSize and Latency/RecipThroughput can be very
different. Considering the diversity of CostKind and vendor-specific
cost, and how they are spread across various TTI functions, it's
becoming quite a challenge to handle. This patch adds an interface
getRISCVInstructionCost to address it.
2023-12-28 14:36:01 +08:00
melonedo
3eaed9e6f5
[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993)
Implement XCVbitmanip intrinsics for CV32E40P according to the
specification.

This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.

Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie.

Spec:
05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip).

Previously reviewed on Phabricator: https://reviews.llvm.org/D157510.
Parallel GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html.

Co-authored-by: melonedo <funanzeng@gmail.com>
2023-12-17 19:29:40 +08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The assert fails to catch this because `Scalable` is a static method of
class TypeSize, and LHS and RHS are both objects of class TypeSize. So this
is evaluating whether the pointer to the function Scalable == the pointer to
the function Scalable, which is always true because LHS and RHS have the
same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
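
The pitfall can be reproduced in a few lines. The `Quantity` struct below is a hypothetical stand-in for illustration, not the real TypeSize class: naming a static member function through two objects yields the same function on both sides, so the comparison is always true.

```cpp
#include <cassert>

// Hypothetical stand-in type for illustration only.
struct Quantity {
  bool IsScalable;
  static Quantity Scalable(unsigned) { return {true}; }
  static Quantity Fixed(unsigned) { return {false}; }
};

// Buggy check: `Scalable` names the static member function, so both sides
// decay to the same function pointer and the result is always true.
bool sameKindBuggy(const Quantity &LHS, const Quantity &RHS) {
  return LHS.Scalable == RHS.Scalable;
}

// Intended check: compare the per-object state instead.
bool sameKindIntended(const Quantity &LHS, const Quantity &RHS) {
  return LHS.IsScalable == RHS.IsScalable;
}
```

Some compilers may warn about the buggy comparison, but it is valid C++ and compiles.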
2023-11-22 08:52:53 +00:00
Wang Pengcheng
e179b125fb
[RISCV][NFC] Pass MCSubtargetInfo instead of FeatureBitset in RISCVMatInt (#71770)
The use of `hasFeature` is more descriptive and the callers of
`RISCVMatInt` have no need to call `getFeatureBits()` any more.
2023-11-09 15:15:23 +08:00
Fangrui Song
8e247b8f47 Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC 2023-10-27 00:30:41 -07:00
Ramkumar Ramachandra
98c90a13c6
ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924)
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT has a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.

This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.

The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
2023-10-19 13:05:04 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimates in SLP and fixes uses of the functions in other cases.
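
A simplified model shows why the source element count matters. This sketch is not the LLVM implementation; it only illustrates that a mask such as `{0, 1}` is an identity over a 2-element source but a subvector extract over a 4-element source. Here `-1` is assumed to mark an undefined lane.

```cpp
#include <cassert>
#include <vector>

// Simplified identity-mask check (illustration only): the mask must have
// one entry per source element, and each defined entry must select its own
// position.
bool isIdentityMask(const std::vector<int> &Mask, int NumSrcElts) {
  if (static_cast<int>(Mask.size()) != NumSrcElts)
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (Mask[I] != -1 && Mask[I] != I) // -1 models an undefined lane
      return false;
  return true;
}
```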

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimates in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimates in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimates in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimates in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00