196 Commits

Author SHA1 Message Date
melonedo
3eaed9e6f5
[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993)
Implement XCVbitmanip intrinsics for CV32E40P according to the
specification.

This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.

Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie.

Spec:
05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip).

Previously reviewed on Phabricator: https://reviews.llvm.org/D157510.
Parallel GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html.

Co-authored-by: melonedo <funanzeng@gmail.com>
2023-12-17 19:29:40 +08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Wang Pengcheng
e179b125fb
[RISCV][NFC] Pass MCSubtargetInfo instead of FeatureBitset in RISCVMatInt (#71770)
The use of `hasFeature` is more descriptive and the callers of
`RISCVMatInt` have no need to call `getFeatureBits()` any more.
2023-11-09 15:15:23 +08:00
Fangrui Song
8e247b8f47 Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC 2023-10-27 00:30:41 -07:00
Ramkumar Ramachandra
98c90a13c6
ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924)
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.

This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.

The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
2023-10-19 13:05:04 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
Ramkumar Ramachandra
7c128f6d0e
CostModel/RISCV: tweak cost of vector ctpop under ZVBB (#67020)
Under RISCV experimental-zvbb, vector variants of llvm.ctpop lower to a
single instruction: vcpop. The cost-model does not check for the ZVBB
extension, and always associates a high cost to vector variants of
llvm.ctpop. Fix this defect.
2023-09-27 13:00:33 +01:00
Sergey Kachkov
3d7df0a547
[RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334)
This patch fixes the compilation time issue of matrix-types-spec test
from test-suite.

Reproduction of the problem:
```
clang++ -DNDEBUG --target=riscv64-linux-gnu --sysroot=<sysroot path> --gcc-toolchain=<gcc path> -O2 -fenable-matrix <test-suite-path>/SingleSource/UnitTests/matrix-types-spec.cpp
```

On my machine, compilation takes 50.44s. In comparison, the same test
with RVV (-march=rv64gcv) compiles in 3.06s, and for x86-64 target it
takes 1.71s. It turns out that the main issue is unrolling of loop in
multiplySpec function, that has extractelements with non-constant index:
```
for.body9.i:                                      ; preds = %for.body9.i, %for.cond6.preheader.i
  %indvars.iv.i92 = phi i64 [ 0, %for.cond6.preheader.i ], [ %indvars.iv.next.i93, %for.body9.i ]
  %Elt.033.i = phi double [ 0.000000e+00, %for.cond6.preheader.i ], [ %80, %for.body9.i ]
  %77 = mul nuw nsw i64 %indvars.iv.i92, 25
  %78 = add nuw nsw i64 %77, %indvars.iv39.i91
  %matrixext.i = extractelement <475 x double> %62, i64 %78
  %79 = add nuw nsw i64 %indvars.iv.i92, %74
  %matrixext13.i = extractelement <209 x double> %73, i64 %79
  %80 = tail call double @llvm.fmuladd.f64(double %matrixext.i, double %matrixext13.i, double %Elt.033.i)
  %indvars.iv.next.i93 = add nuw nsw i64 %indvars.iv.i92, 1
  %exitcond.not.i94 = icmp eq i64 %indvars.iv.next.i93, 19
  br i1 %exitcond.not.i94, label %for.cond.cleanup8.i, label %for.body9.i, !llvm.loop !21
```

When RVV is supported, extractelement/insertelement with non-constant
index can be lowered quite efficiently with vslidedown/vslideup;
otherwise it's lowered via stack memory operations, i.e. for
extractelement each vector element is stored on stack and then the
needed element is loaded back; for insertelement is stores all vector
elements, rewrites the required element value and then loads vector
back. Currently the cost of such expensive operation is estimated as
zero, so loop unroll processes the loop more aggresively. The proper
estimation of cost (in a way like in X86 target) prohibits unrolling of
this loop and fixes compilation time (2.77s on my machine).
2023-09-27 13:12:42 +03:00
Craig Topper
64b1fbb3bd
[RISCV] Disable constant hoisting for mul by one more/less than a pow… (#67385)
…er of 2.

We can use a shift+add/sub for these. This often has same or lower
latency than a multiply and may have more execution resources available.
2023-09-26 09:58:44 -07:00
Sergey Kachkov
0a5d52a757
[RISCV][CostModel] Add getCFInstrCost RISC-V implementation (#65599)
This patch implements getCFInstrCost TTI hook that mostly affects
LoopVectorizer decisions. It sets zero cost for PHI nodes and zero
throughput cost for branches (assuming that branches are likely to
be predicted). The implementation is similar to X86/AArch64/PowerPC
targets and reduces loop cost by excluding induction PHIs/loop latch
branches, which in turn leads to selecting smaller vectorization
factor.
2023-09-25 12:26:01 +03:00
Philip Reames
463c9f44dc [RISCV] Move slide and gather costing to TLI [NFC] (PR #65396)
As mentioned in TODOs from D159332.  This PR doesn't actually
common up that copy of the code because doing so is not NFC - due to
DLEN.  Fixing that will be a future PR.
2023-09-07 18:28:17 -07:00
Philip Reames
2cb357d01a [RISCV][TTI] Constify a few routines [nfc] 2023-09-07 11:03:52 -07:00
Philip Reames
3e89aca446 [RISCV] Rename getELEN to getELen [nfc]
Let's follow the naming scheme use for DLen, XLen, and FLen.
2023-08-31 11:27:00 -07:00
Philip Reames
0b98c356b5 [RISCV] Restructure i1 insert/extract element costing [nfc-ish]
These get expanded through i8, so let's just model that directly instead of looking at the expanded code and trying to fit that into the flow. This results in slightly better costs for constant indices today (we can remove the add), but is mostly focused on making future changes easier.

Differential Revision: https://reviews.llvm.org/D158770
2023-08-25 07:28:56 -07:00
Philip Reames
514b38cd7e [RISCV] Remove mask size restriction on single source and dual src shuffle costing (try 2)
Some callers pass in an empty mask to represent "unknown".  We should use the generic costs for these cases.  We can add VL=1 costing seperately if desired.

Reapplying after revert.  A new test had been added, and I'd missed updating it when rebasing before.  This is a great happy accident as I hadn't figured out how to get SLP to exercise this case, I'd merely noticed it via inspection.
2023-08-23 14:43:02 -07:00
Philip Reames
84e5278229 Revert "[RISCV] Remove mask size restriction on single source and dual src shuffle costing"
This reverts commit 2246700e7b25e60ea682525b6995e72d69968984.  Seeing buildbot failures; it looks like I rebased over a new test which is effected by the change.
2023-08-23 14:25:03 -07:00
Philip Reames
2246700e7b [RISCV] Remove mask size restriction on single source and dual src shuffle costing
Some callers pass in an empty mask to represent "unknown".  We should use the generic costs for these cases.  We can add VL=1 costing seperately if desired.
2023-08-23 14:13:07 -07:00
wangpc
9a82bda9de [RISCV] Fix assertion of getShuffleCost
This assertion is introduced by D157425.

We should calculate the cost iff `Mask` is not empty.

Fixes 64901

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D158590
2023-08-23 20:10:50 +08:00
Philip Reames
4a35655dac [RISCV][TTI] Factor out getVSlideCost helper [nfc]
Reasonable implementations may differ in complexity cost, so doing some API prepwork to support tunables.  Note that this patch only covers the cases where we model the slide cost as linear.  A separate patch will propose changing our insert and extract costs from constant-in-lmul to linear-in-lmul.
2023-08-22 14:16:51 -07:00
Philip Reames
f346cbd378 [RISCV][TTI] Factor out getVRGatherVICost helper [nfc]
Reasonable implementations may differ in complexity cost, so doing some API prepwork to support tunables.
2023-08-22 14:10:46 -07:00
Alexey Bataev
9a207578ac [TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().
It improves shuffle instructions estimation and improves vectorization
outcome.

Differential Revision: https://reviews.llvm.org/D157425
2023-08-18 13:47:01 -07:00
Philip Reames
7cc6b80d9a [RISCV][CostModel] Model vrgather.vv as being quadradic in LMUL
vrgather.vv across multiple vector registers (i.e. LMUL > 1) requires all to all data movement. This includes two conceptual sets of changes:

    For permutes, we were modeling these as being linear in LMUL.
    For reverse, we were modeling them as being fixed cost in LMUL.

Both were wrong, and have been adjusted to O(LMUL^2).  Noticed via code inspection while looking at something else.

Its worth asking whether we should be lowering reverse to something other than a vrgather at high LMULs. That shuffle is quite expensive.  (Future work)

Differential Revision: https://reviews.llvm.org/D152019
2023-07-18 11:52:34 -07:00
Luke Lau
02bb33c3ce [RISCV] Check for alignment when lowering interleaved/deinterleaved loads/stores
As noted by @reames, we should be checking that the memory access is aligned to
the element size (or the unaligned vector memory access feature is enabled)
before lowering vlseg/vsseg intrinsics via the interleaved access pass.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154536
2023-07-07 15:34:24 +01:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the costmodelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compare and
selects. The getMinMaxReductionCost have changed to take a Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
2023-07-04 15:02:30 +01:00
Luke Lau
d0d864f6f4 [SLP] Explicitly pass AccessTy to getGEPCost
Building on D149889, this patch updates SLP to pass the vector type as
the AccessTy to getGEPCost.
This should have the effect of GEPs being costed for more often instead
of being treated as foldable into the address mode and thus free, as
some architectures, notably RISC-V, do not have offset+reg addressing
modes for vector memory accesses.

Note that in SLP, GEPs are costed in two places: getPointersChainCost
and GetGEPCostDiff.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D153570
2023-06-29 18:42:24 +01:00
Luke Lau
a68dcd09e8 [TTI] Use users of GEP to guess access type in getGEPCost
Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, a0(42)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, noticeably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Differential Revision: https://reviews.llvm.org/D149889
2023-06-29 13:44:37 +01:00
Simon Pilgrim
defb5cd783 Fix MSVC "'std::max': no matching overloaded function found" error. NFCI. 2023-06-14 19:32:17 +01:00
Philip Reames
7f26c27e03 [RISCV] Enable SLP by default (when vectors are available)
I propose that we go ahead and enabled SLP by default. Over the last few weeks, @luke and I have been working through codegen issues seen at small VLs from a couple of SPEC workloads. We still have a ways to go to get optimal codegen, but we're at the point where having a single configuration we're all tuning against is probably the right default.

As a bit of history, I introduced this TTI hook back in a310637132 back in August of last year to unblock enabling LoopVectorizer. At the time, we had a couple known issues: constant materialization, address generation, and a general lack of maturity of small fixed vector codegen. By now, each of these has had significant investment. I can't say any of them are completely fixed, but we're no longer seeing instances of them every place we look.

What we're mostly seeing at this point is a long tail of code gen opportunities, many involving build vectors, shuffles, and extract patterns. I have a couple patches up to continue iterating on those issues, but I don't think they need to be blockers for enabling SLP.

Differential Revision: https://reviews.llvm.org/D152750
2023-06-14 09:49:58 -07:00
Craig Topper
6ac2ce7d84 [RISCV] Introduce the concept of DLEN(data path width) into getLMULCost.
SiFive's x280 CPU has a vector unit that VLEN/2 bits wide. This
means that LMUL=1 operations take 2 to process all VLEN bits.

This patch adds a DLenFactor tuning parameter and applies it to
TuneSiFive7. getLMULCost has been updated to use this factor in
its calculations. I've added an x280 command line to one cost
model test to demonstrate the effect.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D152421
2023-06-13 16:09:25 -07:00
Graham Hunter
95bfb1902d [LV][AArch64] Allow (limited) interleaving for scalable vectors
This patch uses the (de)interleaving intrinsics introduced in
D141924 to handle vectorization of interleaving groups with a
factor of 2 for scalable vectors.

Reviewed By: fhahn, reames

Differential Revision: https://reviews.llvm.org/D145163
2023-06-09 11:42:10 +01:00
Luke Lau
c27a0b21c5 [SLP][RISCV] Account for offset folding in getPointersChainCost
For a GEP in a pointer chain, if:
1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and
   can be folded into the addressing mode of the using load/store

Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.

In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).

Also note that 2) is currently an assumption, and could be modelled more
accurately.

This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.

For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D149654
2023-05-22 13:55:30 +01:00
Craig Topper
ffa32cd11e [RISCV] Disable constant hoiting for multiply by a power of 2. 2023-05-20 19:20:49 -07:00
Craig Topper
1e6d069709 [RISCV] Simplify and improve getLMULCost.
Use divideCeil for fixed vectors which avoids the need for a std::max
with 1 and should be more correct for odd sized vectors if those
occur.

Use conditional operator instead of an if/else.
2023-05-19 14:09:50 -07:00
Philip Reames
c501aa8843 [RISCV][TTI] Model shuffle mask materialization with correct index type
We were modeling these as if the index type was always e8, but the actual
lowering uses the data type width if legal.  We also weren't accounting for
i64 on xlen32 correctly.

Noticed via inspection while working on the shuffle/buildvec lowering.  Note
that this costing is also wrong in a more major way - we don't actually use
a constant pool load in many cases.  But that's a separate issue.
2023-05-05 12:26:21 -07:00
Yeting Kuo
35c877a6f0 [RISCV] Customed lower vector nearbyint and rint in RISC-V.
The patch lowers vector rint/nearbyint like vp.rint/nearbyint.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D148619
2023-04-19 11:07:23 +08:00
Simon Pilgrim
fb8038db73 [TTI] getExtendedReductionCost - replace std::optional<FastMathFlags> args with FastMathFlags
Followup to D148149 where it was noticed that the std::optional wrapper wasn't helping with anything (we can just use an empty FastMathFlags()).
2023-04-13 11:26:28 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correct recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
Philip Reames
b0e0c1e46c [RISCV][TTI] Call improveShuffleKindFromMask like all the other backends
No test diff; noticed via inspection.
2023-04-12 17:43:36 -07:00
Philip Reames
27b6ddbf6e [RISCV] Speculative fix for issue reported against D147470 post commit 2023-04-05 17:25:42 -07:00
Philip Reames
0e6d7eceaa [RISCV][TTI] Cost model for SK_ExtractSubvector
Differential Revision: https://reviews.llvm.org/D147618
2023-04-05 09:56:43 -07:00
Philip Reames
37646a2c28 [RISCV] Account for LMUL in memory op costs
Generally, the cost of a memory op will scale with the number of vector registers accessed. Machines might exist which have a narrow memory access than vector register width, but machines with a wider memory access width than vector register width seem unlikely.

I noticed this because we were preferring wide loads + deinterleaves on examples where the cost of a short gather (actually a strided load) would be better. Touching 8 vector registers instead of doing a 4 element gather is not a good tradeoff.

Differential Revision: https://reviews.llvm.org/D147470
2023-04-05 07:58:56 -07:00
Philip Reames
8865ed4dbb [RISCV][TTI] Cost SK_Tranpose as a generic two element shuffle
This matches the actual lowering.  The previous costing was "as if" it had been fully scalarized.
2023-04-04 12:52:50 -07:00