284 Commits

Author SHA1 Message Date
Luke Lau
9c7188871c
[RISCV] Cost ordered bf16/f16 w/ zvfhmin reductions as invalid (#114250)
In #111000 we removed promotion of fadd/fmul reductions for bf16 and f16
without zvfh, and marked the cost as invalid to prevent the vectorizers
from emitting them. However it inadvertently didn't change the cost for
ordered reductions, so this moves the check earlier to fix this.

This also uses BasicTTIImpl instead which now assigns a valid but
expensive cost for fixed-length vectors, which reflects how codegen will
actually scalarize them.
2024-10-31 23:36:09 +08:00
Pengcheng Wang
18f0f70934
[RISCV] Support llvm.masked.expandload intrinsic (#101954)
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes #101914.
2024-10-31 20:03:58 +08:00
Elvis Wang
a8575c1459
[RISCV] Sink ordered reduction check into FAdd. NFC (#114180)
2024-10-31 13:35:37 +08:00
Luke Lau
14045de250
[RISCV] Account for factor in interleave memory op costs (#111511)
Currently we cost an interleaved memory op as if it were a load/store of
the widened vector type, but this was undercosting in all cases when
compared to the measured performance of today's hardware.

On the x280 at NF=2 and spacemit-x60 at NF=2,3 and 4, a segmented load
is carried out as a wide load and NF LMUL shuffle ops:
https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput

All other NFs go through a slow path. On the spacemit-x60 this is
proportional to VLMAX * NF, and on the x280 proportional to the number
of segments.

This patch increases the cost by implementing a wide load + NF LMUL
shuffle op cost for the lowest common denominator NF=2, and then a
slower cost proportional to VL for the other NFs.

In a follow up patch we can add a tuning flag to use the faster cost
model for NF=3 and 4 on the spacemit-x60.

Note that the FIXME about illegal vectors seems to have been fixed in
#100436
2024-10-31 05:36:46 +08:00
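The cost shape described in #111511 can be sketched as follows. This is only an illustration of the two tiers named in the commit message (a wide memory op plus NF LMUL-sized shuffles for NF=2, and a cost proportional to VL otherwise). The function name, its parameters and every unit cost are hypothetical and are not taken from RISCVTargetTransformInfo.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a two-tier cost for a segmented (interleaved)
// load/store. None of these names exist in LLVM.
//   nf              - interleave factor (fields per segment)
//   wideMemOpCost   - assumed cost of one load/store of the widened vector
//   lmulShuffleCost - assumed cost of one LMUL-sized shuffle op
//   vl              - number of elements processed
//   perElementCost  - assumed per-element cost on the slow path
uint64_t interleavedMemOpCost(unsigned nf, uint64_t wideMemOpCost,
                              uint64_t lmulShuffleCost, uint64_t vl,
                              uint64_t perElementCost) {
  assert(nf >= 2 && "an interleave factor below 2 is not interleaving");
  if (nf == 2) {
    // Fast path: one wide memory op plus NF LMUL-sized shuffles.
    return wideMemOpCost + nf * lmulShuffleCost;
  }
  // Slow path: cost grows linearly with VL.
  return vl * perElementCost;
}
```

With made-up unit costs (wide op 4, shuffle 3, per-element 1, VL 16), NF=2 costs 4 + 2*3 = 10 while NF=3 costs 16.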
Luke Lau
e989e31a47
[RISCV] Mark f16/bf16 lrint and llrint cost as invalid (#113924)
We currently can't lower scalable vector lrint and llrint nodes for bf16
and f16, even with zvfh, and will crash.

Mark the cost as invalid for now to prevent the vectorizers from
emitting them.

Note that we can actually lower fixed-length vectors fine by scalarizing
them, but we were still undercosting these too so I've also included
them. I presume there's an opportunity to improve the codegen later on.
2024-10-30 17:21:18 +02:00
Han-Kuan Chen
12bcea3292
[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#111459)
reference: https://github.com/llvm/llvm-project/pull/110457
2024-10-18 20:16:56 +07:00
Elvis Wang
566012a64e
[RISCV][TTI] Implement instruction cost for vp_merge. (#112327)
This patch implements the instruction cost for `vp_merge`, which generates
an instruction sequence similar to that of the `select` instruction.
2024-10-17 07:47:43 +08:00
Philip Reames
b3c687b4e9
[LV] Check early for supported interleave factors with scalable types [nfc] (#111592)
Previously, the cost model was returning an invalid cost. This simply
moves the check from one place to another. This is mostly to make the
cost modeling code a bit easier to follow.

---------

Co-authored-by: Mel Chen <mel.chen@sifive.com>
2024-10-15 07:37:46 -07:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Philip Reames
f11568bcb0
Revert "[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)"
This reverts commit 554eaec63908ed20c35c8cc85304a3d44a63c634.  Change was not approved when landed.
2024-10-07 11:31:57 -07:00
Han-Kuan Chen
554eaec639
[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)
2024-10-05 14:58:44 +08:00
Luke Lau
487686b82e
[SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000)
In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.

However I don't believe it's correct to promote f16 fadd or fmul
reductions to f32 since we need to round the intermediate results.

Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two
different results depending on whether we compiled with +zvfh or
+zvfhmin, for example with a 3 element reduction:

	; v9 = [0.1563, 5.97e-8, 0.00006104]

	; zvfh
	vsetivli x0, 3, e16, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v9, v8
	vfmv.f.s fa0, v8
	; fa0 = 0.1563

	; zvfhmin
	vsetivli x0, 3, e16, m1, ta, ma
	vfwcvt.f.f.v v10, v9
	vsetivli x0, 3, e32, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v10, v8
	vfmv.f.s fa0, v8
	fcvt.h.s fa0, fa0
	; fa0 = 0.1564

This same thing happens with reassociative reductions e.g. vfredusum.vs,
and this also applies for bf16.

I couldn't find anything in the LangRef for reductions that suggests the
excess precision is allowed. There may be something we can do in Clang
with -fexcess-precision=fast, but I haven't looked into this yet.

I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum.

I can't think of another way of lowering these other than scalarizing,
and we can't scalarize scalable vectors, so this just removes the
promotion and adjusts the cost model to return an invalid cost. (It
looks like we also don't currently cost fmul reductions, so presumably
they also have an invalid cost?)

I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
2024-10-04 00:17:45 +08:00
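The rounding difference in #111000 can be reproduced on a host machine with a small software model. The sketch below is not LLVM code: `roundToHalf` is a hand-written helper (an assumption for illustration) that rounds to the nearest f16 value with ties to even, and the two sum functions model the zvfh lowering (round after every add) and the promoted zvfhmin lowering (accumulate in f32, round once).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Round to the nearest IEEE binary16 value (ties to even), relying on the
// default floating-point rounding mode. Overflow to infinity is not
// modelled; the inputs used here are far below the f16 maximum.
double roundToHalf(double x) {
  if (x == 0.0)
    return x;
  int e2 = 0;
  std::frexp(x, &e2); // x = m * 2^e2 with 0.5 <= |m| < 1
  // f16 has 11 significant bits; subnormals bottom out at 2^-24 spacing.
  int quantumExp = (e2 - 11 < -24) ? -24 : e2 - 11;
  double quantum = std::ldexp(1.0, quantumExp);
  return std::nearbyint(x / quantum) * quantum;
}

// Ordered sum that rounds every intermediate result to f16. The double
// addition of two f16 values is exact, so only one rounding happens per step.
double orderedSumF16(const double *v, std::size_t n) {
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    acc = roundToHalf(acc + v[i]);
  return acc;
}

// Ordered sum carried out in f32 and rounded to f16 once at the end.
double orderedSumPromoted(const double *v, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    acc += static_cast<float>(v[i]);
  return roundToHalf(acc);
}
```

Taking the three inputs as the exact f16 values 0.15625, 2^-24 and 2^-14 (which the commit message prints as 0.1563, 5.97e-8 and 0.00006104), the first function returns 0.15625 and the second returns 0.1563720703125, matching the 0.1563 versus 0.1564 results quoted above.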
Philip Reames
50afafbf29
[RISCV][TTI] Adjust constant materialization cost for (z/s)ext from i1 (#110282)
When we're lowering to a split sequence, we only need one
materialization of the zero constant. Our codegen looks something like
this:

  vmv.v.i	v24, 0
  vmerge.vim	v8, v24, -1, v0
  vmv1r.v	v0, v16
  vmerge.vim	v16, v24, -1, v0

Note: Doing this specific case since it was pointed out in
https://github.com/llvm/llvm-project/pull/110164#discussion_r1778268391,
but it's worth noting that we have the same basic problem (over costing
split operations with split invariant terms) at multiple places through
this file.
2024-09-27 10:53:45 -07:00
Philip Reames
1a9569c4f0
[RISCV][TTI] Avoid an infinite recursion issue in getCastInstrCost (#110164)
Calling into BasicTTI is not always safe. In particular, BasicTTI does
not have a full legalization implementation (vector widening is
missing), and falls back on scalarization. The problem is that
scalarization for <N x i1> vectors is costed in terms of the cast API and
we can end up in an infinite recursive cycle.

The "right" fix for this would be to teach BasicTTI how to model the full
legalization state machine, but several attempts at doing so have
resulted in dead ends or undesirable cost changes for targets I don't
understand.

This patch instead papers over the issue by avoiding the call to the
base class when dealing with an i1 source or dest. This doesn't
necessarily produce correct costs, but it should at least return
something semi-sensible and not crash.

Fixes https://github.com/llvm/llvm-project/issues/108708
2024-09-27 07:47:09 -07:00
Philip Reames
d288574363
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Luke Lau
f43ad88ae1
[RISCV] Handle zvfhmin and zvfbfmin promotion to f32 in half arith costs (#108361)
Arithmetic half or bfloat ops on zvfhmin and zvfbfmin respectively will
be promoted and carried out in f32, so this updates
getArithmeticInstrCost to check for this.
2024-09-25 18:50:16 +08:00
Philip Reames
0b524efa95
[RISCV][TTI] Reduce cost of a <N x i1> build_vector pattern (#109449)
This is a follow up to 7f6bbb3. When lowering a <N x i1> build_vector,
we currently choose to extend to i8, perform the build_vector there, and
then truncate back in vector. Our costing on the other hand accounts for
it as if we performed a vector extend, an insert, and a vector extract
for every element. This significantly overestimates the cost.

Note that we can likely do better in our build_vector lowering here by
packing the bits in scalar, and doing a build_vector of the packed bits.
Regardless, our costing should match our lowering.
2024-09-23 07:21:54 -07:00
Elvis Wang
80b44517f5
[RISCV][TTI] Add instruction cost for vp.select. (#109381)
This patch makes the instruction cost for vp.select the same as its non-vp
counterpart.
2024-09-23 15:06:04 +08:00
Philip Reames
7f6bbb3c4f
[RISCV][TTI] Reduce cost of a build_vector pattern (#108419)
This change is actually two related changes, but they're very hard to
meaningfully separate as the second balances the first, and yet doesn't
do much good on its own.

First, we can reduce the cost of a build_vector pattern. Our current
costing for this defers to generic insertelement costing which isn't
unreasonable, but also isn't correct. While inserting N elements
requires N-1 slides and N vmv.s.x, doing the full build_vector only
requires N vslide1down. (Note there are other cases that our build
vector lowering can do more cheaply, this is simply the easiest upper
bound which appears to be "good enough" for SLP costing purposes.)

Second, we need to tell SLP that calls don't preserve vector registers.
Without this, SLP will vectorize scalar code which performs e.g. 4 x
float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the
costing works out that this is in fact the optimal choice - except that
we don't actually have a <2 x float> @exp, and unroll during DAG. This
would be fine (or at least cost neutral) except that the libcall for the
scalar @exp blows all vector registers. So the net effect is we added a
bunch of spills that SLP had no idea about. Thankfully, AArch64 has a
similar problem, and has taught SLP how to reason about spill cost once
the right TTI hook is implemented.

Now, for some implications...

The SLP solution for spill costing has some inaccuracies. In particular,
it basically just guesses whether an intrinsic will be lowered to a call
or not, and can be wrong in both directions. It also has no mechanism to
differentiate on calling convention.

This has the effect of making partial vectorization (i.e. starting in
scalar) more profitable. In practice, the major effect of this is to
make it more likely that SLP will vectorize part of a tree in an intersecting
forest, and then vectorize the remaining tree once those uses have been
removed.

This has the effect of biasing us slightly away from strided, or indexed
loads during vectorization - because the scalar cost is more accurately
modeled, and these instructions look relatively less profitable.
2024-09-20 08:34:36 -07:00
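The instruction counts quoted in #108419 can be written down directly. This is a minimal sketch that only counts operations as the commit message states them; the function names are hypothetical and the real cost model also scales by LMUL and other factors not shown here.

```cpp
#include <cassert>

// Building a vector through N separate insertelement operations:
// N-1 slides plus N vmv.s.x, as stated in the commit message.
unsigned insertElementSequenceOps(unsigned n) {
  assert(n >= 1 && "need at least one element");
  return (n - 1) + n;
}

// Lowering the whole build_vector at once: N vslide1down.
unsigned buildVectorOps(unsigned n) {
  assert(n >= 1 && "need at least one element");
  return n;
}
```

For a 4-element vector this gives 7 operations for the insert sequence against 4 for the build_vector lowering, which is the gap the patch closes.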
Elvis Wang
86ce8e4504
[RISCV][TTI] Fix potential crash of using dyn_cast() in getIntrinsicInstrCost() NFC. (#109379)
This patch fixes a potential crash caused by using dyn_cast in `vp_cmp`,
which is the same issue as #109313.

Check first whether the IntrinsicCostAttributes contains an underlying
instruction, and then cast it to VPCmpIntrinsic.
2024-09-20 22:45:53 +08:00
Philip Reames
f3f3883f4b
[RISCV] Fix crash reported in https://github.com/llvm/llvm-project/issues/109313
Change edc71e22c004d3b3dfc535f7917ea0b47a282ac8 had tried to handle the case
where an instruction wasn't available, but had used a dyn_cast instead of
a dyn_cast_or_null.  Switch instead to an explicit null check, and a cast.
2024-09-19 11:22:39 -07:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Elvis Wang
edc71e22c0
[RISCV][TTI] Add instruction cost for vp.load/store. (#109245)
This patch makes the instruction cost of vp.load/store same as their
non-vp counterpart.
2024-09-19 16:00:21 +08:00
Philip Reames
2e7c7d20d5
[RISCV][TTI] Adjust cost for extract/insert element when VLEN is known (#108595)
If we know an exact VLEN, then the index is effectively modulo the
number of elements in a single vector register. Our lowering performs
this subvector optimization.

A bit of context. This change may look a bit strange on its own given
we are currently *not* scaling insert/extract cost by LMUL. This costing
decision needs to change, but is very intertwined with SLP
profitability, and is thus a bit hard to adjust. I'm hoping that
https://github.com/llvm/llvm-project/pull/108419 will let me start to
untangle this. This change is basically a case of finding a subset I can
tackle before other dependencies are in place which does no real harm in
the meantime.
2024-09-17 08:43:40 -07:00
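The "index is effectively modulo the number of elements in a single vector register" observation from #108595 amounts to the arithmetic below. The helper name and its parameters are hypothetical; it only shows the index reduction, not the cost that the TTI hook derives from it.

```cpp
#include <cassert>

// Hypothetical helper: with an exact VLEN, element `index` of a register
// group sits at position index % (VLEN / element width) inside one register.
unsigned indexWithinRegister(unsigned index, unsigned vlenBits,
                             unsigned eltBits) {
  assert(eltBits != 0 && vlenBits % eltBits == 0 &&
         "VLEN is assumed to be a multiple of the element width");
  unsigned eltsPerReg = vlenBits / eltBits;
  return index % eltsPerReg;
}
```

With VLEN=128 and i32 elements there are 4 elements per register, so index 9 of a larger vector behaves like index 1 of a single register.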
Luke Lau
41f1b467a2
[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage (#108370)
A half with only zvfhmin or bfloat will end up getting promoted to an f32
for most instructions.

Unless the loop consists only of memory ops and permutation instructions
which don't need to be promoted (is this common?), we'll end up using double
the LMUL than what's currently being returned by getRegUsageForType.

Since this is used by the loop vectorizer, it seems better to be
conservative and assume that any usage of a zvfhmin half/bfloat will end
up being widened to an f32.
2024-09-17 13:50:19 +08:00
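The doubling described in #108370 can be illustrated with a simple register count. This is a sketch under stated assumptions, not the body of getRegUsageForType: the function name, the rounding-up division and the `promotedToF32` flag are all hypothetical stand-ins for the real subtarget queries.

```cpp
#include <cassert>

// Hypothetical register-usage estimate: how many VLEN-sized registers are
// needed for numElts elements. When the element type is a zvfhmin half or a
// bfloat, it is conservatively counted as 32 bits wide.
unsigned regUsage(unsigned numElts, unsigned eltBits, unsigned vlenBits,
                  bool promotedToF32) {
  assert(vlenBits != 0 && "VLEN must be non-zero");
  unsigned effectiveBits = promotedToF32 ? 32 : eltBits;
  unsigned totalBits = numElts * effectiveBits;
  return (totalBits + vlenBits - 1) / vlenBits; // round up
}
```

For 16 half elements at VLEN=128 the estimate goes from 2 registers to 4 once promotion is taken into account.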
Elvis Wang
1b3e64a9d2
[RISCV][TTI] Add vp.cmp intrinsic cost with functionalOPC. (#107504)
This patch makes the instruction cost of VP compare intrinsics the same as
their non-VP counterpart.
2024-09-12 07:06:36 +08:00
Elvis Wang
845d8d909c
[RISCV][TTI] Add cost of typebased cast VPIntrinsics with functionalOPC. (#97797)
This patch makes the instruction cost of type-based cast VP intrinsics
the same as their non-VP counterpart.
This is a follow-up to
[#93435](https://github.com/llvm/llvm-project/pull/93435)
2024-09-05 13:05:01 +08:00
Shih-Po Hung
837ee5b46a
[RISCV][TTI] Scale the cost of FP-Int conversion with LMUL (#87506)
Widening/narrowing the source data type to match the destination data
type may require multiple steps.
To model the costs, the patch generated the interim type by following
the logic in RISCVTargetLowering::lowerVPFPIntConvOp.
2024-09-02 09:38:42 +08:00
Philip Reames
59f05b683d
[RISCV][TTI] Model cost for insert/extract into illegal types (#106440)
We'd previously just deferred to the base implementation, but that more
or less always returns 1. This underestimates the cost of the
insert/extract, biases the SLP vectorizer towards forming illegally
typed vectors, and underestimates the cost of scalarized operations
(like unaligned scatter/gather).
2024-08-29 09:45:47 -07:00
Maciej Gabka
95d2d1cba0
Move stepvector intrinsic out of experimental namespace (#98043)
This patch moves the stepvector intrinsic out of the experimental
namespace.

This intrinsic has existed in LLVM for several years now, and is widely used.
2024-08-28 12:48:20 +01:00
Alexey Bataev
2a50dac9fb
[RISCV][TTI]Fix the cost estimation for long select shuffle.
The code was broken completely. Need to iterate over the whole mask and
process the submasks correctly, check if they form full identity and
adjust indices correctly.

Fixes https://github.com/llvm/llvm-project/issues/106126
2024-08-26 17:27:52 -07:00
Philip Reames
424b87b8d6
[RISCV][TTI] Use legalized element types when costing casts (#105723)
This fixes a crash introduced by my
ac6e1fd0c089043fe60bd0040ba3cad884f00206.

I had failed to consider the case where a vector is truncated to an
illegal element type. The resulting intermediate VT wasn't an MVT and
we'd fail an assertion. Surprisingly, SLP does query illegal element
types in some cases.
2024-08-22 16:19:48 -07:00
LiqinWeng
abaa53199e
[RISCV] Implement RISCVTTIImpl::shouldConsiderAddressTypePromotion for RISCV (#102560)
This optimization helps reduce repeated calculations of base addresses
by extracting type extensions when the same base address is accessed
multiple times but its offset is a constant.
2024-08-15 10:37:04 +08:00
Philip Reames
ac6e1fd0c0
[RISCV][TTI] Cost non-power-of-two size changing casts (#101047)
For a cast with src and destination size being unequal, we were costing
the cast as if it were being scalarized, when in fact we can often
promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and
ptrtoint) the generic logic in BasicTTI already assumed promotion. It
just doesn't handle the case where source and destination are both
promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to
casts instead.
2024-08-13 14:58:16 -07:00
Jeremy Morse
bde243259b
Revert "[Asan] Provide TTI hook to provide memory reference infromation of target intrinsics. (#97070)"
This reverts commit e8ad87c7d06afe8f5dde2e4c7f13c314cb3a99e9.
This reverts commit d3c9bb0cf811424dcb8c848cf06773dbdde19965.

A few buildbots trip up on asan-rvv-intrinsics.ll. I've also reverted
the follow-up commit d3c9bb0cf8.

https://lab.llvm.org/buildbot/#/builders/46/builds/2895
2024-08-08 12:26:05 +01:00
Yeting Kuo
e8ad87c7d0
[Asan] Provide TTI hook to provide memory reference infromation of target intrinsics. (#97070)
Previously asan considered target intrinsics as black boxes, so asan
could not instrument accurate checks. This patch provides TTI hooks to
let targets describe their intrinsic information to asan.

Note,
1. this patch renames InterestingMemoryOperand to MemoryRefInfo.
2. this patch does not support RVV indexed/segment load/store.
2024-08-08 13:40:26 +08:00
Craig Topper
ad80265874
[RISCV] Qualify all XCV predicates with !is64Bit. (#101074)
The tablegen patterns all have isRV32. I did not check if any of them
could naively support RV64.

Fixes #101067 and probably other bugs like it we haven't found yet.
2024-07-29 21:52:57 -07:00
Philip Reames
b66310f938
[RISCV][TTI] Split costing of [u/s]int_to_fp from fp_to_[u/s]int [nfc] (#101029)
The amount of code sharing between them is fairly small, and the split
version is much easier to read.
2024-07-29 09:32:36 -07:00
Philip Reames
d3fd28a134
[RISCV][TTI] Properly model odd vector sized LD/ST operations (#100436)
The motivation for this change is the costing of a LD or ST with nearly
power of 2 vectors (e.g. <3 x i32> or <7 x i32>) on V. There's an
experimental option in SLP to allow emitting these if the cost model
says they're profitable. This really helps with e.g. RGB vectors.

Our actual lowering for these depends on whether a wider container type
is known available. If so, we use a vle or vse on the wider type with a
restricted VL. If not, we split until a legal type is found, and then
apply the vle/vse on the sub-pieces.

This change is intentionally restricted to only the case where promotion
(widening w/VL predication) is involved. We appear to have at least one
bug in our splitting lowering (see discussion on review), and to avoid
exposing this more widely, I chose to not adjust costs for the splitting
case. The current splitting costing assumes scalarization (which is not
true of the actual lowering), but that has the effect of biasing
vectorization away from such cases strongly.

For the widening case, the true cost scales with the next largest legal
type. The default implementation assumes that such a type is scalarized.
Changing that brings our cost in line with our actual lowering decision.
Note that since scalarization is not possible for scalable types, the
prior costing falsely returned Invalid for that case.
2024-07-26 12:52:20 -07:00
Luke Lau
58854facb3
[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594)
I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and
rva22u64_v, and noticed that in a few cases rva22u64_v was
considerably slower.

One of them was 519.lbm_r, which has a large loop that was being
unprofitably vectorized. It has an if/else in the loop which requires
large amounts of predication when vectorized, but despite the loop
vectorizer taking this into account the vector cost came out as cheaper
than the scalar.

It looks like the reason for this is because we cost scalar floating
point ops as 2, but their vector equivalents as 1 (for LMUL 1). This
comes from how we use BasicTTIImpl for scalars which treats floats as
twice as expensive as integers.

This patch doubles the cost of vector floating point arithmetic ops so
that they're at least as expensive as their scalar counterparts, which
gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60.

Fixes #62576 (the last point there about scalar fsub/fmul)
2024-07-22 13:56:10 +08:00
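The mismatch behind #99594 is small enough to state as code. The constant and helper below are hypothetical names; only the numbers (scalar FP ops costed as 2, vector FP ops as 1 at LMUL 1, then doubled) come from the commit message.

```cpp
#include <cassert>

// Scalar FP arithmetic cost as described in the commit message.
const unsigned kScalarFPCost = 2;

// Hypothetical helper: cost of a vector FP arithmetic op, given the
// LMUL-based cost the model used before the patch (1 for LMUL 1).
unsigned vectorFPArithCost(unsigned lmulCost, bool afterPatch) {
  assert(lmulCost >= 1 && "LMUL-based cost is assumed to be at least 1");
  return afterPatch ? 2 * lmulCost : lmulCost;
}
```

Before the patch an LMUL 1 vector op (cost 1) looked cheaper than a single scalar op (cost 2); after doubling it is at least as expensive.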
Alex Bradbury
8687f7cd66
[RISCV] Support constant hoisting of immediate store values (#96073)
Previously getIntImmInstCost only calculated the cost of materialising
the argument of a store if it was the address. This means
ConstantHoisting's transformation wouldn't kick in for cases like
storing two values that require multiple instructions to materialise but
where one can be cheaply generated from the other (e.g. by an addition).

Two key changes were needed to avoid regressions when enabling this:
* Allowing constant materialisation cost to be calculated assuming
zeroes are free (as might happen if you had a 2*XLEN constant and one
half is zero).
* Avoiding constant hoisting if we have a misaligned store that's going
to be legalised to a sequence of narrower stores. I'm seeing cases
where hoisting the constant ends up with worse codegen in that case.

Out of caution and so as not to unexpectedly degrade other existing hoisting logic, FreeZeroes is used only for the new cost calculations for the store instruction. It would likely make sense to revisit this later.
2024-07-17 15:19:31 +01:00
Elvis Wang
4762f3bab0
[RISCV][TTI] Add cost of type based binOp VP intrinsics with functionalOPC. (#93435)
Intrinsics not supported in the backend will fall into BasicTTIImpl,
which will check if the VP intrinsic is a type based instruction.
All type based instructions will fall into
`getTypeBasedIntrinsicInstrCost()`, which doesn't support instructions
with scalable vector types.

This patch adds the instruction cost for type based binOp VP intrinsic
instructions in the backend to get the valid instruction costs.
The cost of type based binOp VP intrinsics will be same as their non-VP
counterpart.
2024-07-05 08:13:18 +08:00
Philip Reames
25b65be43d
[RISCV][LSR] Account for temporary register for base addition (#92296)
An LSR formula may require the addition of multiple base or scale
registers, this sum reduction requires a temporary register to perform.
Since the formulas are independent, we only need one temporary,
regardless of the number of unique formulae. Each formula can reuse the
same temporary. A later CSE pass may come along and combine
sub-expressions - but then the register pressure would be that pass's
problem to consider.

This change fixes up the costing in the RISCV specific way, but this is
really a generic LSR problem. I just didn't feel like fighting with LSR
and dealing with all the various targets swinging slightly in hard to
reason about ways. This problem is more pronounced on RISCV than any
other target due to our lack of addressing modes.

This change is not hugely important on its own, but I have an upcoming
change to add support for shNadd in LSR which biases us fairly strongly
towards adding more "base adds". Without this change, we see net
regression due to the increase in register pressure which is not
accounted for.
2024-05-22 13:38:39 -07:00
Elvis Wang
b60e62896e
[RISCV][CostModel] Remove cost of icmp inst in icmp+select with SFB. (#91158)
With ShortForwardBranchOpt (SFB) or ConditionalMoveFusion, scalar
ICmp and scalar Select instructions are lowered to SELECT_CC
and then to PseudoCCMOVGPR, which generates a conditional
branch instruction and a move instruction.
The cost of scalar (ICmp + Select) = (0 + Select instruction cost)
2024-05-20 16:03:18 +08:00
Craig Topper
487b43cdc9
[RISCV] Pass subvector type to isLegalInterleavedAccessType in getInterleavedMemoryOpCost. (#91825)
isLegalInterleavedAccessType expects the subvector type, but
getInterleavedMemoryOpCost is called with the full vector type. So we
need to divide by Factor.
2024-05-15 21:47:29 -07:00
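The "divide by Factor" step in #91825 is shown below as a standalone helper. The name is hypothetical and the sketch works on element counts only, whereas the real code constructs a vector type with the reduced element count.

```cpp
#include <cassert>

// Hypothetical helper: element count of the per-field subvector for an
// interleaved access on a full vector of fullNumElts elements.
unsigned subVectorNumElts(unsigned fullNumElts, unsigned factor) {
  assert(factor != 0 && fullNumElts % factor == 0 &&
         "full vector is assumed to split evenly across the fields");
  return fullNumElts / factor;
}
```

For example, an interleaved access on <8 x i32> with a factor of 2 would be checked for legality as <4 x i32>.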
Min-Yih Hsu
4c68de5a00
[RISCV][CostModel] Add cost model for experimental.cttz.elts (#91778)
The cost of `experimental.cttz.elts` in RISC-V equals the cost of
vfirst when the zero_is_poison argument is true. Otherwise, we add
additional costs of cmp + select to convert the -1 result from vfirst to
EVL.
2024-05-14 09:18:08 -07:00
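The rule in #91778 reduces to one branch. In this sketch the function name and the three unit-cost parameters are hypothetical; the structure (vfirst alone when zero_is_poison is true, vfirst plus cmp plus select otherwise) is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical cost helper for experimental.cttz.elts.
unsigned cttzEltsCost(unsigned vfirstCost, unsigned cmpCost,
                      unsigned selectCost, bool zeroIsPoison) {
  if (zeroIsPoison)
    return vfirstCost; // the -1 result from vfirst never needs fixing up
  // Otherwise convert the -1 result to EVL with a compare and a select.
  return vfirstCost + cmpCost + selectCost;
}

// Usage example with made-up unit costs of 1 for every instruction.
void cttzEltsCostExample() {
  assert(cttzEltsCost(1, 1, 1, /*zeroIsPoison=*/true) == 1);
  assert(cttzEltsCost(1, 1, 1, /*zeroIsPoison=*/false) == 3);
}
```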
Shih-Po Hung
22213d5883
Recommit [RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCost (#89170)
Insert a break to fix the implicit-fallthrough caught by sanitizer.

Original commit message:

This patch made the following changes:
1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same
2024-05-12 20:10:51 -07:00
ShihPo Hung
d67c3a4b1f
Revert "[RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCost (#89170)"
This reverts commit ed16e7aac44f2024b45d8c6c9dc2817d77d0ea97.
2024-05-12 19:57:40 -07:00
Shih-Po Hung
ed16e7aac4
[RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCost (#89170)
This patch made the following changes:
1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same
2024-05-13 09:47:57 +08:00
Mel Chen
3f1fef3699
[RISCV] Support interleaved accesses for scalable vector. (#90583)
Support for interleaved accesses on scalable vectors with a factor
of 2 is enabled in the vectorizer. Therefore, this patch removes the
restriction for scalable vectors with a factor of 2.
2024-05-03 21:56:31 +08:00