1752 Commits

Author SHA1 Message Date
Pengcheng Wang
18f0f70934
[RISCV] Support llvm.masked.expandload intrinsic (#101954)
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

If `%mask` is all ones, we can lower the expanding load to a normal
unmasked load.

Fixes #101914.
2024-10-31 20:03:58 +08:00
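For the expanding-load commit above, the lowering can be modelled on scalars. This is a minimal sketch of the semantics only, assuming one vector of N lanes held in a `std::vector`; `expandLoad` is a hypothetical name and none of this is LLVM code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of one expanding load over N lanes.
// `memory` stands in for the contiguous source buffer.
std::vector<int> expandLoad(const std::vector<int>& memory,
                            const std::vector<bool>& mask,
                            const std::vector<int>& passthru) {
  const std::size_t n = mask.size();
  assert(passthru.size() == n);

  // vcpop: the number of active lanes is the number of elements to load.
  std::size_t count = 0;
  for (bool m : mask) count += m ? 1 : 0;
  assert(memory.size() >= count);

  // Unmasked contiguous load of `count` elements.
  std::vector<int> loaded(memory.begin(),
                          memory.begin() + static_cast<std::ptrdiff_t>(count));

  // viota: lane i gets the number of active lanes strictly before it.
  std::vector<std::size_t> index(n);
  std::size_t running = 0;
  for (std::size_t i = 0; i < n; ++i) {
    index[i] = running;
    running += mask[i] ? 1 : 0;
  }

  // Masked vrgather: active lanes read loaded[index[i]], inactive lanes
  // keep their passthru value. Together these act as vdecompress.
  std::vector<int> result(passthru);
  for (std::size_t i = 0; i < n; ++i)
    if (mask[i]) result[i] = loaded[index[i]];
  return result;
}
```

With an all-ones mask, `viota` yields 0, 1, 2, ... and the gather is the identity, which is why that case reduces to a plain unmasked load.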
Luke Lau
6da5968f5e
[RISCV] Lower scalar_to_vector for supported FP types (#114340)
In https://reviews.llvm.org/D147608 we added custom lowering for
integers, but inadvertently also marked it as custom for scalable FP
vectors despite not handling it.

This adds handling for floats and marks it as custom lowered for
fixed-length FP vectors too.

Note that this doesn't handle bf16 or f16 vectors that would need
promotion, but these scalar_to_vector nodes seem to be emitted when
expanding them.
2024-10-31 13:15:17 +08:00
Craig Topper
55dbacbf07
[RISCV] Remove RISCVISD::VFCVT_X(U)_F_VL by using VFCVT_RM_X(U)_F_VL with DYN rounding mode. NFC (#114306) 2024-10-30 19:16:23 -07:00
Yingwei Zheng
cf9d1c1486
[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-31 08:10:07 +08:00
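As a rough illustration of node flags handled with plain bitwise logic (the SDNodeFlags commit above), here is a minimal sketch; the class and enumerator names are hypothetical and this is not the SDNodeFlags definition from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags wrapper: every enumerator is a single bit, so set,
// query and intersect are one bitwise operation each.
class NodeFlags {
public:
  enum : uint8_t {
    None = 0,
    NoUnsignedWrap = 1 << 0,
    NoSignedWrap = 1 << 1,
    Exact = 1 << 2,
  };

  NodeFlags(uint8_t B = None) : Bits(B) {}

  // True if every bit of Flag is set.
  bool has(uint8_t Flag) const { return (Bits & Flag) == Flag; }

  void set(uint8_t Flag, bool Value) {
    Bits = static_cast<uint8_t>(Value ? (Bits | Flag) : (Bits & ~Flag));
  }

  // Keep only the flags present on both nodes, e.g. when two otherwise
  // identical nodes are merged.
  void intersectWith(NodeFlags Other) { Bits &= Other.Bits; }

  uint8_t raw() const { return Bits; }

private:
  uint8_t Bits;
};
```

Because enumerators can be combined directly with `|`, callers can write `NodeFlags(NodeFlags::NoSignedWrap | NodeFlags::Exact)` instead of calling one setter per flag.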
Luke Lau
96f5c68350
[RISCV] Lower @llvm.experimental.vector.compress for zvfhmin/zvfbfmin (#113770)
This is a follow-up to #113291 and handles f16/bf16 with zvfhmin and
zvfbfmin.
2024-10-28 09:37:06 +00:00
Pengcheng Wang
b799cc3418
[RISCV] Add lowering for @llvm.experimental.vector.compress (#113291)
This intrinsic was introduced by #92289 and currently we just expand
it for RISC-V.

This patch adds custom lowering for this intrinsic and simply maps
it to the `vcompress` instruction.

Fixes #113242.
2024-10-23 14:22:32 +08:00
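The semantics of @llvm.experimental.vector.compress, as documented in the LangRef, can be modelled on scalars as below (a sketch; `compress` is a hypothetical name). The packing loop is the part a single `vcompress` performs; the passthru fill presumably corresponds to leaving the destination's tail undisturbed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model: active lanes are packed, in order, into the
// low lanes of the result; the remaining lanes keep the same lane of
// `passthru`.
std::vector<int> compress(const std::vector<int>& vec,
                          const std::vector<bool>& mask,
                          const std::vector<int>& passthru) {
  assert(vec.size() == mask.size() && vec.size() == passthru.size());
  std::vector<int> result(passthru);
  std::size_t out = 0;
  for (std::size_t i = 0; i < vec.size(); ++i)
    if (mask[i]) result[out++] = vec[i];
  return result;
}
```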
Sam Elliott
9b9c2a082c [RISCV][NFC] Move RISCVISD::TAIL beside RISCVISD::CALL 2024-10-22 11:12:58 -07:00
Craig Topper
1bc1a79a65
[RISCV] Support inline assembly 'f' constraint for Zfinx. (#112986)
This would allow some inline assembly code to work with either F or Zfinx.
This appears to match gcc behavior.
2024-10-18 18:17:23 -07:00
Sam Elliott
03dcd88c78
[RISCV][ISel] Ensure 'in X' Constraints prevent X0 (#112563)
I'm not sure if this fix is required, but I've written the patch anyway.
This does not cause test changes, but we haven't got tests that try to
use all 32 registers in inline assembly.

Broadly, for GPRs, we made the explicit choice that `r` constraints
would never attempt to use `x0`, because `x0` isn't really usable like
the other GPRs. I believe the same thing applies to `Zhinx`, `Zfinx` and
`Zdinx` because they should not be allocating operands to `x0` either,
so this patch introduces new `NoX0` classes for `GPRF16` and `GPRF32`
registers, and uses them with inline assembly. There is also a
`GPRPairNoX0` for the `Zdinx` case on rv32, avoiding use of the `x0`
pair which has different behaviour to the other GPR pairs.
2024-10-18 22:33:35 +01:00
Sam Elliott
228f88fdc8
[RISCV] Inline Assembly: RVC constraint and N modifier (#112561)
This change implements support for the `cr` and `cf` register
constraints (which allocate an RVC GPR or RVC FPR, respectively), and the
`N` modifier (which prints the raw encoding of a register rather than
the name).

The intention behind these additions is to make it easier to use inline
assembly when assembling raw instructions that are not supported by the
compiler, for instance when experimenting with new instructions or when
supporting proprietary extensions outside the toolchain.

These implement part of my proposal in riscv-non-isa/riscv-c-api-doc#92.

As part of the implementation, I felt there was not enough coverage of
inline assembly and the "in X" floating-point extensions, so I have
added more regression tests around these configurations.
2024-10-18 10:40:38 +01:00
Roger Ferrer Ibáñez
9d469b5988
[RISCV] Implement trampolines for rv64 (#96309)
This implementation is based on what the X86 target does, but it
emits the instructions that GCC emits for rv64.

---------

Co-authored-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
2024-10-18 08:06:47 +02:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00
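For the APInt commit above, the property the new assertion checks can be sketched for widths up to 64 bits. `fitsBitWidth` is a hypothetical helper written from the commit description, not APInt's implementation. For example, `APInt(8, -1)` passes `0xffffffffffffffff`, which is an 8-bit signed value but not an 8-bit unsigned one, so such a call site needs `isSigned` (or `implicitTrunc`) set:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: is the raw 64-bit value an N-bit unsigned integer
// (IsSigned == false) or an N-bit signed integer (IsSigned == true)?
bool fitsBitWidth(uint64_t Val, unsigned N, bool IsSigned) {
  assert(N >= 1 && N <= 64);
  if (N == 64)
    return true; // every 64-bit pattern is a valid i64 either way
  if (!IsSigned)
    return (Val >> N) == 0; // no bits set at or above bit N
  const int64_t S = static_cast<int64_t>(Val); // two's-complement view
  const int64_t Min = -(int64_t{1} << (N - 1));
  const int64_t Max = (int64_t{1} << (N - 1)) - 1;
  return S >= Min && S <= Max;
}
```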
Luke Lau
2b6b7f664d
[RISCV] Mark math functions as expanded for zvfhmin/zvfbfmin (#112508)
For regular floating point types we mark these as expanded on scalable
vectors so they're not legal in the cost model, so this does the same
for f16 w/ zvfhmin and bf16.
2024-10-16 21:40:37 +01:00
Luke Lau
e88bcc1204
[RISCV] Lower vector_splice on zvfhmin/zvfbfmin (#112579)
Similar to other permutation ops, we can just reuse the existing
lowering.
2024-10-16 21:40:18 +01:00
Luke Lau
f6c23222a4
[RISCV] Promote fixed-length bf16 arith vector ops with zvfbfmin (#112393)
The aim is to have the same set of promotions on fixed-length bf16
vectors as on fixed-length f16 vectors, and then deduplicate them
similarly to what was done for scalable vectors.

It looks like fneg/fabs/fcopysign end up getting expanded because fsub
is now legal, and the default operation action must be expand.
2024-10-15 22:49:05 +01:00
YunQiang Su
c01ddbe916
RISC-V: Select FCANONICALIZE (#112083)
We can use `FMIN.x OP,OP` to canonicalize a float.
2024-10-14 14:12:36 +08:00
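For the FCANONICALIZE commit above: RISC-V FMIN returns the canonical NaN when both operands are NaN and otherwise returns an ordinary value unchanged, so `fmin x, x` canonicalizes. The sketch below models `fmin.s` on raw single-precision bit patterns, written from our reading of the F-extension spec; the function names are hypothetical and it is not the code from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of RISC-V fmin.s on raw IEEE single-precision bits.
constexpr uint32_t kCanonicalNaN = 0x7fc00000u;

bool isNaNBits(uint32_t B) {
  return (B & 0x7f800000u) == 0x7f800000u && (B & 0x007fffffu) != 0;
}

float fromBits(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

uint32_t riscvFminS(uint32_t A, uint32_t B) {
  // NaN handling is done on the bits, so signaling NaNs are never
  // passed through host floating-point operations.
  if (isNaNBits(A) && isNaNBits(B))
    return kCanonicalNaN;
  if (isNaNBits(A))
    return B;
  if (isNaNBits(B))
    return A;
  const float FA = fromBits(A), FB = fromBits(B);
  if (FA == FB) // equal values; differs only for +0.0 vs -0.0
    return (A >> 31) ? A : B; // -0.0 is treated as less than +0.0
  return FA < FB ? A : B;
}

// canonicalize(x) as fmin(x, x): any NaN becomes the canonical NaN; all
// other values, including subnormals, come back bit-for-bit unchanged.
uint32_t canonicalize(uint32_t X) { return riscvFminS(X, X); }
```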
Jim Lin
dba54fb074
[RISCV] Add support for inline asm constraint vd (#111653)
It constrains vector registers excluding v0. See the RISC-V section of
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html.

This patch also adds a testcase for constraints vr, vd and vm.
2024-10-14 10:47:59 +08:00
Craig Topper
902520256b [RISCV] Make (sext_inreg X, i1) legal for XTHeadBb to cover the existing isel pattern.
I just happened to notice the untested isel pattern.
2024-10-11 16:16:07 -07:00
Daniel Mokeev
26b832a9ec
[RISCV] Add DAG combine to turn (sub (shl X, 8-Y), (shr X, Y)) into orc.b (#111828)
This patch generalizes the DAG combine for
`(sub (shl X, 8), X) => (orc.b X)` into
`(sub (shl X, 8 - Y), (srl X, Y)) => (orc.b X)`.

Alive2 generalized proof: https://alive2.llvm.org/ce/z/dFcf_n
Related issue: https://github.com/llvm/llvm-project/issues/96595
Related PR: https://github.com/llvm/llvm-project/pull/96680
2024-10-11 20:41:47 +08:00
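For the orc.b combine above, the identity can be checked on scalars. This sketch assumes, as our reading of the linked Alive2 proof rather than a restatement of the patch's exact conditions, that every byte of `X` has no bit set other than bit `Y`; the final test case shows the identity failing without that assumption. Function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// orc.b (Zbb): each byte becomes 0xff if it is non-zero, else 0x00.
uint64_t orcB(uint64_t X) {
  uint64_t R = 0;
  for (unsigned Byte = 0; Byte < 8; ++Byte)
    if ((X >> (8 * Byte)) & 0xff)
      R |= uint64_t{0xff} << (8 * Byte);
  return R;
}

// The matched pattern: (sub (shl X, 8 - Y), (srl X, Y)), wrapping mod 2^64.
// If X == B << Y with every byte of B in {0, 1}, this is B * 255, i.e. each
// non-zero byte becomes 0xff.
uint64_t subShlSrl(uint64_t X, unsigned Y) {
  assert(Y < 8);
  return (X << (8 - Y)) - (X >> Y);
}
```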
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., lookup only, no creation).
2024-10-11 05:26:03 -07:00
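For the rename above, a toy model of the two behaviours being distinguished (lookup only versus get-or-insert). The types and method names are hypothetical stand-ins, not LLVM's `Module` or `Intrinsic` API:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for a module's function table.
struct Function {
  std::string Name;
};

class ToyModule {
public:
  // Lookup only: returns nullptr if no such function exists.
  Function* getFunction(const std::string& Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : It->second.get();
  }

  // Get-or-insert: returns the existing entry, or creates a declaration.
  Function* getOrInsert(const std::string& Name) {
    auto& Slot = Functions[Name];
    if (!Slot)
      Slot = std::make_unique<Function>(Function{Name});
    return Slot.get();
  }

private:
  std::map<std::string, std::unique_ptr<Function>> Functions;
};
```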
Luke Lau
a3cd269fbe
[RISCV] Remove {s,u}int_to_fp custom op action for f16/bf16 (#111471)
It turns out that {s,u}int_to_fp nodes get their operation action from
their operand's type, not the result type, so we don't need to set it
for fp16 or bf16. vp_{s,u}int_to_fp uses the result type though so we
need to keep it.

This also means that we can lower int_to_fp for fixed length bf16
vectors already, so this adds tests for that.

The cost model test changes are due to BasicTTIImpl's getCastInstrCost
not taking into account that int_to_fp needs its legal type swapped.
This can be fixed in a later patch, but it's worth noting that the
affected types in the tests currently crash when lowered anyway (due to
them needing to be split at LMUL > 8).
2024-10-10 14:40:24 +01:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Yingwei Zheng
9cf8c094c7
[RISCV][DAGCombine] Combine sext_inreg (shl X, Y), i32 into sllw X, Y (#111101)
Alive2: https://alive2.llvm.org/ce/z/ncf36D
2024-10-04 16:03:09 +08:00
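For the sllw combine above, a scalar model of both sides. The sketch assumes the shift amount is known to be below 32; as the last test cases show, the two differ at 32 because `sllw` only uses the low five bits of the amount, so the linked Alive2 proof is the place to check the precondition the patch actually relies on. Function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// sllw (RV64I): shift the low 32 bits of X left by Y[4:0], then
// sign-extend the 32-bit result to 64 bits.
int64_t sllw(uint64_t X, uint64_t Y) {
  const uint32_t Lo = static_cast<uint32_t>(X) << (Y & 31);
  return static_cast<int32_t>(Lo); // two's-complement reinterpretation
}

// The DAG pattern (sext_inreg (shl X, Y), i32): shift in 64 bits, then
// sign-extend from bit 31.
int64_t sextInregShl(uint64_t X, uint64_t Y) {
  assert(Y < 64); // a 64-bit shl by >= 64 is poison
  const uint64_t Shifted = X << Y;
  return static_cast<int32_t>(static_cast<uint32_t>(Shifted));
}
```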
Luke Lau
487686b82e
[SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000)
In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.

However I don't believe it's correct to promote f16 fadd or fmul
reductions to f32 since we need to round the intermediate results.

Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two
different results depending on whether we compiled with +zvfh or
+zvfhmin, for example with a 3 element reduction:

	; v9 = [0.1563, 5.97e-8, 0.00006104]

	; zvfh
	vsetivli x0, 3, e16, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v9, v8
	vfmv.f.s fa0, v8
	; fa0 = 0.1563

	; zvfhmin
	vsetivli x0, 3, e16, m1, ta, ma
	vfwcvt.f.f.v v10, v9
	vsetivli x0, 3, e32, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v10, v8
	vfmv.f.s fa0, v8
	fcvt.h.s fa0, fa0
	; fa0 = 0.1564

The same thing happens with reassociative reductions, e.g. vfredusum.vs,
and this also applies for bf16.

I couldn't find anything in the LangRef for reductions that suggests the
excess precision is allowed. There may be something we can do in Clang
with -fexcess-precision=fast, but I haven't looked into this yet.

I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum.

I can't think of another way of lowering these other than scalarizing,
and we can't scalarize scalable vectors, so this just removes the
promotion and adjusts the cost model to return an invalid cost. (It
looks like we also don't currently cost fmul reductions, so presumably
they also have an invalid cost?)

I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
2024-10-04 00:17:45 +08:00
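For the VP reduction commit above, the rounding difference can be reproduced on a host without f16 support by using float and double as stand-ins for f16 and f32. This sketch assumes IEEE round-to-nearest-even and no excess precision for float arithmetic (e.g. x86-64 with SSE); the function names are hypothetical. The inputs have the same shape as the example in the commit: a leading value, a tiny addend that is rounded away, and an addend that lands exactly on a rounding tie:

```cpp
#include <cassert>
#include <vector>

// Ordered reduction with every partial sum rounded to float
// (the analogue of vfredosum at e16 with zvfh).
float sumNarrow(const std::vector<float>& V) {
  float Acc = 0.0f;
  for (float X : V) Acc += X;
  return Acc;
}

// Promoted reduction: accumulate in double, round to float once at the
// end (the analogue of vfwcvt + vfredosum at e32 + fcvt.h.s with zvfhmin).
float sumWidened(const std::vector<float>& V) {
  double Acc = 0.0;
  for (float X : V) Acc += X;
  return static_cast<float>(Acc);
}
```

With `{1.0f, 0x1p-30f, 0x1p-24f}`, the narrow sum drops 2^-30 and then hits an exact tie that rounds to even (1.0f), while the widened sum keeps 2^-30, lands just above the tie and rounds up to 1.0f + 2^-23.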
Keith Packard
ca57e8f23f
[RISCV] Support -mstack-protector-guard=tls (#108942)
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value.

Closes: #46685
2024-10-02 16:33:31 -07:00
Luke Lau
1fa4a74d53
[RISCV] Lower insert_vector_elt on zvfhmin/zvfbfmin (#110221)
This is the dual of #110144, but doesn't handle the case when the scalar
type is illegal, i.e. no zfhmin/zfbfmin. It looks like softening isn't
yet implemented for insert_vector_elt operands and it will crash during
type legalization, so I've left that configuration out of the tests.
2024-10-02 15:26:25 +08:00
Luke Lau
30f58ab17f
[RISCV] Lower vector_reverse for zvfhmin/zvfbfmin (#110218)
Previously we crashed because we had no lowering for f16/bf16 scalable
vectors.
Because the lowering uses vrgather_vv_vl, we need to add bf16 patterns
for it.
2024-10-02 14:25:15 +08:00
Craig Topper
8ed18eded9
[RISCV] Add correct MachinePointerInfo when putting arguments on the stack. (#110140)
Previously we used an empty MachinePointerInfo. I checked a few other
targets like X86, ARM, and AArch64 and they all appear to use correct
MachinePointerInfo.
2024-09-27 13:48:01 -07:00
Jesse Huang
9bdcf7aa18
[RISCV] Software guard direct calls in large code model (#109377)
Support for the large code model was added recently, and semantically,
direct calls are lowered to an indirect branch with a constant pool target.
By default it does not use the x7 register, and this is suboptimal with
Zicfilp because it introduces a landing pad check, which is unnecessary
since the constant pool is read-only and unlikely to be tampered with.

Change direct calls and tail calls to use x7 as the scratch
register (a.k.a. software guarded branch in the CFI spec).
2024-09-27 13:04:16 +08:00
Luke Lau
2b84ef06ac
[RISCV] Handle f16/bf16 extract_vector_elt when scalar type is legal (#110144)
When the scalar type is illegal, it gets softened during type
legalization and gets lowered as an integer.

However with zfhmin/zfbfmin the type is now legal and it passes through
type legalization where it crashes because we didn't have any custom
lowering or patterns for it.

This handles said case via the existing custom lowering to a vslidedown
and vfmv.f.s.
It also handles the case where we only have zvfhmin/zvfbfmin and don't
have vfmv.f.s, in which case we need to extract it to a GPR and then use
fmv.h.x.

Fixes #110126
2024-09-27 08:00:59 +08:00
Craig Topper
bd592b11c3 [RISCV] Minor cleanups to lowerInterleaveIntrinsicToStore and lowerDeinterleaveIntrinsicToLoad. NFC
-Reduce the scope of some variables.
-Use getArgOperand instead of getOperand to get intrinsic operands.
-Use initializer_list instead of a SmallVector.
-Remove wide VectorType variable that is only used to check fixed vs
 scalable. We can use the narrow VectorType for that.
2024-09-25 21:37:37 -07:00
Craig Topper
cf1de0a7b4 [RISCV] Reuse Factor variable instead of hardcoding 2 in other places. NFC 2024-09-25 16:36:18 -07:00
Luke Lau
f172c31a57
[RISCV] Lower memory ops and VP splat for zvfhmin and zvfbfmin (#109387)
We can lower f16/bf16 memory ops without promotion through the existing
custom lowering.

Some of the zero strided VP loads get combined to a VP splat, so we need
to also handle the lowering for that for f16/bf16 w/ zvfhmin/zvfbfmin.
This patch copies the lowering from ISD::SPLAT_VECTOR over to
lowerScalarSplat which is used by the VP splat lowering.
2024-09-26 01:47:46 +08:00
Craig Topper
3c348bf543
[RISCV] Fold (fmv_x_h/w (load)) to an integer load. (#109900) 2024-09-25 10:29:44 -07:00
Kazu Hirata
cd53c8429e [RISCV] Fix a warning
This patch fixes:

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:10479:12: error:
  variable 'SubRegIdx' set but not used
  [-Werror,-Wunused-but-set-variable]
2024-09-24 16:22:53 -07:00
Craig Topper
1f9ca89798
[RISCV] Don't create insert/extract subreg during lowering. (#109754)
Create the equivalent INSERT_SUBVECTOR/EXTRACT_SUBVECTOR instead.

When we tried porting this to global isel, we noticed that subreg
operations are created early. We aren't able to do this until
instruction selection in global isel.

For SelectionDAG, it makes sense to use insert/extract_subvector as the
canonical form for these operations pre-isel. If it had come into
SelectionDAG as an insert/extract_subvector, we would have kept it in that
form.
2024-09-24 15:54:49 -07:00
Craig Topper
e64673d317
[RISCV] Treat insert_subvector into undef with index==0 as legal. (#109745)
Regardless of whether the type is fixed or scalable, we can always use
subreg ops.

We don't need to do any container conversion.
2024-09-24 09:49:32 -07:00
Piotr Fusik
cc7b24a4d1
[NFC] Fix typos in comments (#109765) 2024-09-24 11:19:56 +02:00
Craig Topper
23558afaf2
[RISCV] Hoist duplicate code in lowerINSERT_SUBVECTOR. NFC (#109733) 2024-09-23 19:32:33 -07:00
Craig Topper
af1cf699f0
[RISCV] Move OrigIdx == 0 check to start of lowerEXTRACT_SUBVECTOR. NFC (#109731)
Allows us to remove a separate check of OrigIdx != 0 for the mask case.
2024-09-23 18:21:59 -07:00
Craig Topper
079f31c11f
[RISCV] Move the rest of Zfa FLI instruction handling to lowerConstantFP. (#109217)
We already moved the fneg case. This moves the rest so we can drop the
custom isel.
2024-09-19 15:16:10 -07:00
Craig Topper
8e4909aa19
[RISCV] Remove unnecessary vand.vi from vXi1 and nvXvi1 VECTOR_REVERSE codegen. (#109071)
Use a setne with 0 instead of a trunc. We know we zero extended the node
so we can get by with a non-zero check only. The truncate lowering
doesn't know that we zero extended so has to mask the lsb.

I don't think DAG combine sees the trunc before we lower it to RISCVISD
nodes so we don't get a chance to use computeKnownBits to remove the
AND.
2024-09-18 09:43:48 -07:00
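For the VECTOR_REVERSE commit above, the reasoning on scalars: after zero-extension each byte is 0 or 1, and for those values a compare against zero gives the same i1 as keeping the low bit, so the AND with 1 is redundant. A minimal sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// trunc to i1 keeps only the least significant bit, so it must mask.
bool truncToI1(uint8_t B) { return (B & 1) != 0; }

// setne with 0 is a single compare with no masking.
bool setneZero(uint8_t B) { return B != 0; }
```

The two only disagree on byte values such as 2, which a zero-extended i1 can never produce.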
Luke Lau
737f56fdf7 [RISCV] Deduplicate zvfhmin and zvfbfmin operation actions. NFC
After #108937 fp16 w/o zvfh and bf16 are now in sync and should have
the same lowering.
2024-09-18 18:07:11 +08:00
Luke Lau
edac1b2d63
[RISCV] Promote bf16 ops to f32 with zvfbfmin (#108937)
For f16 with zvfhmin, we promote most ops and VP ops to f32. This does
the same for bf16 with zvfbfmin, so the two fp types should now be in
sync.

There are a few places in the custom lowering where we need to check for
an LMUL 8 f16/bf16 vector that can't be promoted and must be split; this
extracts that check into isPromotedOpNeedingSplit.

In a follow up NFC we can deduplicate the code that sets up the
promotions.
2024-09-18 17:39:40 +08:00
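For the bf16 promotion commit above, the reason LMUL 8 is the problem case: promoting f16/bf16 to f32 doubles the element width and therefore LMUL, and LMUL cannot exceed 8. The sketch assumes 64 bits per vscale unit (the value LLVM uses as `RVVBitsPerBlock`); the helper names are hypothetical and it is not the `isPromotedOpNeedingSplit` code from the patch:

```cpp
#include <cassert>

// LMUL of the scalable type <vscale x MinElts x iEltBits>, assuming 64 bits
// per vscale unit. Fractional LMULs come out as 0, which is enough here.
unsigned lmulOf(unsigned MinElts, unsigned EltBits) {
  return MinElts * EltBits / 64;
}

// Promoting a 16-bit element type to 32 bits doubles LMUL; anything that
// would need more than LMUL 8 has to be split before promotion.
bool promotedOpNeedsSplit(unsigned MinElts, unsigned EltBits) {
  assert(EltBits == 16);
  return lmulOf(MinElts, 2 * EltBits) > 8;
}
```

For example, nxv32f16 is LMUL 8, and its promoted form nxv32f32 would need LMUL 16.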
Luke Lau
8d7d4c25cb
[RISCV] Split fp rounding ops with zvfhmin nxv32f16 (#108765)
This adds zvfhmin test coverage for fceil, ffloor, fnearbyint, frint,
fround and froundeven and splits them at nxv32f16 to avoid crashing,
similarly to what we do for other nodes that we promote.

This also sets ftrunc to promote which was previously missing. We
already promote the VP version of it, vp_froundtozero.
Marking it as promoted affects some of the cost model tests since
they're no longer expanded.
2024-09-18 16:36:13 +08:00
Mikhail R. Gadelha
d2125e1db6
[RISCV] Support STRICT_UINT_TO_FP and STRICT_SINT_TO_FP (#102503)
This patch adds support for the missing STRICT_UINT_TO_FP and
STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was
previously crashing.

The code is in line with how other strict_* nodes are handled
(e.g., getting op(1) instead of op(0) when it's a strict node, as op(0)
in a strict node is the entry token).
2024-09-17 11:21:52 -03:00
Luke Lau
6af2f225a0
[RISCV] Restrict combineOp_VLToVWOp_VL w/ bf16 to vfwmadd_vl with zvfbfwma (#108798)
When folding an op into an f16 widening op, we currently check that we
have zvfh. We need to do the same for bf16 vectors, but with the further
restriction that we can only combine vfmadd_vl to vfwmadd_vl (to get
vfwmaccbf16.v{v,f}).

The added test case currently crashes because we try to fold an add to a
bf16 widening add, which doesn't exist in zvfbfmin or zvfbfwma.

This moves the checks into the extension support checks to keep them in
one place.
2024-09-17 13:35:25 +08:00
Kazu Hirata
1e4e1ceeeb
[Target] Avoid repeated hash lookups (NFC) (#108677) 2024-09-14 07:39:09 -07:00
Craig Topper
ee4582f9c8
[RISCV] Use CCValAssign::getCustomReg for fixed vector arguments/returns with RVV. (#108470)
We need to insert an insert_subvector or extract_subvector, which feels
pretty custom.

This should make it easier to support fixed vector arguments for GISel.
2024-09-13 07:23:44 -07:00