1378 Commits

Author SHA1 Message Date
Craig Topper
e64f5d6305
[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682)
ISD::VP_MERGE treats the false operand as the source for elements past
VL. The vmerge instruction encodes 3 registers and treats the vd
register as the source for the tail.

This patch adds a new ISD opcode that models the tail source explicitly.
During lowering we copy the false operand to this operand.

I think we can merge RISCVISD::VSELECT_VL with this new opcode by using
an UNDEF passthru, but I'll save that for another patch.
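
The difference between the two nodes can be modelled on plain arrays. This is only a rough sketch: `vpMerge`, `mergeWithPassthru` and `lowerVpMerge` are hypothetical helper names for illustration, not LLVM APIs, and all vectors are assumed to have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of ISD::VP_MERGE: lanes at or past evl come from onFalse.
std::vector<int> vpMerge(const std::vector<bool> &mask,
                         const std::vector<int> &onTrue,
                         const std::vector<int> &onFalse, std::size_t evl) {
  std::vector<int> result(onFalse.size());
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = (i < evl && mask[i]) ? onTrue[i] : onFalse[i];
  return result;
}

// Scalar model of a merge with an explicit tail source (the passthru).
std::vector<int> mergeWithPassthru(const std::vector<bool> &mask,
                                   const std::vector<int> &onTrue,
                                   const std::vector<int> &onFalse,
                                   const std::vector<int> &passthru,
                                   std::size_t vl) {
  std::vector<int> result(passthru.size());
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = (i < vl) ? (mask[i] ? onTrue[i] : onFalse[i]) : passthru[i];
  return result;
}

// Lowering as described above: the false operand is copied to the passthru.
std::vector<int> lowerVpMerge(const std::vector<bool> &mask,
                              const std::vector<int> &onTrue,
                              const std::vector<int> &onFalse,
                              std::size_t evl) {
  return mergeWithPassthru(mask, onTrue, onFalse, onFalse, evl);
}
```

With the false operand reused as the passthru, both models agree on every lane; a different passthru only changes the lanes at or past VL.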
2023-12-21 14:34:49 -08:00
Craig Topper
0dcff0db3a
[RISCV] Add codegen support for experimental.vp.splice (#74688)
IR intrinsics were already defined, but no codegen support had been
added.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
2023-12-21 08:38:32 -08:00
Yeting Kuo
9b561ca044
[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020)
Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.
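
A loose analogy in standard C++, assuming "without rounding mode" means the conversion uses the current (dynamic) rounding mode: rounding with `std::rint` and then converting gives the same result as a single `std::lrint`. The function names are illustrative only, and the inputs are assumed to be in range for `long`.

```cpp
#include <cassert>
#include <cmath>

// Two steps: round to an integral value in the current rounding mode,
// then convert to an integer type.
long roundThenConvert(double x) { return static_cast<long>(std::rint(x)); }

// One step: convert to an integer type using the current rounding mode.
long convertWithCurrentRounding(double x) { return std::lrint(x); }
```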
2023-12-21 15:03:36 +08:00
Yeting Kuo
b7376c3196
[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014)
performFP_TO_INT_SATCombine can also handle the pattern (fp_to_int_sat
(frint X)).
2023-12-20 14:56:28 +08:00
Yeting Kuo
cdc0392669
[RISCV] Update implies for subtarget feature. (#75824)
PR #75576 and #75735 update some implies in
llvm/lib/Support/RISCVISAInfo.cpp, but both of them miss the subtarget
feature part.
This patch still preserves the predicates HasStdExtZfhOrZfhmin and
HasStdExtZhinxOrZhinxmin, since they could make error messages more
readable. (Users might not know that zfh implies zfhmin.)
2023-12-19 09:47:46 +08:00
Jie Fu
b6cce87110 [RISCV] Fix -Wbraced-scalar-init in RISCVISelLowering.cpp (NFC)
llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:339:24:
 error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
  339 |     setOperationAction({ISD::ROTL}, XLenVT, Expand);
      |                        ^~~~~~~~~~~
1 error generated.
2023-12-17 19:59:42 +08:00
melonedo
3eaed9e6f5
[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993)
Implement XCVbitmanip intrinsics for CV32E40P according to the
specification.

This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.

Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie.

Spec:
05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip).

Previously reviewed on Phabricator: https://reviews.llvm.org/D157510.
Parallel GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html.

Co-authored-by: melonedo <funanzeng@gmail.com>
2023-12-17 19:29:40 +08:00
Philip Reames
e8a15eca92
[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531)
If we're lowering a fixed length vector load or store which happens to
be exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit strided variants. This
doesn't require a vsetvli in some cases, allows additional flexibility
of vsetvli cases in others, and doesn't have a runtime dependency on the
value of VL.
2023-12-15 09:26:57 -08:00
Philip Reames
632f1c5d18
[RISCV] When VLEN is exactly known, prefer VLMAX encoding for vsetvli (#75412)
If we know the exact VLEN, then we can tell if the AVL for particular
operation is equivalent to the vsetvli xN, zero, <vtype> encoding. Using
this encoding is better than having to materialize an immediate in a
register, but worse than being able to use the vsetivli zero, imm,
<type> encoding.
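
The preference order described above can be sketched as a small decision function. This is a hedged illustration, not the code from the patch: `AvlEncoding`, `chooseAvlEncoding` and the `lmulTimes8` parameter (LMUL scaled by 8 so that fractional LMUL stays an integer) are made-up names, and the limit of 31 assumes the 5-bit unsigned immediate of vsetivli.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class AvlEncoding { Immediate, VLMax, Register };

// Pick the cheapest way to express the AVL, in the order given above:
// vsetivli immediate, then the VLMAX form, then a materialized register.
AvlEncoding chooseAvlEncoding(std::uint64_t avl,
                              std::optional<unsigned> exactVlenBits,
                              unsigned sewBits, unsigned lmulTimes8) {
  assert(sewBits != 0);
  if (avl < 32) // Fits the 5-bit unsigned immediate of vsetivli.
    return AvlEncoding::Immediate;
  if (exactVlenBits) {
    // VLMAX = VLEN * LMUL / SEW, with LMUL passed as LMUL * 8.
    std::uint64_t vlmax =
        std::uint64_t{*exactVlenBits} * lmulTimes8 / (8ull * sewBits);
    if (avl == vlmax)
      return AvlEncoding::VLMax;
  }
  return AvlEncoding::Register;
}
```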
2023-12-13 17:51:03 -08:00
Philip Reames
12af9c8337 [RISCV] Extract a utility for computing bounds on VLMAX [nfc]
Simplifying an upcoming change...
2023-12-13 13:40:18 -08:00
Craig Topper
2c185709bc
[RISCV] Remove setJumpIsExpensive(). (#74647)
Middle end optimizations can speculate away the short circuit
behavior of C/C++ && and ||, using i1 and/or or logical select
instructions and a single branch.

SelectionDAGBuilder can turn i1 and/or/select back into multiple
branches, but this is disabled when jump is expensive.

RISC-V can use slt(u)(i) to evaluate a condition into any GPR which
makes us better than other targets that use a flag register. RISC-V also
has single instruction compare and branch. So it's not clear from a code
size perspective that using compare+and/or is better.

If the full condition is dependent on multiple loads, using logic
delays the branch resolution until all the loads are resolved even if
there is a cheap condition that makes the loads unnecessary.
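
At the source level the two shapes look roughly like this. It is an illustrative sketch with hypothetical function names; a compiler is free to turn either form into the other, which is the choice this patch is about.

```cpp
#include <cassert>

// Short-circuit form: the second load is skipped when the first test fails,
// so the branch can resolve as soon as *cheap is known.
bool bothSetBranchy(const int *cheap, const int *costly) {
  return *cheap != 0 && *costly != 0;
}

// Logic form: both loads are performed and combined before a single test,
// so the result waits on the slower of the two loads.
bool bothSetLogic(const int *cheap, const int *costly) {
  return (*cheap != 0) & (*costly != 0);
}
```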

PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive.
NVPTX and AMDGPU also use it but they are GPU targets. PowerPC appears
to have a MachineIR pass that turns AND/OR of CR bits into multiple
branches. I don't know anything about Lanai and their reason for using
setJumpIsExpensive.

I think the decision to use logic vs branches is much more nuanced than
this big hammer. So I propose to make RISC-V match other CPU targets.

Anyone who wants the old behavior can still pass -mllvm
-jump-is-expensive=true.
2023-12-13 09:37:25 -08:00
Craig Topper
8227072f5a [RISCV] Add missing break to last case in switch. NFC 2023-12-12 13:52:52 -08:00
Craig Topper
3c5b42acd3
[RISCV] Allocate the varargs GPR save area as a single object. (#74354)
Previously we allocated one object for each GPR. We also allocated the
same offset twice, once to save for VASTART and then again for the first
register in the save loop.

This patch uses a single object for all the registers and shares this
with VASTART. This is more consistent with other targets like AArch64
and ARM.

I've removed the setValue(nullptr) from the memory operand now. Having a
single object makes me a lot more comfortable about alias analysis being
able to see what is going on. This led to the scheduling changes in
push-pop-popret.ll and vararg.ll.
2023-12-05 10:30:01 -08:00
Craig Topper
b73d79fda8 [RISCV] Fix typo in comment. NFC
This should say "Assume that VL output is <= 65536".
2023-12-04 14:15:49 -08:00
Craig Topper
47fe9fcaf2
[RISCV] Share ArgGPRs array between SelectionDAG and GISel. (#74152)
This will allow us to isolate the EABI from D70401 to this new function.
2023-12-04 11:29:54 -08:00
Craig Topper
26fc26c184
[RISCV] Simplify computation of VarArgsSaveSize. NFC (#74209)
The computation we use for computing the size already returns 0 when all
registers are allocated. We don't need an if to set it to 0.

Use the size being 0 to check for whether we need to spill registers or
not.

I have another change I want to make to this code, but this change
seemed to stand on its own. I left the curly braces since I need them
for the other change.
2023-12-03 20:35:12 -08:00
Philip Reames
e817966718
[RISCV] Collapse fast unaligned access into a single feature [nfc-ish] (#73971)
When we'd originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under the
theory that some processor might implement one, but not the other. At
the moment, we don't have evidence of such a processor. The C/C++ level
interface, and the clang driver command lines have settled on a single
unaligned flag which indicates that both scalar and vector support
unaligned access.
Given that, let's remove the test matrix complexity for a set of
configurations which don't appear useful.

Given these are internal feature names, I don't think we need to provide
any forward compatibility. Anyone disagree?

Note: The immediate trigger for this patch was finding another case
where the unaligned-vector-mem wasn't being properly serialized to IR
from clang which resulted in problems reproducing assembly from clang's
-emit-llvm feature. Instead of fixing this, I decided getting rid of the
complexity was the better approach.
2023-12-01 11:00:59 -08:00
Philip Reames
ff5e536b5e
[RISCV] Add combines to form binop from tail insert idioms (#72675)
This patch contains two related combines:
1) If we have a scalar vector insert into the result of a concat_vector,
   sink the insert into the operand of the concat.
2) If we have an insert of a scalar binop into a vector binop of the
   same opcode and the RHS of both are constant, perform the insert
   and then the binop.

The common theme to both is pushing inserts closer to the sources of the
computation graph. The goal is to enable forming vector bin ops from
inserts of scalar binops at the end of another vector.
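
The second combine can be checked on small arrays. This is a sketch under assumptions: it uses integer add as the binop, four-element vectors, and hypothetical helper names, none of which come from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;

Vec4 addVec(Vec4 a, const Vec4 &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] += b[i];
  return a;
}

Vec4 insertElt(Vec4 v, int s, std::size_t idx) {
  v[idx] = s;
  return v;
}

// Before the combine: a scalar add is inserted into the result of a
// vector add.
Vec4 insertOfBinops(const Vec4 &v, const Vec4 &cv, int s, int cs,
                    std::size_t idx) {
  return insertElt(addVec(v, cv), s + cs, idx);
}

// After the combine: both operands are inserted first, then one vector add
// covers the inserted lane as well.
Vec4 binopOfInserts(const Vec4 &v, const Vec4 &cv, int s, int cs,
                    std::size_t idx) {
  return addVec(insertElt(v, s, idx), insertElt(cv, cs, idx));
}
```

Since `cv` and `cs` are constants, inserting `cs` into `cv` is just another constant vector, so the scalar add disappears.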

For RISCV specifically, the concat_vector transform will push inserts to
smaller vectors. This will have the effect of reducing lmul for the
vslides, and usually doesn't require an additional vsetvli since
the source vectors are already working in the narrower VL. I tried
that one as a target independent combine first, and it doesn't appear
profitable on all targets.

This is only one approach to the problem. Another idea would be to
aggressively form build_vectors and subvector inserts from the
individual scalar inserts, and then have a transform which sunk a
subvector_insert down through the concat. The advantage of the alternate
approach is that we expose parallelism in the insert sequence, even if
the source vector isn't a concat_vector. If reviewers are okay with it,
I'd like to start with this approach, and then explore that direction in
a follow up patch.
2023-11-30 07:32:42 -08:00
Craig Topper
e3021bdecd [RISCV] Add RISCVISD::SLLW to computeKnownBitsForTargetNode.
Found while investigating whether we still need to stop DAG combiner
from turning (i64 (sext (i32 X))) into zext when i32 is known non
negative.

No test case because I still need to find fixes for some other issues
before I can remove the code from DAGCombiner.
2023-11-29 16:21:43 -08:00
Yeting Kuo
f35c0f2f23
[RISCV] Refine pattern (select_cc seteq (and x, C), 0, 0, A) with Zbs. (#73746)
PR #72978 disabled transformation (select_cc seteq (and x, C), 0, 0, A)
-> (and (sra(shl x)), A) for better Zicond codegen. It still enables the
combine when C does not fit into 12 bits. This patch disables the combine
when Zbs is enabled.
2023-11-29 13:09:47 +08:00
Yeting Kuo
f73844d92b
[RISCV] Generate bexti for (select(setcc eq (and x, c))) where c is power of 2. (#73649)
Currently, llvm can transform (setcc ne (and x, c)) to (bexti x,
log2(c)) where c is power of 2.
This patch transforms (select (setcc ne (and x, c)), T, F) into (select
(setcc eq (and x, c)), F, T).
This benefits the case where c does not fit into 12 bits.
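
Both select forms pick the same value, and the single-bit test itself is a shift and mask. The following is a sketch with illustrative function names, where `bit` stands for log2(c); it does not model instruction selection.

```cpp
#include <cassert>
#include <cstdint>

// Single-bit extract for c == 1 << bit, the operation bexti performs.
std::uint64_t extractBit(std::uint64_t x, unsigned bit) {
  return (x >> bit) & 1;
}

// (select (setcc ne (and x, c), 0), t, f)
std::int64_t selectNe(std::uint64_t x, unsigned bit, std::int64_t t,
                      std::int64_t f) {
  return (x & (std::uint64_t{1} << bit)) != 0 ? t : f;
}

// (select (setcc eq (and x, c), 0), f, t), with the test done by bit extract.
std::int64_t selectEqSwapped(std::uint64_t x, unsigned bit, std::int64_t t,
                             std::int64_t f) {
  return extractBit(x, bit) == 0 ? f : t;
}
```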
2023-11-29 11:56:48 +08:00
Philip Reames
02cbae4fe0
[RISCV] Work on subreg for insert_vector_elt when vlen is known (#72666) (#73680)
If we have a constant index and a known vlen, then we can identify which
register out of a register group is being accessed. Given this, we can
reuse the (slightly generalized) existing handling for working on
sub-register groups. This results in all constant index extracts with
known vlen becoming m1 operations.

One bit of weirdness to highlight and explain: the existing code uses
the VL from the original vector type, not the inner vector type. This is
correct because the inner register group must be smaller than the
original (possibly fixed length) vector type. Overall, this seems to be a
reasonable codegen tradeoff as it biases us towards immediate AVLs,
which avoids needing the vsetvli form which clobbers a GPR for no real
purpose. The downside is that for large fixed length vectors, we end up
materializing an immediate in register for little value. We should
probably generalize this idea and try to optimize the large fixed length
vector case, but that can be done in separate work.
2023-11-28 10:45:22 -08:00
Philip Reames
f3a9dbe7fc
[RISCV] Split build_vector into vreg sized pieces when exact VLEN is known (#73606)
If we have a high LMUL build_vector and a known exact VLEN, we can
decompose the build_vector into one build_vector per register in the
register group. Doing so requires exact knowledge of which elements
correspond to each register in the register group, and thus an exact
VLEN must be known.

Since we no longer have operations which are linear (or worse) in LMUL,
this also allows us to lower all build_vectors without resorting to
going through the stack.
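
The decomposition amounts to cutting the element list at register boundaries. This is a minimal sketch with a hypothetical helper name, assuming the element count is a multiple of the per-register element count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split the operands of one wide build_vector into one group per vector
// register, given an exact VLEN and the element width (both in bits).
std::vector<std::vector<int>> splitBuildVector(const std::vector<int> &elts,
                                               unsigned vlenBits,
                                               unsigned sewBits) {
  assert(sewBits != 0 && vlenBits % sewBits == 0);
  std::size_t eltsPerReg = vlenBits / sewBits;
  assert(elts.size() % eltsPerReg == 0);
  std::vector<std::vector<int>> pieces;
  for (std::size_t i = 0; i < elts.size(); i += eltsPerReg)
    pieces.emplace_back(elts.begin() + static_cast<std::ptrdiff_t>(i),
                        elts.begin() +
                            static_cast<std::ptrdiff_t>(i + eltsPerReg));
  return pieces;
}
```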
2023-11-28 07:39:58 -08:00
Philip Reames
a3ae7b660a [RISCV] Minor style cleanup to cf17a24 [nfc]
This was suggested in another related review, so backporting it to the
existing code as well.
2023-11-28 07:27:30 -08:00
Philip Reames
cf17a24a4b
[RISCV] Use subreg extract for extract_vector_elt when vlen is known (#72666)
This is the first in a planned patch series to teach our vector lowering
how to exploit register boundaries in LMUL>1 types when VLEN is known to
be an exact constant. This corresponds to code compiled by clang with
the -mrvv-vector-bits=zvl option.

For extract_vector_elt, if we have a constant index and a known vlen,
then we can identify which register out of a register group is being
accessed. Given this, we can do a sub-register extract for that
register, and then shift any remaining index.

This results in all constant index extracts becoming m1 operations, and
thus eliminates the complexity concern for explode-vector idioms at high
lmul.
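
The index arithmetic behind this is a division and a remainder. This is a hedged sketch: `SubRegIndex` and `splitIndex` are illustrative names, not identifiers from the patch.

```cpp
#include <cassert>

struct SubRegIndex {
  unsigned reg; // Which register of the group holds the element.
  unsigned elt; // Remaining index within that register.
};

// With an exact VLEN, each register holds VLEN / SEW elements, so a constant
// element index maps to a fixed register and a smaller index inside it.
SubRegIndex splitIndex(unsigned idx, unsigned vlenBits, unsigned sewBits) {
  assert(sewBits != 0 && vlenBits % sewBits == 0);
  unsigned eltsPerReg = vlenBits / sewBits;
  return {idx / eltsPerReg, idx % eltsPerReg};
}
```

For example, with VLEN=128 and 32-bit elements, index 13 lands in register 3 of the group at position 1.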
2023-11-27 14:33:16 -08:00
Zi Xuan Wu (Zeson)
e89324219a
[RISCV] Don't combine store of vmv.x.s/vfmv.f.s to vp_store with VL of 1 when it's indexed store (#73219)
Because we can't support vp_store with indexed address mode by lowering to vse intrinsic later.
2023-11-27 13:39:35 +08:00
Wang Pengcheng
5973272af7
[RISCV] Add MinimumJumpTableEntries to TuneInfo (#72963)
This is like what AArch64 has done in #71166 except that we don't
handle the `HasMinSize` case for now.
2023-11-23 14:05:23 +08:00
Min Hsu
e096732307 [RISCV][NFC] Rename RISCVISD::FPCLASS to RISCVISD::FCLASS
To be consistent with `fclass.s/d`. Also rename `riscv_fpclass` to
`riscv_fclass`. NFC.
2023-11-22 16:24:05 -08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So this is
evaluating if the pointer to the function Scalable == the pointer to the
function Scalable, which is always true because LHS and RHS have the same
class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
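
The name clash can be reproduced in a few lines. This is a reduced sketch, not the LLVM classes: `Quantity`, `BuggySize` and `RenamedSize` are stand-ins invented for illustration.

```cpp
#include <cassert>

struct Quantity {
  unsigned Value;
  bool Scalable; // State: is this a scalable quantity?
  Quantity(unsigned V, bool S) : Value(V), Scalable(S) {}
};

struct BuggySize : Quantity {
  using Quantity::Quantity;
  static BuggySize Fixed(unsigned V) { return BuggySize(V, false); }
  // This static function hides the data member Quantity::Scalable.
  static BuggySize Scalable(unsigned V) { return BuggySize(V, true); }
};

bool sameKindBuggy(const BuggySize &LHS, const BuggySize &RHS) {
  // Name lookup finds the static function, so this compares the address of
  // BuggySize::Scalable with itself and is always true.
  return LHS.Scalable == RHS.Scalable;
}

struct RenamedSize : Quantity {
  using Quantity::Quantity;
  static RenamedSize getFixed(unsigned V) { return RenamedSize(V, false); }
  static RenamedSize getScalable(unsigned V) { return RenamedSize(V, true); }
};

bool sameKindRenamed(const RenamedSize &LHS, const RenamedSize &RHS) {
  // With the factories renamed, Scalable refers to the data member again.
  return LHS.Scalable == RHS.Scalable;
}
```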
2023-11-22 08:52:53 +00:00
Zi Xuan Wu (Zeson)
06e733b198
[RISCV] Fix the order of arguments of setTruncStoreAction and setLoadExtAction (#73090)
The first argument of setTruncStoreAction/setLoadExtAction should be
Value VT instead of Memory VT.
2023-11-22 15:32:39 +08:00
Yeting Kuo
a756a6b97e
[TargetLowering][RISCV] Introduce shouldFoldSelectWithSingleBitTest and RISC-V implement. (#72978)
DAGCombiner folds (select_cc seteq (and x, y), 0, 0, A) to (and (sra
(shl x)) A) where y has a single bit set. Previously, DAGCombiner relies
on `shouldAvoidTransformToShift` to decide when to do the combine, but
`shouldAvoidTransformToShift` is only about shift cost. This patch
introduces a specific hook to decide when to do the combine and disables
the combine when Zicond is enabled and AndMask <= 1024.
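
The fold itself can be checked on 64-bit values. This is a sketch with assumed names, where `bit` is the position of the single set bit in y; it relies on `>>` of a negative signed value being an arithmetic shift, which C++20 guarantees and mainstream compilers already did before.

```cpp
#include <cassert>
#include <cstdint>

// (select_cc seteq (and x, 1 << bit), 0, 0, a)
std::int64_t selectForm(std::uint64_t x, unsigned bit, std::int64_t a) {
  return (x & (std::uint64_t{1} << bit)) == 0 ? 0 : a;
}

// (and (sra (shl x, 63 - bit), 63), a)
std::int64_t shiftForm(std::uint64_t x, unsigned bit, std::int64_t a) {
  assert(bit < 64);
  // Move the tested bit into the sign position...
  std::uint64_t shifted = x << (63 - bit);
  // ...then smear it across the word: all ones if set, zero otherwise.
  std::int64_t mask = static_cast<std::int64_t>(shifted) >> 63;
  return mask & a;
}
```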
2023-11-22 08:22:14 +08:00
Craig Topper
7a6fd49c8a [RISCV] Use short forward branch for ISD::ABS.
We can use short forward branch to conditionally negate if the
value is negative.
2023-11-21 11:00:06 -08:00
Brandon Wu
2749f52ec4
[RISCV] Convert all floating point vector type operands to integer vector type (#69559) 2023-11-21 23:19:10 +08:00
Liao Chunyu
9166cd2a71
[RISCV] DAG combine (mul (add x, 1), y) -> vmadd (#71495)
vmadd:  (mul (add x, 1), y) -> (add (mul x, y), y)
        (mul x, (add y, 1)) -> (add x, (mul x, y))
vnmsub: (mul (sub 1, x), y) -> (sub y, (mul x, y))
        (mul x, (sub 1, y)) -> (sub x, (mul x, y))

Comparison with gcc:
vmadd: https://gcc.godbolt.org/z/xjePx87Y7
vnmsub: https://gcc.godbolt.org/z/b17zG7nT1
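
The rewrites are plain distributivity, which also holds under wrap-around arithmetic. Here is a per-element check on 32-bit unsigned values, with function names chosen for illustration only:

```cpp
#include <cassert>
#include <cstdint>

// (mul (add x, 1), y)
std::uint32_t mulOfAdd(std::uint32_t x, std::uint32_t y) { return (x + 1) * y; }
// (add (mul x, y), y), the multiply-add shape
std::uint32_t maddForm(std::uint32_t x, std::uint32_t y) { return x * y + y; }

// (mul (sub 1, x), y)
std::uint32_t mulOfSub(std::uint32_t x, std::uint32_t y) { return (1 - x) * y; }
// (sub y, (mul x, y)), the negated multiply-sub shape
std::uint32_t nmsubForm(std::uint32_t x, std::uint32_t y) { return y - x * y; }
```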
2023-11-21 13:43:34 +08:00
Philip Reames
144b2f579e
[RISCV] Start vslide1down sequence with a dependency breaking splat (#72691)
If we are using entirely vslide1downs to initialize an otherwise undef
vector, we end up with an implicit_def as the source of the first
vslide1down. This register has to be allocated, and creates false
dependencies with surrounding code.

Instead, start our sequence with a vmv.v.x in the hopes of creating a
dependency breaking idiom. Unfortunately, it's not clear this will
actually work: due to the VL=0 special case for T.A., the hardware has
to work pretty hard to recognize that the vmv.v.x actually has no source
dependence. I don't think we can reasonably expect all hardware to have
optimized this case, but I also don't see any downside in preferring it.
2023-11-17 12:02:58 -08:00
Sacha Coppey
aeedc07637 [IR] Add GraalVM calling conventions
Adds GraalVM calling conventions. The only difference with the default calling conventions is that GraalVM reserves two registers for the heap base and the thread. Since the registers are then accessed by name, getRegisterByName has to be updated accordingly.

This patch implements the calling conventions only for X86, AArch64 and RISC-V.

For X86, the reserved registers are R14 and R15. For AArch64, they are X27 and X28. For RISC-V, they are X23 and X27.

This patch has been used by the LLVM backend of GraalVM's Native Image project in production for around 4 months with no major issues.

Differential Revision: https://reviews.llvm.org/D151107
2023-11-17 16:30:09 +00:00
Nemanja Ivanovic
0765f6451f
[RISCV] Use correct register class for Z[df]inx inline asm (#71872)
Allocate a register of the correct register class for inline asm
constraint "r" when used for FP values with -Zfinx/-Zdinx.

---------

Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
2023-11-17 16:17:48 +01:00
Philip Reames
8f81c605f5
[RISCV] Remove custom instruction selection for VFCVT_RM and friends (#72540)
We already have the pseudos for lowering these as MI nodes with
rounding mode operands, and the generic FRM insertion pass. Doing the
insertion later in the backend allows SSA level passes to avoid
reasoning about physical register copies, and happens to produce better
code in practice. The latter is mostly an accident of our insertion
order; we happen to place the frm write after the vsetvli, and it's very
common for a register to be killed at the vsetvli. End result is that we
get slightly better scalar register allocation.

I'm a bit unclear on the history here. I was surprised to find this code
in ISEL lowering at all, but am also surprised once I found it that all
the patterns and pseudos seem to already exist. My best guess is that
maybe we didn't do all the possible cleanup after introducing the
HasRoundMode mechanism?
2023-11-17 07:07:37 -08:00
Craig Topper
927f6f1858 [RISCV] Use bset+addi for (not (sll -1, X)).
This is an alternative to #71420 that handles i32 on RV64 safely
by pre-promoting the pattern in DAG combine.
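
The identity behind the pattern is ~(-1 << X) == (1 << X) - 1: set bit X, then subtract one. Here is a 64-bit check with illustrative function names, assuming X is in the range 0..63:

```cpp
#include <cassert>
#include <cstdint>

// (not (sll -1, x)): a mask of the low x bits.
std::uint64_t notOfShiftedOnes(unsigned x) {
  assert(x < 64);
  return ~(~std::uint64_t{0} << x);
}

// bset followed by addi -1: set bit x, then subtract one.
std::uint64_t bsetThenAddi(unsigned x) {
  assert(x < 64);
  return (std::uint64_t{1} << x) - 1;
}
```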
2023-11-16 11:14:53 -08:00
Craig Topper
84044061e8 [RISCV] Make getDefaultVLOps call getDefaultScalableVLOps instead of the other way around. NFC
Previously getDefaultScalableVLOps called getDefaultVLOps. getDefaultVLOps
also handles fixed vectors so had to then check if it was fixed
or scalable.

Since getDefaultScalableVLOps knows the type is scalable, it makes
sense for it to contain the scalable case directly and have
getDefaultVLOps call it for the scalable case.
2023-11-15 19:28:00 -08:00
Michael Maitland
725e599637
[RISCV][GISEL] Add support for scalable vector types in lowerReturnVal (#71587)
Scalable vector types from LLVM IR are lowered into physical vector
registers in MIR based on calling convention for return instructions.
2023-11-15 17:30:53 -05:00
Yingwei Zheng
650026897c
[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364)
This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the
short forward branch optimization is enabled. This instruction sequence
is faster than the one generated by the target-independent
DAGCombiner. This algorithm is described in ***Hacker's Delight***.

This patch also removes duplicate logic in the X86 and AArch64 backend.
But we cannot do this for the PowerPC backend since it generates a
special instruction `addze`.
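
The add + select + shift sequence can be sketched as follows. The function name is hypothetical, and the final negate for a negative divisor is an assumption about how the `-2**k` case is finished, since the message does not spell it out. The bias is added only for negative inputs, which also keeps the C++ free of signed overflow.

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2**k (or by -2**k) that rounds toward zero.
std::int64_t sdivPow2(std::int64_t x, unsigned k, bool negativeDivisor) {
  assert(k >= 1 && k < 63);
  std::int64_t bias = (std::int64_t{1} << k) - 1;
  // add + select: negative inputs are biased so the shift rounds toward zero.
  std::int64_t picked = x < 0 ? x + bias : x;
  // Arithmetic shift right performs the division.
  std::int64_t quotient = picked >> k;
  return negativeDivisor ? -quotient : quotient;
}
```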
2023-11-10 21:38:47 +08:00
Craig Topper
72f30acfed [RISCV] Disable Zbs special case in performTRUNCATECombine with -riscv-experimental-rv64-legal-i32. 2023-11-09 10:19:28 -08:00
Craig Topper
679cc16c99 [RISCV] Disable early promotion for Zbs in performANDCombine with riscv-experimental-rv64-legal-i32
We can match this directly in isel with the i32 type being legal.

The generic DAG combine will unpromote part of the pattern and
prevent it from being matched in isel.
2023-11-09 09:51:31 -08:00
Wang Pengcheng
e179b125fb
[RISCV][NFC] Pass MCSubtargetInfo instead of FeatureBitset in RISCVMatInt (#71770)
The use of `hasFeature` is more descriptive and the callers of
`RISCVMatInt` have no need to call `getFeatureBits()` any more.
2023-11-09 15:15:23 +08:00
Jianjian Guan
d36eb79ccc
[RISCV] Support Strict FP arithmetic Op when only have Zvfhmin (#68867)
Include: STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV,
STRICT_FSQRT and STRICT_FMA.
2023-11-09 09:55:48 +08:00
Craig Topper
90f768440d
[VP][RISCV] Add llvm.experimental.vp.reverse. (#70405)
This is similar to vector.reverse, but only reverses the first EVL
elements.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
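
On a plain array, the operation looks roughly like this. It is a sketch with a hypothetical function name; leaving the lanes at or past EVL unchanged is an assumption made for the example, since the description above only specifies the first EVL elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reverse only the first evl elements of v.
std::vector<int> vpReverse(std::vector<int> v, std::size_t evl) {
  assert(evl <= v.size());
  std::reverse(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(evl));
  return v;
}
```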
2023-11-05 22:39:27 -08:00
Craig Topper
ae49bf5a7b [RISCV] Fix stale comments in lowerShift*Parts. NFC
This comment was not updated when we changed from xor back to sub.
2023-11-02 22:20:49 -07:00
Craig Topper
5570d3250f [RISCV] Don't promote i32 and/or/xor with -riscv-experimental-rv64-legal-i32.
Some test improvements, but also some regressions that need to be
fixed.
2023-11-01 11:36:46 -07:00
Craig Topper
8912200966
[RISCV] Add experimental support for making i32 a legal type on RV64 in SelectionDAG. (#70357)
This will select i32 operations directly to W instructions without
custom nodes. Hopefully this can allow us to be less dependent on
hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp.

This support is enabled with a command line option that is off by
default.

Generated code is still not optimal.

I've duplicated many test cases for this, but it's not complete. Enabling this runs all existing lit tests without crashing.
2023-11-01 09:36:41 -07:00