1640 Commits

Author SHA1 Message Date
Hassnaa Hamdi
3176f255c9
[IA][AArch64]: Construct (de)interleave4 out of (de)interleave2 (#89276)
- [AArch64]: TargetLowering is updated to spot load/store (de)interleave4-like
   sequences using PatternMatch, and emit the equivalent sve.ld4 and sve.st4 intrinsics.
2024-08-12 17:23:00 +01:00
Craig Topper
257c479b91
[LegalizeTypes][RISCV] Use SExtOrZExtPromotedOperands to promote operands for USUBSAT. (#102781)
It doesn't matter which extend we use to promote the operands. Use
whatever is the most efficient.

The custom handler for RISC-V was using SIGN_EXTEND when the Zbb
extension is enabled so we no longer need that.
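
A quick way to see why the choice of extend doesn't matter is to check it exhaustively for a narrow type. A standalone C++ sketch (not LLVM code), i8 operands promoted to i32, assuming both operands receive the same kind of extend:

```cpp
#include <cassert>
#include <cstdint>

// Reference: 8-bit unsigned saturating subtract.
static uint8_t usubsat8(uint8_t a, uint8_t b) { return a > b ? a - b : 0; }

// 32-bit unsigned saturating subtract, used after promotion.
static uint32_t usubsat32(uint32_t a, uint32_t b) { return a > b ? a - b : 0; }

int main() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t ref = usubsat8((uint8_t)a, (uint8_t)b);
      // Promote both operands with zero-extension.
      uint8_t viaZext = (uint8_t)usubsat32((uint8_t)a, (uint8_t)b);
      // Promote both operands with sign-extension.
      uint8_t viaSext = (uint8_t)usubsat32((uint32_t)(int32_t)(int8_t)(uint8_t)a,
                                           (uint32_t)(int32_t)(int8_t)(uint8_t)b);
      assert(viaZext == ref && viaSext == ref);
    }
}
```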
2024-08-11 10:22:31 -07:00
Craig Topper
ca7ad38ca0
[RISCV] Remove riscv-experimental-rv64-legal-i32. (#102509)
This has received no development work in a while and is slowly bit
rotting as new extensions are added.

At the moment, I don't think this is viable without adding a new
invariant that 32 bit values are always in sign extended form like
Mips64 does. We are very dependent on computeKnownBits and
ComputeNumSignBits in SelectionDAG to remove sign extends created for
ABI reasons. If we can't propagate sign bit information through 64-bit
values in SelectionDAG, we can't effectively clean up those extends.
2024-08-09 11:48:48 -07:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Craig Topper
dac9042cc6 [RISCV] Use uint64_t operations instead of APInt operations. NFC
We already know the type is i64 here. Just extract the uint64_t.
2024-08-05 10:37:04 -07:00
Sergei Barannikov
4527fba9ad
Revert "[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall" (#101740)
Reverts the rest of llvm/llvm-project#99752
2024-08-03 01:51:26 +03:00
Craig Topper
4aac78dd4a
[RISCV] Generalize existing SRA combine to fix #101040. (#101610)
We already had a DAG combine for (sra (sext_inreg (shl X, C1), i32), C2)
-> (sra (shl X, C1+32), C2+32) that we used for RV64. This patch
generalizes it to other sext_inregs for both RV32 and RV64.
    
Fixes #101040.
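
The combine rests on a pure bit-arithmetic identity; a standalone C++ check (not LLVM code) of the i32 sext_inreg case on 64-bit values:

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right (arithmetic for signed operands on mainstream compilers).
static int64_t sra(uint64_t v, unsigned amt) { return (int64_t)v >> amt; }

// sign_extend_inreg from bit 31 of a 64-bit value.
static int64_t sextInreg32(uint64_t v) { return (int64_t)(int32_t)(uint32_t)v; }

int main() {
  uint64_t samples[] = {0, 1, 0x7fffffffULL, 0x80000000ULL,
                        0x123456789abcdef0ULL, 0xffffffffffffffffULL};
  for (uint64_t x : samples)
    for (unsigned c1 = 0; c1 < 32; ++c1)
      for (unsigned c2 = 0; c2 < 32; ++c2) {
        int64_t lhs = sra(sextInreg32(x << c1), c2); // (sra (sext_inreg (shl X, C1), i32), C2)
        int64_t rhs = sra(x << (c1 + 32), c2 + 32);  // (sra (shl X, C1+32), C2+32)
        assert(lhs == rhs);
      }
}
```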
2024-08-02 09:02:58 -07:00
Sergei Barannikov
92e18ffd80
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#99752)
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
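
For reference, the libcall boils down to a population count; a portable C++ equivalent of what such a routine computes (the function name here is illustrative, not the exact rtlib symbol):

```cpp
#include <cstdint>

// Counts the set bits in a 64-bit value -- the operation ISD::CTPOP now lowers
// to a runtime call for when no native instruction or cheaper expansion exists.
int popcount64(uint64_t x) {
  int count = 0;
  while (x) {
    x &= x - 1; // clear the lowest set bit
    ++count;
  }
  return count;
}
```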
2024-08-02 12:29:39 +03:00
Craig Topper
840ec59a0b [RISCV] Refactor setOperationActions for scalar ISD::ABS. NFC 2024-08-01 22:42:58 -07:00
Craig Topper
3626443507
[RISCV] Use X0 for VLMax for slide1up/slide1down in lowerVectorIntrinsicScalars. (#101384)
Previously, we created a vsetvlimax intrinsic. Using X0 simplifies the
code and enables some optimizations to kick in when the exact value of
vlmax is known.
2024-07-31 13:01:31 -07:00
Craig Topper
76a15e5fc1 [RISCV] Keep all the setOperationActions for the same types and opcodes together.
Some of the XCV subtarget checks were in their own area.
2024-07-30 14:48:10 -07:00
Philip Reames
8a4b095403 [RISCV][TLI/TTI] Reject scalable offsets in isLegalAddressingMode
None of our addressing modes support a scalable offset.  I could not
figure out how to get LSR to actually try such a formula, but let's
be defensive and explicitly prevent this case from being considered
a valid address mode match.
2024-07-30 12:04:23 -07:00
Luke Lau
b1542afd0b
[RISCV] Rename merge operand -> passthru. NFC (#100330)
We sometimes call the first tied dest operand in vector pseudos the
merge operand, and other times the passthru.

Passthru seems to be more common, and it's what the C intrinsics call
it[^1], so this renames all usages of merge to passthru to be
consistent. It also helps prevent confusion with vmerge.vvm in some of
the peephole optimisations.

[^1]:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc#the-passthrough-vd-argument-in-the-intrinsics
2024-07-30 17:47:00 +08:00
Craig Topper
43de4e03a3
[RISCV] Rename hasVInstructionsBF16 to hasVInstructionsBF16Minimal. NFC (#101080)
This makes it more consistent with Zvfhmin since it is not a complete
bf16 implementation.
2024-07-29 21:55:42 -07:00
Craig Topper
ad80265874
[RISCV] Qualify all XCV predicates with !is64Bit. (#101074)
The tablegen patterns all have isRV32. I did not check if any of them
could naively support RV64.

Fixes #101067 and probably other bugs like it we haven't found yet.
2024-07-29 21:52:57 -07:00
Luke Lau
e90c21831f
[RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072)
At zvl1024b, we may have legal fixed length vectors where a vid.v would
overflow at i8, e.g. <512 x i8>.

When lowering constant build_vectors, isSimpleVIDSequence used uint64_t
to model the vid.v sequence which meant it didn't account for the fact
that it could overflow in these larger types.

This patch fixes it by modelling the sequence with an SEW-wide APInt so
if it does overflow the loop that checks/calculates the addend will
detect it and bail.

Fixes #99729
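
A scalar illustration of the overflow (standalone C++, with uint8_t standing in for the SEW-wide element type):

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // For a <512 x i8> vector, a vid.v sequence is computed modulo 2^8, so the
  // element at index 256 wraps back to 0.
  for (unsigned i = 254; i <= 257; ++i) {
    uint64_t wideModel = i;         // old uint64_t model: never overflows
    uint8_t sewModel = (uint8_t)i;  // SEW-wide (APInt-like) model: wraps at 256
    std::cout << "index " << i << ": wide=" << wideModel
              << " i8=" << (unsigned)sewModel << '\n';
  }
  // A constant sequence 0, 1, ..., 511 therefore is not a vid.v sequence,
  // even though the uint64_t model sees a constant addend of 1 throughout.
}
```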
2024-07-30 12:16:07 +08:00
Craig Topper
3de76e4f57 [RISCV] Replace hasStdExtV with hasVInstructions in isTruncateFree.
This prevents excluding the embedded vector extensions Zve*.
2024-07-29 17:15:02 -07:00
Craig Topper
03e1eb29e7 [RISCV] Replace hasStdExtV with hasVInstructions.
This prevents excluding Zve*.
2024-07-29 16:58:48 -07:00
Craig Topper
279953f1da
[RISCV] Simplify code in decomposeMulByConstant. NFC (#100946)
We already checked that the type is a scalar integer, so the constant
node should definitely be a ConstantSDNode. We can use cast instead of
dyn_cast.
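
For context, the LLVM casting idiom being swapped, as a fragment-style sketch (`consume` is a placeholder added purely for illustration):

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Hypothetical consumer, only here to keep the fragment self-contained.
void consume(const APInt &V);

void example(SDValue N) {
  // dyn_cast returns null on a type mismatch, so a check is required when the
  // operand might not be a constant:
  if (auto *C = dyn_cast<ConstantSDNode>(N.getOperand(1)))
    consume(C->getAPIntValue());

  // Once earlier checks guarantee a constant (as decomposeMulByConstant can
  // assume after its scalar-integer type check), cast<> asserts and documents
  // that invariant instead:
  auto *CN = cast<ConstantSDNode>(N.getOperand(1));
  consume(CN->getAPIntValue());
}
```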
2024-07-28 19:29:39 -07:00
Yingwei Zheng
13996378d8
[RISCV][ISel] Fold FSGNJX idioms (#100718)
This patch folds `fmul X, (fcopysign 1.0, Y)` into `fsgnjx X, Y`. This
pattern exists in some graphics applications/math libraries.
Alive2: https://alive2.llvm.org/ce/z/epyL33

Since fpimm +1.0 is lowered to a load from constant pool after
OpLegalization, I have to introduce a new RISCVISD node FSGNJX and fold
this pattern in DAGCombine.

Closes https://github.com/dtcxzyw/llvm-opt-benchmark/issues/1072.
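
The identity being folded, sketched in standalone C++ (sign-bit view of `fmul X, (fcopysign 1.0, Y)`; NaN payload details set aside):

```cpp
#include <bit>      // std::bit_cast (C++20)
#include <cassert>
#include <cmath>
#include <cstdint>

// fsgnjx semantics: X's magnitude with sign = sign(X) XOR sign(Y).
static double fsgnjx(double x, double y) {
  uint64_t xb = std::bit_cast<uint64_t>(x);
  uint64_t yb = std::bit_cast<uint64_t>(y);
  return std::bit_cast<double>(xb ^ (yb & 0x8000000000000000ULL));
}

int main() {
  double xs[] = {1.5, -2.25, 0.0, -0.0};
  double ys[] = {3.0, -7.0, 0.0, -0.0};
  for (double x : xs)
    for (double y : ys) {
      double viaMul = x * std::copysign(1.0, y); // fmul X, (fcopysign 1.0, Y)
      assert(std::bit_cast<uint64_t>(viaMul) ==
             std::bit_cast<uint64_t>(fsgnjx(x, y)));
    }
}
```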
2024-07-27 12:51:58 +08:00
Craig Topper
b582b658d6
[RISCV] Add FMA support to combineOp_VLToVWOp_VL. (#100454)
This adds FMA to the widening web support we have for add, sub, mul, and
shl.

Extra care needs to be taken to not widen the third FMA operand.
2024-07-26 09:12:40 -07:00
Luke Lau
754dc9ff5a
[RISCV] Move exact VLEN VLMAX transform to RISCVVectorPeephole (#100551)
We can teach RISCVVectorPeephole to detect when an AVL is equal to the
VLMAX when the exact VLEN is known and use the VLMAX sentinel instead,
and in doing so remove the need for getVLOp in RISCVISelLowering. This
keeps all the VLMAX logic in one place.
2024-07-26 07:56:12 +08:00
Luke Lau
63ae1e9550
[RISCV] Emit VP strided loads/stores in RISCVGatherScatterLowering (#98111)
RISCVGatherScatterLowering is the last user of
riscv_masked_strided_{load,store} after #98131 and #98112, this patch
changes it to emit the VP equivalent instead. This allows us to remove
the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were
always VLMAX, so this passes in the fixed or scalable element count to
the EVL instead, which RISCVVectorPeephole should now convert to VLMAX
after #97800.
For loads we also use a vp_select to get passthru (mask undisturbed)
behaviour.
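
A scalar model of the VP strided load being emitted (illustrative C++ only; the stride is counted in elements here for simplicity, and inactive lanes keep the passthru value, matching the vp_select):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane i reads base[i * stride] when i < evl and mask[i] is set; lanes with a
// clear mask bit keep the passthru value ("mask undisturbed" behaviour).
std::vector<int32_t> stridedLoad(const int32_t *base, std::ptrdiff_t stride,
                                 const std::vector<bool> &mask, unsigned evl,
                                 const std::vector<int32_t> &passthru) {
  std::vector<int32_t> result = passthru;
  for (unsigned i = 0; i < evl && i < result.size(); ++i)
    if (mask[i])
      result[i] = base[i * stride];
  return result;
}
```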
2024-07-24 13:56:22 +08:00
Craig Topper
caaba2a883
[RISCV] Replace VNCLIP RISCVISD opcodes with TRUNCATE_VECTOR_VL_SSAT/USAT opcodes (#100173)
These new opcodes drop the shift amount, rounding mode, and passthru,
making them exactly like TRUNCATE_VECTOR_VL. The shift amount, rounding
mode, and passthru are added in isel patterns similar to how we
translate TRUNCATE_VECTOR_VL to vnsrl with a shift of 0.

This should simplify #99418 a little.
2024-07-23 14:57:31 -07:00
Craig Topper
ef1367faed
[RISCV] Use vnclip(u) to handle fp_to_(s/u)int_sat that needs additional narrowing. (#100071)
If vncvt doesn't produce the destination type directly, use vnclip to do
additional narrowing with saturation.
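
A per-element sketch of the extra narrowing (illustrative C++): vnclip is a narrowing right shift with signed saturation, so with a shift amount of zero it behaves as a saturating truncate:

```cpp
#include <algorithm>
#include <cstdint>

// Per-element model of vnclip with shift amount 0: truncate to half the
// element width with signed saturation (vnclipu is the unsigned analogue).
static int16_t vnclip32to16(int32_t x) {
  return (int16_t)std::clamp(x, (int32_t)INT16_MIN, (int32_t)INT16_MAX);
}
static int8_t vnclip16to8(int16_t x) {
  return (int8_t)std::clamp((int32_t)x, (int32_t)INT8_MIN, (int32_t)INT8_MAX);
}

// e.g. an f32 -> i8 fp_to_sint_sat converts into i32 elements first; the two
// extra halving steps can then each be one of these saturating narrows.
static int8_t narrowWithSat(int32_t x) { return vnclip16to8(vnclip32to16(x)); }
```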
2024-07-23 09:09:35 -07:00
Craig Topper
df4fa47b57 [RISCV] Use MVT::changeVectorElementType. NFC 2024-07-23 09:02:15 -07:00
Craig Topper
0950533cff [RISCV] Move call to EmitLoweredCascadedSelect above some variable declarations. NFC
These variables aren't used if we call EmitLoweredCascadedSelect,
so move the call above them.
2024-07-22 10:37:45 -07:00
Craig Topper
d221662ed0 [RISCV] In emitSelectPseudo, copy call frame size from LastSelectPseudo instead of MI.
The split point is LastSelectPseudo. If MI is earlier, we might
sink it to LastSelectPseudo.
2024-07-22 10:37:44 -07:00
Craig Topper
2c92335eb7
[RISCV] Copy call frame size when splitting basic block in emitSelectPseudo. (#99823)
Fixes #97304.
2024-07-22 09:21:01 -07:00
Craig Topper
77ac07444d Re-commit "[RISCV] Use Root instead of N throughout the worklist loop in combineBinOp_VLToVWBinOp_VL. (#99416)"
With correct test update.

Original message:

We were only checking that the node from the worklist is a supported
root. We weren't checking the strategy or any of its operands unless it
was the original node. For any other node, we just rechecked the
original node's strategy and operands.

The effect of this is that we don't do all of the transformations at
once. Instead, when there were multiple possible nodes to transform we
would only do them as each node was visited by the main DAG combine
worklist.

The test shows a case where we widened an instruction without removing
all of the uses of the vsext. The sext is shared by one node that shares
another sext node with the root, and by another node that doesn't share
anything with the root.
2024-07-18 09:10:52 -07:00
Craig Topper
10627d2004 Revert "[RISCV] Use Root instead of N throughout the worklist loop in combineBinOp_VLToVWBinOp_VL. (#99416)"
This reverts commit 0c4023ae3b64c54ff51947e9776aee0e963c5635.

I messed up re-generating the test after the change.
2024-07-18 09:02:26 -07:00
Craig Topper
0c4023ae3b
[RISCV] Use Root instead of N throughout the worklist loop in combineBinOp_VLToVWBinOp_VL. (#99416)
We were only checking that the node from the worklist is a supported
root. We weren't checking the strategy or any of its operands unless it
was the original node. For any other node, we just rechecked the
original node's strategy and operands.

The effect of this is that we don't do all of the transformations at
once. Instead, when there were multiple possible nodes to transform we
would only do them as each node was visited by the main DAG combine
worklist.

The test shows a case where we widened an instruction without removing
all of the uses of the vsext. The sext is shared by one node that shares
another sext node with the root, and by another node that doesn't share
anything with the root.
2024-07-18 08:47:06 -07:00
Craig Topper
d85f1054fb
[RISCV] Teach fillUpExtensionSupportForSplat to handle nxvXi64 VMV_V_X_VL on RV32. (#99251)
A nxvXi64 VMV_V_X_VL on RV32 sign extends its 32 bit input to 64 bits.
If that input is positive, the sign extend can also be considered as a
zero extend.
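
The observation in scalar form (illustrative C++):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // For a non-negative 32-bit value, sign-extension and zero-extension to
  // 64 bits produce identical bits, so a splat of such a value can be treated
  // as either kind of extend by the widening combines.
  for (int32_t v : {0, 1, 42, 0x7fffffff}) {
    uint64_t sext = (uint64_t)(int64_t)v;
    uint64_t zext = (uint64_t)(uint32_t)v;
    assert(sext == zext);
  }
}
```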
2024-07-17 12:23:05 -07:00
Amara Emerson
f270a4dd66
[AArch64] Don't tail call memset if it would convert to a bzero. (#98969)
Well, not quite that simple. We can tail call memset since it returns its
first argument, but bzero doesn't, so we can end up miscompiling.

This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.

rdar://131419786
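
A minimal C++ illustration of the hazard (the function name is made up for the example):

```cpp
#include <cstddef>
#include <cstring>

// memset returns its first argument, so a tail call's result is also the
// caller's result:
void *zeroAndReturn(void *p, std::size_t n) {
  return std::memset(p, 0, n); // legal tail call: the returned pointer is used
}

// bzero returns void. If the backend rewrote the tail-called memset above into
// a bzero-style call, the returned pointer would be lost and the caller would
// read an undefined value -- which is why memset must not be tail-called when
// the lowering may convert it to bzero.
```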
2024-07-17 01:31:52 -07:00
Yeting Kuo
746cea3eb7
[VP][RISCV] Introduce vp.splat and RISC-V. (#98731)
This patch introduces a vp intrinsic for splat. It's helpful for
IR-level passes to create a splat with a specific vector length.
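
A rough scalar model of the semantics (illustrative C++ only; inactive or out-of-range lanes are undefined and simply left untouched here):

```cpp
#include <vector>

// Vector-predicated splat: lanes [0, evl) with the mask bit set get the scalar
// value; all other lanes are undefined (left as-is in this model).
template <typename T>
void vpSplat(std::vector<T> &out, T value, const std::vector<bool> &mask,
             unsigned evl) {
  for (unsigned i = 0; i < evl && i < out.size(); ++i)
    if (mask[i])
      out[i] = value;
}
```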
2024-07-17 08:40:42 +08:00
Yeting Kuo
b6c4ad700b
[RISCV] Remove x7 from fastcc list. (#96729)
Like #93321, this patch also tries to resolve the conflicting use of x7
between fastcc and Zicfilp, but it removes x7 from fastcc directly. Its
purpose is to reduce the code complexity of #93321; we also found that it
increases the instruction count by at most 0.02% on most benchmarks and may
even be a benefit for some of them.
2024-07-17 08:37:55 +08:00
Craig Topper
9e52a9ee63 [RISCV] Simplify some checks of when we can't form a widening vector FP operation. NFCI
The _VL nodes are only used with scalable vectors so we don't need
to check that.

It doesn't matter if Zvfhmin is enabled. All that really matters is
whether Zvfh is.
2024-07-15 17:50:43 -07:00
Craig Topper
7863e4ed88
[RISCV] Form VFWMUL.VF and VFWADD.VF/WF when fp_extend is scalar and then splatted. (#98590)
Previously we only supported the extend being in the vector domain after
the splat.
2024-07-15 17:50:23 -07:00
Froster
c8dc21d77f
[SelectionDAG][RISCV] Fix break of vnsrl pattern in issue #94265 (#95563)
Added a RISCV overload of `isTruncateFree` to fix the broken vnsrl pattern described in issue #94265.

Fixes #94265
2024-07-14 12:09:37 +01:00
Philip Reames
657dbc3feb
[RISCV] Reorder shuffle operands if one side is an identity (#98534)
Doing so allows one side to fold entirely into the mask applied to the
other recursive call (or a vmerge.vv at worst). This is a generalization
of the existing IsSelect case (both operands are selects), so I removed
that code in the process.

This actually started as an attempt to remove the IsSelect bit, as I'd
thought it was fully redundant with the recursive formulation, but digging
into test deltas revealed that we depended on it to catch the majority of
the identity cases, and that in turn we were missing some cases where only
the RHS was an identity.
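
A compact statement of the identity check involved (illustrative C++; mask semantics follow shufflevector, where values below the element count select from the first operand):

```cpp
#include <cstddef>
#include <vector>

// A shuffle operand acts as an identity when every lane taken from it keeps
// its own index. numElts is the element count of each source vector; -1 marks
// an undef lane; mask values >= numElts select from the RHS.
bool lhsIsIdentity(const std::vector<int> &mask, int numElts) {
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] >= 0 && mask[i] < numElts && mask[i] != (int)i)
      return false;
  return true;
}

// Example: mask {0, 5, 2, 7} over two 4-element vectors. The LHS lanes (0 and
// 2) are already in place, so the shuffle reduces to merging the needed RHS
// lanes into an otherwise-unchanged LHS (a vmerge.vv at worst).
```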
2024-07-11 13:52:41 -07:00
Joseph Huber
3f1a767572
[LLVM] Factor disabled Libcalls into the initializer (#98421)
Summary:
These Libcalls represent which functions are available to the backend.
If a runtime call is not available, the target sets the name to
`nullptr`. Currently, this logic is spread around the various targets.
This patch pulls all of the locations that disable libcalls into the
initializer. This patch is effectively NFC.

The motivation behind this patch is that currently the LTO handling uses
the list of all runtime calls to determine which functions cannot be
internalized and must be extracted from static libraries. We do not want
this to happen for libcalls that are not emitted by the backend. A
follow-up patch will move this logic out so the LTO pass can know which
rtlib calls are actually used by the backend.
2024-07-11 12:59:25 -05:00
Luke Lau
19cc46144d
[RISCV] Use VP strided load in concat_vectors combine (#98131) 2024-07-09 18:36:00 +08:00
Luke Lau
ed51908cec
[RISCV] Emit VP strided load in mgather combine. NFCI (#98112)
This combine is a duplication of the transform in
RISCVGatherScatterLowering but at the SelectionDAG level, so similarly
to #98111 we can replace the use of riscv_masked_strided_load with a VP
strided load.

Unlike #98111 we don't require #97800 or #97798 since it only operates
on fixed vectors with a non-zero stride.
2024-07-09 14:57:21 +08:00
Craig Topper
bb8998dd3b [RISCV] Don't custom legalize vXf16 SPLAT_VECTOR with Zvfhmin without Zfhmin.
Marking SPLAT_VECTOR as Custom enables generic DAGCombine to turn
BUILD_VECTOR into SPLAT_VECTOR. We need to custom type legalize BUILD_VECTOR
without Zfhmin since we don't have the scalar f16 type. If we allow
SPLAT_VECTOR to be formed, we'll need to custom type legalize it too.

The easiest fix is to only enable SPLAT_VECTOR with Zvfhmin+Zfhmin. There's
still an issue that we need to properly support BUILD_VECTOR with Zvfhmin+Zfhmin.

Should fix the new case reported in #97849.

I've also changed the predicates to Zfhmin instead of ZfhminOrZhinxmin
since Zhinx isn't compatible with Zvfhmin.
2024-07-08 22:44:58 -07:00
Philip Reames
c95935789d
[RISCV] Directly use pack* in build_vector lowering (#98084)
In 03d4332, we extended build_vector lowering to pack elements into the
largest size which doesn't exceed either ELEN or XLEN. The zbkb
extension - ratified under scalar crypto, but otherwise not really
connected to crypto per se - adds the packh, packw, and pack
instructions. These instructions are designed for exactly this pairwise
packing.

I ended up choosing to directly lower to machine nodes. A combination of
the slightly non-uniform semantics of these instructions (packw *sign*
extends the result, whereas packh *zero* extends it), and our generic
dag canonicalization (which sinks shl through or nodes), make pattern
matching these tricky and not particularly robust. Another alternative
was to have an ISD node for them, but that didn't seem to add much in
practice.
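
For reference, the RV64 semantics of the three instructions, modeled in C++ (note packw sign-extends its 32-bit result while packh zero-extends, the asymmetry mentioned above):

```cpp
#include <cstdint>

// Zbkb pairwise-packing instructions, modeled for RV64 (XLEN = 64).

// packh: concatenate the low bytes of rs1 and rs2, zero-extended.
uint64_t packh(uint64_t rs1, uint64_t rs2) {
  return (rs1 & 0xff) | ((rs2 & 0xff) << 8);
}

// packw: concatenate the low halfwords, then *sign*-extend the 32-bit result.
uint64_t packw(uint64_t rs1, uint64_t rs2) {
  uint32_t lo32 = (uint32_t)(rs1 & 0xffff) | ((uint32_t)(rs2 & 0xffff) << 16);
  return (uint64_t)(int64_t)(int32_t)lo32;
}

// pack: concatenate the low 32-bit halves of rs1 and rs2.
uint64_t pack(uint64_t rs1, uint64_t rs2) {
  return (rs1 & 0xffffffffULL) | ((rs2 & 0xffffffffULL) << 32);
}
```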
2024-07-08 16:10:25 -07:00
Paul Kirth
a4fec164bf
Reapply "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376)" (#90267)
With the tag merging in place, we can safely enable +seq-cst-trailing-fence
by default, following the recommendation in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc

This patch changes the default for the feature flag, and moves to more
consistent naming with respect to existing features.

This was reverted with https://github.com/llvm/llvm-project/pull/84597,
because ld.bfd would segfault with unknown riscv attributes. Now that
attribute emission is guarded behind a backend flag,
`--riscv-abi-attributes`, this should be safe to reland, since it won't 
introduce abi tags unless the user opts into them.
2024-07-08 13:35:36 -07:00
Philip Reames
03d4332625
[RISCV] Pack build_vectors into largest available element type (#97351)
Our worst case build_vector lowering is a serial chain of vslide1down.vx
operations which creates a serial dependency chain through a relatively
high latency operation. We can instead pack together elements into ELEN
sized chunks, and move them from scalar registers to the vector side in a
single operation.

This reduces the length of the serial chain on the vector side, and
costs at most three scalar instructions per element. This is a win for
all cores when the sum of the latencies of the scalar instructions is
less than the vslide1down.vx being replaced, and is particularly
profitable for out-of-order cores which can overlap the scalar
computation.

This patch is restricted to configurations with zba and zbb. Without
both, the zero extend might require two instructions which would bring
the total scalar instructions per element to 4. zba and zbb are both
present in the rva22u64 baseline which is looking to be quite common for
hardware in practice; we could extend this to systems without bitmanip
with a bit of extra effort.
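
A scalar sketch of the chunking idea (illustrative C++): eight i8 elements are combined into one i64 chunk on the scalar side, so the vector side needs one insert per chunk rather than one per element:

```cpp
#include <cstdint>

// Packs eight byte elements into a single 64-bit chunk. Per element this is
// roughly a zero-extend plus a shift and an or -- at most three scalar ops,
// which is the budget the lowering assumes when zba/zbb are available.
uint64_t packChunk(const uint8_t elems[8]) {
  uint64_t chunk = 0;
  for (int i = 0; i < 8; ++i)
    chunk |= (uint64_t)elems[i] << (8 * i);
  return chunk;
}
```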
2024-07-08 10:38:15 -07:00
Craig Topper
e4ee9bf0d2
[RISCV] Custom legalize vXf16 BUILD_VECTOR without Zfhmin. (#97874)
If we don't have Zfhmin, we will call `SoftPromoteHalfOperand` on the
BUILD_VECTOR. This operation is not supported by the generic code.

Instead, custom lower to a vXi16 BUILD_VECTOR using bitcasts.

Fixes #97849.
2024-07-07 20:25:09 -07:00
Craig Topper
6337fdcc52
[RISCV] Use EXTLOAD in lowerVECTOR_SHUFFLE. (#97862)
We're creating a load and a splat. The splat doesn't use the extended
bits so it doesn't matter what extend we use.
2024-07-05 19:34:19 -07:00
Craig Topper
f118c882fe [RISCV] Remove unnecessary setOperationAction for ISD::SELECT_CC for fixed vectors. NFC
We already looped through all builtin operations and marked them as
Expand. We don't need to do it for SELECT_CC again.
2024-07-05 17:35:34 -07:00