1787 Commits

Author SHA1 Message Date
Craig Topper
c67653fbc3
[RISCV] Support vXf16 vector_shuffle with Zvfhmin. (#97491)
We can shuffle vXf16 vectors just like vXi16 vectors. We don't need any
FP instructions. Update the predicates for vrgather and vslides patterns
to only check the predicates based on the equivalent integer type. If we
use the FP type it will check Zvfh and block Zvfhmin.

These are probably not the only patterns that need to be fixed, but the
test from the bug report no longer crashes.

Fixes #97477
2024-07-03 23:56:17 -07:00
realqhc
56f0ecd6db
[RISCV] Implement Intrinsics Support for XCValu Extension in CV32E40P (#85603)
Implement XCValu intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.

Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie.
2024-07-04 01:25:10 +10:00
Craig Topper
57555c6a0a [RISCV] Don't custom lower f16 SCALAR_TO_VECTOR with Zvfhmin.
This doesn't appear to be tested and our custom handler doesn't
support this right now.
2024-07-02 14:48:21 -07:00
Craig Topper
7e6e4986e6
[RISCV] Use EXTLOAD instead of ZEXTLOAD when lowering riscv_masked_strided_load with zero stride. (#97317)
The splat we generate after the load doesn't use the extended bits, so it
shouldn't matter which extend type we use.

EXTLOAD is lowered as SEXTLOAD on every element type except i8.
2024-07-01 13:11:52 -07:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Simon Pilgrim
3d7d246977 [RISCV] PerformDAGCombine - don't directly dereference dyn_cast results
Use cast<> to assert the cast is valid to help avoid null dereferences

Fixes static analyser warnings
2024-06-27 12:29:56 +01:00
Craig Topper
847235bbef
[RISCV] Add DAG combine to turn (sub (shl X, 8), X) into orc.b (#96680)
This applies if only the least significant bit of each byte (bits 0, 8,
16, 24, etc.) can be non-zero.

This is what (mul X, 255) is decomposed to. This decomposition happens
early before RISC-V DAG combine runs.

This patch does not support types larger than XLen so i64 on rv32 fails
to generate 2 orc.b instructions. It might have worked if the mul hadn't
been decomposed before it was expanded.

Partial fix for #96595.
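
The equivalence behind this combine can be checked with a small model. The following is a minimal sketch, not the LLVM implementation: `orc_b` and `sub_shl8` are hypothetical helper names, XLen is assumed to be 64, and the precondition is taken to be that only the lowest bit of each byte of X may be set.

```python
MASK64 = (1 << 64) - 1  # assumed XLen = 64


def orc_b(x: int) -> int:
    """Model of orc.b: each byte becomes 0xFF if any of its bits is set, else 0x00."""
    out = 0
    for i in range(8):
        if (x >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out


def sub_shl8(x: int) -> int:
    """(sub (shl X, 8), X) with 64-bit wrap-around, i.e. what (mul X, 255) decomposes to."""
    return ((x << 8) - x) & MASK64


# Only the least significant bit of each byte may be set in these inputs.
LSB_OF_EACH_BYTE = 0x0101010101010101
for x in (0x0, 0x1, 0x0100, 0x0101, 0x0100000000000001, LSB_OF_EACH_BYTE):
    assert x & ~LSB_OF_EACH_BYTE == 0
    assert sub_shl8(x) == orc_b(x)
```

Without the precondition the two sides differ: for X = 2 the multiply gives 0x1FE while `orc_b` gives 0xFF.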
2024-06-25 15:55:09 -07:00
Jiahan Xie
43d207adda
[RISCV][GISEL] IRTranslator for Scalable Vector Store (#86699)
Support IR translation for scalable vector store
2024-06-24 13:48:59 -04:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platforms, the behavior of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon: follow IEEE754-2019's minimumNumber: return NUM.
X86: returns NUM, but differs from IEEE754-2019's minimumNumber in that
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Jianjian Guan
7625465651
[RISCV] Make M imply Zmmul (#95070)
According to the spec, M implies Zmmul.
2024-06-21 11:11:10 +08:00
Philip Reames
3e55ac94c7
[RISCV] Strength reduce mul by 2^N - 2^M (#88983)
This is a three instruction expansion, and does not depend on zba, so
most of the test changes are in base RV32/64I configurations.

With zba, this gets immediates such as 14, 28, 30, 56, 60, 62.. which
aren't covered by our other expansions.
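
As a rough illustration of the arithmetic (a sketch only; `decompose` and `mul_by_pow2_diff` are hypothetical names, not functions from the patch), a constant of the form 2^N - 2^M can be recognised by its bit pattern and the multiply replaced by two shifts and a subtract:

```python
def decompose(c: int):
    """Return (n, m) with c == 2**n - 2**m, or None if c has no such form."""
    if c <= 0:
        return None
    m = (c & -c).bit_length() - 1      # number of trailing zero bits
    run = c >> m                        # must be a solid run of ones
    if run & (run + 1):
        return None
    return m + run.bit_length(), m


def mul_by_pow2_diff(x: int, n: int, m: int, xlen: int = 64) -> int:
    """x * (2**n - 2**m) as shift, shift, subtract, wrapped to xlen bits."""
    mask = (1 << xlen) - 1
    return ((x << n) - (x << m)) & mask


# The immediates named in the commit message all have this form.
for c in (14, 28, 30, 56, 60, 62):
    n, m = decompose(c)
    assert mul_by_pow2_diff(3, n, m) == 3 * c
```

For example, 14 = 2^4 - 2^1, so `x * 14` becomes `(x << 4) - (x << 1)`.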
2024-06-20 07:36:48 -07:00
Roger Ferrer Ibáñez
5ef02d9963
[RISCV] Lower llvm.clear_cache to __riscv_flush_icache for glibc targets (#93481)
This change is a preliminary step to support trampolines on RISC-V. Trampolines are used by flang to implement obtaining the address of an internal program (i.e., a nested function in Fortran parlance).

In this change we lower `llvm.clear_cache` intrinsic on glibc targets to
`__riscv_flush_icache` which is what GCC is currently doing for Linux targets.
2024-06-20 07:27:07 +02:00
Craig Topper
cb021f5e46 [RISCV] Don't use SEW=16 .vf instructions to move scalar bf16 into a vector.
The instructions are only defined to operate on f16 data. If the
scalar FPR register isn't properly nan-boxed, these instructions
will create an fp16 NaN, not a bf16 NaN, in the vector register.
2024-06-13 18:12:25 -07:00
Craig Topper
dc8e078a59
[RISCV] SPLAT_VECTOR of bf16 should not require Zvfhmin. (#95357)
The custom lowering converts to f32, splats as f32, then narrows the
vector to bf16. None of that requires Zvfhmin.

Add new bf16 test files without Zvfh/Zvfhmin in their RUN lines. I will
remove the bf16 tests from other files in a follow-up patch.
2024-06-13 08:42:36 -07:00
Craig Topper
e9fa6ffaf7
[RISCV] Fold (vXi8 (trunc (vselect (setltu, X, 256), X, (sext (setgt X, 0))))) to vmax+vnclipu. (#94720)
This pattern is an obscured way to express saturating a signed value
into a smaller unsigned value.

If (setltu, X, 256) is true, then the value is already in the desired
range so we can pick X. If it's false, we select (sext (setgt X, 0))
which is 0 for negative values and all ones for positive values. The all
ones value when truncated to the final type will still be all ones like
we want.
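
The reasoning above can be checked exhaustively for one element width. This is a minimal sketch assuming X is a signed 16-bit element truncated to 8 bits; `pattern` and `vmax_vnclipu` are hypothetical names that model the DAG pattern and the replacement sequence respectively.

```python
def pattern(x: int, bits: int = 16) -> int:
    """(trunc (vselect (setltu X, 256), X, (sext (setgt X, 0)))) for one signed element."""
    ux = x & ((1 << bits) - 1)          # bit pattern of X viewed as unsigned
    if ux < 256:                         # setltu X, 256
        picked = ux
    else:
        picked = -1 if x > 0 else 0      # sext of the i1 result of (setgt X, 0)
    return picked & 0xFF                 # truncate to 8 bits


def vmax_vnclipu(x: int) -> int:
    """Signed max with 0, then an unsigned saturating narrow to 8 bits."""
    return min(max(x, 0), 255)


# Exhaustive check over every signed 16-bit value.
for x in range(-32768, 32768):
    assert pattern(x) == vmax_vnclipu(x)
```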
2024-06-07 09:57:03 -07:00
Liao Chunyu
2afea72968
[RISCV] Codegen support for XCVmem extension (#76916)
All post-Increment load/store, register-register load/store

spec:

https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst

Contributors: @CharKeaney, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie, @realqhc
2024-06-07 21:45:49 +08:00
Jianjian GUAN
be18daad06 Reland "[RISCV] Support select/merge like ops for bf16 vectors when have Zvfbfmin" (#94565)"
2024-06-07 15:55:16 +08:00
Mehdi Amini
8c452d0cc5
Revert "[RISCV] Support select/merge like ops for bf16 vectors when have Zvfbfmin" (#94565)
Reverts llvm/llvm-project#91936

Premerge bots are broken.
2024-06-05 21:13:52 -07:00
Jianjian Guan
d5ab38f69c
[RISCV] Support select/merge like ops for bf16 vectors when have Zvfbfmin (#91936)
2024-06-06 10:33:54 +08:00
Craig Topper
edf4e02906
[RISCV] Support multiple levels of truncates in combineTruncToVnclip. (#93752)
We can use multiple vnclips to saturate an i32 value into an i8 value.
2024-05-31 09:09:12 -05:00
Craig Topper
8247068b70
[RISCV] Support (truncate (smin (smax X, C1), C2)) for vnclipu in combineTruncToVnclip. (#93756)
If the smax removed all negative numbers, then we can treat the smin
like a umin.

If the smin and smax are in the other order we can swap them and use a
vnclipu as long as the smax constant is smaller than the smin constant.

This is based on similar code from X86's detectUSatPattern.
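
Both claims can be sanity-checked on a scalar model. The sketch below assumes 16-bit elements; `umin`, `smin_acts_like_umin` and `orders_agree` are hypothetical helpers written for illustration and are not taken from the patch.

```python
BITS = 16                       # assumed element width before the truncate
MASK = (1 << BITS) - 1
LO, HI = -(1 << (BITS - 1)), 1 << (BITS - 1)


def umin(a: int, b: int) -> int:
    """Unsigned minimum: compare the BITS-wide bit patterns of a and b."""
    return min(a & MASK, b & MASK)


def smin_acts_like_umin(c1: int, c2: int) -> bool:
    """True if smin(smax(x, c1), c2) == umin(smax(x, c1), c2) for every signed x."""
    return all(min(max(x, c1), c2) == umin(max(x, c1), c2) for x in range(LO, HI))


def orders_agree(c1: int, c2: int) -> bool:
    """True if smin(smax(x, c1), c2) == smax(smin(x, c2), c1) for every signed x."""
    return all(min(max(x, c1), c2) == max(min(x, c2), c1) for x in range(LO, HI))


assert smin_acts_like_umin(0, 255)        # smax removed all negative numbers
assert not smin_acts_like_umin(-5, 255)   # negatives survive, so smin != umin
assert orders_agree(0, 255)               # smax constant below smin constant
```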
2024-05-30 15:41:07 -05:00
Craig Topper
8a8cd8a766
[RISCV] Move vnclip patterns into DAGCombiner. (#93728)
Similar to #93596, this moves the signed vnclip patterns into DAG
combine.
    
This will allow us to support more than 1 level of truncate in a
future patch.
2024-05-29 16:46:36 -07:00
Craig Topper
424f82c204 [RISCV] Refactor combineTruncToVnclipu to prepare for adding signed vnclip support. NFC
Reviewed as part of #93728.
2024-05-29 16:34:11 -07:00
Craig Topper
ec8fe598a9
[RISCV] Move vnclipu patterns into DAGCombiner. (#93596)
I plan to add support for multiple layers of vnclipu. For example,
i32->i8 using 2 vnclipu instructions. First clipping to 65535, then
clipping to 255. Similar for signed vnclip.
    
This scales poorly if we need to add patterns with 2 or 3 truncates.
Instead, move the code to DAGCombiner with new ISD opcodes to represent
VNCLIP(U).
    
This patch just moves the existing patterns into DAG combine. Support
for multiple truncates will come as a follow-up. A similar patch series will
be made for the signed vnclip.
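
The two-step i32->i8 example can be modelled as follows. This is a sketch assuming each vnclipu uses a shift amount of 0, so it only saturates and narrows; `vnclipu` here is a hypothetical scalar stand-in for the vector instruction.

```python
def vnclipu(x: int, dst_bits: int) -> int:
    """Unsigned saturating narrow with shift 0: clamp to [0, 2**dst_bits - 1]."""
    return min(x, (1 << dst_bits) - 1)


def clip_u32_to_u8_two_steps(x: int) -> int:
    """First clip to 65535 (i32 -> i16), then clip to 255 (i16 -> i8)."""
    return vnclipu(vnclipu(x, 16), 8)


# The chained clips match a single clamp to 255 for these unsigned 32-bit inputs.
for x in (0, 1, 255, 256, 65535, 65536, 0x10001, 0xFFFFFFFF):
    assert clip_u32_to_u8_two_steps(x) == min(x, 255)
```

Saturating at each step is what makes the chain valid: plain truncation of 0x10001 to 16 bits would give 1, and the final result would be 1 rather than 255.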
2024-05-29 13:00:15 -07:00
Craig Topper
b3bbb2de6f
[RISCV] Verify the VL and Mask on the outer TRUNCATE_VECTOR_VL in combineTruncOfSraSext. (#93578)
We checked the VL and mask of any additional TRUNCATE_VECTOR_VL
nodes we peek through, but not the outermost.
    
This moves the check to the outer node and then verifies all the
additional nodes have the same VL and Mask.

Stacked on #93574
2024-05-29 11:53:01 -07:00
Craig Topper
f7c8a0339c
[RISCV] Combine vXi32 (mul (and (lshr X, 15), 0x10001), 0xffff) -> (bitcast (sra (v2Xi16 (bitcast X)), 15)) (#93565)
Similar for i16 and i64 elements for both fixed and scalable vectors.

This reduces the number of vector instructions, but increases vl/vtype
toggles.

This reduces some code in 525.x264_r from SPEC2017. In that usage, the
vectors are fixed with a small number of elements so vsetivli can be
used.

This is similar to `performMulVectorCmpZeroCombine` from AArch64.
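
To see why the multiply is just a sign-fill of each 16-bit half, here is a minimal scalar sketch for a single 32-bit element. `mul_pattern` and `sra_halves` are hypothetical names; the second one models the bitcast to two i16 lanes followed by an arithmetic shift right by 15.

```python
def mul_pattern(x: int) -> int:
    """(mul (and (lshr X, 15), 0x10001), 0xffff) on one 32-bit element."""
    return (((x >> 15) & 0x10001) * 0xFFFF) & 0xFFFFFFFF


def sra_halves(x: int) -> int:
    """View X as two i16 lanes and arithmetic-shift each lane right by 15."""
    out = 0
    for lane in range(2):
        half = (x >> (16 * lane)) & 0xFFFF
        filled = 0xFFFF if half & 0x8000 else 0x0000   # sign bit replicated
        out |= filled << (16 * lane)
    return out


for x in (0x00000000, 0x00008000, 0x80000000, 0x80008000,
          0x7FFF7FFF, 0x12348765, 0xFFFFFFFF):
    assert mul_pattern(x) == sra_halves(x)
```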
2024-05-28 15:54:44 -07:00
Craig Topper
060b3023e1
[RISCV] Move TRUNCATE_VECTOR_VL combine into a helper function. NFC (#93574)
I plan to add other combines on TRUNCATE_VECTOR_VL.
2024-05-28 14:49:57 -07:00
Craig Topper
d490ce22e9
[RISCV] Use mask undisturbed policy when silencing sNans for strict rounding ops. (#93356)
The elements that aren't sNans need to get passed through this fadd
instruction unchanged. With the agnostic mask policy they might be
forced to all ones.
2024-05-28 08:51:42 -07:00
Craig Topper
a1c9b9673c
[SelectionDAG][RISCV][VE] Rename VP_ASHR->VP_SRA VP_LSHR->VP_SRL. (#93221)
This maintains consistency with the non-VP ISD opcodes.
2024-05-24 09:03:19 -07:00
Yingwei Zheng
557bf3835b
[RISCV][ISel] Allow opaque constants in hasAndNotCompare (#92926)
See the following code:

4ae896fe97/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L9334-L9357)

> Combining: t47: i64 = xor t43, OpaqueConstant:i64<31808>
X: i64 = Constant<0>
Y: i64 = OpaqueConstant<31808>

The assertion failed because both `X` and `Y` are constants.
This patch allows opaque constants in `hasAndNotCompare` to fix the
issue.

Fixes https://github.com/llvm/llvm-project/issues/90730.
2024-05-22 00:48:26 +08:00
Yingwei Zheng
76748119bf
[GISel][RISCV] Add irtranslator/legalizer/selector support for G_FREEZE. (#92744)
This patch adds support for G_FREEZE on riscv. It will be selected into
a copy instruction.
 
The ll test is copied from the AArch64 patch:
665da59685.
2024-05-21 23:59:51 +08:00
Craig Topper
6246b495ad
[RISCV] Select ISD::AVGCEILS/AVGFLOORS as vaadd. (#92839)
I think the behaviors are the same if this describes their behavior.

AVGFLOORS sign extends the inputs by 1 bit, adds them, then does an
arithmetic shift right by 1 before truncating to the original bit width.
This is vaadd with rdn rounding mode.

AVGCEILS sign extends the inputs by 1 bit, adds them, then does an
arithmetic shift right by 1. If the bit shifted out is 1, it adds 1 to
the shifted value. Then truncates to the original bit width. This is vaadd
with rnu rounding mode.
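
The two definitions can be written out directly. This is a sketch under the description given above, assuming 8-bit elements; Python integers are unbounded, which stands in for sign-extending the inputs by one bit, and `wrap_signed` is a hypothetical helper for the final truncate.

```python
def wrap_signed(v: int, bits: int) -> int:
    """Truncate v to `bits` bits and reinterpret the result as signed."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def avgfloors(a: int, b: int, bits: int = 8) -> int:
    """Add without overflow, arithmetic shift right by 1 (rounds down, like rdn)."""
    return wrap_signed((a + b) >> 1, bits)


def avgceils(a: int, b: int, bits: int = 8) -> int:
    """As above, but add the bit shifted out back in (like rnu)."""
    s = a + b
    return wrap_signed((s >> 1) + (s & 1), bits)


assert avgfloors(127, 127) == 127      # no overflow in the intermediate sum
assert avgfloors(-3, 0) == -2          # -1.5 rounds down
assert avgceils(-3, 0) == -1           # -1.5 rounds up
```

Neither result is -1 and -2 swapped into a round-towards-zero answer for both signs, which matches the note that truncating averages are not expressible with these nodes.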

I think this wasn't implemented previously because there was some
confusion about what average means. Some may expect average to round
towards zero, but there is no way to do that in RISC-V or with the
SelectionDAG nodes. Related issue
https://github.com/riscv/riscv-v-spec/issues/935
2024-05-20 23:24:22 -07:00
Kazu Hirata
79a6a7e28f [RISCV] Fix a warning
This patch fixes:

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:19848:11: error:
  enumeration value 'SW_GUARDED_BRIND' not handled in switch
  [-Werror,-Wswitch]
2024-05-14 00:48:56 -07:00
Yeting Kuo
d488a54b40
[RISCV] Use software guarded branch for indirect jump table branch. (#66762)
When Zicfilp is enabled, an indirect jump table branch should be a
software guarded branch.
2024-05-14 14:44:25 +08:00
Philip Reames
6140b5bae4
[RISCV] Use RISCVISD::SHL_ADD in transformAddShlImm (#89832)
Doing so avoids negative interactions with other combines which don't
know the shl_add is a single instruction. From the commit log, we've had
several combine loops already.

This was originally posted as part of #88791, where a bug was pointed
out. That bug was fixed by #89789 which hits the same issue from another
angle. To confirm the fix, I included the reduced test case here.
2024-05-13 09:48:46 -07:00
Paul Kirth
d95f7c9cab
[RISCV] Use the thread local stack protector for Android targets (#87672)
Android supports per thread stack protectors that are individually
managed and initialized, which can provide stronger protections than
using the global stack protector cookie. This patch matches the
convention for other architectures targeting Android platforms.
2024-05-13 08:52:59 -07:00
Min-Yih Hsu
f8063ffe73
[VP][RISCV] Add vp.reduce.fmaximum/fminimum and its RISC-V codegen (#91782)
`vp.reduce.fmaximum/fminimum` are the VP version of
`vector.reduce.fmaximum/fminimum`.
2024-05-10 16:01:47 -07:00
Luke Lau
d24eaef925
[RISCV] Sink vector select splat operands (#91554)
vmerge.vxm allows us to splat the true operand of a select, so sink it
where possible to reduce vector register pressure.
2024-05-10 10:01:23 +08:00
Harald van Dijk
8fd838a8c4
[RISC-V] Limit vscale interleaving to addrspace 0. (#91573)
The vlseg and vsseg intrinsic functions are not overloaded on pointer
type, so cannot handle non-default address spaces.

This fixes an error we see after #90583.
2024-05-09 19:15:42 +01:00
Philip Reames
4298fc5eb5
[RISCV] Move strength reduction of mul X, 3/5/9*2^N to combine (#89966)
This moves our last major category of tablegen-driven multiply strength
reduction into the post legalize combine framework. The one slightly
tricky bit is making sure that we use a leading shl if we can form a
slli.uw, and trailing shl otherwise. Having the trailing shl is critical
for shNadd matching, and folding any following sext.w.
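
The arithmetic of this expansion can be sketched as below. This is an illustration only, using the trailing-shl form; `shnadd` models the Zba sh1add/sh2add/sh3add instructions, and `mul_3_5_9_pow2` is a hypothetical helper, not the combine itself.

```python
def shnadd(n: int, rs1: int, rs2: int) -> int:
    """Model of Zba shNadd: (rs1 << n) + rs2, for n in 1..3."""
    return (rs1 << n) + rs2


def mul_3_5_9_pow2(x: int, c: int, xlen: int = 64):
    """Expand x * c for c == {3, 5, 9} * 2**n as shNadd then shl; None otherwise."""
    mask = (1 << xlen) - 1
    for sh, factor in ((1, 3), (2, 5), (3, 9)):
        if c % factor == 0:
            q = c // factor
            if q > 0 and q & (q - 1) == 0:          # q is a power of two
                n = q.bit_length() - 1
                return (shnadd(sh, x, x) << n) & mask   # trailing shl
    return None


assert mul_3_5_9_pow2(7, 40) == 7 * 40   # 40 = 5 * 2**3: sh2add, then shl by 3
assert mul_3_5_9_pow2(7, 7) is None      # 7 is not of the form {3,5,9} * 2**n
```

A leading shl, `shnadd(sh, x << n, x << n)`, yields the same value; the choice between the two forms only matters for which instructions can be matched afterwards.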

As can be seen in the TD deltas, this allows us to kill off both the
actual multiply patterns and the explicit add (mul X, C) Y patterns. The
latter are now handled by the generic shNadd matching code, with the
exception of the THead only C=200 case because we don't (yet) have a
multiply expansion with two shNadd + a shift.

---------

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
2024-05-08 10:13:01 -07:00
Liao Chunyu
f4d2f7a3b7
[RISCV] Codegen support for XCVbi extension (#89719)
spec:
https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst#immediate-branching-operations

Contributors: @CharKeaney, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie, @realqhc,
@PhilippvK, @melonedo
2024-05-08 11:22:16 +08:00
Jianjian Guan
37fcb323f6
[RISCV] Add codegen support for Zvfbfmin (#87911)
This patch adds basic codegen support for Zvfbfmin extension.
2024-05-07 10:25:06 +08:00
Yingwei Zheng
2647bd7369
[RISCV][ISel] Fix types in tryFoldSelectIntoOp (#90659)
```
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
    t6: i64,ch = CopyFromReg t0, Register:i64 %2
  t8: i1 = truncate t6
          t4: i64,ch = CopyFromReg t0, Register:i64 %1
        t7: i1 = truncate t4
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t10: i64,i1 = saddo t2, Constant:i64<1>
        t11: i1 = or t8, t10:1
      t12: i1 = select t7, t8, t11
    t13: i64 = any_extend t12
  t15: ch,glue = CopyToReg t0, Register:i64 $x10, t13
  t16: ch = RISCVISD::RET_GLUE t15, Register:i64 $x10, t15:1
```

`OtherOpVT` should be i1, but `OtherOp->getValueType(0)` returns `i64`,
which ignores `ResNo` in `SDValue`.

Fix https://github.com/llvm/llvm-project/issues/90652.
2024-05-01 06:51:36 +08:00
Luke Lau
f565b79f9f
[RISCV] Handle fixed length vectors with exact VLEN in lowerINSERT_SUBVECTOR (#84107)
This is the insert_subvector equivalent to #79949, where we can avoid
sliding up by the full LMUL amount if we know the exact subregister the
subvector will be inserted into.

This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this
in two parts:

- We handle fixed length subvector types by converting the subvector to
a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to
convert the vector being inserted into too.

- Whenever we don't need a vslideup because either the subvector fits
exactly into a vector register group *or* the vector is undef, we need
to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select
doesn't correctly handle fixed length subvectors yet: see d7a28f7ad

A subvector exactly fits into a vector register group if its size is a
known multiple of the size of a vector register, and this adds a new
overload for TypeSize::isKnownMultipleOf for scalable to scalable
comparisons to help reason about this.

I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an
invariant), so that the insert_subvector and extract_subvector code
paths are the same.

We should teach it to properly handle fixed length subvectors in a
follow-up patch, so that the "exact subregister" logic is handled in one
place instead of being spread across both RISCVISelDAGToDAG.cpp and
RISCVISelLowering.cpp.
2024-05-01 01:35:13 +08:00
Min-Yih Hsu
539f626ecd
[VP][RISCV] Add vp.cttz.elts intrinsic and its RISC-V codegen (#90502)
This intrinsic is the VP version of `experimental.cttz.elts`.
2024-04-30 09:27:10 -07:00
Craig Topper
2524146b25
[RISCV] Add DAG combine for (vmv_s_x_vl (undef) (vmv_x_s X). (#90524)
We can use the original vector as long as the type of X matches the
result type of the vmv_s_x_vl.
2024-04-29 23:35:30 -07:00
Craig Topper
f9d4d54aa0
[RISCV] Break the (czero_eqz x, (setne x, 0)) -> x combine into 2 combines. (#90428)
We can think of this as two separate combines

(czero_eqz x, (setne y, 0)) -> (czero_eqz x, y)
and
(czero_eqz x, x) -> x

Similarly the (czero_nez x, (seteq x, 0)) -> x combine can be broken into

(czero_nez x, (seteq y, 0)) -> (czero_eqz x, y)
and
(czero_eqz x, x) -> x

isel already does the (czero_eqz x, (setne y, 0)) -> (czero_eqz x, y)
and (czero_nez x, (seteq y, 0)) -> (czero_eqz x, y) combines, but doing
them early could expose other opportunities.
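
The identities listed above can be verified on a scalar model of the Zicond operations. This is a minimal sketch; the function names mirror the DAG node names but are plain Python helpers written for illustration.

```python
def czero_eqz(x: int, c: int) -> int:
    """Zicond czero.eqz: result is 0 if the condition is zero, else x."""
    return 0 if c == 0 else x


def czero_nez(x: int, c: int) -> int:
    """Zicond czero.nez: result is 0 if the condition is non-zero, else x."""
    return 0 if c != 0 else x


def setne0(y: int) -> int:
    return int(y != 0)


def seteq0(y: int) -> int:
    return int(y == 0)


for x in range(-3, 4):
    assert czero_eqz(x, x) == x                              # (czero_eqz x, x) -> x
    for y in range(-3, 4):
        assert czero_eqz(x, setne0(y)) == czero_eqz(x, y)    # drop the setne
        assert czero_nez(x, seteq0(y)) == czero_eqz(x, y)    # seteq flips nez to eqz
```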
2024-04-29 10:15:57 -07:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch is moving out the following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

from the experimental namespace.

All these intrinsics have existed in LLVM for more than a year now and are
widely used, so they should not be considered experimental.
2024-04-29 10:16:45 +01:00
Qiu Chaofan
4a8f2f2e1a
[Legalizer] Expand fmaximum and fminimum (#67301)
According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and
propagates NaN.

Expand the nodes on targets not supporting the operation, by adding
extra check for NaN and using is_fpclass to check zero signs.
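
The semantics being expanded can be sketched in scalar form. This models only the behaviour described above (NaN propagates, -0.0 is ordered below +0.0) and is not the legalizer's actual node sequence; `fmaximum` is a hypothetical name for the model.

```python
import math


def fmaximum(a: float, b: float) -> float:
    """Model of llvm.maximum: NaN propagates and +0.0 is greater than -0.0."""
    if math.isnan(a) or math.isnan(b):      # extra check for NaN
        return math.nan
    if a == b == 0.0:                        # both zero: decide by sign bit
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


assert math.isnan(fmaximum(math.nan, 1.0))
assert math.copysign(1.0, fmaximum(-0.0, 0.0)) == 1.0   # picks +0.0
assert fmaximum(1.0, 2.0) == 2.0
```

The zero-sign check plays the role that is_fpclass has in the expansion: an ordinary comparison treats -0.0 and +0.0 as equal, so the sign has to be inspected separately.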
2024-04-29 15:09:54 +08:00