7012 Commits

Author SHA1 Message Date
Luke Lau
eafe31b293
[RISCV] Don't lose elements from False in vmerge -> vmv peephole (#149720)
In the vmerge peephole, we currently allow different AVLs for the vmerge
and its true operand.
If vmerge's VL > true's VL, vmerge can "preserve" elements from false
that would otherwise be clobbered with a tail agnostic policy on true.

    mask	1 1 1 1 0 0 0 0
    true	x x x x|. . . . AVL=4
    vmerge	x x x x f f|. . AVL=6

If we convert this to vmv.v.v we will lose those false elements:

    mask	1 1 1 1 0 0 0 0
    true	x x x x|. . . . AVL=4
    vmv.v.v	x x x x . .|. . AVL=6

Fix this by checking that vmerge's AVL is <= true's AVL.

Should fix #149335
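
The two diagrams above can be reproduced with a toy model. This is only an illustrative sketch: the helper names (`run_true`, `vmerge`, `vmv_v_v`, `peephole_is_safe`) are hypothetical, `.` stands for an element left unspecified by a tail-agnostic policy, and the peephole's other preconditions are not modelled.

```python
def run_true(avl, vlmax=8):
    # Elements below AVL are defined ('x'); the tail is unspecified ('.').
    return ['x' if i < avl else '.' for i in range(vlmax)]

def vmerge(mask, true, false, avl):
    # Body elements take true where the mask is set and false otherwise.
    return [(true[i] if mask[i] else false[i]) if i < avl else '.'
            for i in range(len(mask))]

def vmv_v_v(true, avl):
    # Body elements are copied from true regardless of the mask.
    return [true[i] if i < avl else '.' for i in range(len(true))]

def peephole_is_safe(vmerge_avl, true_avl):
    # The check described in the fix: vmerge's AVL must be <= true's AVL.
    return vmerge_avl <= true_avl

mask = [1, 1, 1, 1, 0, 0, 0, 0]
false = ['f'] * 8
true = run_true(avl=4)

merged = vmerge(mask, true, false, avl=6)
moved = vmv_v_v(true, avl=6)
print(' '.join(merged))
print(' '.join(moved))
```

Elements 4 and 5 hold `f` after the vmerge but are unspecified after the vmv.v.v, which is the loss the AVL check rules out.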
2025-07-22 15:21:42 +08:00
Craig Topper
8b9e760271
[RISCV] Don't use RVInstIBase for P-ext plui/pli. NFC (#149940)
These instructions don't have an rs1 field unlike other instructions
that use RVInstIBase.

Rename the classes to not use Unary since we have historically used that
for a single register operand.
2025-07-21 21:44:21 -07:00
Brandon Wu
24bf4aea0c
[RISCV][llvm] Handle vector callee saved register correctly (#149467)
In TargetFrameLowering::determineCalleeSaves, any vector register is
marked as saved if any of its subregisters is clobbered. This is not
correct for vector registers: we want a vector register to be marked as
saved only if all of its subregisters are clobbered.
This patch handles vector callee saved registers in target hook.
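
The difference between the "any subregister" and "all subregisters" rules can be sketched as below. The register table is a made-up assumption for illustration (one hypothetical register group `v2m2` covering `v2` and `v3`), and the functions do not mirror LLVM's actual data structures.

```python
# Hypothetical callee-saved registers and the subregisters they cover.
CALLEE_SAVED = {
    "v2": ["v2"],
    "v3": ["v3"],
    "v2m2": ["v2", "v3"],
}

def saved_if_any_clobbered(clobbered):
    # Generic rule from the message: save if any subregister is clobbered.
    return sorted(reg for reg, subs in CALLEE_SAVED.items()
                  if any(s in clobbered for s in subs))

def saved_if_all_clobbered(clobbered):
    # Rule wanted for vector registers: save only if all are clobbered.
    return sorted(reg for reg, subs in CALLEE_SAVED.items()
                  if all(s in clobbered for s in subs))

print(saved_if_any_clobbered({"v2"}))
print(saved_if_all_clobbered({"v2"}))
print(saved_if_all_clobbered({"v2", "v3"}))
```

With only `v2` clobbered, the "any" rule also marks the group `v2m2`, while the "all" rule marks `v2` alone.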
2025-07-21 17:49:34 -07:00
Craig Topper
5e8e03d859 [RISCV] Simplify RVPUnary tablegen class. NFC
imm field was unused. rs1 is already handled in RVInstIBase.
2025-07-21 15:16:18 -07:00
Philip Reames
bf86abee3e
[RISCV][IA] Support masked.store of deinterleaveN intrinsic (#149893)
This is the masked.store side to the masked.load support added in
881b3fd.

With this change, we support masked.load and masked.store via the
intrinsic lowering path used primarily with scalable vectors. An
upcoming change will extend the fixed vector (i.e. shuffle vector) paths
in the same manner.
2025-07-21 14:04:03 -07:00
Craig Topper
860ff8714b
[RISCV] Use empty() instead of size()==0. NFC (#149868)
Move the assert past the code that determines if the pass should run.
2025-07-21 13:16:38 -07:00
Philip Reames
d93f91fc46 [RISCV][IA] Prefer switch over intrinsic ID instead of if-chain [nfc] 2025-07-21 12:24:51 -07:00
Philip Reames
881b3fdfad
[RISCV][IA] Support masked.load for deinterleaveN matching (#149556)
This builds on the whole series of recent API reworks to implement
support for deinterleaveN of masked.load. The goal is to be able to
enable masked interleave groups in the vectorizer once all the codegen
and costing pieces are in place.

I considered including the shuffle path support in this review as well
(since the RISCV target specific stuff should be common), but decided to
separate it into its own review just to focus attention on one thing at
a time.
2025-07-21 11:07:41 -07:00
Alex Bradbury
fc69f25a8f
[RISCV] Convert LWU to LW if possible in RISCVOptWInstrs (#144703)
After the refactoring in #149710 the logic change is trivial.

Motivation for preferring sign-extended 32-bit loads (LW) vs
zero-extended (LWU):
* LW is compressible while LWU is not.
* Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
* Helps to minimise distracting diffs vs GCC. I see this come up
frequently when comparing GCC code and in these cases it's a red
herring.

Similar normalisation could be done for LHU and LH, but this is less
well motivated as there is a compressed LHU (and if performing the
change in RISCVOptWInstrs it wouldn't be done for RV32). There is a
compressed LBU but not LB, meaning doing a similar normalisation for
byte-sized loads would actually be a regression in terms of code size.
Load narrowing when allowed by hasAllNBitUsers isn't explored in this
patch.

This changes ~20500 instructions in an RVA22 build of the
llvm-test-suite including SPEC 2017. As part of the review, the option
of doing the change at ISel time was explored but was found to be less
effective.
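
As a rough illustration of why the two loads can be interchanged, the sketch below models LW and LWU on a 64-bit register. It assumes the conversion is only valid when every user reads just the low 32 bits of the loaded value; `addw` here is a simplified stand-in for such a user, and none of this is the pass's actual code.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def lw(word):
    # Sign-extend a 32-bit memory word into a 64-bit register value.
    word &= MASK32
    if word & 0x80000000:
        return word | 0xFFFFFFFF00000000
    return word

def lwu(word):
    # Zero-extend a 32-bit memory word into a 64-bit register value.
    return word & MASK32

def addw(a, b):
    # Reads only the low 32 bits of its inputs, then sign-extends the sum.
    return lw((a + b) & MASK32)

word = 0x80000001
print(hex(lw(word)), hex(lwu(word)))
print(addw(lw(word), 1) == addw(lwu(word), 1))
```

The full 64-bit values differ, but a user that only looks at the low 32 bits cannot tell them apart.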
2025-07-21 11:48:33 +01:00
Alex Bradbury
9311f3814b
[RISCV][NFC] Combine RISCVOptWInstrs::stripWSuffixes and appendWSuffixes into canonicalizeWSuffixes (#149710)
This refactor was suggested in
<https://github.com/llvm/llvm-project/pull/144703>.

I have checked for unexpected changes by comparing builds of
llvm-test-suite with/without this refactor, including with preferWInst
force enabled.
2025-07-21 10:48:06 +01:00
Luke Lau
b832c49cb4
[RISCV] Fix VLOptimizer assert, relax ElementsDependOn on viota/vms{b,i,o}f.m (#149698)
The previous assert wasn't passing the TSFlags but the opcode, so wasn't
working.

Fixing it reveals that it was actually triggering, because we're too
strict with viota and vmsxf.m. We already reduce the VL on these
instructions because the result in each element doesn't depend on VL.
However, it does change if masked, so account for that.
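
A small model of unmasked viota shows why reducing VL is harmless there: each result element is the count of set mask bits at lower indices, so it is the same for any VL that covers it. The function is an illustrative sketch and the masked case mentioned above is deliberately not modelled.

```python
def viota(mask, vl):
    # Element i receives the number of set mask bits at indices below i.
    out, count = [], 0
    for i in range(vl):
        out.append(count)
        count += mask[i]
    return out

mask = [1, 0, 1, 1, 0, 1, 0, 0]
full = viota(mask, vl=8)
reduced = viota(mask, vl=4)
print(full)
print(reduced)
```

The first four elements agree between the two runs.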
2025-07-21 14:51:41 +08:00
Sudharsan Veeravalli
84e689b1db
[RISCV] Swap source register operands in QC_SHLADD ISEL patterns (#149697)
The instruction does `rd = (rs1 << shamt) + rs2` but the ISEL patterns
had `rs1` and `rs2` the other way around which is incorrect.
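
The formula can be checked with a two-line model. This is a sketch only; a 32-bit register width is assumed and `qc_shladd` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF  # assumed register width for this sketch

def qc_shladd(rs1, rs2, shamt):
    # rd = (rs1 << shamt) + rs2, truncated to the register width.
    return ((rs1 << shamt) + rs2) & MASK32

correct = qc_shladd(rs1=3, rs2=10, shamt=2)
swapped = qc_shladd(rs1=10, rs2=3, shamt=2)
print(correct, swapped)
```

Swapping the two source operands shifts the wrong value, so the results differ.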
2025-07-21 12:03:55 +05:30
Alex Bradbury
c58225f757
[RISCV] Add RISCV::SUBW to RISCVOptWInstrs::stripWSuffixes (#149071)
This is purely a benefit for reducing unnecessary diffs between RV32 and
RV64, as RVC does have a compressed form of SUBW (so SUB isn't more
compressible). This affects ~57.2k instructions in an rva22u64 build of
llvm-test-suite with SPEC CPU 2017 included.
2025-07-20 09:59:41 +01:00
Alex Bradbury
971bfbead2 [RISCV][NFC] Add NumTransformedToNonWInstrs statistic to RISCVOptWInstrs and extend debug printing
RISCVOptWInstrs has a NumTransformedToWInstrs statistic, but didn't have
one for the W=>Non-W transform done by stripWSuffixes. It also didn't do
debug printing of the transformation. This patch addresses both issues.

Reviewed as part of <https://github.com/llvm/llvm-project/pull/149071>,
but landing separately.
2025-07-20 09:54:10 +01:00
Fangrui Song
fd6d6a7c8d
MC: Refactor FT_Align fragments when linker relaxation is enabled
Previously, two MCAsmBackend hooks were used, with
shouldInsertFixupForCodeAlign calling getWriter().recordRelocation
directly, bypassing generic code.

This patch:

* Introduces MCAsmBackend::relaxAlign to replace the two hooks.
* Tracks padding size using VarContentEnd (content is ignored).
* Moves setLinkerRelaxable from MCObjectStreamer::emitCodeAlignment to the backends.

Pull Request: https://github.com/llvm/llvm-project/pull/149465
2025-07-20 00:55:54 -07:00
Fangrui Song
2ba5e0ad17
MC: Encode FT_Align in fragment's variable-size tail
Follow-up to #148544

Pull Request: https://github.com/llvm/llvm-project/pull/149030
2025-07-20 00:46:51 -07:00
Craig Topper
ff0cbecb68 [RISCV] Add a non-template version of SelectAddrRegZextRegScale and move code there. NFC
The template versions now call the non-template version. This
avoids duplicating the code for each template.
2025-07-19 17:53:39 -07:00
Philip Reames
f6641e2f23
[RISCV][IA] Factor out code for extracting operands from mem insts [nfc] (#149344)
We're going to end up repeating the operand extraction four times once
all of the routines have been updated to support both plain load/store
and vp.load/vp.store. I plan to add masked.load/masked.store in the near
future, and we'd need to add that to each of the four cases. Instead,
factor out a single copy of the operand normalization.
2025-07-18 11:04:18 -07:00
Sergei Barannikov
6112ebde0c
[RISCV] Guard CFI emission code with MF.needsFrameMoves() (#136060)
Currently, AsmPrinter skips CFI instructions created by a backend if
they are not needed. I'd like to change that so that it always
prints/encodes CFI instructions if a backend created them.

This change should slightly (perhaps negligibly) improve compile time as
post-PEI passes no longer need to skip over these instructions in
no-exceptions no-debug builds, and will allow simplifying the convoluted
logic in AsmPrinter once other targets stop emitting CFI instructions
when they are not needed (that's my final goal).

The changes in a test seem to be caused by slightly different post-RA
scheduling in the absence of CFI instructions.
2025-07-18 16:49:30 +03:00
Liao Chunyu
52a9c493e6
Reland "[RISCV] AddEdge between mask producer and user of V0 (#146855)" (#148566)
The defmask vector cannot contain instructions that use V0.

For `MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/mesh.cpp`, saving
`%173:vrm2nov0 = PseudoVMERGE_VVM_M2 undef %173:vrm2nov0(tied-def 0), %116:vrm2, %173:vrm2nov0, killed $v0, -1, 5`
to the defmask vector caused a crash.
2025-07-18 10:56:07 +08:00
Philip Reames
28417e6459
[IA] Support vp.load in lowerInterleavedLoad [nfc-ish] (#149174)
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicated lowerInterleavedVPLoad and thus we
can remove it.

This isn't quite NFC as the main callback has support for the strided
load optimization whereas the VPLoad specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave with
one shuffle used.
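
The equivalence behind the strided load optimization can be sketched as follows: when only one field of a deinterleave is used, the same values can be fetched with a single strided access. The helper names and the flat `memory` list are assumptions made for illustration.

```python
def deinterleave(values, factor):
    # Split an interleaved sequence into `factor` separate fields.
    return [values[k::factor] for k in range(factor)]

def strided_load(memory, start, stride, count):
    # Load `count` elements beginning at `start`, `stride` elements apart.
    return [memory[start + i * stride] for i in range(count)]

memory = list(range(100, 112))           # 12 elements, factor-3 interleave
fields = deinterleave(memory, factor=3)
only_used = fields[1]                    # pretend only field 1 has users
print(only_used)
print(strided_load(memory, start=1, stride=3, count=4))
```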
2025-07-17 17:29:28 -07:00
Philip Reames
8f18dde6c0 [RISCV][IA] Rearrange code for readability and ease of merge [nfc] 2025-07-17 07:38:15 -07:00
Craig Topper
0f71424280
[RISCV] Teach SelectAddrRegRegScale that ADD is commutable. (#149231) 2025-07-17 07:13:50 -07:00
Philip Reames
b36188514a
[RISCV][IA] Check nuw on multiply when analyzing EVL (#149205)
If we're checking to see if a number is a multiple of a small constant,
we need to be sure the multiply doesn't overflow for the mul logic to
hold. The VL is an unsigned number, so we care about unsigned overflow.
Once we've proven a number is a multiple, we can also use an
exact udiv as we know we're not discarding any bits.

This fixes what is technically a miscompile with EVL vectorization, but
I doubt we'd ever have seen it in practice since most EVLs are going to
be much less than UINT_MAX.
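
A worked counterexample shows why the no-unsigned-wrap requirement matters. The sketch assumes a 64-bit unsigned value; `mul_wrapping` and `mul_has_nuw` are hypothetical helpers, not LLVM APIs.

```python
BITS = 64
MOD = 1 << BITS

def mul_wrapping(a, b):
    # Unsigned multiply that wraps at 2**BITS, like a plain `mul`.
    return (a * b) % MOD

def mul_has_nuw(a, b):
    # True when the mathematical product fits, i.e. no unsigned wrap.
    return a * b < MOD

# 2**64 leaves remainder 1 when divided by 3, so 3 * a is 2**64 + 2.
a = MOD // 3 + 1
wrapped = mul_wrapping(a, 3)
print(wrapped, wrapped % 3, mul_has_nuw(a, 3))

# Without wrapping the product is a true multiple and the division is exact.
small = mul_wrapping(5, 3)
print(small % 3, small // 3)
```

The wrapped product is 2, which is not a multiple of 3 even though it was computed as `a * 3`.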
2025-07-16 19:11:32 -07:00
Philip Reames
b9adc4a59c
[IA] Use a single callback for lowerInterleaveIntrinsic [nfc] (#148978) (#149168)
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPStore - currently in
lowerInterleavedVPStore which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
2025-07-16 18:09:27 -07:00
Min-Yih Hsu
6824bcfdb4
[IA] Relax the requirement of having ExtractValue users on deinterleave intrinsic (#148716)
There are cases where InstCombine / InstSimplify might sink extractvalue
instructions that use a deinterleave intrinsic into successor blocks,
which prevents InterleavedAccess from kicking in because the current
pattern requires deinterleave intrinsic to be used by extractvalue.
However, this requirement is a bit too strict, since we could have just
replaced the users of the deinterleave intrinsic with whatever is
generated by the target TLI hooks.
2025-07-16 13:46:02 -07:00
Mikhail R. Gadelha
ececa87708
[RISCV][VLOPT] Add support for vrgather (#148249)
This PR adds support for the vrgather.vi, vrgather.vx, vrgather.vv,
vrgatherei16.vv instructions in the RISC-V VLOptimizer.

To support vrgatherei16.vv I also needed to add support for it in
getOperandLog2EEW.
2025-07-16 17:25:27 -03:00
Mikhail R. Gadelha
c4d4e761ef [RISCV] Pre-commit RVV instructions to the x60 scheduling model and tests 2025-07-16 15:48:37 -03:00
Serge Pavlov
372e99938f
Remove unused variable (#149115) 2025-07-16 11:28:57 -04:00
Serge Pavlov
c71b92d09f
[RISCV][FPE] Remove unused variable (#149054)
It was added by me in 905bb5bddb690765cab5416d55ab017d7c832eb3, which
committed PR https://github.com/llvm/llvm-project/pull/148569.
2025-07-16 19:56:31 +07:00
Serge Pavlov
905bb5bddb
[RISCV][FPEnv] Lowering of fpmode intrinsics (#148569)
The change implements custom lowering of `get_fpmode`, `set_fpmode` and
`reset_fpmode` for RISCV target. The implementation is aligned with the
functions `fegetmode` and `fesetmode` in GLIBC.
2025-07-16 16:02:15 +07:00
Jim Lin
3e4153c97b
[RISCV] Implement Builtins for XAndesBFHCvt extension. (#148804)
XAndesBFHCvt provides two builtin functions for converting between
float and bf16. Users can use them to convert bf16 values loaded from
memory to float, perform arithmetic operations, then convert them back
to bf16 and store them to memory.

The load/store and move operations for bf16 will be handled in a later
patch.
2025-07-16 16:13:31 +08:00
Craig Topper
dbb6ed7631 [RISCV] Refactor SelectAddrRegRegScale. NFC
Rename UnwrapShl->SelectShl. Make it only responsible for matching
a SHL by constant.

Handle the fallback case of reg+reg with no scale outside of SelectShl.

Reorder the check so RHS is checked for shift first. The base pointer
is most likely on the LHS. It's very unlikely both operands are shifts.

This is preparation for adding better costing decisions to this code.
2025-07-15 22:43:56 -07:00
Fangrui Song
dc3a4c0fcf
MC: Restructure MCFragment as a fixed part and a variable tail
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(push rax); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,

to

```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```

Changes:

* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
  MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
  FT_DwarfFrame, with plans to extend to other fragment types.
  dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
  Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
  A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
  expecting the fixup offset to be relative to the fixed part's start.
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
  associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
  FT_Relaxable as otherwise there would be many more FT_DwarfFrame
  fragments in -g compilations.
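
The effect of the new representation on fragment count can be sketched with a toy layout model. It is a simplification under stated assumptions: instructions are plain strings tagged as relaxable or not, and alignment, fixups and the other fragment types are ignored. `old_layout` and `new_layout` are hypothetical names.

```python
def old_layout(instrs):
    # Separate Data and Relaxable fragments, as in the previous encoding.
    frags = []
    for text, relaxable in instrs:
        if relaxable:
            frags.append(("Relaxable", [text]))
        elif frags and frags[-1][0] == "Data":
            frags[-1][1].append(text)
        else:
            frags.append(("Data", [text]))
    return frags

def new_layout(instrs):
    # One fragment holds a fixed part plus an optional variable tail.
    frags = []
    for text, relaxable in instrs:
        if not frags or frags[-1]["variable"] is not None:
            frags.append({"fixed": [], "variable": None})
        if relaxable:
            frags[-1]["variable"] = text
        else:
            frags[-1]["fixed"].append(text)
    return frags

instrs = [("push rax", False), ("jmp foo", True),
          ("nop", False), ("jmp foo", True)]
old = old_layout(instrs)
new = new_layout(instrs)
print(len(old), len(new))
```

For the four-instruction example from the message, the model yields four fragments in the old layout and two in the new one.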

https://llvm-compile-time-tracker.com/compare.php?from=28e1473e8e523150914e8c7ea50b44fb0d2a8d65&to=778d68ad1d48e7f111ea853dd249912c601bee89&stat=instructions:u

```
stage2-O0-g instructions:u geomean (-0.07%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```

```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```
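
The two dumps above can be cross-checked with a few lines of arithmetic. The dictionaries simply restate the numbers printed above; nothing else is assumed.

```python
old = {"Align": 2215, "Data": 29700, "Dwarf": 12044, "DwarfCallFrame": 4216,
       "Fill": 92, "LEB": 12, "Relaxable": 11396}
new = {"Align": 2215, "Data": 2312, "Dwarf": 12044, "DwarfCallFrame": 4216,
       "Fill": 92, "LEB": 12, "Relaxable": 11396}

removed = sum(old.values()) - sum(new.values())
changed = {kind for kind in old if old[kind] != new[kind]}
ratio = removed / sum(old.values())

print(sum(old.values()), sum(new.values()), removed)
print(changed, round(ratio * 100, 1))
```

The per-kind counts add up to the reported totals, and the whole reduction (27388 fragments, roughly 46% of the old total) comes from the Data lines.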

Pull Request: https://github.com/llvm/llvm-project/pull/148544
2025-07-15 21:56:55 -07:00
Ming-Yi Lai
9b3064aec8
[llvm-objdump][RISCV] Display `@plt' symbols when disassembling .plt section (#147933)
This patch adds dummy symbols for PLT entries for RISC-V 32-bit and
64-bit targets so llvm-objdump can show the function symbol that
corresponds to each PLT entry.
2025-07-16 11:41:17 +08:00
Craig Topper
5ff99f2757 [RISCV] Remove duplicate check in an if statement. NFC 2025-07-15 18:52:57 -07:00
Craig Topper
a87b8398f9 [RISCV] Simplify conversion from ISD::Constant to ISD::TargetConstant in SelectAddrRegRegScale. NFC
Directly copy the underlying ConstantInt instead of reconstructing it.
2025-07-15 18:41:28 -07:00
Philip Reames
4b81dc75f4
[IA] Use a single callback for lowerDeinterleaveIntrinsic [nfc] (#148978)
This essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.

My plan, if we like this factoring, is to do the same for
the intrinsic store paths, and then remove the excess generality from
the shuffle paths since we don't need to support both modes in the
shared VPLoad/Store callbacks. We can probably even fold the VP versions
into the non-VP shuffle variants in the analogous way.
2025-07-15 18:08:57 -07:00
Philip Reames
c7d1eae4fc
[RISCV] Use masked segment LD/ST intrinsics in (de)interleaveN lowering [nfc] (#148966)
Follow up on the work from e5bc7e7d, and extend it to the lowering used
for interleave and deinterleave when we can't combine with a nearby
memory operation.
2025-07-15 17:12:08 -07:00
Craig Topper
4bd0e9e7f3 [RISCV] Add early out to reduce indentation in SelectAddrRegRegScale. NFC 2025-07-15 17:01:08 -07:00
Craig Topper
b64d7baf9c
[RISCV] Change the InstFormat for Zicbop prefetch instructions to InstFormatOther. (#148934)
The lower 5-bits of the immediate are not part of the address unlike
other InstFormatS instructions.

We use InstFormatS in RISCVRegisterInfo::needsFrameBaseReg and
RISCVRegisterInfo::getFrameIndexInstrOffset which is not aware of this
special encoding. Force the format to InstFormatOther so those functions
will ignore it.

InstFormatS is also used by relocation emission, but I don't believe we
ever emit these instructions with a relocation because of the encoding.
2025-07-15 14:49:54 -07:00
Philip Reames
bc187b8270 [RISCV] Use early-return in lowerInterleaveIntrinsicToStore [nfc] 2025-07-15 13:58:22 -07:00
Philip Reames
e5bc7e7df3
[RISCV][IA] Always generate masked versions of segment LD/ST [nfc-ish] (#148905)
Goal is to be able to eventually merge some of these code paths. The
mask operand should get dropped cleanly via pattern match.
2025-07-15 13:02:24 -07:00
Sudharsan Veeravalli
d67d91a990
[RISCV] Fix issues in ORI to QC.INSBI transformation (#148809)
The transformation done in #147349 was incorrect since we were not
passing the input node of the `OR` instruction to the `QC.INSBI`
instruction leading to the generated instruction doing the wrong thing.
In order to do this we first needed to add the output register to
`QC.INSBI` as being both an input and output.

The code produced after the above fix will need a copy (mv) to preserve
the register input to the OR instruction if it has more than one use
making the transformation net neutral (6-byte `QC.E.ORI/ORAI` vs
2-byte `C.MV` + 4-byte `QC.INSBI`). Avoid doing the transformation if
there is more than one use of the input register to the OR instruction.
2025-07-15 12:01:33 -07:00
Luke Lau
bc2004c2e4
[RISCV] Handle LHS == 0 in isVLKnownLE (#148860)
If a VL is zero then it's known to be less than or equal to every other
VL.

This looks weird on its own since a VL of zero isn't that common. The
test diffs come from a type being split resulting in a VP intrinsic's
EVL being zero.

The motivation for this is to split off part of an upcoming patch I plan
on submitting for RISCVVLOptimizer, which generalizes it to handle
recurrences, and needs to reason about an initial state of demanded VLs
set to zero.
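
The rule can be sketched as a small comparison function. This is a simplified model and not LLVM's isVLKnownLE: VLs are represented as Python ints for immediates, the string `"VLMAX"` for the maximum, and other strings such as `"%a"` for hypothetical virtual registers.

```python
def is_vl_known_le(lhs, rhs):
    if lhs == 0:
        return True   # zero is <= every other VL (the case added here)
    if rhs == "VLMAX":
        return True   # nothing exceeds VLMAX
    if lhs == rhs:
        return True   # same register, or equal immediates
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs <= rhs
    return False      # unknown relationship: answer conservatively

print(is_vl_known_le(0, "%a"))
print(is_vl_known_le(1, "%a"))
print(is_vl_known_le(4, 6))
```

A zero LHS is known to be less than or equal to an unknown register VL, while any other immediate compared with a register is not.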
2025-07-16 02:04:13 +08:00
Craig Topper
63d099af14
[RISCV] Remove incorrect and untested FrameIndex support from SelectAddrRegImm9. (#148779)
To fold a FrameIndex, we need to teach eliminateFrameIndex to respect
the uimm9 range.
2025-07-15 10:49:23 -07:00
Raphael Moreira Zinsly
1db9eb2320
[RISCV] Pass the MachineInstr flag as argument to allocateStack (#147531)
When not in the prologue we do not want to set the FrameSetup flag. By
passing the flag as an argument we can use allocateStack correctly in
those cases.
This fixes the allocation and probe in eliminateCallFramePseudoInstr.
2025-07-15 08:09:18 -07:00
Philip Reames
e282cdb0a2 [RISCV][IA] Avoid use of redundant variables which differ solely by type [nfc]
Instead of using dyn_cast, just use isa combined with accessors on the base
VectorType class.  Working towards being able to merge code from some of
these routines.
2025-07-15 08:04:03 -07:00
Mikhail R. Gadelha
2435ea6975
[RISCV][VLOPT] Add support for vfclass.v (#148246)
This PR adds support for the vfclass.v instruction in the RISC-V
VLOptimizer.
2025-07-15 11:49:43 -03:00
Mikhail R. Gadelha
a606f4441a
[RISCV][VLOPT] Add support for vector integer add-with-carry/subtract-with-borrow instructions (#148247)
This PR adds support for the vmadc.vim, vmadc.vvm, vmadc.vxm, vmsbc.vvm,
vmsbc.vxm, vsbc.vvm, vsbc.vxm instructions in the RISC-V VLOptimizer.
2025-07-15 11:49:19 -03:00