2281 Commits

Author SHA1 Message Date
Paul Kirth
ec8b9ca47d
Revert "[clang][DebugInfo] Add virtuality call-site target informatio… (#182343)
…n in DWARF. (#167666)"

This reverts commit 418ba6e8ae2cde7924388142b8ab90c636d2c21f.

The commit caused an ICE due to hitting unreachable in
llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1307

Fixes #182337
2026-02-19 12:19:11 -08:00
Carlos Alberto Enciso
418ba6e8ae
[clang][DebugInfo] Add virtuality call-site target information in DWARF. (#167666)
Given the test case:

  struct CBase {
    virtual void foo();
  };

  void bar(CBase *Base) {
    Base->foo();
  }

and using '-emit-call-site-info' with llc, the following DWARF
is produced for the indirect call 'Base->foo()':

1$: DW_TAG_structure_type "CBase"
      ...
2$:   DW_TAG_subprogram "foo"
        ...

3$: DW_TAG_subprogram "bar"
      ...
4$:   DW_TAG_call_site
        ...

We add DW_AT_LLVM_virtual_call_origin to existing call-site
information, linking indirect calls to the function-declaration
they correspond to.

4$:   DW_TAG_call_site
        ...
        DW_AT_LLVM_virtual_call_origin (2$ "_ZN5CBase3fooEv")

The new attribute DW_AT_LLVM_virtual_call_origin helps resolve the
ambiguity that the use of DW_AT_call_origin creates for any consumer.

The functionality is available to all supported debuggers.
2026-02-19 14:48:59 +00:00
Craig Topper
2cb342c733
[RISCV] Add combines to form WSUBAU on RV32 with P. (#181604)
2026-02-17 15:32:47 -08:00
Craig Topper
7fd56a0d74
[RISCV] Calculate max call frame size in RISCVTargetLowering::finalizeLowering. (#181302)
I want to enable the frame pointer when the call frame size is too large
to access emergency spill slots. To do that I need to know the call
frame size early enough to reserve FP.

The code here is copied from AArch64. ARM does the same. I did not check
other targets.

Splitting this off separately because it stops us from unnecessarily
reserving the base pointer in some RVV tests. That appears to be due to
this check

(!hasReservedCallFrame(MF) && (!MFI.isMaxCallFrameSizeComputed() ||
MFI.getMaxCallFrameSize() != 0))) &&

By calculating early !MFI.isMaxCallFrameSizeComputed() is no longer true
and the size is zero.
2026-02-13 20:32:48 -08:00
Craig Topper
75cc975c2c
[RISCV] Combine ADDD(lo, hi, x, 0) -> WADDAU(lo, hi, x, 0). Combine WADDAU (WADDAU lo, hi, x, 0), y, 0 -> WADDAU lo, hi, x, y (#181396)
WADDAU is rd += zext(rs1) + zext(rs2)

If we only have one 32-bit input, we can force rs2 to zero to avoid
zeroing the upper part of a register pair to use ADDD.
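
As a rough illustration of the identities behind these two combines, the
sketch below models a GPR pair as a single uint64_t. The helper names
`addd` and `waddau` are hypothetical stand-ins: `waddau` follows the
formula above, and `addd` is assumed to be a plain 64-bit add of two pairs.

```cpp
#include <cstdint>

// Hypothetical scalar models. A GPR pair (lo, hi) is held in one uint64_t.

// Assumed ADDD semantics: 64-bit add of two register pairs.
constexpr uint64_t addd(uint64_t rd, uint64_t rs) { return rd + rs; }

// WADDAU as described above: rd += zext(rs1) + zext(rs2).
constexpr uint64_t waddau(uint64_t rd, uint32_t rs1, uint32_t rs2) {
  return rd + static_cast<uint64_t>(rs1) + static_cast<uint64_t>(rs2);
}
```

In this model, ADDD with a zero high half for the second pair matches
WADDAU with rs2 set to 0, and two chained WADDAUs that each have a zero
rs2 match a single WADDAU that takes both inputs.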

Unfortunately, WADDAU clobbers rd so it might need a GPRPair copy
if we need the old value of rd. We might need to look into that in
the future. Maybe convertToThreeAddress could turn it back into
ADDD+WADDU or ADDD+LI.

Assisted-by: claude
2026-02-13 13:39:57 -08:00
Craig Topper
a809d6409f
[RISCV] Remove RISCVISD::WMACC*. Match during isel. NFC (#181197)
I think we may want to be able to fold ADDD nodes independent of the MUL
in some cases. For example turning NSRAI into NSRARI.

If we fold ADDD into WMACC we would need to be able to extract it again.
Keeping the nodes separate avoids this.

Code change was assisted by AI.
2026-02-12 22:06:01 -08:00
Craig Topper
664663cbbf
[RISCV] Improve 2*XLEN SHL legalization with P extension. (#181056)
For an i64 shift by a constant < 32 on RV32, we can use NSRLI
with 32-ShAmt to calculate the high half of the result.
    
For non-constant shifts, we can use SLX and some bit tricks to
avoid branches. I wanted to use the target independent code from
TargetLowering, but it currently produces worse code.
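
A minimal sketch of the constant-shift case, assuming NSRLI is a
narrowing logical right shift that keeps the low 32 bits of the shifted
64-bit pair. The names `nsrli`, `Parts` and `shl64ByConst` are
hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

struct Parts {
  uint32_t lo;
  uint32_t hi;
};

// Assumed NSRLI model: shift the 64-bit pair right, keep the low 32 bits.
constexpr uint32_t nsrli(uint64_t pair, unsigned shamt) {
  return static_cast<uint32_t>(pair >> shamt);
}

// i64 left shift by a constant shamt < 32, computed on 32-bit halves.
// The high half is the pair shifted right by (32 - shamt) and narrowed.
constexpr Parts shl64ByConst(uint32_t lo, uint32_t hi, unsigned shamt) {
  return Parts{lo << shamt,
               nsrli((static_cast<uint64_t>(hi) << 32) | lo, 32 - shamt)};
}
```

For example, shifting 0x0123456789ABCDEF left by 4 gives a low half of
0x9ABCDEF0 and a high half of 0x12345678 in this model.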

Assisted-by: claude
2026-02-11 23:32:02 -08:00
Craig Topper
db588931c5
[RISCV] Use NSRL/NSRA for legalizing i64 shifts with P extension on RV32. (#181040)
If the shift amount might be in the range [0, 31], we can use
NSRL/NSRA to shift the i64 value to compute the lower 32 bits of
the result.

If the shift amount is >= 32, the high half of the result is all
zeros or sign bits. Otherwise it is a srl/sra of the high bits.
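
The logical (srl) case can be sketched as follows, assuming NSRL is a
narrowing logical right shift that keeps the low 32 bits of the shifted
pair. `nsrl`, `Parts` and `srl64` are hypothetical names for this
illustration, and the low half for amounts >= 32 is written as a plain
shift of the high word rather than as any particular instruction.

```cpp
#include <cstdint>

struct Parts {
  uint32_t lo;
  uint32_t hi;
};

// Assumed NSRL model: shift the 64-bit pair right, keep the low 32 bits.
constexpr uint32_t nsrl(uint32_t lo, uint32_t hi, unsigned shamt) {
  return static_cast<uint32_t>(((static_cast<uint64_t>(hi) << 32) | lo) >>
                               shamt);
}

// i64 logical right shift for shamt in [0, 63], computed on 32-bit halves.
constexpr Parts srl64(uint32_t lo, uint32_t hi, unsigned shamt) {
  return shamt < 32
             // Low half from the narrowing shift, high half is srl of hi.
             ? Parts{nsrl(lo, hi, shamt), hi >> shamt}
             // High half is all zeros once the amount reaches 32.
             : Parts{hi >> (shamt - 32), 0u};
}
```

The arithmetic (sra) case would have the same shape, with sign bits in
place of zeros for the high half.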

I've handled the constant case in ReplaceNodeResults but deferred
the non-constant case to lowerShiftRightParts. This function is
not called for constants. This gives the opportunity for DAGCombine to
optimize the SRL_PARTS/SRA_PARTS if the shift amount can be proven
to be >= 32 or < 32.

Sequences were also discussed on the P extension mailing list here
https://lists.riscv.org/g/tech-p-ext/message/861

Assisted-by: claude
2026-02-11 22:37:47 -08:00
Folkert de Vries
6a81656f7d
[RISCV] improve musttail support (#170547)
Basically https://github.com/llvm/llvm-project/pull/168506 but for
riscv, so to be clear the hard work here is @heiher's. I figured we may
as well get some extra eyeballs on this from riscv too.

Previously the riscv backend could not handle `musttail` calls with more
arguments than fit in registers, or any explicit `byval` or `sret`
parameters/return values. Those have now been implemented.

This is part of my push to get more LLVM backends to support `byval` and
`sret` parameters so that rust can stabilize guaranteed tail call
support. See also:

- https://github.com/llvm/llvm-project/pull/168956
- https://github.com/rust-lang/rust/issues/148748

---------

Co-authored-by: WANG Rui <wangrui@loongson.cn>
2026-02-11 17:27:51 +01:00
Pengcheng Wang
e84659b71b
[RISCV][CodeGen] Combine vwaddu+vabd(u) to vwabda(u)
Note that we only support SEW=8/16 for `vwabda(u)`.

Reviewers: topperc, lukel97, preames

Reviewed By: topperc, lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/180162
2026-02-11 18:53:29 +08:00
Luke Lau
cd2761f7ab
[RISCV] Remove vp.reverse mask check in performVP_REVERSECombine (#180724)
Similar to #180706, the masked off lanes in vp.reverse are poison so can
be replaced with anything. Because of this, we should be able to fold a
masked vp.reverse(vp.load) into a vp.strided.load stride=-1 even when
the mask isn't all ones.
2026-02-11 09:13:42 +00:00
Luke Lau
ffe446e734
[RISCV] Relax reversed mask's mask requirement in reverse to strided load/store combine (#180706)
We have combines for vp.reverse(vp.load) -> vp.strided.load stride=-1
and vp.store(vp.reverse) -> vp.strided.store stride=-1.

If the load or store is masked, the mask needs to be also a vp.reverse
with the same EVL. However we also have the requirement that the mask's
vp.reverse is unmasked (has an all-ones mask).

vp.reverse's mask only sets masked off lanes to poison, and doesn't
affect the permutation of elements. So given those lanes are poison, I
believe the combine is valid for any mask, not just all ones.
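
To illustrate why the reversed load maps onto a stride of -1, the sketch
below compares which element each result lane reads. Offsets are in
elements rather than bytes, and both helper names are hypothetical.

```cpp
#include <cstddef>

// Element read by result lane `lane` of vp.reverse(vp.load(base, evl)):
// the reverse moves original lane (evl - 1 - lane) into `lane`.
constexpr std::ptrdiff_t reversedLoadOffset(std::ptrdiff_t lane,
                                            std::ptrdiff_t evl) {
  return evl - 1 - lane;
}

// Element read by result lane `lane` of a strided load that starts at
// element `start` and advances by `stride` elements per lane.
constexpr std::ptrdiff_t stridedLoadOffset(std::ptrdiff_t start,
                                           std::ptrdiff_t stride,
                                           std::ptrdiff_t lane) {
  return start + stride * lane;
}
```

With start = evl - 1 and stride = -1 the two offsets agree for every
lane, which is also why the load's mask has to be reversed with the same
EVL: result lane i corresponds to original lane evl - 1 - i.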

This is split off from another patch I plan on posting to generalize
those combines to vector.splice+vector.reverse patterns, as part of
#172961
2026-02-11 16:43:02 +08:00
Craig Topper
31e1bcfd09
[RISCV] Add basic scalar support for MERGE, MVM, and MVMN from P extension (#180677)
These are 3 variations of the same operation with a different operand
tied to the destination register. We need to pick the one that
minimizes the number of mvs.

To do this we take the approach used by AArch64 to select between
BIT, BIF, and BSL, which are the same operations. We define a pseudo
with no tied constraint and expand it after register allocation based
on where the destination register ended up. If the destination
register is none of the operands, we'll insert a mv.
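
The post-RA choice can be sketched roughly as below. This is only an
illustration: `Form`, `pickForm` and the register numbers are
hypothetical, and the sketch does not say which of MERGE, MVM, or MVMN
ties which operand.

```cpp
// Hypothetical sketch: pick the form whose tied operand already lives in
// the destination register, or fall back to inserting a copy first.
enum class Form { TiedToOp0, TiedToOp1, TiedToOp2, CopyThenTied };

constexpr Form pickForm(unsigned dst, unsigned op0, unsigned op1,
                        unsigned op2) {
  return dst == op0   ? Form::TiedToOp0
         : dst == op1 ? Form::TiedToOp1
         : dst == op2 ? Form::TiedToOp2
                      : Form::CopyThenTied; // dst matches no operand: mv
}
```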

I've replaced RISCVISD::MVM with RISCVISD::MERGE and updated the operand
order accordingly. I find the MERGE name easier to read so I've made it
the canonical name.

Ideally we could use commuteInstructionImpl and the
TwoAddressInstructionPass to select the opcode before register
allocation. That only works if you can commute exactly 2 operands and
maybe change the opcode in the MI representation of any of the forms to
get to either of the other 2 forms.
That is not possible. We'd need to define 3 more pseudoinstructions
with different permutations.

With the current approach it might be possible that we insert a mv
not because all of the operand registers were needed by later
instructions, but because the register allocator needed to put the
result in a different register. It's possible a different allocation
for other instructions might have avoided the mv.

I wrote the patch based on the AArch64 approach, but the tests were
generated by AI.
2026-02-10 13:39:34 -08:00
Craig Topper
f33ea53451
[RISCV] Remove redundant czero in multi-word comparisons (#180485)
When comparing multi-word integers with Zicond, we generate:
  (or (czero_eqz (lo1 < lo2), (hi1 == hi2)),
      (czero_nez (hi1 < hi2), (hi1 == hi2)))

The czero_nez is redundant because when hi1 == hi2 is true, hi1 < hi2 is
already 0. This patch adds a DAG combine to recognize:
  czero_nez (setcc X, Y, CC), (setcc X, Y, eq) -> (setcc X, Y, CC)
when CC is a strict inequality (lt, gt, ult, ugt).

This saves one instruction in 128-bit comparisons on RV64 with Zicond.

Note the czero_nez becomes a czero.eqz in the final assembly because the
seteq is replaced by an xor that produces 0 when the values are equal.
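
A small scalar model of the redundancy, shown for an unsigned less-than
on two 64-bit halves. The function names are hypothetical; the
czero.eqz/czero.nez models follow the Zicond definition (zero the value
when the condition is zero or non-zero, respectively).

```cpp
#include <cstdint>

// czero.eqz: result is 0 when cond == 0, otherwise val.
constexpr uint64_t czeroEqz(uint64_t val, uint64_t cond) {
  return cond == 0 ? 0 : val;
}

// czero.nez: result is 0 when cond != 0, otherwise val.
constexpr uint64_t czeroNez(uint64_t val, uint64_t cond) {
  return cond != 0 ? 0 : val;
}

// 128-bit unsigned less-than as generated before the combine.
constexpr bool ult128Before(uint64_t hi1, uint64_t lo1, uint64_t hi2,
                            uint64_t lo2) {
  return (czeroEqz(lo1 < lo2, hi1 == hi2) |
          czeroNez(hi1 < hi2, hi1 == hi2)) != 0;
}

// Same comparison with the czero_nez dropped: hi1 < hi2 is already 0
// whenever hi1 == hi2, so masking it changes nothing.
constexpr bool ult128After(uint64_t hi1, uint64_t lo1, uint64_t hi2,
                           uint64_t lo2) {
  return (czeroEqz(lo1 < lo2, hi1 == hi2) |
          static_cast<uint64_t>(hi1 < hi2)) != 0;
}
```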

Part of #179584

Assisted-by: claude
2026-02-09 21:48:14 -08:00
Ryan Buchner
d69ccf3b34
[RISCV] Combine shuffle of shuffles to a single shuffle (#178095)
Compressing to a single shuffle doesn't remove any information and the backend can better apply specific optimizations to a single shuffle.

Addresses #176218.

---------

Co-authored-by: Luke Lau <luke_lau@igalia.com>
2026-02-09 14:48:31 -08:00
Craig Topper
e6a72a1d42
[RISCV] Combine ADDD+WMULSU to WMACCSU (#180454)
Extend the existing combineADDDToWMACC DAG combine to also match
RISCVISD::WMULSU and produce RISCVISD::WMACCSU. This is similar to
how ADDD+UMUL_LOHI is combined to WMACCU and ADDD+SMUL_LOHI is
combined to WMACC.

This patch was generated by AI, but I reviewed it.
2026-02-09 08:51:27 -08:00
Pengcheng Wang
972e73b812
[RISCV][CodeGen] Lower ISD::ABS to Zvabd instructions
We add pseudos/patterns for `vabs.v` instruction and handle the
lowering in `RISCVTargetLowering::lowerABS`.

Reviewers: topperc, 4vtomat, mshockwave, preames, lukel97, tclin914

Reviewed By: mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/180142
2026-02-09 15:21:25 +08:00
Pengcheng Wang
e992593341
[RISCV][CodeGen] Lower abds/abdu to Zvabd instructions
We directly lower `ISD::ABDS`/`ISD::ABDU` to `Zvabd` instructions.

Note that we only support SEW=8/16 for `vabd.vv`/`vabdu.vv`.

Reviewers: mshockwave, lukel97, topperc, preames, tclin914, 4vtomat

Reviewed By: lukel97, topperc

Pull Request: https://github.com/llvm/llvm-project/pull/180141
2026-02-09 15:12:22 +08:00
Craig Topper
769b734c02
[RISCV] Combine ADDD with UMUL_LOHI/SMUL_LOHI into WMACCU/WMACC (#180383)
Combine the pattern:
  ADDD(addlo, addhi, UMUL_LOHI(x, y).0, UMUL_LOHI(x, y).1)
into:
  WMACCU(x, y, addlo, addhi)

And similarly for SMUL_LOHI -> WMACC.
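
As a sketch of why the two forms agree, the model below assumes WMACCU
accumulates the full 64-bit unsigned product of x and y into the pair
(addlo, addhi), and that ADDD is a 64-bit add of two pairs. All helper
names are hypothetical and only mirror the node names above.

```cpp
#include <cstdint>

struct LoHi {
  uint32_t lo;
  uint32_t hi;
};

constexpr uint64_t makePair(uint32_t lo, uint32_t hi) {
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// UMUL_LOHI model: both halves of the 32x32 -> 64 unsigned product.
constexpr LoHi umulLoHi(uint32_t x, uint32_t y) {
  return LoHi{static_cast<uint32_t>(static_cast<uint64_t>(x) * y),
              static_cast<uint32_t>((static_cast<uint64_t>(x) * y) >> 32)};
}

// Assumed ADDD model: 64-bit add of the pairs (alo, ahi) and (blo, bhi).
constexpr uint64_t addd(uint32_t alo, uint32_t ahi, uint32_t blo,
                        uint32_t bhi) {
  return makePair(alo, ahi) + makePair(blo, bhi);
}

// Assumed WMACCU model: (addlo, addhi) += zext(x) * zext(y).
constexpr uint64_t wmaccu(uint32_t x, uint32_t y, uint32_t addlo,
                          uint32_t addhi) {
  return makePair(addlo, addhi) + static_cast<uint64_t>(x) * y;
}

// The pattern before the combine: ADDD fed by both UMUL_LOHI results.
constexpr uint64_t adddOfUmulLoHi(uint32_t x, uint32_t y, uint32_t addlo,
                                  uint32_t addhi) {
  return addd(addlo, addhi, umulLoHi(x, y).lo, umulLoHi(x, y).hi);
}
```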


This patch was written with AI, but I reviewed it carefully.
2026-02-08 13:39:32 -08:00
Craig Topper
5c826f5172
[RISCV] Emit MULHU/MULHS/UMUL_LOHI/SMUL_LOHI from our custom XLen*2 expansion. (#180379)
We already do all the checks necessary in order to prioritize
MULHU/MULHS/UMUL_LOHI/SMUL_LOHI over MULHSU/WMULSU. We might as
well just emit the nodes instead of letting generic type legalization
redo the checks.

This is slightly different than the default legalization because we
don't have access to ExpandInteger so we have to emit TRUNCATES and
BUILD_PAIR. Not sure if this will result in any differences in practice.
2026-02-08 13:39:15 -08:00
Craig Topper
a563e6bb7e
[RISCV] Add support for forming WMULSU during type legalization. (#180331)
Add a DAG combine to turn it into MULHSU if the lower half result
is unused.
2026-02-08 12:38:56 -08:00
Craig Topper
370764c8cb
[RISCV] Use addd/subd for i64 add/sub for RV32+P. (#180129)
Add RISCVISD opcodes and custom type legalize to them.
2026-02-06 12:42:11 -08:00
Brandon Wu
d99f1cdd66
[RISCV][llvm] Support INSERT_VECTOR_ELT codegen for P extension (#179471)
Add custom lowering for INSERT_VECTOR_ELT on P extension vector types
using the MVM instruction.

TODO: Handle <4 x i8> on RV64, which is constructed as extract_vector_elt
+ build_vector instead of insert_vector_elt.
2026-02-06 14:12:18 +08:00
Craig Topper
22c5c2583d
[RISCV] Reorder the operands for RISCVISD::PPAIRE_DB. NFC (#180111)
Order the operands so that the low and high part of the rs1 regpair are
first, followed by the low and high part of the rs2 regpair.

Also change the type to use v4i8 for the result so that it's only
shuffling elements, not combining elements into a larger element.

I'm planning to add ADDD and SUBD opcodes that will be defined with the
same operand order allowing RISCVISelDAGToDAG.cpp code to be shared.
2026-02-05 21:35:47 -08:00
Craig Topper
1ad20b9428
[RISCV] Rename RISCVISD::PPACK_DH->PPAIRE_DB. NFC (#180089)
The instruction was renamed, but we hadn't renamed the ISD opcode.
2026-02-05 17:35:12 -08:00
Craig Topper
313d9ac1cf
[RISCV] Add wmul(u) codegen for RV32+P (#180032)
mulh tests are to make sure we continue to use mulh when only the
upper half is used.
2026-02-05 17:34:25 -08:00
Craig Topper
6c37aa8ffd
[RISCV] Remove P from RISCVISD::PASUB(U)/PMULHSU/PMULHR(U)/PMULHRSU. NFC (#180064)
There's a good chance we'll want to use these for scalar too.

Drop vector type from SDTypeProfile. Remove PMULHSU since we already
have RISCVISD::MULHSU for scalars in the base ISA.
2026-02-05 17:33:35 -08:00
Jameson Nash
d762cc2f03
[GlobalISel] Add SVE support for alloca (#178976)
Complementary to the same handling code in SelectionDAG:

f3d81d4110/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp (L160-L165)

f3d81d4110/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L4613-L4623)

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 14:00:34 -05:00
Luke Lau
9ed7ba87c4
[RISCV] Remove redundant vand.vi with fpto*i to i1 (#179876)
If the source of an fpto*i doesn't fit in the destination type, the
result is poison. For i1 destinations, this means the result needs to be
0 or 1/-1, so we can just compare the result to 0 directly instead of
truncating.

The VP lowering for fpto*i already does this.
2026-02-06 00:06:32 +08:00
Craig Topper
fc56916a5d
[RISCV] Correct lowering of ISD::SETGE/SETULE/SETLE/SETUGE in lowerVPSetCCMaskOp. (#179801)
XOR should be OR to match the comment.

Found while reviewing #179622 which deletes this function. I would like
to commit this first so we have a correct baseline for reviewing that
patch.
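
For reference, a truth-table sketch of the i1 case. The exact expression
in lowerVPSetCCMaskOp is not quoted in this message, so the form ~X | Y
below is derived here from the truth table, with an i1 "true" read as -1
for signed compares. All names are hypothetical.

```cpp
// Signed value of an i1: true is -1, false is 0.
constexpr int asSigned(bool b) { return b ? -1 : 0; }

// Reference: X >=s Y on i1 values.
constexpr bool setgeReference(bool x, bool y) {
  return asSigned(x) >= asSigned(y);
}

// Reference: X <=u Y on i1 values (grouped with SETGE in the title).
constexpr bool setuleReference(bool x, bool y) {
  return static_cast<unsigned>(x) <= static_cast<unsigned>(y);
}

// OR form: ~X | Y.
constexpr bool viaOr(bool x, bool y) { return !x || y; }

// XOR form: ~X ^ Y, which disagrees when X is false and Y is true.
constexpr bool viaXor(bool x, bool y) { return (!x) != y; }
```

The OR form matches both references on all four inputs, while the XOR
form gives the wrong answer for X = 0, Y = 1.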
2026-02-04 20:25:13 -08:00
Luke Lau
3794b83ae5
[RISCV] Don't emit VP_SETCC in combineVectorSizedSetCCEquality. NFC (#179479)
This is part of the work to remove trivial VP intrinsics.

In the combineVectorSizedSetCCEquality combine, used for the compares
that ExpandMemcmp generates, we currently emit a VP_SETCC. We can just
emit a regular SETCC and let RISCVVLOptimizer take care of reducing the
VL.
2026-02-04 06:59:27 +00:00
Craig Topper
42b1beb3f0
[RISCV] Default all ISD opcodes to Expand for P extension. (#179396)
Legal is the default for most opcodes, but we don't yet support all of
them. Override the ones that we support back to Legal.
2026-02-02 22:59:32 -08:00
Nicolai Hähnle
6f0b873f1c
[CodeGen] Refactor targets to override the new getTgtMemIntrinsic overload (NFC) (#175844)
This is a fairly mechanical change. Instead of returning true/false,
we either keep the Infos vector empty or push one entry.
2026-02-02 17:40:02 -08:00
Francesco Petrogalli
c6086dd550
[RISC-V][Mach-O] Add codegen support for Mach-O object format. (#178263)
This commit enables code generation for RISC-V targeting Mach-O:

- Implement RISCVMachOTargetObjectFile::getNameWithPrefix method to
handle Mach-O symbol naming requirements.
- Use shouldAssumeDSOLocal() in RISCVTargetLowering::lowerGlobalAddress
instead of isDSOLocal() for proper Mach-O semantics in global address
lowering. Note that this is NFC for RISCV when targeting ELF.
- Add comprehensive tests for various relocation types (direct globals,
GOT-based addressing, static vs PIC models).
- Test function calls, tail calls, and various symbol reference patterns
including addends and subtractions.

This patch is based on code originally written by Tim Northover.
2026-02-02 14:11:27 -08:00
Craig Topper
80cbd1d696
[RISCV] Support ISD::CLMUL/CLMULH for i64 scalable vectors with Zvbc. (#178340)
We also get some i32->i64 promotion for CLMULH. The DAGCombiner
change is to prevent an infinite loop from that.

Test file was rewritten to cover all types and split between clmul
and clmulh.

I added a couple masked tests to show that VectorPeephole works.
The test outputs were already large so I didn't want to add more than a couple.
2026-01-29 13:17:03 -08:00
Craig Topper
f37bf0ce65
Revert "[RISCV] Support RISCV BitInt larger than 128 (#175515)" (#178311)
This reverts commit e3156c531da5aa4ec604605ed4e19638879d773c.

We need to resolve a crash on trunk and LLVM 22. Reverting makes it
easier to backport.

Fixes #176637.
2026-01-29 07:16:14 -08:00
Craig Topper
c8b1ff90f3
[RISCV] Hoist a duplicate setOperationAction to a common place. NFC (#178364)
2026-01-27 22:54:49 -08:00
Craig Topper
05e2ee9664
[RISCV] Replace riscv.clmul intrinsic with llvm.clmul (#178092)
I did not replace riscv.clmulh/clmulr since those require a multiple
instruction pattern match. I wanted to ensure that -O0 will select the
correct instructions without relying on combines.
2026-01-26 21:12:48 -08:00
Sudharsan Veeravalli
3ed48305ab
[RISCV] Run combineOrToBitfieldInsert after DAG legalize (#177830)
Not combining `OR` into `QC.INSB(I)` before DAG legalization lets known
bits analysis simplify the code where possible.
2026-01-26 15:43:00 +05:30
Craig Topper
5c35af8f1e
[RISCV] Replace RISCVISD::CLMUL* with ISD::CLMUL*. (#177386)
This patch does the minimum to remove RISCVISD::CLMUL*. It does not
remove existing intrinsics.

There are some missed optimizations for i32 CLMULH/CLMULR on RV64, but
those may be generic issues.

I've put the test cases in the existing files so it's more obvious what
the missed optimizations are by comparing within the file.
2026-01-22 09:39:44 -08:00
Craig Topper
73a309e20e
[RISCV] Add ZZZ_ to some inline assembly vector register classes to sort them after VR/VRNoV0 in regclass enum. (#177087)
This prevents getCommonSubClass from finding them before VR/VRNoV0.

Fixes a crash reported post-commit in #171231. getCommonSubClass
returned one of these classes, but it doesn't have the same VTs as
VR/VRNoV0 leading to an assertion failure.

The subregister-undef-early-clobber.mir still ends up finding these
register classes in the InitUndef pass.
2026-01-21 21:23:06 -08:00
Brandon Wu
72915ea145
[RISCV][llvm] Support setcc codegen for zvfbfa (#176866)
2026-01-21 07:25:37 +00:00
Brandon Wu
d23c3a5ea7
[RISCV][llvm] Support strict fadd/fsub/fmul/fma codegen for zvfbfa (#176719)
This is the same as the normal version.

stack on: https://github.com/llvm/llvm-project/pull/176716
2026-01-21 14:40:01 +08:00
Matt Arsenault
aa57ee958d
CodeGen: Use LibcallLoweringInfo for stack protector insertion (#176829)
Thread LibcallLoweringInfo into the TargetLowering hooks used
by the stack protector passes.
2026-01-20 12:37:31 +01:00
Brandon Wu
1887fca885
[RISCV][llvm] Handle sub-register vector shifts for P-extension (#176109)
For sub-register width vectors (v2i16, v4i8) on RV64 with P-extension,
the type legalizer widens them to legal types, i.e. v4i16, v8i8, before
they are unrolled, so there is redundant computation for the high part
of the register.
The correct way to handle this is similar to widening div/rem, where the
high part is padded with undef.

stack on: https://github.com/llvm/llvm-project/pull/176093
2026-01-19 05:22:50 +00:00
Brandon Wu
2a8a694b50
[RISCV][llvm] Handle calling convention for P extension fixed vectors (#176093)
P extension packed SIMD types are passed in GPRs. For types larger than
XLen (e.g. v8i8 on RV32), they are split and passed via the 2XLen
mechanism, similar to i64 on RV32.

FIXME: Need to figure out the mechanism when P and V are enabled at the
same time.

stack on: https://github.com/llvm/llvm-project/pull/176193
2026-01-19 12:09:27 +08:00
Craig Topper
1621e007db
[RISCV] Remove unnecessary EVT->MVT->EVT conversions. NFC (#176214)
We don't need to use getSimpleValueType if we're just passing to
getNode.
2026-01-15 15:48:18 -08:00
Akshay Deodhar
3860147a7f
[NFC][TargetLowering] Make shouldExpandAtomicRMWInIR and shouldExpandAtomicCmpXchgInIR take a const Instruction pointer (#176073)
Splits out change from https://github.com/llvm/llvm-project/pull/176015

Changes shouldExpandAtomicRMWInIR to take a constant argument: This is
to allow some other TargetLowering constant-argument functions to call
it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.
2026-01-15 14:22:57 -08:00
Brandon Wu
546ba870f7
[RISCV][llvm] Refactor unpackFromMemLoc to use convertLocVTToValVT. NFC (#175969)
Simplify unpackFromMemLoc to use convertLocVTToValVT for handling
LocInfo conversions, making it consistent with unpackFromRegLoc.
2026-01-16 02:50:01 +08:00
Brandon Wu
c7e4350cdc
[RISCV][llvm] Support select codegen for P extension (#175741)
This is a scalar condition with fixed vector true/false values, so we can
handle it the same way as scalars.
2026-01-14 14:05:45 +08:00