63532 Commits

Author SHA1 Message Date
Simon Pilgrim
2f0400c1a1
[Thumb2] mve-shuffle.ll - add missing check prefix coverage for some fullfp16 cases (#180567)
Noticed while working on some upcoming generic shuffle handling
2026-02-10 13:37:24 +00:00
Simon Pilgrim
dca7b11a32
[X86] Add tests showing failure to reduce the vector width of vpmaddwd/vpmaddubsw/pmulhrsw nodes (#180728)
Missing demanded elts handling
2026-02-10 12:45:21 +00:00
Steffen Larsen
9501114ca0
[Verifier] Make verifier fail when global variable size exceeds address space size (#179625)
When a global variable has a size that exceeds the size of the address
space it resides in, the verifier should fail as the variable can
neither be materialized nor fully accessed. This patch adds a check to
the verifier to enforce it.
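
As a rough illustration of the rule described above, a size check of this kind can be sketched as follows. The helper name `fitsInAddressSpace` and its parameters are hypothetical and are not taken from the patch; the sketch assumes the address space is described only by its pointer width in bits.

```cpp
#include <cstdint>

// Hypothetical helper (not the verifier's actual code): returns true when an
// object of `sizeInBytes` bytes does not exceed the size of an address space
// whose pointers are `pointerBits` wide.
bool fitsInAddressSpace(std::uint64_t sizeInBytes, unsigned pointerBits) {
  // With 64-bit (or wider) pointers every uint64_t byte count fits.
  if (pointerBits >= 64)
    return true;
  // The address space holds 2^pointerBits distinct byte addresses.
  const std::uint64_t addressableBytes = std::uint64_t{1} << pointerBits;
  return sizeInBytes <= addressableBytes;
}
```

Under these assumptions a 65537-byte global in a 16-bit address space would be rejected, while a 65536-byte one would still pass.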

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
2026-02-10 13:27:38 +01:00
Mirko Brkušanin
4280f0d241
[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516)
2026-02-10 12:14:49 +01:00
Anshil Gandhi
bd6dd94584
[AMDGPU] Add legalization rules for atomicrmw max/min ops (#180502)
Adds rules for G_ATOMICRMW_{MAX, MIN, UMAX, UMIN, UINC_WRAP, UDEC_WRAP}.
Each of these generic opcodes is supported for S32 and S64 types
on flat, global and local address spaces.
2026-02-10 16:04:05 +05:30
Kerry McLaughlin
e043195ef4
[AArch64] Add support for intent to read prefetch intrinsic (#179709)
This patch adds support in Clang for the PRFM IR instruction, by adding
the following builtin:

  void __pldir(void const *addr);

This builtin is described in the following ACLE proposal:
https://github.com/ARM-software/acle/pull/406
2026-02-10 10:12:52 +00:00
Matt Arsenault
302ff8fd00
InstCombine: Use SimplifyDemandedFPClass on fmul (#177490)
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.

Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.

I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
2026-02-10 09:49:31 +00:00
Benjamin Maxwell
b91eb9b4e5
[SDAG] Implement missing legalization for ISD::VECTOR_FIND_LAST_ACTIVE (#180290)
This lowers the splitting as:
```
any_active(hi_mask)
  ? (find_last_active(hi_mask) + lo_mask.getVectorElementCount())
  : find_last_active(lo_mask)
```

It also trivially lowers `<1 x i1>` scalarization to returning zero, which
is a natural result of the splitting (and the lack of a sentinel
"none-active" result value).

The lowerings likely can be improved. This patch is for completeness.
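
The split lowering above can be modelled on plain boolean vectors. This is only a sketch: `Mask`, `anyActive` and `findLastActive` are illustrative names, the mask is split in half recursively, and the single-element case returns zero as described.

```cpp
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>;

// True when at least one lane of the mask is set.
bool anyActive(const Mask &mask) {
  for (bool bit : mask)
    if (bit)
      return true;
  return false;
}

// Recursive model of the split: prefer the high half when it has an active
// lane, otherwise fall back to the low half. A mask of at most one element
// yields 0, so an all-false mask also yields 0 (there is no sentinel value).
std::size_t findLastActive(const Mask &mask) {
  if (mask.size() <= 1)
    return 0;
  const std::size_t loCount = mask.size() / 2;
  const auto split = mask.begin() + static_cast<std::ptrdiff_t>(loCount);
  const Mask lo(mask.begin(), split);
  const Mask hi(split, mask.end());
  return anyActive(hi) ? findLastActive(hi) + loCount : findLastActive(lo);
}
```

For the mask `{true, false, true, false}` the model returns 2, and for an all-false mask it returns 0, which shows why callers cannot distinguish "none active" from "lane 0 active" from the result alone.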

Should fix:
https://github.com/llvm/llvm-project/pull/178862#issuecomment-3862310334
Fixes #180212
2026-02-10 09:01:13 +00:00
Diana Picus
24405f070f
[AMDGPU] Add intrinsic exposing s_alloc_vgpr (#163951)
Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge
footgun and use for anything other than compiler internal purposes is
heavily discouraged. The calling code must make sure that it does not
allocate fewer VGPRs than necessary - the intrinsic is NOT a request to
the backend to limit the number of VGPRs it uses (in essence it's not so
different from what we do with the dynamic VGPR flags of the
`amdgcn.cs.chain` intrinsic, it just makes it possible to use this
functionality in other scenarios).
2026-02-10 09:28:31 +01:00
Pengcheng Wang
8c5f31b365
[RISCV] Enable select optimization by default (#178394)
And we add `TuneEnableSelectOptimize` to:
* `generic`
* `generic-ooo`
* `sifive-p550`
* `spacemit-x60`
2026-02-10 16:19:01 +08:00
Kyungtak Woo
a56b877056
[NewPM] Port x86-global-base-reg (#180119)
Had to move X86GlobalBaseRegPass to its own file like in
https://github.com/llvm/llvm-project/pull/179864

No test coverage added for now as there are no MIR->MIR tests exercising
this pass and we do not have enough ported to run any end to end tests.

This is a redo of https://github.com/llvm/llvm-project/pull/180070
2026-02-09 22:54:41 -08:00
Craig Topper
f33ea53451
[RISCV] Remove redundant czero in multi-word comparisons (#180485)
When comparing multi-word integers with Zicond, we generate:
  (or (czero_eqz (lo1 < lo2), (hi1 == hi2)),
      (czero_nez (hi1 < hi2), (hi1 == hi2)))

The czero_nez is redundant because when hi1 == hi2 is true, hi1 < hi2 is
already 0. This patch adds a DAG combine to recognize:
  czero_nez (setcc X, Y, CC), (setcc X, Y, eq) -> (setcc X, Y, CC)
when CC is a strict inequality (lt, gt, ult, ugt).

This saves one instruction in 128-bit comparisons on RV64 with Zicond.

Note the czero_nez becomes a czero.eqz in the final assembly because the
seteq is replaced by an xor that produces 0 when the values are equal.
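
The redundancy can be checked with a small scalar model. The sketch below assumes the Zicond semantics (czero.eqz zeroes its result when the condition is zero, czero.nez when it is nonzero); the function names are illustrative and only the unsigned less-than case is modelled.

```cpp
#include <cstdint>

// czero.eqz: yields 0 when `cond` is zero, otherwise `value`.
std::uint64_t czeroEqz(std::uint64_t value, std::uint64_t cond) {
  return cond == 0 ? 0 : value;
}

// czero.nez: yields 0 when `cond` is nonzero, otherwise `value`.
std::uint64_t czeroNez(std::uint64_t value, std::uint64_t cond) {
  return cond != 0 ? 0 : value;
}

// Two-word unsigned "less than" in the form generated before the combine.
bool lessThanBefore(std::uint64_t hi1, std::uint64_t lo1,
                    std::uint64_t hi2, std::uint64_t lo2) {
  const std::uint64_t hiEqual = hi1 == hi2;
  return (czeroEqz(lo1 < lo2, hiEqual) | czeroNez(hi1 < hi2, hiEqual)) != 0;
}

// Same comparison with the czero_nez dropped, as the combine does.
bool lessThanAfter(std::uint64_t hi1, std::uint64_t lo1,
                   std::uint64_t hi2, std::uint64_t lo2) {
  const std::uint64_t hiEqual = hi1 == hi2;
  return (czeroEqz(lo1 < lo2, hiEqual) |
          static_cast<std::uint64_t>(hi1 < hi2)) != 0;
}

// Exhaustively compares both forms against the reference on small words.
bool formsAgreeOnSmallValues() {
  for (std::uint64_t hi1 = 0; hi1 < 4; ++hi1)
    for (std::uint64_t lo1 = 0; lo1 < 4; ++lo1)
      for (std::uint64_t hi2 = 0; hi2 < 4; ++hi2)
        for (std::uint64_t lo2 = 0; lo2 < 4; ++lo2) {
          const bool expected = hi1 < hi2 || (hi1 == hi2 && lo1 < lo2);
          if (lessThanBefore(hi1, lo1, hi2, lo2) != expected ||
              lessThanAfter(hi1, lo1, hi2, lo2) != expected)
            return false;
        }
  return true;
}
```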

Part of #179584

Assisted-by: claude
2026-02-09 21:48:14 -08:00
vangthao95
8d8864237b
AMDGPU/GlobalISel: Regbanklegalize rules for G_FSQRT (#179817)
Add S16 rules for G_FSQRT. S32 and S64 are expanded by the legalizer.
2026-02-09 18:24:28 -08:00
Lleu Yang
8fc59bc0e3
[SPIRV] Add handling for uinc_wrap and udec_wrap atomics (#179114)
This adds atomicrmw `uinc_wrap` and `udec_wrap` operations support for
SPIR-V. Since SPIR-V doesn't provide dedicated instructions for those
two operations, we have to use the `AtomicExpand` pass to expand the
operations into CAS forms.

Closes #177204.
2026-02-10 01:39:05 +01:00
Dmitry Sidorov
19705bd7fc
[SPIR-V] Emit ceil(Bitwidth / 32) words during OpConstant creation (#180218)
Fixes an error in handling constant integers with a width in the (64, 128) range.
Found during review of
https://github.com/llvm/llvm-project/pull/180182
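
For illustration, the word count from the title can be computed as below. This is a sketch, not the backend's code: `wordsForBitWidth` and `toWords` are hypothetical helpers, and the value is assumed to arrive as little-endian 64-bit chunks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ceil(bitWidth / 32) computed with integer arithmetic.
unsigned wordsForBitWidth(unsigned bitWidth) {
  return (bitWidth + 31) / 32;
}

// Splits a value given as little-endian 64-bit chunks into the 32-bit words
// needed for `bitWidth` bits, low word first. Missing chunks are read as zero.
std::vector<std::uint32_t> toWords(const std::vector<std::uint64_t> &chunks,
                                   unsigned bitWidth) {
  std::vector<std::uint32_t> words(wordsForBitWidth(bitWidth), 0);
  for (std::size_t i = 0; i < words.size(); ++i) {
    const std::size_t chunkIndex = i / 2;
    if (chunkIndex >= chunks.size())
      break;
    const std::uint64_t chunk = chunks[chunkIndex];
    // Even word indices take the low half of a chunk, odd ones the high half.
    words[i] = static_cast<std::uint32_t>(i % 2 == 0 ? chunk : chunk >> 32);
  }
  return words;
}
```

With this formula a 65-bit or 96-bit constant takes three words and a 128-bit constant takes four.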
2026-02-10 00:40:35 +01:00
Daniel Paoliello
853a39043e
[win][aarch64] The Windows Control Flow Guard Check function also preserves X15 (#179738)
The target function to be checked by the Control Flow Guard Check
function is stored in `X15` on AArch64. This register is guaranteed to
be preserved by that function (on success), thus after it returns `X15`
can be used to branch to the target function instead of having to load
it from another register or the stack.
2026-02-09 15:35:20 -08:00
Ryan Buchner
d69ccf3b34
[RISCV] Combine shuffle of shuffles to a single shuffle (#178095)
Compressing to a single shuffle doesn't remove any information and the backend can better apply specific optimizations to a single shuffle.

Addresses #176218.

---------

Co-authored-by: Luke Lau <luke_lau@igalia.com>
2026-02-09 14:48:31 -08:00
Steven Perron
e1d2ff6caf
[SPIRV] Implement lowering for HLSL Texture2D sampling intrinsics (#179312)
This patch implements the SPIR-V lowering for the following HLSL
intrinsics:
- SampleBias
- SampleGrad
- SampleLevel
- SampleCmp
- SampleCmpLevelZero

It defines the required LLVM intrinsics in 'IntrinsicsDirectX.td' and
'IntrinsicsSPIRV.td'.

It updates 'SPIRVInstructionSelector.cpp' to handle the new intrinsics and
generates the correct 'OpImageSample*' instructions with the required operands
(Bias, Grad, Lod, ConstOffset, MinLod, etc.).

CodeGen tests are added to verify the implementation for images with
dimension 1D, 2D, 3D, and Cube.

Assisted-by: Gemini
2026-02-09 17:28:58 -05:00
Alexey Merzlyakov
4136d3f248
[AArch64] Inline asm v0-v31 are scalar when having less than 64-bit capacity (#169930)
If 32-bit (or less) "v0" registers coming from inline asm are treated as
vector ones, codegen might produce incorrect vector<->scalar
conversions. This causes type-mismatch assertion failures later during
compilation. The fix treats 32-bit or less v0-v31 AArch64 registers as
scalar, along with 64-bit ones.

Fixes #153442
2026-02-09 13:26:31 -08:00
Gheorghe-Teodor Bercea
d1dc843c18
[AMDGPU] Enable sinking of free vector ops that will be folded into their uses (#162580)
Sinking ShuffleVectors / ExtractElement / InsertElement into user blocks
can help enable SDAG combines by providing visibility to the values
instead of emitting CopyTo/FromRegs. The sink IR pass disables sinking
into loops, so this PR extends the CodeGenPrepare target hook
shouldSinkOperands.

Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2026-02-09 14:14:31 -05:00
John Brawn
77cb666078
[AArch64] Add support for B and H loads/stores in LoadStoreOptimizer (#180535)
This means the load/store optimizer can generate pre and post increment
versions of these instructions.
2026-02-09 17:48:31 +00:00
vangthao95
404f9e6c99
AMDGPU/GlobalISel: RegBankLegalize rules for amdgcn_sffbh (#180099)
Change test to use update_llc_test_checks.py and make `v_flbit` test
actually divergent.
2026-02-09 09:18:03 -08:00
vangthao95
0040bdf532
AMDGPU/GlobalISel: Regbanklegalize rules for buffer atomic swap (#180265)
2026-02-09 09:04:17 -08:00
Ryan Mitchell
8bbdac9e52
[MIParser] - Add support for MMRAs (#180320)
Probably just forgotten in #78569
2026-02-09 18:01:02 +01:00
Craig Topper
e6a72a1d42
[RISCV] Combine ADDD+WMULSU to WMACCSU (#180454)
Extend the existing combineADDDToWMACC DAG combine to also match
RISCVISD::WMULSU and produce RISCVISD::WMACCSU. This is similar to
how ADDD+UMUL_LOHI is combined to WMACCU and ADDD+SMUL_LOHI is
combined to WMACC.

This patch was generated by AI, but I reviewed it.
2026-02-09 08:51:27 -08:00
Simon Pilgrim
a911fc12ec
[X86] Fold expand(splat,passthrough,mask) -> select(splat,passthrough,mask) (#180238)
If all elements of the expansion vector are already splatted in place then we can use a vselect directly
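
A scalar model shows why the fold holds. The sketch assumes the usual masked-expand behaviour (consecutive source elements are written to the lanes whose mask bit is set, and the remaining lanes keep the passthrough value); `expand` and `select` here are illustrative helpers, not LLVM APIs.

```cpp
#include <array>
#include <cstddef>

// Masked expand: the next unread source element goes to each active lane.
template <std::size_t N>
std::array<int, N> expand(const std::array<int, N> &src,
                          const std::array<int, N> &passthru,
                          const std::array<bool, N> &mask) {
  std::array<int, N> result = passthru;
  std::size_t next = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = src[next++];
  return result;
}

// Lane-wise select: active lanes take `src[i]`, the others `passthru[i]`.
template <std::size_t N>
std::array<int, N> select(const std::array<int, N> &src,
                          const std::array<int, N> &passthru,
                          const std::array<bool, N> &mask) {
  std::array<int, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = src[i];
  return result;
}
```

When `src` is a splat, `src[next]` and `src[i]` are always the same value, so both functions produce the same vector. For a non-splat source such as `{10, 20, 30, 40}` they differ, which is why the fold is restricted to splats.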
2026-02-09 16:15:41 +00:00
Nathan Gauër
091972c354
[SPIR-V] initial support for @llvm.structured.gep (#178668)
This commit adds initial support to lower the intrinsic
`@llvm.structured.gep` into proper SPIR-V.
For now, the backend continues to support both GEP formats. We might
want to revisit this at some point for the logical part.
2026-02-09 15:51:07 +00:00
Anshil Gandhi
ab2e10d80f
[AMDGPU] Add legalization rules for G_ATOMICRMW_FADD (#175257)
G_ATOMICRMW_FADD is supported on flat, global and local address spaces
for S32, S64 and V2S16 values.
2026-02-09 15:37:27 +00:00
Shilei Tian
65b4099219
[AMDGPU] Fix instruction size for 64-bit literal constant operands (#180387)
`getLit64Encoding` uses a different approach to determine whether 64-bit
literal encoding is used, which caused a size mismatch between the
`MachineInstr` and the `MCInst`.

For `!isValid32BitLiteral`, it is effectively `!(isInt<32>(Val) ||
isUInt<32>(Val))`, which is `!isInt<32>(Val) && !isUInt<32>(Val)`, but
in `getLit64Encoding`, it is `!isInt<32>(Val) || !isUInt<32>(Val)`.
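
The difference between the two predicates is easy to see on concrete values. The sketch below uses standalone stand-ins for `isInt<32>` and `isUInt<32>` that take a signed 64-bit value; the function names are illustrative and do not come from the patch.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for isInt<32>: the value fits in a signed 32-bit integer.
bool isInt32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Stand-in for isUInt<32>: the value fits in an unsigned 32-bit integer.
bool isUInt32(std::int64_t v) {
  return v >= 0 &&
         v <= static_cast<std::int64_t>(
                  std::numeric_limits<std::uint32_t>::max());
}

// !(isInt<32> || isUInt<32>), i.e. !isInt<32> && !isUInt<32>.
bool notValid32BitLiteral(std::int64_t v) {
  return !(isInt32(v) || isUInt32(v));
}

// The mismatched form: !isInt<32> || !isUInt<32>.
bool mismatchedCheck(std::int64_t v) {
  return !isInt32(v) || !isUInt32(v);
}
```

For `-1` the first predicate is false (the value fits as a signed 32-bit literal) while the second is true, so the two code paths disagree on whether a 64-bit literal is needed. The same happens for `0x80000000`, which fits only as an unsigned 32-bit value.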
2026-02-09 14:31:52 +00:00
Dmitry Sidorov
f6ee5bd4df
[SPIRV] Fix constant materialization for width > 64bit (#180182)
selectConst() was asserting for constants wider than 64 bits. Add APInt
overloads of getOrCreateConstInt and getOrCreateConstVector that avoid
the uint64_t truncation.
2026-02-09 15:05:23 +01:00
Shilei Tian
392f0c9767
[NFC][AMDGPU] Add a test to show the impact of wrong s_mov_b64 instruction size (#180386)
2026-02-09 08:56:28 -05:00
Djordje Todorovic
b2e6b98783
[MIPS] Fix argument size in Fast ISel (#180336)
Fix a bug where Fast ISel incorrectly set `IncomingArgSize` to `0` for
functions with no arguments. `MIPS O32` uses _the reserved argument area_
of 16 bytes even for functions with no arguments at all.
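
A minimal model of the intended behaviour, assuming the incoming argument size is simply clamped to the 16-byte reserved area. The constant and function names below are hypothetical and not taken from the patch.

```cpp
#include <algorithm>

// O32 reserves 16 bytes of argument area regardless of the argument count.
constexpr unsigned kO32ReservedArgAreaBytes = 16;

// Hypothetical helper: the incoming argument size never drops below the
// reserved area, even when the function takes no arguments.
unsigned incomingArgSize(unsigned bytesUsedByArgs) {
  return std::max(kO32ReservedArgAreaBytes, bytesUsedByArgs);
}
```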
2026-02-09 21:36:35 +08:00
Mirko Brkušanin
45b037cf7a
[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191)
2026-02-09 13:56:43 +01:00
Simon Pilgrim
964651ad51
[X86] Allow handling of i128/256/512 SELECT on the FPU (#180197)
If the scalar integer selection sources are freely transferable to the
FPU, then splat to create an allbits select condition and create a
vector select instead
2026-02-09 10:34:02 +00:00
Petr Kurapov
27a8ab09fa
[AMDGPU] Fix V_INDIRECT_REG_READ_GPR_IDX expansion with immediate index (#179699)
The definition for V_INDIRECT_REG_READ_GPR_IDX_B32_V*'s SSrc_b32 operand
allows immediates, but the expansion logic currently handles only the
register case. This can result in expansion failures when e.g.
llvm.amdgcn.wave.reduce.umin.i32 is folded into a constant and then used
as an insertelement idx.
2026-02-09 11:33:30 +01:00
Matt Arsenault
2ffb54364f
AMDGPU: Add a test for libcall simplify pow handling (#180491)
This case could be turned into powr or pown, so track which
case ends up preferred.
2026-02-09 10:01:26 +00:00
Gergo Stomfai
2298b8606d
[GISel] computeKnownBits - add CTLS handling (#178063)
Closes llvm/llvm-project#174370
2026-02-09 09:30:45 +00:00
Pierre van Houtryve
b79ba02479
[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343)
Load monitor operations make more sense as atomic operations, as
non-atomic operations cannot be used for inter-thread communication w/o
additional synchronization.
The previous built-in made it work because one could just override the
CPol bits, but that bypasses the memory model and forces the user to learn
about ISA bits encoding.

Making load monitor an atomic operation has a couple of advantages.
First, the memory model foundation for it is stronger. We just lean on the
existing rules for atomic operations. Second, the CPol bits are abstracted away
from the user, which avoids leaking ISA details into the API.

This patch also adds supporting memory model and intrinsics
documentation to AMDGPUUsage.

Solves SWDEV-516398.
2026-02-09 09:57:27 +01:00
Matt Arsenault
8554ed738f
AMDGPU: Add syntax for s_wait_event values (#180272)
Previously this would just print hex values. Print names for the
recognized values, matching the sp3 syntax.
2026-02-09 08:29:55 +00:00
Matt Arsenault
0c583e784e
AMDGPU: Add llvm.amdgcn.s.wait.event intrinsic (#180170)
Exactly match the s_wait_event instruction. For some reason we already
had this instruction used through llvm.amdgcn.s.wait.event.export.ready,
but that hardcodes a specific value. This should really be a bitmask
that can combine multiple wait types.

gfx11 -> gfx12 broke compatibility in a weird way, by inverting the
interpretation of the bit but also shifting the used bit by 1. Simplify
the selection of the old intrinsic by just using the magic number 2,
which should satisfy both cases.
2026-02-09 08:45:13 +01:00
Pengcheng Wang
972e73b812
[RISCV][CodeGen] Lower ISD::ABS to Zvabd instructions
We add pseudos/patterns for `vabs.v` instruction and handle the
lowering in `RISCVTargetLowering::lowerABS`.

Reviewers: topperc, 4vtomat, mshockwave, preames, lukel97, tclin914

Reviewed By: mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/180142
2026-02-09 15:21:25 +08:00
Pengcheng Wang
e992593341
[RISCV][CodeGen] Lower abds/abdu to Zvabd instructions
We directly lower `ISD::ABDS`/`ISD::ABDU` to `Zvabd` instructions.

Note that we only support SEW=8/16 for `vabd.vv`/`vabdu.vv`.

Reviewers: mshockwave, lukel97, topperc, preames, tclin914, 4vtomat

Reviewed By: lukel97, topperc

Pull Request: https://github.com/llvm/llvm-project/pull/180141
2026-02-09 15:12:22 +08:00
Pengcheng Wang
151fadecd1
[RISCV][MC] Support experimental Zvabd instructions
The `Zvabd` is for `RISC-V Integer Vector Absolute Difference` and
it provides 5 instructions:

* `vabs.v`: Vector Signed Integer Absolute.
* `vabd.vv`: Vector Signed Integer Absolute Difference.
* `vabdu.vv`: Vector Unsigned Integer Absolute Difference.
* `vwabda.vv`: Vector Signed Integer Absolute Difference And Accumulate.
* `vwabdau.vv`: Vector Unsigned Integer Absolute Difference And Accumulate.

Doc: https://github.com/riscv/integer-vector-absolute-difference

Reviewers: topperc, lukel97, preames, tclin914, asb, kito-cheng, mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/180139
2026-02-09 14:18:39 +08:00
JaydeepChauhan14
fad32ff3ea
[X86] Optimized ADC + ADD to ADC (#176713)
2026-02-09 11:43:56 +05:30
Jim Lin
84b5e9f8db
[RISCV] Add used callee-saved registers as implicit/implicit-def registers to save/restore call (#180133)
We should add the used callee-saved registers as implicit uses of the save
libcall and as implicit defs of the restore libcall, as we already do
for CM_PUSH/CM_POPRET. This helps to construct correct dataflow: in the
entry bb, the save libcall implicitly uses the callee-saved registers that
are live in, and in the return bb, the restore libcall implicitly defines the
callee-saved registers that are live out.
2026-02-09 10:00:32 +08:00
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
paperchalice
5c5677d7b8
[llvm] Remove "no-infs-fp-math" attribute support (#180083)
This is one of the global options in `TargetMachine::resetTargetOptions`.
No backend supports it any longer, so remove it.
2026-02-09 08:43:33 +08:00
Craig Topper
769b734c02
[RISCV] Combine ADDD with UMUL_LOHI/SMUL_LOHI into WMACCU/WMACC (#180383)
Combine the pattern:
  ADDD(addlo, addhi, UMUL_LOHI(x, y).0, UMUL_LOHI(x, y).1)
into:
  WMACCU(x, y, addlo, addhi)

And similarly for SMUL_LOHI -> WMACC.
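
The equivalence behind the unsigned combine can be sketched on 32-bit halves. The model below assumes ADDD is a double-word add with carry between the halves and WMACCU is a widening unsigned multiply-accumulate; the struct and function names are illustrative, and the operand order is an assumption.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves.
struct Pair32 {
  std::uint32_t lo;
  std::uint32_t hi;
};

bool sameValue(Pair32 a, Pair32 b) { return a.lo == b.lo && a.hi == b.hi; }

// Model of UMUL_LOHI: full 32x32 -> 64 unsigned product as two halves.
Pair32 umulLoHi(std::uint32_t x, std::uint32_t y) {
  const std::uint64_t product = std::uint64_t{x} * y;
  return {static_cast<std::uint32_t>(product),
          static_cast<std::uint32_t>(product >> 32)};
}

// Model of ADDD: double-word add, carrying from the low half into the high.
Pair32 addd(Pair32 a, Pair32 b) {
  const std::uint32_t lo = a.lo + b.lo;
  const std::uint32_t carry = lo < a.lo ? 1u : 0u;
  return {lo, a.hi + b.hi + carry};
}

// Model of WMACCU: accumulate the widened product in a single step.
Pair32 wmaccu(std::uint32_t x, std::uint32_t y, Pair32 acc) {
  const std::uint64_t wide =
      ((std::uint64_t{acc.hi} << 32) | acc.lo) + std::uint64_t{x} * y;
  return {static_cast<std::uint32_t>(wide),
          static_cast<std::uint32_t>(wide >> 32)};
}
```

Under these assumptions `addd(acc, umulLoHi(x, y))` and `wmaccu(x, y, acc)` give the same pair for any inputs, including cases where the low-half addition carries.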


This patch was written with AI, but I reviewed it carefully.
2026-02-08 13:39:32 -08:00
Craig Topper
a563e6bb7e
[RISCV] Add support for forming WMULSU during type legalization. (#180331)
Add a DAG combine to turn it into MULHSU if the lower half result
is unused.
2026-02-08 12:38:56 -08:00
Simon Pilgrim
ed9c18693b
[MIPS] musttail.ll - regenerate test checks (#180423)
2026-02-08 17:41:30 +00:00