10269 Commits

Author SHA1 Message Date
Simon Pilgrim
8991ce9cff
[AMDGPU] Add basic clmul test coverage (#190205) 2026-04-02 16:41:34 +00:00
zGoldthorpe
e9a62c7698
[DAG] computeKnownFPClass: handle ISD::FABS (#190069)
Use `KnownFPClass::fabs` to handle `ISD::FABS`.

This case will help with updating #188356 to use `computeKnownFPClass`.
2026-04-02 14:48:54 +00:00
Petar Avramovic
5226289b8e
Revert "AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3" (#190159)
This reverts commit 47f6a19181b426baa03182ab6a7a41e16b35301d.
Breaks MIOpen; we don't have a proper fix yet.
2026-04-02 14:05:08 +00:00
Gabriel Baraldi
5e0a06b34d
Move ExpandMemCmp and MergeIcmp to the middle end (#77370)
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(cf. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.

This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:57:00 +02:00
zGoldthorpe
9a354fc5a1
[SelectionDAG] Use KnownBits to determine if an operand may be NaN. (#188606)
Given a bitcast into a fp type, use the known bits of the operand to
infer whether the resulting value can never be NaN.
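
As a rough illustration of the idea (not the code from this patch), the sketch below models known bits for a 32-bit integer that is bitcast to an IEEE-754 binary32 value. `KnownBits32` and `bitcastToFloatIsNeverNaN` are hypothetical names made up for this example; the only fact relied on is that a binary32 NaN has an all-ones exponent and a non-zero mantissa.

```cpp
#include <cstdint>

// Hypothetical stand-in for a known-bits record, restricted to 32 bits.
struct KnownBits32 {
  std::uint32_t Zero; // bits known to be 0
  std::uint32_t One;  // bits known to be 1 (unused by this check)
};

// A binary32 value is NaN iff its exponent is all ones and its mantissa is
// non-zero, so either condition below rules NaN out.
constexpr bool bitcastToFloatIsNeverNaN(KnownBits32 Known) {
  constexpr std::uint32_t ExpMask = 0x7F800000u;
  constexpr std::uint32_t MantMask = 0x007FFFFFu;
  // Some exponent bit is known zero: the exponent cannot be all ones.
  if ((Known.Zero & ExpMask) != 0)
    return true;
  // Every mantissa bit is known zero: the value is at most an infinity.
  if ((Known.Zero & MantMask) == MantMask)
    return true;
  return false;
}
```

For example, an operand whose bit 23 is known to be zero can never produce NaN, while an operand with no known bits can.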
2026-04-01 22:47:01 -06:00
Stanislav Mekhanoshin
a9df7c7186
[AMDGPU] True16 support for bf16 clamp pattern on gfx1250 (#190036) 2026-04-01 14:26:42 -07:00
Mirko Brkušanin
5d9eb0c76a
[AMDGPU] Define new targets gfx1171 and gfx1172 (#187735) 2026-04-01 18:16:11 +02:00
LU-JOHN
c245d764b8
[CodeGen] Do not remove IMPLICIT_DEF unless all uses have undef flag added (#188133)
Do not remove IMPLICIT_DEF of a physreg unless all uses have an undef
flag added. Previously, only the first use instruction had undef flags
added, which could cause a failure in machine instruction verification.
Multi-instruction uses tested in AMDGPU/multi-use-implicit-def.mir and
X86/multi-use-implicit-def.mir.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2026-04-01 10:11:42 -05:00
Sergio Afonso
2cff995e91
[AMDGPU] Fix crash with dead frame indices in debug values (#183297)
When spill slots are eliminated (VGPR-to-AGPR, SGPR-to-VGPR lanes),
debug values referencing these frame indices were not always properly
cleaned up. This caused an assertion failure in getObjectOffset() when
PrologEpilogInserter tried to access the offset of a dead frame object.

The existing debug fixup code in SIFrameLowering and SILowerSGPRSpills
had two limitations:
1. It only checked one operand position, but DBG_VALUE_LIST instructions
can have multiple debug operands with frame indices.
2. It didn't handle all types of dead frame indices uniformly.

Fix by centralizing debug info cleanup in removeDeadFrameIndices(),
which already knows all frame indices being removed. This iterates over
all debug operands using MI.debug_operands().

Assisted-by: Claude Code.
2026-04-01 13:41:53 +01:00
Manuel Carrasco
ab4b689258
[AMDGPU][SIFoldOperands] Fix OR -1 fold (#189655)
In SIFoldOperands, folding `or x, -1` to `v_mov_b32 -1` removed
`Src1Idx`, which is incorrect because `-1` is in `Src0Idx` (after
canonicalization).

Closes https://github.com/llvm/llvm-project/issues/189677.
2026-04-01 13:37:37 +01:00
Daniil Fukalov
6bf794a02a
[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304)
Disable generic DAG combines for AMDGPU at -O0 via
disableGenericCombines() to preserve instructions that users may want to
set breakpoints on during debugging.

Assisted-by: Cursor / Claude Opus 4.6
2026-04-01 11:55:17 +02:00
ambergorzynski
67d4842910
[NFC][AMDGPU] New test for untested case in SILowerSGPRSpills (#189426)
This case (llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp, lines 343-345
at f380a878d5) is not covered by any existing tests (checked using code
coverage and by inserting an `abort` at that line). I propose a new test
that covers this line.

This is demonstrated by showing that it is the only test that fails in
the presence of the `abort`.
2026-03-31 13:12:29 +01:00
Luke Lau
598f3535fa
[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle legalization (#188691)
This is a second attempt at "[SelectionDAG] Expand
CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220)

That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2
because we had crashes on AMDGPU since we didn't have scalarization
support, and other crashes on PowerPC because we didn't handle the case
when a vector needed to be widened. Tests for these are added in
AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and
PowerPC/cttz-elts.ll.

The former crash has been fixed by adding
DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS.

The second crash has been fixed by reworking
TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly
identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step
vector and subtracts the result from VF. The easiest way to fix these
crashes without introducing regressions is to reuse the
VECTOR_FIND_LAST_ACTIVE expansion which already handles the case where
the vector needs to be widened.

This means that the node now needs to take in a boolean vector argument
and uses VSELECT instead of an AND to zero out inactive lanes, so the op
promotion code has also been shared.
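
A scalar model may make the relationship described above easier to follow. This is only a sketch under stated assumptions: `cttzEltsViaReverseStep` is a hypothetical name, the mask is modelled as `std::vector<bool>`, and the use of an unsigned-max reduction is an assumption about how the lanes are combined rather than something stated in this message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts trailing zero elements (inactive lanes before the first active one)
// using a reverse step vector, a select, and a subtraction from VF.
std::size_t cttzEltsViaReverseStep(const std::vector<bool> &Mask) {
  const std::size_t VF = Mask.size();
  std::size_t Max = 0;
  for (std::size_t I = 0; I < VF; ++I) {
    // Reverse step vector: VF, VF-1, ..., 1.
    const std::size_t ReverseStep = VF - I;
    // Models the VSELECT that zeroes out inactive lanes.
    const std::size_t Selected = Mask[I] ? ReverseStep : 0;
    // Assumed reduction: keep the largest selected value.
    Max = std::max(Max, Selected);
  }
  // The first active lane has the largest reverse step, so VF - Max is its
  // index; with no active lanes Max stays 0 and the result is VF.
  return VF - Max;
}
```

With a four-lane mask whose first active lane is lane 2, the model yields 2; with no active lanes it yields the vector length.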
2026-03-31 07:25:57 +00:00
Matt Arsenault
f48425edca
AMDGPU: Match fract pattern with swapped edge case check (#189081)
A fract implementation can equivalently be written as
  r = fmin(x - floor(x));
  r = isnan(x) ? x : r;
  r = isinf(x) ? 0.0 : r;

or:
  r = fmin(x - floor(x));
  r = isinf(x) ? 0.0 : r;
  r = isnan(x) ? x : r;

Previously this only matched the first form. Match
the case where the isinf check is the inner clamp. There are
a few more ways to write this pattern (e.g., moving the clamp of
infinity to the input) but I haven't encountered that in the wild.

The existing code seems to be trying too hard to match noncanonical
variants of the pattern. This only handles the result that all 4
permutations of compare and select produce out of instcombine.
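
The two orderings can be checked against each other with a small standalone sketch. The function names are hypothetical. The second operand of `fmin` is not spelled out in the message above, so the clamp used here (the largest float below 1.0) is an assumption made only to get something that compiles.

```cpp
#include <cmath>

// Assumption: clamp to the largest float below 1.0. The message above does
// not state the second fmin operand.
static const float ClampBelowOne = std::nextafter(1.0f, 0.0f);

// First form: NaN check applied first, infinity check last (outer).
float fractInfOuter(float X) {
  float R = std::fmin(X - std::floor(X), ClampBelowOne);
  R = std::isnan(X) ? X : R;
  R = std::isinf(X) ? 0.0f : R;
  return R;
}

// Second form: infinity check applied first (inner), NaN check last.
float fractInfInner(float X) {
  float R = std::fmin(X - std::floor(X), ClampBelowOne);
  R = std::isinf(X) ? 0.0f : R;
  R = std::isnan(X) ? X : R;
  return R;
}
```

Because an input cannot be both NaN and infinite, the two selects never both fire, which is why swapping them leaves the result unchanged.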
2026-03-31 09:13:58 +02:00
vangthao95
b85492b3d3
AMDGPU/GlobalISel: RegBankLegalize rules for sudot4/sudot8 (#189104) 2026-03-30 16:23:25 -07:00
Simon Pilgrim
d74f098a30
[DAG] isKnownNeverNaN - fallback to computeKnownFPClass check (#189476)
Remove ConstantFPSDNode handling from isKnownNeverNaN and fall back to
using computeKnownFPClass if there are no opcode matches in
isKnownNeverNaN.

The test check changes are due to isKnownNeverNaN not handling
UNDEF/POISON while computeKnownFPClass does (POISON in particular now
returns isKnownNeverNaN == true, preventing an ISD::FCANONICALIZE call in
expandFMINNUM_FMAXNUM).
2026-03-30 21:49:15 +00:00
Stanislav Mekhanoshin
5f99854d01
[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468)
Fixes: LCOMPILER-1673
2026-03-30 14:14:22 -07:00
Alexey Merzlyakov
06725d7ef5
[GISel] Keep non-negative info in SUB(CTLZ) (#189314)
Implement non-negative value tracking for SUB-CTLZ chains in GlobalISel,
matching the behavior previously added to SelectionDAG.

Additionally, refactor the SelectionDAG implementation from the previous
patch to improve performance and code density.

Related to https://github.com/llvm/llvm-project/issues/136516 and
https://github.com/llvm/llvm-project/pull/186338#discussion_r2980420174
2026-03-30 22:10:47 +02:00
Jeffrey Byrnes
7364203924
Reapply "[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929)" (#189121)
Reland https://github.com/llvm/llvm-project/pull/184929 after fixing
some issues in the NDEBUG builds.

3a640ee is unchanged from the previously approved PR; the unreviewed
portion of this PR is 9cabd8d
2026-03-30 12:18:29 -07:00
vangthao95
ec6574e90e
AMDGPU/GlobalISel: RegBankLegalize rules for udot2/sdot2 (#189103) 2026-03-30 10:43:05 -07:00
vangthao95
35a1961287
AMDGPU/GlobalISel: RegBankLegalize rules for dot products (#189110) 2026-03-30 10:15:12 -07:00
vangthao95
2f0118895b
AMDGPU/GlobalISel: RegBankLegalize rules for ds_append/ds_consume (#189143) 2026-03-30 09:57:57 -07:00
vangthao95
c32d670757
AMDGPU/GlobalISel: RegBankLegalize rules for ds_ordered_add/swap (#189137) 2026-03-30 09:57:04 -07:00
vangthao95
27e3c43d74
AMDGPU/GlobalISel: RegBankLegalize rules for global_load_lds (#189135) 2026-03-30 09:53:12 -07:00
vangthao95
f4d1745ab3
AMDGPU/GlobalISel: RegBankLegalize rules for lds_direct_load (#189134) 2026-03-30 09:52:34 -07:00
Mariusz Sikora
6caec7ecdb
[AMDGPU] Add tanh tests for gfx13 (#188240) 2026-03-30 14:30:04 +02:00
Matt Arsenault
c67475fb05
AMDGPU: Avoid using -march in tests (#189285) 2026-03-29 21:31:59 +00:00
Matt Arsenault
2c41a8de9a
AMDGPU: Fix using -march in a couple tests (#189271) 2026-03-29 18:42:28 +00:00
Folkert de Vries
73cddef788
optimize is_finite assembly (#169402)
Fixes https://github.com/llvm/llvm-project/issues/169270

Changes the implementation of `is_finite` to emit fewer instructions,
e.g.

X86_64

```asm
old: # 18 bytes
        movd    %xmm0, %eax
        andl    $2147483647, %eax
        cmpl    $2139095040, %eax
        setl    %al
        retq
new: # 15 bytes
        movd    %xmm0, %eax
        addl    %eax, %eax
        cmpl    $-16777216, %eax
        setb    %al
        retq
```

Aarch64

```asm
old:
        fmov    w9, s0
        mov     w8, #2139095040
        and     w9, w9, #0x7fffffff
        cmp     w9, w8
        cset    w0, lt
        ret
new:
        fmov    w8, s0
        ubfx    w8, w8, #23, #8
        cmp     w8, #255
        cset    w0, lo
        ret
```

See the issue for more information.
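
For readers who prefer source over assembly, the listings above correspond roughly to the following bit-level checks on the raw binary32 encoding (the `movd`/`fmov` instructions perform the bitcast). The function names are hypothetical and this is a sketch of the arithmetic only, not the code changed by the patch.

```cpp
#include <cstdint>

// Old form: clear the sign bit, then compare against the encoding of +inf.
constexpr bool isFiniteOld(std::uint32_t Bits) {
  return (Bits & 0x7FFFFFFFu) < 0x7F800000u;
}

// New x86-64 form: adding the value to itself shifts the sign bit out (with
// wrap-around), followed by one unsigned compare against 0xFF000000, which is
// -16777216 when read as a signed 32-bit immediate.
constexpr bool isFiniteNewAdd(std::uint32_t Bits) {
  return static_cast<std::uint32_t>(Bits + Bits) < 0xFF000000u;
}

// New AArch64 form: extract the 8 exponent bits starting at bit 23 and check
// that they are not all ones.
constexpr bool isFiniteNewExp(std::uint32_t Bits) {
  return ((Bits >> 23) & 0xFFu) < 255u;
}
```

All three agree on encodings such as `0x3F800000` (1.0), `0x7F7FFFFF` (largest finite value), `0x7F800000` (+inf), `0xFF800000` (-inf) and `0x7FC00000` (a NaN).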
2026-03-29 14:28:07 +00:00
Ruiling, Song
c6fa976d5b
AMDGPU: Make VarIndex WeakTrackingVH in AMDGPUPromoteAlloca (#188921)
The test checks used to look fine, but they were not. A WeakVH just sets
itself to null after the value it points to is replaced, so a zero index
was used because VarIndex became null.

Only WeakTrackingVH is updated to follow the new value.

Change the test slightly so that using a zero index is wrong.
2026-03-28 09:50:25 +08:00
Matt Arsenault
9be0cc173d
AMDGPU: Skip last corrections and scaling for afn llvm.sqrt.f64 (#183697)
Device libs has a fast sqrt macro implemented this way.
2026-03-27 23:59:25 +00:00
Stanislav Mekhanoshin
a2d84b5d8d
[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115)
These were previously covered by AMDGPUWmmaIntrinsicModsAllReuse.
2026-03-27 15:20:14 -07:00
Matt Arsenault
e825f42427
AMDGPU: Improve fsqrt f64 expansion with ninf (#183695) 2026-03-27 22:25:32 +01:00
Jeffrey Byrnes
1c3018b3d6
Revert "[AMDGPU] Add HWUI pressure heuristics to coexec strategy" (#189107)
Seems to be triggering some issues with the buildbots

https://lab.llvm.org/buildbot/#/builders/159/builds/44122

Unused variable + bad debug build.
2026-03-27 13:48:49 -07:00
Jeffrey Byrnes
a9f5f93440
[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929)
Adds basic support for new heuristics for the CoExecSchedStrategy.

InstructionFlavor provides a way to map instructions to different
"Flavors". These "Flavors" all have special scheduling considerations --
either they map to different HardwareUnits, or have unique scheduling
properties like fences.

HardwareUnitInfo provides a way to track and analyze the usage of some
hardware resource across the current scheduling region.

CandidateHeuristics holds the state for new heuristics, as well as the
implementations.

In addition, this adds new heuristics to use the various support pieces
listed above. tryCriticalResource attempts to schedule instructions that
use the most demanded HardwareUnit. If no such instructions are ready to
be scheduled, tryCriticalResourceDependency attempts to schedule
instructions which enable instructions that use demanded HardwareUnits.

We are adding the new heuristics incrementally. In the meantime, the
state of tryCandidateCoexec may not be great, as is the case after this
PR.
2026-03-27 13:34:03 -07:00
Matt Arsenault
28f24b5029
AMDGPU: Add baseline tests for more fract patterns (#189092) 2026-03-27 19:38:54 +00:00
Kewen Meng
a996f2a8db
Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32" (#189074)
Reverts llvm/llvm-project#102345

unblock bot: https://lab.llvm.org/buildbot/#/builders/10/builds/25403
2026-03-27 18:33:01 +00:00
vangthao95
87bec47152
AMDGPU/GlobalISel: RegBankLegalize rules for div_fmas/fixup/scale (#188305) 2026-03-27 10:10:09 -07:00
Marina Taylor
55322f2d43
[ObjCARC] Run ObjCARCContract before PreISelIntrinsicLowering (#184149)
74e4694 moved ObjCARCContract from running before the codegen pipeline
into addISelPrepare(), which runs after PreISelIntrinsicLowering.

This broke ObjCARCContract's retainRV-to-claimRV optimization because
ObjCARCContract identifies ARC calls via intrinsics, not their lowered
counterparts.

This patch restores the pre-74e4694 ordering by moving ObjCARCContract
to addISelPasses.

The IntrinsicInst.cpp change looks extraneous but is required here:
ObjCARCContract may now rewrite the bundle operand from retainRV to
claimRV. When PreISelIntrinsicLowering then encounters this new
intrinsic use, lowerObjCCall asserts mayLowerToFunctionCall.

Assisted-by: claude

rdar://137997453
2026-03-27 15:37:47 +00:00
Matt Arsenault
dba3de54a2
AMDGPU: Allow poison vector elts in fract pattern (#188991) 2026-03-27 13:59:28 +00:00
Matt Arsenault
fc2dac83ed
AMDGPU: Fold frame indexes into disjoint s_or_b32 (#102345)
Some pointer adds get turned into `or`s, and sometimes an `and` is
performed on pointers for masking.
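
As background on why an add can become an `or`: when two operands share no set bits, no carries occur, so the two operations produce the same result. A minimal sketch with hypothetical helper names:

```cpp
#include <cstdint>

// True when A and B have no set bits in common (the "disjoint" condition).
constexpr bool haveNoCommonBits(std::uint32_t A, std::uint32_t B) {
  return (A & B) == 0;
}

// Equal to Base + Offset only when haveNoCommonBits(Base, Offset) holds.
constexpr std::uint32_t addViaDisjointOr(std::uint32_t Base,
                                         std::uint32_t Offset) {
  return Base | Offset;
}
```

For an aligned base such as `0x1000` and an offset of `0x24` the `or` matches the add; for `0x1004` and `0x4` the bits overlap and it does not.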
2026-03-27 13:13:48 +01:00
Osama Abdelkader
0959a2a4bd
Enable generic overlapping optimization for memmove (#177885)
Fixes: #165948
2026-03-27 07:22:05 +00:00
Anshil Gandhi
3833f03054
[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn_perm intrinsic (#187798)
Add uniform and divergent register bank legalization rules for the amdgcn_perm intrinsic (v_perm_b32). Since this is a VALU-only instruction, the uniform case maps the destination to UniInVgprB32 and all source operands to VgprB32.
2026-03-27 00:03:32 +00:00
Anshil Gandhi
966d96942a
[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn_permlane64 (#187840)
Add register bank legalization rules for the amdgcn_permlane64 intrinsic
in the new RegBankLegalize framework.

After GISel legalization, permlane64 always operates on S32 — sub-32-bit
types are anyext'd to S32 and types wider than 32 bits are split into
S32 parts by legalizeLaneOp. Add rules for B32 type.

Also enable -new-reg-bank-select in the permlane64 lit test and update
affected check lines.
2026-03-26 23:43:41 +00:00
vangthao95
b9b87dd796
AMDGPU/GlobalISel: RegBankLegalize rules for buffer atomics (#187550)
Add RegBankLegalize rules for the buffer atomics and/xor/or/inc/dec.
2026-03-26 16:28:37 -07:00
Matt Arsenault
67ea4de3c6
AMDGPU: Regenerate test checks (#188862) 2026-03-26 22:48:52 +00:00
vangthao95
29886a1494
AMDGPU/GlobalISel: RegBankLegalize rules for ds_permute (#188266) 2026-03-26 15:24:50 -07:00
Changpeng Fang
df71894094
[AMDGPU] Do not overlap dst with srcs for v_cvt_scalef32_2xpk16_fp6/bf6_f32 (#188809)
v_cvt_scalef32_2xpk16_fp6_f32 and v_cvt_scalef32_2xpk16_bf6_f32 are multipass instructions,
so the destination operand must not overlap with any of the source operands.
This patch applies Constraints = "@earlyclobber $vdst" to these two instructions.

Fixes: LCCOMPILER-561
2026-03-26 14:38:22 -07:00
134ARG
331c1c0b84
[ValueTracking] Refine SIToFP/UIToFP FPClass inference with KnownBits (#187185)
This patch propagates the KnownBits of the source integer to improve
floating-point class inference for sitofp and uitofp instructions.

Specifically,
1. The result is never -0.0.
2. The result is not +0.0 if the source integer is known non-zero.
3. The result is not negative if the source integer is known
non-negative (or for uitofp).
4. The result is not Infinity if the largest possible integer magnitude
fits within the target FP type's exponent limits.
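
The four rules can be written out as a simplified model. Everything here is illustrative: `IntFacts`, `FPFacts` and `inferIntToFP` are hypothetical names, and the source integer is described only by its width, signedness and two flags rather than by full known bits. `MaxExponent` is the largest unbiased exponent of the target type (15 for half, 127 for float).

```cpp
// Hypothetical summary of what is known about the source integer.
struct IntFacts {
  unsigned BitWidth;
  bool IsSigned;         // sitofp when true, uitofp when false
  bool KnownNonZero;
  bool KnownNonNegative;
};

// Hypothetical summary of what can be excluded for the FP result.
struct FPFacts {
  bool NeverNegZero;
  bool NeverPosZero;
  bool NeverNegative;
  bool NeverInf;
};

constexpr FPFacts inferIntToFP(IntFacts Src, unsigned MaxExponent) {
  // An N-bit unsigned value is below 2^N and may round up to 2^N; an N-bit
  // signed value has magnitude at most 2^(N-1).
  const unsigned MagnitudeLog2 = Src.IsSigned ? Src.BitWidth - 1 : Src.BitWidth;
  return FPFacts{
      /*NeverNegZero=*/true,                                      // rule 1
      /*NeverPosZero=*/Src.KnownNonZero,                          // rule 2
      /*NeverNegative=*/!Src.IsSigned || Src.KnownNonNegative,    // rule 3
      /*NeverInf=*/MagnitudeLog2 <= MaxExponent};                 // rule 4
}
```

Under this model a uitofp from i16 to half may produce infinity (65535 is above the largest finite half value), while a sitofp from i16 to half cannot.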

alive2 results for added testcases:
testcase 1: https://alive2.llvm.org/ce/z/eM34LB
testcase 2: https://alive2.llvm.org/ce/z/ext7XF 
testcase 3: https://alive2.llvm.org/ce/z/g8yb6q
testcase 4: https://alive2.llvm.org/ce/z/cyFYRy
testcase 5: https://alive2.llvm.org/ce/z/LePFrm

alive2 for updated testcase in binop-itofp:

updated 1: https://alive2.llvm.org/ce/z/KPQ5bZ
updated 2: https://alive2.llvm.org/ce/z/bGf43t
updated 3: https://alive2.llvm.org/ce/z/YKnCwU
updated 4: https://alive2.llvm.org/ce/z/mqKaq-
updated 5: https://alive2.llvm.org/ce/z/jYSAB5

Fix #186952
2026-03-26 18:14:58 +01:00
Syadus Sefat
5f5f330ee4
[AMDGPU][GlobalIsel] Add register bank legalization rules for amdgcn_interp_inreg (#187248)
This patch adds register bank legalization rules for amdgcn_interp_inreg
operations in the AMDGPU GlobalISel pipeline.
2026-03-26 11:55:17 -05:00