llvm-project

Author	SHA1	Message	Date
Joe Nash	af95b0a615	[AMDGPU] Remove implicit super-reg defs on mov64 pseudos (#190379 ) The mov64 pseudo is split into two 32 bit movs, but those 32 bit movs had the full 64-bit register still implicitly defined. VOPD formation is affected, so we can emit more of them.	2026-04-06 21:11:06 +00:00
Chinmay Deshpande	9033e872fd	[AMDGPU][GISel] RegBankLegalize rules for update_dpp (#190662 )	2026-04-06 13:52:10 -07:00
Chinmay Deshpande	40d5a7d69e	[AMDGPU][UniformityAnalysis] Mark set_inactive and set_inactive_chain_arg as SourceOfDivergence (#190640 ) `set_inactive` produces a result that varies per-lane based on the EXEC mask, even when both inputs are uniform.	2026-04-06 12:40:22 -07:00
Chinmay Deshpande	12e957fd7f	[AMDGPU][GISel] RegBankLegalize rules for amdgcn_inverse_ballot (#190629 )	2026-04-06 10:30:35 -07:00
vangthao95	eb065bf028	AMDGPU/GlobalISel: RegBankLegalize rules for G_EXTRACT_VECTOR_ELT (#189144 )	2026-04-06 10:22:11 -07:00
Wooseok Lee	0bef4c7aab	[AMDGPU] Add v2i32 and/or patterns for VOP3 AND_OR and OR3 operations (#188375 ) Add ThreeOp_v2i32_Pats pattern class to support v2i32 vector operations for AND_OR_B32 and OR3_B32 instructions. The new patterns check the v2i32 and-or or or-or instruction sequence, extract individual 32-bit elements from v2i32 operands, and apply the and_or or or3 VOP3 operations.	2026-04-06 16:54:21 +00:00
Domenic Nutile	5b33f85a08	[AMDGPU] Change isSingleLaneExecution to account for WWM enabling lanes even if there's only one workitem (#188316 ) This issue was discovered during some downstream work around Vulkan CTS tests, specifically `dEQP-VK.subgroups.arithmetic.compute.subgroupadd_float` --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-04-06 12:51:46 -04:00
Stanislav Mekhanoshin	e0908cd7a5	[AMDGPU] Specialize gfx1250 codegen tests for fake and real t16. NFC. (#190390 ) This is preparation of turning on real true16, so we can easily apply it or revert.	2026-04-04 01:55:18 -07:00
vangthao95	df1e67b379	AMDGPU/GlobalISel: RegBankLegalize rules for s_memtime, s_get_waveid (#190268 )	2026-04-03 09:46:56 -07:00
Lakreite	a44c15874d	[AMDGPU][CodeGen] Implement SimplifyDemandedBitsForTargetNode for readfirstlane. (#190009 ) Propagate demanded bits through readfirstlane intrinsic in AMDGPUISelLowering with SimplifyDemandedBitsForTargetNode implementation. This allows upstream zero/sign extensions to be eliminated when only a subset of bits is used after the intrinsic. Partially addresses #128390.	2026-04-03 14:30:47 +02:00
michaelselehov	df48719df3	[AMDGPU] Add !noalias metadata to mem-accessing calls w/o pointer args (#188949 ) addAliasScopeMetadata in AMDGPULowerKernelArguments skips instructions with empty PtrArgs, including memory-accessing calls that have no pointer arguments (e.g. builtins like threadIdx()). Because these calls never receive !noalias metadata, ScopedNoAliasAA cannot prove they don't alias noalias kernel arguments. MemorySSA then conservatively reports them as clobbers, which prevents AMDGPUAnnotateUniformValues from marking loads as noclobber, blocking scalarization (s_load) and forcing expensive vector loads (global_load) instead. Fix by adding all noalias kernel argument scopes to !noalias metadata for memory-accessing instructions with no pointer arguments. Since such instructions cannot access memory through any kernel pointer argument, all noalias scopes are safe to apply. This fixes a performance regression in rocFFT introduced by bd9668df0f00 ("[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments"). Assisted-by: Claude Opus	2026-04-03 08:41:05 +02:00
Stanislav Mekhanoshin	7084f18f27	[AMDGPU] Fix i16/i8 flat store in true16 with sramecc (#190238 ) The pattern was guarded by the D16PreservesUnusedBits predicate which is not needed for stores.	2026-04-02 17:32:50 -07:00
Simon Pilgrim	8991ce9cff	[AMDGPU] Add basic clmul test coverage (#190205 )	2026-04-02 16:41:34 +00:00
zGoldthorpe	e9a62c7698	[DAG] `computeKnownFPClass`: handle `ISD::FABS` (#190069 ) Use `KnownFPClass::fabs` to handle `ISD::FABS`. This case will help with updating #188356 to use `computeKnownFPClass`.	2026-04-02 14:48:54 +00:00
Petar Avramovic	5226289b8e	Revert "AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3" (#190159 ) This reverts commit 47f6a19181b426baa03182ab6a7a41e16b35301d. Breaks MIOpen, don't have proper fix yet.	2026-04-02 14:05:08 +00:00
Gabriel Baraldi	5e0a06b34d	Move ExpandMemCmp and MergeIcmp to the middle end (#77370 ) Moving these into the middle-end pipeline will allow for additional optimization of the expansion result, such as CSE of redundant loads (c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place the passes at the end of the middle-end pipeline, so we mostly don't benefit from additional optimizations yet. The pipeline position will be moved in a future change. This builds on work done by legrosbuffle in https://reviews.llvm.org/D60318. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 09:57:00 +02:00
zGoldthorpe	9a354fc5a1	[SelectionDAG] Use `KnownBits` to determine if an operand may be NaN. (#188606 ) Given a bitcast into a fp type, use the known bits of the operand to infer whether the resulting value can never be NaN.	2026-04-01 22:47:01 -06:00
Stanislav Mekhanoshin	a9df7c7186	[AMDGPU] True16 support for bf16 clamp pattern on gfx1250 (#190036 )	2026-04-01 14:26:42 -07:00
Mirko Brkušanin	5d9eb0c76a	[AMDGPU] Define new targets gfx1171 and gfx1172 (#187735 )	2026-04-01 18:16:11 +02:00
LU-JOHN	c245d764b8	[CodeGen] Do not remove IMPLICIT_DEF unless all uses have undef flag added (#188133 ) Do not remove IMPLICIT_DEF of a physreg unless all uses have an undef flag added. Previously, only the first use instruction had undef flags added. This will cause a failure in machine instruction verification. Multi-instruction uses tested in AMDGPU/multi-use-implicit-def.mir and X86/multi-use-implicit-def.mir. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2026-04-01 10:11:42 -05:00
Sergio Afonso	2cff995e91	[AMDGPU] Fix crash with dead frame indices in debug values (#183297 ) When spill slots are eliminated (VGPR-to-AGPR, SGPR-to-VGPR lanes), debug values referencing these frame indices were not always properly cleaned up. This caused an assertion failure in getObjectOffset() when PrologEpilogInserter tried to access the offset of a dead frame object. The existing debug fixup code in SIFrameLowering and SILowerSGPRSpills had two limitations: 1. It only checked one operand position, but DBG_VALUE_LIST instructions can have multiple debug operands with frame indices. 2. It didn't handle all types of dead frame indices uniformly. Fix by centralizing debug info cleanup in removeDeadFrameIndices(), which already knows all frame indices being removed. This iterates over all debug operands using MI.debug_operands(). Assisted-by: Claude Code.	2026-04-01 13:41:53 +01:00
Manuel Carrasco	ab4b689258	[AMDGPU][SIFoldOperands] Fix OR -1 fold (#189655 ) In SIFoldOperands, folding `or x, -1` to `v_mov_b32 -1` removed `Src1Idx`, which is incorrect because `-1` is in `Src0Idx` (after canonicalization). Closes https://github.com/llvm/llvm-project/issues/189677.	2026-04-01 13:37:37 +01:00
Daniil Fukalov	6bf794a02a	[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304 ) Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines() to preserve instructions that users may want to set breakpoints on during debugging. Assisted-by: Cursor / Claude Opus 4.6	2026-04-01 11:55:17 +02:00
ambergorzynski	67d4842910	[NFC][AMDGPU] New test for untested case in SILowerSGPRSpills (#189426 ) [This case](`f380a878d5/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (L343-L345)`) is not covered by any existing tests (checked using code coverage and by inserting an `abort` at that line). I propose a new test that tests this line. This is demonstrated by showing that it is the only test that fails in the presence of the `abort`.	2026-03-31 13:12:29 +01:00
Luke Lau	598f3535fa	[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle legalization (#188691 ) This is a second attempt at "[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220) That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2 because we had crashes on AMDGPU since we didn't have scalarization support, and other crashes on PowerPC because we didn't handle the case when a vector needed widened. Tests for these are added in AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and PowerPC/cttz-elts.ll. The former crash has been fixed by adding DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS. The second crash has been fixed by reworking TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step vector and subtracts the result from VF. The easiest way to fix these crashes without introducing regressions is to reuse the VECTOR_FIND_LAST_ACTIVE expansion which already handles the case where the vector needs widened. This means that the node now needs to take in a boolean vector argument and uses VSELECT instead of an AND to zero out inactive lanes, so the op promotion code has also been shared.	2026-03-31 07:25:57 +00:00
Matt Arsenault	f48425edca	AMDGPU: Match fract pattern with swapped edge case check (#189081 ) A fract implementation can equivalently be written as: r = fmin(x - floor(x)); r = isnan(x) ? x : r; r = isinf(x) ? 0.0 : r; or: r = fmin(x - floor(x)); r = isinf(x) ? 0.0 : r; r = isnan(x) ? x : r; Previously this only matched the first form. Match the case where the isinf check is the inner clamp. There are a few more ways to write this pattern (e.g., move the clamp of infinity to the input) but I haven't encountered that in the wild. The existing code seems to be trying too hard to match noncanonical variants of the pattern. Only handles the result that all 4 permutations of compare and select produce out of instcombine.	2026-03-31 09:13:58 +02:00
vangthao95	b85492b3d3	AMDGPU/GlobalISel: RegBankLegalize rules for sudot4/sudot8 (#189104 )	2026-03-30 16:23:25 -07:00
Simon Pilgrim	d74f098a30	[DAG] isKnownNeverNaN - fallback to computeKnownFPClass check (#189476 ) Remove ConstantFPSDNode handling from isKnownNeverNaN and fall back to using computeKnownFPClass if there are no opcode matches in isKnownNeverNaN. The test check changes are due to isKnownNeverNaN not handling UNDEF/POISON, which computeKnownFPClass does (POISON in particular now returns isKnownNeverNaN == true, preventing an ISD::FCANONICALIZE call in expandFMINNUM_FMAXNUM).	2026-03-30 21:49:15 +00:00
Stanislav Mekhanoshin	5f99854d01	[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468 ) Fixes: LCOMPILER-1673	2026-03-30 14:14:22 -07:00
Alexey Merzlyakov	06725d7ef5	[GISel] Keep non-negative info in SUB(CTLZ) (#189314 ) Implement non-negative value tracking for SUB-CTLZ chains in GlobalISel, matching the behavior previously added to SelectionDAG. Additionally, refactor the SelectionDAG implementation from the previous patch to improve performance and code density. Related to https://github.com/llvm/llvm-project/issues/136516 and https://github.com/llvm/llvm-project/pull/186338#discussion_r2980420174	2026-03-30 22:10:47 +02:00
Jeffrey Byrnes	7364203924	Reapply "[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929 )" (#189121 ) Reland https://github.com/llvm/llvm-project/pull/184929 after fixing some issues in the NDEBUG builds. 3a640ee is unchanged from the previously approved PR, the unreviewed portion of this PR is 9cabd8d	2026-03-30 12:18:29 -07:00
vangthao95	ec6574e90e	AMDGPU/GlobalISel: RegBankLegalize rules for udot2/sdot2 (#189103 )	2026-03-30 10:43:05 -07:00
vangthao95	35a1961287	AMDGPU/GlobalISel: RegBankLegalize rules for dot products (#189110 )	2026-03-30 10:15:12 -07:00
vangthao95	2f0118895b	AMDGPU/GlobalISel: RegBankLegalize rules for ds_append/ds_consume (#189143 )	2026-03-30 09:57:57 -07:00
vangthao95	c32d670757	AMDGPU/GlobalISel: RegBankLegalize rules for ds_ordered_add/swap (#189137 )	2026-03-30 09:57:04 -07:00
vangthao95	27e3c43d74	AMDGPU/GlobalISel: RegBankLegalize rules for global_load_lds (#189135 )	2026-03-30 09:53:12 -07:00
vangthao95	f4d1745ab3	AMDGPU/GlobalISel: RegBankLegalize rules for lds_direct_load (#189134 )	2026-03-30 09:52:34 -07:00
Mariusz Sikora	6caec7ecdb	[AMDGPU] Add tanh tests for gfx13 (#188240 )	2026-03-30 14:30:04 +02:00
Matt Arsenault	c67475fb05	AMDGPU: Avoid using -march in tests (#189285 )	2026-03-29 21:31:59 +00:00
Matt Arsenault	2c41a8de9a	AMDGPU: Fix using -march in a couple tests (#189271 )	2026-03-29 18:42:28 +00:00
Folkert de Vries	73cddef788	optimize `is_finite` assembly (#169402 ) Fixes https://github.com/llvm/llvm-project/issues/169270 Changes the implementation of `is_finite` to emit fewer instructions, e.g. X86_64 ```asm old: # 18 bytes movd %xmm0, %eax andl $2147483647, %eax cmpl $2139095040, %eax setl %al retq new: # 15 bytes movd %xmm0, %eax addl %eax, %eax cmpl $-16777216, %eax setb %al retq ``` Aarch64 ```asm old: fmov w9, s0 mov w8, #2139095040 and w9, w9, #0x7fffffff cmp w9, w8 cset w0, lt ret new: fmov w8, s0 ubfx w8, w8, #23, #8 cmp w8, #255 cset w0, lo ret ``` See the issue for more information.	2026-03-29 14:28:07 +00:00
Ruiling, Song	c6fa976d5b	AMDGPU: Make VarIndex WeakTrackingVH in AMDGPUPromoteAlloca (#188921 ) The test checks used to look fine, but the result was actually wrong. A WeakVH just makes itself null after the value it points to is replaced, so a zero value was used because VarIndex became null. Only WeakTrackingVH has the ability to be updated to the new value. Change the test slightly so that using a zero index is visibly wrong.	2026-03-28 09:50:25 +08:00
Matt Arsenault	9be0cc173d	AMDGPU: Skip last corrections and scaling for afn llvm.sqrt.f64 (#183697 ) Device libs has a fast sqrt macro implemented this way.	2026-03-27 23:59:25 +00:00
Stanislav Mekhanoshin	a2d84b5d8d	[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115 ) These are previously covered by AMDGPUWmmaIntrinsicModsAllReuse.	2026-03-27 15:20:14 -07:00
Matt Arsenault	e825f42427	AMDGPU: Improve fsqrt f64 expansion with ninf (#183695 )	2026-03-27 22:25:32 +01:00
Jeffrey Byrnes	1c3018b3d6	Revert "[AMDGPU] Add HWUI pressure heuristics to coexec strategy" (#189107 ) Seems to be triggering some issues with the buildbots https://lab.llvm.org/buildbot/#/builders/159/builds/44122 Unused variable + bad debug build.	2026-03-27 13:48:49 -07:00
Jeffrey Byrnes	a9f5f93440	[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929 ) Adds basic support for new heuristics for the CoExecSchedStrategy. InstructionFlavor provides a way to map instructions to different "Flavors". These "Flavors" all have special scheduling considerations -- either they map to different HardwareUnits, or have unique scheduling properties like fences. HardwareUnitInfo provides a way to track and analyze the usage of some hardware resource across the current scheduling region. CandidateHeuristics holds the state for new heuristics, as well as the implementations. In addition, this adds new heuristics to use the various support pieces listed above. tryCriticalResource attempts to schedule instructions that use the most demanded HardwareUnit. If no such instructions are ready to be scheduled, tryCriticalResourceDependency attempts to schedule instructions which enable instructions that use demanded HardwareUnits. We are incrementally adding the new heuristics. While in the process of this, the state of tryCandidateCoexec may not be great - as is the case after this PR.	2026-03-27 13:34:03 -07:00
Matt Arsenault	28f24b5029	AMDGPU: Add baseline tests for more fract patterns (#189092 )	2026-03-27 19:38:54 +00:00
Kewen Meng	a996f2a8db	Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32" (#189074 ) Reverts llvm/llvm-project#102345 unblock bot: https://lab.llvm.org/buildbot/#/builders/10/builds/25403	2026-03-27 18:33:01 +00:00
vangthao95	87bec47152	AMDGPU/GlobalISel: RegBankLegalize rules for div_fmas/fixup/scale (#188305 )	2026-03-27 10:10:09 -07:00
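
Note on 73cddef788 (optimize `is_finite` assembly): the old and new x86-64 sequences quoted in that message appear to correspond to two bit-level tests on a 32-bit float. The following is only a sketch of that reading, not the code from the commit; the helper names `bitsOf`, `isFiniteMask`, and `isFiniteShift` are hypothetical. The constants are taken from the quoted assembly (2147483647 = 0x7fffffff, 2139095040 = 0x7f800000, -16777216 = 0xff000000).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: read the raw IEEE-754 binary32 bits of a float.
static std::uint32_t bitsOf(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits;
}

// "old" form: clear the sign bit, then compare against the bit pattern of
// +infinity (all exponent bits set, zero mantissa).
static bool isFiniteMask(float X) {
  return (bitsOf(X) & 0x7fffffffu) < 0x7f800000u;
}

// "new" form: shift the sign bit out (the `addl %eax, %eax`), then do an
// unsigned compare against the shifted infinity pattern.
static bool isFiniteShift(float X) {
  return (bitsOf(X) << 1) < 0xff000000u;
}

// Both forms should agree on ordinary values, infinities, and NaN.
static bool formsAgree(float X) {
  return isFiniteMask(X) == isFiniteShift(X);
}
```

Under this reading the shift form saves the separate mask step, which matches the smaller byte count quoted in the message.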


10281 Commits