llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	8991ce9cff	[AMDGPU] Add basic clmul test coverage (#190205 )	2026-04-02 16:41:34 +00:00
zGoldthorpe	e9a62c7698	[DAG] `computeKnownFPClass`: handle `ISD::FABS` (#190069 ) Use `KnownFPClass::fabs` to handle `ISD::FABS`. This case will help with updating #188356 to use `computeKnownFPClass`.	2026-04-02 14:48:54 +00:00
Petar Avramovic	5226289b8e	Revert "AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3" (#190159 ) This reverts commit 47f6a19181b426baa03182ab6a7a41e16b35301d. Breaks MIOpen, don't have proper fix yet.	2026-04-02 14:05:08 +00:00
Gabriel Baraldi	5e0a06b34d	Move ExpandMemCmp and MergeIcmp to the middle end (#77370 ) Moving these into the middle-end pipeline will allow for additional optimization of the expansion result, such as CSE of redundant loads (c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place the passes at the end of the middle-end pipeline, so we mostly don't benefit from additional optimizations yet. The pipeline position will be moved in a future change. This builds on work done by legrosbuffle in https://reviews.llvm.org/D60318. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 09:57:00 +02:00
zGoldthorpe	9a354fc5a1	[SelectionDAG] Use `KnownBits` to determine if an operand may be NaN. (#188606 ) Given a bitcast into a fp type, use the known bits of the operand to infer whether the resulting value can never be NaN.	2026-04-01 22:47:01 -06:00
Stanislav Mekhanoshin	a9df7c7186	[AMDGPU] True16 support for bf16 clamp pattern on gfx1250 (#190036 )	2026-04-01 14:26:42 -07:00
Mirko Brkušanin	5d9eb0c76a	[AMDGPU] Define new targets gfx1171 and gfx1172 (#187735 )	2026-04-01 18:16:11 +02:00
LU-JOHN	c245d764b8	[CodeGen] Do not remove IMPLICIT_DEF unless all uses have undef flag added (#188133 ) Do not remove IMPLICIT_DEF of a physreg unless all uses have an undef flag added. Previously, only the first use instruction had undef flags added. This will cause a failure in machine instruction verification. Multi-instruction uses tested in AMDGPU/multi-use-implicit-def.mir and X86/multi-use-implicit-def.mir. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2026-04-01 10:11:42 -05:00
Sergio Afonso	2cff995e91	[AMDGPU] Fix crash with dead frame indices in debug values (#183297 ) When spill slots are eliminated (VGPR-to-AGPR, SGPR-to-VGPR lanes), debug values referencing these frame indices were not always properly cleaned up. This caused an assertion failure in getObjectOffset() when PrologEpilogInserter tried to access the offset of a dead frame object. The existing debug fixup code in SIFrameLowering and SILowerSGPRSpills had two limitations: 1. It only checked one operand position, but DBG_VALUE_LIST instructions can have multiple debug operands with frame indices. 2. It didn't handle all types of dead frame indices uniformly. Fix by centralizing debug info cleanup in removeDeadFrameIndices(), which already knows all frame indices being removed. This iterates over all debug operands using MI.debug_operands(). Assisted-by: Claude Code.	2026-04-01 13:41:53 +01:00
Manuel Carrasco	ab4b689258	[AMDGPU][SIFoldOperands] Fix OR -1 fold (#189655 ) In SIFoldOperands, folding `or x, -1` to `v_mov_b32 -1` removed `Src1Idx`, which is incorrect because `-1` is in `Src0Idx` (after canonicalization). Closes https://github.com/llvm/llvm-project/issues/189677.	2026-04-01 13:37:37 +01:00
Daniil Fukalov	6bf794a02a	[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304 ) Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines() to preserve instructions that users may want to set breakpoints on during debugging. Assisted-by: Cursor / Claude Opus 4.6	2026-04-01 11:55:17 +02:00
ambergorzynski	67d4842910	[NFC][AMDGPU] New test for untested case in SILowerSGPRSpills (#189426 ) [This case](`f380a878d5/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (L343-L345)`) is not covered by any existing tests (checked using code coverage and by inserting an `abort` at that line). I propose a new test that tests this line. This is demonstrated by showing that it is the only test that fails in the presence of the `abort`.	2026-03-31 13:12:29 +01:00
Luke Lau	598f3535fa	[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle legalization (#188691 ) This is a second attempt at "[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220) That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2 because we had crashes on AMDGPU since we didn't have scalarization support, and other crashes on PowerPC because we didn't handle the case when a vector needed to be widened. Tests for these are added in AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and PowerPC/cttz-elts.ll. The former crash has been fixed by adding DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS. The second crash has been fixed by reworking TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step vector and subtracts the result from VF. The easiest way to fix these crashes without introducing regressions is to reuse the VECTOR_FIND_LAST_ACTIVE expansion which already handles the case where the vector needs to be widened. This means that the node now needs to take in a boolean vector argument and uses VSELECT instead of an AND to zero out inactive lanes, so the op promotion code has also been shared.	2026-03-31 07:25:57 +00:00
Matt Arsenault	f48425edca	AMDGPU: Match fract pattern with swapped edge case check (#189081 ) A fract implementation can equivalently be written as r = fmin(x - floor(x)); r = isnan(x) ? x : r; r = isinf(x) ? 0.0 : r; or: r = fmin(x - floor(x)); r = isinf(x) ? 0.0 : r; r = isnan(x) ? x : r; Previously this only matched the first form. Match the case where the isinf check is the inner clamp. There are a few more ways to write this pattern (e.g., move the clamp of infinity to the input) but I haven't encountered that in the wild. The existing code seems to be trying too hard to match noncanonical variants of the pattern. Only handles the result that all 4 permutations of compare and select produce out of instcombine.	2026-03-31 09:13:58 +02:00
vangthao95	b85492b3d3	AMDGPU/GlobalISel: RegBankLegalize rules for sudot4/sudot8 (#189104 )	2026-03-30 16:23:25 -07:00
Simon Pilgrim	d74f098a30	[DAG] isKnownNeverNaN - fallback to computeKnownFPClass check (#189476 ) Remove ConstantFPSDNode handling from isKnownNeverNaN and fall back to using computeKnownFPClass if there are no opcode matches in isKnownNeverNaN. The test check changes arise because isKnownNeverNaN does not handle UNDEF/POISON, whereas computeKnownFPClass does (POISON in particular now returns isKnownNeverNaN == true, preventing an ISD::FCANONICALIZE call in expandFMINNUM_FMAXNUM).	2026-03-30 21:49:15 +00:00
Stanislav Mekhanoshin	5f99854d01	[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468 ) Fixes: LCOMPILER-1673	2026-03-30 14:14:22 -07:00
Alexey Merzlyakov	06725d7ef5	[GISel] Keep non-negative info in SUB(CTLZ) (#189314 ) Implement non-negative value tracking for SUB-CTLZ chains in GlobalISel, matching the behavior previously added to SelectionDAG. Additionally, refactor the SelectionDAG implementation from the previous patch to improve performance and code density. Related to https://github.com/llvm/llvm-project/issues/136516 and https://github.com/llvm/llvm-project/pull/186338#discussion_r2980420174	2026-03-30 22:10:47 +02:00
Jeffrey Byrnes	7364203924	Reapply "[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929 )" (#189121 ) Reland https://github.com/llvm/llvm-project/pull/184929 after fixing some issues in the NDEBUG builds. 3a640ee is unchanged from the previously approved PR, the unreviewed portion of this PR is 9cabd8d	2026-03-30 12:18:29 -07:00
vangthao95	ec6574e90e	AMDGPU/GlobalISel: RegBankLegalize rules for udot2/sdot2 (#189103 )	2026-03-30 10:43:05 -07:00
vangthao95	35a1961287	AMDGPU/GlobalISel: RegBankLegalize rules for dot products (#189110 )	2026-03-30 10:15:12 -07:00
vangthao95	2f0118895b	AMDGPU/GlobalISel: RegBankLegalize rules for ds_append/ds_consume (#189143 )	2026-03-30 09:57:57 -07:00
vangthao95	c32d670757	AMDGPU/GlobalISel: RegBankLegalize rules for ds_ordered_add/swap (#189137 )	2026-03-30 09:57:04 -07:00
vangthao95	27e3c43d74	AMDGPU/GlobalISel: RegBankLegalize rules for global_load_lds (#189135 )	2026-03-30 09:53:12 -07:00
vangthao95	f4d1745ab3	AMDGPU/GlobalISel: RegBankLegalize rules for lds_direct_load (#189134 )	2026-03-30 09:52:34 -07:00
Mariusz Sikora	6caec7ecdb	[AMDGPU] Add tanh tests for gfx13 (#188240 )	2026-03-30 14:30:04 +02:00
Matt Arsenault	c67475fb05	AMDGPU: Avoid using -march in tests (#189285 )	2026-03-29 21:31:59 +00:00
Matt Arsenault	2c41a8de9a	AMDGPU: Fix using -march in a couple tests (#189271 )	2026-03-29 18:42:28 +00:00
Folkert de Vries	73cddef788	optimize `is_finite` assembly (#169402 ) Fixes https://github.com/llvm/llvm-project/issues/169270 Changes the implementation of `is_finite` to emit fewer instructions, e.g. X86_64 ```asm old: # 18 bytes movd %xmm0, %eax andl $2147483647, %eax cmpl $2139095040, %eax setl %al retq new: # 15 bytes movd %xmm0, %eax addl %eax, %eax cmpl $-16777216, %eax setb %al retq ``` Aarch64 ```asm old: fmov w9, s0 mov w8, #2139095040 and w9, w9, #0x7fffffff cmp w9, w8 cset w0, lt ret new: fmov w8, s0 ubfx w8, w8, #23, #8 cmp w8, #255 cset w0, lo ret ``` See the issue for more information.	2026-03-29 14:28:07 +00:00
Ruiling, Song	c6fa976d5b	AMDGPU: Make VarIndex WeakTrackingVH in AMDGPUPromoteAlloca (#188921 ) The test appeared to pass, but it was not actually correct. A WeakVH simply becomes null after the value it points to is replaced, so a zero index was used once VarIndex became null, while the test checks still looked fine. Only WeakTrackingVH is updated to follow the new value. Change the test slightly so that using a zero index produces a wrong result.	2026-03-28 09:50:25 +08:00
Matt Arsenault	9be0cc173d	AMDGPU: Skip last corrections and scaling for afn llvm.sqrt.f64 (#183697 ) Device libs has a fast sqrt macro implemented this way.	2026-03-27 23:59:25 +00:00
Stanislav Mekhanoshin	a2d84b5d8d	[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115 ) These are previously covered by AMDGPUWmmaIntrinsicModsAllReuse.	2026-03-27 15:20:14 -07:00
Matt Arsenault	e825f42427	AMDGPU: Improve fsqrt f64 expansion with ninf (#183695 )	2026-03-27 22:25:32 +01:00
Jeffrey Byrnes	1c3018b3d6	Revert "[AMDGPU] Add HWUI pressure heuristics to coexec strategy" (#189107 ) Seems to be triggering some issues with the buildbots https://lab.llvm.org/buildbot/#/builders/159/builds/44122 Unused variable + bad debug build.	2026-03-27 13:48:49 -07:00
Jeffrey Byrnes	a9f5f93440	[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929 ) Adds basic support for new heuristics for the CoExecSchedStrategy. InstructionFlavor provides a way to map instructions to different "Flavors". These "Flavors" all have special scheduling considerations -- either they map to different HardwareUnits, or have unique scheduling properties like fences. HardwareUnitInfo provides a way to track and analyze the usage of some hardware resource across the current scheduling region. CandidateHeuristics holds the state for new heuristics, as well as the implementations. In addition, this adds new heuristics to use the various support pieces listed above. tryCriticalResource attempts to schedule instructions that use the most demanded HardwareUnit. If no such instructions are ready to be scheduled, tryCriticalResourceDependency attempts to schedule instructions which enable instructions that use demanded HardwareUnits. We are incrementally adding the new heuristics. While in the process of this, the state of tryCandidateCoexec may not be great - as is the case after this PR.	2026-03-27 13:34:03 -07:00
Matt Arsenault	28f24b5029	AMDGPU: Add baseline tests for more fract patterns (#189092 )	2026-03-27 19:38:54 +00:00
Kewen Meng	a996f2a8db	Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32" (#189074 ) Reverts llvm/llvm-project#102345 unblock bot: https://lab.llvm.org/buildbot/#/builders/10/builds/25403	2026-03-27 18:33:01 +00:00
vangthao95	87bec47152	AMDGPU/GlobalISel: RegBankLegalize rules for div_fmas/fixup/scale (#188305 )	2026-03-27 10:10:09 -07:00
Marina Taylor	55322f2d43	[ObjCARC] Run ObjCARCContract before PreISelIntrinsicLowering (#184149 ) 74e4694 moved ObjCARCContract from running before the codegen pipeline into addISelPrepare(), which runs after PreISelIntrinsicLowering. This broke ObjCARCContract's retainRV-to-claimRV optimization because ObjCARCContract identifies ARC calls via intrinsics, not their lowered counterparts. This patch restores the pre-74e4694 ordering by moving ObjCARCContract to addISelPasses. The IntrinsicInst.cpp change looks extraneous but is required here: ObjCARCContract may now rewrite the bundle operand from retainRV to claimRV. When PreISelIntrinsicLowering then encounters this new intrinsic use, lowerObjCCall asserts mayLowerToFunctionCall. Assisted-by: claude rdar://137997453	2026-03-27 15:37:47 +00:00
Matt Arsenault	dba3de54a2	AMDGPU: Allow poison vector elts in fract pattern (#188991 )	2026-03-27 13:59:28 +00:00
Matt Arsenault	fc2dac83ed	AMDGPU: Fold frame indexes into disjoint s_or_b32 (#102345 ) Some pointer adds get turned into ors, and sometimes and is performed on pointers for masking.	2026-03-27 13:13:48 +01:00
Osama Abdelkader	0959a2a4bd	Enable generic overlapping optimization for memmove (#177885 ) Fixes: #165948	2026-03-27 07:22:05 +00:00
Anshil Gandhi	3833f03054	[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn_perm intrinsic (#187798 ) Add uniform and divergent register bank legalization rules for the amdgcn_perm intrinsic (v_perm_b32). Since this is a VALU-only instruction, the uniform case maps the destination to UniInVgprB32 and all source operands to VgprB32.	2026-03-27 00:03:32 +00:00
Anshil Gandhi	966d96942a	[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn_permlane64 (#187840 ) Add register bank legalization rules for the amdgcn_permlane64 intrinsic in the new RegBankLegalize framework. After GISel legalization, permlane64 always operates on S32 — sub-32-bit types are anyext'd to S32 and types wider than 32 bits are split into S32 parts by legalizeLaneOp. Add rules for B32 type. Also enable -new-reg-bank-select in the permlane64 lit test and update affected check lines.	2026-03-26 23:43:41 +00:00
vangthao95	b9b87dd796	AMDGPU/GlobalISel: RegBankLegalize rules for buffer atomics (#187550 ) Add RegBankLegalize rules for the buffer atomics and/xor/or/inc/dec.	2026-03-26 16:28:37 -07:00
Matt Arsenault	67ea4de3c6	AMDGPU: Regenerate test checks (#188862 )	2026-03-26 22:48:52 +00:00
vangthao95	29886a1494	AMDGPU/GlobalISel: RegBankLegalize rules for ds_permute (#188266 )	2026-03-26 15:24:50 -07:00
Changpeng Fang	df71894094	[AMDGPU] Do not overlap dst with srcs for v_cvt_scalef32_2xpk16_fp6/bf6_f32 (#188809 ) v_cvt_scalef32_2xpk16_fp6_f32 and v_cvt_scalef32_2xpk16_bf6_f32 are multipass instructions, so the destination operand must not overlap with any of the source operands. In this work, we apply Constraints = "@earlyclobber $vdst" to these two instructions. Fixes: LCCOMPILER-561	2026-03-26 14:38:22 -07:00
134ARG	331c1c0b84	[ValueTracking] Refine SIToFP/UIToFP FPClass inference with KnownBits (#187185 ) This patch propagates the KnownBits of the source integer to improve floating-point class inference for sitofp and uitofp instructions. Specifically, 1. The result is never -0.0. 2. The result is not +0.0 if the source integer is known non-zero. 3. The result is not negative if the source integer is known non-negative (or for uitofp). 4. The result is not Infinity if the largest possible integer magnitude fits within the target FP type's exponent limits. alive2 results for added testcases: testcase 1: https://alive2.llvm.org/ce/z/eM34LB testcase 2: https://alive2.llvm.org/ce/z/ext7XF testcase 3: https://alive2.llvm.org/ce/z/g8yb6q testcase 4: https://alive2.llvm.org/ce/z/cyFYRy testcase 5: https://alive2.llvm.org/ce/z/LePFrm alive2 for updated testcase in binop-itofp: updated 1: https://alive2.llvm.org/ce/z/KPQ5bZ updated 2: https://alive2.llvm.org/ce/z/bGf43t updated 3: https://alive2.llvm.org/ce/z/YKnCwU updated 4: https://alive2.llvm.org/ce/z/mqKaq- updated 5: https://alive2.llvm.org/ce/z/jYSAB5 Fix #186952	2026-03-26 18:14:58 +01:00
Syadus Sefat	5f5f330ee4	[AMDGPU][GlobalIsel] Add register bank legalization rules for amdgcn_interp_inreg (#187248 ) This patch adds register bank legalization rules for amdgcn_interp_inreg operations in the AMDGPU GlobalISel pipeline.	2026-03-26 11:55:17 -05:00
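
The `is_finite` commit listed above (73cddef788) swaps a mask-and-compare sequence for shorter ones on x86-64 and AArch64. The sketch below is only an illustration of why the three forms agree for 32-bit floats; it models the assembly from that commit message with plain integer arithmetic, and the function names are hypothetical, not taken from LLVM.

```python
import math
import struct

INF_BITS = 0x7F800000  # f32 +inf: exponent all ones, mantissa zero


def is_finite_mask(bits: int) -> bool:
    """Old form: clear the sign bit, then compare against +inf."""
    return (bits & 0x7FFFFFFF) < INF_BITS


def is_finite_shift(bits: int) -> bool:
    """New x86-64 form: bits + bits pushes the sign bit out (mod 2**32),
    then an unsigned compare against 0xFF000000 (-16777216 as i32)."""
    return ((bits + bits) & 0xFFFFFFFF) < 0xFF000000


def is_finite_exponent(bits: int) -> bool:
    """New AArch64 form: extract the 8 exponent bits (ubfx #23, #8)
    and check that they are not all ones."""
    return ((bits >> 23) & 0xFF) < 255


def reference(bits: int) -> bool:
    """Reinterpret the 32-bit pattern as an f32 and ask Python."""
    return math.isfinite(struct.unpack("<f", struct.pack("<I", bits))[0])


# Edge cases: zeros, largest finite values, infinities, and NaNs.
SAMPLES = [
    0x00000000, 0x80000000, 0x3F800000, 0x7F7FFFFF, 0xFF7FFFFF,
    0x7F800000, 0xFF800000, 0x7F800001, 0x7FC00000, 0xFFFFFFFF,
]

for bits in SAMPLES:
    expected = reference(bits)
    assert is_finite_mask(bits) == expected
    assert is_finite_shift(bits) == expected
    assert is_finite_exponent(bits) == expected
```

All three predicates reduce to "the exponent field is not all ones": doubling the bit pattern discards the sign bit and leaves the exponent in the top byte, so comparing against 0xFF000000 tests the same condition as extracting the exponent directly.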

10269 Commits