llvm-project

Author	SHA1	Message	Date
Jay Foad	fb49adb6ea	[AMDGPU] Another test for missing S_WAIT_XCNT (#166154 )	2025-11-05 10:17:52 +00:00
Jan Patrick Lehr	833983918d	Revert "CodeGen: Record MMOs in finalizeBundle" (#166520 ) Reverts llvm/llvm-project#166210 Buildbot failures in the libc on GPU bot: https://lab.llvm.org/buildbot/#/builders/10/builds/16711	2025-11-05 11:11:08 +01:00
Nicolai Hähnle	304d2ff4d9	CodeGen: Record MMOs in finalizeBundle (#166210 ) This allows more accurate alias analysis to apply at the bundle level. This has a bunch of minor effects in post-RA scheduling that look mostly beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic). The pre-existing (and unchanged) test in CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a bundle with MMOs can be parsed successfully. v2: - use cloneMergedMemRefs - add another test to explicitly check the MMO bundling behavior v3: - use poison instead of undef to initialize the global variable in the test	2025-11-05 06:56:19 +00:00
Matt Arsenault	849038cad1	AMDGPU: Do not infer implicit inputs for !nocallback intrinsics (#131759) This isn't really the right check; we want to know that the intrinsic does not perform a true function call to any code (in the module or not). nocallback appears to be the closest thing to this property we have now though. Fixes theoretical miscompiles with intrinsics like statepoint, which hide a call to a real function. Also do the same for inferring no-agpr usage.	2025-11-05 04:53:42 +00:00
Vigneshwar Jayakumar	b5f200129a	[CodeGen] Register-coalescer remat fix subreg liveness (#165662 ) This is a bugfix in rematerialization, where the liveness of the subreg mask was incorrectly updated, causing a crash in the scheduler.	2025-11-04 22:40:40 -06:00
Abhay Kanhere	d998f92a00	[CodeGen] MachineVerifier to check early-clobber constraint (#151421 ) Currently MachineVerifier does not verify the early-clobber operand constraint. The only other machine operand constraint, TiedTo, is already verified.	2025-11-04 18:39:31 -08:00
Nicolai Hähnle	d6fdfe0a27	CodeGen: Record tied virtual register operands in finalizeBundle (#166209 ) This is in preparation of a future AMDGPU change where we are going to create bundles before register allocation and want to rely on the TwoAddressInstructionPass handling those bundles correctly. v2: - simplify the virtual register check and the test	2025-11-05 02:18:39 +00:00
Nicolai Hähnle	3974157929	AMDGPU: Pre-commit a test (#166414 )	2025-11-05 01:21:37 +00:00
Matt Arsenault	2b4ac66297	AMDGPU: Cleanup and modernize limit-coalesce.mir test (#166465 )	2025-11-04 23:57:39 +00:00
Syadus Sefat	ce091da5df	[AMDGPU] Mark WMMA machine instructions as convergent (#165602 ) The WMMA MI(s) are missing the isConvergent flag. This causes incorrect behavior in passes like machine-sink, where WMMA instructions get sunk into divergent branches. This patch fixes the issue by setting the isConvergent flag to 1 in the VOP3PInstructions.td file.	2025-11-04 15:37:27 -06:00
choikwa	8f683c3e4b	[AMDGPU] NFC, delete promote-alloca testcase (#166297 ) previous merge did not delete.	2025-11-04 14:34:54 -05:00
Jay Foad	dbce71382c	[AMDGPU] Skip debug instructions when eliminating S_SET_GPR_IDX_ON/OFF (#160715 )	2025-11-04 12:03:16 +00:00
Jay Foad	f037f41350	[IR] Add new function attribute nocreateundeforpoison (#164809 ) Also add a corresponding intrinsic property that can be used to mark intrinsics that do not introduce poison, for example simple arithmetic intrinsics that propagate poison just like a simple arithmetic instruction. As a smoke test this patch adds the new property to llvm.amdgcn.fmul.legacy.	2025-11-04 12:00:44 +00:00
Jay Foad	99a1fcad5d	[UTC] Update AMDGPU asm regexp for private functions (#166169 ) Since #163011 changed AMDGPU to use ELF mangling, the regexp failed to match private functions because of the inconsistent presence/absence of the .L prefix on the first line of the function e.g.: ``` .Lfoo: ; @foo ```	2025-11-04 11:59:43 +00:00
Robert Imschweiler	c02bdd466a	[AMDGPU] Fix handling of FP in cs.chain functions (#161194 ) In case there is a dynamic alloca / an alloca which is not in the entry block, cs.chain functions do not set up an FP, but are reported to need one. This results in a failed assertion in `SIFrameLowering::emitPrologue()` (Assertion `(!HasFP || FPSaved) && "Needed to save FP but didn't save it anywhere"' failed.) This commit changes `hasFPImpl` so that the need for an SP in a cs.chain function does not directly imply the need for an FP anymore. This LLVM defect was identified via the AMD Fuzzing project.	2025-11-04 10:22:13 +01:00
Robert Imschweiler	a8ea7f4580	Reapply: [AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161 ) (#166195 ) Reapply #152161 with fixed 'changed' flags.	2025-11-03 20:59:48 +01:00
vangthao95	e8765401d4	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FENCE (#165939 )	2025-11-03 09:36:49 -08:00
Robert Imschweiler	af68efc9c4	Revert "[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm" (#166186 ) Reverts llvm/llvm-project#152161 Need to revert to fix changed logic for the expensive checks.	2025-11-03 16:33:20 +00:00
Robert Imschweiler	332f9b5eee	[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161 ) Finishes adding inline-asm callbr support for AMDGPU, started by https://github.com/llvm/llvm-project/pull/149308.	2025-11-03 16:09:12 +01:00
Aaditya	c8187f6539	[AMDGPU] Fix Xcnt handling between blocks (#165201 ) For blocks with multiple predecessors, there may be `SMEM` and `VMEM` events active at the same time. This patch handles these cases.	2025-11-01 16:48:48 +05:30
Matt Arsenault	f4f247f01e	AMDGPU/GlobalISel: Fix vgpr abs tests using SGPR return (#165965 ) Fix the calling convention to use normal functions instead of amdgpu_cs	2025-10-31 21:41:53 -07:00
Matt Arsenault	cf829cc11c	AMDGPU: Add baseline test for #161651 (#165921 )	2025-10-31 21:50:17 +00:00
vangthao95	d1d635083d	[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function (#165797 ) Follow-up patch to address the comments in https://github.com/llvm/llvm-project/pull/165355.	2025-10-31 13:17:44 -07:00
Stanislav Mekhanoshin	be2ae264dd	[AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (#165035 ) Fixes: SWDEV-562450	2025-10-31 12:21:59 -07:00
choikwa	4a5692d6b3	[AMDGPU] NFC, add testcase showing promote-alloca of array of vectors to a large vector (#165824 ) later patch will target series of extractelement/insertelement pairs.	2025-10-31 14:43:35 -04:00
Stanislav Mekhanoshin	0d9c75be2d	[AMDGPU] Reset VGPR MSBs at the end of fallthrough basic block (#164901 ) By convention a basic block shall start with MSBs zero. We also need to know a previous mode in all cases as SWDEV-562450 asks to record the old mode in the high bits of the mode.	2025-10-31 10:58:22 -07:00
vangthao95	2837a4bdd7	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_READCYCLECOUNTER (#165754 )	2025-10-31 09:12:56 -07:00
Abhinav Garg	1057c63b24	[AMDGPU][GlobalISel] Add register bank legalization for G_FADD (#163407 ) This patch adds register bank legalization support for G_FADD opcodes in the AMDGPU GlobalISel pipeline. Added new reg bank type UniInVgprS64. This patch also adds a combine logic for ReadAnyLane + Trunc + AnyExt. --------- Co-authored-by: Abhinav Garg <abhigarg@amd.com>	2025-10-31 16:45:40 +05:30
Changpeng Fang	6b5afdc3ab	[AMDGPU] Support bfloat comparison for ballot intrinsic (#165495 ) We do not have native instructions for direct bfloat comparisons. However, we can expand bfloat to float, and do float comparison instead. TODO: handle bfloat comparison for ballot intrinsic on global isel path. Fixes: SWDEV-563403	2025-10-30 09:44:25 -07:00
Anshil Gandhi	b1d5a2a156	[AMDGPU] Add regbankselect rules for G_ADD/SUB and variants (#159860 ) Add legalization rules for G_ADD, G_UADDO, G_UADDE and their SUB counterparts.	2025-10-30 11:45:02 -04:00
vangthao95	ba5cde79aa	[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 (#165355 ) When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.	2025-10-30 08:19:12 -07:00
Vigneshwar Jayakumar	469702c5d5	[LICM] Sink unused l-invariant loads in preheader. (#157559 ) Unused loop invariant loads were not sunk from the preheader to the exit block, increasing live range. This commit moves the sinkUnusedInvariant logic from indvarsimplify to LICM and also adds functionality to sink an unused load that is not clobbered by the loop body.	2025-10-30 09:23:04 -05:00
Pankaj Dwivedi	4d7093b806	[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819 ) This PR enables AMDGPUUniformIntrinsicCombine pass in the llc pipeline. Also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag to enable/disable the pass. See the PR: https://github.com/llvm/llvm-project/pull/116953	2025-10-30 12:32:32 +05:30
Stanislav Mekhanoshin	5f1813e826	[AMDGPU] Support true16 spill restore with sram-ecc (#165320 )	2025-10-29 12:35:01 -07:00
Pankaj Dwivedi	20532c0aab	[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265 ) There has been an issue (using function analysis inside the module pass in OPM) integrating this pass into the LLC pipeline, which currently lacks NPM support. I tried finding a way to get the per-function analysis, but it seems that in OPM, we don't have that option. So the best approach would be to make it a function pass. Ref: https://github.com/llvm/llvm-project/pull/116953	2025-10-29 11:56:43 +05:30
Harrison Hao	d604ab6288	[AMDGPU] Support image atomic no return instructions (#150742 ) Add support for no-return variants of image atomic operations (e.g. IMAGE_ATOMIC_ADD_NORTN, IMAGE_ATOMIC_CMPSWAP_NORTN). These variants are generated when the return value of the intrinsic is unused, allowing the backend to select no return type instructions.	2025-10-29 10:42:15 +08:00
David Green	d51dcf929e	[GlobalISel] Combine away G_UNMERGE(G_IMPLICITDEF). (#119183 ) This helps clean up some more legalization artefacts during legalization, in a similar way to other operations, and helps some of the DUP cases get through legalization successfully.	2025-10-28 09:57:31 +00:00
Carl Ritson	385c12134a	[AMDGPU] Rework GFX11 VALU Mask Write Hazard (#138663 ) Apply additional counter waits to address VALU writes to SGPRs. Rework expiry detection and apply wait coalescing to mitigate some of the additional waits.	2025-10-28 16:09:28 +09:00
LU-JOHN	7d14733c12	[AMDGPU] Generate s_absdiff_i32 (#164835 ) Generate s_absdiff_i32. Tested in absdiff.ll. Also update s_cmp_0.ll to test that s_absdiff_i32 is foldable with a s_cmp_lg_u32 sX, 0. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-10-27 14:40:56 -05:00
Jeffrey Byrnes	30f2bf7558	[AMDGPU] Use implicit operand to preserve liveness of COPY (#164911 ) When lowering spills / restores, we may end up partially lowering the spill via copies and the remaining portion with loads/stores. In this partial lowering case, the implicit-def operands added to the restore load clobber the preceding copies, telling MachineCopyPropagation to delete them. By also attaching an implicit operand to the load, the COPYs have an artificial use and thus will not be deleted. This is the same strategy taken in https://github.com/llvm/llvm-project/pull/115285 I'm not sure that we need implicit-def operands on any load restore, but I guess it may make sense if it needs to be split into multiple loads and some have been optimized out as containing undef elements. These implicit / implicit-def operands continue to cause correctness issues. A previous / ongoing long term plan to remove them is being addressed via: https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021 https://github.com/llvm/llvm-project/pull/151123 https://github.com/llvm/llvm-project/pull/151124	2025-10-27 10:47:11 -07:00
Gheorghe-Teodor Bercea	bce7f7cc22	[AMDGPU] Precommit test for sinking vector ops PR 162580 (#165050 ) Pre-commit test for PR: https://github.com/llvm/llvm-project/pull/162580	2025-10-27 13:44:44 -04:00
Jay Foad	60f20ea465	[AMDGPU] Add target feature for waits before system scope stores. NFC. (#164993 )	2025-10-27 10:31:37 +00:00
Yunqing Yu	059d90d08f	[Legalizer] Cache extracted element when lowering G_SHUFFLE_VECTOR. (#163893 ) Cache extracted elements in lowerShuffleVector(). For example, when lowering ``` %0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1 %2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N ) ``` Currently, we generate `N` `G_EXTRACT_VECTOR_ELT` for each element in shufflemask. This is undesirable and bloats the code, especially for larger vectors. With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0` and reuse it for all `N` result elements.	2025-10-25 10:26:11 -05:00
Mirko Brkušanin	bdec5bf69c	[AMDGPU][GlobalISel] Combine (or s64, zext(s32)) (#151519 ) If we only deal with one part of a 64-bit value, we can just generate a merge and unmerge, which will be either combined away or selected into copy / mov_b32.	2025-10-24 17:25:00 +02:00
Mirko Brkušanin	fe5f49942e	[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122 ) Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same logic as in SDag's expandFMINIMUM_FMAXIMUM. Update AMDGPU legalization rules: Pre-GFX12 now uses the new lowering method and makes G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.	2025-10-24 14:48:27 +02:00
David Green	a1e59bdc17	[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508 ) I'm not sure if this is the best way forward or not, but we have a lot of issues with forgetting that shuffle_vectors can be scalar again and again. (There is another example in the recently added known-bits code.) As a scalar-dst shuffle vector is just an extract, and a scalar-source shuffle vector is just a build vector, this patch makes scalar shuffle vector illegal and adjusts the irbuilder to create the correct node as required. Most targets do this already through lowering or combines. Making scalar shuffles illegal simplifies gisel as a whole; it just requires that transforms that create shuffles of new sizes account for the scalar shuffle being illegal (mostly IRBuilder and LessElements).	2025-10-24 08:21:35 +01:00
Stanislav Mekhanoshin	ef923f1b28	[AMDGPU] Change patterns for v_[pk_]add_{min|max} (#164881 ) The intermediate result is in fact the add with saturation regardless of the clamp bit.	2025-10-23 15:45:15 -07:00
paperchalice	c2b2a347bf	[AMDGPU][test] Remove unsafe-fp-math uses (NFC) (#164609 ) Post cleanup for #164534.	2025-10-23 01:45:54 +00:00
Matt Arsenault	1d9f9ad531	CodeGen: Fix crash when no libcall is available for stackguard (#164211 ) Not all the paths appear to be implemented for GlobalISel	2025-10-23 10:40:40 +09:00
Stanislav Mekhanoshin	9b5bc98743	[AMDGPU] Add intrinsics for v_[pk]_add_{min|max}_* instructions (#164731 )	2025-10-22 17:46:33 -07:00

9513 Commits