llvm-project

Author	SHA1	Message	Date
Shoreshen	04aebbfbe2	[AMDGPU] Delete AMDGPU Unify Metadata pass (#153548 ) Fixes #153150	2025-08-14 16:16:32 +08:00
Robert Imschweiler	d21feb5e66	[AMDGPU] Fix crash for inline-asm inputs of type MVT::Other (#153425 )	2025-08-13 17:27:31 +02:00
Petar Avramovic	4d4966d481	AMDGPU/GlobalISel: Add regbanklegalize rules for ptr-add (#153175 )	2025-08-13 15:49:48 +02:00
Diana Picus	420a5de1a4	[AMDGPU] Ignore inactive VGPRs in .vgpr_count (#149052 ) When using the `amdgcn.init.whole.wave` intrinsic, we add dummy VGPR arguments with the purpose of preserving their inactive lanes. The pattern may look something like this: ``` entry: call amdgcn.init.whole.wave branch to shader or tail shader: $vInactive = IMPLICIT_DEF ; Tells regalloc it's safe to use the active lanes actual code... tail: call amdgcn.cs.chain [...], implicit $vInactive ``` We should not report these VGPRs in the `.vgpr_count` metadata. This patch achieves that goal by ignoring meta instructions and calls. This should be safe since if those registers are actually used in any other context, they will be counted there. The same reasoning applies in the general case, so we don't explicitly check for the existence of `init.whole.wave`. This is a reworked version of #133242, which was reverted in #144039 and split into smaller bits.	2025-08-13 10:47:00 +02:00
Shoreshen	db96363c0a	[AMDGPU] Avoid put implicit_def into bundle that break reg's liveness (#142563 ) Cause: 1. `implicit_def` inside bundle does not count for define of reg in machineinst verifier 2. Including `implicit_def` will cause relative reg not define, result in `Bad machine code: Using an undefined physical register` in the machineinst verifier Fixes https://github.com/llvm/llvm-project/issues/139102 --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-08-13 10:41:44 +08:00
Stanislav Mekhanoshin	d0ee82040c	[AMDGPU] Add s_barrier_init\|join\|leave instructions (#153296 )	2025-08-12 15:07:07 -07:00
Adam Yang	8710571aba	[AMDGPU] Fixed llvm-debuginfo-analyzer for AMDGPU. (#145125 ) Constructing Target triple with `ObjectFile::makeTriple` instead of just with `Arch` and leaving the rest unknown. Also creating the subtarget with the `CPU`. AMDGPU needs the full triple and `CPU` to disassemble correctly. To run a full test, also fixed a failure in `SIPreAllocateWWMRegs` with the `$noreg` operand in `DBG_VALUE`. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-12 22:04:52 +00:00
Philip Reames	4d629f9744	[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226 ) In review of bbde6b, I had originally proposed that we support the legacy text format. As review evolved, it became clear this had been a bad idea (too much complexity), but in order to let that patch finally move forward, I approved the change with the variant. This change undoes the variant, and updates all the tests to just use the array form.	2025-08-12 11:23:05 -07:00
Orlando Cazalet-Hyams	54f92c7806	[RemoveDIs][AMDGPU] Replace defunct getAssignmentMarkers call (#153212 ) Not quite NFC as it looks like the original intrinsic-handling code never got updated to use records. This was never caught because that code wasn't tested. I've adjusted an existing test so the behaviour is now covered.	2025-08-12 17:20:38 +01:00
Petar Avramovic	f88be47fbf	AMDGPU/GlobalISel: Switch a few tests to new-reg-bank-select (#153174 )	2025-08-12 15:03:31 +02:00
Jim M. R. Teichgräber	5d54a576fe	[AMDGPU] AMDGPULateCodeGenPrepare Legacy PM: replace `setPreservesAll()` with `setPreservesCFG()` (#148167 ) This PR depends on #148165; the first commit (90f1d0a881a21a8b4f192622d798c290770fda63) belongs to that PR. The changes are distinct, so separate PRs seemed like the best option. I don't have commit access, so I couldn't use user-branches to mark the dependency. As AMDGPULateCodeGenPrepare actually performs changes that invalidate Uniformity Analysis; use `setPreservesCFG()` to mark this, instead of `setPreservesAll()` which wrongly includes preserving Uniformity Analysis. Note that before #148165, this would still have preserved Uniformity Analysis, hence the dependency. In addition, `amdgpu/llc-pipeline.ll` needs to be changed when both changes are in effect, but those changes would make the test fail if the PRs weren't based on one another. Note on why this hasn't caused issues so far: It just so happens that AMDGPULateCodeGenPrepare is always immediately followed by AMDGPUUnifyDivergentExitNodes, which does invalidate most analyses, including Uniformity. And because UnifyDivergentExitNodes only looks at terminators, and LateCGP seemingly does not replace uniform values with divergent values, or divergent values with uniform values, and it only inserts new values that are not looked at by UnifyDivergentExitNodes, this bug remained hidden. --- I ran `git-clang-format` on my changes. I tested them using the `check-llvm` target; no unexpected failures occurred after I made the change to `amdgpu/llc-pipeline.ll`.	2025-08-12 19:40:02 +09:00
Fabian Ritter	e9ece175f9	[AMDGPU][GISel] Only fold flat offsets if they are inbounds (#153001 ) For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, ISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from address computations with the inbounds flag: If the address computation does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Relevant tests are in fold-gep-offset.ll. Analogous to #132353 for SDAG (which is not yet in a mergeable state, its progress is currently blocked by #146076). Fixes SWDEV-516125 for GISel.	2025-08-12 10:14:20 +02:00
Fabian Ritter	53af2e693d	[AMDGPU][GISel] Add inbounds flag to FLAT GISel tests (#153000 ) This is in preparation for a patch that disables folding offsets into FLAT instructions if the corresponding address computation is not inbounds, to avoid miscompilations where this would lead to wrong aperture check results. With the added inbounds flags for GEPs and G_PTR_ADDs affecting FLAT instructions, the outputs for these tests won't change. For SWDEV-516125.	2025-08-12 09:35:19 +02:00
Matt Arsenault	ff53086924	AMDGPU: Add new VA inline asm constraint for AV registers (#152665 ) Add a new constraint corresponding to the AV_* register classes for operands which can allocate AGPRs or VGPRs. This applies to load and stores on gfx90a+, and srcA / srcB for MFMA instructions. The error emitted on unsupported targets isn't ideal, it is produced by the register allocator without a rationale, but it is consistent with the existing errors. I mostly want this for writing allocation tests.	2025-08-12 10:17:28 +09:00
Stanislav Mekhanoshin	ea14834966	[AMDGPU] Per-subtarget DPP instruction classification (#153096 ) This is NFCI at this point.	2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin	b9ecee9d47	[AMDGPU] Fix DPP combining into V_BITOP3_B32 (#153083 )	2025-08-11 15:39:02 -07:00
Matt Arsenault	9a293530d9	AMDGPU: Handle multiple AGPR MFMA rewrites (#147975 ) I have this firing on one of the real examples, need to produce the tests and check a few edge cases	2025-08-11 23:10:35 +09:00
Alexander Richardson	87ad9122e5	[AMDGPULowerBufferFatPointers] Handle ptrtoaddr by extending the offset Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139413	2025-08-09 16:28:12 -07:00
Alexander Richardson	4cdfc0dc0d	[AMDGPU] Baseline test for ptrtoaddr in lower-buffer-fat-pointers We should only be extracting the 32-bit offset in the ptrtoaddr case. Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/143812	2025-08-09 16:26:57 -07:00
Chris Jackson	f9cb95c9b0	[AMDGPU] Add additional test cases to integer src mod test (#152692 ) Adds missing 16-bit test cases to the test that src mods are not applied to integers in instructions with canonicalizing patterns.	2025-08-08 23:14:36 +01:00
Lucas Ramirez	83c308f014	[AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (#149224 ) The `RPTarget`'s way of determining whether VGPRs are beneficial to save and whether the target has been reached w.r.t. VGPR usage currently assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR RC can always be used for the other. Implicitly, this makes the rematerialization stage (only current user of `RPTarget`) follow a different occupancy calculation than the "regular one" that the scheduler uses, one that assumes that ArchVGPR/AGPR usage can be balanced perfectly and at no cost, which is untrue in general. This ultimately yields suboptimal rematerialization decisions that require cross-VGPR-RC copies unnecessarily. This fixes that, making the `RPTarget`'s internal model of occupancy consistent with the regular one. The `CombinedVGPRSavings` flag is removed, and a form of cross-VGPR-RC saving implemented only for unified RFs, which is where it makes the most sense. Only when the amount of free VGPRs in a given VGPR RC (ArchVGPR or AGPR) is lower than the excess VGPR usage in the other VGPR RC does the `RPTarget` consider that a pressure reduction in the former will be beneficial to the latter.	2025-08-08 14:26:04 +02:00
Diana Picus	a910a6a8b5	[AMDGPU] AsmPrinter: Unify arg handling (#151672 ) When computing the number of registers required by entry functions, the `AMDGPUAsmPrinter` needs to take into account both the register usage computed by the `AMDGPUResourceUsageAnalysis` pass, and the number of registers initialized by the hardware. At the moment, the way it computes the latter is different for graphics vs compute, due to differences in the implementation. For kernels, all the information needed is available in the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over the `Function` arguments in the `AMDGPUAsmPrinter`. This pretty much repeats some of the logic from instruction selection. This patch introduces 2 new members to `SIMachineFunctionInfo`, one for SGPRs and one for VGPRs. Both will be computed during instruction selection and then used during `AMDGPUAsmPrinter`, removing the need to refer to the `Function` when printing assembly. This patch is NFC except for the fact that we now add the extra SGPRs (VCC, XNACK etc) to the number of SGPRs computed for graphics entry points. I'm not sure why these weren't included before. It would be nice if someone could confirm if that was just an oversight or if we have some docs somewhere that I haven't managed to find. Only one test is affected (its SGPR usage increases because we now take into account the XNACK registers).	2025-08-08 12:00:37 +02:00
Matt Arsenault	81f3ddf4a2	AMDGPU: Rewrite VGPR MFMAs to AGPR when directly copied to AGPR class (#152480 )	2025-08-08 18:20:21 +09:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
David Stuttard	c7c0229480	Revert "[AMDGPU] SelectionDAG divergence tracking should take into account Target divergency. (#147560 )" (#152548 ) This reverts commit 9293b65a616b8de432a654d046e802540b146372.	2025-08-08 09:05:59 +01:00
Stanislav Mekhanoshin	29cde86ecc	[AMDGPU] Removed extra blank lines from tests. NFC. (#152612 )	2025-08-08 00:53:00 -07:00
Carl Ritson	0bdd312b1d	[AMDGPU] Generate some WQM/WWM tests (NFC) (#152635 ) Update llvm.amdgcn.kill.ll and wqm.mir to be generated. This preparatory work for refactoring of WQM/WWM pass.	2025-08-08 14:10:13 +09:00
Stanislav Mekhanoshin	469863111f	[AMDGPU] Enable CodeGen for v_pk_fma_bf16 (#152578 )	2025-08-07 16:19:59 -07:00
Stanislav Mekhanoshin	dddeb07c2e	[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465 ) Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used as an operand, only one SGPR will be read for both the low and high operations. As a result, the corresponding bits in `op_sel` and `op_sel_hi` must be the same when the operand is an SGPR. Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>	2025-08-07 16:13:34 -07:00
Stanislav Mekhanoshin	82046c7f33	[AMDGPU] Adjust hard clause rules for gfx1250 (#152592 ) Change from GFX12: Relax S_CLAUSE rules to allow all non-flat memory types in the same clause, and all Flat types in the same clause. For VMEM/FLAT, clause types now look like: - Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async - Flat: load, store, atomic	2025-08-07 14:59:31 -07:00
Stanislav Mekhanoshin	abc22f771e	[AMDGPU] Fix buffer addressing mode matching (#152584 ) Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.	2025-08-07 14:23:41 -07:00
Stanislav Mekhanoshin	d09dbdabb9	[AMDGPU] bf16 clamp folding (#152573 )	2025-08-07 12:59:50 -07:00
Chris Jackson	9f102a9004	[AMDGPU] Recognise bitmask operations as srcmods on select (#152119 ) Add to the VOP patterns to recognise when or/xor/and are masking only the most significant bit of i32/v2i32/i64 and replace with the corresponding FP source modifier.	2025-08-07 19:45:09 +01:00
Matt Arsenault	7402cd6ded	AMDGPU: Disable AGPR selection in mfma rewrite test This makes the test actually test the intended rewrite pass. Also add some tests with inline immediates in src2. Switch the target to gfx942 for future test functions.	2025-08-07 16:28:19 +09:00
Matt Arsenault	f44d8d583c	AMDGPU: Add a few missing mfma rewrite tests (#152434 ) Test other splitting situations that appear in greedy. This includes ensuring we have a case that hits a local split and instruction split (most of the tests hit the region split path). Also test a few cases where the final result isn't fully used, resulting in partial copy bundles instead of a simple full copy. Test physreg and virtreg agpr interference with a reassignment candidate. I'm accumulating too many failure cases, and MIR tests are very prone to painful merge conflicts, so I've added a few more tests and extracted new tests from #147975. Closes #149026	2025-08-07 16:14:45 +09:00
Stanislav Mekhanoshin	b296ea9c14	[AMDGPU] s_get_shader_cycles_u64 gfx1250 instruction (#152390 ) It is the same as reading SHADER_CYCLES_LO and SHADER_CYCLES_HI but with a single instruction.	2025-08-06 15:32:28 -07:00
Stanislav Mekhanoshin	c2eddec4ff	[AMDGPU] System scope atomics are emulated over PCIe in gfx1250 (#152369 ) HW will emulate unsupported PCIe atomics via CAS loop, we do not need to expand these anymore.	2025-08-06 13:08:12 -07:00
Stanislav Mekhanoshin	334d0be2d4	[AMDGPU] Support 64-bit LDS atomic fadd on gfx1250 (#152368 )	2025-08-06 13:07:56 -07:00
Changpeng Fang	32161e9de3	[AMDGPU] Do not fold an immediate into instructions with frame indexes (#151263 ) Do not fold an immediate into an instruction that already has a frame index operand. A frame index could possibly turn out to be another immediate. Fixes: SWDEV-536263 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-06 11:47:37 -07:00
Tim Renouf	e99c565cd2	MC,AMDGPU: Don't pad .text with s_code_end if it would otherwise be empty (#147980 ) We don't want that padding in a module that only contains data, not code. Also fix MCSection::hasInstructions() so it works with the asm streamer too.	2025-08-06 13:25:45 +01:00
Diana Picus	14cd133931	Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286 ) Reverts llvm/llvm-project#145859 because it broke a HIP test: ``` [34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane ```	2025-08-06 12:24:52 +02:00
Diana Picus	0461cd3d1d	[AMDGPU] Intrinsic for launching whole wave functions (#145859 ) Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave functions. This will take as its first argument the callee with the amdgpu_gfx_whole_wave calling convention, followed by the call parameters which must match the signature of the callee except for the first function argument (the i1 original EXEC mask, which doesn't need to be passed in). Indirect calls are not allowed. Make direct calls to amdgpu_gfx_whole_wave functions a verifier error. Unspeakable horrors happen around calls from whole wave functions, the plan is to improve the handling of caller/callee-saved registers in a future patch. Tail calls are also handled in a future patch.	2025-08-06 10:25:53 +02:00
Stanislav Mekhanoshin	b8eb61adc9	[AMDGPU] Implement addrspacecast from flat <-> private on gfx1250 (#152218 )	2025-08-05 16:25:23 -07:00
Stanislav Mekhanoshin	34aed0ed56	[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194 )	2025-08-05 15:15:21 -07:00
Simon Pilgrim	9f50224b25	[DAG] Remove Depth=1 hack from isGuaranteedNotToBeUndefOrPoison checks (#152127 ) Now that #146490 removed the assertion in visitFreeze to assert that the node was still isGuaranteedNotToBeUndefOrPoison we no longer need this reduced depth hack (which had to account for the difference in depth of freeze(op()) vs op(freeze())). Helps with some of the minor regressions in #150017	2025-08-05 13:35:04 +01:00
Simon Pilgrim	d561259a08	[DAG] visitFREEZE - replace multiple frozen/unfrozen uses of an SDValue with just the frozen node (#150017 ) Similar to InstCombinerImpl::freezeOtherUses, attempt to ensure that we merge multiple frozen/unfrozen uses of a SDValue. This fixes a number of hasOneUse() problems when trying to push FREEZE nodes through the DAG. Remove SimplifyMultipleUseDemandedBits handling of FREEZE nodes as we now want to keep the common node, and not bypass for some nodes just because of DemandedElts. Fixes #149799	2025-08-05 09:24:09 +01:00
Stanislav Mekhanoshin	a153e83e41	[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036 )	2025-08-04 19:16:34 -07:00
Maksim Shelegov	862fb42b06	[AMDGPU][GlobalISel] Fix G_UNMERGE_VALUES combine (#141812 ) Fixes #139791. When trying to combine two G_UNMERGE_VALUES with a COPY between them, a crash occurs in tryCombineUnmergeValues() because getDefIndex() tries to find the index of the original source register in the def found by getDefIgnoringCopies(). In the case of a COPY in between, this register is not present in the def, only in the COPY.	2025-08-04 17:06:34 -07:00
Stanislav Mekhanoshin	37fe9f6382	[AMDGPU] Add gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 MC support (#152014 ) This adds new VOP3PX2e encoding	2025-08-04 14:20:12 -07:00
Ivan Kosarev	2b20cf7291	[AMDGPU] Fold into uses of splat REG_SEQUENCEs through COPYs. (#145691 )	2025-08-04 16:18:33 +01:00

9133 Commits