llvm-project

Author	SHA1	Message	Date
LU-JOHN	87b1d3537a	[AMDGPU][NFC] Avoid copying MachineOperands (#166293 ) Avoid copying machine operands. Signed-off-by: John Lu <John.Lu@amd.com>	2025-11-04 23:18:40 -06:00
Kazu Hirata	902b0bd04a	[llvm] Remove "const" in the presence of "constexpr" (NFC) (#166109 ) "const" is extraneous in the presence of "constexpr" for simple variables and arrays.	2025-11-02 15:52:44 -08:00
LU-JOHN	7ed2f1b82b	[AMDGPU][NFC] Refactor SCC optimization (#165871 ) Refactor SCC optimization --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-10-31 22:26:28 -05:00
LU-JOHN	9abbec66bf	[AMDGPU] Reland "Remove redundant s_cmp_lg_* sX, 0" (#164201 ) Reland PR https://github.com/llvm/llvm-project/pull/162352. Fix by excluding SI_PC_ADD_REL_OFFSET from instructions that set SCC = DST!=0. Passes check-libc-amdgcn-amd-amdhsa now. Distribution of instructions that allowed a redundant S_CMP to be deleted in check-libc-amdgcn-amd-amdhsa test: ``` S_AND_B32 485 S_AND_B64 47 S_ANDN2_B32 42 S_ANDN2_B64 277492 S_CSELECT_B64 17631 S_LSHL_B32 6 S_OR_B64 11 ``` --------- Signed-off-by: John Lu <John.Lu@amd.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-10-22 08:42:29 -05:00
Nicolai Hähnle	896d546cf3	AMDGPU: Refactor three-address conversion (NFC) (#162558 ) Extract the core of the instruction rewriting into an implementation method, and unify the update of live variables / intervals updates in its caller. This is intended to help make future changes to three-address conversion more robust.	2025-10-20 17:17:51 -07:00
Jan Patrick Lehr	023b1f6a8e	Revert "[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 " (#164116 ) Reverts llvm/llvm-project#162352 Broke our buildbot: https://lab.llvm.org/buildbot/#/builders/10/builds/15674 To reproduce cd llvm-project cmake -S llvm -B thebuild -C offload/cmake/caches/AMDGPULibcBot.cmake -GNinja cd thebuild ninja ninja check-libc-amdgcn-amd-amdhsa	2025-10-18 22:38:14 +02:00
LU-JOHN	8e5f6dd37c	[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 (#162352 ) Remove redundant s_cmp_lg_* sX, 0 if SALU instruction already sets SCC if sX!=0. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-10-18 09:33:47 -05:00
Brox Chen	ac193bc20f	[AMDGPU][True16][CodeGen] S_PACK_XX_B32_B16 lowering for true16 mode (#162389 ) S_PACK_XX_B32_B16 requires special lowering for true16 mode when it's being lowered to VALU in fix-sgpr-copy pass. Added test cases in fix-sgpr-copies-f16-true16.mir	2025-10-17 14:29:37 -04:00
Jay Foad	8c6b499f06	[AMDGPU] Simplify vcc handling in copyPhysReg. NFC. (#163340 )	2025-10-14 15:18:22 +01:00
carlobertolli	8892825917	[AMDGPU] Enable saving SHARED_BASE to VCC (#163244 )	2025-10-13 15:38:28 -05:00
Matt Arsenault	1a5494ca4a	AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272 ) This removes special case processing in TargetInstrInfo::getRegClass to fixup register operands which depending on the subtarget support AGPRs, or require even aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.	2025-10-08 11:19:54 +09:00
Brox Chen	b8127cc8d0	[AMDGPU][True16][CodeGen] fix v_mov_b16_t16 index in folding pass (#161764 ) With true16 mode v_mov_b16_t16 is added as new foldable copy inst, but the src operand is in different index. Use the correct src index for v_mov_b16_t16.	2025-10-03 17:34:42 -04:00
Matt Arsenault	5601c4080a	AMDGPU: Stop trying to constrain register class of post-RA-pseudos (#161792 ) This is trying to constrain the register class of a physical register, which makes no sense.	2025-10-03 21:19:33 +09:00
Matt Arsenault	2a39d8be87	AMDGPU: Remove dead code trying to constrain a physical register (#161790 ) This constrainRegClass check would never pass for a physical register.	2025-10-03 21:19:13 +09:00
Pierre van Houtryve	14fcd81861	[AMDGPU][InsertWaitCnts] Refactor some helper functions, NFC (#161160 ) - Remove one-line wrappers around a simple function call when they're only used once or twice. - Move very generic helpers into SIInstrInfo - Delete unused functions The goal is simply to reduce the noise in SIInsertWaitCnts without hiding functionality. I focused on moving trivial helpers, or helpers with very descriptive/verbose names (so it doesn't hide too much logic away from the pass), and that have some reusability potential. I'm also trying to make the code style more consistent. It doesn't make sense to see a function call `TII->isXXX` then suddenly call a random `isY` method that just wraps around `TII->isY`. The context of this work is that I'm trying to learn how this pass works, and while going through the code I noticed some little things here and there that I thought would be good to fix.	2025-10-01 10:51:00 +02:00
Brox Chen	934f802731	[AMDGPU][True16][CodeGen] true16 isel pattern for fma_mix_f16/bf16 (#159648 ) This patch includes: 1. fma_mix inst takes fp16 type as input, but places the operand in vgpr32. Update selector to insert vgpr32 for true16 mode if necessary. 2. fma_mix inst returns fp16 type as output, but places the vdst in vgpr32. Create a fma_mix_t16 pseudo inst for isel pattern, and lower it to mix_lo/hi in the mc lowering pass. These stop isel from emitting illegal `vgpr32 = COPY vgpr16` and improve code quality	2025-09-24 11:27:26 -04:00
Philip Reames	8b7a76a2ac	[CodeGen] Rename isReallyTriviallyReMaterializable [nfc] .. to isReMaterializableImpl. The "Really" naming has always been awkward, and we're working towards removing the "Trivial" part now, so go ahead and remove both pieces in a single rename. Note that this doesn't change any aspect of the current implementation; we still "mostly" only return instructions which are trivial (meaning no virtual register uses), but some targets do lie about that today.	2025-09-23 11:58:37 -07:00
Jay Foad	b7a848e5ce	[AMDGPU] Skip debug uses in SIInstrInfo::foldImmediate (#160102 )	2025-09-22 15:22:09 +01:00
Akash Dutta	c256966fe2	[AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968 ) This is a cleaned up version of PR #151704. These optimizations are now performed post-RA scheduling.	2025-09-19 09:41:02 -07:00
Matt Arsenault	daed12d00d	AMDGPU: Remove unnecessary AGPR legalize logic (#159491 ) The manual legalizeOperands code only needs to consider cases that require full instruction context to know if the operand is legal. This does not need to handle basic operand register class constraints.	2025-09-19 09:51:46 +09:00
Matt Arsenault	aa8b624518	AMDGPU: Remove unnecessary operand legalization for WMMAs (#159370 ) The operand constraints already express this constraint, and InstrEmitter will respect them.	2025-09-18 09:20:18 +09:00
Matt Arsenault	d57aa484e1	AMDGPU: Constrain regclass when replacing SGPRs with VGPRs (#159369 )	2025-09-18 07:36:28 +09:00
Jay Foad	eeced0d073	[AMDGPU] Use larger immediate values in S_NOP (#158990 ) The S_NOP instruction has an immediate operand which is one less than the number of cycles to delay for. The maximum value that may be encoded in this field was increased in GFX8 and again in GFX12.	2025-09-16 15:51:06 +01:00
Stanislav Mekhanoshin	72aa946762	[AMDGPU] Drop high 32 bits of aperture registers (#158725 ) Fixes: SWDEV-551181	2025-09-16 02:11:39 -07:00
Carl Ritson	fdb06d9792	[AMDGPU] Refactor out common exec mask opcode patterns (NFCI) (#154718 ) Create utility mechanism for finding wave size dependent opcodes used to manipulate exec/lane masks.	2025-09-16 03:22:14 +00:00
Matt Arsenault	1dc4db8f1e	AMDGPU: Relax verifier for agpr/vgpr loads and stores (#158391 )	2025-09-13 16:34:02 +09:00
Matt Arsenault	7289f2cd0c	CodeGen: Remove MachineFunction argument from getRegClass (#158188 ) This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.	2025-09-12 19:22:02 +09:00
Matt Arsenault	3b48c64d08	AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (#158246 ) This is special for the same reason av_mov_b64_imm_pseudo is special.	2025-09-12 18:35:57 +09:00
Matt Arsenault	9e1d656c68	AMDGPU: Remove MIMG special case in adjustAllocatableRegClass (#158184 ) I have no idea why this was here. MIMG atomics use tied operands for the input and output, so AV classes should have always worked. We have poor test coverage for AGPRs with atomics, so add a partial set. Everything seems to work OK, although it seems image cmpswap always uses VGPRs unnecessarily.	2025-09-12 09:02:24 +00:00
Matt Arsenault	5a21128f24	AMDGPU: Relax legal register operand constraint (#157989 ) Find a common subclass instead of directly checking for a subclass relationship. This fixes folding logic for unaligned register defs into aligned use contexts. e.g., a vreg_64 def into an av_64_align2 use should be able to find the common subclass vreg_align2. This avoids regressions in future patches. Checking the subclass was also redundant on the subregister path; getMatchingSuperRegClass is sufficient.	2025-09-12 08:57:47 +09:00
Matt Arsenault	1c325a07f8	AMDGPU: Stop checking allocatable in adjustAllocatableRegClass (#158105 ) This no longer does anything.	2025-09-12 08:56:34 +09:00
Petar Avramovic	41c685975e	AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (#157845 ) Use same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD. Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD should be always divergent.	2025-09-10 17:57:15 +02:00
Stanislav Mekhanoshin	b0ee92be94	[AMDGPU] Restrict scale operands of WMMA to low 256 VGPRs (#157526 ) These cannot accept high registers.	2025-09-08 15:44:51 -07:00
Matt Arsenault	727e9f5ea5	CodeGen: Pass SubtargetInfo to TargetGenInstrInfo constructors (#157337 ) This will make it possible for tablegen to make subtarget dependent decisions without adding new arguments to every target. --------- Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>	2025-09-08 12:12:19 +09:00
Matt Arsenault	884130bf93	AMDGPU: Allow folding multiple uses of some immediates into copies (#154757 ) In some cases this will require an avoidable re-defining of a register, but it works out better most of the time. Also allow folding 64-bit immediates into subregister extracts, unless it would break an inline constant. We could be more aggressive here, but this set of conditions seems to do a reasonable job without introducing too many regressions.	2025-09-06 08:22:09 +09:00
Matt Arsenault	d096b1d48e	AMDGPU: Remove flat special case in getRegClass (#156991 )	2025-09-06 07:42:16 +09:00
Stanislav Mekhanoshin	1f0f3473e6	[AMDGPU] High VGPR lowering on gfx1250 (#156965 )	2025-09-04 16:20:47 -07:00
Pierre van Houtryve	e2bd10cf16	[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418 ) - Add clang built-ins + sema/codegen - Add IR Intrinsic + verifier - Add DAG/GlobalISel codegen for the intrinsics - Add lowering in SIMemoryLegalizer using a MMO flag.	2025-09-04 09:19:25 +00:00
Diana Picus	018dc1b397	[AMDGPU] Tail call support for whole wave functions (#145860 ) Support tail calls to whole wave functions (trivial) and from whole wave functions (slightly more involved because we need a new pseudo for the tail call return, that patches up the EXEC mask). Move the expansion of whole wave function return pseudos (regular and tail call returns) to prolog epilog insertion, since that's where we patch up the EXEC mask.	2025-09-04 10:34:43 +02:00
Matt Arsenault	a23a5b0683	AMDGPU: Remove the DS special case in getRegClass (#156696 ) These instructions should now have proper representation with separate instructions for operands which must be paired.	2025-09-04 15:14:17 +09:00
Matt Arsenault	dc170c7e31	AMDGPU: Special case align requirement for AV_MOV_B64_IMM_PSEUDO This should not require aligned registers. Fixes expensive_checks test failure. I don't see a better way until the new system to specify the alignment per register is done.	2025-09-04 09:55:39 +09:00
Matt Arsenault	dd5eb46690	AMDGPU: Fold 64-bit immediate into copy to AV class (#155615 ) This is in preparation for patches which will introduce more copies to av registers.	2025-09-03 09:29:59 +09:00
Matt Arsenault	d7484684e5	AMDGPU: Refactor isImmOperandLegal (#155607 ) The goal is to expose more variants that can operate without preconstructed MachineInstrs or MachineOperands.	2025-09-03 09:06:18 +09:00
Matt Arsenault	d6a72cb300	AMDGPU: Fix fixme for out of bounds indexing in usesConstantBus check (#155603 ) This loop over all the operands in the MachineInstr will eventually go past the end of the MCInstrDesc's explicit operands. We don't need the instr desc to compute the constant bus usage, just the register and whether it's implicit or not. The check here is slightly conservative. e.g. a random vcc implicit use appended to an instruction will falsely report a constant bus use.	2025-09-02 17:25:08 +00:00
Matt Arsenault	e3e1652d18	AMDGPU: Add version of isImmOperandLegal for MCInstrDesc (#155560 ) This avoids the need for a pre-constructed instruction, at least for the first argument.	2025-09-03 01:18:41 +09:00
Chris Jackson	7d0203b39f	[AMDGPU] Prevent generation of unused SGPR IMPLICIT_DEF assignments (#155241 ) Dead VGPR->SGPR copies were converted to IMPLICIT_DEF assignments that were unused. Prevent these from being created and update the numerous affected tests.	2025-08-27 13:18:18 +01:00
Matt Arsenault	de99aabed6	AMDGPU: Remove unused argument from adjustAllocatableRegClass (#155554 )	2025-08-27 06:00:34 +00:00
Matt Arsenault	05f208ac0b	AMDGPU: Stop checking if registers are reserved in adjustAllocatableRegClass (#155125 ) This function is used to implement TargetInstrInfo::getRegClass and conceptually should not depend on the dynamic state of the function.	2025-08-26 20:09:32 +09:00
Matt Arsenault	db024764c1	AMDGPU: Fix not diagnosing unaligned VGPRs for vsrc operands (#155104 ) This was not checking the alignment requirement for 64-bit operands which accept inline immediates. Not all custom operand types were handled in the switch, so round out with explicit handling of all enum values, and change the default to use the default checks for unhandled cases. Fixes #155095	2025-08-25 17:42:58 +09:00
Matt Arsenault	52ed03db59	AMDGPU: Simplify foldImmediate with register class based checks (#154682 ) Generalize the code over the properties of the mov instruction, rather than maintaining parallel logic to figure out the type of mov to use. I've maintained the behavior with 16-bit physical SGPRs, though I think the behavior here is broken and corrupting any value that happens to be live in the high bits. It just happens there's no way to separately write to those with a real instruction but I don't think we should be trying to make assumptions around that property. This is NFC-ish. It now does a better job with imm pseudos which practically won't reach here. This also will make it easier to support more folds in a future patch. I added a couple of new tests with 16-bit extract of 64-bit sources.	2025-08-23 02:13:50 +00:00

1023 Commits