llvm-project

Author	SHA1	Message	Date
Brox Chen	5d1c596ab4	[AMDGPU][True16][MC] true16 for minimummaximum/max/min/max3/min3 (#124184 ) true16 support for gfx12 instructions including: v_minimummaximum_f16 v_maximumminimum_f16 v_maximum_f16 v_minimum_f16 v_maximum3_f16 v_minimum3_f16	2025-01-27 16:52:59 -05:00
Venkata Ramanaiah Nalamothu	f7d8336a2f	[llvm] Pass MachineInstr flags to storeRegToStackSlot/loadRegFromStackSlot (NFC) (#120622 ) This patch is in preparation to enable setting the MachineInstr::MIFlag flags, i.e. FrameSetup/FrameDestroy, on callee saved register spill/reload instructions in prologue/epilogue. This eventually helps in setting the prologue_end and epilogue_begin markers more accurately. The DWARF Spec in "6.4 Call Frame Information" says: The code that allocates space on the call frame stack and performs the save operation is called the subroutine’s prologue, and the code that performs the restore operation and deallocates the frame is called its epilogue. which means the callee saved register spills and reloads are part of prologue (a.k.a frame setup) and epilogue (a.k.a frame destruction), respectively. And, IIUC, LLVM backend uses FrameSetup/FrameDestroy flags to identify instructions that are part of call frame setup and destruction. In the trunk, while most targets consistently set FrameSetup/FrameDestroy on save/restore call frame information (CFI) instructions of callee saved registers, they do not consistently set those flags on the actual callee saved register spill/reload instructions. I believe this patch provides a clean mechanism to set FrameSetup/FrameDestroy flags on the actual callee saved register spill/reload instructions as needed. And, by having default argument of MachineInstr::NoFlags for Flags, this patch is a NFC. With this patch, the targets have to just pass FrameSetup/FrameDestroy flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters to set those flags on callee saved register spill/reload instructions. 
Also, this patch makes it very easy to set the source line information on callee saved register spill/reload instructions, which is needed by the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin markers more accurately. As per the DwarfDebug.cpp implementation: prologue_end is the first known non-DBG_VALUE and non-FrameSetup location that marks the beginning of the function body; epilogue_begin is the first FrameDestroy location that has been seen in the epilogue basic block. With this patch, the targets have to just do the following to set the source line information on callee saved register spill/reload instructions, without hampering LLVM's efforts to avoid adding source line information on the artificial code generated by the compiler. <Foo>InstrInfo::storeRegToStackSlot() { ... DebugLoc DL = Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I); ... } <Foo>InstrInfo::loadRegFromStackSlot() { ... DebugLoc DL = Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc(); ... } While I understand this patch would break out-of-tree backend builds, I think it is in the right direction. One immediate use case that can benefit from this patch is fixing #120553, which becomes simpler.	2025-01-22 13:36:39 +05:30
Kazu Hirata	ceaaa2b9ae	[AMDGPU] Fix warnings This patch fixes: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2792:14: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare] llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2797:14: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]	2025-01-21 20:24:30 -08:00
Shoreshen	7c58d6363a	[AMDGPU] Add commute for some VOP3 inst (#121326 ) Add commute for some VOP3 instructions, allow commute for both inline constant operands, adjust tests. Fixes #111205	2025-01-22 11:08:26 +07:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781 ) We want special handling for IGLP instructions in the scheduler, but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
Matt Arsenault	f6365a47a1	AMDGPU: Fix assert on physreg MUBUF rsrc operand (#120815 ) The stack case uses a physical register and should not ordinarily reach here, but strange things happen at -O0. The testcase still errors because we do not yet attempt to handle arbitrary dynamically sized allocas. Fixes: SWDEV-503538	2025-01-07 08:11:05 +07:00
Brox Chen	ce831a231a	[AMDGPU][True16][MC] true16 for v_fma_f16 (#119477 ) Support true16 format for v_fma_f16 in MC. Since we are replacing v_fma_f16 with v_fma_f16_t16/v_fma_f16_fake16 post-GFX11, we have to update the CodeGen pattern for v_fma_f16_fake16 to get the CodeGen tests passing. No pattern is modified or created; v_fma_f16 is just replaced with its fake16 format.	2025-01-06 15:02:04 -05:00
Brox Chen	e10b12e656	[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613 ) Support true16 format for v_div_fixup_f16 in MC.	2024-12-18 18:01:13 -05:00
Ruiling, Song	67c55b1ffc	[AMDGPU] Make max dwords of memory cluster configurable (#119342 ) We find it helpful to increase the value for graphics workloads. Make it configurable so we can experiment with a different value.	2024-12-18 14:17:27 +08:00
Matt Arsenault	5e53a8dadb	AMDGPU: Fix verifier assert with out of bounds subregister indexes (#119799 ) The manual check for aligned VGPR classes would assert if a virtual register used an index not supported by the register class.	2024-12-13 11:52:11 +09:00
Matt Arsenault	1944d192bd	AMDGPU: Use isWave[32|64] instead of comparing size value (#117411 )	2024-11-23 09:30:57 -08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Brox Chen	4cc278587f	[AMDGPU][True16][MC] VOPC profile fake16 pseudo update (#113175 ) Update VOPC profile with VOP3 pseudo: 1. On GFX11+, v_cmp_class_f16 has src1 type f16 for literals; however, it is semantically interpreted as an integer. Update the VOPC class f16 profile from operand types f16, i16 to f16, f16. This currently updates it for fake16 format; t16 format will be updated in a following patch. 2. 16-bit V_CMP_CLASS instructions (V_CMP_**_U/I/F16) are named with `t16`, but actually use 32-bit registers. Correct this by updating the pseudo definitions with useRealTrue16/useFakeTrue16 predicates and renaming these `t16` instructions to `fake16`. 3. Update the inst select so that `t16`/`fake16` instructions are selected in the true16/fake16 flow. 4. The MIR test files are impacted by the name change of these 16-bit V_CMP instructions, but there is no functional change to emitted code.	2024-11-22 12:12:13 -05:00
Christudasan Devadasan	2b5b57c5cf	[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834 ) Currently all implicit-def instructions are part of the bb prolog. We should only include the wwm-register's implicit definitions in the BB prolog. Implicit defs of other vector-class registers, when they exist at the bb top, might cause interference when the LR_split copy insertion is pushed downwards. SplitKit is very strict about altering the insertion points and asserts in such instances.	2024-11-12 23:30:57 +05:30
Brox Chen	e8644e3b47	[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436 ) Some old "t16" VOP2 instructions are actually in fake16 format. Correct and update test file	2024-11-05 16:12:49 -05:00
Matt Arsenault	8e61aaa021	AMDGPU: Fix illegal commute with frame index (#114497 ) In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started being treated more like registers, rather than immediates. Update the commute logic to avoid failing the verifier by moving illegal SGPR operands in place of a frame index.	2024-11-01 10:02:29 -07:00
Christudasan Devadasan	3c5cea650d	[AMDGPU]: Add implicit-def to the BB prolog (#112872 ) IMPLICIT_DEF inserted for a wwm-register at the very first block or the predecessor block where it is used for sgpr spilling can appear at a block begin that requires spill-insertion during per-lane VGPR regalloc phase. The presence of the IMPLICIT_DEF currently breaks the BB prolog. Fixes: SWDEV-490717	2024-10-21 13:21:16 +05:30
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Brox Chen	35e937b4de	[AMDGPU][True16][CodeGen] fp conversion in true/fake16 format (#101678 ) The true16 formats of the fp conversion instructions V_CVT_F_F/V_CVT_F_U were previously implemented using the fake16 profile. With the MC support in place, correct and support these instructions in true16/fake16 format in CodeGen.	2024-10-16 12:26:01 -04:00
Changpeng Fang	f6e93b8147	AMDGPU: Minor improvement and cleanup for waterfall loop generation (#111886 ) First, ReadlanePieces should be in the scope of each MachineOperand. It is not correct to declare it in an outer scope without clearing it after use for each MachineOperand. Additionally, we do not need the OrigBB argument for emitLoadScalarOpsFromVGPRLoop, since MachineFunction (the only use) can be obtained from LoopBB (or BodyBB).	2024-10-10 12:13:36 -07:00
Christudasan Devadasan	6636f32615	[AMDGPU] Include WWM register spill into BB Prolog (#111496 ) With #93526 we split the regalloc pipeline further to have a standalone allocation for wwm registers and per-lane VGPRs. Currently, the wwm-spill reloads inserted at the bb-top prevent the isBasicPrologue function from skipping past the exec manipulation instruction during the per-lane VGPR regalloc, which ends up causing incorrect codegen. The wwm-spills inserted during the wwm-regalloc pipeline should also be included in the bb-prolog so that the per-lane VGPR regalloc pipeline can identify the appropriate insertion points for its spills and copies.	2024-10-08 15:13:12 +05:30
Yaxun (Sam) Liu	3b88805ca2	[AMDGPU] Fix SDWA commuting (#106920 ) SDWA insts lack a reverse opcode, which causes them to be treated as commutable with the default reverse opcode, i.e. their own opcode. As a result, SDWA F16 sub A, B and sub B, A are merged by machine CSE. The correct behavior is to merge sub A, B and subrev B, A instead of sub B, A. This issue caused failures in rocFFT tests. Another issue is that src0_sel and src1_sel are not swapped when SDWA insts are commuted. Verified that this fixes the rocFFT test failures.	2024-10-04 15:53:40 -04:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Jay Foad	735a5f67e3	[AMDGPU] When allocating VGPRs, VGPR spills are not part of the prologue (#109439 ) PRs #69924 and #72140 modified SIInstrInfo::isBasicBlockPrologue to skip over EXEC modifications and spills when allocating VGPRs. But treating VGPR spills as part of the prologue can confuse the register allocator as in #109294, so restrict it to SGPR spills, which were inserted during SGPR allocation which is done in an earlier pass. Fixes: #109294 Fixes: SWDEV-485841	2024-09-30 13:24:55 +01:00
Corbin Robeck	661666d43a	[AMDGPU] Move renamedInGFX9 from TableGen to SIInstrInfo helper function/macro to free up a bit slot (#82787 ) Follow on to #81525 and #81901 in the series of consolidating bits in TSFlags. Remove renamedInGFX9 from SIInstrFormats.td and move to helper function/macro in SIInstrInfo. renamedInGFX9 points to V_{add, sub, subrev, addc, subb, subbrev}_U32 and V_{div_fixup_F16, fma_F16, interp_p2_F16, mad_F16, mad_U16, mad_I16}.	2024-09-25 20:38:51 -04:00
Georgi Mirazchiyski	c30fa3cde7	[AMDGPU] Fix has_single_bit assertion for Mask in SIInstrInfo (#109785 ) Convert the `int64_t` Mask to `uint64_t` for `llvm::has_single_bit` to compile.	2024-09-24 15:54:05 +04:00
Georgi Mirazchiyski	6cfe6a6b3e	[NFC][AMDGPU] Assert no bad shift operations will happen (#108416 ) The assumption in the asserts is based on the fact that no SGPR/VGPR register Arg mask in the ISelLowering and Legalizer can equal zero. They are implicitly set to ~0 by default (meaning non-masked) or explicitly to a non-zero value. The `optimizeCompareInstr` case is different from the above described. It requires the mask to be a power-of-two because it's a special-case optimization, hence in this case we still cannot have an invalid shift. This commit also silences static analysis tools wrt potential bad shifts that could result from the output of `countr_zero(Mask)`.	2024-09-24 14:47:57 +04:00
Jun Wang	f6a8eb98b1	[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730 ) Some flat instructions have an saddr operand. When 'null' is provided as saddr, it may have the same encoding as another instruction. For example, the instructions 'global_atomic_add v1, v2, null' and 'global_atomic_add v[1:2], v2, off' have the same encoding. This patch disallows having null as saddr.	2024-09-24 11:08:41 +04:00
Matt Arsenault	8632e8bd64	AMDGPU: Fix implicit vcc def to vcc_lo on wave32 targets (#109514 )	2024-09-23 13:20:21 +04:00
Carl Ritson	d147b6d581	[AMDGPU] Add hazard workarounds to insertIndirectBranch (#109127 ) BranchRelaxation runs after the hazard recognizer, so workarounds for SGPR accesses need to be applied directly inline to the code it generates.	2024-09-22 14:56:11 +09:00
Jay Foad	73b8074e68	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414 )	2024-09-20 13:45:04 +01:00
Nikita Popov	cee0bf9626	[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413 )	2024-09-20 14:35:38 +02:00
Aditi Medhane	60a8b2b1d0	[AMDGPU] Add MachineVerifier check to detect illegal copies from vector register to SGPR (#105494 ) Add a check in the MachineVerifier to detect and report illegal vector-register-to-SGPR copies in the AMDGPU backend, ensuring correct code generation. We can enforce this check only after the SIFixSGPRCopies pass. This is a partial fix in the pipeline: with the help of the isSSA MachineFunction property, the check runs for passes after phi-node-elimination.	2024-09-19 13:57:44 +05:30
Aditi Medhane	5a8d2dd1f9	[AMDGPU] Handle subregisters properly in generic operand legalizer (#108496 ) Fix for the issue found during COPY introduction during legalization of PHI operands for sgpr to vgpr copy when subreg is involved.	2024-09-18 13:14:49 +05:30
Jay Foad	e55d6f5ea2	[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889 ) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls.	2024-09-11 17:16:06 +01:00
Brox Chen	35e27c0ee5	[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510 ) This is a large patch that includes the MC-level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. This patch includes the asm/disasm changes to encode/decode the 16-bit vsrc, vdst and src modifiers for vop and dpp format. This patch is a dependency for many 16-bit instructions, while only three instructions are updated to make it easier to review. There will be another patch to support these three instructions at the CodeGen level; this patch just replaces these two instructions with their fake16 format.	2024-09-11 10:48:11 -04:00
Jay Foad	7a30b9c0f0	[AMDGPU] Make more use of getWaveMaskRegClass. NFC. (#108186 )	2024-09-11 14:55:53 +01:00
Matt Arsenault	a9daad8280	AMDGPU: Update live intervals in convertToThreeAddress (#104610 ) Fixes #98741	2024-09-06 18:18:27 +04:00
Stanislav Mekhanoshin	bd840a4004	[AMDGPU] Add target intrinsic for s_prefetch_data (#107133 )	2024-09-05 15:14:31 -07:00
Carl Ritson	16cda01d22	[AMDGPU] V_SET_INACTIVE optimizations (#98864 ) Optimize V_SET_INACTIVE by allowing it to run in WWM, so WWM sections are not broken up for inactive lane setting. WWM V_SET_INACTIVE can typically be lowered to V_CNDMASK. Some cases still require the exec-manipulation V_MOV sequence used by the previous code. GFX9 sees a slight instruction count increase in edge cases due to its smaller constant bus. Additionally, avoid introducing exec manipulation and V_MOVs where a source of V_SET_INACTIVE is the destination. This is a common pattern, as WWM register pre-allocation often assigns the same register.	2024-09-05 14:39:28 +09:00
Piyou Chen	b01c006f73	[TII][RISCV] Add renamable bit to copyPhysReg (#91179 ) The renamable flag is useful during MachineCopyPropagation, but the renamable flag is dropped after lowerCopy in some cases. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.	2024-08-27 10:08:43 +08:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Carl Ritson	fc6300a5f7	[AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395 ) Prevent operand folding from inlining constants into pseudo scalar transcendental f16 instructions. However still allow literal constants.	2024-08-17 16:52:38 +09:00
Ivan Kosarev	f0fe6c66cb	[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888 ) And declare it to take an MCRegister. Also rename related entities and remove a comment for the function that depending on its purpose is either irrelevant or misleading.	2024-08-14 17:04:15 +01:00
Brox Chen	ae059a1f9f	[AMDGPU][True16][CodeGen] support v_mov_b16 and v_swap_b16 in true16 format (#102198 ) support v_swap_b16 in true16 format. update tableGen pattern and folding for v_mov_b16. --------- Co-authored-by: guochen2 <guochen2@amd.com>	2024-08-08 16:52:59 -04:00
Matt Arsenault	ca409892c5	AMDGPU: Permit more frame index operands in verifier (#101691 ) Treat FI operands more like a register. When it gets materialized, we will typically need to introduce a scavenged register anyway. Add baseline tests for folding frame indexes into add/or.	2024-08-05 21:34:58 +04:00
Carl Ritson	62aa596ba1	[AMDGPU] Add no return image_sample intrinsics and instructions (#97542 ) An appropriately configured image resource descriptor can trigger image_sample instructions to store outputs directly to a linked memory location instead of returning to VGPRs. This is opaque to the backend, as the instruction encoding is unchanged; however, a mechanism is required to allow frontends to communicate that these instructions do not require destination VGPRs and store to memory. Flagging these as stores means they will not be optimized away.	2024-07-20 17:26:58 +09:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Jay Foad	f10a78b7e4	[AMDGPU] clang-tidy: use std::make_unique. NFC.	2024-07-17 07:58:09 +01:00


896 Commits