llvm-project

Author	SHA1	Message	Date
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Gang Chen	8c752900dd	[AMDGPU] modify named barrier builtins and intrinsics (#114550 ) Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly bacause they do not need to worry about the hardware ID assignment. Also this approach is more like the other popular GPU programming language. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. Compiler assigns the special LDS offsets for those variables during AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier ID during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.	2024-11-06 10:37:22 -08:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Jun Wang	f6a8eb98b1	[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730 ) Some flat instructions have an saddr operand. When 'null' is provided as saddr, it may have the same encoding as another instruction. For example, the instructions 'global_atomic_add v1, v2, null' and 'global_atomic_add v[1:2], v2, off' have the same encoding. This patch disallows having null as saddr.	2024-09-24 11:08:41 +04:00
Aditi Medhane	60a8b2b1d0	[AMDGPU] Add MachineVerifier check to detect illegal copies from vector register to SGPR (#105494 ) Addition of a check in the MachineVerifier to detect and report illegal vector registers to SGPR copies in the AMDGPU backend, ensuring correct code generation. We can enforce this check only after SIFixSGPRCopies pass. This is half-fix in the pipeline with the help of isSSA MachineFuction property, the check is happening for passes after phi-node-elimination.	2024-09-19 13:57:44 +05:30
Jay Foad	e55d6f5ea2	[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889 ) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls.	2024-09-11 17:16:06 +01:00
Carl Ritson	16cda01d22	[AMDGPU] V_SET_INACTIVE optimizations (#98864 ) Optimize V_SET_INACTIVE by allow it to run in WWM. Hence WWM sections are not broken up for inactive lane setting. WWM V_SET_INACTIVE can typically be lower to V_CNDMASK. Some cases require use of exec manipulation V_MOV as previous code. GFX9 sees slight instruction count increase in edge cases due to smaller constant bus. Additionally avoid introducing exec manipulation and V_MOVs where a source of V_SET_INACTIVE is the destination. This is a common pattern as WWM register pre-allocation often assigns the same register.	2024-09-05 14:39:28 +09:00
Piyou Chen	b01c006f73	[TII][RISCV] Add renamable bit to copyPhysReg (#91179 ) The renamable flag is useful during MachineCopyPropagation but renamable flag will be dropped after lowerCopy in some case. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.	2024-08-27 10:08:43 +08:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Carl Ritson	fc6300a5f7	[AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395 ) Prevent operand folding from inlining constants into pseudo scalar transcendental f16 instructions. However still allow literal constants.	2024-08-17 16:52:38 +09:00
Carl Ritson	befd44bcdc	[AMDGPU] Update hasUnwantedEffectsWhenEXECEmpty (#97982 ) Add barriers and s_wait_event to hasUnwantedEffectsWhenEXECEmpty. Add a comment documenting the current expected use of the function.	2024-07-17 11:30:44 +09:00
Carl Ritson	7eb1a320cc	[AMDGPU] Update EXECZ retention in SIPreEmitPeephole for GFX10/12 (#97676 ) The check to maintain EXECZ branches only checks S_WAITCNT. Add handling for new waitcnt instructions in GFX10 and GFX12.	2024-07-09 14:44:31 +09:00
Kazu Hirata	acd7a688fc	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97778 )	2024-07-06 16:48:32 +09:00
David Stuttard	5fb1e2825f	[AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595 ) Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures).	2024-05-01 11:37:13 +01:00
Emma Pilkington	a04714701f	[AMDGPU] Add a trap lowering workaround for gfx11 (#85854 ) On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. Fixes: SWDEV-438421	2024-04-24 09:43:54 -04:00
Changpeng Fang	ab76052fa9	AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721 )	2024-03-19 09:58:09 -07:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes, a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and call the corresponding version.	2024-03-08 14:49:52 -05:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on it's own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )	2024-03-04 08:40:42 -05:00
Jay Foad	53f89a0bb7	[AMDGPU] Remove AtomicNoRet class and getAtomicNoRetOp table (#83593 )	2024-03-01 17:18:55 +00:00
Petar Avramovic	3e35ba53e2	AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996 ) Insert waitcnts for loads and atomics before stores with system scope. Scope is field in instruction encoding and corresponds to desired coherence level in cache hierarchy. Intrinsic stores can set scope in cache policy operand. If volatile keyword is used on generic stores memory legalizer will set scope to system. Generic stores, by default, get lowest scope level. Waitcnts are not required if it is guaranteed that memory is cached. For example vulkan shaders can guarantee this. TODO: implement flag for frontends to give us a hint not to insert waits. Expecting vulkan flag to be implemented as vulkan:private MMRA.	2024-02-28 16:18:04 +01:00
Corbin Robeck	2d9f350449	[AMDGPU] Consolidate SGPRSpill and VGPRSpill into single Spill bit (#81901 ) Follow on to #81525 in the series of consolidating bits in TSFlags. Merge SGPRSpill and VGPRSpill into single Spill bit Modify isSGPRSpill and isVGPRSpill helper functions to differentiate VGPR and SGPR spills: Spill+SALU=SGPR Spill Spill+VALU=VGPR Spill The only exception here is SGPR spills to VGPRs which require an explicit instruction check.	2024-02-16 13:32:59 -05:00
Konstantin Zhuravlyov	fcef407aa2	AMDGPU/NFC: Remove some bits from TSFlags (#81525 ) - AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags - Moved to helper function in SIInstInfo - AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags - This flag did not make sense / remnants of something else I think	2024-02-12 16:43:48 -05:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339 )	2024-02-01 17:52:35 -08:00
Shengchen Kan	550f0eb2ce	[NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86	2024-01-26 20:50:58 +08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Mirko Brkušanin	1d286ad59b	[AMDGPU] Add mark last scratch load pass (#75512 )	2024-01-18 09:36:44 +01:00
Jay Foad	9d8e53818d	[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933 ) This avoids listing all soft waitcnt opcodes in two places (getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for helpers isWaitcnt and isWaitcntVsCnt.	2024-01-12 17:12:09 +00:00
Jay Foad	daa4728dee	[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825 )	2024-01-08 19:13:38 +00:00
Acim Maravic	48f36c6e74	[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158 ) Update DAG ISel to support 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B664 --------- Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2023-12-25 11:55:20 +01:00
Mirko Brkušanin	07a6d73664	[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493 )	2023-12-15 15:01:40 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Stanislav Mekhanoshin	c6ecbcb48b	[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245 ) BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the vmcnt. This is VMEM load and DS store, so it shall use vmcnt.	2023-12-13 10:49:36 -08:00
Jay Foad	220e095a2c	[AMDGPU] Remove unused function splitScalar64BitAddSub	2023-12-12 15:01:57 +00:00
Alex Bradbury	b717365216	[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps (#73778 ) These are picked up from getMemOperandsWithOffsetWidth but weren't then being passed through to shouldClusterMemOps, which forces backends to collect the information again if they want to use the kind of heuristics typically used for the similar shouldScheduleLoadsNear function (e.g. checking the offset is within 1 cache line). This patch just adds the parameters, but doesn't attempt to use them. There is potential to use them in the current PPC and AArch64 shouldClusterMemOps implementation, and I intend to use the offset in the heuristic for RISC-V. I've left these for future patches in the interest of being as incremental as possible. As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.	2023-12-06 15:30:48 +00:00
Mirko Brkušanin	f5868cb6a6	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062 )	2023-12-04 13:04:42 +01:00
Alex Bradbury	6cf3566850	[NFC][MachineScheduler] Rename NumLoads parameter of shouldClusterMemOps to ClusterSize (#73757 ) As the same hook is called for both load and store clustering, NumLoads is a misleading name. Use ClusterSize instead.	2023-11-29 09:47:03 +00:00
Christudasan Devadasan	ce7fd498ed	[AMDGPU] RA inserted scalar instructions can be at the BB top (#72140 ) We adjust the insertion point at the BB top for spills/copies during RA to ensure they are placed after the exec restore instructions required for the divergent control flow execution. This is, however, required only for the vector operations. The insertions for scalar registers can still go to the BB top.	2023-11-16 10:30:03 +05:30
Diana	1fa58c7790	[AMDGPU] Callee saves for amdgpu_cs_chain[_preserve] (#71526 ) Teach prolog epilog insertion how to handle functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions. For amdgpu_cs_chain functions, we only need to preserve the inactive lanes of VGPRs above v8, and only in the presence of calls via @llvm.amdgcn.cs.chain. For amdgpu_cs_chain_preserve functions, we will also need to preserve the active lanes for registers above the last argument VGPR. AFAICT there's no direct way to find out what the last argument VGPR is, so instead the patch uses the fact that chain calls from amdgpu_cs_chain_preserve functions can't use more VGPRs than the caller's VGPR arguments. In other words, it removes the operands of SI_CS_CHAIN_TC instructions from the list of callee saved registers. For both calling conventions, registers v0-v7 never need to be saved and restored, so we should never add them as WWM spills. Differential Revision: https://reviews.llvm.org/D156412	2023-11-08 08:28:15 +01:00
Christudasan Devadasan	a0eb6b88f9	[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924 ) The insertion point determined by RA while attempting spills and liverange split at the beginning of a block goes wrong at times, and the newly inserted vector instructions are placed before the exec-mask restore instruction which is wrong. It occurs mainly due to the dependency on isBasicBlockPrologue that doesn't account early inserted instructions (spills and splits) during RA and causes the block prolog break. A better approach for deciding the insertion point should be worked out. For now, improving the helper function to consider all possible early insertions. This patch includes the spill instructions. The copies associated with liverange split should also be included in the block prolog.	2023-10-27 19:10:18 +05:30
Jay Foad	3c58e53041	[AMDGPU] Use const reference in SIInstrInfo::buildExtractSubReg. NFC.	2023-10-26 15:42:24 +01:00
Piotr Sobczak	ba3d6e0499	[AMDGPU] Rematerialize scalar loads (#68778 ) Extend the list of instructions that can be rematerialized in SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar loads. Try shrinking instructions to remat only the part needed for current context. Add SIInstrInfo::reMaterialize target hook, and handle shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of concept.	2023-10-26 11:34:33 +02:00
Petar Avramovic	2fa7d652d0	AMDGPU: Fix temporal divergence introduced by machine-sink (#67456 ) Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.	2023-10-06 15:00:08 +02:00
Yashwant Singh	7ac532efc8	[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091 ) Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754	2023-09-29 11:15:01 +05:30
Christudasan Devadasan	81827f8cfb	[AMDGPU] Support wwm-reg AV spill pseudos The wwm register spill pseudos are currently defined for VGPR_32 regclass. It causes a verifier error for gfx908 or above as the regalloc sometimes restores the values to the vector superclass AV_32. Fixing it by supporting AV wwm-spill pseudos as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155646	2023-08-17 20:04:18 +05:30
Kazu Hirata	bcbffe4f33	[AMDGPU] Remove redundant member initialization	2023-08-11 21:44:31 -07:00
Jay Foad	e61ca23289	[AMDGPU] Add and use SIInstrFlags::GWS. NFC. This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099	2023-08-07 12:05:14 +01:00

1 2 3 4 5 ...

323 Commits