llvm-project

Author	SHA1	Message	Date
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	62584f32eb	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8|bf8} for gfx950 (#117741 ) OPSEL[0] determines low/high 16 bits of src0 to read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:12:18 -05:00
Matt Arsenault	803bd812b1	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_{fp8|bf8}_f32 for gfx950 (#117740 ) OPSEL[3] determines low/high 16 bits of word to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:57:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739 ) OPSEL[1:0] collectively decide which byte to read from src input. Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out of bounds checks will be added in next patch. OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3} Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp|bf}8 Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Gang Chen	8c752900dd	[AMDGPU] modify named barrier builtins and intrinsics (#114550 ) Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly because users do not need to worry about the hardware ID assignment. This approach is also closer to other popular GPU programming languages. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. The compiler assigns special LDS offsets to those variables during the AMDGPULowerModuleLDS pass. Those addresses are converted to hardware barrier IDs during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.	2024-11-06 10:37:22 -08:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822 ) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions): ``` entry: %func_exec = call i1 @llvm.amdgcn.init.whole.wave() br i1 %func_exec, label %func, label %tail func: ; ... stuff that should run with the actual EXEC mask br label %tail tail: ; ... stuff that runs with all the lanes enabled; ; can contain more than one basic block ``` It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
Changpeng Fang	26b0bef192	AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761 ) Use GCNPat instead of Custom Lowering to select instructions for intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as a predicate to select only when the rounding mode is supported. "as_hw_round_mode : SDNodeXForm" is developed to translate the round modes to the corresponding ones that hardware recognizes.	2024-08-29 11:43:58 -07:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Petar Avramovic	269cefbc02	AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (#102130 ) isExtractHiElt should return new source register instead of returning instruction that defines it. Src = MI.getOperand(0).getReg() is not correct when MI(for example G_UNMERGE_VALUES) defines multiple registers. Refactor existing code to work with source registers only.	2024-08-07 12:13:39 +02:00
Matt Arsenault	42d641ef5c	AMDGPU/GlobalISel: Select all constants in tablegen (#100788 ) This regresses the arbitrary address space pointer case. Ideally we could write a pattern that matches a pointer based only on its size, but using iPTR/iPTRAny does not seem to work for this.	2024-07-30 18:31:18 +04:00
Matt Arsenault	b356aa3e2d	AMDGPU/GlobalISel: Partially move constant selection to patterns (#100786 ) This is still relying on the manual code for splitting 64-bit constants, and handling pointers. We were missing some of the tablegen patterns for all immediate types, so this has some side effect DAG path improvements. This also reduces the diff in the 2 selector outputs.	2024-07-30 18:18:16 +04:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Matt Arsenault	2ff22d7485	AMDGPU/GlobalISel: Reorganize select switch cases	2024-06-30 10:28:58 +02:00
Jay Foad	bf536cc7db	[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190 ) Mark both the intrinsic and the selected MachineInstr as having side effects to prevent MachineLICM and MachineCSE from moving/removing them.	2024-06-27 09:27:52 +01:00
Jay Foad	990bed64fb	[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id (#89612 ) This provides access to the special scalar source value SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.	2024-05-22 19:47:59 +01:00
Shilei Tian	9c6a2de24b	[AMDGPU] Clean up functions for checking inline literals (#81282 ) This patch cleans up functions for checking inline literals.	2024-02-15 12:11:51 -05:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Krzysztof Drewniak	88871784fd	[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847 ) In order to ensure the correctness of ptr addrspace(7) lowering, we need a backwards-compatible way to flag buffer intrinsics as volatile that can't be dropped (unlike metadata). To achieve this in a backwards-compatible way, we use bit 31 of the auxiliary immediates of buffer intrinsics as the volatile flag. When this bit is set, the MachineMemOperand for said intrinsic is marked volatile. Existing code will ensure that this results in the appropriate use of flags like glc and dlc. This commit also harmonizes the handling of the auxiliary immediate for atomic intrinsics, which now go through extract_cpol like loads and stores, which masks off the volatile bit.	2024-01-12 11:20:01 -06:00
Mirko Brkušanin	2adbf254a1	[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785 ) This is used to select the source modifier (neg) from the immediate operand. After a follow up commit this will no longer be DOTIU specific. Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>	2024-01-12 10:57:24 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Ruiling, Song	c1511a65d5	[AMDGPU] Folding imm offset in more cases for scratch access (#70634 ) For scratch load/store, our hardware only accepts non-negative values in SGPR/VGPR. Besides the case that we can prove from known bits, we can also prove that the value in `base` will be non-negative: 1.) When the ADD for the address calculation has the NoUnsignedWrap flag. 2.) When the immediate offset is already negative.	2023-11-29 12:46:45 +08:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885 )	2023-09-21 12:22:55 +02:00
Matt Arsenault	1030483561	AMDGPU/GlobalISel: Handle stacksave/stackrestore https://reviews.llvm.org/D156670	2023-08-11 10:25:01 -04:00
Matt Arsenault	29fff3e2ab	AMDGPU: Try to select fmul by power of 2 to ldexp For the f64 case, this gives us a cheaper to materialize 32-bit constant. It's less obviously a win for f32 and f16. It forces us to use a VOP3 encoding so it's a neutral code size change. GlobalISel cases don't work because of the constant-is-copy-to-vgpr problem. https://reviews.llvm.org/D157111	2023-08-11 07:57:55 -04:00
Matt Arsenault	fb54afd1b7	AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers This isn't always folded to fneg for a freestanding fsub depending on the denormal mode. When matching source modifiers, we're implicitly canonicalizing the input so we can fold it here. Doesn't bother handling the VOP3P case since it's only relevant with DAZ, which nobody really uses with f16. For f64, tests show an existing bug where DAGCombiner tries to respect the denormal mode for fsub -0, x, but not after it's lowered to fadd -0, (fneg x). Either the fold is wrong or we shouldn't restrict the fsub case based on the denormal mode. https://reviews.llvm.org/D155652	2023-07-20 19:29:40 -04:00
pvanhout	8444038d16	[AMDGPU] Use GlobalISel MatchTable Combiner Backend Use the new matchtable-based combiner backend for all AMDGPU combiners. This is a drop-in replacement from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one. Depends on D153757 NOTE: This would land iff D153757 (RFC) lands too. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153758	2023-07-11 11:27:13 +02:00
Jessica Del	04317d4da7	[AMDGPU][GISel] Add inverse ballot intrinsic The inverse ballot intrinsic takes in a boolean mask for all lanes and returns the boolean for the current lane. See SPIR-V's `subgroupInverseBallot()` in the GL_KHR_shader_subgroup extension (https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt). This allows decision making via branch and select instructions with a manually manipulated mask. Implemented in GlobalISel and SelectionDAG, since currently both are supported. The SelectionDAG required pseudo instructions to use the custom inserter. The boolean mask needs to be uniform for all lanes. Therefore we expect SGPR input. In case the source is in a VGPR, we insert one or more `v_readfirstlane` instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D146287	2023-04-06 07:46:50 +02:00
Jay Foad	c75e266d31	[AMDGPU] Remove two unused ComplexRendererFns These were left over after https://reviews.llvm.org/D98663	2023-03-30 10:44:45 +01:00
Petar Avramovic	5e56d59999	Fix SGPR + offset Scratch offset folding Values in SGPR registers are treated as unsigned by hardware. When the value in a 32-bit SGPR base can be negative, calculate the offset using a 32-bit add instruction; otherwise use sgpr base (unsigned) + offset. Does not affect the case where the whole offset comes from an SGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative, and in some iterations the value in the SGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144955	2023-03-09 10:52:44 +01:00
Justin Bogner	c083c89744	[AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC The matching for V_FMA_MIX was partially implemented with a C++ matcher (for fmas with 32 bit results and 16 bit inputs) and partially in tablegen (for fmas with 16 bit results). Move the C++ matcher logic into tablegen to make this more consistent and so we can remove the duplication between SDAG and GISel. Differential Revision: https://reviews.llvm.org/D144612	2023-02-23 10:23:34 -08:00
Joe Nash	80a8e6805a	[AMDGPU] Don't set src mods on permlane16 v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src modifiers on any input, but they can set op_sel on src0 or src1 to represent fi or bc when desired. The ISel patterns were setting the src_modifier bits to -1, effectively setting abs and neg as well, whenever it was intended to set op_sel, due to an error in ISel. ISel should now correctly only set the op_sel bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144519	2023-02-22 11:41:52 -05:00
Justin Bogner	916ae0a060	[AMDGPU] Handle nnan and fast on the call in fpmed3 patterns We were only allowing these med3 patterns if the operands were known to not be NaN, but we should also allow it if the calls to max/min have the `nnan` or `fast` flags. Differential Revision: https://reviews.llvm.org/D139506	2022-12-06 22:57:52 -08:00
Pierre van Houtryve	9e7febb4f7	[AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics Adds FP CCs opcodes/selection logic, including src mods selection Depends on D136591, D136448 Resolves #58326 (https://github.com/llvm/llvm-project/issues/58326) Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D136592	2022-11-22 14:18:58 +00:00
Petar Avramovic	0f3e72e86c	AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection When selectVOP3PMadMixModsImpl fails, it can still create new copy instr via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr will remain dead but will not be automatically removed. InstructionSelect does not check if instructions created during selection are dead. Such dead copy doesn't have register class on dst operand and causes crash. Fix is to build copy when operands are being added to selected instruction. Differential Revision: https://reviews.llvm.org/D138044	2022-11-18 18:02:26 +01:00
Pierre van Houtryve	767999fca8	[AMDGPU][GlobalISel] Support mad/fma_mix selection Adds support for selecting the following instructions using GlobalISel: - v_mad_mix/v_fma_mix - v_mad_mixhi/v_fma_mixhi - v_mad_mixlo/v_fma_mixlo To select those instructions properly, some additional changes were needed which impacted other tests as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134354	2022-11-08 08:02:34 +00:00
Pierre van Houtryve	c93104073c	[AMDGPU] Always lower SHUFFLE_VECTOR Make it illegal, remove InstructionSelector logic for it Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134967	2022-10-04 14:23:17 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns. Add missing patterns or fix predicates. Remove atomicrmw patterns for v2f16 (atomicrmw doesn't support vectors). Patterns now check addrspace of pointer; added patterns for flat intrinsic with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Edit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Jay Foad	3822a01e0b	[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction Differential Revision: https://reviews.llvm.org/D133928	2022-09-15 16:46:14 +01:00
Ivan Kosarev	f33645301e	[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130263	2022-09-05 12:53:05 +01:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Joe Nash	20d20156f4	[AMDGPU] gfx11 VINTERP intrinsics and ISel support Depends on D127664 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127756	2022-06-17 09:16:59 -04:00
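
Note on commit 88871784fd (volatile buffer intrinsics): its message states that bit 31 of the auxiliary immediate carries the volatile flag, and that this bit is masked off before the remaining bits are used. The bit handling can be sketched roughly as below. This is only an illustration in Python, not LLVM code: the names `VOLATILE_BIT`, `is_volatile`, `cache_policy_bits`, and `set_volatile` are hypothetical, and treating the immediate as 32 bits wide is an assumption.

```python
# Illustrative model of the bit layout described in commit 88871784fd.
# All names are hypothetical; none of them are LLVM functions.

VOLATILE_BIT = 1 << 31  # bit 31 of the auxiliary immediate (per the commit message)
IMM_MASK = 0xFFFFFFFF   # assumption: the immediate is treated as 32 bits wide


def is_volatile(aux: int) -> bool:
    """Return True when the volatile flag (bit 31) is set."""
    return (aux & VOLATILE_BIT) != 0


def cache_policy_bits(aux: int) -> int:
    """Return the immediate with the volatile bit masked off."""
    return aux & IMM_MASK & ~VOLATILE_BIT


def set_volatile(aux: int) -> int:
    """Return the immediate with the volatile flag set."""
    return (aux & IMM_MASK) | VOLATILE_BIT
```

Keeping the flag inside the existing immediate, rather than in metadata, matches the commit's stated goal: the flag cannot be silently dropped, and immediates written before the change (bit 31 clear) keep their old meaning.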

190 Commits