llvm-project

Author	SHA1	Message	Date
David Green	2802739dfd	[NFC] Replace ;; with ;	2023-06-11 10:25:24 +01:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
Jessica Del	04317d4da7	[AMDGPU][GISel] Add inverse ballot intrinsic The inverse ballot intrinsic takes in a boolean mask for all lanes and returns the boolean for the current lane. See SPIR-V's `subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]]. This allows decision making via branch and select instructions with a manually manipulated mask. Implemented in GlobalISel and SelectionDAG, since currently both are supported. The SelectionDAG required pseudo instructions to use the custom inserter. The boolean mask needs to be uniform for all lanes. Therefore we expect SGPR input. In case the source is in a VGPR, we insert one or more `v_readfirstlane` instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D146287	2023-04-06 07:46:50 +02:00
Kazu Hirata	7bb6d1b32e	[llvm] Skip getAPIntValue (NFC) ConstantSDNode provides some convenience functions like isZero, getZExtValue, and isMinSignedValue that are named identically to those provided by APInt, so we can "skip" getAPIntValue.	2023-03-22 22:10:25 -07:00
pvanhout	1f1fea6c38	Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 14:38:45 +01:00
pvanhout	0e79106fc9	Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel" This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.	2023-03-14 11:48:58 +01:00
pvanhout	0022b5803f	[DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 11:18:28 +01:00
Petar Avramovic	ded69779be	Fix SGPR + VGPR + offset Scratch offset folding Values in SGPR and VGPR register are treated as unsigned by hardware. When value in 32-bit SGPR or VGPR base can be negative calculate offset using 32-bit add instructions, otherwise use sgpr(unsigned) + vgpr(unsigned) + offset. LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR or VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144957	2023-03-09 10:53:41 +01:00
Petar Avramovic	3ae310d0ae	Fix VGPR + offset Scratch offset folding Values in VGPR register are treated as unsigned by hardware. When value in 32-bit VGPR base can be negative calculate offset using 32-bit add instruction, otherwise use vgpr base(unsigned) + offset. Does not affect case where whole offset comes from VGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144956	2023-03-09 10:52:44 +01:00
Petar Avramovic	5e56d59999	Fix SGPR + offset Scratch offset folding Values in SGPR register are treated as unsigned by hardware. When value in 32-bit SGPR base can be negative calculate offset using 32-bit add instruction, otherwise use sgpr base(unsigned) + offset. Does not affect case where whole offset comes from SGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144955	2023-03-09 10:52:44 +01:00
Justin Bogner	c083c89744	[AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC The matching for V_FMA_MIX was partially implemented with a C++ matcher (for fmas with 32 bit results and 16 bit inputs) and partially in tablegen (for fmas with 16 bit results). Move the C++ matcher logic into tablegen to make this more consistent and so we can remove the duplication between SDAG and GISel. Differential Revision: https://reviews.llvm.org/D144612	2023-02-23 10:23:34 -08:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Piotr Sobczak	51a49ec52a	[AMDGPU] Clean up MUBUF immediate offset D143174 lifted the artificial type restriction by promoting offset to i32. This patch handles more cases: those involving immediate offset in MUBUF. Differential Revision: https://reviews.llvm.org/D144628	2023-02-23 13:29:53 +01:00
Piotr Sobczak	a3d7b3121c	[AMDGPU][NFC] Add getMaxMUBUFImmOffset Replace magic constant 4095 with the function getMaxMUBUFImmOffset(). Differential Revision: https://reviews.llvm.org/D144623	2023-02-23 11:29:59 +01:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Matt Arsenault	93ec3fa402	AMDGPU: Support atomicrmw uinc_wrap/udec_wrap For now keep the existing intrinsics working.	2023-01-27 22:17:16 -04:00
Jay Foad	245e3dd948	[MC] Do not copy MCInstrDescs. NFC. Avoid copying MCInstrDesc instances because a future patch will change them to find their implicit operands and operand info array based on their own "this" pointer, so it will only work for MCInstrDescs in the TargetInsts table, not for a copy of an MCInstrDesc at a different address. Differential Revision: https://reviews.llvm.org/D142214	2023-01-23 11:55:49 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Kazu Hirata	caa99a01f5	Use llvm::popcount instead of llvm::countPopulation(NFC)	2023-01-22 12:48:51 -08:00
Nick Desaulniers	ad99774a5f	[llvm][PassSupport] don't require passes to be default constructible Quite a few passes are not default constructible. In order to properly support -{start|stop}-{before|after}= for these passes, we would like to continue to use INITIALIZE_PASS, but not necessarily provide a default constructor. Delete the default constructors of classes derived from SelectionDAGISel. Link: https://github.com/llvm/llvm-project/issues/59538 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D140349	2022-12-20 14:07:29 -08:00
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a static mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Craig Topper	c09edce1b3	[SelectionDAG] Give all the target specific subclasses of SelectionDAGISel their own pass ID. Previously we had a shared ID in SelectionDAGISel. AMDGPU has an initializePass function for its subclass of SelectionDAGISel. No other target does. This causes all target specific SelectionDAGISel passes to be known as "amdgpu-isel". I'm not sure what would happen if another target tried to implement an initializePass function too since the ID is already claimed. This patch gives all targets their own ID and passes it down to SelectionDAGISel constructor to MachineFunctionPass's constructor. Unfortunately, I think this causes most targets to lose print-before/after-all support for their SelectionDAGISel pass. And they probably no longer support start/stop-before/after. We can add initializePass functions to fix this as a follow up. NOTE: This was probably also broken if the AMDGPU target isn't compiled in. Step 1 to fixing PR59538. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D140161	2022-12-15 15:48:55 -08:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Justin Bogner	916ae0a060	[AMDGPU] Handle nnan and fast on the call in fpmed3 patterns We were only allowing these med3 patterns if the operands were known to not be NaN, but we should also allow it if the calls to max/min have the `nnan` or `fast` flags. Differential Revision: https://reviews.llvm.org/D139506	2022-12-06 22:57:52 -08:00
Thomas Symalla	851176c7f7	[AMDGPU] Remove AMDGPUISelDAGToDAG::isKnownNeverNaN This change removes the mentioned function, as it only does two checks which are already implemented as part of SelectionDAG::isKnownNeverNaN - which is called there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138938	2022-11-30 07:58:32 +01:00
Mirko Brkusanin	e58b116843	[AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11 Differential Revision: https://reviews.llvm.org/D133012	2022-11-18 18:19:27 +01:00
Jay Foad	3822a01e0b	[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction Differential Revision: https://reviews.llvm.org/D133928	2022-09-15 16:46:14 +01:00
Piotr Sobczak	abd927e5a8	[AMDGPU] Check for num elts in SelectVOP3PMods The rest of the code section assumes there are exactly two elements in the vector (Lo, Hi), so add the check before entering the section. Differential Revision: https://reviews.llvm.org/D133852	2022-09-14 20:00:19 +02:00
Ivan Kosarev	5db8d6fd2b	[AMDGPU][CodeGen] Support (base | offset) SMEM loads. Prevents generation of unnecessary s_or_b32 instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132552	2022-09-05 14:22:06 +01:00
Ivan Kosarev	f33645301e	[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130263	2022-09-05 12:53:05 +01:00
Stanislav Mekhanoshin	813ae2871d	[AMDGPU] Detect uniformness of TID / wavefrontsize A value of 'workitemid / wavefrontsize' or 'workitemid & (wavefrontsize - 1)' is wave uniform. Differential Revision: https://reviews.llvm.org/D132511	2022-08-26 23:26:08 -07:00
Fangrui Song	c17450a094	[AMDGPU] Change DEBUG_TYPE from isel to amdgpu-isel to match all other *ISelDAGToDAG.cpp	2022-07-23 11:32:02 -07:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Ivan Kosarev	9c66c02e2e	[AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets. Saves some add instructions on a couple Rage 2 shaders and is also a prerequisite for a coming-soon change matching (register + immediate) offsets. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D129095	2022-07-18 11:18:23 +01:00
Ivan Kosarev	4696a33dfa	[AMDGPU][NFC] Refine matching SMRD offsets. Tell the matcher what we are looking for instead of matching everything and then discarding the result if it doesn't fit. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128171	2022-07-05 14:07:22 +01:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Joe Nash	20d20156f4	[AMDGPU] gfx11 VINTERP intrinsics and ISel support Depends on D127664 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127756	2022-06-17 09:16:59 -04:00
Joe Nash	2d43de13df	[AMDGPU] gfx11 new dot instruction codegen support Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127904	2022-06-16 14:19:34 -04:00
Jay Foad	7b9f620e78	[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug Differential Revision: https://reviews.llvm.org/D127635	2022-06-13 21:00:42 +01:00
Jay Foad	d943c51465	[AMDGPU] Fix GFX11 codegen for V_MAD_U64_U32 and V_MAD_I64_I32 GFX11 uses different pseudos for these because of a new constraint on which operands' registers can overlap. Differential Revision: https://reviews.llvm.org/D127659	2022-06-13 20:59:18 +01:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Abinav Puthan Purayil	f59cb41ba1	[AMDGPU] Select buffer_atomic_cmpswap* in tblgen This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFInstructions.td. This allows us to select the return and no-return variants in tblgen. Differential Revision: https://reviews.llvm.org/D121770	2022-03-17 10:12:32 +05:30
Stanislav Mekhanoshin	c4500de255	[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions Differential Revision: https://reviews.llvm.org/D121634	2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Jay Foad	d7e03df719	[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 Select SelectionDAG ops smul_lohi/umul_lohi to v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0. v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions so it is better to use one instruction than two. Further improvements are possible to make better use of the addend operand, but this is already a strict improvement over what we have now. Differential Revision: https://reviews.llvm.org/D113986	2021-11-24 11:25:02 +00:00
Abinav Puthan Purayil	078da26b1c	[AMDGPU] Check for unneeded shift mask in shift PatFrags. The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in the shift PatFrag predicates. Differential Revision: https://reviews.llvm.org/D113448	2021-11-24 10:53:12 +05:30
alex-t	0a3d755ee9	[AMDGPU] Enable divergence-driven BFE selection Detailed description: This change enables the bit field extract patterns selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node divergence. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110950	2021-11-03 23:26:59 +03:00

342 Commits