llvm-project

Author	SHA1	Message	Date
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885)	2023-09-21 12:22:55 +02:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Matt Arsenault	16bc07ac91	AMDGPU: Select f64 fmul by negative power of 2 to ldexp Select fmul x, -K -> ldexp(-x, log2(fabsK)) Select fmul fabs(x), -K -> ldexp(-|x|, log2(fabsK)) https://reviews.llvm.org/D158173	2023-08-23 20:36:01 -04:00
Matt Arsenault	1030483561	AMDGPU/GlobalISel: Handle stacksave/stackrestore https://reviews.llvm.org/D156670	2023-08-11 10:25:01 -04:00
Matt Arsenault	29fff3e2ab	AMDGPU: Try to select fmul by power of 2 to ldexp For the f64 case, this gives us a cheaper to materialize 32-bit constant. It's less obviously a win for f32 and f16. It forces us to use a VOP3 encoding so it's a neutral code size change. GlobalISel cases don't work because of the constant-is-copy-to-vgpr problem. https://reviews.llvm.org/D157111	2023-08-11 07:57:55 -04:00
Matt Arsenault	4f851361e4	AMDGPU: Remove extra parentheses	2023-08-04 17:40:54 -04:00
Jay Foad	c2093b8504	[AMDGPU] Add target features for GDS and GWS GFX9 subtargets from GFX90A onwards lack GDS but still have GWS. Differential Revision: https://reviews.llvm.org/D156713	2023-08-02 09:02:07 +01:00
Matt Arsenault	0aa439d502	AMDGPU/GlobalISel: Use SGPR results for G_AMDGPU_WAVE_ADDRESS	2023-07-31 19:16:11 -04:00
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Matt Arsenault	3240ae7034	AMDGPU/GlobalISel: Set dead on scc on manually selected instructions In SelectionDAG InstrEmitter automatically puts dead flags on unused physreg defs everywhere. The generated selectors should also set dead on physreg defs that were not used in the pattern.	2023-07-28 14:14:06 -04:00
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
Sameer Sahasrabuddhe	d0f7850b01	Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. The changes did not cover all occurrences of the deleted function MachineInstr::getIntrinsicID().	2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe	baa3386edb	[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556	2023-07-27 10:00:45 +05:30
Matt Arsenault	fb54afd1b7	AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers This isn't always folded to fneg for a freestanding fsub depending on the denormal mode. When matching source modifiers, we're implicitly canonicalizing the input so we can fold it here. Doesn't bother handling the VOP3P case since it's only relevant with DAZ, which nobody really uses with f16. For f64, tests show an existing bug where DAGCombiner tries to respect the denormal mode for fsub -0, x, but not after it's lowered to fadd -0, (fneg x). Either the fold is wrong or we shouldn't restrict the fsub case based on the denormal mode. https://reviews.llvm.org/D155652	2023-07-20 19:29:40 -04:00
pvanhout	07c5920487	Reland "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64" This time without the extra `->dump()` A recent addition to the device libs, `__ockl_dm_trim`, caused a series of failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function. The quick fix for this is to support codegen for this rare case. A proper long-term fix for this type of issue is still being discussed. Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155050	2023-07-13 15:58:48 +02:00
pvanhout	aec971adec	Revert "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64" This reverts commit cfa2d0a3aa0beb5422107dc9943cb0eae6d93896.	2023-07-13 15:52:27 +02:00
pvanhout	cfa2d0a3aa	[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64 A recent addition to the device libs, `__ockl_dm_trim`, caused a series of failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function. The quick fix for this is to support codegen for this rare case. A proper long-term fix for this type of issue is still being discussed. Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155050	2023-07-13 15:20:58 +02:00
pvanhout	3c30179e98	[GlobalISel] Rename KnownBits field of InstructionSelector `KnownBits` is also a type name. Having a field with this name prevents derived classes from using the `KnownBits` type unless they use `struct KnownBits`. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D155082	2023-07-12 15:28:11 +02:00
pvanhout	8444038d16	[AMDGPU] Use GlobalISel MatchTable Combiner Backend Use the new matchtable-based combiner backend for all AMDGPU combiners. This is a drop-in change from the user's perspective; there are no test changes, and the new combiner behaves exactly like the old one. Depends on D153757 NOTE: This would land iff D153757 (RFC) lands too. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153758	2023-07-11 11:27:13 +02:00
pvanhout	1fe7d9c799	[GlobalISel] Generalize `InstructionSelector` Match Tables Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner. Some notes: - Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now. - `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific. - Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153755	2023-07-11 09:42:30 +02:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Kazu Hirata	aa144fbeaf	[AMDGPU] Fix warnings This patch fixes warnings like: llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h:711: warning: enumerated and non-enumerated type in conditional expression	2023-05-19 10:56:23 -07:00
Joe Nash	05d04a0180	[AMDGPU] NFC. Refactor GISel for cmp intrinsics Combine the logic for fcmp and icmp intrinsics and use operand presence instead. Reviewed By: kosarev, foad Differential Revision: https://reviews.llvm.org/D148716	2023-04-19 11:33:47 -04:00
Jay Foad	6b5067a81a	[AMDGPU] Don't assert that image intrinsics are supported Unsupported intrinsics should give a regular "cannot select" error. Differential Revision: https://reviews.llvm.org/D148147	2023-04-16 19:54:55 +01:00
Chen Zheng	a3d5ec51ba	[AMDGPU][Global-ISel] reuse extension related patterns in td file However, the imported rules cannot be used for now because Global ISel selectImpl() seems to have a bug/limitation that creates an illegal COPY from VGPR to SGPR. So currently work around this by not auto-selecting these patterns. Fixes #61468 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147780	2023-04-10 02:11:33 +00:00
Jessica Del	04317d4da7	[AMDGPU][GISel] Add inverse ballot intrinsic The inverse ballot intrinsic takes in a boolean mask for all lanes and returns the boolean for the current lane. See SPIR-V's `subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]]. This allows decision making via branch and select instructions with a manually manipulated mask. Implemented in GlobalISel and SelectionDAG, since currently both are supported. The SelectionDAG required pseudo instructions to use the custom inserter. The boolean mask needs to be uniform for all lanes. Therefore we expect SGPR input. In case the source is in a VGPR, we insert one or more `v_readfirstlane` instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D146287	2023-04-06 07:46:50 +02:00
Jay Foad	c75e266d31	[AMDGPU] Remove two unused ComplexRendererFns These were left over after https://reviews.llvm.org/D98663	2023-03-30 10:44:45 +01:00
Petar Avramovic	ded69779be	Fix SGPR + VGPR + offset Scratch offset folding Values in SGPR and VGPR registers are treated as unsigned by hardware. When the value in a 32-bit SGPR or VGPR base can be negative, calculate the offset using 32-bit add instructions; otherwise use sgpr(unsigned) + vgpr(unsigned) + offset. LoopStrengthReduce.cpp changes offsets to negative, and in some iterations the value in the SGPR or VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144957	2023-03-09 10:53:41 +01:00
Petar Avramovic	3ae310d0ae	Fix VGPR + offset Scratch offset folding Values in VGPR registers are treated as unsigned by hardware. When the value in a 32-bit VGPR base can be negative, calculate the offset using a 32-bit add instruction; otherwise use vgpr base(unsigned) + offset. Does not affect the case where the whole offset comes from a VGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative, and in some iterations the value in the VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144956	2023-03-09 10:52:44 +01:00
Petar Avramovic	5e56d59999	Fix SGPR + offset Scratch offset folding Values in SGPR registers are treated as unsigned by hardware. When the value in a 32-bit SGPR base can be negative, calculate the offset using a 32-bit add instruction; otherwise use sgpr base(unsigned) + offset. Does not affect the case where the whole offset comes from an SGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative, and in some iterations the value in the SGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144955	2023-03-09 10:52:44 +01:00
Justin Bogner	c083c89744	[AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC The matching for V_FMA_MIX was partially implemented with a C++ matcher (for fmas with 32 bit results and 16 bit inputs) and partially in tablegen (for fmas with 16 bit results). Move the C++ matcher logic into tablegen to make this more consistent and so we can remove the duplication between SDAG and GISel. Differential Revision: https://reviews.llvm.org/D144612	2023-02-23 10:23:34 -08:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack register for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Piotr Sobczak	a3d7b3121c	[AMDGPU][NFC] Add getMaxMUBUFImmOffset Replace magic constant 4095 with the function getMaxMUBUFImmOffset(). Differential Revision: https://reviews.llvm.org/D144623	2023-02-23 11:29:59 +01:00
Joe Nash	80a8e6805a	[AMDGPU] Don't set src mods on permlane16 v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src modifiers on any input, but they can set op_sel on src0 or src1 to represent fi or bc when desired. The ISel patterns were setting the src_modifier bits to -1, effectively setting abs and neg as well, whenever it was intended to set op_sel, due to an error in ISel. ISel should now correctly only set the op_sel bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144519	2023-02-22 11:41:52 -05:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Kazu Hirata	cbde2124f1	Use APInt::popcount instead of APInt::countPopulation (NFC) This is for consistency with the C++20-style bit manipulation functions in <bit>.	2023-02-19 11:29:12 -08:00
Mirko Brkusanin	43924cbd29	[AMDGPU][GlobalISel] Fix selection of image sample g16 instructions Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are separate instructions for 16bit gradients. This fixes the condition for selecting G16 opcodes. Also stop adding G16 flag to instructions that do not use gradients for GFX10 onwards.	2023-02-09 16:26:55 +01:00
Matt Arsenault	93ec3fa402	AMDGPU: Support atomicrmw uinc_wrap/udec_wrap For now keep the existing intrinsics working.	2023-01-27 22:17:16 -04:00
Kazu Hirata	22cdc6a126	[llvm] Use llvm::bit_ceil instead of PowerOf2Ceil (NFC) The arguments to PowerOf2Ceil in this patch are all known to be nonzero, so we can safely use llvm::bit_ceil here.	2023-01-25 00:05:33 -08:00
Kazu Hirata	caa99a01f5	Use llvm::popcount instead of llvm::countPopulation (NFC)	2023-01-22 12:48:51 -08:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Justin Bogner	916ae0a060	[AMDGPU] Handle nnan and fast on the call in fpmed3 patterns We were only allowing these med3 patterns if the operands were known to not be NaN, but we should also allow it if the calls to max/min have the `nnan` or `fast` flags. Differential Revision: https://reviews.llvm.org/D139506	2022-12-06 22:57:52 -08:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Kazu Hirata	959c9cc7ac	[AMDGPU] Use std::optional in AMDGPUInstructionSelector.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-25 22:23:09 -08:00
Pierre van Houtryve	9e7febb4f7	[AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics Adds FP CCs opcodes/selection logic, including src mods selection Depends on D136591, D136448 Resolves #58326 (https://github.com/llvm/llvm-project/issues/58326) Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D136592	2022-11-22 14:18:58 +00:00
Pierre van Houtryve	a751676f98	[AMDGPU][GISel] Add llvm.amdgcn.icmp selection Add missing logic to select i16 variants and enable GISel testing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136448	2022-11-22 08:26:50 +00:00
Mirko Brkusanin	e58b116843	[AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11 Differential Revision: https://reviews.llvm.org/D133012	2022-11-18 18:19:27 +01:00
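
The two `ldexp` commits listed above (16bc07ac91 and 29fff3e2ab) rely on the identity `x * -K == ldexp(-x, log2(K))` when `K` is a power of two. The sketch below checks that identity in plain host C++ using `std::ldexp`. It is not LLVM code: the helper names `mulByNegPow2ViaLdexp`, `mulByNegPow2`, and `checkFold` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of LLVM: computes x * -(2^exp) by negating
// the input and scaling with ldexp, mirroring the fold in the commit message.
double mulByNegPow2ViaLdexp(double x, int exp) {
  return std::ldexp(-x, exp);
}

// Reference form: an ordinary multiply by the constant -(2^exp).
double mulByNegPow2(double x, int exp) {
  return x * -std::ldexp(1.0, exp);
}

// Small self-check for values that neither overflow nor underflow, where
// both forms are exact and therefore compare equal.
void checkFold() {
  assert(mulByNegPow2ViaLdexp(3.5, 4) == mulByNegPow2(3.5, 4));
  // The fabs variant from the commit: fmul fabs(x), -K -> ldexp(-|x|, log2(K)).
  assert(mulByNegPow2ViaLdexp(std::fabs(-1.25), 3) ==
         mulByNegPow2(std::fabs(-1.25), 3));
}
```

Calling `checkFold()` exercises both the plain and the `fabs` variant. The sketch says nothing about encoding size or constant materialization, which are the reasons the commit messages give for the transformation.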

396 Commits
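
Commit 04317d4da7 above describes the inverse ballot intrinsic as taking a boolean mask for all lanes and returning the boolean for the current lane. A scalar model of that description might look like the sketch below. The function name `inverseBallot` and the explicit `lane` parameter are assumptions made for illustration; the real intrinsic reads the lane implicitly and is selected by the compiler rather than called like this.

```cpp
#include <cstdint>

// Hypothetical scalar model, not the LLVM intrinsic: given a mask that is
// uniform across lanes, return the boolean that belongs to one lane.
// Assumes lane < 64 (a wave64 mask); a wave32 mask would use the low 32 bits.
constexpr bool inverseBallot(std::uint64_t mask, unsigned lane) {
  return ((mask >> lane) & 1u) != 0;
}

// Compile-time checks: in mask 0b0101, lanes 0 and 2 are set, lane 1 is clear.
static_assert(inverseBallot(0b0101, 0), "lane 0 is set");
static_assert(!inverseBallot(0b0101, 1), "lane 1 is clear");
static_assert(inverseBallot(0b0101, 2), "lane 2 is set");
```

Each lane reads its own bit from the same mask, which is why the commit message requires the mask to be uniform across lanes.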