llvm-project

Author	SHA1	Message	Date
Craig Topper	4a486e773e	[CodeGen] Use Register/MCRegister::isPhysical. NFC	2025-01-18 23:37:03 -08:00
Tim Gymnich	2db2dc8ab9	[GlobalISel][NFC] Fix LLT Propagation (#119587 ) Retain LLT type information by creating new LLTs from the original LLT instead of only using the original scalar size. This PR prepares for the [LLT FPInfo RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24) where LLTs will carry additional floating point type information in addition to the scalar size.	2024-12-12 09:47:46 -08:00
Jon Chesterfield	4e0ba801ea	Revert "[amdgpu][lds] Simplify error diag path - lds variable names are no longer special" Test case didn't run locally, investigating This reverts commit 7bad469182ff2f6423ea209d5a1e81acca600568.	2024-12-08 12:00:13 +00:00
Jon Chesterfield	7bad469182	[amdgpu][lds] Simplify error diag path - lds variable names are no longer special	2024-12-08 11:26:33 +00:00
Matt Arsenault	15676ec552	AMDGPU: Add support for V_CVT_PK_F16_F32 instruction for gfx950 (#118300 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-12-02 16:04:24 -05:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Gang Chen	8c752900dd	[AMDGPU] modify named barrier builtins and intrinsics (#114550 ) Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly because users do not need to worry about the hardware ID assignment. Also, this approach is more like other popular GPU programming languages. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. The compiler assigns the special LDS offsets for those variables during the AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier ID during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.	2024-11-06 10:37:22 -08:00
Stanislav Mekhanoshin	6d7e51de5e	[AMDGPU] Extend type support for update_dpp intrinsic (#114597 ) We can split 64-bit DPP as a post-RA pseudo if control values are supported, but cannot handle other types.	2024-11-05 13:59:14 -08:00
Krzysztof Drewniak	ea33af63de	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v3 (#114443 ) This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b. It seems I missed a spot when trying to ensure the code in the instruction selection tests were actually legalized MIR.	2024-11-01 11:13:29 -05:00
Stanislav Mekhanoshin	7cd29741fa	[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296 ) The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32. To allow a corresponding builtin to be overloaded the same way as int_amdgcn_mov_dpp we need it to be able to split unsupported values.	2024-10-31 01:15:25 -07:00
Mikhail Goncharov	8a849a2a56	Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v2 (#111708 )" This reverts commit 4b4a0d419c81b8b12a7dbb33dae1f7e9be91a88f. New test fails on buildbots https://lab.llvm.org/buildbot/#/builders/63/builds/2039 https://lab.llvm.org/buildbot/#/builders/127/builds/1055	2024-10-10 13:37:44 +02:00
Krzysztof Drewniak	4b4a0d419c	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v2 (#111708 ) This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test lines to prevent behavior mismatches between debug and release builds The first attempted reapply was #111059 This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.	2024-10-09 17:11:41 -05:00
Shilei Tian	88a239d292	[AMDGPU] Adopt new lowering sequence for `fdiv16` (#109295 ) The current lowering of `fdiv16` can generate an incorrectly rounded result in some cases. The new sequence was provided by the HW team, as shown below written in C++. ``` half fdiv(half a, half b) { float a32 = float(a); float b32 = float(b); float r32 = 1.0f / b32; float q32 = a32 * r32; float e32 = -b32 * q32 + a32; q32 = e32 * r32 + q32; e32 = -b32 * q32 + a32; float tmp = e32 * r32; uint32_t tmp32 = std::bit_cast<uint32_t>(tmp); tmp32 = tmp32 & 0xff800000; tmp = std::bit_cast<float>(tmp32); q32 = tmp + q32; half q16 = half(q32); q16 = div_fixup_f16(q16); return q16; } ``` Fixes SWDEV-477608.	2024-10-08 09:49:20 -04:00
NAKAMURA Takumi	e075dcf7d2	Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" (#111059 )" This reverts commit 98a15c7b0c6ec129d371f0c121dbe9396c4f5609. (llvmorg-20-init-8051-g98a15c7b0c6e)	2024-10-06 10:50:51 +09:00
Krzysztof Drewniak	98a15c7b0c	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" (#111059 ) This reverts commit 650c41aad2eb43c634a05b2b5799a0c13a73b92f. The test failures appear to be from conflicts with other PRs that landed around this time.	2024-10-04 12:33:26 -05:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
NAKAMURA Takumi	650c41aad2	Revert "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" Some builders have been failing tests. ``` Failed Tests (2): LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-global-old-legalization.mir LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir ``` This reverts commit ae5bd2a9f292037c605b2ec0ee31200581bd8701. (llvmorg-20-init-7805-gae5bd2a9f292)	2024-10-03 15:38:34 +09:00
Krzysztof Drewniak	ae5bd2a9f2	[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 ) Certain pointer address spaces were not being correctly handled by the GlobalISel lowering for buffer_load and buffer_store. 1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns defined for them, while p0 did, since those pointer types weren't in the list of types that was iterated to form the patterns. 2. Vectors of pointers need to be bitcast to vectors of the corresponding scalars, since there doesn't seem to be a good way to define the rewrite patterns for buffer_load/store of those types The need to bitcast vectors of pointers was also revealed to affect ordinary `G_LOAD` and `G_STORE` in some cases, so `shouldBitcastLoadStore()` has been fixed to handle it properly.	2024-10-02 13:46:56 -05:00
sstipano	4f951503b9	Reland "[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and G_IMPLICIT_DEF (#101331 )" (#109958 ) S192 type was missing from AllScalarTypes.	2024-09-25 13:02:29 +02:00
NAKAMURA Takumi	4fc08b6cd5	Revert "[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and G_IMPLICIT_DEF (#101331 )" This reverts commit 63b2595846b86b4e4eb9afba5e97dd64e8135c10. (llvmorg-20-init-6782-g63b2595846b8) A few bots have been failing on `inst-select-unmerge-values.mir`	2024-09-25 10:02:49 +09:00
sstipano	63b2595846	[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and G_IMPLICIT_DEF (#101331 ) G_FREEZE was legal for <13 x S32> which caused an infinite loop in the combiner	2024-09-24 07:49:40 +02:00
Jay Foad	e55d6f5ea2	[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889 ) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls.	2024-09-11 17:16:06 +01:00
Stanislav Mekhanoshin	0745219d4a	[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293 )	2024-09-06 11:41:21 -07:00
Changpeng Fang	26b0bef192	AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761 ) Use GCNPat instead of Custom Lowering to select instructions for intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as a predicate to select only when the rounding mode is supported. "as_hw_round_mode : SDNodeXForm" is developed to translate the round modes to the corresponding ones that hardware recognizes.	2024-08-29 11:43:58 -07:00
Jay Foad	564bd20658	[AMDGPU][GlobalISel] Save a copy in one case of addrspacecast (#104789 ) Refactor legalization of addrspacecast local/private -> flat to avoid building a copy in the nonnull case.	2024-08-19 18:22:29 +01:00
Changpeng Fang	16929219b0	AMDGPU: Add tonearest and towardzero roundings for intrinsic llvm.fptrunc.round (#104486 ) This work simplifies and generalizes the instruction definition for intrinsic llvm.fptrunc.round. We no longer name the instruction with the rounding mode. Instead, we introduce an immediate operand for the rounding mode for the pseudo instruction. This immediate will be used to set up the hardware mode register at the time the real instruction is generated. We name the pseudo instruction as FPTRUNC_ROUND_F16_F32 (for f32 -> f16), which is easy to generalize for other types. "round.towardzero" and "round.tonearest" are added for f32 -> f16 truncating, in addition to the existing "round.upward" and "round.downward". Other rounding modes are not supported by hardware at this moment.	2024-08-17 11:22:47 -07:00
Sumanth Gundapaneni	0ee32c4573	[AMDGPU] Implement llvm.lrint intrinsic lowering (#98931 ) This patch enables the target-independent lowering of llvm.lrint via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU.	2024-07-24 23:34:31 +04:00
Jessica Del	6a1b119035	[AMDGPU] Add intrinsics for atomic struct buffer loads (#100140 ) Mark these intrinsics as atomic loads within LLVM to prevent hoisting out of loops in cases where the load is considered invariant. Similar to https://github.com/llvm/llvm-project/pull/97707, but for struct buffer loads.	2024-07-24 11:05:28 +02:00
Sumanth Gundapaneni	fc832d5349	[AMDGPU] Implement llvm.lround intrinsic lowering. (#98970 ) This patch enables the target-independent lowering of llvm.lround via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU. In order to support vector floating point input for llvm.lround, this patch extends the target independent APIs and provides support for scalarizing. pr98950 is needed to let the verifier allow vector floating point types	2024-07-23 20:34:34 +04:00
Jessica Del	ec7f8e1113	[AMDGPU] Add intrinsic for raw atomic buffer loads (#97707 ) Upstream the intrinsics `llvm.amdgcn.raw.atomic.buffer.load` and `llvm.amdgcn.raw.atomic.ptr.buffer.load`. These additional intrinsics mark atomic buffer loads as atomic to LLVM by removing the `IntrReadMem` attribute. Otherwise, it could hoist these intrinsics out of loops in cases where LLVM marks them as invariant. That can cause issues such as infinite loops. Continuation of https://reviews.llvm.org/D138786 with the additional use in the fat buffer lowering, more test cases and the additional ptr versions of these intrinsics. --------- Co-authored-by: rtayl <> Co-authored-by: Jay Foad <jay.foad@amd.com> Co-authored-by: Mariusz Sikora <mariusz.sikora@amd.com>	2024-07-22 18:04:49 +02:00
Carl Ritson	62aa596ba1	[AMDGPU] Add no return image_sample intrinsics and instructions (#97542 ) An appropriately configured image resource descriptor can trigger image_sample instructions to store outputs directly to a linked memory location instead of returning to VGPRs. This is opaque to the backend as instruction encoding is unchanged; however, a mechanism is required to allow frontends to communicate that these instructions do not require destination VGPRs and store to memory. Flagging these as stores means they will not be optimized away.	2024-07-20 17:26:58 +09:00
Matt Arsenault	611212fc9a	AMDGPU/GlobalISel: Legalize atomicrmw fmin/fmax (#97048 ) We only handled the easy LDS case before. Handle the other address spaces with the more complicated legality logic.	2024-07-03 23:30:05 +02:00
Matt Arsenault	28d142a485	AMDGPU/GlobalISel: Make pk f16 atomicrmw fadd legal for gfx908 The subtarget features for these are a bit of a mess; the no return version should probably be implied by the with-return feature.	2024-06-28 11:33:42 +02:00
Matt Arsenault	4477ff6836	AMDGPU: Remove ds_fmin/ds_fmax intrinsics (#96739 ) These have been replaced with atomicrmw.	2024-06-27 15:35:24 +02:00
Vikram Hegde	35f7b60aa6	[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725 ) These are incremental changes over #89217 , with core logic being the same. This patch along with #89217 and #91190 should get us ready to enable 64 bit optimizations in atomic optimizer.	2024-06-26 09:24:09 +05:30
Vikram Hegde	5feb32ba92	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217 ) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com>	2024-06-25 14:35:19 +05:30
Matt Arsenault	70c8b9c24a	AMDGPU: Remove ds atomic fadd intrinsics (#95396 ) These have been replaced with atomicrmw fadd	2024-06-23 10:30:20 +02:00
Matt Arsenault	8520061281	AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590 ) This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.	2024-06-18 18:34:34 +02:00
Matt Arsenault	3b997294d6	AMDGPU: Remove .v2bf16 buffer atomic fadd intrinsics (#95783 ) These are redundant with the unsuffixed versions, and have a name collision with surprising behavior when the base intrinsic is used with v2bf16. The global and flat variants should be removed too, but those are complicated due to using v2i16 in place of the natural v2bf16. Those cases can soon be completely deleted in favor of atomicrmw. The GlobalISel codegen change is broken and substitutes handling as bf16 for handling as f16, but it's a bug that this passed the IRTranslator in the first place.	2024-06-17 21:44:52 +02:00
Matt Arsenault	4cf1a19b7e	Reapply "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394 )" This reverts commit 95b77d90aae10725ea692e120aac083ef1c1297d.	2024-06-17 16:34:35 +02:00
Matt Arsenault	405882db94	AMDGPU: Fix legalization for llvm.amdgcn.struct.buffer.atomic.fadd.v2bf16	2024-06-17 15:20:23 +02:00
Nico Weber	95b77d90aa	Revert "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394 )" This reverts commit 5021e6dd548323e1169be3d466d440009e6d1f8e. Breaks tests, see https://github.com/llvm/llvm-project/pull/95394#issuecomment-2169394503	2024-06-15 12:33:13 -04:00
Christudasan Devadasan	5e9fcb9572	[AMDGPU][GISel] Use datalayout alignment for buffer-load legalization (#95578 ) This makes the legalization of buffer loads match what SelectionDAG does.	2024-06-15 18:19:52 +05:30
Matt Arsenault	5021e6dd54	AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394 ) Unlike the existing fadd cases, choose to ignore the requirement for amdgpu-unsafe-fp-atomics in case of fine-grained memory access. This is to minimize migration pain to the new atomic control metadata. This should not break any users, as the atomic intrinsics are still directly consumed, and clang does not yet produce vector FP atomicrmw.	2024-06-15 09:58:12 +02:00
Matt Arsenault	0a9a5f989f	AMDGPU: Legalize atomicrmw fadd for v2f16/v2bf16 for local memory (#95393 ) Make this legal for gfx940 and gfx12	2024-06-15 09:55:04 +02:00
Mirko Brkušanin	1e6a82b8ef	[AMDGPU] Legalize and select raw/struct_buffer_load with tfe (#93310 )	2024-05-27 14:09:17 +02:00
Leon Clark	e1c06c380c	[AMDGPU] Fix error in #88512 . (#92770 ) Fixes error in GlobalISel CTLZ lowering caused by [#88512](https://github.com/llvm/llvm-project/pull/88512). --------- Co-authored-by: Leon Clark <leoclark@amd.com>	2024-05-20 20:32:53 +01:00
Leon Clark	fb2c6597e3	[AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (#88512 ) Use LSH to lower ctlz_zero_undef instead of subtracting leading zeros for i8 and i16. Related to [77615](https://github.com/llvm/llvm-project/pull/77615). --------- Co-authored-by: Leon Clark <leoclark@amd.com>	2024-05-19 21:45:24 +01:00
Kazu Hirata	c18bcd0a57	[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072 ) (#91138 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 38 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-05 13:43:10 -07:00
Emma Pilkington	a04714701f	[AMDGPU] Add a trap lowering workaround for gfx11 (#85854 ) On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. Fixes: SWDEV-438421	2024-04-24 09:43:54 -04:00
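
Note on 88a239d292 (`fdiv16`): the masking step in that sequence, `tmp32 & 0xff800000`, keeps only the sign and exponent bits of the binary32 correction term, so the value added back to the quotient is a signed power of two. The following is a minimal sketch of the float-only part of the sequence, assuming IEEE-754 binary32 `float`. The helper names `keep_sign_and_exponent` and `refine_quotient_f32` are hypothetical and do not come from the commit. The final conversion to `half` and the `div_fixup_f16` step are left out because they have no portable C++ equivalent here.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (name is an assumption, not from the commit):
// clears the 23 mantissa bits of a binary32 value, keeping sign and exponent.
inline float keep_sign_and_exponent(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // portable stand-in for std::bit_cast
  bits &= 0xff800000u;                  // same mask as in the commit message
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Hypothetical helper: the float-only steps of the sequence quoted in the
// commit message, stopping before the conversion back to half.
inline float refine_quotient_f32(float a32, float b32) {
  float r32 = 1.0f / b32;        // reciprocal of the divisor
  float q32 = a32 * r32;         // initial quotient estimate
  float e32 = -b32 * q32 + a32;  // residual of the estimate
  q32 = e32 * r32 + q32;         // first correction
  e32 = -b32 * q32 + a32;        // residual after correction
  float tmp = keep_sign_and_exponent(e32 * r32);
  q32 = tmp + q32;               // final, masked correction
  return q32;
}
```

For example, `keep_sign_and_exponent(1.5f)` yields `1.0f` and `keep_sign_and_exponent(-6.0f)` yields `-4.0f`.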


712 Commits