llvm-project

Author	SHA1	Message	Date
Matt Arsenault	8e61aaa021	AMDGPU: Fix illegal commute with frame index (#114497 ) In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started being treated more like registers, rather than immediates. Update the commute logic to avoid failing the verifier by moving illegal SGPR operands in place of a frame index.	2024-11-01 10:02:29 -07:00
Shilei Tian	10a1ea9b53	[NFC][AMDGPU] Remove the empty FPM as well as the adaptor to MPM (#114558 )	2024-11-01 12:21:26 -04:00
Krzysztof Drewniak	ea33af63de	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v3 (#114443 ) This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b. It seems I missed a spot when trying to ensure the code in the instruction selection tests were actually legalized MIR.	2024-11-01 11:13:29 -05:00
Jay Foad	550501f21c	[AMDGPU] Simplify GFX12 VBUFFER definitions. NFC. (#114403 ) For GFX12 hasTFE is always true because it does not have the buffer load to LDS instructions.	2024-11-01 10:06:45 +00:00
Shilei Tian	9234ae1bbe	[NFC] clang-format -i llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp	2024-10-31 11:44:15 -04:00
Matt Arsenault	1d0370872f	AMDGPU: Expand flat atomics that may access private memory (#109407 ) If the runtime flat address resolves to a scratch address, 64-bit atomics do not work correctly. Insert a runtime address space check (which is quite likely to be uniform) and select between the non-atomic and real atomic cases. Consider noalias.addrspace metadata and avoid this expansion when possible (we also need to consider it to avoid infinitely expanding after adding the predication code).	2024-10-31 08:08:48 -07:00
Matt Arsenault	12409024d3	AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721 ) Atomic loads are handled differently from the DAG, and have separate opcodes and explicit control over the extensions, like ordinary loads. Add new patterns for these. There's room for cleanup and improvement. d16 cases aren't handled. Fixes #111645	2024-10-31 07:44:52 -07:00
Stanislav Mekhanoshin	7cd29741fa	[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296 ) The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32. To allow a corresponding builtin to be overloaded the same way as int_amdgcn_mov_dpp we need it to be able to split unsupported values.	2024-10-31 01:15:25 -07:00
Changpeng Fang	ca1154d1d4	AMDGPU: Disable pattern matching "x<<32-y>>32-y" to "bfe x, 0, y" (#114279) It is not correct to lower "x<<32-y>>32-y" to "bfe x, 0, y". When y equals 32, the left-hand side is still x (unchanged), but the right-hand side evaluates to 0, so the transformation is not always correct. We may be able to keep the pattern for immediate y while y is within [0, 31]. However, the immediate operands of the sub (32 - y) are easily folded, and "(x << imm) >> imm" will be lowered to "and x, (2^(32-imm))-1" anyway. So no bfe matching is needed.	2024-10-30 11:07:15 -07:00
Jay Foad	6bf4476ffb	[AMDGPU] Fix @llvm.amdgcn.cs.chain with callee not provably uniform (#114200 ) The correct behavior is to insert a readfirstlane. This worked except for an inappropriate assertion in SITargetLowering::LowerCall.	2024-10-30 16:18:29 +00:00
Jay Foad	8ee5e19c87	[AMDGPU] Fix @llvm.amdgcn.cs.chain with SGPR args not provably uniform (#114232 ) The correct behaviour is to insert a readfirstlane. SelectionDAG was already doing this in some cases, but not in the general case for chain calls. GlobalISel was already doing this for return values but not for arguments.	2024-10-30 16:12:37 +00:00
Matt Arsenault	6d9fc1b846	AMDGPU: Fix producing invalid IR on vector typed getelementptr (#114113 ) This did not consider the IR change to allow a scalar base with a vector offset part. Reject any users that are not explicitly handled. In this situation we could handle the vector GEP, but that is a larger change. This just avoids the IR verifier error by rejecting it.	2024-10-29 22:14:24 -07:00
Shilei Tian	9a7519fdb3	Revert "[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177 )" This reverts commit 922a0d3dfe2db7a2ef50e8cef4537fa94a7b95bb.	2024-10-30 00:53:43 -04:00
Shilei Tian	922a0d3dfe	[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177 ) Avoid calling TTI or other stuff unnecessarily	2024-10-30 00:42:44 -04:00
Shilei Tian	3de5dbb111	[AMDGPU][Attributor] Check the validity of a dependent AA before using its value (#114165 ) Even though the Attributor framework will invalidate all its dependent AAs after the current iteration, a dependent AA can still use the worst state of a depending AA if it doesn't check the state of the depending AA in current iteration.	2024-10-29 23:43:45 -04:00
Fangrui Song	facdae62b7	[MCInstPrinter] Make printRegName non-const Similar to printInst. printRegName may change states (e.g. #113834).	2024-10-29 19:14:54 -07:00
Jay Foad	a156362e93	[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544 ) Fixes #54201	2024-10-29 14:59:37 +00:00
Shilei Tian	e268398fa8	[NFC][AMDGPU] Use `!foreach` to replace explicit list of registers (#114005 )	2024-10-29 10:50:06 -04:00
Matt Arsenault	88e23eb2cf	DAG: Fix legalization of vector addrspacecasts (#113964 )	2024-10-29 08:08:50 -05:00
Shilei Tian	4cf128512b	[NFC][AMDGPU] Use C++17 structured bindings as much as possible (#113939 ) This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`. There are five uses of `std::tie` remaining because they can't be replaced with C++17 structured bindings.	2024-10-28 13:55:57 -04:00
Mirko Brkušanin	fa4790e404	[AMDGPU][MC] Fix disassembler for VIMAGE when non-first vaddr is v0 (#113569 ) For disassembler tables we use V1_V4 variants for VIMAGE and then remove unused vaddr fields. V1_V1 variant, which has every vaddr field other than vaddr0 set to 0, was also enabled and caused confusion when decoding cases which used v0 (whose encoded value is 0)	2024-10-28 10:43:18 +01:00
Fabian Ritter	a4fd3dba6e	[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332 ) When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in LowerMemIntrinsics.cpp, the loop consists of a single load/store pair per iteration. We can improve performance in some cases by emitting multiple load/store pairs per iteration. This patch achieves that by increasing the width of the loop lowering type in the GCN target and letting legalization split the resulting too-wide access pairs into multiple legal access pairs. This change only affects lowered memcpys and memmoves with large (>= 1024 bytes) constant lengths. Smaller constant lengths are handled by ISel directly; non-constant lengths would be slowed down by this change if the dynamic length was smaller or slightly larger than what an unrolled iteration copies. The chosen default unroll factor is the result of microbenchmarks on gfx1030. This change leads to speedups of 15-38% for global memory and 1.9-5.8x for scratch in these microbenchmarks. Part of SWDEV-455845.	2024-10-28 09:04:19 +01:00
Jun Wang	19b0453361	[AMDGPU][MC] Fix disassembler problem for image_atomic with TFE (#112622 ) For image_atomic instructions with TFE, in some cases (e.g., when dmask=3) the disassembler produces dst register with wrong size (e.g., image_atomic_smin v5, v1, s[8:15] dmask:0x3 tfe, instead of v[5:7]). This patch fixes the VDataDwords values for image atomic instructions.	2024-10-24 16:19:18 -07:00
Akshat Oke	789fdd536d	[CodeGen][NewPM] Make MFProperties methods const NFC (#113304 ) Makes them congruent with the legacy PM methods.	2024-10-25 00:24:44 +05:30
Kazu Hirata	141574bacb	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415 )	2024-10-23 10:44:09 -07:00
Kazu Hirata	0cb80c4f00	[AMDGPU] Avoid repeated hash lookups (NFC) (#113409 )	2024-10-22 23:02:34 -07:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Janek van Oirschot	a18826d75c	[AMDGPU] Create local KnownBits in case DenseMap gets invalidated (#111568 ) KnownBits retrieved from DenseMap may invalidate if insertion requires a (re)growth. Fixes https://github.com/llvm/llvm-project/issues/110930	2024-10-22 16:05:07 +01:00
Jay Foad	b3acb25735	[AMDGPU] Don't rely on !eq comparing int with bits<5>. NFC. (#113279 ) Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int value (-1) with a bits<5> value. This is to avoid a change in behaviour when #112904 lands, which is a bug fix which has the side effect of implicitly casting template arguments to the declared template parameter type.	2024-10-22 12:20:36 +01:00
Akshat Oke	ca32bd643b	[NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (#109939 )	2024-10-22 15:37:08 +05:30
Akshat Oke	4e32d7236b	[NewPM][CodeGen] Port LiveRegMatrix to NPM (#109938 )	2024-10-22 15:28:04 +05:30
Akshat Oke	834b820f40	[AMDGPU] Correct pass dependencies for SILowerSGPRSpills (#109937 ) Replace unused analysis (VirtRegMap) dependency with the used one (SlotIndexes) Initializes `SlotIndexesWrapperPass` which is used by SILowerSGPRSpills to ensure that legacy pass manager finds it. Removes the initialization for `VirtRegMapWrapperPass` since it is not requested in this pass.	2024-10-22 15:20:54 +05:30
Akshat Oke	93802815ab	[NewPM][CodeGen] Port VirtRegMap to NPM (#109936 )	2024-10-22 15:15:56 +05:30
Fabian Ritter	69abfd3141	[AMDGPU] Allow casts between the Global and Constant Addr Spaces in isValidAddrSpaceCast (#112493 ) So far, isValidAddrSpaceCast only allows casts to the flat address space and between the constant(32) address spaces. It does not allow casting between the global and constant address spaces, even though they alias. That affects, e.g., the lowering of memmoves from the constant to the global address space in LowerMemIntrinsics, since that requires aliasing address spaces to be castable. This patch relaxes isValidAddrSpaceCast and allows such casts. It also includes a memmove test that would crash with the previous implementation because the memmove IR lowering would not be applicable for the move from constant AS to global AS.	2024-10-22 09:33:21 +02:00
Shilei Tian	c3fe0e46e2	[NFC][AMDGPU] clang-format `llvm/lib/Target/AMDGPU/SIISelLowering.cpp` (#112645 )	2024-10-21 16:42:25 -07:00
Kazu Hirata	766bd6f4d0	[AMDGPU] Avoid repeated map lookups (NFC) (#112819 )	2024-10-21 10:35:53 -07:00
Stanislav Mekhanoshin	3277c7cd28	[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765 ) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for VGPR limited kernels. A kernel becomes VGPR limited if a total number of VGPRs per SIMD / number of used VGPRs is more than a number of wave slots.	2024-10-21 09:39:52 -07:00
Akshat Oke	6360652e9f	Reland [AMDGPU] Serialize WWM_REG vreg flag (#110229 ) (#112492 ) A reland but not an exact copy as `VRegInfo.Flags` from the parser is now an int8 instead of a vector; so only need to copy over the value.	2024-10-21 13:44:09 +05:30
Christudasan Devadasan	3c5cea650d	[AMDGPU]: Add implicit-def to the BB prolog (#112872 ) IMPLICIT_DEF inserted for a wwm-register at the very first block or the predecessor block where it is used for sgpr spilling can appear at a block begin that requires spill-insertion during per-lane VGPR regalloc phase. The presence of the IMPLICIT_DEF currently breaks the BB prolog. Fixes: SWDEV-490717	2024-10-21 13:21:16 +05:30
Matt Arsenault	ef91cd3f01	AMDGPU: Handle folding frame indexes into add with immediate (#110738 )	2024-10-19 12:33:03 -07:00
Jay Foad	922992a22f	Fix typo "instrinsic" (#112899 )	2024-10-18 15:58:33 +01:00
Mariusz Sikora	bafc66e50f	[AMDGPU][NFC] Correct description (#112847 )	2024-10-18 10:41:16 +02:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
goldsteinn	c85611e858	[SimplifyLibCall][Attribute] Fix bug where we may keep `range` attr with incompatible type (#112649) In a variety of places we change the bitwidth of a parameter but don't update the attributes. The issue in this case is from the `range` attribute when inlining `__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an `i8`, and if the `i32` had a `range` attr associated it will cause an error. Fixes #112633	2024-10-17 10:32:55 -05:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Stanislav Mekhanoshin	1cc5290a30	[AMDGPU] Factor out getNumUsedPhysRegs(). NFC. (#112624 ) I will need it from one more place.	2024-10-17 00:47:19 -07:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Brox Chen	35e937b4de	[AMDGPU][True16][CodeGen] fp conversion in true/fake16 format (#101678) The true16 format of the fp conversion instructions V_CVT_F_F/V_CVT_F_U was previously implemented using the fake16 profile. With the MC support in place, correct and support these instructions in true16/fake16 format in CodeGen.	2024-10-16 12:26:01 -04:00
Jay Foad	d9c95efb6c	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546 ) Convert almost every instance of: CreateCall(Intrinsic::getOrInsertDeclaration(...), ...) to the equivalent CreateIntrinsic call.	2024-10-16 15:43:30 +01:00
Brox Chen	7b4c8b35d4	[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031) Modify VOP3 profile and pseudo, and add encoding info for VOP3 True16 including DPP and DPP8 in true16 and fake16 format. This patch applies true16/fake16 changes and asm/dasm changes to V_ADD_NC_U16, V_ADD_NC_I16, V_SUB_NC_U16 and V_SUB_NC_I16.	2024-10-16 10:27:44 -04:00
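
Note on ca1154d1d4 (#114279): the y = 32 counterexample from that commit message can be checked with a small model. This is a sketch only, not code from the patch. The helper names `shift_pair` and `bfe_u32` are hypothetical, and the bfe model assumes that the offset and width operands are read from their low 5 bits, which is what makes a width of 32 behave like a width of 0.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned arithmetic


def shift_pair(x, y):
    """Model of (x << (32 - y)) >> (32 - y) with logical shifts on 32 bits."""
    s = 32 - y
    # A shift amount of 32 (y == 0) is not a valid 32-bit shift, so exclude it.
    assert 0 <= s < 32, "shift amount must be in [0, 31]"
    return ((x << s) & MASK32) >> s


def bfe_u32(x, offset, width):
    """Assumed model of an unsigned bitfield extract.

    Assumption: only the low 5 bits of offset and width are used,
    so width == 32 is seen as width == 0 and yields 0.
    """
    offset &= 31
    width &= 31
    return (x >> offset) & ((1 << width) - 1)


x = 0xDEADBEEF

# For y in [1, 31] the two forms agree under this model.
agree = all(shift_pair(x, y) == bfe_u32(x, 0, y) for y in range(1, 32))

# For y == 32 the shift pair leaves x unchanged, while the bfe gives 0.
lhs = shift_pair(x, 32)
rhs = bfe_u32(x, 0, 32)
print(agree, hex(lhs), hex(rhs))
```

Under these assumptions the two forms only diverge at y = 32, which matches the commit's reasoning for dropping the pattern for non-immediate y.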

9730 Commits