llvm-project

Author	SHA1	Message	Date
Jay Foad	a156362e93	[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544 ) Fixes #54201	2024-10-29 14:59:37 +00:00
Shilei Tian	e268398fa8	[NFC][AMDGPU] Use `!foreach` to replace explicit list of registers (#114005 )	2024-10-29 10:50:06 -04:00
Matt Arsenault	88e23eb2cf	DAG: Fix legalization of vector addrspacecasts (#113964 )	2024-10-29 08:08:50 -05:00
Shilei Tian	4cf128512b	[NFC][AMDGPU] Use C++17 structured bindings as much as possible (#113939 ) This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`. There are five uses of `std::tie` remaining because they can't be replaced with C++17 structured bindings.	2024-10-28 13:55:57 -04:00
Mirko Brkušanin	fa4790e404	[AMDGPU][MC] Fix disassembler for VIMAGE when non-first vaddr is v0 (#113569 ) For disassembler tables we use V1_V4 variants for VIMAGE and then remove unused vaddr fields. V1_V1 variant, which has every vaddr field other than vaddr0 set to 0, was also enabled and caused confusion when decoding cases which used v0 (whose encoded value is 0)	2024-10-28 10:43:18 +01:00
Fabian Ritter	a4fd3dba6e	[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332 ) When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in LowerMemIntrinsics.cpp, the loop consists of a single load/store pair per iteration. We can improve performance in some cases by emitting multiple load/store pairs per iteration. This patch achieves that by increasing the width of the loop lowering type in the GCN target and letting legalization split the resulting too-wide access pairs into multiple legal access pairs. This change only affects lowered memcpys and memmoves with large (>= 1024 bytes) constant lengths. Smaller constant lengths are handled by ISel directly; non-constant lengths would be slowed down by this change if the dynamic length was smaller or slightly larger than what an unrolled iteration copies. The chosen default unroll factor is the result of microbenchmarks on gfx1030. This change leads to speedups of 15-38% for global memory and 1.9-5.8x for scratch in these microbenchmarks. Part of SWDEV-455845.	2024-10-28 09:04:19 +01:00
Jun Wang	19b0453361	[AMDGPU][MC] Fix disassembler problem for image_atomic with TFE (#112622 ) For image_atomic instructions with TFE, in some cases (e.g., when dmask=3) the disassembler produces dst register with wrong size (e.g., image_atomic_smin v5, v1, s[8:15] dmask:0x3 tfe, instead of v[5:7]). This patch fixes the VDataDwords values for image atomic instructions.	2024-10-24 16:19:18 -07:00
Akshat Oke	789fdd536d	[CodeGen][NewPM] Make MFProperties methods const NFC (#113304 ) Makes them congruent with the legacy PM methods.	2024-10-25 00:24:44 +05:30
Kazu Hirata	141574bacb	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415 )	2024-10-23 10:44:09 -07:00
Kazu Hirata	0cb80c4f00	[AMDGPU] Avoid repeated hash lookups (NFC) (#113409 )	2024-10-22 23:02:34 -07:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Janek van Oirschot	a18826d75c	[AMDGPU] Create local KnownBits in case DenseMap gets invalidated (#111568 ) KnownBits retrieved from DenseMap may invalidate if insertion requires a (re)growth. Fixes https://github.com/llvm/llvm-project/issues/110930	2024-10-22 16:05:07 +01:00
Jay Foad	b3acb25735	[AMDGPU] Don't rely on !eq comparing int with bits<5>. NFC. (#113279 ) Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int value (-1) with a bits<5> value. This is to avoid a change in behaviour when #112904 lands, which is a bug fix which has the side effect of implicitly casting template arguments to the declared template parameter type.	2024-10-22 12:20:36 +01:00
Akshat Oke	ca32bd643b	[NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (#109939 )	2024-10-22 15:37:08 +05:30
Akshat Oke	4e32d7236b	[NewPM][CodeGen] Port LiveRegMatrix to NPM (#109938 )	2024-10-22 15:28:04 +05:30
Akshat Oke	834b820f40	[AMDGPU] Correct pass dependencies for SILowerSGPRSpills (#109937 ) Replace unused analysis (VirtRegMap) dependency with the used one (SlotIndexes) Initializes `SlotIndexesWrapperPass` which is used by SILowerSGPRSpills to ensure that legacy pass manager finds it. Removes the initialization for `VirtRegMapWrapperPass` since it is not requested in this pass.	2024-10-22 15:20:54 +05:30
Akshat Oke	93802815ab	[NewPM][CodeGen] Port VirtRegMap to NPM (#109936 )	2024-10-22 15:15:56 +05:30
Fabian Ritter	69abfd3141	[AMDGPU] Allow casts between the Global and Constant Addr Spaces in isValidAddrSpaceCast (#112493 ) So far, isValidAddrSpaceCast only allows casts to the flat address space and between the constant(32) address spaces. It does not allow casting between the global and constant address spaces, even though they alias. That affects, e.g., the lowering of memmoves from the constant to the global address space in LowerMemIntrinsics, since that requires aliasing address spaces to be castable. This patch relaxes isValidAddrSpaceCast and allows such casts. It also includes a memmove test that would crash with the previous implementation because the memmove IR lowering would not be applicable for the move from constant AS to global AS.	2024-10-22 09:33:21 +02:00
Shilei Tian	c3fe0e46e2	[NFC][AMDGPU] clang-format `llvm/lib/Target/AMDGPU/SIISelLowering.cpp` (#112645 )	2024-10-21 16:42:25 -07:00
Kazu Hirata	766bd6f4d0	[AMDGPU] Avoid repeated map lookups (NFC) (#112819 )	2024-10-21 10:35:53 -07:00
Stanislav Mekhanoshin	3277c7cd28	[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765 ) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for VGPR limited kernels. A kernel becomes VGPR limited if a total number of VGPRs per SIMD / number of used VGPRs is more than a number of wave slots.	2024-10-21 09:39:52 -07:00
Akshat Oke	6360652e9f	Reland [AMDGPU] Serialize WWM_REG vreg flag (#110229 ) (#112492 ) A reland but not an exact copy as `VRegInfo.Flags` from the parser is now an int8 instead of a vector; so only need to copy over the value.	2024-10-21 13:44:09 +05:30
Christudasan Devadasan	3c5cea650d	[AMDGPU]: Add implicit-def to the BB prolog (#112872 ) IMPLICIT_DEF inserted for a wwm-register at the very first block or the predecessor block where it is used for sgpr spilling can appear at a block begin that requires spill-insertion during per-lane VGPR regalloc phase. The presence of the IMPLICIT_DEF currently breaks the BB prolog. Fixes: SWDEV-490717	2024-10-21 13:21:16 +05:30
Matt Arsenault	ef91cd3f01	AMDGPU: Handle folding frame indexes into add with immediate (#110738 )	2024-10-19 12:33:03 -07:00
Jay Foad	922992a22f	Fix typo "instrinsic" (#112899 )	2024-10-18 15:58:33 +01:00
Mariusz Sikora	bafc66e50f	[AMDGPU][NFC] Correct description (#112847 )	2024-10-18 10:41:16 +02:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
goldsteinn	c85611e858	[SimplifyLibCall][Attribute] Fix bug where we may keep `range` attr with incompatible type (#112649 ) In a variety of places we change the bitwidth of a parameter but don't update the attributes. The issue in this case is from the `range` attribute when inlining `__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an `i8`, and if the `i32` had a `range` attr associated it will cause an error. Fixes #112633	2024-10-17 10:32:55 -05:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Stanislav Mekhanoshin	1cc5290a30	[AMDGPU] Factor out getNumUsedPhysRegs(). NFC. (#112624 ) I will need it from one more place.	2024-10-17 00:47:19 -07:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Brox Chen	35e937b4de	[AMDGPU][True16][CodeGen] fp conversion in true/fake16 format (#101678 ) fp conversion V_CVT_F_F/V_CVT_F_U instructions true16 format were previously implemented using fake16 profile. With the MC support in place, correct and support these instructions in true16/fake16 format in CodeGen	2024-10-16 12:26:01 -04:00
Jay Foad	d9c95efb6c	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546 ) Convert almost every instance of: CreateCall(Intrinsic::getOrInsertDeclaration(...), ...) to the equivalent CreateIntrinsic call.	2024-10-16 15:43:30 +01:00
Brox Chen	7b4c8b35d4	[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031 ) Modify VOP3 profile and pseudo, and add encoding info for VOP3 True16 including DPP and DPP8 in true16 and fake16 format. This patch applies true16/fake16 changes and asm/dasm changes to V_ADD_NC_U16 V_ADD_NC_I16 V_SUB_NC_U16 V_SUB_NC_I16	2024-10-16 10:27:44 -04:00
Rahul Joshi	6924fc0326	[LLVM] Add `Intrinsic::getDeclarationIfExists` (#112428 ) Add `Intrinsic::getDeclarationIfExists` to lookup an existing declaration of an intrinsic in a `Module`.	2024-10-16 07:21:10 -07:00
Christudasan Devadasan	72a7b471de	[AMDGPU][NewPM] Fill out addILPOpts. (#108514 )	2024-10-16 13:30:46 +05:30
Christudasan Devadasan	488d3924dd	[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508 )	2024-10-16 13:22:57 +05:30
Petar Avramovic	14d006c53c	AMDGPU/GlobalISel: Run redundant_and combine in RegBankCombiner (#112353 ) Combine is needed to clear redundant ANDs with 1 that will be created by reg-bank-select to clean-up high bits in register. Fix replaceRegWith from CombinerHelper: If copy had to be inserted, first create copy then delete MI. If MI is deleted first insert point is not valid.	2024-10-16 09:43:16 +02:00
Peter Collingbourne	3cab8827fd	Revert "[AMDGPU] Serialize WWM_REG vreg flag (#110229 )" This reverts commit bec839d8eed9dd13fa7eaffd50b28f8f913de2e2. Caused buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/52/builds/2928	2024-10-15 13:18:43 -07:00
Kazu Hirata	bc09bebcc2	[AMDGPU] Avoid repeated hash lookups (NFC) (#112309 )	2024-10-14 23:09:28 -07:00
Pierre van Houtryve	b3a8400afa	(reland) [AMDGPU][SplitModule] Handle !callees metadata (#108802 ) (reland with fixed sed command for macos) Handle the `!callees` metadata to further reduce the amount of indirect call cases that end up conservatively assuming that any indirectly callable function is a potential target.	2024-10-15 07:16:57 +02:00
Carl Ritson	784230b850	[AMDGPU] Tidy SIPreAllocateWWMRegs after recent changes (NFCI) (#111967 ) - V_SET_INACTIVE is always in WWM/WQM so can be treated like any other operation in WWM/WQM. - After encountering SI_SPILL_S32_TO_VGPR loop should bypass to avoid double processing its defs.	2024-10-15 11:48:22 +09:00
Nico Weber	140cbca83d	Revert "[AMDGPU][SplitModule] Handle !callees metadata (#108802 )" This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934. Breaks tests, see comments on https://github.com/llvm/llvm-project/pull/108802	2024-10-14 17:26:15 -04:00
Shilei Tian	a74659445d	[AMDGPU] Skip terminators when forcing emit zero flag (#112116 ) When forcing emit zero, we need to skip terminators of a MBB; otherwise the terminator list of the MBB would be broken.	2024-10-14 11:46:18 -04:00
Jay Foad	cbc4be2dd5	[AMDGPU] Use MachineInstr::mayLoadOrStore. NFC.	2024-10-14 15:37:56 +01:00
Akshat Oke	bec839d8ee	[AMDGPU] Serialize WWM_REG vreg flag (#110229 )	2024-10-14 14:37:21 +05:30
Pierre van Houtryve	4a0dc3ef36	[AMDGPU][SplitModule] Handle !callees metadata (#108802 ) See #106528 to review the first commit. Handle the `!callees` metadata to further reduce the amount of indirect call cases that end up conservatively assuming that any indirectly callable function is a potential target.	2024-10-14 08:55:12 +02:00
Shilei Tian	ed77df56f2	[NFC] clang-format llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp	2024-10-14 00:57:01 -04:00
Shilei Tian	3da7d55b35	[NFC][AMDGPU] Remove unnecessary member `ForceEmitZeroWaitcnts` (#112114 ) We can use `ForceEmitZeroFlag` directly.	2024-10-14 00:54:16 -04:00
Kazu Hirata	48deb3568e	[AMDGPU] Avoid repeated hash lookups (NFC) (#112115 )	2024-10-12 22:07:22 -07:00
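
Commit 4cf128512b above notes that some `std::tie` uses could not be replaced with C++17 structured bindings. The commit does not say which cases remain, so the following is only a minimal sketch of one common reason, assumed here for illustration: a structured binding always declares new variables, while `std::tie` assigns into variables that already exist. The names `divMod`, `sumWithBindings`, and `lastQuotientWithTie` are hypothetical and do not come from `SIISelLowering.cpp`.

```cpp
#include <cassert>
#include <initializer_list>
#include <tuple>
#include <utility>

// Hypothetical helper: returns a (quotient, remainder) pair.
std::pair<int, int> divMod(int Num, int Den) {
  assert(Den != 0 && "division by zero");
  return {Num / Den, Num % Den};
}

// Structured bindings declare two new variables in a single statement.
int sumWithBindings(int Num, int Den) {
  auto [Quot, Rem] = divMod(Num, Den);
  return Quot + Rem;
}

// std::tie assigns into variables that were declared earlier. A structured
// binding cannot express this, because it always introduces new names.
int lastQuotientWithTie(int Den) {
  int Quot = 0;
  int Rem = 0;
  for (int Num : {7, 9, 20})
    std::tie(Quot, Rem) = divMod(Num, Den);
  // Encode the final pair as a single value for easy inspection.
  return Quot * 10 + Rem;
}
```

In the second function the results must outlive each loop iteration, so the variables are declared outside the loop and `std::tie` overwrites them on every pass.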

9714 Commits