9714 Commits

Author SHA1 Message Date
Jay Foad
a156362e93
[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544)
Fixes #54201
2024-10-29 14:59:37 +00:00
Shilei Tian
e268398fa8
[NFC][AMDGPU] Use !foreach to replace explicit list of registers (#114005) 2024-10-29 10:50:06 -04:00
Matt Arsenault
88e23eb2cf
DAG: Fix legalization of vector addrspacecasts (#113964) 2024-10-29 08:08:50 -05:00
Shilei Tian
4cf128512b
[NFC][AMDGPU] Use C++17 structured bindings as much as possible (#113939)
This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`.
There are five uses of `std::tie` remaining because they can't be
replaced with C++17 structured bindings.
2024-10-28 13:55:57 -04:00
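To illustrate the kind of rewrite described in #113939, here is a minimal sketch of `std::tie` next to a C++17 structured binding. The helper `splitHalves` and the three `sum*` functions are hypothetical and do not come from `SIISelLowering.cpp`. The last function shows one common reason a `std::tie` cannot be converted (assigning into variables that already exist); whether that is the reason for the five remaining uses is an assumption, since the commit message does not say.

```cpp
#include <cstdint>
#include <tuple>
#include <utility>

// Hypothetical helper (not from LLVM): splits a 64-bit value into its
// low and high 32-bit halves.
std::pair<std::uint32_t, std::uint32_t> splitHalves(std::uint64_t V) {
  return {static_cast<std::uint32_t>(V & 0xFFFFFFFFu),
          static_cast<std::uint32_t>(V >> 32)};
}

// Before: declare the variables first, then unpack with std::tie.
std::uint32_t sumWithTie(std::uint64_t V) {
  std::uint32_t Lo, Hi;
  std::tie(Lo, Hi) = splitHalves(V);
  return Lo + Hi;
}

// After: a structured binding declares and unpacks in one step.
std::uint32_t sumWithBinding(std::uint64_t V) {
  auto [Lo, Hi] = splitHalves(V);
  return Lo + Hi;
}

// A case where std::tie has to stay: the targets already exist and are
// reassigned. A structured binding always introduces new names, so it
// cannot express the second assignment.
std::uint32_t sumAfterReassign(std::uint64_t A, std::uint64_t B) {
  std::uint32_t Lo = 0, Hi = 0;
  std::tie(Lo, Hi) = splitHalves(A);
  const std::uint32_t First = Lo + Hi;
  std::tie(Lo, Hi) = splitHalves(B); // reuses the same variables
  return First + Lo + Hi;
}
```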
Mirko Brkušanin
fa4790e404
[AMDGPU][MC] Fix disassembler for VIMAGE when non-first vaddr is v0 (#113569)
For disassembler tables we use *V1_V4* variants for VIMAGE and then
remove unused vaddr fields. *V1_V1* variant, which has every vaddr
field other than vaddr0 set to 0, was also enabled and caused confusion
when decoding cases which used v0 (whose encoded value is 0).
2024-10-28 10:43:18 +01:00
Fabian Ritter
a4fd3dba6e
[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332)
When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in
LowerMemIntrinsics.cpp, the loop consists of a single load/store pair
per iteration. We can improve performance in some cases by emitting
multiple load/store pairs per iteration. This patch achieves that by
increasing the width of the loop lowering type in the GCN target and
letting legalization split the resulting too-wide access pairs into
multiple legal access pairs.

This change only affects lowered memcpys and memmoves with large (>=
1024 bytes) constant lengths. Smaller constant lengths are handled by
ISel directly; non-constant lengths would be slowed down by this change
if the dynamic length was smaller or slightly larger than what an
unrolled iteration copies.

The chosen default unroll factor is the result of microbenchmarks on
gfx1030. This change leads to speedups of 15-38% for global memory and
1.9-5.8x for scratch in these microbenchmarks.

Part of SWDEV-455845.
2024-10-28 09:04:19 +01:00
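The effect described in #112332 (several load/store pairs per loop iteration instead of one) can be sketched in plain C++. This models only the shape of the resulting loop, not the patch itself, which widens the loop lowering type and lets legalization do the splitting. The function name `copyUnrolled` and the constants `kUnroll` and `kChunk` are assumptions for illustration; the commit message does not state the chosen default unroll factor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed values for illustration only; not taken from the commit.
constexpr std::size_t kUnroll = 4; // load/store pairs per iteration
constexpr std::size_t kChunk = 16; // bytes moved by one load/store pair

// Copies Len bytes from Src to Dst (non-overlapping buffers).
void copyUnrolled(std::uint8_t *Dst, const std::uint8_t *Src,
                  std::size_t Len) {
  assert(Len == 0 || (Dst && Src));
  const std::size_t Step = kUnroll * kChunk;
  std::size_t I = 0;
  // Main loop: each iteration issues kUnroll chunk-sized copies.
  for (; I + Step <= Len; I += Step)
    for (std::size_t J = 0; J < kUnroll; ++J)
      std::memcpy(Dst + I + J * kChunk, Src + I + J * kChunk, kChunk);
  // Residual: bytes that do not fill a whole unrolled iteration.
  for (; I < Len; ++I)
    Dst[I] = Src[I];
}
```

The residual loop is where short or awkward lengths pay: a length just under or just over `kUnroll * kChunk` spends most of its time byte-by-byte, which matches the message's remark that small dynamic lengths would be slowed down.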
Jun Wang
19b0453361
[AMDGPU][MC] Fix disassembler problem for image_atomic with TFE (#112622)
For image_atomic instructions with TFE, in some cases (e.g., when
dmask=3) the disassembler produces dst register with wrong size (e.g.,
image_atomic_smin v5, v1, s[8:15] dmask:0x3 tfe, instead of v[5:7]).
This patch fixes the VDataDwords values for image atomic instructions.
2024-10-24 16:19:18 -07:00
Akshat Oke
789fdd536d
[CodeGen][NewPM] Make MFProperties methods const NFC (#113304)
Makes them congruent with the legacy PM methods.
2024-10-25 00:24:44 +05:30
Kazu Hirata
141574bacb
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415) 2024-10-23 10:44:09 -07:00
Kazu Hirata
0cb80c4f00
[AMDGPU] Avoid repeated hash lookups (NFC) (#113409) 2024-10-22 23:02:34 -07:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Janek van Oirschot
a18826d75c
[AMDGPU] Create local KnownBits in case DenseMap gets invalidated (#111568)
A KnownBits reference retrieved from the DenseMap may be invalidated if
an insertion triggers a (re)growth.

Fixes https://github.com/llvm/llvm-project/issues/110930
2024-10-22 16:05:07 +01:00
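The hazard fixed in #111568 is the general "reference into a container that may reallocate" problem. The sketch below uses `std::vector` as a stand-in for `DenseMap`, because it is self-contained and shows the same behavior on growth; it is an analogy, not the code from the patch. The function `firstAfterGrowth` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Returns the value that was stored at index 0 before Count more elements
// were appended to Values.
int firstAfterGrowth(std::vector<int> &Values, int Count) {
  assert(!Values.empty());
  // Unsafe alternative: `const int &First = Values[0];` would dangle as
  // soon as push_back reallocates the underlying buffer.
  const int First = Values[0]; // local copy survives any reallocation
  for (int I = 0; I < Count; ++I)
    Values.push_back(I);
  return First;
}
```

Taking a local copy costs one extra copy of the value, in exchange for not having to reason about when the container grows.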
Jay Foad
b3acb25735
[AMDGPU] Don't rely on !eq comparing int with bits<5>. NFC. (#113279)
Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int
value (-1) with a bits<5> value. This is to avoid a change in behaviour
when #112904 lands, which is a bug fix which has the side effect of
implicitly casting template arguments to the declared template parameter
type.
2024-10-22 12:20:36 +01:00
Akshat Oke
ca32bd643b
[NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (#109939) 2024-10-22 15:37:08 +05:30
Akshat Oke
4e32d7236b
[NewPM][CodeGen] Port LiveRegMatrix to NPM (#109938) 2024-10-22 15:28:04 +05:30
Akshat Oke
834b820f40
[AMDGPU] Correct pass dependencies for SILowerSGPRSpills (#109937)
Replace unused analysis (VirtRegMap) dependency with the used one (SlotIndexes).
Initializes `SlotIndexesWrapperPass` which is used by SILowerSGPRSpills to ensure that legacy pass manager finds it.
Removes the initialization for `VirtRegMapWrapperPass` since it is not requested in this pass.
2024-10-22 15:20:54 +05:30
Akshat Oke
93802815ab
[NewPM][CodeGen] Port VirtRegMap to NPM (#109936) 2024-10-22 15:15:56 +05:30
Fabian Ritter
69abfd3141
[AMDGPU] Allow casts between the Global and Constant Addr Spaces in isValidAddrSpaceCast (#112493)
So far, isValidAddrSpaceCast only allows casts to the flat address
space and between the constant(32) address spaces. It does not allow
casting between the global and constant address spaces, even though they
alias. That affects, e.g., the lowering of memmoves from the constant to
the global address space in LowerMemIntrinsics, since that requires
aliasing address spaces to be castable.

This patch relaxes isValidAddrSpaceCast and allows such casts. It also
includes a memmove test that would crash with the previous
implementation because the memmove IR lowering would not be
applicable for the move from constant AS to global AS.
2024-10-22 09:33:21 +02:00
Shilei Tian
c3fe0e46e2
[NFC][AMDGPU] clang-format llvm/lib/Target/AMDGPU/SIISelLowering.cpp (#112645) 2024-10-21 16:42:25 -07:00
Kazu Hirata
766bd6f4d0
[AMDGPU] Avoid repeated map lookups (NFC) (#112819) 2024-10-21 10:35:53 -07:00
Stanislav Mekhanoshin
3277c7cd28
[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)
MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It has
been found that this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.
2024-10-21 09:39:52 -07:00
Akshat Oke
6360652e9f
Reland [AMDGPU] Serialize WWM_REG vreg flag (#110229) (#112492)
A reland but not an exact copy as `VRegInfo.Flags` from the parser is
now an int8 instead of a vector; so only need to copy over the value.
2024-10-21 13:44:09 +05:30
Christudasan Devadasan
3c5cea650d
[AMDGPU]: Add implicit-def to the BB prolog (#112872)
IMPLICIT_DEF inserted for a wwm-register at the
very first block or the predecessor block where
it is used for sgpr spilling can appear at a block
begin that requires spill-insertion during per-lane
VGPR regalloc phase. The presence of the IMPLICIT_DEF
currently breaks the BB prolog.

Fixes: SWDEV-490717
2024-10-21 13:21:16 +05:30
Matt Arsenault
ef91cd3f01
AMDGPU: Handle folding frame indexes into add with immediate (#110738) 2024-10-19 12:33:03 -07:00
Jay Foad
922992a22f
Fix typo "instrinsic" (#112899) 2024-10-18 15:58:33 +01:00
Mariusz Sikora
bafc66e50f
[AMDGPU][NFC] Correct description (#112847) 2024-10-18 10:41:16 +02:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
goldsteinn
c85611e858
[SimplifyLibCall][Attribute] Fix bug where we may keep range attr with incompatible type (#112649)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated, it will cause an
error.

Fixes #112633
2024-10-17 10:32:55 -05:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Stanislav Mekhanoshin
1cc5290a30
[AMDGPU] Factor out getNumUsedPhysRegs(). NFC. (#112624)
I will need it from one more place.
2024-10-17 00:47:19 -07:00
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00
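The condition described in #80309 (the value must be an N-bit signed or unsigned integer, depending on a signedness flag) can be written as a standalone check. This is a sketch of the rule as the message states it, not the assertion from the LLVM sources; `fitsInBits` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if Val, read as signed or unsigned according
// to IsSigned, is representable in NumBits bits.
bool fitsInBits(std::uint64_t Val, unsigned NumBits, bool IsSigned) {
  assert(NumBits >= 1 && NumBits <= 64);
  if (NumBits == 64)
    return true; // every 64-bit pattern fits
  if (IsSigned) {
    const std::int64_t S = static_cast<std::int64_t>(Val);
    const std::int64_t Min = -(std::int64_t{1} << (NumBits - 1));
    const std::int64_t Max = (std::int64_t{1} << (NumBits - 1)) - 1;
    return S >= Min && S <= Max;
  }
  return Val < (std::uint64_t{1} << NumBits);
}
```

For example, the bit pattern of -1 fits in 8 bits when treated as signed but not when treated as unsigned, which is the kind of mismatch the message says is fixed by setting the correct `isSigned` value.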
Brox Chen
35e937b4de
[AMDGPU][True16][CodeGen] fp conversion in true/fake16 format (#101678)
The true16 format of the fp conversion instructions V_CVT_F_F/V_CVT_F_U
was previously implemented using the fake16 profile.

With the MC support in place, correct and support these instructions in
true16/fake16 format in CodeGen.
2024-10-16 12:26:01 -04:00
Jay Foad
d9c95efb6c
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546)
Convert almost every instance of:
  CreateCall(Intrinsic::getOrInsertDeclaration(...), ...)
to the equivalent CreateIntrinsic call.
2024-10-16 15:43:30 +01:00
Brox Chen
7b4c8b35d4
[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031)
Modify VOP3 profile and pseudo, and add encoding info for VOP3 True16
including DPP and DPP8 in true16 and fake16 format.

This patch applies true16/fake16 changes and asm/dasm changes to
V_ADD_NC_U16
V_ADD_NC_I16
V_SUB_NC_U16
V_SUB_NC_I16
2024-10-16 10:27:44 -04:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Christudasan Devadasan
72a7b471de
[AMDGPU][NewPM] Fill out addILPOpts. (#108514) 2024-10-16 13:30:46 +05:30
Christudasan Devadasan
488d3924dd
[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508) 2024-10-16 13:22:57 +05:30
Petar Avramovic
14d006c53c
AMDGPU/GlobalISel: Run redundant_and combine in RegBankCombiner (#112353)
The combine is needed to clear redundant ANDs with 1 that will be
created by reg-bank-select to clean up high bits in a register.
Fix replaceRegWith from CombinerHelper:
If a copy has to be inserted, first create the copy, then delete MI.
If MI is deleted first, the insert point is no longer valid.
2024-10-16 09:43:16 +02:00
Peter Collingbourne
3cab8827fd
Revert "[AMDGPU] Serialize WWM_REG vreg flag (#110229)"
This reverts commit bec839d8eed9dd13fa7eaffd50b28f8f913de2e2.

Caused buildbot failures, e.g.
https://lab.llvm.org/buildbot/#/builders/52/builds/2928
2024-10-15 13:18:43 -07:00
Kazu Hirata
bc09bebcc2
[AMDGPU] Avoid repeated hash lookups (NFC) (#112309) 2024-10-14 23:09:28 -07:00
Pierre van Houtryve
b3a8400afa
(reland) [AMDGPU][SplitModule] Handle !callees metadata (#108802)
(reland with fixed sed command for macos)

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
2024-10-15 07:16:57 +02:00
Carl Ritson
784230b850
[AMDGPU] Tidy SIPreAllocateWWMRegs after recent changes (NFCI) (#111967)
- V_SET_INACTIVE is always in WWM/WQM so can be treated like any other
operation in WWM/WQM.
- After encountering SI_SPILL_S32_TO_VGPR, the loop should skip ahead to
avoid processing its defs twice.
2024-10-15 11:48:22 +09:00
Nico Weber
140cbca83d
Revert "[AMDGPU][SplitModule] Handle !callees metadata (#108802)"
This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934.
Breaks tests, see comments on
https://github.com/llvm/llvm-project/pull/108802
2024-10-14 17:26:15 -04:00
Shilei Tian
a74659445d
[AMDGPU] Skip terminators when forcing emit zero flag (#112116)
When forcing emit zero, we need to skip terminators of an MBB; otherwise
the terminator list of the MBB would be broken.
2024-10-14 11:46:18 -04:00
Jay Foad
cbc4be2dd5
[AMDGPU] Use MachineInstr::mayLoadOrStore. NFC. 2024-10-14 15:37:56 +01:00
Akshat Oke
bec839d8ee
[AMDGPU] Serialize WWM_REG vreg flag (#110229) 2024-10-14 14:37:21 +05:30
Pierre van Houtryve
4a0dc3ef36
[AMDGPU][SplitModule] Handle !callees metadata (#108802)
See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
2024-10-14 08:55:12 +02:00
Shilei Tian
ed77df56f2
[NFC] clang-format llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 2024-10-14 00:57:01 -04:00
Shilei Tian
3da7d55b35
[NFC][AMDGPU] Remove unnecessary member ForceEmitZeroWaitcnts (#112114)
We can use `ForceEmitZeroFlag` directly.
2024-10-14 00:54:16 -04:00
Kazu Hirata
48deb3568e
[AMDGPU] Avoid repeated hash lookups (NFC) (#112115) 2024-10-12 22:07:22 -07:00