765 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
47ed921985
[AMDGPU] Add legality check when folding short 64-bit literals (#69391)
We can only fold it if it can fit into 32-bit. I believe it did not
trigger yet because we do not select 64-bit literals generally.
2023-10-18 09:22:23 -07:00
Sirish Pande
28e4f97320
[AMDGPU] Save/Restore SCC bit across waterfall loop. (#68363)
Waterfall loop is overwriting SCC bit of status register. Make sure SCC
bit is saved and restored across.
We need to save/restore only in cases where SCC is live across waterfall
loop.

Co-authored-by: Sirish Pande <sirish.pande@amd.com>
2023-10-18 08:43:29 -05:00
Stanislav Mekhanoshin
a22a1fe151
[AMDGPU] support 64-bit immediates in SIInstrInfo::FoldImmediate (#69260)
This is a part of https://github.com/llvm/llvm-project/issues/67781.
Until we select more 64-bit move immediates the impact is minimal.
2023-10-17 10:53:22 -07:00
Petar Avramovic
2fa7d652d0 AMDGPU: Fix temporal divergence introduced by machine-sink (#67456)
Temporal divergence that was present in input or introduced in IR
transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies
by changing sgpr source instr to vgpr instr.
After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare,
machine-sinking can introduce temporal divergence by sinking
instructions outside of the cycle.
Add isSafeToSink callback in TargetInstrInfo.
2023-10-06 15:00:08 +02:00
Ivan Kosarev
f04aa1f814
[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002) 2023-10-05 14:22:29 +03:00
Ivan Kosarev
cf80defae2
[AMDGPU][GFX11] Do not rewrite V_FMA/FMAC_* to V_FMAAK_F16_t16 on operand legalization. (#66202)
V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original
instructions would have VGPR_32 operands. Switching the opcodes without
updating operands' register classes leads to MachineVerifier complaining
about the classes not matching instruction definitions. The problem only
reveals itself of builds with expensive checks enabled because of
missing -verify-machineinstrs in the test.

This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for
GFX11, following the second attempt in a1e38e0b8e3e, partially reverted
in eaf737a4e004.
2023-10-04 12:41:46 +01:00
Ivan Kosarev
64482d5766
[AMDGPU] Fix passing CodeGen/AMDGPU/frem.ll on gfx1150. (#67425)
We would currently crash on it trying to use t16 instructions instead of
fake16 ones.
2023-09-26 15:13:23 +01:00
Ivan Kosarev
287f6cdd17 [AMDGPU] Remove the support for non-True16 copies between different register sizes.
Differential Revision: https://reviews.llvm.org/D156985
2023-09-26 14:46:34 +01:00
Ivan Kosarev
758df22bcf [AMDGPU][True16] Support emitting copies between different register sizes.
Differential Revision: https://reviews.llvm.org/D156105
2023-09-26 12:15:34 +01:00
Mirko Brkušanin
ecfdc23dd2
[AMDGPU] Select gfx1150 SALU Float instructions (#66885) 2023-09-21 12:22:55 +02:00
Nicolai Hähnle
2eb767c9e1
AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287)
Scratch instructions are always in addrspace(5), which can only alias
with flat (and itself). SMEM and buffer instructions can never reference
those address spaces, so they are trivially disjoint.
2023-09-08 07:43:36 +02:00
Stanislav Mekhanoshin
294f632859 [AMDGPU] Move add64/sub64 to VALU
This is NFCI as far as I can tell, but I see no reason not to do it.

Differential Revision: https://reviews.llvm.org/D159077
2023-08-29 09:31:04 -07:00
Mateusz Hurnik
232f0c9a9a [NFC][AMDGPU] Remove redundant code
As the result of this constant function is unused it is redundant.

Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D158747
2023-08-25 11:18:48 +01:00
Stanislav Mekhanoshin
cfe9a134bb [AMDGPU] Rename 64BitDPP feature and fix the checks
Names '64BitDPP' and especially 'DPP64' were found misleading, and
DPP64 can easily be mixed with DPP16 and DPP8 while these are
different concepts. DPP16 and DPP8 refers to lanes where DPP64
refers to the operand size.

In fact the essential part here is that these instructions are
executed on the DP ALU, so rename the feature accordingly.

I have also found a bug in a check for these instructions, which is
fixed here and a common utility function is now used.

Differential Revision: https://reviews.llvm.org/D158465
2023-08-22 11:00:10 -07:00
Mirko Brkusanin
de82fde22d AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D157091
2023-08-18 18:23:40 +02:00
Christudasan Devadasan
81827f8cfb [AMDGPU] Support wwm-reg AV spill pseudos
The wwm register spill pseudos are currently defined for VGPR_32
regclass. It causes a verifier error for gfx908 or above as the
regalloc sometimes restores the values to the vector superclass AV_32.
Fixing it by supporting AV wwm-spill pseudos as well.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155646
2023-08-17 20:04:18 +05:30
Stanislav Mekhanoshin
f7480bc5c1 [AMDGPU] Decouple V_PK_MOV_B32 from FeaturePackedFP32Ops
This is not an FP32 operation.

Differential Revision: https://reviews.llvm.org/D157909
2023-08-14 14:22:52 -07:00
Mirko Brkusanin
1e5359c6ba [AMDGPU] Treat KIMM32 and KIMM16 operand types as noninlinable
While they are represent 32/16 bit immediate values they are already
included in encoding of the instructions that use them and are not true
literals. FMAMK and FMAAK instructions that use them are marked with fixed
size so getInstSizeInBytes will not increase the size for these operands.

We also add tests whose logic relies on KIMM16 and KIMM32 being considered
not inlinable.

Differential Revision: https://reviews.llvm.org/D157624
2023-08-11 18:46:39 +02:00
Matt Arsenault
4b1702e87a AMDGPU: Fix counting source modifiers as literal constants
This fixes over estimating code size. This was broken by
79f52af4cd9a76485dd50bcdbb5d393eb7a70103.

https://reviews.llvm.org/D157103
2023-08-07 18:40:16 -04:00
Sander de Smalen
bbb95893de [TII] NFCI: Simplify the interface for isTriviallyReMaterializable
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.

The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.

  /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
  /// set, this hook lets the target specify whether the instruction is actually
  /// trivially rematerializable, taking into consideration its operands.

It however implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.

The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`.  That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).

By making this a single interface, we can override the interface to do either.

Reviewed By: craig.topper, nemanjai

Differential Revision: https://reviews.llvm.org/D156520
2023-08-07 13:01:06 +00:00
Jay Foad
e61ca23289 [AMDGPU] Add and use SIInstrFlags::GWS. NFC.
This reduces the number of places where we have to check for a list of
DS_GWS_* opcodes.

Differential Revision: https://reviews.llvm.org/D157099
2023-08-07 12:05:14 +01:00
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Sameer Sahasrabuddhe
7c760b224b Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556

This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
2023-07-27 14:49:17 +05:30
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe
d0f7850b01 Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.

The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe
baa3386edb [GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556
2023-07-27 10:00:45 +05:30
Stanislav Mekhanoshin
7972b9c829 [AMDGPU] Move SIEncodingFamily into SIDefines.h. NFC.
I need this for future patch in the MC, while TII is not available
in the llvm-mc. Besides this is not a first time I want it there.

Differential Revision: https://reviews.llvm.org/D155228
2023-07-13 12:42:28 -07:00
Christudasan Devadasan
b4a62b1fa5 [AMDGPU] Enable whole wave register copy
So far, we haven't exposed the allocation of whole-wave
registers to regalloc. We hand-picked them for various
whole wave mode operations. With a future patch, we
want the allocator to efficiently allocate them rather
than using the custom pre-allocation pass.

Any liverange split of virtual registers involved in
whole-wave operations require the resulting COPY
introduced with the split to be performed for all
lanes. It isn't implemented in the compiler yet.

This patch would identify all such copies and
manipulate the exec mask around them to enable all
lanes without affecting the value of exec mask
elsewhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D143762
2023-07-07 22:58:55 +05:30
Christudasan Devadasan
b78b36e1a2 [AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.

This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D143759
2023-07-07 22:51:45 +05:30
Ivan Kosarev
9d8171f8c4 [AMDGPU][Codegen] Clean up legalizeOpWithMove().
The removed logic was added in
<https://reviews.llvm.org/rG0c93c9ecee0624f8469f5a971a09fbc9e9cc1061>,
but now doesn't seem to be needed.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154220
2023-06-30 16:48:16 +01:00
Brendon Cahoon
853b2a84cb [AMDGPU] Reserve SGPR pair when long branches are present
Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the
case when an indirect branch target is too far away. The register
scavanger may not find available registers, which causes a “did not find
scavenging index” assert to occur in assignRegToScavengingIndex.

In this patch, we estimate before register allocation whether an
indirect branch is likely to be needed, and reserve 2 SGPRs if the
branch distance is found to be above a threshold. The distance threshold
is an approximation as the exact code size and branch distance are
unknown prior to register allocation.

Patch by Corbin Robeck. Thanks!

Differential Review: https://reviews.llvm.org/D149775
2023-06-29 16:50:46 -05:00
Ivan Kosarev
4e312abdfd [AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152257
2023-06-07 11:41:11 +01:00
Jay Foad
3030c03988 [AMDGPU] Make use of MachineInstr::all_defs and all_uses. NFCI. 2023-06-05 10:32:33 +01:00
Carl Ritson
2e87ed80b2 [AMDGPU] WQM: Allow insertion of exact mode transition as terminator
Allow WQM pass to insert transitions to exact mode among block
terminators, instead of forcing them to occur before terminators.

This should not yield any functional change, but allows block
splitting of control flow, such as that in D145329.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D151797
2023-06-02 14:00:54 +09:00
Jay Foad
94e48d433d [AMDGPU] Switch to backwards scavenging in non-spill cases
When the scavenger is not allowed to spill, the only difference between
forward and backward should be the heuristics used to pick an available
register. Forwards scavenging tries to pick a register that can be used
again later in the BB; backwards scavenging tries to pick one that can
be used earlier.

Backwards scavenging is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D151323
2023-05-24 15:03:33 +01:00
Jay Foad
3c0d81d43f [AMDGPU] Simplify scavenging in indirectCopyToAGPR
This just makes it clearer that we do not want the scavenger to spill
here. NFCI.

Differential Revision: https://reviews.llvm.org/D150774
2023-05-22 11:55:26 +01:00
Jay Foad
64c938e8e3 [AMDGPU] Avoid RegScavenger::forward in copyPhysReg/indirectCopyToAGPR
RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150571
2023-05-16 15:51:31 +01:00
Jeffrey Byrnes
88149fb3f4 [AMDGPU][GFX908] IndirectCopyToAGPR: Confirm modified register is dst reg of accvgpr_write
IndirectCopyToAGPR should be reworked as to avoid optimizing during copy lowering. However, as it stands, the code is buggy. This patch replaces the call to definesRegister with modifiesRegister, and confirms that the dest reg of the found accvgpr_write is in fact the src reg of our copy.

Differential Revision: https://reviews.llvm.org/D149873

Change-Id: Id8a61659ac15565dcb970069d0624f0925a46e6d
2023-05-12 12:38:29 -07:00
skc7
e016fb57b3 [AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic.
Legalize soffset of buffer instructions using waterfall loop.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141030
2023-04-27 19:36:50 +05:30
Janek van Oirschot
124acb7ca3 [AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS
The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D149080
2023-04-26 14:10:25 +01:00
Kazu Hirata
972983539b [llvm] Apply fixes from readability-redundant-control-flow (NFC) 2023-04-16 00:13:46 -07:00
Diana Picus
b9ba05360e [AMDGPU] Don't S_MOV_B32 into $scc
The peephole optimizer tries to replace
```
%n:sgpr_32 = S_MOV_B32 x
$scc = COPY %n
```
with a `S_MOV_B32` directly into `$scc`.

This crashes because `S_MOV_B32` cannot take `$scc` as input.

We currently generate code like this from GlobalISel when lowering a
G_BRCOND with a constant condition. We should probably look into
removing this kind of branch altogether, but until then we should at
least not crash.

This patch fixes the issue by making sure we don't apply the peephole
optimization when trying to move into a physical register that
doesn't belong to the correct register class.

Differential Revision: https://reviews.llvm.org/D148117
2023-04-14 10:24:43 +02:00
skc7
b434051dc8 [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147168
2023-04-10 11:34:14 +05:30
Mateja Marjanovic
48f6964bcb [AMDGPU][GlobalISel] Add support for S_INDIRECT_REG_WRITE_MOVREL_B32_V[9|10|11|12] 2023-03-30 18:27:49 +02:00
Jay Foad
3e3594c771 [AMDGPU] Do not fix implicit vcc operand on INLINEASM
An INLINEASM can have an implicit def of vcc. It is not appropriate for
fixImplicitOperands to change this to vcc_lo on wave32.

Differential Revision: https://reviews.llvm.org/D147157
2023-03-29 20:23:36 +01:00
Mirko Brkusanin
2eada459c7 [AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16
Differential Revision: https://reviews.llvm.org/D145785
2023-03-10 14:47:49 +01:00
Jay Foad
08bdff862c [AMDGPU] Fix error message for illegal copy 2023-03-03 11:46:01 +00:00
ZHU Zijia
8fccdfa436 [AMDGPU] Remove outdated FIXME in comments [NFC]
This case has already been handled by D106449.
2023-03-03 01:34:19 +08:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contigous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
Piotr Sobczak
a3d7b3121c [AMDGPU][NFC] Add getMaxMUBUFImmOffset
Replace magic constant 4095 with the function getMaxMUBUFImmOffset().

Differential Revision: https://reviews.llvm.org/D144623
2023-02-23 11:29:59 +01:00