563 Commits

Author SHA1 Message Date
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Petar Avramovic
c46109d0d7
Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274)
Reverts llvm/llvm-project#78482
2024-01-24 12:18:34 +01:00
Petar Avramovic
91ddcba83a
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.

TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-01-24 11:58:32 +01:00
Jay Foad
ea9d75aa2a [AMDGPU] Misc formatting fixes. NFC. 2024-01-19 13:50:26 +00:00
Jay Foad
c111dc72e9
[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193)
https://github.com/llvm/llvm-project/pull/70634 has disabled use
of potentially negative scratch offsets, but we can use it on GFX12.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-01-18 10:02:40 +00:00
Petar Avramovic
90bdf76fdb
Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#78468)
Reverts llvm/llvm-project#76145
2024-01-17 17:41:19 +01:00
Petar Avramovic
1fbf533286
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#76145)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.

TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-01-17 12:10:24 +01:00
Krzysztof Drewniak
88871784fd
[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847)
In order to ensure the correctness of ptr addrspace(7) lowering, we need
a backwards-compatible way to flag buffer intrinsics as volatile that
can't be dropped (unlike metadata).

To acheive this in a backwards-compatible way, we use bit 31 of the
auxilliary immediates of buffer intrinsics as the volatile flag. When
this bit is set, the MachineMemOperand for said intrinsic is marked
volatile. Existing code will ensure that this results in the appropriate
use of flags like glc and dlc.

This commit also harmorizes the handling of the auxilliary immediate for
atomic intrinsics, which new go through extract_cpol like loads and
stores, which masks off the volatile bit.
2024-01-12 11:20:01 -06:00
Mirko Brkušanin
2adbf254a1
[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785)
This is used to select the source modifier (neg) from the immediate
operand. After a follow up commit this will no longer be DOTIU specific.

Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
2024-01-12 10:57:24 +01:00
Stanislav Mekhanoshin
3bcee8568a
[AMDGPU] Global ISel for llvm.prefetch (#76183) 2024-01-03 01:02:55 -08:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Fangrui Song
9535e011c7 [AMDGPU] Fix -Wc++98-compat-extra-semi after c1511a65d5c09f7cff15feba91ce9bda23d74b6e 2023-11-28 21:45:09 -08:00
Ruiling, Song
c1511a65d5
[AMDGPU] Folding imm offset in more cases for scratch access (#70634)
For scratch load/store, our hardware only accept non-negative value in
SGPR/VGPR. Besides the case that we can prove from known bits, we can
also prove that the value in `base` will be non-negative: 1.) When the
ADD for the address calculation has NonUnsignedWrap flag. 2.) When the
immediate offset is already negative.
2023-11-29 12:46:45 +08:00
Acim Maravic
01c1c7a19e
[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302)
getBaseWithConstantOffset() is used for scalar and non-scalar buffer
loads. Diffrence between s_load and load instruction is that s_load
instruction extends 32-bit offset to 64-bits, so a 32-bit (address +
offset) should not cause unsigned 32-bit integer wraparound, because it
performs addition in 64-bits.
2023-11-14 19:06:45 +01:00
Stanislav Mekhanoshin
fe8335babb
[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395)
This allows folding of 64-bit operands if fit into 32-bit. Fixes
https://github.com/llvm/llvm-project/issues/67781
2023-10-30 08:12:28 -07:00
Mirko Brkušanin
ecfdc23dd2
[AMDGPU] Select gfx1150 SALU Float instructions (#66885) 2023-09-21 12:22:55 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Matt Arsenault
16bc07ac91 AMDGPU: Select f64 fmul by negative power of 2 to ldexp
Select fmul x, -K -> ldexp(-x, log2(fabsK))
Select fmul fabs(x), -K -> ldexp(-|x|, log2(fabsK))

https://reviews.llvm.org/D158173
2023-08-23 20:36:01 -04:00
Matt Arsenault
1030483561 AMDGPU/GlobalISel: Handle stacksave/stackrestore
https://reviews.llvm.org/D156670
2023-08-11 10:25:01 -04:00
Matt Arsenault
29fff3e2ab AMDGPU: Try to select fmul by power of 2 to ldexp
For the f64 case, this gives us a cheaper to materialize 32-bit
constant. It's less obviously a win for f32 and f16. It forces us to
use a VOP3 encoding so it's a neutral code size change.

GlobalISel cases don't work because of the constant-is-copy-to-vgpr
problem.

https://reviews.llvm.org/D157111
2023-08-11 07:57:55 -04:00
Matt Arsenault
4f851361e4 AMDGPU: Remove extra parentheses 2023-08-04 17:40:54 -04:00
Jay Foad
c2093b8504 [AMDGPU] Add target features for GDS and GWS
GFX9 subtargets from GFX90A onwards lack GDS but still have GWS.

Differential Revision: https://reviews.llvm.org/D156713
2023-08-02 09:02:07 +01:00
Matt Arsenault
0aa439d502 AMDGPU/GlobalISel: Use SGPR results for G_AMDGPU_WAVE_ADDRESS 2023-07-31 19:16:11 -04:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
Matt Arsenault
3240ae7034 AMDGPU/GlobalISel: Set dead on scc on manually selected instructions
In SelectionDAG InstrEmitter automatically puts dead flags on unused
physreg defs everywhere. The generated selectors should also set dead
on physreg defs that were not used in the pattern.
2023-07-28 14:14:06 -04:00
Sameer Sahasrabuddhe
7c760b224b Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556

This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
2023-07-27 14:49:17 +05:30
Sameer Sahasrabuddhe
d0f7850b01 Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.

The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe
baa3386edb [GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556
2023-07-27 10:00:45 +05:30
Matt Arsenault
fb54afd1b7 AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers
This isn't always folded to fneg for a freestanding fsub depending on
the denormal mode. When matching source modifiers, we're implicitly
canonicalizing the input so we can fold it here.

Doesn't bother handling the VOP3P case since it's only relevant with
DAZ, which nobody really uses with f16.

For f64, tests show an existing bug where DAGCombiner tries to respect
the denormal mode for fsub -0, x, but not after it's lowered to fadd
-0, (fneg x). Either the fold is wrong or we shouldn't restrict the
fsub case based on the denormal mode.

https://reviews.llvm.org/D155652
2023-07-20 19:29:40 -04:00
pvanhout
07c5920487 Reland "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64"
This time without the extra `->dump()`

A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.

The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.

Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155050
2023-07-13 15:58:48 +02:00
pvanhout
aec971adec Revert "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64"
This reverts commit cfa2d0a3aa0beb5422107dc9943cb0eae6d93896.
2023-07-13 15:52:27 +02:00
pvanhout
cfa2d0a3aa [AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64
A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.

The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.

Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155050
2023-07-13 15:20:58 +02:00
pvanhout
3c30179e98 [GlobalISel] Rename KnownBits field of InstructionSelector
`KnownBits` is also a type name. Having a field with this name
prevents derived classes from using the `KnownBits` type unless they use `struct KnownBits`.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D155082
2023-07-12 15:28:11 +02:00
pvanhout
8444038d16 [AMDGPU] Use GlobalISel MatchTable Combiner Backend
Use the new matchtable-based combiner backend for all AMDGPU combiners.
This drop-in from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one.

Depends on D153757

NOTE: This would land iff D153757 (RFC) lands too.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153758
2023-07-11 11:27:13 +02:00
pvanhout
1fe7d9c799 [GlobalISel] Generalize InstructionSelector Match Tables
Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner.

Some notes:
 - Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now.
 - `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific.
 - Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153755
2023-07-11 09:42:30 +02:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Kazu Hirata
aa144fbeaf [AMDGPU] Fix warnings
This patch fixes warnings like:

  llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h:711: warning:
  enumerated and non-enumerated type in conditional expression
2023-05-19 10:56:23 -07:00
Joe Nash
05d04a0180 [AMDGPU] NFC. Refactor GISel for cmp intrinsics
Combine the logic for fcmp and icmp intrinsics and use operand presence
instead.

Reviewed By: kosarev, foad

Differential Revision: https://reviews.llvm.org/D148716
2023-04-19 11:33:47 -04:00
Jay Foad
6b5067a81a [AMDGPU] Don't assert that image intrinsics are supported
Unsupported intrinsics should give a regular "cannot select" error.

Differential Revision: https://reviews.llvm.org/D148147
2023-04-16 19:54:55 +01:00
Chen Zheng
a3d5ec51ba [AMDGPU][Global-ISel] reuse extension related patterns in td file
However the imported rules can not be used for now because Global ISel
selectImpl() seems has some bug/limitation to create a illegl COPY
from VGPR to SGPR. So currently workaround this by not auto selecting these
patterns.

Fixes #61468

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147780
2023-04-10 02:11:33 +00:00
Jessica Del
04317d4da7 [AMDGPU][GISel] Add inverse ballot intrinsic
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.

Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG required pseudo instructions to use the custom inserter.

The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D146287
2023-04-06 07:46:50 +02:00
Jay Foad
c75e266d31 [AMDGPU] Remove two unused ComplexRendererFns
These were left over after https://reviews.llvm.org/D98663
2023-03-30 10:44:45 +01:00
Petar Avramovic
ded69779be Fix SGPR + VGPR + offset Scratch offset folding
Values in SGPR and VGPR register are treated as unsigned by hardware.

When value in 32-bit SGPR or VGPR base can be negative calculate offset
using 32-bit add instructions, otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.

LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in SGPR or VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144957
2023-03-09 10:53:41 +01:00
Petar Avramovic
3ae310d0ae Fix VGPR + offset Scratch offset folding
Values in VGPR register are treated as unsigned by hardware.

When value in 32-bit VGPR base can be negative calculate offset using
32-bit add instruction, otherwise use vgpr base(unsigned) + offset.
Does not affect case where whole offset comes from VGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144956
2023-03-09 10:52:44 +01:00
Petar Avramovic
5e56d59999 Fix SGPR + offset Scratch offset folding
Values in SGPR register are treated as unsigned by hardware.

When value in 32-bit SGPR base can be negative calculate offset using
32-bit add instruction, otherwise use sgpr base(unsigned) + offset.
Does not affect case where whole offset comes from SGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in SGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144955
2023-03-09 10:52:44 +01:00
Justin Bogner
c083c89744 [AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC
The matching for V_FMA_MIX was partially implemented with a C++
matcher (for fmas with 32 bit results and 16 bit inputs) and partially
in tablegen (for fmas with 16 bit results). Move the C++ matcher logic
into tablegen to make this more consistent and so we can remove the
duplication between SDAG and GISel.

Differential Revision: https://reviews.llvm.org/D144612
2023-02-23 10:23:34 -08:00
Jay Foad
dcb834843e [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.

Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 16:38:15 +00:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contigous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00