1270 Commits

Author SHA1 Message Date
Matt Arsenault
248fba0cd8 AMDGPU: Remove pointless setOperationAction for xint_to_fp
The legalize action for uint_to_fp/sint_to_fp uses the source integer
type, not the result FP type so setting an action on an FP type does
nothing.
2023-12-22 11:24:35 +07:00
Stanislav Mekhanoshin
e5c523e861
[AMDGPU] Produce better memoperand for LDS DMA (#75247)
1) It was marked as volatile. This is not needed and the only reason
   it was done is because it is both load and store and handled
   together with atomics. Global load to LDS was marked as volatile
   just because buffer load was done that way.
2) Preserve at least LDS (store) pointer which we always have with
   the intrinsics.
3) Use PoisonValue instead of nullptr for load memop as a Value.
2023-12-18 11:01:12 -08:00
Mariusz Sikora
414d27419f
[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-15 17:15:55 +01:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Mirko Brkušanin
07a6d73664
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493) 2023-12-15 15:01:40 +01:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Jay Foad
8005ee6da3
[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070) 2023-12-12 17:41:40 +00:00
Saiyedul Islam
777b6de7a4
[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339)
Regenerate a few llc tests to test for COV5 instead of the default ABI
version.
2023-12-12 14:35:13 +05:30
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Ruiling, Song
90681d3a41
AMDGPU: Return legal addressmode correctly for flat scratch (#71494) 2023-12-05 11:22:39 +08:00
Jeffrey Byrnes
1b02f594b3
[AMDGPU] Rework dot4 signedness checks (#68757)
Using the known/unknown value of the sign bit, reason about the signedness version of the dot4 instruction.
2023-11-30 13:38:05 -08:00
Stanislav Mekhanoshin
246b8eae20
[AMDGPU] Preserve existing alias info on LDS DMA intrinsics in MIR (#73870)
As of now it is NFCI.
2023-11-29 18:09:31 -08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Acim-Maravic
f3138524db
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
The are three different rounding intrinsics, that are brought down to
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
Jun Wang
54470176af
[AMDGPU] Add inreg support for SGPR arguments (#67182)
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.
---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-11-08 11:35:52 -08:00
Pierre van Houtryve
4428b01faa Reland: [AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-11-07 12:23:03 +01:00
Valery Pykhtin
fe6893b1d8
Improve selection of conditional branch on amdgcn.ballot!=0 condition in SelectionDAG. (#68714)
Improve selection of the following pattern:

bool cnd = ...
if (amdgcn.ballot(cnd) != 0) {
  ...
}

which means "execute _then_ if any lane has satisfied the _cnd_
condition".
2023-11-06 15:16:49 +01:00
Diana
7f5d59b38d
[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186)
The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call
parameters are bundled up into 2 intrinsic arguments, one for those that
should go in the SGPRs (the 3rd intrinsic argument), and one for those
that should go in the VGPRs (the 4th intrinsic argument). Both will
often be some kind of aggregate.

Both instruction selection frameworks have some internal representation
for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel,
ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those
because aggregates are dissolved very early on during ISel and we'd lose
the inreg information. Therefore, this patch shortcircuits both the
IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call
from the very start. It tries to use the existing infrastructure as much
as possible, by calling into the code for lowering tail calls.

This has already gone through a few rounds of review in Phab:

Differential Revision: https://reviews.llvm.org/D153761
2023-11-06 12:30:07 +01:00
Jay Foad
1590cac494
[AMDGPU] Implement moveToVALU for S_CSELECT_B64 (#70352)
moveToVALU previously only handled S_CSELECT_B64 in the trivial case
where it was semantically equivalent to a copy. Implement the general
case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of
V_CNDMASK_B64_PSEUDO with immediate as well as register operands.
2023-11-02 10:08:09 +00:00
Jay Foad
86f2e09250
[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960)
When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the
GlobalAddress operands have to be adjusted by 4 or 12 bytes to account
for the offset from the end of the S_GETPC instruction to the literal
operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of
duplicating the adjustment code in both AMDGPULegalizerInfo and
SITargetLowering. NFCI.
2023-11-01 19:48:30 +00:00
Changpeng Fang
8ceb72ffe5
[AMDGPU] make v32i16/v32f16 legal (#70484)
Some upcoming intrinsics will be using these new types
2023-10-27 15:28:31 -07:00
Craig Topper
c9ca2fe739
[AMDGPU] Fix gcc -Wparentheses warning. NFC (#70239) 2023-10-25 15:47:53 -07:00
Jay Foad
c82ebfb97a Revert "[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider"
This reverts commit ef33659492325de7871c8c85e35bd9c1c37f7347.

It was causing incorrect codegen for some Vulkan CTS tests.
2023-10-25 11:11:27 +01:00
Jie Fu
49893fbff3 [AMDGPU] Fix -Wunused-variable in SIISelLowering.cpp (NFC)
/llvm-project/llvm/lib/Target/AMDGPU/SIISelLowering.cpp:11257:7: error: unused variable 'VT' [-Werror,-Wunused-variable]
  EVT VT = N->getValueType(0);
      ^
1 error generated.
2023-10-24 09:07:18 +08:00
Jeffrey Byrnes
ef33659492 [AMDGPU] Accept arbitrary sized sources in CalculateByteProvider
This allows working with e.g. v8i8 / v16i8 sources.

It is generally useful, but is primarily beneficial when allowing e.g. v8i8s to be passed to branches directly through registers. As such, this is the first in a series of patches to enable that work. However, it effects https://reviews.llvm.org/D155995, so it has been implemented on top of that.

Differential Revision: https://reviews.llvm.org/D159036

Change-Id: Idfcb57dacd0c32cab040fe4dd4ac2ec762750664
2023-10-23 16:07:54 -07:00
alex-t
2973febe10
[AMDGPU] Force the third source operand of the MAI instructions to VGPR if no AGPRs are used. (#69720)
eaf85b9c28 "[AMDGPU] Select VGPR versions of MFMA if possible" prevents
the compiler from reserving AGPRs if a kernel has no inline asm
explicitly using AGPRs, no calls, and runs at least 2 waves with not
more than 256 VGPRs. This, in turn, makes it impossible to allocate AGPR
if necessary. As a result, regalloc fails in case we have an MAI
instruction that has at least one AGPR operand.
This change checks if we have AGPRs and forces operands to VGPR if we do
not have them.

---------

Co-authored-by: Alexander Timofeev <alexander.timofeev@amd.com>
2023-10-23 19:41:07 +02:00
Kazu Hirata
291d8ab3ed [llvm] Use llvm::find_if (NFC) 2023-10-20 00:03:19 -07:00
pvanhout
868abf0961 Revert "[AMDGPU] Remove Code Object V3 (#67118)"
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-18 12:55:36 +02:00
Pierre van Houtryve
cc3d2533cc
[AMDGPU] Add i1 mul patterns (#67291)
i1 muls can sometimes happen after SCEV. They resulted in ISel failures
because we were missing the patterns for them.

Solves SWDEV-423354
2023-10-16 16:18:27 +02:00
Pierre van Houtryve
544d91280c
[AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-16 08:21:48 +02:00
Jay Foad
b49a0dbaeb
[AMDGPU] Fix comments about afn and arcp in fast unsafe fdiv handling (#68982) 2023-10-13 19:23:53 +01:00
Thomas Symalla
aa5158cd1e
[AMDGPU] Use absolute relocations when compiling for AMDPAL and Mesa3D (#67791)
The primary ISA-independent justification for using PC-relative
addressing is that it makes code position-independent and therefore
allows sharing of .text pages between processes.

When not sharing .text pages, we can use absolute relocations instead,
which will possibly prevent a bubble introduced by s_getpc_b64.

Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
2023-10-10 09:22:02 +02:00
Simon Pilgrim
29b20829cc Fix Wunused-variable warning. NFC. 2023-10-09 10:26:01 +01:00
Jeffrey Byrnes
7794e16b49 [AMDGPU]: Allow combining into v_dot4
Differential Revision: https://reviews.llvm.org/D155995

Change-Id: Id15d232629a32a3549b13d47bf84d7a61b28b928
2023-10-04 13:31:36 -07:00
Austin Kerbow
0455596e1e [AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading here begins from the start of the kernel arguments until the
amount of arguments indicated by the CL flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care for the alignment of the kernarg segment is needed as well
as consideration of the alignment of addressable SGPR tuples when we
cannot directly use misaligned large tuples that the arguments are
loaded to.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D158579
2023-09-25 09:32:59 -07:00
Nick Desaulniers
330fa7d2a4
[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057)
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair
amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.

That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint

Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.

This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
2023-09-25 08:53:03 -07:00
Simon Pilgrim
142efd6d61 [AMDGPU] Add ISD::FSHR Handling to AMDGPUISD::PERM matching
Pulled out of D159533, which encourages (zext (trunc x)) -> x folds, leading to more ISD::FSHR nodes, which was breaking some existing AMDGPUISD::PERM tests

Differential Revision: https://reviews.llvm.org/D159533
2023-09-24 13:40:07 +01:00
Simon Pilgrim
6e3827af98 [AMDGPU] Create matchPERM helper from performOrCombine PERM matching code.
Pulled out as NFC(ish) pre-commit from D159533
2023-09-22 16:21:28 +01:00
Ivan Kosarev
bea56b0bc0 [AMDGPU] Have a subtarget feature to control use of real True16 instructions.
Real True16 instructions are as they are defined in the ISA. Fake True16
instructions are identical to real ones except that they take 32-bit
registers as operands and always use their low halves.

Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D156100
2023-09-22 10:47:13 +01:00
Pierre van Houtryve
e9e3868707
[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346)
Addresses the FIXME for both DAGISel and GISel.
2023-09-15 08:11:01 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Pierre van Houtryve
3d0353793b
[AMDGPU] Fix HasFP32Denormals check in FDIV32 lowering (#66212)
Fixes SWDEV-403219
2023-09-14 08:47:10 +02:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Jeffrey Byrnes
db47264ab3
Revert "[AMDGPU]: Allow combining into v_dot4" (#66158)
This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.
2023-09-12 16:57:17 -07:00
Matt Arsenault
72a7024add AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.

Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation ordering
problems from AMDGPUCodeGenPrepare using forward iteration instead of
a well behaved combiner.

https://reviews.llvm.org/D158129
2023-09-12 23:22:54 +03:00
Kazu Hirata
0bb49afeaf [AMDGPU] Fix an unused variable warning
This patch fixes:

  llvm/lib/Target/AMDGPU/SIISelLowering.cpp:2493:33: error: unused
  variable 'UserSGPRInfo' [-Werror,-Wunused-variable]
2023-09-12 12:06:36 -07:00