437 Commits

Author SHA1 Message Date
Changpeng Fang
26b0bef192
AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761)
Use GCNPat instead of Custom Lowering to select instructions for
intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as
a predicate to select only when the rounding mode is supported.
"as_hw_round_mode : SDNodeXForm" is developed to translate the round
modes to the corresponding ones that hardware recognizes.
2024-08-29 11:43:58 -07:00
Juan Manuel Martinez Caamaño
cbf34a5f77
[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645) 2024-08-23 14:06:17 +02:00
Brox Chen
afd42fb303
[AMDGPU][True16][CodeGen] Support AND/OR/XOR and LDEXP True16 format (#102620)
Support AND/OR/XOR true16 and LDEXP true/fake16 format.

These instructions are previously implemented with fake16 profile.
Fixing the implementation.

Added a RA hint so that when using 16bit register in a 32bit
instruction, try to use the register directly without an extra 16bit
move

---------

Co-authored-by: guochen2 <guochen2@amd.com>
2024-08-13 12:23:39 -04:00
Simon Pilgrim
11ba72e651
[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468) 2024-08-12 10:21:28 +01:00
Petar Avramovic
269cefbc02
AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (#102130)
isExtractHiElt should return new source register instead of returning
instruction that defines it. Src = MI.getOperand(0).getReg() is not
correct when MI(for example G_UNMERGE_VALUES) defines multiple registers.
Refactor existing code to work with source registers only.
2024-08-07 12:13:39 +02:00
Matt Arsenault
42d641ef5c
AMDGPU/GlobalISel: Select all constants in tablegen (#100788)
This regresses the arbitrary address space pointer case. Ideally
we could write a pattern that matches a pointer based only on
its size, but using iPTR/iPTRAny seem to not work for this.
2024-07-30 18:31:18 +04:00
Matt Arsenault
b356aa3e2d
AMDGPU/GlobalISel: Partially move constant selection to patterns (#100786)
This is still relying on the manual code for splitting 64-bit
constants, and handling pointers.

We were missing some of the tablegen patterns for all immediate types,
so this has some side effect DAG path improvements. This also reduces
the diff in the 2 selector outputs.
2024-07-30 18:18:16 +04:00
Matt Arsenault
3113bfbfa7
AMDGPU/GlobalISel: Use getSubRegFromChannel (#100732) 2024-07-26 18:35:35 +04:00
Carl Ritson
62aa596ba1
[AMDGPU] Add no return image_sample intrinsics and instructions (#97542)
An appropriately configured image resource descriptor can trigger
image_sample instructions to store outputs directly to a linked memory
location instead of returning to VGPRs.

This is opaque to the backend as instruction encoding is unchanged;
however, a mechanism is require to allow frontends to communicate that
these instructions do not require destination VGPRs and store to memory.
Flagging these as stores means they will not be optimized away.
2024-07-20 17:26:58 +09:00
Jay Foad
0ce3ea1bff
[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345) 2024-07-18 07:45:13 +01:00
Matt Arsenault
2ff22d7485 AMDGPU/GlobalISel: Reorganize select switch cases 2024-06-30 10:28:58 +02:00
Jay Foad
bf536cc7db
[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190)
Mark both the intrinsic and the selected MachineInstr as having side
effects to prevent MachineLICM and MachineCSE from moving/removing them.
2024-06-27 09:27:52 +01:00
vangthao95
3aef525aa4
[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165)
For unbuffered smem loads, it is illegal for the immediate offset to be
negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is
negative.

New PR of https://github.com/llvm/llvm-project/pull/79553.
2024-06-24 14:18:23 -07:00
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00
Matt Arsenault
08d168c56d AMDGPU/GlobalISel: Use correct type for intrinsic ID 2024-05-30 14:31:19 +02:00
Jay Foad
4e0f8a4919 [AMDGPU] Fix EXPENSIVE_CHECKS failure in #89612 2024-05-23 12:22:25 +01:00
Jay Foad
990bed64fb
[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id (#89612)
This provides access to the special scalar source value
SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.
2024-05-22 19:47:59 +01:00
Jay Foad
6eb9e214b3
RFC: [AMDGPU] Check subtarget features for consistency (#86957)
Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to
check subtarget features for consistency and diagnose any
inconsistencies. To start with, the implementation just checks that
either wavefrontsize32 or wavefrontsize64 is selected.

checkSubtargetFeatures is called at the start of instruction selection.
This is pretty arbitrary. It is just a convenient point at which we have
access to the subtarget that we're going to use for codegenning a
particular function.
2024-05-09 11:37:28 +01:00
David Stuttard
75e528fdd9
[AMDGPU] Extend zero initialization of return values for TFE (#85759)
buffer_load instructions that use TFE also need to zero initialize
return values similar to how the image instructions currently work. Add
support for this with standard zero init of all results + zero init of
just TFE flag when enable-prt-strict-null subtarget feature is disabled.
2024-03-25 09:01:46 +00:00
Noah Goldstein
61c06775c9 [KnownBits] Add API for nuw flag in computeForAddSub; NFC 2024-03-05 12:59:58 -06:00
Petar Avramovic
0d572c41f9
AMDGPU\GlobalISel: remove amdgpu-global-isel-risky-select flag (#83426)
AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs.
Remove MIR test that attempts to inst-select divergent vcc(S1) G_PHI.
Lane mask merging algorithm for GlobalISel is now responsible for
selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering.
Uniform S1 G_PHIs should be lowered to S32 G_PHIs in reg bank select
pass. In summary S1 G_PHIs should not reach AMDGPUInstructionSelector.
2024-02-29 15:38:54 +01:00
Martin Wehking
4bf06c16fc
Initialize unsigned integer when declared (#81894)
Initialize ModOpcode directly before the loop execution to silence
static analyzer warnings about the usage of an uninitialized variable.

This leads to a redundant assignment of ElV2F16 inside the first loop
execution, but also avoids superfluous emptiness checks of EltsV2F16
after the first execution of the loop.
2024-02-25 18:26:12 +05:30
Shilei Tian
9c6a2de24b
[AMDGPU] Clean up functions for checking inline literals (#81282)
This patch cleans up functions for checking inline literals.
2024-02-15 12:11:51 -05:00
Petar Avramovic
06f711a906
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#80003)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.

TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-02-05 14:07:01 +01:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Petar Avramovic
c46109d0d7
Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274)
Reverts llvm/llvm-project#78482
2024-01-24 12:18:34 +01:00
Petar Avramovic
91ddcba83a
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.

TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-01-24 11:58:32 +01:00
Jay Foad
ea9d75aa2a [AMDGPU] Misc formatting fixes. NFC. 2024-01-19 13:50:26 +00:00
Jay Foad
c111dc72e9
[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193)
https://github.com/llvm/llvm-project/pull/70634 has disabled use
of potentially negative scratch offsets, but we can use it on GFX12.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-01-18 10:02:40 +00:00
Petar Avramovic
90bdf76fdb
Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#78468)
Reverts llvm/llvm-project#76145
2024-01-17 17:41:19 +01:00
Petar Avramovic
1fbf533286
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#76145)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.

TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-01-17 12:10:24 +01:00
Krzysztof Drewniak
88871784fd
[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847)
In order to ensure the correctness of ptr addrspace(7) lowering, we need
a backwards-compatible way to flag buffer intrinsics as volatile that
can't be dropped (unlike metadata).

To acheive this in a backwards-compatible way, we use bit 31 of the
auxilliary immediates of buffer intrinsics as the volatile flag. When
this bit is set, the MachineMemOperand for said intrinsic is marked
volatile. Existing code will ensure that this results in the appropriate
use of flags like glc and dlc.

This commit also harmorizes the handling of the auxilliary immediate for
atomic intrinsics, which new go through extract_cpol like loads and
stores, which masks off the volatile bit.
2024-01-12 11:20:01 -06:00
Mirko Brkušanin
2adbf254a1
[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785)
This is used to select the source modifier (neg) from the immediate
operand. After a follow up commit this will no longer be DOTIU specific.

Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
2024-01-12 10:57:24 +01:00
Stanislav Mekhanoshin
3bcee8568a
[AMDGPU] Global ISel for llvm.prefetch (#76183) 2024-01-03 01:02:55 -08:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Fangrui Song
9535e011c7 [AMDGPU] Fix -Wc++98-compat-extra-semi after c1511a65d5c09f7cff15feba91ce9bda23d74b6e 2023-11-28 21:45:09 -08:00
Ruiling, Song
c1511a65d5
[AMDGPU] Folding imm offset in more cases for scratch access (#70634)
For scratch load/store, our hardware only accept non-negative value in
SGPR/VGPR. Besides the case that we can prove from known bits, we can
also prove that the value in `base` will be non-negative: 1.) When the
ADD for the address calculation has NonUnsignedWrap flag. 2.) When the
immediate offset is already negative.
2023-11-29 12:46:45 +08:00
Acim Maravic
01c1c7a19e
[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302)
getBaseWithConstantOffset() is used for scalar and non-scalar buffer
loads. Diffrence between s_load and load instruction is that s_load
instruction extends 32-bit offset to 64-bits, so a 32-bit (address +
offset) should not cause unsigned 32-bit integer wraparound, because it
performs addition in 64-bits.
2023-11-14 19:06:45 +01:00
Stanislav Mekhanoshin
fe8335babb
[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395)
This allows folding of 64-bit operands if fit into 32-bit. Fixes
https://github.com/llvm/llvm-project/issues/67781
2023-10-30 08:12:28 -07:00
Mirko Brkušanin
ecfdc23dd2
[AMDGPU] Select gfx1150 SALU Float instructions (#66885) 2023-09-21 12:22:55 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Matt Arsenault
16bc07ac91 AMDGPU: Select f64 fmul by negative power of 2 to ldexp
Select fmul x, -K -> ldexp(-x, log2(fabsK))
Select fmul fabs(x), -K -> ldexp(-|x|, log2(fabsK))

https://reviews.llvm.org/D158173
2023-08-23 20:36:01 -04:00
Matt Arsenault
1030483561 AMDGPU/GlobalISel: Handle stacksave/stackrestore
https://reviews.llvm.org/D156670
2023-08-11 10:25:01 -04:00
Matt Arsenault
29fff3e2ab AMDGPU: Try to select fmul by power of 2 to ldexp
For the f64 case, this gives us a cheaper to materialize 32-bit
constant. It's less obviously a win for f32 and f16. It forces us to
use a VOP3 encoding so it's a neutral code size change.

GlobalISel cases don't work because of the constant-is-copy-to-vgpr
problem.

https://reviews.llvm.org/D157111
2023-08-11 07:57:55 -04:00
Matt Arsenault
4f851361e4 AMDGPU: Remove extra parentheses 2023-08-04 17:40:54 -04:00
Jay Foad
c2093b8504 [AMDGPU] Add target features for GDS and GWS
GFX9 subtargets from GFX90A onwards lack GDS but still have GWS.

Differential Revision: https://reviews.llvm.org/D156713
2023-08-02 09:02:07 +01:00
Matt Arsenault
0aa439d502 AMDGPU/GlobalISel: Use SGPR results for G_AMDGPU_WAVE_ADDRESS 2023-07-31 19:16:11 -04:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30