438 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
c6e560a25b
[AMDGPU] Select scale_offset for scratch instructions on gfx1250 (#150111) 2025-07-22 15:24:55 -07:00
Stanislav Mekhanoshin
a0973de745
[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107)
Also switches immediate offset to signed for the subtarget.
2025-07-22 15:04:52 -07:00
Stanislav Mekhanoshin
a0aebb1935
[AMDGPU] Select scale_offset with SMEM instructions (#150078) 2025-07-22 13:26:28 -07:00
Chris Jackson
b3e016e05f
Revert "[AMDGPU] Recognise bitmask operations as srcmods" (#150000)
Reverts llvm/llvm-project#149110 due to various buildbot failures.
2025-07-22 12:16:03 +01:00
Chris Jackson
c51b48be47
[AMDGPU] Recognise bitmask operations as srcmods on integer types (#149110)
Add to the VOP patterns to recognise when or/xor/and are masking only
the most significant bit of i32/v2i32/i64 and replace with the appropriate source modifier.
2025-07-22 11:35:17 +01:00
Stanislav Mekhanoshin
cfa918bec1
[AMDGPU] Select flat GVS atomics on gfx1250 (#149554) 2025-07-18 12:31:29 -07:00
Changpeng Fang
868793fa8e
AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 15:25:05 -07:00
Stanislav Mekhanoshin
a32040e483
[AMDGPU] Use 64-bit literals in codegen on gfx1250 (#148727) 2025-07-14 15:47:24 -07:00
Kazu Hirata
f1791c0ae3
[AMDGPU] Remove unnecessary casts (NFC) (#148340)
getRegisterInfo() already returns const SIRegisterInfo *.

Likewise, getInstrInfo() already returns const SIInstrInfo *.
2025-07-12 11:28:41 -07:00
Fabian Ritter
a1edb1dbc6
[AMDGPU] Fix broken uses of isLegalFLATOffset and splitFlatOffset (#147469)
The last parameter of these functions used to be `Signed`, and it looks
like a few calls weren't updated when that was changed to `FlatVariant`.
Effectively, the functions were called with `FlatVariant=SALU` due to
integer promotions, which doesn't make any sense.
2025-07-08 11:18:36 +02:00
Matt Arsenault
dd4776d429
AMDGPU: Remove AMDGPUInstrInfo class (#144984)
This was never constructed and only provided one static helper
function.
2025-06-20 18:26:56 +09:00
Brox Chen
cd6c4b6103
[AMDGPU][True16][CodeGen] optimize codegen for mad-mix in true16 (#124995)
remove unnecessary COPY for SDAG for mad-mix pattern
2025-05-05 23:08:03 -04:00
Kazu Hirata
aa15596b5f
[llvm] Remove unused local variables (NFC) (#138478) 2025-05-04 21:33:54 -07:00
Mariusz Sikora
6b47bba440
[AMDGPU] Add intrinsics and MIs for ds_bvh_stack_* (#130007)
New intrinsics / instructions :
int_amdgcn_ds_bvh_stack_push4_pop1_rtn / ds_bvh_stack_push4_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop1_rtn / ds_bvh_stack_push8_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop2_rtn / ds_bvh_stack_push8_pop2_rtn_b64

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-17 09:13:20 +01:00
Shilei Tian
dccc0a836c
[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379)
This is an extension of #131357. Hopefully this would be the last one.
2025-03-14 17:02:15 -04:00
Shilei Tian
51c706c119
[NFC][AMDGPU] Replace direct arch comparison with isAMDGCN() (#131357) 2025-03-14 14:21:44 -04:00
Matt Arsenault
e28e93550a
AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (#123684)
For VALU shuffles, this saves an instruction in some case.
2025-01-23 20:58:02 +07:00
Jay Foad
f33e3d422d
[AMDGPU] Fix DAG types for V_MAD_I64_I32 and V_MAD_U64_U32. NFC. (#123629)
These instructions return a 64-bit result and a 1-bit carry, unlike
smul_lohi and umul_lohi which return a pair of 32-bit results.

This does not appear to make any difference in practice because the DAG
types are not used for anything before these nodes are converted to
MachineInstrs.
2025-01-20 16:29:23 +00:00
jofrn
c8bbbaa5c7
[SelectionDAG][AMDGPU] Negative offset when selecting scratch sv offsets (#122251)
APInt will fail when given a negative offset. SelectScratchSVAddr
utilizes this function and can be given a negative offset as well, so
this change modifies it to use APSInt instead.
2025-01-15 06:56:28 -05:00
Changpeng Fang
68694259b2
AMDGPU: Use getSignedTargetConstant for ImmOffset in SelectScratchSVAddr (#121978)
ImmOffset is signed and we will hit an assert with negative ImmOffset
when getTargetConstant is used.

Fixes: SWDEV-506453
2025-01-07 12:02:18 -08:00
Konstantina Mitropoulou
d3508ccd15
[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588)
- **[AMDGPU] Add new test.**
- **[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.**

---------

Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>
2024-12-19 11:20:43 -08:00
Craig Topper
f139bde8d8
[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536)
This allows us to write more range based for loops because we no
longer need the iterator. It also matches IR's Use class.
2024-12-19 09:07:42 -08:00
Craig Topper
e6b2495545
[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531)
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.

We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.

Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
2024-12-19 08:35:32 -08:00
Jay Foad
a161e73fcc [AMDGPU] Remove unnecessary casts to GCNSubtarget 2024-12-19 15:50:53 +00:00
Matt Arsenault
b4a16a78c2
AMDGPU: Match and Select BITOP3 on gfx950 (#117843)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-11-27 01:31:19 -05:00
Nikita Popov
3317c9ceac
[AMDGPU] Use getSignedConstant() where necessary (#117328)
Create signed constant using getSignedConstant(), to avoid future
assertion failures when we disable implicit truncation in getConstant().

This also touches some generic legalization code, which apparently only
AMDGPU tests.
2024-11-25 09:49:34 +01:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Changpeng Fang
e3e7c756fb
AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" (#116115)
It is not correct to lower "x&(-1>>(32-y))" to "bfe x, 0, y". When y
equals 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe
x, 0, 32" is 0. However, if we know y is at most of 5 bits (< 32), we
can still do the pattern matching.
2024-11-14 12:21:34 -08:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Changpeng Fang
9778fc76e3
Revert "AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372)" (#116091)
Based on the suggestion from
https://github.com/llvm/llvm-project/pull/115543, we should not do the
pattern matching from x << (32-y) >> (32-y) to "bfe x, 0, y" at all.
This reverts commits a2bacf8ab58af4c1a0247026ea131443d6066602 and
bdf8e308b7.
2024-11-13 14:01:21 -08:00
Changpeng Fang
bdf8e308b7
AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372)
Enable pattern matching from "x<<32-y>>32-y" to "bfe x, 0, y" when we
know y is in [0,31].
This is the follow-up for the PR:
https://github.com/llvm/llvm-project/pull/114279 to fix the issue:
https://github.com/llvm/llvm-project/issues/114282
2024-11-07 14:15:33 -08:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Petar Avramovic
83fe85115d
AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (#110256)
Use i32 for offset instead of i16, this way it does not get interpreted
as negative 16 bit offset.
2024-09-30 10:44:59 +02:00
Jay Foad
73b8074e68
[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414) 2024-09-20 13:45:04 +01:00
Nikita Popov
cee0bf9626
[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413) 2024-09-20 14:35:38 +02:00
Diana Picus
3356208531
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
7792b4ae79.

The problem was a conflict with
e55d6f5ea2
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.
2024-09-13 11:54:30 +02:00
Diana Picus
7792b4ae79
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
2024-09-12 10:12:09 +02:00
Diana Picus
703ebca869
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
c7a7767fca.

The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
2024-09-12 09:11:41 +02:00
Vitaly Buka
c7a7767fca
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.

Reverts llvm/llvm-project#105822
2024-09-10 09:51:43 -07:00
Diana Picus
44556e64f2
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
2024-09-10 13:24:53 +02:00
Juan Manuel Martinez Caamaño
cbf34a5f77
[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645) 2024-08-23 14:06:17 +02:00
Simon Pilgrim
11ba72e651
[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468) 2024-08-12 10:21:28 +01:00
Matt Arsenault
dd094b2647
NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645)
This was much more difficult than I anticipated. The pass is
not in a good state, with poor test coverage. The legacy PM
does seem to be relying on maintaining the map state between
different SCCs, which seems bad. The pass is going out of its
way to avoid putting the attributes it introduces onto non-callee
functions. If it just added them, we could use them directly
instead of relying on the map, I would think.

The NewPM path uses a ModulePass; I'm not sure if we should be
using CGSCC here but there seems to be some missing infrastructure
to support backend defined ones.
2024-08-11 15:11:10 +04:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Ivan Kosarev
430cf6537b
[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560)
Being of type i8 makes them signed, which they aren't, and requires
extra work masking them on verbalisation.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-07-25 14:32:19 +01:00
Jay Foad
0ce3ea1bff
[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345) 2024-07-18 07:45:13 +01:00
Jay Foad
bf536cc7db
[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190)
Mark both the intrinsic and the selected MachineInstr as having side
effects to prevent MachineLICM and MachineCSE from moving/removing them.
2024-06-27 09:27:52 +01:00
vangthao95
3aef525aa4
[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165)
For unbuffered smem loads, it is illegal for the immediate offset to be
negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is
negative.

New PR of https://github.com/llvm/llvm-project/pull/79553.
2024-06-24 14:18:23 -07:00
Jay Foad
90779fdc19
[AMDGPU] Preserve chain when selecting llvm.amdgcn.pops.exiting.wave.id (#96167)
Without this SelectionDAG could fail assertions when using the intrinsic
in a non-entry BB.
2024-06-20 12:30:34 +01:00
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00