8310 Commits

Author SHA1 Message Date
Yeaseen
96c723374a
[llvm] Remove br i1 undef from some llvm/test/CodeGen tests (#128272) 2025-02-23 09:23:33 +00:00
Matt Arsenault
ccad5e7744
AMDGPU: Respect amdgpu-no-agpr in functions and with calls (#128147)
Remove the MIR scan to detect whether AGPRs are used or not,
and the special case for callable functions. This behavior was
confusing, and not overridable. The amdgpu-no-agpr attribute was
intended to avoid this imprecise heuristic for how many AGPRs to
allocate. It was also too confusing to make this interact with
the pending amdgpu-num-agpr replacement for amdgpu-no-agpr.

Also adds an xfail-ish test for a case I ran into, where the register
allocator asserts after allocation fails.

Future work should reintroduce a more refined MIR scan to estimate
AGPR pressure for how to split AGPRs and VGPRs.
2025-02-23 09:00:37 +07:00
Matt Arsenault
1bb43068f1
PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052)
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle
composing subregister extracts through reg_sequence.
2025-02-22 09:16:14 +07:00
Brox Chen
61c6e0061c
[AMDGPU][True16][CodeGen] flat/global/scratch load/store pseudo for true16 (#127945)
The T16D16 table is implemented in
https://github.com/llvm/llvm-project/pull/127673

This is a follow-up patch that adds load/store pseudos for:
flat_store
global_load/global_store
scratch_load/scratch_store

in true16 mode and updates the codegen test file.
2025-02-21 17:06:48 -05:00
Brox Chen
c896f7bdaa
[AMDGPU][True16][CodeGen] build_vector pattern in true16 (#118904)
build_vector pattern in true16 SDAG
2025-02-21 14:02:12 -05:00
Matt Arsenault
0c50054820
Revert "RegAlloc: Fix verifier error after failed allocation (#119690)"
This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5.

A different set of verifier errors appears after other regalloc failure
tests with EXPENSIVE_CHECKS.
2025-02-22 00:23:21 +07:00
Matt Arsenault
34167f9966
RegAlloc: Fix verifier error after failed allocation (#119690)
In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616
2025-02-21 22:11:51 +07:00
Akshat Oke
bd16a87d05
[AMDGPU][NewPM] Port SIPostRABundler to NPM (#123717) 2025-02-21 16:05:58 +05:30
Matt Arsenault
cc46d00a86
AMDGPU: Form v2f16 minimum3/maximum3 on gfx950 (#128123) 2025-02-21 12:11:51 +07:00
Matt Arsenault
e729dc759d
AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121)
Unfortunately we only have the vector versions of v2f16 minimum3
and maximum3. Widen to v2f16 so we can lower as minimum3(x, y, y).
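
The lowering relies on the identity minimum3(x, y, y) == minimum(x, y).
A small Python model of that identity follows. It assumes IEEE-754
`minimum` semantics (NaN-propagating, -0.0 ordered below +0.0) and treats
`minimum3` as two nested `minimum` operations; the function names are
illustrative and are not LLVM APIs.

```python
import math

def minimum(a: float, b: float) -> float:
    """Model of IEEE-754 minimum: propagates NaN, orders -0.0 below +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Equal values include the +0.0 / -0.0 pair; prefer the negative zero.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def minimum3(a: float, b: float, c: float) -> float:
    """Assumed model: three-input minimum as two nested two-input minimums."""
    return minimum(minimum(a, b), c)

def same(a: float, b: float) -> bool:
    """Equality that treats NaN as equal to NaN and distinguishes signed zeros."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

def identity_holds(x: float, y: float) -> bool:
    """Check minimum3(x, y, y) against minimum(x, y) for one input pair."""
    return same(minimum3(x, y, y), minimum(x, y))
```

Repeating the second operand is harmless because `minimum` is idempotent
in this model, which also covers the NaN and signed-zero cases.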
2025-02-21 12:08:49 +07:00
Pravin Jagtap
7c2ebe5dbb
AMDGPU: Restrict src0 to VGPRs only for certain cvt scale opcodes. (#127464)
Src0 operands wider than 32 bits for cvt_scale opcodes operating on
FP6/BF6/FP4 need to be restricted to VGPRs only.
2025-02-21 07:27:25 +05:30
Akshat Oke
9855d761f3
[AMDGPU][NewPM] Port SIOptimizeExecMaskingPreRA to NPM (#125351) 2025-02-20 17:35:56 +05:30
Diana Picus
611a648327
[AMDGPU] Add llvm.amdgcn.dead intrinsic (#123190)
Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to
explicitly preserve the inactive lanes of VGPRs of interest by adding
them as dummy arguments. The code usually looks something like this:

```
define amdgcn_cs_chain void f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) {
entry:
  %c = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %c, label %shader, label %tail

shader:
  [...]

tail:
  %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader]
  [...]
  ; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call
```

Unfortunately, this kind of phi node will get optimized away and the
backend won't be able to figure out that it's ok to use the active lanes
of `%inactive.vgpr*` inside `shader`.

This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic,
whose result can be used as a PHI operand instead of the poison. This
will be selected to an IMPLICIT_DEF, which the backend can work with.

At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values.
Support for other types can be added later if needed.
2025-02-20 09:25:48 +01:00
Matt Arsenault
37c341df28
Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)"
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.

This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
2025-02-20 10:19:14 +07:00
Changpeng Fang
36eaf0daf5
AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions already
canonicalize their inputs. As a result, we do not need to explicitly
canonicalize the inputs for fminnum/fmaxnum.
2025-02-19 11:16:43 -08:00
Brox Chen
210036a22e
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#127240)
The previous PR https://github.com/llvm/llvm-project/pull/122950 was
reverted because it hit a buildbot failure. Another patch was merged
while that PR was under review, which left one test out of date.

Reopen this PR with the issue fixed.
2025-02-19 11:37:24 -05:00
Fabian Ritter
8615f9aaff
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631
2025-02-19 10:20:48 +01:00
Shilei Tian
a44284c02f
[AMDGPU] Add isAsCheapAsAMove for v_pk_mov_b32 (#127632)
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-02-19 00:51:57 -05:00
Shilei Tian
8187caf8e3
[NFC][AMDGPU] Pre-commit a test case of checking register coalescer on v_pk_mov_b32 (#127715)
This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.
2025-02-18 23:42:51 -05:00
Matt Arsenault
22d65d8989
AMDGPU: Teach isOperandLegal about SALU literal restrictions (#127626)
isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.

This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.
2025-02-19 10:53:03 +07:00
Chaitanya
aed9f11965
[AMDGPU] Handle lowering addrspace casts from LDS to FLAT address in amdgpu-sw-lower-lds. (#121214)
The "infer-address-spaces" pass replaces all refinable generic pointers
with equivalent specific pointers.

At the -O0 optimisation level, the infer-address-spaces pass doesn't run
in the pipeline.

The "amdgpu-sw-lower-lds" pass instruments memory operations on
addrspace(3) ptrs. Since extra addrspacecasts from lds to flat
addrspaces are present at -O0, and the actual store/load memory
instructions are now on the flat addrspace, these addrspacecasts need to
be handled in the amdgpu-sw-lower-lds pass itself. This patch first
lowers the lds ptr to the corresponding ptr in the global memory from
the asan_malloc, then replaces the original cast with an addrspacecast
from the global ptr to the flat ptr.
2025-02-19 08:50:23 +05:30
Brox Chen
7c24041895
[AMDGPU][True16][CodeGen] reopen "FLAT_load using D16 pseudo instruction" (#127673)
The previous patch https://github.com/llvm/llvm-project/pull/114500 was
merged, hit a buildbot failure, and was reverted.

It seems AMDGPU::OpName::OPERAND_LAST was removed in the meantime, while
the previous patch was being merged, which caused the compile error.
Fixed and reopened here.
2025-02-18 18:16:23 -05:00
Stanislav Mekhanoshin
8529bd7b96
[AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (#127142) 2025-02-18 13:19:33 -08:00
Krzysztof Drewniak
f7d03707d1
[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation.)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...`s from being produced, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p*.

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-18 14:15:28 -06:00
Nikita Popov
2cb5241c77
Revert "[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500)"
This reverts commit f7a5f067885b7f6cc4a000c8392adf6b777a9108.

Fails to build with:

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName'
  126 |   uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;
2025-02-18 17:16:12 +01:00
Brox Chen
f7a5f06788
[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500)
Implement new pseudos with the suffix _t16 for FLAT_LOAD which have
VGPR_16 as the load dst. Lower the pseudos to the existing real
instructions with VGPR_32 src or dst (which makes them consistent with
the hardware encoding). This patch reduces VGPR usage by making hi
halves of VGPRs available for other values.

There are more 8/16-bit ld/st instructions to be supported in
upcoming patches.
2025-02-18 11:05:25 -05:00
Matt Arsenault
eb7c947272
AMDGPU: Correct legal literal operand logic for multiple uses (#127594)
The same literal can be used multiple times in an instruction,
not just once. We were not tracking the used value to verify this,
so correct this.

This helps avoid regressions in a future patch.
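
The rule can be illustrated with a simplified Python model. It assumes
an instruction can encode at most one distinct literal value, and that
integers in [-16, 64] are inline constants that do not consume the
literal slot. The operand representation and the `max_literals`
parameter are hypothetical and are not the isOperandLegal interface;
the real check also depends on the subtarget and instruction encoding.

```python
def is_inline_constant(value: int) -> bool:
    """Assumed integer inline-constant range; floats are ignored in this sketch."""
    return -16 <= value <= 64

def literals_are_legal(operands, max_literals: int = 1) -> bool:
    """operands: list of ints (immediates) or strs (hypothetical register names).

    Counts distinct literal values, so reusing the same literal is free.
    """
    distinct = {
        op for op in operands
        if isinstance(op, int) and not is_inline_constant(op)
    }
    return len(distinct) <= max_literals
```

Tracking the set of values, rather than a count of literal operands, is
what lets `["v0", 1234, 1234]` pass while `["v0", 1234, 5678]` fails.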
2025-02-18 19:58:42 +07:00
Matt Arsenault
cd10c01767
AMDGPU: Handle subregister uses in SIFoldOperands constant folding (#127485) 2025-02-18 17:19:53 +07:00
Stanislav Mekhanoshin
bc4f05d8a8
[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (#127129)
It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but added a test and early bail.
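
Why the early bail leaves the estimate unchanged can be seen in a small
Python model. The dictionary-based instruction records and helper names
are hypothetical stand-ins, not the LLVM data structures; the only
property taken from the message is that the size query returns 0 for
meta instructions.

```python
def inst_size_in_bytes(inst: dict) -> int:
    """Stand-in for the size query: meta instructions occupy no bytes."""
    return 0 if inst["meta"] else inst["size"]

def code_size_without_bail(insts) -> int:
    """Sum every instruction, relying on meta instructions having size 0."""
    return sum(inst_size_in_bytes(inst) for inst in insts)

def code_size_with_bail(insts) -> int:
    """Skip meta instructions before querying their size."""
    total = 0
    for inst in insts:
        if inst["meta"]:
            continue  # early bail: contributes nothing either way
        total += inst_size_in_bytes(inst)
    return total
```

Both functions return the same total for any input; the bail only avoids
the size query for instructions that cannot contribute.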
2025-02-18 02:08:28 -08:00
Matt Arsenault
c5def84ca4
AMDGPU: Handle brev and not cases in getConstValDefinedInReg (#127483)
We should not encounter these cases in the peephole-opt use today,
but get the common helper function to handle these.
2025-02-18 11:23:49 +07:00
Matt Arsenault
4dee305ce2
AMDGPU: Fix foldImmediate breaking register class constraints (#127481)
This fixes a verifier error when folding an immediate materialized
into an aligned vgpr class into a copy to an unaligned virtual register.
2025-02-18 10:34:48 +07:00
Matt Arsenault
fe1ef413ab
AMDGPU: Add more tests for peephole-opt immediate folding (#127480) 2025-02-18 10:31:46 +07:00
Matt Arsenault
b5b8a59a53
AMDGPU: Implement getRequiredProperties for SIFoldOperands (#127522)
Fix the broken MIR tests violating isSSA.
2025-02-18 08:22:45 +07:00
Matt Arsenault
ed38d6702f
PeepholeOpt: Handle subregister compose when looking through reg_sequence (#127051)
Previously this would give up on folding subregister copies through
a reg_sequence if the input operand already had a subregister index.
d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these
subregister uses, and this is the first step to lifting that restriction.

I was expecting to be able to implement this purely with compose /
reverse compose, but I wasn't able to make it work, so this relies on
testing the lanemasks for whether the copy reads a subset of the input.
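
The lanemask check can be sketched as a bitmask subset test. This
assumes each subregister index maps to a bitmask of lanes; the index
names and mask values below are made up for illustration and are not
taken from any target description.

```python
# Hypothetical subregister indices and the lanes each one covers.
LANE_MASKS = {
    "lo16": 0b0001,
    "hi16": 0b0010,
    "sub0": 0b0011,
    "sub1": 0b1100,
    "sub0_sub1": 0b1111,
}

def reads_subset(copy_subreg: str, input_subreg: str) -> bool:
    """True if every lane read by the copy is provided by the input operand."""
    copy_mask = LANE_MASKS[copy_subreg]
    input_mask = LANE_MASKS[input_subreg]
    return (copy_mask & ~input_mask) == 0
```

With these masks, a copy of `lo16` reads a subset of `sub0`, while a copy
of `sub1` does not.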
2025-02-18 08:07:29 +07:00
Scott Linder
29ca3b8b28
[AMDGPU] Push amdgpu-preload-kern-arg-prolog after livedebugvalues (#126148)
This is effectively a workaround for a bug in livedebugvalues, but seems
to potentially be a general improvement, as BB sections seems like it
could ruin the special 256-byte prelude scheme that
amdgpu-preload-kern-arg-prolog requires anyway. Moving it even later
doesn't seem to have any material impact, and just adds livedebugvalues
to the list of things which no longer have to deal with pseudo
multiple-entry functions.

AMDGPU debug-info isn't supported upstream yet, so the bug being avoided
isn't testable here. I am posting the patch upstream to avoid an
unnecessary diff with AMD's fork.
2025-02-17 13:29:56 -05:00
Scott Linder
eaa460ca49
[AMDGPU] Remove dead function metadata after amdgpu-lower-kernel-arguments (#126147)
The verifier ensures function !dbg metadata is unique across the module,
so ensure the old nameless function we leave behind doesn't violate
this invariant.

Removing the function via e.g. eraseFromParent seems like a better
option, but doesn't seem to be legal from a FunctionPass.
2025-02-17 13:27:23 -05:00
Shilei Tian
8aff59d3f4
[NFC][AMDGPU] Auto generate check lines for three test cases (#127352)
- `CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll`
- `CodeGen/AMDGPU/call-preserved-registers.ll`
- `CodeGen/AMDGPU/stack-realign.ll`

This is in preparation for another PR.
2025-02-17 11:22:08 -05:00
Matt Arsenault
18ea6c9280
AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487)
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, so we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d0370872f28ec9965448f33db1b105addaf64ae.
2025-02-17 21:03:50 +07:00
Vikram Hegde
06a3abd9e8
[AMDGPU][NewPM] Port "SIFormMemoryClauses" to NPM (#127181) 2025-02-17 11:07:17 +05:30
Yeaseen
6e94007623
[llvm] Remove br i1 undef in some llvm/test/CodeGen tests (#127368)
This PR replaces some instances of `br i1 undef` with a function argument
value in several tests under the `llvm/test/CodeGen/` directory. This PR
is a continuation of PR #125460
2025-02-16 18:44:46 +00:00
Jeffrey Byrnes
a1120c9b79
[AMDGPU] NFC: Fix some details for lit test (#127141)
Addressed comments in https://github.com/llvm/llvm-project/pull/126976
2025-02-16 19:34:20 +07:00
Brox Chen
cf1165cb9c
Revert "[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#12… (#127175)
Reverting this patch since it raised a buildbot failure.

This reverts commit 2a7487cc2e0fb8bd91784e2d9636a65baa6d90ed.
2025-02-14 02:28:45 -05:00
Brox Chen
2a7487cc2e
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#122950)
true16 codegen pattern for f16 fma.

Created a duplicate shrink-mad-fma-gfx10.mir from shrink-mad-fma to
separate the pre-GFX11 and GFX11 MIR tests.
2025-02-14 02:16:00 -05:00
Scott Linder
0aafb8aca3
[AMDGPU] Add test for failure with function !dbg info in amdgpu-lower-kernel-arguments (#126146) 2025-02-13 15:58:45 -05:00
LU-JOHN
5decab178f
AMDGPU: Reduce shl64 to shl32 if shift range is [63-32] (#125574)
Reduce:

   DST = shl i64 X, Y

where Y is in the range [63-32] to:

   DST = [0, shl i32 X, (Y & 31)]


Alive2 analysis:

https://alive2.llvm.org/ce/z/w_u5je
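
As a complement to the Alive2 link, the reduction can be checked
numerically with a short Python model. It assumes the result pair is
[low half, high half], that X is truncated to its low 32 bits, and that
the shift amount is masked to its low five bits; the helper names are
illustrative only.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def shl64(x: int, y: int) -> int:
    """Reference: 64-bit left shift with wraparound."""
    return (x << y) & MASK64

def shl64_via_shl32(x: int, y: int) -> int:
    """Reduced form, valid only for shift amounts in 32..63."""
    if not 32 <= y <= 63:
        raise ValueError("shift amount must be in 32..63")
    lo = x & MASK32                  # low 32 bits of X
    hi = (lo << (y & 31)) & MASK32   # 32-bit shift by the masked amount
    return hi << 32                  # result pair [0, hi]: low half is zero
```

For shift amounts in this range, `y & 31` equals `y - 32`, so the low
half of the result is always zero and only the low half of X can reach
the high half.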

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-02-13 13:40:25 -06:00
Robert Imschweiler
41e49fadd4
[AMDGPU] Fix llvm.amdgcn.workitem.id-unsupported-calling-convention.ll (#127041)
Follow-up fix for #126058. (@arsenm)
2025-02-13 22:23:47 +07:00
Robert Imschweiler
0da8d0f9b7
[AMDGPU] Change handling of unsupported non-compute shaders with HSA (#126798)
Previous handling in `SITargetLowering::LowerFormalArguments` only
reported a diagnostic message and continued execution by returning a
non-usable `SDValue`. This results in llvm crashing later with an
unrelated error. This commit changes the detection of an unsupported
non-compute shader to be a fatal error right away.

As an example situation, take the usage of an `amdgpu_ps` function and
the `amdgcn-unknown-amdhsa` target triple.
```
define amdgpu_ps void @foo(ptr %p, i32 %i) {
        store i32 %i, ptr %p
        ret void
}
```
Compiling this code (with `llc -mtriple=amdgcn-unknown-amdhsa
-mcpu=gfx942`, for example) fails with:
```
error: <unknown>:0:0: in function foo void (ptr, i32): unsupported non-compute shaders with HSA

llc:
[...]/git/trunk21.0/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:11790:
void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&):
Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed.
[...]
```
2025-02-13 22:23:08 +07:00
Fabian Ritter
a33a84ee63
[AMDGPU][NFC] Replace gfx940 and gfx941 with gfx942 in llvm/test (#125711)
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base.

This PR uses gfx942 instead of gfx940 and gfx941 in the test RUN-lines (unless there is already a RUN-line for gfx942).

The only notable difference in the test output is that gfx942 does not force the use of sc0 and sc1 on stores while gfx940 and gfx941 do (cf. https://reviews.llvm.org/D149986).
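
The RUN-line rewrite described above can be sketched in Python. This is
not the tooling used for the PR: it is a minimal model that assumes RUN
lines start with `; RUN:`, that the target is selected with `-mcpu=`,
and that an obsolete line is simply dropped when the file already has a
gfx942 RUN line. Check prefixes and continuation lines are not handled.

```python
import re

# Matches the two retired targets when used as an -mcpu value.
OLD_CPU = re.compile(r"-mcpu=gfx94[01]\b")

def is_run_line(line: str) -> bool:
    return line.lstrip().startswith("; RUN:")

def retarget_run_lines(lines):
    """Return a copy of `lines` with gfx940/gfx941 RUN lines moved to gfx942."""
    has_gfx942 = any(is_run_line(l) and "-mcpu=gfx942" in l for l in lines)
    out = []
    for line in lines:
        if is_run_line(line) and OLD_CPU.search(line):
            if has_gfx942:
                continue  # assumption: existing gfx942 coverage makes this line redundant
            line = OLD_CPU.sub("-mcpu=gfx942", line)
        out.append(line)
    return out
```

Non-RUN lines pass through unchanged, so the check lines still have to
be regenerated separately after the retargeting.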

For SWDEV-512631
2025-02-13 15:17:12 +01:00
Matt Arsenault
eef0205345
AMDGPU: Add baseline test for treating v_pk_mov_b32 like reg_sequence (#125656) 2025-02-13 18:12:09 +07:00
Jay Foad
0b0f3da6a8
[AMDGPU] Add a regression test for -mattr=dumpcode (#116982) 2025-02-13 11:04:08 +00:00