6326 Commits

Author SHA1 Message Date
Joseph Huber
a1da746157 [AMDGPU] Place global constructors in .init_array and .fini_array
For the GPU, we emit external kernels that call the initializers and
constructors, however if we had a persistent kernel like in the `_start`
kernel for the `libc` project, we could initialize the standard way of
calling constructors. This patch adds new global variables containing
pointers to the constructors to be called. If these are placed in the
`.init_array` and `.fini_array` sections, then the backend will handle
them specially. The linker will then provide the `__init_array_` and
`__fini_array_` sections to traverse them. An implementation would look
like this.

```
extern uintptr_t __init_array_start[];
extern uintptr_t __init_array_end[];
extern uintptr_t __fini_array_start[];
extern uintptr_t __fini_array_end[];

using InitCallback = void(int, char **, char **);
using FiniCallback = void(void);

extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
_start(int argc, char **argv, char **envp) {
  uint64_t init_array_size = __init_array_end - __init_array_start;
  for (uint64_t i = 0; i < init_array_size; ++i)
    reinterpret_cast<InitCallback *>(__init_array_start[i])(argc, argv, env);
  uint64_t fini_array_size = __fini_array_end - __fini_array_start;
  for (uint64_t i = 0; i < fini_array_size; ++i)
    reinterpret_cast<FiniCallback *>(__fini_array_start[i])();
}
```

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D149340
2023-04-29 08:40:19 -05:00
Jay Foad
56af0e913c [EarlyCSE] Do not CSE convergent calls in different basic blocks
"convergent" is documented as meaning that the call cannot be made
control-dependent on more values, but in practice we also require that
it cannot be made control-dependent on fewer values, e.g. it cannot be
hoisted out of the body of an "if" statement.

In code like this, if we allow CSE to combine the two calls:

  x = convergent_call();
  if (cond) {
    y = convergent_call();
    use y;
  }

then we get this:

  x = convergent_call();
  if (cond) {
    use x;
  }

This is conceptually equivalent to moving the second call out of the
body of the "if", up to the location of the first call, so it should be
disallowed.

Differential Revision: https://reviews.llvm.org/D149348
2023-04-28 14:50:48 +01:00
Jay Foad
5534d1d834 [CSE] Precommit an AMDGPU test case for D149348
Differential Revision: https://reviews.llvm.org/D149349
2023-04-28 14:50:48 +01:00
Jeffrey Byrnes
7f0a881e6c [AMDGPU] Track liveins for max-ilp-sched-strategy
Even if optimizing for ILP, it is still useful to track RP to avoid spilling. Given that, we need to maintin consistent liveness state with the RP tracker. This patch makes RP tracking consistent by updating for liveins.

Otherwise, we should completely eliminate RP tracking for this scheduler (checkScheduling, initCandidate).

Differential Revision: https://reviews.llvm.org/D149358
2023-04-27 16:45:45 -07:00
Changpeng Fang
1ab8b9ae15 AMDGPU: Define sub-class of SGPR_64 for tail call return
Summary:
  Registers for tail call return should not be clobbered by callee.
So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold
the tail call return address.

Because GFX and C calling conventions have different CSR, we need to define
the sub-class separately. This work is an extension of D147096 with the
consideration of GFX calling convention.

Based on the calling conventions, different instructions will be selected with
different sub-class of SGPR_64 as the input.

Reviewers: arsenm, cdevadas and sebastian-ne

Differential Revision: https://reviews.llvm.org/D148824
2023-04-27 10:45:11 -07:00
skc7
e016fb57b3 [AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic.
Legalize soffset of buffer instructions using waterfall loop.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141030
2023-04-27 19:36:50 +05:30
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Jay Foad
47d3cbcf84 [BranchFolder] Skip redundant IMPLICIT_DEFs of subregs
Differential Revision: https://reviews.llvm.org/D148509
2023-04-27 09:40:06 +01:00
Jay Foad
12b70ad68c [BranchFolder] Precommit AMDGPU test case for D148509 2023-04-27 09:40:06 +01:00
Nicolai Hähnle
1e63f8272e AMDGPU: Fix an assertion in SIOptimizeVGPRLiveRange
As the comment notes, the shader results in an INSERT_SUBREG with
"undef" (dead) operand in the Endif block. The same can happen with
REG_SEQUENCE. The register is considered dead from a liveness
analysis perspective. The correct thing to do seems to be nothing:
we keep the undef use of the register, the register allocator should
still be able to take the liveness into account correctly.

Differential Revision: https://reviews.llvm.org/D149161
2023-04-27 09:39:44 +02:00
Matt Arsenault
bbc7b30fbf AMDGPU: Remove invalid testcase for enqueue kernel
The call didn't have the right calling convention, but calls to
kernels are supposed to be illegal anyway.
2023-04-26 17:25:30 -04:00
Jay Foad
22516593ae [AMDGPU] Add GFX11 ds_min_f32 / ds_max_f32 tests 2023-04-26 17:09:12 +01:00
Joe Nash
f8ec7a0944 [AMDGPU] Delete test for illegal v_cndmask_b16_dpp
There are no VOP2 or VOP2 with dpp forms of v_cndmask_b16. Delete the
test. NFC.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D149184
2023-04-26 09:50:44 -04:00
Janek van Oirschot
124acb7ca3 [AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS
The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D149080
2023-04-26 14:10:25 +01:00
OCHyams
2b3c13b716 [DebugInfo] Treat empty metadata operands the same as undef operands in SelectionDAG
Without this patch SelectionDAG silently drops dbg.values using `!{}` operands.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140990
2023-04-26 09:03:07 +01:00
Jay Foad
0c13e0b748 [AMDGPU] Do not handle _SGPR SMEM instructions in SILoadStoreOptimizer
After D147334 we never select _SGPR forms of SMEM instructions on
subtargets that also support the _SGPR_IMM form, so there is no need to
handle them here.

Differential Revision: https://reviews.llvm.org/D149139
2023-04-25 15:40:13 +01:00
Matt Arsenault
2fce50e8f5 AMDGPU: Fix assertion with multiple uses of f64 fneg of select
A bitcast needs to be inserted back to the original type. Just
skip the multiple use case for a safer quick fix. Handling
the multiple use case seems to be beneficial in some but not
all cases.
2023-04-20 10:15:18 -04:00
Matt Arsenault
99d4c722e3 AMDGPU: Really invert handling of enqueued block detection
Remove the broken call graph analysis in the block enqueue lowering
pass. The previous iteration was reverted due to a runtime bug when
the completion action was unconditionally enabled.
2023-04-20 06:58:24 -04:00
Pravin Jagtap
21a69bdb66 [NewPM][AMDGPU] Port amdgpu-atomic-optimizer
Reviewed By: arsenm, sameerds, gandhi21299

Differential Revision: https://reviews.llvm.org/D148628
2023-04-20 00:27:47 -04:00
Jay Foad
e1ae0e2b7d [AMDGPU] Fix some check prefixes 2023-04-19 16:15:14 +01:00
Jay Foad
141c476a36 [AMDGPU] Remove unused check lines from tests 2023-04-19 16:15:14 +01:00
Jay Foad
43b035b483 [AMDGPU] Remove unused check lines from GlobalISel IR tests 2023-04-19 15:13:02 +01:00
Jay Foad
d9ed0dee0c [AMDGPU] Remove unused check lines from GlobalISel MIR tests 2023-04-19 15:03:32 +01:00
Jay Foad
bf4dc4381e [AMDGPU] Don't transform illegal intrinsics to V_ILLEGAL
This reverts parts of D123693. The functionality of allowing unsupported
intrinsics to select has been superseded by D139000 "Remove function
with incompatible features".

Retain assembler/disassembler support for v_illegal on GFX10+ only,
where it is documented.

Differential Revision: https://reviews.llvm.org/D148127
2023-04-19 09:59:46 +01:00
Chen Zheng
3f4055dec4 [GlobalISelEmitter] handle operand without MVT/class
There are some patterns in td files without MVT/class set
for some operands in target pattern that are from the source
pattern. This prevents GlobalISelEmitter from adding them as
a valid rule, because the target child operand is an
unsupported kind operand. For now, for a leaf child, only
IntInit and DefInit are handled in GlobalISelEmitter.

This issue can be workaround by adding MVT/class to the
patterns in the td files, like the workarounds for patterns
anyext and setcc in PPCInstrInfo.td in D140878.

To avoid adding the same workarounds for other patterns in
td files, this patch tries to handle the UnsetInit case in
GlobalISelEmitter.

Adding the new handling allows us to remove the workarounds
in the td files and also generates many selection rules for
PPC target.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141247
2023-04-19 07:00:57 +00:00
chenglin.bi
9356097206 Revert "[AMDGPU] Ressociate patterns with sub to use SALU"
The patch will caused dead loop because of DAGCombiner's canonicalization:
  // (x + C) - y  ->  (x - y) + C
  // y - (x + C)  ->  (y - x) - C
  // (x - C) - y  ->  (x - y) - C
  // (C - x) - y  ->  C - (x + y)

This reverts commit b3529b5bf3ba2cd7f38665de16450afefb263c9b.
2023-04-19 11:15:14 +08:00
David Stuttard
4d54565436 [AMDGPU] Remove unnecessary assert
Also remove the function attributes from the test. For PAL based shaders this isn't required.

Differential Revision: https://reviews.llvm.org/D148625
2023-04-18 13:41:38 +01:00
pvanhout
ec82188451 [AMDGPU] Do not crash on agpr_hi16 in AMDGPUResourceUsageAnalysis
Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D148438
2023-04-18 13:53:01 +02:00
Kriti Gupta
a3dfa4e083 [test] Remove occurences of br undef in CodeGen/AMDGPU tests
Differential Revision: https://reviews.llvm.org/D148041
2023-04-18 08:47:29 +01:00
chenglin.bi
b3529b5bf3 [AMDGPU] Ressociate patterns with sub to use SALU
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D148463
2023-04-18 06:49:41 +08:00
Jay Foad
2be3189601 [AMDGPU] Don't select _SGPR forms of SMEM instructions on GFX9+
On GFX9+, SMEM instructions have an _SGPR_IMM form which is strictly
more powerful than the _SGPR form. It simplifies codegen if we always
select the _SGPR_IMM form with an immediate offset of 0 instead of the
_SGPR form.

Note that this patch just makes minimal changes to the selection
patterns to prove the concept. Further simplifications are possible to
reduced the number of selection patterns.

On GFX9 the _SGPR form of the Real instruction is still required for
assembly/disassembly but on GFX10+ it can be removed completely.

Differential Revision: https://reviews.llvm.org/D147334
2023-04-17 16:23:30 +01:00
pvanhout
ae77aceba5 [Analysis] Remove DA & LegacyDA
UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538

- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
  - Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.

Reviewed By: foad, sameerds

Differential Revision: https://reviews.llvm.org/D148116
2023-04-17 09:01:22 +02:00
Jay Foad
6b5067a81a [AMDGPU] Don't assert that image intrinsics are supported
Unsupported intrinsics should give a regular "cannot select" error.

Differential Revision: https://reviews.llvm.org/D148147
2023-04-16 19:54:55 +01:00
pvanhout
b3b3cb2d2f [AMDGPU] Less aggressively break large PHIs
In some cases, breaking large PHIs can very negatively affect
performance (3x more instructions observed in a particular test case).

This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases.
e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason.

Fixes SWDEV-392803
Fixes SWDEV-393781
Fixes SWDEV-394228

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147786
2023-04-14 15:41:26 +02:00
David Stuttard
fc83f1de5d [AMDGPU] Add backend support for new PAL ELF Metadata 3.0
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
Rather than using opaque registers which can change between different
architectures and requires encoding the bitfield information in the backend,
which may change between versions.

This is the initial minimal implementation that enables the use of PAL Metadata
3.0.

The change itself should be NFC for non-PAL, although the way RSRC2 register is
handled has been changed slightly.

The test is fairly minimal, but checks that the metadata format looks as
expected and verifies a couple of special cases such as tgid_[xyz]_en handling
and PsInputAddr/Ena which also change to explicit fields.

Differential Revision: https://reviews.llvm.org/D147143
2023-04-14 09:57:13 +01:00
Diana Picus
b9ba05360e [AMDGPU] Don't S_MOV_B32 into $scc
The peephole optimizer tries to replace
```
%n:sgpr_32 = S_MOV_B32 x
$scc = COPY %n
```
with a `S_MOV_B32` directly into `$scc`.

This crashes because `S_MOV_B32` cannot take `$scc` as input.

We currently generate code like this from GlobalISel when lowering a
G_BRCOND with a constant condition. We should probably look into
removing this kind of branch altogether, but until then we should at
least not crash.

This patch fixes the issue by making sure we don't apply the peephole
optimization when trying to move into a physical register that
doesn't belong to the correct register class.

Differential Revision: https://reviews.llvm.org/D148117
2023-04-14 10:24:43 +02:00
Jay Foad
2d39f5b5cd [AMDGPU] Allow use of TTMP registers in AMDGPUResourceUsageAnalysis
With architected SGPRs, workgroup IDs are passed into a compute shader
in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead
of failing an assertion.

Differential Revision: https://reviews.llvm.org/D148239
2023-04-13 16:56:22 +01:00
pvanhout
fd1d60873f [AMDGPU] Remove CC exception for Promote Alloca Limits
Apparently it was used to work around some issue that has been fixed.
Removing it helps with high scratch usage observed in some cases due to failed alloca promotion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D145586
2023-04-13 08:48:34 +02:00
Sebastian Neubauer
fee3980df5 [AMDGPU] Fix amdgpu_gfx tail-call test
The inreg argument prevented the tail call optimization to kick in.
Remove the inreg, so this test actually uses a tail call.

Note that it now uses s[4:5] for the return address, which is invalid,
because these registers are supposed to be callee-save.
D147096 tried to fix that problem for the C calling convention.

Differential Revision: https://reviews.llvm.org/D148119
2023-04-12 16:15:09 +02:00
Matt Arsenault
f608ac6286 AMDGPU: Push fneg into bitcast of integer select
Avoids some regressions in the math libraries in a future
patch.
2023-04-12 06:48:58 -04:00
Joe Nash
57fe7f450d [AMDGPU] NFC. Clean up check prefixes in fcmp test
Fix the test introduced in D136592 which appeared to have a few check lines
containing an un-checked prefix "GISEL-GFX".
Also canonicalize the other prefixes to minimize churn if SDag and GISel
diverge.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147958
2023-04-11 09:58:03 -04:00
skc7
97f8d6b2ec [AMDGPU][NFC] Regenerate test checks for sub.ll 2023-04-11 17:55:49 +05:30
Matt Arsenault
0f59720e1c AMDGPU: Fold fneg into bitcast of build_vector
The math libraries have a lot of code that performs
manual sign bit operations by bitcasting doubles to int2
and doing bithacking on them. This is a bad canonical form
we should rewrite to use high level sign operations directly
on double. To avoid codegen regressions, we need to do a better
job moving fnegs to operate only on the high 32-bits.

This is only halfway to fixing the real case.
2023-04-11 07:12:01 -04:00
Diana Picus
d9bf8aba23 [AMDGPU] Add MMOs for GFX11 Streamout Instructions
The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.

This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.

Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] 0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)

Differential Revision: https://reviews.llvm.org/D146031
2023-04-11 11:11:32 +02:00
pvanhout
0ff02cf015 [AMDGPU] Hide "removing function" diagnostics by default
Use an `OptimizationRemark` for them even though it's not really an
optimization. It just integrates better with the other diagnostics
(enabling is easy with `-pass-remark`).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147703
2023-04-11 09:26:20 +02:00
Changpeng Fang
3bc1e084ee AMDGPU: Created a subclass for the return address operand in the tail call return instruction
Summary:
  This is to avoid using the callee saved registers for the return address
of the tail call return instruction.

Reviewers:
  arsenm, cdevadas

Differential Revision:
  https://reviews.llvm.org/D147096
2023-04-10 10:53:33 -07:00
skc7
635c725b30 [AMDGPU][NFC] Regenerate test checks for merge-tbuffer.mir 2023-04-10 18:47:52 +05:30
mmarjano
f6e70ed1c7 [AMDGPU] Extend tbuffer_load_format merge
Add support for merging _IDXEN and _BOTHEN variants of
TBUFFER_LOAD_FORMAT instruction.
2023-04-10 12:24:21 +02:00
skc7
b434051dc8 [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147168
2023-04-10 11:34:14 +05:30
Sergei Barannikov
790252dfb3 [AMDGPU] Update mcp-overlap-after-propagation.mir test
The test was added in D69953, but since D67794 landed a bit later it no
longer catches the bug. Update the test to be representative again.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146934
2023-04-09 01:09:00 +03:00