7812 Commits

Author SHA1 Message Date
Bjorn Pettersson
21a6890856 [Vectorize] Clean up Transforms/Vectorize.h
Removed definitions of vectorizeBasicBlock and VectorizeConfig
(possibly a remnant from the BBVectorize pass that was removed
way back in 2017).

Also reduced amount of include dependencies to Transforms/Vectorize.h.
2023-04-17 13:54:19 +02:00
Kazu Hirata
bfaf0e7858 [AMDGPU] Modernize Status and BlockData (NFC)
Identified with modernize-use-default-member-init.
2023-04-16 13:03:02 -07:00
Jay Foad
6b5067a81a [AMDGPU] Don't assert that image intrinsics are supported
Unsupported intrinsics should give a regular "cannot select" error.

Differential Revision: https://reviews.llvm.org/D148147
2023-04-16 19:54:55 +01:00
NAKAMURA Takumi
7d5d987e93 [CMake] Reorder and reformat deps 2023-04-17 00:32:16 +09:00
Kazu Hirata
972983539b [llvm] Apply fixes from readability-redundant-control-flow (NFC) 2023-04-16 00:13:46 -07:00
Kazu Hirata
4241d890ae [Target] Use range-based for loops (NFC) 2023-04-15 14:14:56 -07:00
Akshay Khadse
842dc35fc9 Guard against dereferencing a nullptr
In `lib/CodeGen/PrologEpilogInserter.cpp` file, `RS` is assigned via `RS = TRI->requiresRegisterScavenging(MF) ? new RegScavenger() : nullptr;`. This means that `RS` can be `nullptr`. While executing the `TFI->processFunctionBeforeFrameFinalized(MF, RS);`, the `RS` can be dereferenced in the call `RS->enterBasicBlock(MBB);` in file `lib/Target/AMDGPU/SIFrameLowering.cpp`

Reviewed By: skan, arsenm

Differential Revision: https://reviews.llvm.org/D146791
2023-04-15 11:30:43 +08:00
pvanhout
b3b3cb2d2f [AMDGPU] Less aggressively break large PHIs
In some cases, breaking large PHIs can very negatively affect
performance (3x more instructions observed in a particular test case).

This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases.
e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason.

Fixes SWDEV-392803
Fixes SWDEV-393781
Fixes SWDEV-394228

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147786
2023-04-14 15:41:26 +02:00
David Stuttard
fc83f1de5d [AMDGPU] Add backend support for new PAL ELF Metadata 3.0
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
Rather than using opaque registers which can change between different
architectures and requires encoding the bitfield information in the backend,
which may change between versions.

This is the initial minimal implementation that enables the use of PAL Metadata
3.0.

The change itself should be NFC for non-PAL, although the way RSRC2 register is
handled has been changed slightly.

The test is fairly minimal, but checks that the metadata format looks as
expected and verifies a couple of special cases such as tgid_[xyz]_en handling
and PsInputAddr/Ena which also change to explicit fields.

Differential Revision: https://reviews.llvm.org/D147143
2023-04-14 09:57:13 +01:00
Diana Picus
b9ba05360e [AMDGPU] Don't S_MOV_B32 into $scc
The peephole optimizer tries to replace
```
%n:sgpr_32 = S_MOV_B32 x
$scc = COPY %n
```
with a `S_MOV_B32` directly into `$scc`.

This crashes because `S_MOV_B32` cannot take `$scc` as input.

We currently generate code like this from GlobalISel when lowering a
G_BRCOND with a constant condition. We should probably look into
removing this kind of branch altogether, but until then we should at
least not crash.

This patch fixes the issue by making sure we don't apply the peephole
optimization when trying to move into a physical register that
doesn't belong to the correct register class.

Differential Revision: https://reviews.llvm.org/D148117
2023-04-14 10:24:43 +02:00
Jay Foad
2d39f5b5cd [AMDGPU] Allow use of TTMP registers in AMDGPUResourceUsageAnalysis
With architected SGPRs, workgroup IDs are passed into a compute shader
in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead
of failing an assertion.

Differential Revision: https://reviews.llvm.org/D148239
2023-04-13 16:56:22 +01:00
Jay Foad
cf736e2325 [AMDGPU] Avoid else-after-return in isLegalAddressingMode. NFC. 2023-04-13 16:55:58 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correct recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
pvanhout
fd1d60873f [AMDGPU] Remove CC exception for Promote Alloca Limits
Apparently it was used to work around some issue that has been fixed.
Removing it helps with high scratch usage observed in some cases due to failed alloca promotion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D145586
2023-04-13 08:48:34 +02:00
Matt Arsenault
f608ac6286 AMDGPU: Push fneg into bitcast of integer select
Avoids some regressions in the math libraries in a future
patch.
2023-04-12 06:48:58 -04:00
Michael Liao
72fc08a541 [InstCombine] Teach alloca replacement to handle addrspacecast
- As the address space cast may not be valid on a specific target,
  `addrspacecast` is not handled when an `alloca` is able to be replaced
  with the source of memcpy/memmove. This patch addresses that by
  querying a target hook on whether that address space cast is valid.
  For example, on most GPU targets, the cast from a global pointer to a
  generic pointer is valid.
- If that cast is allowedd (by querying `isValidAddrSpaceCast`), the
  replacement is enhanced to handle that `addrspacecast` as well.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D147025
2023-04-11 11:47:37 -04:00
Matt Arsenault
0f59720e1c AMDGPU: Fold fneg into bitcast of build_vector
The math libraries have a lot of code that performs
manual sign bit operations by bitcasting doubles to int2
and doing bithacking on them. This is a bad canonical form
we should rewrite to use high level sign operations directly
on double. To avoid codegen regressions, we need to do a better
job moving fnegs to operate only on the high 32-bits.

This is only halfway to fixing the real case.
2023-04-11 07:12:01 -04:00
Jon Chesterfield
e17c1bb494 [amdgpu][nfc] Update comments on LDS lowering 2023-04-11 10:48:19 +01:00
Diana Picus
d9bf8aba23 [AMDGPU] Add MMOs for GFX11 Streamout Instructions
The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.

This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.

Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] 0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)

Differential Revision: https://reviews.llvm.org/D146031
2023-04-11 11:11:32 +02:00
Diana Picus
f8861ea023 clang-format file I'm about to touch. NFCI 2023-04-11 11:11:32 +02:00
pvanhout
0ff02cf015 [AMDGPU] Hide "removing function" diagnostics by default
Use an `OptimizationRemark` for them even though it's not really an
optimization. It just integrates better with the other diagnostics
(enabling is easy with `-pass-remark`).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147703
2023-04-11 09:26:20 +02:00
Kazu Hirata
53ead5215b [Target] Use isNullConstant and isOneConstant (NFC) 2023-04-10 18:23:07 -07:00
Changpeng Fang
3bc1e084ee AMDGPU: Created a subclass for the return address operand in the tail call return instruction
Summary:
  This is to avoid using the callee saved registers for the return address
of the tail call return instruction.

Reviewers:
  arsenm, cdevadas

Differential Revision:
  https://reviews.llvm.org/D147096
2023-04-10 10:53:33 -07:00
Jay Foad
f34a1953ce [AMDGPU] Fix AddedComplexity for s_buffer_load patterns. NFCI.
We set AddedComplexity = 100 for s_load patterns to prefer them over
global loads, but for s_buffer_load patterns there is no need to do
this and it was quietly overriding the AddedComplexity of each
individual GCNPat that is defined inside SMLoad_Pattern (but in practice
that did not appear to make any difference).

Differential Revision: https://reviews.llvm.org/D145396
2023-04-10 17:26:50 +01:00
mmarjano
f6e70ed1c7 [AMDGPU] Extend tbuffer_load_format merge
Add support for merging _IDXEN and _BOTHEN variants of
TBUFFER_LOAD_FORMAT instruction.
2023-04-10 12:24:21 +02:00
skc7
b434051dc8 [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147168
2023-04-10 11:34:14 +05:30
Chen Zheng
a3d5ec51ba [AMDGPU][Global-ISel] reuse extension related patterns in td file
However the imported rules can not be used for now because Global ISel
selectImpl() seems has some bug/limitation to create a illegl COPY
from VGPR to SGPR. So currently workaround this by not auto selecting these
patterns.

Fixes #61468

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147780
2023-04-10 02:11:33 +00:00
Matt Arsenault
2a9f1dad2c AMDGPU: Fix LiveVariables verifier error for values defined before SI_END_CF
GlobalISel happens to insert some constant materializes before SI_END_CF
in one test. These need to be excluded from AliveBlocks since they
are defined in the original block and used in the split block,
so they aren't fully alive through either block.

The case where the value defined in the first block which was originally used
in a later block is still broken.

Avoids a verifier error in a future patch.
2023-04-08 07:05:35 -04:00
Matt Arsenault
7ac3ab34cb AMDGPU: Fix missing MIR serialization for PSInputAddr/PSInputEnable
Resuming any mir test for a pixel shader would assert in the AsmPrinter.
2023-04-08 07:05:35 -04:00
Jay Foad
985e24cc16 [AMDGPU] Fix a case of updating LiveIntervals in SIOptimizeExecMaskingPreRA
This was causing two test failures when I applied D129208 to enable
extra verification of LiveIntervals:

  LLVM :: CodeGen/AMDGPU/optimize-negated-cond-exec-masking-wave32.mir
  LLVM :: CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir

Differential Revision: https://reviews.llvm.org/D147721
2023-04-08 08:39:21 +01:00
Jay Foad
b2e98c18cc [AMDGPU] Fix comment in SIOptimizeExecMaskingPreRA 2023-04-07 11:07:50 +01:00
Ruiling Song
2ab6835f28 AMDGPU: mark SET_INACTIVE_* as convergent operation
set_inactive is actually a kind of operation that is passing certain
value from active threads to inactive threads. In later WWM operation,
the activated threads which were disabled before would read such
values passed to them by set_inactive operation. So I think the
set_inactive is a convergent operation.

Differential Revision: https://reviews.llvm.org/D147683
2023-04-07 09:10:43 +08:00
Nico Weber
72e01ef1f1 Revert "[AMDGPU] Add Lower Bound to PipelineSolver"
This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9.
Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724
2023-04-06 12:35:44 -04:00
Alexis Engelke
0c049ea60a [MC] Always encode instruction into SmallVector
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.

This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.

Reviewed By: MaskRay, Amir

Differential Revision: https://reviews.llvm.org/D145791
2023-04-06 16:21:49 +02:00
Jessica Del
04317d4da7 [AMDGPU][GISel] Add inverse ballot intrinsic
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.

Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG required pseudo instructions to use the custom inserter.

The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D146287
2023-04-06 07:46:50 +02:00
Jeff Byrnes
3c42a58c4f [AMDGPU] Add Lower Bound to PipelineSolver 2023-04-05 14:54:59 -07:00
Jon Chesterfield
3c76e5f0c8 [amdgpu][nfc] Remove dead code associated with LDS lowering
Pass disabled since approximately D104962 for miscompiling openmp

The functions under ReplaceConstant miscompile phis as noted in D112717 and
have no users in tree other than the disabled pass. It seems likely it has no
users out of tree.

Deletes the test cases associated with the disabled pass as well.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D147586
2023-04-05 22:24:22 +01:00
Jon Chesterfield
0507448d82 [amdgpu] Implement dynamic LDS accesses from non-kernel functions
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use

Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144233
2023-04-04 20:06:34 +01:00
Changpeng Fang
282b8ac1f0 Revert "AMDGPU: Created a subclass for the return address operand in the tail call return instruction"
This reverts commit 461a559bc9bd755436ba8f12f8b74757e03f9b9f.
2023-04-04 11:44:52 -07:00
Changpeng Fang
461a559bc9 AMDGPU: Created a subclass for the return address operand in the tail call return instruction
Summary:
  This is to avoid using the callee saved registers for the return address of the tail call return instruction.

Reviewers:
  arsenm, cdevadas

Differential Revision:
  https://reviews.llvm.org/D147096
2023-04-04 10:56:58 -07:00
Craig Topper
1f60c8d025 [IR] Replace calls to ConstantFP::getNullValue with ConstantFP::getZero. NFC
There is no getNullValue in ConstantFP. Due to inheritance, we're calling
Constant::getNullValue which handles any type including FP.
Since we already know we want an FP constant we can use ConstantFP::getZero
which might be faster and is a more readable name for an FP zero.
2023-04-03 23:14:02 -07:00
Jon Chesterfield
62951784f0 [amdgpu][nfc] Refactor prior to D144233 to remove noise from diff 2023-04-03 16:47:01 +01:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Aaron Ballman
ec1f5b947e Revert "AMDGPU: Created a subclass for the return address operand in the tail call return instruction"
This reverts commit 7a98934fadc3581ff024a77dc696b62f1a538ad5.

This appears to have broken several bots, including:
https://lab.llvm.org/buildbot/#/builders/42/builds/9472
2023-04-01 10:50:59 -04:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
Jay Foad
bdf52b5dfe [AMDGPU] Don't bother to use OffsetMode to define Real SMEM instructions
Various Real classes took an OffsetMode parameter, but only used it
to extract the suffix for the name of the corresponding pseudo. I found
this confusing because you couldn't usefully define and use a different
OffsetMode here, e.g. one with different operand types to affect how the
instruction was printed.

Overall I think it's simpler to just pass in the suffixed pseudo name
directly.

Differential Revision: https://reviews.llvm.org/D147242
2023-03-31 15:00:30 +01:00
Jay Foad
8bad806f29 [AMDGPU] Do not reserve 16-bit registers
There should be no need to reserve all SGPR hi16/lo16 halves, or all
AGPR hi16 halves. This should be done by marking the corresponding
register classes as not allocatable instead.

Differential Revision: https://reviews.llvm.org/D147158
2023-03-31 14:56:27 +01:00
pvanhout
16e1f8e970 Revert "[AMDGPU] Select v_sat_pk_u8_i16"
This reverts commit 64b45db34a0cd979dae9ca3016e9da517e57b987.

Reason: the patterns are wrong which can result in a miscompilation.
However, fixing the pattern is not trivial due to how i8 values
are handled, and due to the additional type-checking performed by
D147127: trunc/smax/smin are all defined as int ops in the DAG
despite them working on vectors too.

As this is not a much-needed pattern, I prefer reverting for now
until I can find time to properly rewrite the pattern.
2023-03-31 12:39:49 +02:00
Jay Foad
6b6303ac00 [AMDGPU] Fix whitespace after D147216 2023-03-31 11:10:25 +01:00
Changpeng Fang
7a98934fad AMDGPU: Created a subclass for the return address operand in the tail call return instruction
Summary:
  This is to avoid using the callee saved registers for the return address of the tail call return instruction.

Reviewers:
  arsenm, cdevadas

Differential Revision:
  https://reviews.llvm.org/D147096
2023-03-30 11:13:21 -07:00