501 Commits

Author SHA1 Message Date
Pravin Jagtap
597fb7fb46 [AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.
The atomic optimizer is turned on by default through D152649. This patch
removes the usage of the old command-line option amdgpu-atomic-optimizations
and transfers the responsibility to `amdgpu-atomic-optimizer-strategy`.

We can safely remove the old option once LLPC removes all uses of it.

Reviewed By: foad, arsenm, #amdgpu, cdevadas

Differential Revision: https://reviews.llvm.org/D153007
2023-06-22 07:06:42 -04:00
Matt Arsenault
e777da468c AMDGPU: Delete old AMDGPUPropagateAttributes pass
The optimizing, non-broken features have all been moved to
AMDGPUAttributor. The only remaining piece of functionality was the
broken propagation of the wavesize features. This was fundamentally
broken and a hack for device library linking. It doesn't matter when
the device libraries are correctly linked and internalized.

In case of linked-as-normal-bitcode (as comgr still does), we're
reliant on the global subtarget anyway. If we can get away without
forcing target-cpu, we should just as well be able to get away without
propagating target-features.
2023-06-20 13:05:45 -04:00
Jay Foad
eb7491769a [AMDGPU] Reimplement the GFX11 early release VGPRs optimization
Implement this optimization in SIInsertWaitcnts, where we already have
information about whether there might be outstanding VMEM store
instructions. This has the following advantages:
- Correctly handles atomics-with-return.
- Correctly handles call instructions.
- Should be faster because it does not require running a separate pass.

Differential Revision: https://reviews.llvm.org/D153279
2023-06-19 17:12:54 +01:00
Pravin Jagtap
03d92501f3 [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy.
D147408 implemented a new iterative approach for scan computations
and added a new flag, `amdgpu-atomic-optimizer-strategy`, which
defaults to DPP.

The changeset https://github.com/GPUOpen-Drivers/llpc/pull/2506
adapts to the new changes in LLPC.

This patch enables the atomic optimizer pass and selects the iterative
approach for scan computations by default for compute pipelines.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152649
2023-06-15 01:18:38 -04:00
Matt Arsenault
5b657f50b8 AMDGPU: Move LICM after AMDGPUCodeGenPrepare
The commit that added the run says it's to hoist uniform parts of
integer division expansion. That expansion is performed later, so this
didn't do anything in that case. Move this later so the original test
shows the improvement.

This also saves a run of "Canonicalize natural loops". Not sure why
this appears to be still getting a separate loop PM run. Also feels a
bit heavy to run this just for divide. Is there a way to specifically
hoist the divide sequence when it expands?
2023-06-10 07:37:32 -04:00
Matt Arsenault
3c848194f2 CodeGen: Expand memory intrinsics in PreISelIntrinsicLowering
Expand large or unknown size memory intrinsics into loops in the
default lowering pipeline if the target doesn't have the corresponding
libfunc. Previously AMDGPU had a custom pass which existed to call the
expansion utilities.
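
As a rough picture of what such an expansion computes, here is a minimal
C++ sketch of a copy whose length is only known at run time, written as
an explicit loop instead of a libcall. The name `copyLoop` is
hypothetical, and the byte-at-a-time stride is an assumption made for
simplicity; the actual expansion operates on IR and may choose wider
accesses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a byte-wise loop with the same effect as a
// non-overlapping memcpy whose length is a run-time value.
inline void copyLoop(unsigned char *dst, const unsigned char *src,
                     std::size_t n) {
  // A zero-length copy may pass null pointers; anything else may not.
  assert((n == 0 || (dst != nullptr && src != nullptr)) &&
         "non-empty copy needs valid pointers");
  for (std::size_t i = 0; i != n; ++i)
    dst[i] = src[i];
}
```

A loop of this shape needs no library function on the target, which is
the situation the no-libcall default is meant to cover.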

With a default no-libcall option, we can remove the libfunc checks in
LoopIdiomRecognize for these, which never made any sense. This also
provides a path to lifting the immarg restriction on
llvm.memcpy.inline.

There seems to be a bug where TLI reports functions as available if
you use -march and not -mtriple.
2023-06-09 21:04:37 -04:00
Pravin Jagtap
f6c8a8e9cb [AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.

The alternative implementation iterates over all active lanes of the wavefront
using llvm.cttz and performs the following steps:
    1.  Read the value that needs to be atomically incremented using
        the llvm.amdgcn.readlane intrinsic.
    2.  Accumulate the result.
    3.  Update the scan result using the llvm.amdgcn.writelane intrinsic
        if intermediate scan results are needed later in the kernel.
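
The loop above can be modelled on the host as follows. This is only a
sketch: `kWaveSize`, `ScanResult`, `countTrailingZeros` and
`iterativeScan` are hypothetical names, the 64-lane width and the
exclusive-scan convention (each lane receives the sum of the lanes
before it) are assumptions, and the array accesses merely stand in for
the readlane/writelane intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWaveSize = 64; // assumed wavefront width

struct ScanResult {
  std::array<uint32_t, kWaveSize> exclusive{}; // per-lane prefix (active lanes only)
  uint32_t total = 0;                          // operand for the single combined atomic
};

// Portable count-trailing-zeros; stands in for llvm.cttz.
inline unsigned countTrailingZeros(uint64_t mask) {
  assert(mask != 0 && "the loop never asks for cttz of an empty mask");
  unsigned n = 0;
  while ((mask & 1u) == 0) {
    mask >>= 1;
    ++n;
  }
  return n;
}

inline ScanResult iterativeScan(const std::array<uint32_t, kWaveSize> &values,
                                uint64_t activeMask) {
  ScanResult r;
  uint32_t acc = 0;
  while (activeMask != 0) {
    unsigned lane = countTrailingZeros(activeMask); // next active lane
    uint32_t v = values[lane];                      // step 1: "readlane"
    r.exclusive[lane] = acc;                        // step 3: "writelane" of the prefix
    acc += v;                                       // step 2: accumulate
    activeMask &= activeMask - 1;                   // clear the lane just handled
  }
  r.total = acc;
  return r;
}
```

With lanes 1, 3 and 4 active and holding 5, 7 and 2, the model yields a
total of 14 and per-lane prefixes 0, 5 and 12.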

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D147408
2023-06-09 01:08:44 -04:00
Matt Arsenault
846a360e16 AMDGPU: Don't run AMDGPUAttributor with -O0 2023-06-08 07:52:37 -04:00
Valery Pykhtin
342acfc9bb [AMDGPU] Turn off pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.
This pass fails when the target register class for a subregister isn't known from the instruction description (e.g. COPY).
Currently in this situation the RC is obtained using TargetRegisterInfo::getSubRegisterClass, but in general this does not work.

To fix this, two things should be done:
1. Stop processing a subregister if the target register class is unknown (conservative approach)
2. Improve deduction of the subregister's target register class (e.g. by processing the COPY chain)

I was going to implement point 1, but my tests use implicit operands for S_NOP, which don't have an associated target register class, so all tests fail.
Therefore I decided to turn off the pass now, implement point 1 and fix my tests.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D152291
2023-06-07 12:05:25 +02:00
Valery Pykhtin
8d0412ce9d [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.
The main purpose of this is to simplify register pressure tracking as after the pass there is no need
to track subreg liveness anymore.

On the other hand this pass creates more possibilities for subreg-unaware code, as many of the subregs
become ordinary registers.

Interesting side effect: spill-vgpr.ll has lost a lot of spills.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D139732
2023-05-26 09:05:44 +02:00
Anshil Gandhi
a22ef958cb [AMDGPUCodegenPrepare] Add NewPM Support
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D151241
2023-05-26 00:20:01 -06:00
Joseph Huber
4a1236e0f6 [AMDGPU] Add an option to disable manual ctor / dtor lowering
Currently AMDGPU offers extra ctor / dtor lowering by emitting a kernel
that can be called. It's possible to handle ctors and dtors using the
standard method as shown in D149340's commit message. In that case we
don't need these extra kernels as they won't be called. This patch simply
adds a way to conditionally turn off this handling if we do not want to
get extra kernels in the output.

Unrelated, but we could convert this handling to an ODR function that simply
calls the code in D149340 constructed via LLVM-IR. That would handle priority
correctly and would then be correct if not run in LTO mode.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D150565
2023-05-23 09:03:10 -05:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers"
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
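
The bit arithmetic for address space 7 can be checked with a small C++
model. This is a sketch under stated assumptions: `ToyFatPointer` and
`advance` are hypothetical names, and treating address arithmetic as
touching only the 32-bit offset is my reading of the 32-bit offset
width described above, not something this commit implements.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kResourceBits = 128; // buffer descriptor
constexpr unsigned kOffsetBits = 32;    // offset
constexpr unsigned kPointerBits = kResourceBits + kOffsetBits;
constexpr unsigned kAlignBits = 256;    // stated alignment

static_assert(kPointerBits == 160, "128-bit descriptor + 32-bit offset");
static_assert(kPointerBits <= kAlignBits, "pointer fits in its alignment");

// Hypothetical model of an address space 7 pointer.
struct ToyFatPointer {
  std::array<uint32_t, 4> resource; // 4 x 32 = 128 bits
  uint32_t offset;                  // 32 bits
};

// Assumed GEP-like step: only the offset changes, the descriptor is
// carried along untouched.
constexpr ToyFatPointer advance(ToyFatPointer p, uint32_t bytes) {
  p.offset += bytes;
  return p;
}
```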

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them (as ptr addrspace(8)) instead of <4 x i32> as resource
arguments. These pointers are 128 bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, addr_max] value from applying to them.

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
Pravin Jagtap
21a69bdb66 [NewPM][AMDGPU] Port amdgpu-atomic-optimizer
Reviewed By: arsenm, sameerds, gandhi21299

Differential Revision: https://reviews.llvm.org/D148628
2023-04-20 00:27:47 -04:00
Bjorn Pettersson
21a6890856 [Vectorize] Clean up Transforms/Vectorize.h
Removed definitions of vectorizeBasicBlock and VectorizeConfig
(possibly a remnant from the BBVectorize pass that was removed
way back in 2017).

Also reduced the number of include dependencies in Transforms/Vectorize.h.
2023-04-17 13:54:19 +02:00
Jon Chesterfield
3c76e5f0c8 [amdgpu][nfc] Remove dead code associated with LDS lowering
The pass has been disabled since approximately D104962 for miscompiling OpenMP.

The functions under ReplaceConstant miscompile phis as noted in D112717 and
have no users in tree other than the disabled pass. It seems likely it has no
users out of tree.

Deletes the test cases associated with the disabled pass as well.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D147586
2023-04-05 22:24:22 +01:00
Juan Manuel MARTINEZ CAAMAÑO
215cfa01f2 [AMDGPU][printf] Run AMDGPUPrintfRuntimeBindingPass in -O0
AMDGPUPrintfRuntimeBindingPass is not run in the IR optimization
pipeline with -O0.

This means that with OpenCL the printf definition coming from
device_libs gets linked with the user's code, which blocks
AMDGPUPrintfRuntimeBindingPass from working after the linkage is done.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146720
2023-03-27 09:43:36 +02:00
Anshil Gandhi
b48e7c2d01 [AMDGPUUnifyDivergentExitNodes] Add NewPM support
Meanwhile, use UniformityAnalysis instead of LegacyDivergenceAnalysis to collect divergence info.

Reviewed By: arsenm, sameerds

Differential Revision: https://reviews.llvm.org/D141355
2023-03-25 14:04:36 -06:00
Arthur Eubanks
fa6ea7a419 [AlwaysInliner] Make legacy pass like the new pass
The legacy pass is only used in AMDGPU codegen, which doesn't care about running it in call graph order (it actually has to work around that fact).

Make the legacy pass a module pass and share code with the new pass.

This allows us to remove the legacy inliner infrastructure.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D146446
2023-03-21 11:04:22 -07:00
Vitaly Buka
aa15fe98b6 Revert "[AMDGPUUnifyDivergentExitNodes] Add NewPM support"
Introduces nullptr dereference.

This reverts commit a5455e32b364dabe499ec11722626d4bbaf047ba.
2023-03-16 19:03:46 -07:00
Anshil Gandhi
a5455e32b3 [AMDGPUUnifyDivergentExitNodes] Add NewPM support
Meanwhile, use UniformityAnalysis instead of LegacyDivergenceAnalysis to collect divergence info.

Reviewed By: arsenm, sameerds

Differential Revision: https://reviews.llvm.org/D141355
2023-03-16 16:13:29 +00:00
pvanhout
8e68c12045 [AMDGPU] Remove function with incompatible features
Adds a new pass that removes functions if they use features that are not
supported on the current GPU.

This change is aimed at preventing crashes when building code at O0 that
uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();`
where ISA_VERSION is not constexpr, and intrinsic_a is not selectable
on older targets.
This is a pattern that's used all over the ROCm device libs. The main
motive behind this change is to allow code using ROCm device libs
to be built at O0.
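
For readers unfamiliar with the idiom, here is a minimal host-side C++
sketch of it. Everything in it is a hypothetical stand-in: `isaVersion`
plays the role of the non-constexpr ISA_VERSION, and `newTargetOnly` and
`fallback` play the roles of intrinsic_a and intrinsic_b. It does not
model the new pass itself.

```cpp
#include <cassert>

// Hypothetical run-time value. Because it is not constexpr, a -O0 build
// keeps both branches below, including the one the target cannot select.
int isaVersion = 9;

inline int newTargetOnly() { return 1; } // stands in for intrinsic_a
inline int fallback() { return 2; }      // stands in for intrinsic_b

inline int selectPath() {
  assert(isaVersion > 0 && "toy model expects a positive version");
  if (isaVersion >= 11)
    return newTargetOnly();
  return fallback();
}
```

Without optimization nothing folds the comparison, so the branch that
the target cannot select still reaches instruction selection.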

Note: the feature checking logic is done ad-hoc in the pass. There is no other
pass that needs (or will need in the foreseeable future) to do similar
feature-checking logic so I did not see a need to generalize the feature
checking logic yet. It can (and should probably) be generalized later and
moved to a TargetInfo-like class or helper file.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D139000
2023-02-21 10:42:39 +01:00
Bjorn Pettersson
2dd221fe48 Remove no longer needed includes of LegacyPassManager.h
Most of the removed includes should probably have been removed already
when we removed TargetMachine::adjustPassManager.
2023-02-06 13:38:57 +01:00
Matt Arsenault
e9c49901a4 AMDGPU/GlobalISel: Add stub custom regbankselect pass
Uniformity analysis needs to be the fundamental basis for
regbank decisions. The considerations of the default pass
are secondary, but potentially useful for some edge cases (e.g.
selecting AGPRs when arbitrary loads and stores can directly use
them). This needs to be a separate pass since it requires new
analysis dependencies.

Boilerplate to subclass the existing pass which does nothing
different.
2023-01-30 16:18:20 -04:00
Matt Arsenault
4463badf46 AMDGPU: Use DenormalMode type in FP mode tracking
This simplifies a future patch. The MIR handling should be fixed. We're
still printing these in custom MachineFunctionInfo as bools (plus the
inverted meaning is hard to follow).
2022-12-21 20:35:48 -05:00
Matt Arsenault
69e75ae695 CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point at which this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.
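
The hazard can be reproduced with a toy model. The sketch below uses
hypothetical types (`ToyFunction`, `ToyFunctionInfo`, `LazyHolder`)
rather than the real MachineFunction classes: the info object snapshots
function state in its constructor, so what it records depends on when
the first query happens.

```cpp
#include <cassert>
#include <memory>

struct ToyFunction {
  bool hasCalls = false; // discovered while the function is being built
};

struct ToyFunctionInfo {
  bool sawCalls; // snapshot taken at construction time
  explicit ToyFunctionInfo(const ToyFunction &f) : sawCalls(f.hasCalls) {}
};

class LazyHolder {
  const ToyFunction &fn;
  std::unique_ptr<ToyFunctionInfo> info;

public:
  explicit LazyHolder(const ToyFunction &f) : fn(f) {}

  // Constructs the info on first use, which is the pattern being removed.
  ToyFunctionInfo &get() {
    if (!info)
      info = std::make_unique<ToyFunctionInfo>(fn);
    assert(info && "info exists after the first query");
    return *info;
  }
};
```

If `get()` is first called before `hasCalls` is set, the snapshot stays
false for the rest of the function's lifetime; if it is first called
afterwards, it is true. Constructing the info together with the function
and forbidding such state inspection removes that dependence.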

I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.

A missing feature I still could use is a way to access a custom
analysis pass on the IR here.
2022-12-21 10:49:32 -05:00
Christudasan Devadasan
a3028239a7 Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
2022-12-21 16:17:42 +05:30
Nick Desaulniers
ad99774a5f [llvm][PassSupport] don't require passes to be default constructible
Quite a few passes are not default constructible. In order to properly
support -{start|stop}-{before|after}= for these passes, we would like to
continue to use INITIALIZE_PASS, but not necessarily provide a default
constructor.

Delete the default constructors of classes derived from
SelectionDAGISel.

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D140349
2022-12-20 14:07:29 -08:00
Christudasan Devadasan
40ba0942e2 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions, and SGPR spilling ends up
failing when there aren't enough VGPRs, which forces us
to spill the leftover into memory during PEI. The custom
spill handling during PEI has many edge cases and breaks
the compiler from time to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
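
A lane-based spill can be pictured with a small C++ model. This is a
sketch only: `ToyVGPR`, `spillToLane` and `restoreFromLane` are
hypothetical names, the 64-lane width is an assumption, and plain array
accesses stand in for the lane write and read instructions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kLanes = 64; // assumed wave size

// One vector register: a 32-bit value per lane.
using ToyVGPR = std::array<uint32_t, kLanes>;

// Spill: park a scalar value in one lane of the vector register.
inline void spillToLane(ToyVGPR &v, unsigned lane, uint32_t sgprValue) {
  assert(lane < kLanes && "lane index out of range");
  v[lane] = sgprValue;
}

// Restore: read the scalar value back from the same lane.
inline uint32_t restoreFromLane(const ToyVGPR &v, unsigned lane) {
  assert(lane < kLanes && "lane index out of range");
  return v[lane];
}
```

In this model one vector register holds up to `kLanes` spilled scalars.
Making that register virtual leaves the choice of physical register to
the later VGPR allocation.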

Spilling to virtual registers will always be successful,
even in high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer, which is an unproblematic case.

This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124196
2022-12-17 11:56:32 +05:30
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call the pair and tuple constructors directly
instead of the helper functions make_pair and make_tuple.
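
A minimal illustration of the language feature involved (class template
argument deduction); the function names are hypothetical and the snippet
is not taken from the patch.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Pre-C++17 style: the helper deduces the element types.
constexpr auto viaHelper() { return std::make_pair(3, 4.5); }

// C++17 style: the constructor call deduces them itself.
constexpr auto viaConstructor() { return std::pair(3, 4.5); }
constexpr auto tripleViaConstructor() { return std::tuple(3, 4.5, 'x'); }

static_assert(std::is_same_v<decltype(viaConstructor()), std::pair<int, double>>);
static_assert(std::is_same_v<decltype(viaHelper()), decltype(viaConstructor())>);
static_assert(
    std::is_same_v<decltype(tripleViaConstructor()), std::tuple<int, double, char>>);
```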

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Matt Arsenault
f23f26032d AMDGPU: Port AMDGPUCtorDtorLowering to new PM 2022-12-09 13:43:38 -05:00
Krzysztof Parzyszek
c589730ad5 [YAML] Convert Optional to std::optional 2022-12-06 12:49:32 -08:00
Fangrui Song
bac974278c CodeGen/CommandFlags: Convert Optional to std::optional 2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek
8c7c20f033 Convert Optional<CodeModel> to std::optional<CodeModel> 2022-12-03 12:08:47 -06:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
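
For orientation, the std::optional spellings that the migration moves to
look like this. The sketch is standalone: `ToyCodeModel` and the three
functions are hypothetical, and the comments name the llvm::Optional
members listed in the discussion title above.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class ToyCodeModel { Small, Large }; // hypothetical enum

inline std::optional<ToyCodeModel> parseCodeModel(const std::string &s) {
  if (s == "small")
    return ToyCodeModel::Small;
  if (s == "large")
    return ToyCodeModel::Large;
  return std::nullopt; // previously spelled None
}

inline ToyCodeModel effectiveCodeModel(const std::optional<ToyCodeModel> &cm) {
  return cm.value_or(ToyCodeModel::Small); // replaces getValueOr
}

inline ToyCodeModel requireCodeModel(const std::optional<ToyCodeModel> &cm) {
  assert(cm.has_value() && "caller must check first"); // replaces hasValue
  return *cm;                                          // replaces getValue
}
```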
2022-12-02 20:36:06 -08:00
Nicolai Hähnle
43b86bf992 AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.

Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.

There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.

Differential Revision: https://reviews.llvm.org/D138711
2022-11-29 22:15:11 +01:00
Bjorn Pettersson
99c47d9e31 Remove TargetMachine::adjustPassManager
Since opt no longer supports running the default (O0/O1/O2/O3/Os/Oz)
pipelines using the legacy PM, there are no in-tree uses of
TargetMachine::adjustPassManager remaining. This patch removes the
now-unused adjustPassManager functions.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D137796
2022-11-28 10:24:16 +01:00
Ruiling Song
cf14c7caac AMDGPU: Add a pass to rewrite certain undef in PHI
For the following IR pattern (where %if terminates with a divergent
branch), divergence analysis will report %phi as uniform to help
optimal code generation.
```
  %if
  | \
  | %then
  | /
  %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ]
```
In the backend, %phi and %uniform will be assigned a scalar register.
But the %undef from %then will make the scalar register dead in %then.
This will likely cause the register to be overwritten in %then. To fix
the issue, we will rewrite %undef as %uniform. For details, please refer
to the comment in AMDGPURewriteUndefForPHI.cpp. Currently there are no
test changes shown, but this is mandatory for later changes.
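
The core of the rewrite can be sketched on a toy representation. The
types and the function name below are hypothetical, and the condition
used (exactly one distinct defined incoming value) is a simplification
chosen for the sketch; the conditions the real pass checks are described
in AMDGPURewriteUndefForPHI.cpp and are not modelled here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One incoming edge of a phi; std::nullopt models an undef value.
struct ToyIncoming {
  std::string block;
  std::optional<std::string> value;
};

// Replaces undef incomings with the single defined value, if there is
// exactly one distinct defined value. Returns true if anything changed.
inline bool rewriteUndefIncoming(std::vector<ToyIncoming> &phi) {
  assert(!phi.empty() && "a phi has at least one incoming value");
  const std::string *unique = nullptr;
  bool sawUndef = false;
  for (const ToyIncoming &in : phi) {
    if (!in.value) {
      sawUndef = true;
      continue;
    }
    if (unique && *unique != *in.value)
      return false; // two different defined values: leave the phi alone
    unique = &*in.value;
  }
  if (!unique || !sawUndef)
    return false;
  const std::string replacement = *unique; // copy before mutating
  for (ToyIncoming &in : phi)
    if (!in.value)
      in.value = replacement;
  return true;
}
```

Applied to the pattern in the diagram, the undef coming from %then
becomes %uniform, so the value stays live through %then.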

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D133840
2022-09-26 09:54:47 +08:00
Austin Kerbow
b0f4678b90 [AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D132079
2022-08-19 15:38:36 -07:00
Austin Kerbow
3dfa562643 [AMDGPU] Add CL option for max-ilp scheduler.
When compiling for multiple targets the scheduler that is selected via the
-misched option is applied globally. This patch adds a target CL option instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D131022
2022-08-02 16:52:14 -07:00
Austin Kerbow
d7100b398b [AMDGPU] Add GCNMaxILPSchedStrategy
Creates a new scheduling strategy that attempts to maximize ILP for a single
wave.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130869
2022-08-02 13:21:24 -07:00
Jay Foad
e301e071ba [AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline
This pass seems to have very little effect because all it does is hoist
some instructions, but it is followed later in the codegen pipeline by
the IR CodeSinking pass which does the opposite.

Differential Revision: https://reviews.llvm.org/D130258
2022-08-02 17:35:20 +01:00
Jon Chesterfield
3a20597776 [amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use in lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments already accessed by intrinsics,
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

In the general case it is necessary to put variables at different addresses
so that they can be compactly allocated, and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
the kernel writes an integer (in this implementation based on metadata
associated with that kernel), which is then passed on to indirect call
sites, is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060
2022-07-19 17:46:19 +01:00
Joe Nash
d1af09ad96 [AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler mutation to put possible VOPD
components back-to-back.
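
The bank constraint can be expressed as a small predicate. This sketch
rests on an assumption that is not stated in the commit: `bankOf` maps a
VGPR number to a bank by taking it modulo `kAssumedBanks`, and both that
mapping and the value 4 are placeholders. `ToyVOPDComponent` and
`operandsAllowPairing` are hypothetical names, and any other legality
rules are ignored.

```cpp
constexpr unsigned kAssumedBanks = 4; // assumption, not from the commit

// Assumed bank mapping, for illustration only.
constexpr unsigned bankOf(unsigned vgpr) { return vgpr % kAssumedBanks; }

// Source VGPR numbers of one candidate component instruction.
struct ToyVOPDComponent {
  unsigned src0;
  unsigned src1;
};

// Matching operands of the X and Y components must sit in different banks.
constexpr bool operandsAllowPairing(const ToyVOPDComponent &x,
                                    const ToyVOPDComponent &y) {
  return bankOf(x.src0) != bankOf(y.src0) && bankOf(x.src1) != bankOf(y.src1);
}
```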

Depends on D128442, D128270

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128656
2022-07-05 09:18:19 -04:00
Jay Foad
0f94d2b385 [AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to
release a shader's VGPRs. Sending this at the end of a shader (just
before the s_endpgm) can help overall system performance in cases where
the s_endpgm would have to wait for outstanding VMEM stores to complete
before releasing the VGPRs.

Differential Revision: https://reviews.llvm.org/D128442
2022-06-30 20:55:14 +01:00
Jay Foad
cfb7ffdec0 [AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
2022-06-29 21:30:20 +01:00
Jay Foad
b5818e4eb4 [AMDGPU] Cluster stores as well as loads for GFX11
Differential Revision: https://reviews.llvm.org/D128517
2022-06-27 16:41:41 +01:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00