594 Commits

Author SHA1 Message Date
Juan Manuel Martinez Caamaño
cbf34a5f77
[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645) 2024-08-23 14:06:17 +02:00
Matt Arsenault
dd90c72b05 AMDGPU: Temporarily stop adding AtomicExpand to new PM passes
This breaks using -passes=atomic-expand (but only sometimes?).
Somehow an AtomicExpand pass ends up running without a TargetMachine,
despite always being constructed with one.
2024-08-21 00:19:37 +04:00
Matt Arsenault
33e18b2b43
AMDGPU/NewPM: Start filling out addIRPasses (#102884)
This is not complete, but gets AtomicExpand running. I was able
to get further than I expected; we're quite close to having all
the IR codegen passes ported.
2024-08-20 23:38:05 +04:00
Matt Arsenault
afeef4dbc3
AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (#102867)
AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should be soon removable.
2024-08-20 23:35:01 +04:00
Matt Arsenault
7022498ac2
AMDGPU/NewPM: Start implementing addCodeGenPrepare (#102816) 2024-08-20 00:10:45 +04:00
Christudasan Devadasan
a449b85724
[AMDGPU][R600] Move R600CodeGenPassBuilder into R600TargetMachine(NFC). (#103721) 2024-08-19 20:40:12 +05:30
Christudasan Devadasan
a566635915
[AMDGPU] Move AMDGPUCodeGenPassBuilder into AMDGPUTargetMachine(NFC) (#103720)
This will allow us to reuse the existing flags and the static
functions while building the pipeline for new pass manager.
2024-08-19 20:32:55 +05:30
Matt Arsenault
36a0f20ac3
AMDGPU/NewPM: Fill out addPreISelPasses (#102814)
This specific callback should now be at parity with the old
pass manager version. There are still some missing IR passes
before this point.

Also I don't understand the need for the RequiresAnalysisPass at the
end. SelectionDAG should just be using the uncached getResult?
2024-08-14 20:57:00 +04:00
Shilei Tian
862f5040fb
[AMDGPU] Enable AMDGPUAttributorPass in full LTO (#102673)
This is basically same as
https://github.com/llvm/llvm-project/pull/102086 but reverts some test
case changes that are no longer needed.
2024-08-12 13:39:23 -04:00
Matt Arsenault
05b75e006b
AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (#102806) 2024-08-12 15:09:12 +04:00
Matt Arsenault
1c764b952a
AMDGPU: Use GCNTargetMachine in AMDGPUCodeGenPassBuilder (#102805)
R600 has a separate CodeGenPassBuilder anyway.
2024-08-12 15:02:48 +04:00
Matt Arsenault
dd094b2647
NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645)
This was much more difficult than I anticipated. The pass is
not in a good state, with poor test coverage. The legacy PM
does seem to be relying on maintaining the map state between
different SCCs, which seems bad. The pass is going out of its
way to avoid putting the attributes it introduces onto non-callee
functions. If it just added them, we could use them directly
instead of relying on the map, I would think.

The NewPM path uses a ModulePass; I'm not sure if we should be
using CGSCC here but there seems to be some missing infrastructure
to support backend defined ones.
2024-08-11 15:11:10 +04:00
Matt Arsenault
3696a34e59
AMDGPU/NewPM: Port SILowerI1Copies to new pass manager (#102663) 2024-08-10 07:08:22 +04:00
Matt Arsenault
77e68fbdd3
AMDGPU/NewPM: Port AMDGPUAnnotateUniformValues to new pass manager (#102654) 2024-08-10 07:06:08 +04:00
Matt Arsenault
76f722f10c
AMDGPU/NewPM: Port SIAnnotateControlFlow to new pass manager (#102653)
Does not yet add it to the pass pipeline. Somehow it causes
2 tests to assert in SelectionDAG, in functions without any
control flow.
2024-08-10 07:02:21 +04:00
Shilei Tian
786c409234
[AMDGPU][Attributor] Add a pass parameter closed-world for AMDGPUAttributor pass (#101760) 2024-08-09 22:12:09 -04:00
Shilei Tian
492484e657 Revert "[AMDGPU] Move AMDGPUAttributorPass to full LTO post link stage (#102086)"
This reverts commit 2fe61a5acf272d6826352ef72f47196b01003fc5.
2024-08-09 15:12:24 -04:00
Shilei Tian
2fe61a5acf
[AMDGPU] Move AMDGPUAttributorPass to full LTO post link stage (#102086)
Currently `AMDGPUAttributorPass` is registered in default optimizer
pipeline.
This will allow the pass to run in default pipeline as well as at
thinLTO post
link stage. However, it will not run in full LTO post link stage. This
patch
moves it to full LTO.
2024-08-09 13:35:00 -04:00
Matt Arsenault
cf54cae26b
AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (#102614)
This allows moving some tests relying on -stop-after=amdgpu-isel
to move to checking -stop-after=finalize-isel instead, which
will more reliably pass the verifier.
2024-08-09 17:52:41 +04:00
Christudasan Devadasan
15b41d207e
[CodeGen] change prototype of regalloc filter function (#93525)
[CodeGen] Change the prototype of regalloc filter function

Change the prototype of the filter function so that we can
filter not just by RegClass. We need to implement more
complicated filter based upon some other info associated
with each register.

Patch provided by: Gang Chen (gangc@amd.com)
2024-07-22 16:49:39 +05:30
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Matt Arsenault
b1bcb7ca46 Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.

Drop the -O3 checks from default-attributes.hip. I don't know why they
are different on some bots but reverting this is far too disruptive.
2024-07-15 11:51:44 +04:00
dyung
adaff46d08
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.

The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614

These bots have been broken for a day, so reverting to get everything
back to green.
2024-07-14 18:48:54 -07:00
Matt Arsenault
78bc1b64a6
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.

Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
2024-07-14 08:36:33 +04:00
Jeffrey Byrnes
5da7179cb3 [AMDGPU] Reland: Add IR LiveReg type-based optimization 2024-07-03 09:26:19 -07:00
Vitaly Buka
3e53c97d33
Revert "[AMDGPU] Add IR LiveReg type-based optimization" (#97138)
Part of #66838.

https://lab.llvm.org/buildbot/#/builders/52/builds/404
https://lab.llvm.org/buildbot/#/builders/55/builds/358
https://lab.llvm.org/buildbot/#/builders/164/builds/518

This reverts commit ded956440739ae326a99cbaef18ce4362e972679.
2024-06-28 23:18:26 -07:00
Jeffrey Byrnes
ded9564407 [AMDGPU] Add IR LiveReg type-based optimization
Change-Id: Ia0d11b79b8302e79247fe193ccabc0dad2d359a0
2024-06-28 15:01:39 -07:00
Nikita Popov
5cd0ba30f5
Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321) (#96462)
On MSVC the `this` uses inside `decltype` require a lambda capture. On
clang they result in an unused capture warning instead. Add the capture
and suppress the warning with `(void)this`.

-----

Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
2024-06-24 15:00:11 +02:00
Nikita Popov
e5a41f0afc Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)"
My attempt to fix the Windows build made things worse,
revert entirely for now.

This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c.
This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5.
This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.
2024-06-24 10:32:03 +02:00
Nikita Popov
957dc4366d
[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
2024-06-24 09:40:09 +02:00
vg0204
c2fc7f75f6 Revert "[AMDGPU]Optimize SGPR spills (#93668)"
This reverts commit 4b9112e88a998ce620e4683548f2afd17cc5fe95. A separate
issue(#96353) describing it has been opened to further keep its track.
2024-06-24 12:36:36 +05:30
Vikash Gupta
4b9112e88a
[AMDGPU]Optimize SGPR spills (#93668)
This PR is dependent on
[#93779](https://github.com/llvm/llvm-project/pull/93779).

As currently, each SGPR Spills are lowered to go into distinct stack
slots in stack frame after SGPR allocation phase. Therefore, this patch
utilizes the capability of StackSlotColoring to ensure the stack slot
sharing if possible for stack frame index, where the SGPR spills are
occuring in the non-interfering region.

StackSlotColoring is introduced immediately after SGPR register
allocation, just to ensure that any further lowering happens on the
optimally allocated stack slots, with certain flags to indicate the
preservation of certain analysis result later to be used by RA of other
register classes.
2024-06-18 20:55:31 +05:30
Pierre van Houtryve
d95b82c49a
[NFC][AMDGPU] Make AMDGPUSplitModule a ModulePass (#95773)
It allows it to access TTI correctly, and opens the door to accessing
more analysis in the future.

I went back and forth between this, and also making the default
SplitModule a Pass too to make it uniform, but I decided against it
because it's just needless complications. Neither llvm-split or
LTOBackend have a PM ready to use so we need to create one anyway. Let's
keep all the mess hidden in the AMDGPU version for now to keep this
change more self-contained.
2024-06-18 09:16:32 +02:00
paperchalice
1bc8b3258e
[NewPM][CodeGen] Port regallocfast to new pass manager (#94426)
This pull request port `regallocfast` to new pass manager. It exposes
the parameter `filter` to handle different register classes for AMDGPU.
IIUC AMDGPU need to allocate different register classes separately so it
need implement its own `--<reg-class>-regalloc`. Now users can use e.g.
`-passe=regallocfast<filter=sgpr>` to allocate specific register class.
The command line option `--regalloc-npm` is still in work progress, plan
to reuse the syntax of passes, e.g. use
`--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to
replace `--sgpr-regalloc` and `--vgpr-regalloc`.
2024-06-07 12:22:42 +08:00
Jon Chesterfield
8516f54e6a
[AMDGPU] Implement variadic functions by IR lowering (#93362)
This is a mostly-target-independent variadic function optimisation and
lowering pass. It is only enabled for AMDGPU in this initial commit.

The purpose is to make C style variadic functions a zero cost
abstraction. They are lowered to equivalent IR which is then amenable to
other optimisations. This is inherently slightly target specific but
much less so than one might expect - the C varargs interface heavily
constrains the ABI design divergence.

The pass is primarily tested from webassembly. This is because wasm has
a straightforward variadic lowering strategy which coincides exactly
with what this pass transforms code into and a struct passing convention
with few cases to check. Adding further targets conventions is
straightforward and elided from this patch primarily to simplify the
review. Implemented in other branches are Linux X86, AMD64, AArch64 and
NVPTX.

Testing for targets that have existing lowering for va_arg from clang is
most efficiently done by checking that clang | opt completely elides the
variadic syntax from test cases. The lowering produces a struct for each
call site which can be inspected to check the various alignment and
indirections are correct.

AMDGPU presently has no variadic support other than some ad hoc printf
handling. Combined with the pass being inactive on all other targets
landing this represents strict increase in capability with zero risk.
Testing and refining will continue post commit.

In addition to the compiler tests included here, a self contained x64
clang/musl toolchain was constructed using the "lowering" instead of the
systemv ABI and used to build various C programs like lua and libxml2.
2024-06-06 10:44:53 +01:00
paperchalice
7652a59407
Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-04 08:10:58 +08:00
paperchalice
8917afaf0e
Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1

It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02 14:31:52 +08:00
paperchalice
d2cdc8ab45
[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
2024-06-02 09:12:33 +08:00
Pierre van Houtryve
43fd244b3d Reland "[AMDGPU] Add AMDGPU-specific module splitting (#89245)"
(with fix for ubsan)

This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of
them to split modules in a way that avoids compilation issue (such as
resource usage being incorrectly represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG which allows
printing logs to uniquely named files, and optionally with all value
names hidden so they can be safely shared without leaking informatiton
about the source. Logs can also be enabled through an environment
variable, which avoids the sometimes complicated process of passing a
-mllvm option all the way from clang driver to the offload linker that
handles full LTO codegen.
2024-05-27 10:43:00 +02:00
Vitaly Buka
5a48223d1c
Revert "[AMDGPU] Add AMDGPU-specific module splitting" (#93275)
Fails on https://lab.llvm.org/buildbot/#/builders/85/builds/24181
and https://lab.llvm.org/buildbot/#/builders/5/builds/43589

Reverts llvm/llvm-project#89245
2024-05-23 23:48:39 -07:00
Pierre van Houtryve
d7c3713000
[AMDGPU] Add AMDGPU-specific module splitting (#89245)
This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of
them to split modules in a way that avoids compilation issue (such as
resource usage being incorrectly represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG which allows
printing logs to uniquely named files, and optionally with all value
names hidden so they can be safely shared without leaking informatiton
about the source. Logs can also be enabled through an environment
variable, which avoids the sometimes complicated process of passing a
-mllvm option all the way from clang driver to the offload linker that
handles full LTO codegen.
2024-05-23 12:26:24 +02:00
paperchalice
b4ba3fe006
[NewPM][AMDGPU] Add CodeGenPassBuilder (#91040)
In order to test SelectionDAG for target AMDGPU, we need
CodeGenPassBuilder.
2024-05-19 15:57:02 +08:00
Matt Arsenault
bc3620d3a8
AMDGPU: Move libcall simplify into PeepholeEP (#88853)
We were running this immediately on the incoming IR, which
is still littered with temporary allocas obscuring trivial values.
This needs to run after initial SROA to handle sincos insertion.
2024-04-17 08:50:14 +02:00
paperchalice
a2dfc9ac7d
[NewPM][AMDGPU] Add AMDGPUPassRegistry.def (#86095)
Move the pass registry to a separate file, prepare for porting dag-isel.
2024-03-22 08:49:29 +08:00
Pierre van Houtryve
95a834a16c
(Reland) [AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#85626)
Reland of #75333
2024-03-21 11:44:47 +01:00
pvanhout
3493438605 Revert "[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333)"
This reverts commit 9b98692eedb78aa106539c36ba02944f32cae1ff.
2024-03-18 11:18:57 +01:00
Pierre van Houtryve
9b98692eed
[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333)
This change allows us to use `--lto-partitions` in some cases (not at
all guaranteed it works perfectly), as LDS is lowered before the module
is split for parallel codegen.

We must run LowerLDS before splitting modules as it needs to see all
callers of functions with LDS to properly lower them.
2024-03-18 09:09:43 +01:00
Krzysztof Drewniak
6540f1635a
[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952)
This commit adds the -lower-buffer-fat-pointers pass, which is
applicable to all AMDGCN compilations.

The purpose of this pass is to remove the type `ptr addrspace(7)` from
incoming IR. This must be done at the LLVM IR level because `ptr
addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled
by SelectionDAG.

The detailed operation of the pass is described in comments, but, in
summary, the removal proceeds by:
1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of
i160 (including vectors and aggregates). This is needed because the
in-register representation of these pointers will stop matching their
in-memory representation in step 2, and so ptrtoint/inttoptr operations
are used to preserve the expected memory layout

2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with
the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two
parts of a buffer fat pointer (the 128-bit address space 8 resource and
the 32-bit address space 6 offset) visible in the IR. This also impacts
the argument and return types of functions.

3. *Splitting* the resource and offset parts. All instructions that
produce or consume buffer fat pointers (like GEP or load) are rewritten
to produce or consume the resource and offset parts separately. For
example, GEP updates the offset part of the result and a load uses the
resource and offset parts to populate the relevant
llvm.amdgcn.raw.ptr.buffer.load intrinsic call.

At the end of this process, the original mutated instructions are
replaced by their new split counterparts, ensuring no invalidly-typed IR
escapes this pass. (For operations like call, where the struct form is
needed, insertelement operations are inserted).

Compared to LGC's PatchBufferOp (

32cda89776/lgc/patch/PatchBufferOp.cpp
): this pass
- Also handles vectors of ptr addrspace(7)s
- Also handles function boundaries
- Includes the same uniform buffer optimization for loops and
conditionals
- Does *not* handle memcpy() and friends (this is future work)
- Does *not* break up large loads and stores into smaller parts. This
should be handled by extending the legalization
of *.buffer.{load,store} to handle larger types by producing multiple
instructions (the same way ordinary LOAD and STORE are legalized). That
work is planned for a followup commit.
- Does *not* have special logic for handling divergent buffer
descriptors. The logic in LGC is, as far as I can tell, incorrect in
general, and, per discussions with @nhaehnle, isn't widely used.
Therefore, divergent descriptors are handled with waterfall loops later
in legalization.

As a final matter, this commit updates atomic expansion to treat buffer
operations analogously to global ones.

(One question for reviewers: is the new pass is the right place? Should
it be later in the pipeline?)

Differential Revision: https://reviews.llvm.org/D158463
2024-03-06 09:49:58 -06:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00