563 Commits

Author SHA1 Message Date
Diana Picus
3356208531
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
7792b4ae79.

The problem was a conflict with
e55d6f5ea2
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.
2024-09-13 11:54:30 +02:00
Diana Picus
7792b4ae79
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
2024-09-12 10:12:09 +02:00
Diana Picus
703ebca869
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
c7a7767fca.

The buildbots failed because I removed an MI from its parent before
updating LIS. This PR should fix that.
2024-09-12 09:11:41 +02:00
Akshat Oke
e1ee07d0ff
[AMDGPU][NewPM] Port SIPeepholeSDWA pass to NPM (#107049) 2024-09-11 14:30:16 +04:00
Vitaly Buka
c7a7767fca
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.

Reverts llvm/llvm-project#105822
2024-09-10 09:51:43 -07:00
Diana Picus
44556e64f2
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
2024-09-10 13:24:53 +02:00
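The lane behaviour described in the llvm.amdgcn.init.whole.wave message above can be modelled with a small bitmask sketch. This is only an illustrative model, not the compiler's lowering: the function name `simulate_init_whole_wave` and its parameters are hypothetical, and the EXEC mask is represented as a plain integer with one bit per lane.

```python
def simulate_init_whole_wave(entry_exec: int, wave_size: int = 32):
    """Model which lanes run the body and which run the tail.

    entry_exec is the EXEC mask the function was really called with.
    The intrinsic conceptually enables every lane and returns, per lane,
    whether that lane was active on entry (the %func_exec value).
    """
    all_lanes = (1 << wave_size) - 1
    func_exec = entry_exec & all_lanes

    # Body ("func" block): only the lanes that were meant to run it.
    body_lanes = [lane for lane in range(wave_size) if (func_exec >> lane) & 1]
    # Tail: all lanes rejoin and run with the whole wave enabled.
    tail_lanes = list(range(wave_size))
    return body_lanes, tail_lanes


body, tail = simulate_init_whole_wave(0b0101, wave_size=4)
```

With a 4-lane toy wave and lanes 0 and 2 active on entry, the body runs on those two lanes only, while the tail runs on all four.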
Christudasan Devadasan
6c143a86cd
[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605) 2024-09-04 18:54:07 +05:30
Christudasan Devadasan
042104985c
[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967) 2024-09-03 10:52:50 +05:30
Akshat Oke
da13754103
AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362) 2024-09-02 11:41:56 +05:30
Shilei Tian
84ed3c29e8
Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106889)
We can't assume closed world even in full LTO post-link stage. It is only true
if we are building a "GPU executable". However, AMDGPU does support "dynamic
library". I'm not aware of any approach to tell if it is relocatable link when
we create the pass. For now let's revert the patch as it is currently breaking
things. We can re-enable it once we can handle it correctly.
2024-09-01 09:32:08 -04:00
Akshat Oke
fdca2c33a1
AMDGPU/NewPM Port GCNDPPCombine to NPM (#105816)
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
2024-08-29 14:49:52 +05:30
Akshat Oke
2adc94cd6c
AMDGPU/NewPM: Port SIFoldOperands to new pass manager (#105801) 2024-08-29 11:34:54 +05:30
Chaitanya
1f02be2e17
[AMDGPU] Enable "amdgpu-sw-lower-lds" pass in pipeline. (#89206)
This PR enables the "amdgpu-sw-lower-lds" pass in the pipeline.
It also introduces the "amdgpu-enable-sw-lower-lds" command-line flag to
enable or disable the pass.
2024-08-26 14:21:19 +05:30
Chaitanya
7bc9d95b7e
[AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (#87265)
This PR introduces a new pass, "amdgpu-sw-lower-lds".

This pass lowers local data store (LDS) uses in kernel and
non-kernel functions in the module to use dynamically allocated global
memory. A packed LDS layout is emulated in the global memory.
The memory instructions lowered from LDS to global memory are then
instrumented for address sanitizer, to catch addressing errors.
This pass only works when address sanitizer has been enabled and has
instrumented the IR. It identifies that the IR has been instrumented using
the "nosanitize_address" module flag.

For a kernel, LDS accesses can be static or dynamic, and each kind can be
direct (accessed within the kernel) or indirect (accessed through non-kernels).

**Replacement of Kernel LDS accesses:** 
- All the LDS accesses corresponding to a kernel will be packed together,
where all static LDS accesses will be allocated first and then dynamic
LDS follows. The total size with alignment is calculated. A new LDS
global will be created for the kernel called "SW LDS" and it will have
the attribute "amdgpu-lds-size" attached with the value of the size
calculated. All the LDS accesses in the module will be replaced by a GEP
with an offset into the "SW LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" is created per kernel accessing
the dynamic LDS. This will be marked used by the kernel and will have
MD_absolute_symbol metadata set to the total static LDS size, since dynamic
LDS allocation starts after all static LDS allocation.

- A device global memory equal to the total LDS size will be allocated.
At the prologue of the kernel, a single work-item from the work-group
does a "malloc" and stores the pointer to the allocation in "SW LDS". To
store the offsets corresponding to all LDS accesses, another global
variable is created which will be called "SW LDS metadata" in this pass.

- **SW LDS:**
It is an LDS global of ptr type with the name
"llvm.amdgcn.sw.lds.<kernel-name>".

- **SW LDS Metadata:**
It is of struct type, with n members, where n equals the number of LDS globals
accessed by the kernel (direct and indirect). Each member of the struct is
another struct of type {i32, i32, i32}. The first member corresponds to the
offset, the second member corresponds to the size of the LDS global being
replaced, and the third represents the total aligned size. It will have the name
"llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an
initializer with static LDS related offsets and sizes initialized. But
for dynamic LDS related entries, offsets will be initialized to the previous
static LDS allocation end offset. Sizes for them will be zero initially.
These dynamic LDS offset and size values will be updated within the
kernel, since the kernel can read the dynamic LDS size allocation done at
runtime by querying the "hidden_dynamic_lds_size" hidden kernel argument.

- At the epilogue of the kernel, the allocated memory is freed by the
same single work-item.
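
As a rough illustration of the packed layout and the {offset, size, aligned size} metadata entries described above, here is a minimal Python sketch. It is a model under assumptions, not the pass's code: `LdsGlobal`, `build_metadata`, and the single `align` parameter are hypothetical names, and the real pass may choose alignment differently.

```python
from dataclasses import dataclass


@dataclass
class LdsGlobal:
    name: str
    size: int              # size in bytes; unknown (0) for dynamic LDS at compile time
    dynamic: bool = False


def align_to(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def build_metadata(lds_globals, align=8):
    """Return ({name: (offset, size, aligned_size)}, total_static_size).

    Static LDS globals are packed first; dynamic entries start at the end
    of the static allocation with zero sizes (assumed to be patched in the
    kernel at runtime, per the commit message).
    """
    metadata = {}
    offset = 0
    for g in (g for g in lds_globals if not g.dynamic):
        aligned = align_to(g.size, align)
        metadata[g.name] = (offset, g.size, aligned)
        offset += aligned
    static_end = offset
    for g in (g for g in lds_globals if g.dynamic):
        metadata[g.name] = (static_end, 0, 0)
    return metadata, static_end


md, static_size = build_metadata(
    [LdsGlobal("a", 10), LdsGlobal("b", 4), LdsGlobal("dyn", 0, dynamic=True)]
)
```

Here `static_size` plays the role of the value that the message says is attached to the dynamic LDS symbol, since dynamic LDS begins where static LDS ends.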

**Replacement of non-kernel LDS accesses:** 
- Multiple kernels can access the same non-kernel function. All the
kernels accessing LDS through non-kernels are sorted and assigned a
kernel-id. All the LDS globals accessed by non-kernels are sorted.

- This information is used to build two tables:
- **Base table:**
The base table will have a single row, with elements of the row placed as per
kernel ID. Each element in the row corresponds to the ptr of the "SW LDS"
variable created for that kernel.

- **Offset table:**
The offset table will have multiple rows and columns. Rows are assumed to be
from 0 to (n-1), where n is the total number of kernels accessing the LDS
through non-kernels. Each row will have m elements, where m is the total
number of unique LDS globals accessed by all non-kernels. Each element in the
row corresponds to the ptr of the replacement of the LDS global done by that
particular kernel.

- An LDS variable in a non-kernel will be replaced based on the information
from the base and offset tables. Based on a kernel-id query, the ptr of "SW LDS"
for the corresponding kernel is obtained from the base table. The offset
into the base "SW LDS" is obtained from the corresponding element in the offset
table. With this information, the replacement value is obtained.
2024-08-26 08:59:26 +05:30
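The base-table and offset-table lookup for non-kernel functions described in the commit above could be sketched as follows. This is a simplified model with hypothetical names (`build_tables`, `resolve`); pointers are stood in for by symbol-name strings and integer offsets, and `None` marks a global that a given kernel does not reach, which is an assumption of this sketch.

```python
def build_tables(kernel_lds):
    """kernel_lds maps kernel name -> {lds_global_name: offset into its SW LDS}.

    Kernels and LDS globals are sorted so that ids and columns are stable.
    """
    kernels = sorted(kernel_lds)
    lds_globals = sorted({g for offsets in kernel_lds.values() for g in offsets})
    kernel_id = {k: i for i, k in enumerate(kernels)}

    # Base table: one row, one "SW LDS" symbol per kernel id.
    base_table = [f"llvm.amdgcn.sw.lds.{k}" for k in kernels]
    # Offset table: n rows (kernels) by m columns (unique LDS globals).
    offset_table = [[kernel_lds[k].get(g) for g in lds_globals] for k in kernels]
    return kernel_id, lds_globals, base_table, offset_table


def resolve(kid, lds_name, lds_globals, base_table, offset_table):
    """Return (SW LDS symbol, offset) replacing lds_name for kernel id kid."""
    column = lds_globals.index(lds_name)
    return base_table[kid], offset_table[kid][column]


kernel_id, lds_globals, base_table, offset_table = build_tables(
    {"k1": {"x": 0, "y": 16}, "k0": {"y": 0}}
)
replacement = resolve(kernel_id["k1"], "y", lds_globals, base_table, offset_table)
```

A non-kernel function only needs the kernel id at runtime; both tables can be fixed at compile time, which is why the message sorts kernels and globals first.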
Anshil Gandhi
033e225d90
Revert "Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)" (#106001)
This reverts commit 4b6c064dd124c70ff163411dff120c6174e0e022.

Add a requirement for an amdgpu target in the test.
2024-08-25 17:23:36 -04:00
Anshil Gandhi
4b6c064dd1
Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)
This reverts commit 33f3ebc86e7d3afcb65c551feba5bbc2421b42ed.
2024-08-25 14:56:39 -04:00
Anshil Gandhi
33f3ebc86e
[AMDGPU][LTO] Assume closed world after linking (#105845) 2024-08-25 14:06:29 -04:00
Juan Manuel Martinez Caamaño
5def27c72c
[AMDGPU] Remove "amdgpu-enable-structurizer-workarounds" flag (#105819) 2024-08-23 15:04:03 +02:00
Juan Manuel Martinez Caamaño
2b4b909509
[AMDGPU] Remove unused amdgpu-disable-structurizer flag (#105800) 2024-08-23 14:14:17 +02:00
Juan Manuel Martinez Caamaño
cbf34a5f77
[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645) 2024-08-23 14:06:17 +02:00
Matt Arsenault
dd90c72b05 AMDGPU: Temporarily stop adding AtomicExpand to new PM passes
This breaks using -passes=atomic-expand (but only sometimes?).
Somehow an AtomicExpand pass ends up running without a TargetMachine,
despite always being constructed with one.
2024-08-21 00:19:37 +04:00
Matt Arsenault
33e18b2b43
AMDGPU/NewPM: Start filling out addIRPasses (#102884)
This is not complete, but gets AtomicExpand running. I was able
to get further than I expected; we're quite close to having all
the IR codegen passes ported.
2024-08-20 23:38:05 +04:00
Matt Arsenault
afeef4dbc3
AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (#102867)
AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should soon be removable.
2024-08-20 23:35:01 +04:00
Matt Arsenault
7022498ac2
AMDGPU/NewPM: Start implementing addCodeGenPrepare (#102816) 2024-08-20 00:10:45 +04:00
Christudasan Devadasan
a449b85724
[AMDGPU][R600] Move R600CodeGenPassBuilder into R600TargetMachine(NFC). (#103721) 2024-08-19 20:40:12 +05:30
Christudasan Devadasan
a566635915
[AMDGPU] Move AMDGPUCodeGenPassBuilder into AMDGPUTargetMachine(NFC) (#103720)
This will allow us to reuse the existing flags and the static
functions while building the pipeline for new pass manager.
2024-08-19 20:32:55 +05:30
Matt Arsenault
36a0f20ac3
AMDGPU/NewPM: Fill out addPreISelPasses (#102814)
This specific callback should now be at parity with the old
pass manager version. There are still some missing IR passes
before this point.

Also I don't understand the need for the RequiresAnalysisPass at the
end. SelectionDAG should just be using the uncached getResult?
2024-08-14 20:57:00 +04:00
Shilei Tian
862f5040fb
[AMDGPU] Enable AMDGPUAttributorPass in full LTO (#102673)
This is basically the same as
https://github.com/llvm/llvm-project/pull/102086 but reverts some test
case changes that are no longer needed.
2024-08-12 13:39:23 -04:00
Matt Arsenault
05b75e006b
AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (#102806) 2024-08-12 15:09:12 +04:00
Matt Arsenault
1c764b952a
AMDGPU: Use GCNTargetMachine in AMDGPUCodeGenPassBuilder (#102805)
R600 has a separate CodeGenPassBuilder anyway.
2024-08-12 15:02:48 +04:00
Matt Arsenault
dd094b2647
NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645)
This was much more difficult than I anticipated. The pass is
not in a good state, with poor test coverage. The legacy PM
does seem to be relying on maintaining the map state between
different SCCs, which seems bad. The pass is going out of its
way to avoid putting the attributes it introduces onto non-callee
functions. If it just added them, we could use them directly
instead of relying on the map, I would think.

The NewPM path uses a ModulePass; I'm not sure if we should be
using CGSCC here but there seems to be some missing infrastructure
to support backend defined ones.
2024-08-11 15:11:10 +04:00
Matt Arsenault
3696a34e59
AMDGPU/NewPM: Port SILowerI1Copies to new pass manager (#102663) 2024-08-10 07:08:22 +04:00
Matt Arsenault
77e68fbdd3
AMDGPU/NewPM: Port AMDGPUAnnotateUniformValues to new pass manager (#102654) 2024-08-10 07:06:08 +04:00
Matt Arsenault
76f722f10c
AMDGPU/NewPM: Port SIAnnotateControlFlow to new pass manager (#102653)
Does not yet add it to the pass pipeline. Somehow it causes
2 tests to assert in SelectionDAG, in functions without any
control flow.
2024-08-10 07:02:21 +04:00
Shilei Tian
786c409234
[AMDGPU][Attributor] Add a pass parameter closed-world for AMDGPUAttributor pass (#101760) 2024-08-09 22:12:09 -04:00
Shilei Tian
492484e657 Revert "[AMDGPU] Move AMDGPUAttributorPass to full LTO post link stage (#102086)"
This reverts commit 2fe61a5acf272d6826352ef72f47196b01003fc5.
2024-08-09 15:12:24 -04:00
Shilei Tian
2fe61a5acf
[AMDGPU] Move AMDGPUAttributorPass to full LTO post link stage (#102086)
Currently `AMDGPUAttributorPass` is registered in the default optimizer
pipeline. This will allow the pass to run in the default pipeline as well as
at the thinLTO post link stage. However, it will not run in the full LTO post
link stage. This patch moves it to full LTO.
2024-08-09 13:35:00 -04:00
Matt Arsenault
cf54cae26b
AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (#102614)
This allows moving some tests that rely on -stop-after=amdgpu-isel
to check -stop-after=finalize-isel instead, which
will more reliably pass the verifier.
2024-08-09 17:52:41 +04:00
Christudasan Devadasan
15b41d207e
[CodeGen] change prototype of regalloc filter function (#93525)
[CodeGen] Change the prototype of regalloc filter function

Change the prototype of the filter function so that we can
filter not just by RegClass. We need to implement more
complicated filter based upon some other info associated
with each register.

Patch provided by: Gang Chen (gangc@amd.com)
2024-07-22 16:49:39 +05:30
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Matt Arsenault
b1bcb7ca46 Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.

Drop the -O3 checks from default-attributes.hip. I don't know why they
are different on some bots but reverting this is far too disruptive.
2024-07-15 11:51:44 +04:00
dyung
adaff46d08
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.

The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614

These bots have been broken for a day, so reverting to get everything
back to green.
2024-07-14 18:48:54 -07:00
Matt Arsenault
78bc1b64a6
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.

Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
2024-07-14 08:36:33 +04:00
Jeffrey Byrnes
5da7179cb3 [AMDGPU] Reland: Add IR LiveReg type-based optimization 2024-07-03 09:26:19 -07:00
Vitaly Buka
3e53c97d33
Revert "[AMDGPU] Add IR LiveReg type-based optimization" (#97138)
Part of #66838.

https://lab.llvm.org/buildbot/#/builders/52/builds/404
https://lab.llvm.org/buildbot/#/builders/55/builds/358
https://lab.llvm.org/buildbot/#/builders/164/builds/518

This reverts commit ded956440739ae326a99cbaef18ce4362e972679.
2024-06-28 23:18:26 -07:00
Jeffrey Byrnes
ded9564407 [AMDGPU] Add IR LiveReg type-based optimization
Change-Id: Ia0d11b79b8302e79247fe193ccabc0dad2d359a0
2024-06-28 15:01:39 -07:00
Nikita Popov
5cd0ba30f5
Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321) (#96462)
On MSVC the `this` uses inside `decltype` require a lambda capture. On
clang they result in an unused capture warning instead. Add the capture
and suppress the warning with `(void)this`.

-----

Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
2024-06-24 15:00:11 +02:00
Nikita Popov
e5a41f0afc Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)"
My attempt to fix the Windows build made things worse,
revert entirely for now.

This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c.
This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5.
This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.
2024-06-24 10:32:03 +02:00
Nikita Popov
957dc4366d
[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
2024-06-24 09:40:09 +02:00
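The lazy, callback-based initialization described in the commit above follows a common pattern, sketched here in Python. This is not LLVM's API: `PassNameMap` and its methods are hypothetical names used only to show that registered callbacks run on the first lookup and not before.

```python
class PassNameMap:
    """Class-name to pass-name mapping that is populated on first use."""

    def __init__(self):
        self._callbacks = []   # deferred initializers
        self._names = {}

    def register_callback(self, callback):
        """Record an initializer; nothing is computed yet."""
        self._callbacks.append(callback)

    def add(self, class_name, pass_name):
        self._names[class_name] = pass_name

    def get_pass_name(self, class_name):
        # Run pending initializers once, then drop them.
        pending, self._callbacks = self._callbacks, []
        for callback in pending:
            callback(self)
        return self._names.get(class_name, "")


calls = []


def populate(mapping):
    calls.append("populate")
    mapping.add("SIFoldOperandsPass", "si-fold-operands")


names = PassNameMap()
names.register_callback(populate)
calls_before_lookup = len(calls)          # the callback has not run yet
first = names.get_pass_name("SIFoldOperandsPass")
second = names.get_pass_name("SIFoldOperandsPass")
```

If no name is ever requested, `populate` never runs, which mirrors the message's point that there is no cost when the mapping is unused.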
vg0204
c2fc7f75f6 Revert "[AMDGPU]Optimize SGPR spills (#93668)"
This reverts commit 4b9112e88a998ce620e4683548f2afd17cc5fe95. A separate
issue (#96353) describing it has been opened to keep track of it.
2024-06-24 12:36:36 +05:30