4708 Commits

Author SHA1 Message Date
Simon Pilgrim
34dc4f24f2 Revert rG939291041bb35b8088e3b61be2b8b3bc950f64a7 "[AMDGPU] Regenerate wave32.ll test checks"
This still breaks buildbots
2021-07-25 15:59:26 +01:00
Simon Pilgrim
939291041b [AMDGPU] Regenerate wave32.ll test checks
To simplify diff in future patch
2021-07-25 15:13:09 +01:00
Simon Pilgrim
54e5ced7e6 [AMDGPU] Regenerate mul24 test checks
To simplify diffs in future patch
2021-07-25 15:13:09 +01:00
Simon Pilgrim
9591abd74e [AMDGPU] Regenerate global-load-saddr-to-vaddr test checks
To simplify diff in future patch
2021-07-25 14:05:10 +01:00
Simon Pilgrim
00e37c1cd4 [AMDGPU] Regenerate ctpop16 test checks
To simplify diff in future patch
2021-07-25 14:05:09 +01:00
Simon Pilgrim
249ef1fa82 [AMDGPU] Regenerate half test checks
To simplify diff in future patch
2021-07-25 14:05:08 +01:00
Simon Pilgrim
97d2277b37 [AMDGPU] Regenerate anyext test checks
To simplify diff in future patch
2021-07-25 14:05:08 +01:00
Kuter Dinel
96709823ec [AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997
2021-07-24 06:07:15 +03:00
Carl Ritson
7d4baf25aa [AMDGPU] Add maximum NSA size limit ISA feature
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.

Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103348
2021-07-23 16:16:06 +09:00
Carl Ritson
6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Carl Ritson
9dcd75f86f [AMDGPU] Allow frontends to disable null export for pixel shaders
Disable null export (for kills) when a frontend defines a pixel
shader as not exporting using amdgpu-color-export and
amdgpu-depth-export function attrbutes.
This allows the generation of export free pixel shaders.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105683
2021-07-22 10:20:46 +09:00
Stanislav Mekhanoshin
c54c76037b Prevent dead uses in register coalescer after rematerialization
The coalescer does not check if register uses are available
at the point of rematerialization. If it attempts to rematerialize
an instruction with such uses it can end up with use without a def.

LiveRangeEdit does such check during rematerialization, so just
call LiveRangeEdit::allUsesAvailableAt() to avoid the problem.

Differential Revision: https://reviews.llvm.org/D106396
2021-07-21 15:19:55 -07:00
Stanislav Mekhanoshin
fe197ef9f1 [AMDGPU] Mark relevant rematerializable VOP3 instructions
Differential Revision: https://reviews.llvm.org/D106110
2021-07-21 14:44:13 -07:00
Stanislav Mekhanoshin
9625ca5b60 [AMDGPU] Mark relevant rematerializable VOP2 instructions
Differential Revision: https://reviews.llvm.org/D106023
2021-07-21 14:24:59 -07:00
Stanislav Mekhanoshin
4eb24817ec [AMDGPU] Mark all relevant VOP1 instructions rematerializable
Differential Revision: https://reviews.llvm.org/D105919
2021-07-21 14:05:32 -07:00
Stanislav Mekhanoshin
d01b34ed31 [AMDGPU] Move perfhint analysis
This is SCC pass, moving it to the end of SCC PM saves one
Function PM. This needs the analysis to take into account
memory access width since it is now places after the
load/store optimizer (D105651).

Differential Revision: https://reviews.llvm.org/D105652
2021-07-21 13:06:49 -07:00
Stanislav Mekhanoshin
a397c1c82f [AMDGPU] Tune perfhint analysis to account access width
A function with less memory instructions but wider access
is the same as a function with more but narrower accesses
in terms of memory boundness. In fact the pass would give
different answers before and after vectorization without
this change.

Differential Revision: https://reviews.llvm.org/D105651
2021-07-21 12:46:10 -07:00
Sebastian Neubauer
b642d01fa8 [AMDGPU] Improve killed check for vgpr optimization
The killed flag is not always set. E.g. when a variable is used in a
loop, it is never marked as killed, although it is unused in following
basic blocks. Also, we try to deprecate kill flags and not use them.

Check if the register is live in the endif block. If not, consider it
killed in the then and else blocks.

The vgpr-liverange tests have two new tests with loops
(pre-committed, so the diff is visible).
I also needed to change the subtarget to gfx10.1, otherwise calls
are not working.

Differential Revision: https://reviews.llvm.org/D106291
2021-07-21 15:24:59 +02:00
Sebastian Neubauer
aba1f157ca [AMDGPU] Precommit vgpr-liverange tests 2021-07-21 15:24:59 +02:00
Sebastian Neubauer
2b08f6af62 [AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839
2021-07-20 13:48:50 +02:00
Jay Foad
0821c8824b [AMDGPU] Pre-commit test case for D106284
This test case shows the scheduler wrongly reordering two buffer
accesses that might alias.
2021-07-20 12:05:33 +01:00
Stanislav Mekhanoshin
9dc2636623 [AMDGPU] Disable LDS lowering for GFX shaders
Apparently these need external LDS symbols to remain.

Fixes: SC1-3279

Differential Revision: https://reviews.llvm.org/D106288
2021-07-20 02:55:25 -07:00
Matt Arsenault
c9ec807b11 CodeGen: Make MachineOptimizationRemarkEmitterPass a CFG analysis
This avoids rerunning it a few times.
2021-07-19 21:08:26 -04:00
Matt Arsenault
67d6132463 GlobalISel: Preserve memory types for implicit sret load/stores 2021-07-19 11:52:42 -04:00
Matt Arsenault
9236125ec8 GlobalISel: Preserve LLT when bitcasting loads and stores
This also avoids improperly legalizing some truncating vector stores.
2021-07-19 11:30:14 -04:00
Simon Pilgrim
5643be96bc [DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls 2021-07-18 18:38:59 +01:00
Carl Ritson
c7f2f81f5e [AMDGPU] Tidy SReg/SGPR definitions using template class
Use a multiclass to consistently define SReg/SGPR/TTMP register classes.
Add missing TTMP registers for 96b, 160b, 192b, 224b.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105800
2021-07-17 11:26:46 +09:00
Matt Arsenault
51f115b078 AMDGPU/GlobalISel: Add a few tests for struct arguments
Test structs with pointers and vectors of pointers since this stresses
a future patch.
2021-07-16 20:20:55 -04:00
Matt Arsenault
27addb85a6 AMDGPU/GlobalISel: Fix some incorrect memory types in tests 2021-07-16 20:20:55 -04:00
Matt Arsenault
3ceb92295e AMDGPU/GlobalISel: Preserve more memory types 2021-07-16 08:57:26 -04:00
Matt Arsenault
21a0ef8d19 AMDGPU/GlobalISel: Redo kernel argument load handling
This avoids relying on G_EXTRACT on unusual types, and also properly
decomposes structs into multiple registers. This also preserves the
LLTs in the memory operands.
2021-07-16 08:56:54 -04:00
Matt Arsenault
a81a7a9ad8 AMDGPU/GlobalISel: Fix incorrect memory types in test 2021-07-15 19:11:40 -04:00
Matt Arsenault
e91da668d0 GlobalISel: Track argument pointeriness with arg flags
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
2021-07-15 19:11:40 -04:00
Fangrui Song
aa3df8ddcd [test] Avoid llvm-readelf/llvm-readobj one-dash long options and deprecated aliases (e.g. --file-headers) 2021-07-15 10:26:21 -07:00
Stanislav Mekhanoshin
c46d99e4ba [AMDGPU] Refine -O0 and -O1 passes.
Differential Revision: https://reviews.llvm.org/D105579
2021-07-15 09:51:54 -07:00
Kuter Dinel
a7749c3f79 [AMDGPU] Use update_test_checks.py script for annotate kernel features tests.
This patch makes the annotate kernel features tests use the update_tests_checks.py
script. Which makes it easy to update the tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D105864
2021-07-15 03:13:37 +03:00
Stanislav Mekhanoshin
76b7d3432e [AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ingored to allow rematerialization.

Differential Revision: https://reviews.llvm.org/D105836
2021-07-14 13:03:58 -07:00
Matt Arsenault
47269da5d8 GlobalISel: Handle lowering non-power-of-2 extloads 2021-07-14 11:54:11 -04:00
Jay Foad
372bb08252 [AMDGPU] Check llc-pipeline.ll with -match-full-lines -strict-whitespace
This prevents breaking the indentation that shows the structure of the
pass managers.

Differential Revision: https://reviews.llvm.org/D105891
2021-07-14 16:33:50 +01:00
Djordje Todorovic
df686842bc [RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This new MIR pass removes redundant DBG_VALUEs.

After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some of existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before the LiveDebugValues)
variables assignment, but it is being clobbered for function
parameters during the SelectionDAG since it generates new DBG_VALUEs
after COPY instructions, even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, and a COPY and after the COPY,
DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets
rewritten into $regX, we'd end up having redundant DBG_VALUE.

This breaks the definition of the DBG_VALUE since some analysis passes
might be built on top of that premise..., and this patch tries to fix
the MIR with the respect to that.

This first patch performs bacward scan, by trying to detect a sequence of
consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one
variable but the last one:

For example:

(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
 ...

in this case, we can remove (1).

By combining the forward scan that will be introduced in the next patch
(from this stack), by inspecting the statistics, the RemoveRedundantDebugValues
removes 15032 instructions by using gdb-7.11 as a testbed.

Differential Revision: https://reviews.llvm.org/D105279
2021-07-14 04:29:42 -07:00
Sebastian Neubauer
4359b870b1 [AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.

Differential Revision: https://reviews.llvm.org/D105920
2021-07-14 10:45:22 +02:00
Sebastian Neubauer
a12e551882 [AMDGPU] Precommit flat-scratch-init.ll test 2021-07-14 10:45:22 +02:00
Ruiling Song
d9b9fdd91b [AMDGPU] Don't handle export done when unify exit nodes
This patch aims to revert the changes introduced by D70781 D71192 D76364

D70781 was introduced to fix hardware hang where we do not insert exp-
null-done for a kill inside infinit loop. At that time we have not added
exp-null-done for kill early termination, but I believe as for now, we will
always add the exp-null-done for early termination case in LaterBranchLowering.

D71192 was introduced to handle the only_kill case, which is also been
handled by the kill early termination work.

D76364 was used to fix a regression by D71192, where we cleared the done
bit of the export in the existing program and not let the normal return
block branching to the new unified return block.

With this change, we just trust frontends have setup exp-done correctly
which is true for all existing frontends. The backend only inserts
exp-null-done for the kill cases which is handled in SILateBranchLowering.cpp.

Reviewed by: critson

Differential Revision: https://reviews.llvm.org/D105610
2021-07-14 14:54:37 +08:00
Ruiling Song
1d9585c8c1 [NFC][AMDGPU] autogenerate kill-infinite-loop.ll checks
This would help us to track the assembly changes to these tests.

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D105609
2021-07-14 14:52:31 +08:00
Ruiling Song
40e3df2a1b [RegisterCoalescer] Resolve conflict based on liveness of subregister
Currently we are resolving lane/subregister conflict by visiting
instructions sequentially in current block to see whether there is any
use of the tainted lanes. To save compile time, we are not doing further
check in successor blocks. This sounds reasonable without subgregister liveness.

But since we have added subregister liveness tracking capability to
register coalescer, we can easily determine whether we have subregister
liveness conflict by checking subranges. This would help coalescing more
COPYs for target that enables subregister liveness tracking.

Reviewed by: arsenm, qcolombet

Differential Revision: https://reviews.llvm.org/D104509
2021-07-14 14:43:22 +08:00
Matt Arsenault
3191ac27e3 AMDGPU: Try to fix test failure with EXPENSIVE_CHECKS
The machine verifier is enabled by default for EXPENSIVE_CHECKS, so
the pass runs of it would pollute the output here.
2021-07-13 19:34:14 -04:00
Matt Arsenault
eebe841a47 RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
2021-07-13 18:49:29 -04:00
Matt Arsenault
222fde1eec GlobalISel: Use extension instead of merge with undef in common case
This fixes not respecting signext/zeroext in these cases. In the
anyext case, this avoids a larger merge with undef and should be a
better canonical form.

This should also handle this if a merge is needed, but I'm not aware
of a case where that can happen. In a future change this will also
allow AMDGPU to drop some custom code without introducing regressions.
2021-07-13 11:04:47 -04:00
Sebastian Neubauer
ad2c66ec5d [AMDGPU] Optimize VGPR LiveRange in waterfall loops
The loops are run exactly once per lane, so VGPRs do not need to be
saved. Use the SIOptimizeVGPRLiveRange pass to add phi nodes that take
undef when coming from the loop.

There is still a shortcoming:
Return values from a function call in the loop are copied because their
live range conflicts with the live range of arguments, even if arguments
are only IMPLICIT_DEF after the phi insertion.

Differential Revision: https://reviews.llvm.org/D105192
2021-07-13 12:15:08 +02:00
Sebastian Neubauer
9d72c0ad43 [AMDGPU] Mark waterfall loops as SI_WATERFALL_LOOP
This way, they can be detected later, e.g. by the
SIOptimizeVGPRLiveRange pass.

Differential Revision: https://reviews.llvm.org/D105467
2021-07-13 12:15:08 +02:00