This patch introduces a pass that uses the Attributor to deduce AMDGPU-specific attributes.
Reviewed By: jdoerfert, arsenm
Differential Revision: https://reviews.llvm.org/D104997
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4- and 5-dword NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.
Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103348
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256; this saves VGPRs.
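The saving can be illustrated with a small sketch of "pick the narrowest register class that fits". The class width lists below are illustrative assumptions, not the backend's actual register class tables, and `pick_vaddr_class` and `wasted_vgprs` are hypothetical helper names.

```python
# Register class widths in VGPRs (dwords). Both lists are illustrative
# assumptions, not the backend's actual register class tables.
CLASSES_BEFORE = [1, 2, 3, 4, 5, 8]
CLASSES_AFTER = [1, 2, 3, 4, 5, 6, 7, 8]


def pick_vaddr_class(num_addr_dwords, class_widths):
    """Return the narrowest class width that can hold num_addr_dwords."""
    for width in sorted(class_widths):
        if width >= num_addr_dwords:
            return width
    raise ValueError("no register class is wide enough")


def wasted_vgprs(num_addr_dwords, class_widths):
    """Number of VGPRs allocated but not used by the address operands."""
    return pick_vaddr_class(num_addr_dwords, class_widths) - num_addr_dwords
```

Under these assumptions, a 6-dword vaddr occupies 8 VGPRs with the old class list and 6 with the new one.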
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103800
Disable null export (for kills) when a frontend defines a pixel
shader as not exporting using amdgpu-color-export and
amdgpu-depth-export function attributes.
This allows the generation of export-free pixel shaders.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D105683
The coalescer does not check if register uses are available
at the point of rematerialization. If it attempts to rematerialize
an instruction with such uses it can end up with a use without a def.
LiveRangeEdit performs such a check during rematerialization, so just
call LiveRangeEdit::allUsesAvailableAt() to avoid the problem.
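The idea behind the check can be sketched with a toy model, assuming each register's live range is a list of `(start, end, value_id)` segments. This is only loosely modelled on what LiveRangeEdit does; the data layout and the helper names `value_at` and `all_uses_available_at` are hypothetical.

```python
def value_at(segments, idx):
    """Return the value id live at slot idx, or None if the register is dead."""
    for start, end, value_id in segments:
        if start <= idx < end:
            return value_id
    return None


def all_uses_available_at(use_regs, live, orig_idx, use_idx):
    """True if every used register holds the same value at both points.

    live maps a register name to its list of (start, end, value_id)
    segments. This is a toy stand-in for real live intervals.
    """
    for reg in use_regs:
        segments = live.get(reg, [])
        original_value = value_at(segments, orig_idx)
        if original_value is None:
            return False
        if original_value != value_at(segments, use_idx):
            return False
    return True


# %a is defined at slot 0 and redefined at slot 10; it is dead after slot 20.
LIVE = {"%a": [(0, 10, 0), (10, 20, 1)]}
```

In this model, rematerializing an instruction that reads `%a` is only safe at points where `%a` still carries the value it had at the original instruction.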
Differential Revision: https://reviews.llvm.org/D106396
This is an SCC pass; moving it to the end of the SCC PM saves one
Function PM. This requires the analysis to take memory access
width into account, since the pass is now placed after the
load/store optimizer (D105651).
Differential Revision: https://reviews.llvm.org/D105652
A function with fewer but wider memory instructions is just as
memory bound as a function with more but narrower accesses.
Without this change, the pass would give different answers
before and after vectorization.
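A minimal sketch of why width weighting matters, assuming each memory instruction is counted once per 32-bit unit it accesses. The weighting unit and the names `MemInst`, `count_unweighted` and `count_weighted` are assumptions for illustration, not the pass's actual heuristic.

```python
from dataclasses import dataclass


@dataclass
class MemInst:
    width_bits: int  # access width of one memory instruction, in bits


def count_unweighted(insts):
    """Count memory instructions without looking at their width."""
    return len(insts)


def count_weighted(insts, unit_bits=32):
    """Count each instruction once per unit_bits accessed, rounding up."""
    return sum((inst.width_bits + unit_bits - 1) // unit_bits for inst in insts)


# Four 32-bit loads, and the single 128-bit load they vectorize into.
SCALAR = [MemInst(32) for _ in range(4)]
VECTORIZED = [MemInst(128)]
```

The unweighted count drops from 4 to 1 after vectorization, while the weighted count is 4 in both cases, so the answer no longer depends on where the pass runs relative to the vectorizer.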
Differential Revision: https://reviews.llvm.org/D105651
The killed flag is not always set. E.g. when a variable is used in a
loop, it is never marked as killed, although it is unused in following
basic blocks. Also, kill flags are being deprecated and should not be relied on.
Check if the register is live in the endif block. If not, consider it
killed in the then and else blocks.
The vgpr-liverange tests have two new tests with loops
(pre-committed, so the diff is visible).
I also needed to change the subtarget to gfx10.1, otherwise calls
are not working.
Differential Revision: https://reviews.llvm.org/D106291
First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.
This is more accurate than guessing the maximum register usage without
looking at the actual usage.
As before, assume that indirect calls will hit a function in the
current module.
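The two steps above can be sketched as follows, assuming per-function register counts have already been collected. The function name `resolve_register_usage` and the dictionary-based inputs are hypothetical, and the sketch ignores propagation through direct calls.

```python
def resolve_register_usage(local_usage, has_indirect_call):
    """Apply the module-wide maximum to functions with indirect calls.

    local_usage maps a function name to the registers it uses itself.
    has_indirect_call maps a function name to True if it contains an
    indirect call. Indirect calls are assumed to target a function in
    the same module, so the module-wide maximum is a safe bound.
    """
    module_max = max(local_usage.values(), default=0)
    resolved = {}
    for name, used in local_usage.items():
        if has_indirect_call.get(name, False):
            resolved[name] = module_max
        else:
            resolved[name] = used
    return resolved


USAGE = {"kernel": 10, "helper_a": 24, "helper_b": 64}
INDIRECT = {"kernel": True}
```

Here `kernel` is raised to the usage of `helper_b`, the largest function in the module, while the helpers keep their own counts.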
Differential Revision: https://reviews.llvm.org/D105839
This avoids relying on G_EXTRACT on unusual types, and also properly
decomposes structs into multiple registers. This also preserves the
LLTs in the memory operands.
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
This patch makes the annotate kernel features tests use the update_test_checks.py
script, which makes it easy to update the tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D105864
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ignored to allow rematerialization.
Differential Revision: https://reviews.llvm.org/D105836
This new MIR pass removes redundant DBG_VALUEs.
After register allocation, more precisely after the Virtual Register
Rewriter, we end up with duplicated DBG_VALUEs, since some virtual
registers are rewritten into the same physical register as an
existing DBG_VALUE.
Each DBG_VALUE should indicate (at least before LiveDebugValues) a
variable assignment, but this is violated for function parameters
during SelectionDAG, which generates new DBG_VALUEs after COPY
instructions even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, followed by a COPY and then a
DBG_VALUE $virt_reg, and virtregrewrite rewrites $virt_reg into
$regX, we would end up with a redundant DBG_VALUE.
This breaks the definition of DBG_VALUE, and some analysis passes
might be built on top of that premise; this patch tries to fix
the MIR with respect to that.
This first patch performs a backward scan: it detects a sequence of
consecutive DBG_VALUEs and removes all DBG_VALUEs describing one
variable except the last one.
For example:
(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
...
in this case, we can remove (1).
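A minimal sketch of the backward scan, assuming instructions are modelled as tuples whose first element is the opcode and that a variable is identified by its name alone. The tuple layout and the function name `remove_redundant_backward` are hypothetical simplifications of the MIR pass.

```python
def remove_redundant_backward(block):
    """Drop DBG_VALUEs superseded by a later one in the same run.

    block is a list of tuples: ("DBG_VALUE", reg, var) for debug values,
    or any other tuple for a regular instruction. Within a run of
    consecutive DBG_VALUEs only the last entry per variable is kept.
    """
    kept_reversed = []
    seen_vars = set()
    for inst in reversed(block):
        if inst[0] != "DBG_VALUE":
            # A regular instruction ends the current run of DBG_VALUEs.
            seen_vars.clear()
            kept_reversed.append(inst)
            continue
        var = inst[2]
        if var in seen_vars:
            continue  # a later DBG_VALUE in this run already describes var
        seen_vars.add(var)
        kept_reversed.append(inst)
    kept_reversed.reverse()
    return kept_reversed


BLOCK = [
    ("DBG_VALUE", "$edi", "var1"),  # (1)
    ("DBG_VALUE", "$esi", "var2"),  # (2)
    ("DBG_VALUE", "$edi", "var1"),  # (3)
]
```

Applied to `BLOCK`, the sketch removes (1) and keeps (2) and (3). A DBG_VALUE separated from a later one by a regular instruction is left alone.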
Combined with the forward scan that will be introduced in the next patch
(from this stack), the statistics show that RemoveRedundantDebugValues
removes 15032 instructions when using gdb-7.11 as a testbed.
Differential Revision: https://reviews.llvm.org/D105279
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.
Differential Revision: https://reviews.llvm.org/D105920
This patch aims to revert the changes introduced by D70781, D71192 and D76364.
D70781 was introduced to fix a hardware hang where we did not insert
exp-null-done for a kill inside an infinite loop. At that time we had not yet
added exp-null-done for kill early termination, but I believe that now we
always add exp-null-done for the early termination case in SILateBranchLowering.
D71192 was introduced to handle the only_kill case, which is also
handled by the kill early termination work.
D76364 was used to fix a regression caused by D71192, where we cleared the
done bit of the export in the existing program and did not let the normal
return block branch to the new unified return block.
With this change, we just trust that frontends have set up exp-done correctly,
which is true for all existing frontends. The backend only inserts
exp-null-done for the kill cases, which are handled in SILateBranchLowering.cpp.
Reviewed by: critson
Differential Revision: https://reviews.llvm.org/D105610
Currently we resolve lane/subregister conflicts by visiting
instructions sequentially in the current block to see whether there is any
use of the tainted lanes. To save compile time, we do not check
successor blocks. This is reasonable without subregister liveness.
But since we have added subregister liveness tracking to the
register coalescer, we can easily determine whether there is a subregister
liveness conflict by checking subranges. This helps coalesce more
COPYs for targets that enable subregister liveness tracking.
Reviewed by: arsenm, qcolombet
Differential Revision: https://reviews.llvm.org/D104509
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes were handled at the same time, this was problematic: we did not
know ahead of time how many registers would need to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, this is highly problematic because the lanes active
at the point of the spill are not necessarily the same as at the restore
point.
Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
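The callback mechanism can be sketched as an allocator that skips any virtual register whose class the filter rejects. The `allocate` function, the class names as plain strings, and the naive sequential assignment are all illustrative assumptions, not the real allocator interface.

```python
# Physical register name prefix per class (illustrative only).
PREFIX = {"SGPR": "s", "VGPR": "v"}


def allocate(vregs, should_allocate, assigned):
    """Assign a physical register to every vreg accepted by the filter.

    vregs maps a virtual register name to its register class.
    should_allocate is the per-run callback deciding whether a class is
    handled. Rejected vregs are left over for a later run.
    """
    next_index = {}
    for name, reg_class in vregs.items():
        if name in assigned or not should_allocate(reg_class):
            continue
        index = next_index.get(reg_class, 0)
        next_index[reg_class] = index + 1
        assigned[name] = PREFIX[reg_class] + str(index)
    return assigned


VREGS = {"%0": "SGPR", "%1": "VGPR", "%2": "SGPR", "%3": "VGPR"}

# First run: SGPRs only. Second run: VGPRs only.
after_sgpr_run = allocate(VREGS, lambda rc: rc == "SGPR", {})
after_vgpr_run = allocate(VREGS, lambda rc: rc == "VGPR", dict(after_sgpr_run))
```

After the first run only `%0` and `%2` have physical registers, which is why passes running between the two allocations have to tolerate leftover virtual registers.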
In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.
One disadvantage is that StackSlotColoring is currently no longer
used for SGPR spills. It would need to be run again, which will
require more work.
Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
This fixes not respecting signext/zeroext in these cases. In the
anyext case, this avoids a larger merge with undef and should be a
better canonical form.
This should also handle this if a merge is needed, but I'm not aware
of a case where that can happen. In a future change this will also
allow AMDGPU to drop some custom code without introducing regressions.
The loops are run exactly once per lane, so VGPRs do not need to be
saved. Use the SIOptimizeVGPRLiveRange pass to add phi nodes that take
undef when coming from the loop.
There is still a shortcoming:
Return values from a function call in the loop are copied because their
live range conflicts with the live range of arguments, even if arguments
are only IMPLICIT_DEF after the phi insertion.
Differential Revision: https://reviews.llvm.org/D105192