NFC, since no instructions have their AsmMatchConverter
changed, but prepares for that to happen.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103046
Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa
NFC. Renames the variable in the dpp input operand generators
from DstRC to OldRC, because that is what it actually sets.
Also documents the importance of setting HasModifiers = 0 in the
dpp8 asm string.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103047
Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1
I do not see any practical difference but technically
used.* variables are internal and a call to getGlobalVariable
misses true as a second argument. NFC as far as I can tell.
Differential Revision: https://reviews.llvm.org/D102884
Accesses to global module LDS variable start from null,
but kernel also thinks its variables start address is
null. Fixed by not using a null as an address.
Differential Revision: https://reviews.llvm.org/D102882
Set the output register class based on the output type, instead of
hard-coding VGPR_32. I think this is more correct. It doesn't make any
difference at the moment because we use the same class for 16- and
32-bit results, but it might in future if we make more use of true
16-bit register classes.
Differential Revision: https://reviews.llvm.org/D102622
The FixSGPRCopies pass converts instructions to VALU when
removing illegal VGPR to SGPR copies. Instructions that use SCC
are changed to use VCC instead. When that happens, the pass must
also change instructions that define SCC to define VCC.
The pass was not changing the SCC definition when an ADDC is
converted due to a input that is a VGPR to SGPR copy. But, the
initial ADD insruction, which define SCC, is not converted.
This causes a compilation failure due to a use of an undefined
physical register.
This patch adds code that inserts the SCC definition in the
MoveToVALU worklist when a SCC use is converted to a VCC use.
Differential Revision: https://reviews.llvm.org/D102111
moveOperands does not handle moving tied operands since it would
generally have to fixup the tied operand references. Avoid the assert
by untying and retying after the modification. These in place
modifications really aren't managable.
A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:
MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)
ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34
Differential Revision: https://reviews.llvm.org/D73957
For gfx10 gradient (g16) and address (a16) can be independent. Previous
implementation assumed that a16 implied g16.
There are some other changes that fix the verification (as well as asm/disasm)
that are required for the included test to pass - the XFAIL will be removed in
those changes.
This also includes required fixes for GlobalISel
Differential Revision: https://reviews.llvm.org/D102066
Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09
A16 support for image instructions assembly/disassembly (gfx10) was missing
Also refactor MIMG op addr size calcs to common function
We'd got 3 places where the same operation was being done.
One test is now marked XFAIL until a related codegen patch is in place
Differential Revision: https://reviews.llvm.org/D102231
Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
Previously we were allowing to use FP atomics without
-amdgpu-unsafe-fp-atomics option if a scope is less then
system. This is not safe just as well if we have UC memory.
This change only allows global and flat FP atomics with
the unsafe option. Consequentially that makes a check for
denorm mode redundant since we skip it with the unsafe
option and do not have a way to produce these instructions
without it anyway.
Differential Revision: https://reviews.llvm.org/D102347
This patch disables the SIFormMemoryClauses pass at -O1. This pass has a
significant impact on compilation time, so we only want it to be enabled
starting from -O2.
Differential Revision: https://reviews.llvm.org/D101939
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.
The changes to isPow2VectorType and getPow2VectorType are copied from EVT.
The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.
This was motivated by the bug I fixed yesterday in 80b9510806cf11c57f2dd87191d3989fc45defa8
Reviewed By: frasercrmck, sdesmalen
Differential Revision: https://reviews.llvm.org/D102262
Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32
Differential Revision: https://reviews.llvm.org/D98081
Patch by Julien Pagès!
No need to handle invariant loads when avoiding WAR conflicts, as
there cannot be a vector store to the same memory location.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D101177
Currently the ValueHandler handles both selecting the type and
location for arguments, as well as inserting instructions needed to
handle them. Split this so that the determination of the argument
handling is independent of the function state. Currently the checks
for tail call compatibility do not follow the full assignment logic,
so it misses cases where arguments require nontrivial legalization.
This should help avoid targets ending up in a buggy state where the
argument evaluation may change in different contexts.
The waitcnt pass would increment the number of vmem events for some buffer
invalidates that were not handled by the pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D102252
Moving code sinking pass before structurizer creates more sinking
opportunities.
The extra flow edges introduced by the structurizer can have adverse
effects on sinking, because the sinking pass prefers moving instructions
to blocks with unique predecessors and the structurizer destroys that
property in some cases.
A notable example is moving high-latency image instructions across kills.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D101115
Large loads on target that does not useFlatForGlobal have to be split
in regbankselect. This did not happen in case when destination had vgpr
bank and address had sgpr bank.
Instead of checking if address bank is sgpr check bank of the destination.
Differential Revision: https://reviews.llvm.org/D101992
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.
This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.
Differential Revision: https://reviews.llvm.org/D101292
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Differential Revision: https://reviews.llvm.org/D101830
Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.
Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.
I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).
NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.
Differential Revision: https://reviews.llvm.org/D101987
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Differential Revision: https://reviews.llvm.org/D101830
Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.
ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.
Differential Revision: https://reviews.llvm.org/D101367
AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly
find a DPP register operand, regadless of the position it is
always src0. Moved this check into a new validateDPP() method
where we have full instruction already. In particular it was
failing to reject this case:
v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf
Essentially it was broken for any case where size of dst and
src0 differ.
It also improves the diagnostics with a proper error message.
The check in the InstPrinter also drops verification of the dst
register as it does not have anything to do with the dpp operand.
Differential Revision: https://reviews.llvm.org/D101930
Instruction test for inactive kill/demote needs to be based on
actual opcode not whether instruction would be lowered to demote.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D101966
First clean up the strange API of tryConstantFoldOp where it took an
immediate operand value, but no indication of which operand it was the
value for.
Second clean up the loop that calls tryConstantFoldOp so that it does
not have to restart from the beginning every time it folds an
instruction.
This is NFCI but there are some minor changes caused by the order in
which things are folded.
Differential Revision: https://reviews.llvm.org/D100031
This shall speedup compilation and also remove threshold
limitations used by memory dependency analysis.
It also seem to fix the bug in the coalescer_remat.ll
where an SMRD load was used in presence of a potentially
clobbering store.
Fixes: SWDEV-272132
Differential Revision: https://reviews.llvm.org/D101962
Preexisting waitcnt may not update the scoreboard if the instruction
being examined needed to wait on fewer counters than what was encoded in
the old waitcnt instruction. Fixing this results in the elimination of
some redudnat waitcnt.
These changes also enable combining consecutive waitcnt into a single
S_WAITCNT or S_WAITCNT_VSCNT instruction.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D100281