6209 Commits

Author SHA1 Message Date
Joe Nash
b67ea3d0c9 [AMDGPU] Allow no-modifier operands in cvtDPP
NFC, since no instructions have their AsmMatchConverter
changed, but prepares for that to happen.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103046

Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa
2021-05-25 10:58:06 -04:00
Joe Nash
67c3707b31 [AMDGPU] More accurate names for dpp operand types
NFC. Renames the variable in the dpp input operand generators
from DstRC to OldRC, because that is what it actually sets.

Also documents the importance of setting HasModifiers = 0 in the
dpp8 asm string.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103047

Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1
2021-05-25 10:35:25 -04:00
Christudasan Devadasan
e3b8e6d482 [AMDGPU] Remove dead declaration (NFC). 2021-05-25 16:04:04 +05:30
Christudasan Devadasan
90d784053f AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100726
2021-05-25 10:51:07 +05:30
Stanislav Mekhanoshin
4902885863 [AMDGPU] Request module used variables from LDS lowering as internal
I do not see any practical difference but technically
used.* variables are internal and a call to getGlobalVariable
misses true as a second argument. NFC as far as I can tell.

Differential Revision: https://reviews.llvm.org/D102884
2021-05-20 20:55:47 -07:00
Stanislav Mekhanoshin
748db5bfac [AMDGPU] Fix module LDS selection
Accesses to global module LDS variable start from null,
but kernel also thinks its variables start address is
null. Fixed by not using a null as an address.

Differential Revision: https://reviews.llvm.org/D102882
2021-05-20 15:59:01 -07:00
Jay Foad
092a3ce569 [AMDGPU] Fix typo in comment 2021-05-18 10:15:49 +01:00
Stanislav Mekhanoshin
45764efb69 [AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag
This is already how it is handled for global and flat atomics.

Differential Revision: https://reviews.llvm.org/D102366
2021-05-17 16:53:09 -07:00
Stanislav Mekhanoshin
f4c0fdc6c9 [AMDGPU] Set unused dst_sel to '?' in the encoding
This is to allow disasm with any bits in the unused fields.

Differential Revision: https://reviews.llvm.org/D102526
2021-05-17 08:38:52 -07:00
Jay Foad
472f856714 [AMDGPU] Tweak VOP3_INTERP16 profile
Set the output register class based on the output type, instead of
hard-coding VGPR_32. I think this is more correct. It doesn't make any
difference at the moment because we use the same class for 16- and
32-bit results, but it might in future if we make more use of true
16-bit register classes.

Differential Revision: https://reviews.llvm.org/D102622
2021-05-17 15:28:00 +01:00
Brendon Cahoon
3f7b7e7393 [AMDGPU] Update SCC defs to VCC when uses are changed to VCC
The FixSGPRCopies pass converts instructions to VALU when
removing illegal VGPR to SGPR copies. Instructions that use SCC
are changed to use VCC instead. When that happens, the pass must
also change instructions that define SCC to define VCC.

The pass was not changing the SCC definition when an ADDC is
converted due to a input that is a VGPR to SGPR copy. But, the
initial ADD insruction, which define SCC, is not converted.
This causes a compilation failure due to a use of an undefined
physical register.

This patch adds code that inserts the SCC definition in the
MoveToVALU worklist when a SCC use is converted to a VCC use.

Differential Revision: https://reviews.llvm.org/D102111
2021-05-14 18:05:05 -04:00
Stanislav Mekhanoshin
6fb02596a2 [AMDGPU] Add support for architected flat scratch
Add support for the readonly flat Scratch register initialized
by the SPI.

Differential Revision: https://reviews.llvm.org/D102432
2021-05-14 10:53:48 -07:00
Matt Arsenault
c7cff08f79 AMDGPU: Fix assert when rewriting saddr d16 loads
moveOperands does not handle moving tied operands since it would
generally have to fixup the tied operand references. Avoid the assert
by untying and retying after the modification. These in place
modifications really aren't managable.
2021-05-14 13:24:19 -04:00
Jay Foad
7f81c5a5ba [AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions
A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:

MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)

ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34

Differential Revision: https://reviews.llvm.org/D73957
2021-05-14 10:10:43 +01:00
David Stuttard
31b62aa162 [AMDGPU] Fix codegen of image intrinsics for g16 and a16
For gfx10 gradient (g16) and address (a16) can be independent. Previous
implementation assumed that a16 implied g16.

There are some other changes that fix the verification (as well as asm/disasm)
that are required for the included test to pass - the XFAIL will be removed in
those changes.

This also includes required fixes for GlobalISel

Differential Revision: https://reviews.llvm.org/D102066

Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09
2021-05-14 09:28:15 +01:00
David Stuttard
72d570ca08 [AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling
A16 support for image instructions assembly/disassembly (gfx10) was missing

Also refactor MIMG op addr size calcs to common function

We'd got 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
2021-05-14 09:25:44 +01:00
Carl Ritson
9cf6ff7aff [AMDGPU] Do not clause NSA instructions
To ensure correct behaviour NSA instructions should not be claused.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D102211
2021-05-14 12:54:56 +09:00
Matt Arsenault
85394d9ed7 AMDGPU/GlobalISel: Don't hardcode stack alignment in assert message 2021-05-13 19:00:13 -04:00
Matt Arsenault
6a70874d27 AMDGPU/GlobalISel: Implement tail calls
Or at least the sibling call cases which the DAG already handles.
2021-05-13 18:57:42 -04:00
Aakanksha Patil
464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Stanislav Mekhanoshin
8f98356bb5 [AMDGPU] Only allow global fp atomics with unsafe option
Previously we were allowing to use FP atomics without
-amdgpu-unsafe-fp-atomics option if a scope is less then
system. This is not safe just as well if we have UC memory.

This change only allows global and flat FP atomics with
the unsafe option. Consequentially that makes a check for
denorm mode redundant since we skip it with the unsafe
option and do not have a way to produce these instructions
without it anyway.

Differential Revision: https://reviews.llvm.org/D102347
2021-05-13 08:52:20 -07:00
Stanislav Mekhanoshin
bd00106d1e [AMDGPU] Refactor shouldExpandAtomicRMWInIR(). NFC.
This is logic simplification for better readability.

Differential Revision: https://reviews.llvm.org/D102371
2021-05-12 16:39:03 -07:00
Baptiste Saleil
5885f1a4cb [AMDGPU] Disable the SIFormMemoryClauses pass at -O1
This patch disables the SIFormMemoryClauses pass at -O1. This pass has a
significant impact on compilation time, so we only want it to be enabled
starting from -O2.

Differential Revision: https://reviews.llvm.org/D101939
2021-05-12 11:51:59 -04:00
Craig Topper
44e0e91db0 [ValueTypes] Rename MVT::getVectorNumElements() to MVT::getVectorMinNumElements(). Fix some misuses of getVectorNumElements()
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.

The changes to isPow2VectorType and getPow2VectorType are copied from EVT.

The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.

This was motivated by the bug I fixed yesterday in 80b9510806cf11c57f2dd87191d3989fc45defa8

Reviewed By: frasercrmck, sdesmalen

Differential Revision: https://reviews.llvm.org/D102262
2021-05-12 07:46:45 -07:00
Julien Pagès
46adccc5cc [AMDGPU] Improve Codegen for build_vector
Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32

Differential Revision: https://reviews.llvm.org/D98081

Patch by Julien Pagès!
2021-05-12 14:17:44 +01:00
Piotr Sobczak
a4db7025a9 [AMDGPU] Remove assert
Remove assert introduced in D101177, following post-commit feedback.
2021-05-12 14:52:37 +02:00
Piotr Sobczak
68137ef568 [AMDGPU] Skip invariant loads when avoiding WAR conflicts
No need to handle invariant loads when avoiding WAR conflicts, as
there cannot be a vector store to the same memory location.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101177
2021-05-12 10:57:05 +02:00
Matt Arsenault
cc79aaced0 AMDGPU: Fix SILoadStoreOptimizer for gfx90a
This was hardcoding the register class to use for the newly created
pointer registers, violating the aligned VGPR requirement.
2021-05-11 21:26:43 -04:00
Matt Arsenault
a15ed701ab AMDGPU: Fix assert on constant load from addrspacecasted pointer
This was trying to create a bitcast between different address spaces.
2021-05-11 20:12:20 -04:00
Matt Arsenault
24e2e5df0e GlobalISel: Split ValueHandler into assignment and emission classes
Currently the ValueHandler handles both selecting the type and
location for arguments, as well as inserting instructions needed to
handle them. Split this so that the determination of the argument
handling is independent of the function state. Currently the checks
for tail call compatibility do not follow the full assignment logic,
so it misses cases where arguments require nontrivial legalization.

This should help avoid targets ending up in a buggy state where the
argument evaluation may change in different contexts.
2021-05-11 19:50:12 -04:00
Austin Kerbow
4433f4601e [AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2
The waitcnt pass would increment the number of vmem events for some buffer
invalidates that were not handled by the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D102252
2021-05-11 13:17:33 -07:00
Aakanksha Patil
c58912eca7 Fix typo "Execpt" in comments
Differential Revision: https://reviews.llvm.org/D101858
2021-05-11 10:47:01 -04:00
Piotr Sobczak
09fe84abb4 [AMDGPU] Move code sinking before structurizer
Moving code sinking pass before structurizer creates more sinking
opportunities.

The extra flow edges introduced by the structurizer can have adverse
effects on sinking, because the sinking pass prefers moving instructions
to blocks with unique predecessors and the structurizer destroys that
property in some cases.

A notable example is moving high-latency image instructions across kills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101115
2021-05-11 14:07:23 +02:00
Petar Avramovic
f6985a197e AMDGPU/GlobalISel: Use destination register bank in applyMappingLoad
Large loads on target that does not useFlatForGlobal have to be split
in regbankselect. This did not happen in case when destination had vgpr
bank and address had sgpr bank.
Instead of checking if address bank is sgpr check bank of the destination.

Differential Revision: https://reviews.llvm.org/D101992
2021-05-10 10:18:30 +02:00
Arthur Eubanks
34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Sebastian Neubauer
13c0316239 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
David Stuttard
606d4e8061 AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.

Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
2021-05-07 13:42:57 +01:00
Simon Pilgrim
280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
David Stuttard
793b4b2603 Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI"
This reverts commit 442de0c1adf36bfddb5fb66b442bba8999fa733b.
2021-05-07 12:49:17 +01:00
David Stuttard
442de0c1ad AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
2021-05-07 12:19:49 +01:00
Sebastian Neubauer
98e5ede604 [AMDGPU] Serialize MFInfo::ScavengeFI
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.

ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.

Differential Revision: https://reviews.llvm.org/D101367
2021-05-07 11:15:25 +02:00
Stanislav Mekhanoshin
c714d03785 [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32
Differential Revision: https://reviews.llvm.org/D102022
2021-05-06 16:17:33 -07:00
Stanislav Mekhanoshin
28f1d018b1 [AMDGPU] Fix 64 bit DPP validation
AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly
find a DPP register operand, regadless of the position it is
always src0. Moved this check into a new validateDPP() method
where we have full instruction already. In particular it was
failing to reject this case:

v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf

Essentially it was broken for any case where size of dst and
src0 differ.

It also improves the diagnostics with a proper error message.

The check in the InstPrinter also drops verification of the dst
register as it does not have anything to do with the dpp operand.

Differential Revision: https://reviews.llvm.org/D101930
2021-05-06 08:40:26 -07:00
Austin Kerbow
172d746e16 [AMDGPU][NFC] Fix typos in SIFormMemoryClauses description
NFC.
2021-05-06 07:47:39 -07:00
Jay Foad
9e026273b0 [AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC. 2021-05-06 14:47:54 +01:00
Carl Ritson
67cfefebbb [AMDGPU] Fix WQM failure with single block inactive demote
Instruction test for inactive kill/demote needs to be based on
actual opcode not whether instruction would be lowered to demote.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D101966
2021-05-06 21:02:26 +09:00
Jay Foad
7c706af03b [AMDGPU] SIFoldOperands: clean up tryConstantFoldOp
First clean up the strange API of tryConstantFoldOp where it took an
immediate operand value, but no indication of which operand it was the
value for.

Second clean up the loop that calls tryConstantFoldOp so that it does
not have to restart from the beginning every time it folds an
instruction.

This is NFCI but there are some minor changes caused by the order in
which things are folded.

Differential Revision: https://reviews.llvm.org/D100031
2021-05-06 09:55:22 +01:00
Stanislav Mekhanoshin
ab90ae6f47 [AMDGPU] Switch AnnotateUniformValues to MemorySSA
This shall speedup compilation and also remove threshold
limitations used by memory dependency analysis.

It also seem to fix the bug in the coalescer_remat.ll
where an SMRD load was used in presence of a potentially
clobbering store.

Fixes: SWDEV-272132

Differential Revision: https://reviews.llvm.org/D101962
2021-05-05 18:34:41 -07:00
Austin Kerbow
6617a5a5ea [AMDGPU] Move insertion of function entry waitcnt later
This allows tracking these as preexisting waitcnt.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D101380
2021-05-05 17:58:38 -07:00
Austin Kerbow
f5199d7ae0 [AMDGPU] Revise handling of preexisting waitcnt
Preexisting waitcnt may not update the scoreboard if the instruction
being examined needed to wait on fewer counters than what was encoded in
the old waitcnt instruction. Fixing this results in the elimination of
some redudnat waitcnt.

These changes also enable combining consecutive waitcnt into a single
S_WAITCNT or S_WAITCNT_VSCNT instruction.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100281
2021-05-05 17:21:33 -07:00