1394 Commits

Author SHA1 Message Date
Matt Arsenault
a020581f2e AMDGPU/GlobalISel: Add baseline test for new ABI attribute hints 2021-08-26 22:09:11 -04:00
Matt Arsenault
98d7aa435f AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
2021-08-26 20:30:03 -04:00
Konstantin Schwarz
4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Petar Avramovic
2bf4eeeeb6 [GlobalISel] Avoid creating COPY in LegalizationArtifactCombiner
When Src and Dst used in buildAnyExtOrTrunc or buildSExtOrTrunc
have the same type (creates COPY) use Src register directly or
use replaceRegOrBuildCopy instead.

Differential Revision: https://reviews.llvm.org/D108306
2021-08-24 11:09:56 +02:00
Simon Pilgrim
fb81271e8b [AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32
As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code.

Differential Revision: https://reviews.llvm.org/D108210
2021-08-17 18:09:57 +01:00
Sebastian Neubauer
fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Matt Arsenault
d719f1c3cc AMDGPU: Add alloc priority to global ranges
The requested register class priorities weren't respected
globally. Not sure why this is a target option, and not just the
expected behavior (recently added in
1a6dc92be7d68611077f0fb0b723b361817c950c). This avoids an allocation
failure when many wide tuple spills are introduced. I think this is a
workaround since I would not expect the allocation priority to be
required, and only a performance hint. The allocator should be smarter
about when only a subregister needs to be spilled and restored.

This does regress a couple of degenerate store stress lit tests which
shouldn't be too important.
2021-08-10 13:12:34 -04:00
Konstantin Schwarz
64bef13f08 [GlobalISel] Look through truncs and extends in narrowScalarShift
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.

However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.

This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D89100
2021-08-10 13:49:22 +02:00
Michael Liao
05783e1cfe [amdgpu] Revise the conversion from i64 to f32.
- Replace 'cmp+sel' with 'umin' if possible.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107507
2021-08-06 17:01:47 -04:00
Jay Foad
57b9107e3f [GlobalISel] Improve widening of cttz/cttz_zero_undef
Differential Revision: https://reviews.llvm.org/D107631
2021-08-06 14:25:56 +01:00
Jay Foad
83610d4eb0 [AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107474
2021-08-06 09:40:48 +01:00
Jay Foad
24b67a9024 [AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.

Differential Revision: https://reviews.llvm.org/D107442
2021-08-06 09:40:48 +01:00
Jay Foad
d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
Jay Foad
cd2594e1c6 [GlobalISel] Improve legalization of narrow CTTZ
Differential Revision: https://reviews.llvm.org/D107457
2021-08-06 09:40:48 +01:00
Amara Emerson
1577c41090 [GlobalISel] Allow the ArtifactValueFinder to return the best available register on failure.
In some cases, like with inserts, we may have a matching size register already,
but still decide to try to look further. This change adds a CurrentBest
register to the value finder state, and any time a method fails to make progress,
returns that register (which may just be an empty Register).

To facilitate this, add a new entry point to the findValueFromDef() function
which initializes this state.

Also fix the build vector finder to return the current build_vector if all
sources are being requested.

Differential Revision: https://reviews.llvm.org/D107017
2021-08-05 17:37:30 -07:00
Petar Avramovic
66de26b1f9 GlobalISel: Fix matchEqualDefs for instructions with multiple defs
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.

Differential Revision: https://reviews.llvm.org/D107362
2021-08-05 15:05:45 +02:00
Dominik Montada
cc947e29ea [GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107330
2021-08-05 13:52:10 +02:00
Michael Liao
5edc886e90 [amdgpu] Add an enhanced conversion from i64 to f32.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107187
2021-08-04 15:33:12 -04:00
Carl Ritson
675c942373 [AMDGPU] Disable NSA for BVH instructions when appropriate
Check maximum NSA size when selecting NSA or non-NSA BVH instructions.

Differential Revision: https://reviews.llvm.org/D103230
2021-08-02 20:09:26 +09:00
Petar Avramovic
dc3fbe293f GlobalISel: Fix infinite loop in legalization artifact combiner
ArtifactValueFinder keeps trying to combine g_unmerge_values in some cases.
Fix is to skip combine attempt for dead defs.

Differential Revision: https://reviews.llvm.org/D106879
2021-08-02 12:58:10 +02:00
Carl Ritson
a441de6d94 [AMDGPU][GlobalISel] Add missing default mapping for BVH intrinsics
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107211
2021-08-02 12:43:38 +09:00
Matt Arsenault
ebc17a0d68 GlobalISel: Scalarize unaligned vector stores
This has the same problems and limitations as the load path.
2021-07-31 10:37:15 -04:00
Matt Arsenault
43c7cb9a3c AMDGPU/GlobalISel: Check some remarks for failed legalizations
The load/store tests are giant and have some cases that fail in them,
but it's hard to tell which ones are really failing. Check the remarks
to make it easier to track.
2021-07-31 10:37:15 -04:00
Matt Arsenault
e46badd4e9 GlobalISel: Have lowerLoad scalarize unaligned vectors
This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lower for non-power-of-2,
non-extending vector loads.

Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.
2021-07-30 13:23:29 -04:00
Matt Arsenault
f19226dda5 GlobalISel: Have load lowering handle some unaligned accesses
The code for splitting an unaligned access into 2 pieces is
essentially the same as for splitting a non-power-of-2 load for
scalars. It would be better to pick an optimal memory access size and
directly use it, but splitting in half is what the DAG does.

As-is this fixes handling of some unaligned sextload/zextloads for
AMDGPU. In the future this will help drop the ugly abuse of
narrowScalar to handle splitting unaligned accesses.
2021-07-30 12:55:58 -04:00
Matt Arsenault
05ecd7a2ac AMDGPU/GlobalISel: Fix tests using illegal copies to physregs
These are unlegalizable and introduce spurious failures. Ideally the
verifier would reject them. Also avoid some weird G_INSERTs.
2021-07-30 12:37:29 -04:00
Matt Arsenault
faccf427df AMDGPU/GlobalISel: Remove special case lowering for non-pow-2 stores
We end up with extra copies from buildAnyExtOrTrunc if these are
lowered after the register types are legalized.
2021-07-30 12:37:29 -04:00
Mirko Brkusanin
971f4173f8 [AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Differential Revision: https://reviews.llvm.org/D105709
2021-07-29 11:20:49 +02:00
Matt Arsenault
d7d2e4545e AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The patterns for the m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Matt Arsenault
82ab1ae54e AMDGPU/GlobalISel: Fix wrong addrspace in test MMOs 2021-07-27 15:56:41 -04:00
Matt Arsenault
74c65906bc AMDGPU/GlobalISel: Add a few tests for unaligned truncating stores 2021-07-27 15:56:41 -04:00
Jay Foad
dc4ca0dbbc [GlobalISel] Constant fold G_SITOFP and G_UITOFP in CSEMIRBuilder
Differential Revision: https://reviews.llvm.org/D104528
2021-07-27 11:27:58 +01:00
Amara Emerson
c658b472f3 [GlobalISel] Add a constant folding combine.
Use it AArch64 post-legal combiner. These don't always get folded because when
the instructions are created the constants are obscured by artifacts.

Differential Revision: https://reviews.llvm.org/D106776
2021-07-26 14:53:33 -07:00
Jay Foad
59f6865231 [AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

D106284 did this for SelectionDAG. This is the corresponding fix for
GlobalISel.

Differential Revision: https://reviews.llvm.org/D106451
2021-07-26 14:27:30 +01:00
Jay Foad
683b9ed0d5 [AMDGPU] Pre-commit global-isel test case for D106451
This test case shows the scheduler wrongly reordering two buffer
accesses that might alias.
2021-07-26 14:27:30 +01:00
Carl Ritson
7d4baf25aa [AMDGPU] Add maximum NSA size limit ISA feature
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.

Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103348
2021-07-23 16:16:06 +09:00
Carl Ritson
6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Sebastian Neubauer
b642d01fa8 [AMDGPU] Improve killed check for vgpr optimization
The killed flag is not always set. E.g. when a variable is used in a
loop, it is never marked as killed, although it is unused in following
basic blocks. Also, we try to deprecate kill flags and not use them.

Check if the register is live in the endif block. If not, consider it
killed in the then and else blocks.

The vgpr-liverange tests have two new tests with loops
(pre-committed, so the diff is visible).
I also needed to change the subtarget to gfx10.1, otherwise calls
are not working.

Differential Revision: https://reviews.llvm.org/D106291
2021-07-21 15:24:59 +02:00
Matt Arsenault
67d6132463 GlobalISel: Preserve memory types for implicit sret load/stores 2021-07-19 11:52:42 -04:00
Matt Arsenault
9236125ec8 GlobalISel: Preserve LLT when bitcasting loads and stores
This also avoids improperly legalizing some truncating vector stores.
2021-07-19 11:30:14 -04:00
Matt Arsenault
51f115b078 AMDGPU/GlobalISel: Add a few tests for struct arguments
Test structs with pointers and vectors of pointers since this stresses
a future patch.
2021-07-16 20:20:55 -04:00
Matt Arsenault
27addb85a6 AMDGPU/GlobalISel: Fix some incorrect memory types in tests 2021-07-16 20:20:55 -04:00
Matt Arsenault
3ceb92295e AMDGPU/GlobalISel: Preserve more memory types 2021-07-16 08:57:26 -04:00
Matt Arsenault
21a0ef8d19 AMDGPU/GlobalISel: Redo kernel argument load handling
This avoids relying on G_EXTRACT on unusual types, and also properly
decomposes structs into multiple registers. This also preserves the
LLTs in the memory operands.
2021-07-16 08:56:54 -04:00
Matt Arsenault
a81a7a9ad8 AMDGPU/GlobalISel: Fix incorrect memory types in test 2021-07-15 19:11:40 -04:00
Matt Arsenault
e91da668d0 GlobalISel: Track argument pointeriness with arg flags
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
2021-07-15 19:11:40 -04:00
Matt Arsenault
47269da5d8 GlobalISel: Handle lowering non-power-of-2 extloads 2021-07-14 11:54:11 -04:00
Sebastian Neubauer
4359b870b1 [AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.

Differential Revision: https://reviews.llvm.org/D105920
2021-07-14 10:45:22 +02:00
Matt Arsenault
eebe841a47 RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
2021-07-13 18:49:29 -04:00
Matt Arsenault
222fde1eec GlobalISel: Use extension instead of merge with undef in common case
This fixes not respecting signext/zeroext in these cases. In the
anyext case, this avoids a larger merge with undef and should be a
better canonical form.

This should also handle this if a merge is needed, but I'm not aware
of a case where that can happen. In a future change this will also
allow AMDGPU to drop some custom code without introducing regressions.
2021-07-13 11:04:47 -04:00