5146 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
bb1fe36977 [AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin
c27f8fb968 [AMDGPU] Remove cndmask from readsExecAsData
Differential Revision: https://reviews.llvm.org/D117909
2022-01-24 11:24:47 -08:00
Matt Arsenault
18aabae8e2 AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills
These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.
2022-01-24 09:45:41 -05:00
Matt Arsenault
99e8e17313 Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction"
This reverts commit a97e20a3a8a58be751f023e610758310d5664562.
2022-01-24 09:26:52 -05:00
Abinav Puthan Purayil
912af6b570 [AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests. 2022-01-24 16:30:40 +05:30
Jay Foad
aa50b93e7c [AMDGPU][GlobalISel] Add more sign/zero/any-extension tests
Add s1 to s16 cases, and for sgprs s1 to s64 and s32 to s64.
2022-01-24 10:16:51 +00:00
Jay Foad
906ebd5830 [AMDGPU][GlobalISel] Regenerate checks in inst-select-*ext.mir 2022-01-24 10:16:51 +00:00
Abinav Puthan Purayil
68b70d17d8 [GlobalISel] Fold or of shifts with constant amount to funnel shift.
This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0
and C1 are constants where C0 + C1 is the bit-width of the shift
instructions.

Differential Revision: https://reviews.llvm.org/D116529
2022-01-24 10:43:32 +05:30
David Green
b27e5459d5 [DAG] Convert truncstore(extend(x)) back to store(x)
Pulled out of D106237, this folds truncstore(extend(x)) back to store(x)
if the original store was legal. This can come up due to the order we
fold nodes. A fold from X86 needs to be adjusted to prevent infinite
loops, to have it pick the operand of a trunc more directly.

Differential Revision: https://reviews.llvm.org/D117901
2022-01-22 13:20:36 +00:00
Sebastian Neubauer
ae2f9c8be8 [AMDGPU] Remove lz and nomip combine from codegen
These combines have been moved into the IR combiner in D116042.

Differential Revision: https://reviews.llvm.org/D116116
2022-01-21 12:09:08 +01:00
Stanislav Mekhanoshin
41ebd19681 [AMDGPU] Do not ignore exec use where exec is read as data
Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC
in a way which modifies the result. This implicit EXEC use
shall not be ignored for the purposes of instruction moves.

Differential Revision: https://reviews.llvm.org/D117814
2022-01-20 14:05:22 -08:00
Nico Weber
9122b5072a [llvm] Remove an old bot cleanup command 2022-01-20 15:02:35 -05:00
Stanislav Mekhanoshin
94a0660c14 [AMDGPU] Regenerate remat-vop.mir. NFC. 2022-01-20 11:15:42 -08:00
Matt Arsenault
237502c1a4 AMDGPU: Fix asm in test using wrong IR type for physical register 2022-01-20 12:56:53 -05:00
Matt Arsenault
064cea9c9a AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection
Avoids a test diff with SDAG.
2022-01-20 12:56:53 -05:00
Matt Arsenault
2d1f9aa27d AMDGPU/GlobalISel: Regenerate test checks with -NEXT 2022-01-20 12:56:53 -05:00
Matt Arsenault
08549ba51e AMDGPU/GlobalISel: Explicitly set -global-isel-abort in failure tests
If the default mode is the fallback, this would fail since it would
end up seeing the DAG failure message instead.
2022-01-20 12:56:53 -05:00
Matt Arsenault
2e49e0cfde AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics
Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on selection
failure, where it would emit this error and then fail again.
2022-01-20 12:46:45 -05:00
Matt Arsenault
be7e938e27 AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd
This code is not structured to handle the legacy buffer intrinsics and
was miscompiling them.
2022-01-20 12:12:06 -05:00
Matt Arsenault
8ff3c9e0be AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics
The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.
2022-01-20 12:12:06 -05:00
Matt Arsenault
89c447e4e6 AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal
This was inheriting the mesa behavior, and as far as I know nobody is
using opencl kernels with amdpal. The isMesaKernel check was
irrelevant because this property needs to be held for all functions.
2022-01-20 12:12:05 -05:00
Abinav Puthan Purayil
d8b690409d [AMDGPU] Set MemoryVT for truncstores in tblgen.
GlobalISelEmitter was skipping these patterns when its predicates were
checked. This patch should allow us to select d16_hi stores in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D117762
2022-01-20 19:05:12 +05:30
Jay Foad
847bb26820 [AMDGPU] Regenerate some MIR checks 2022-01-20 12:41:40 +00:00
Konstantina
c7b71acef2 [AMDGPU][NFC] Add autogenerated tests for vgpr-tuple-allocation.ll
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D117692
2022-01-19 15:06:16 -08:00
Yaxun (Sam) Liu
15f54dd5e4 AMDGPU: Account for usage HIP-style dynamic LDS
Disable promote alloca to LDS when HIP-style dynamic LDS since the size
is unknown at compile time.

Patch by: Siu Chi Chan

Reviewed by: Matt Arsenault, Yaxun Liu

Differential Revision: https://reviews.llvm.org/D117494
2022-01-19 13:05:29 -05:00
Matt Arsenault
052503979e AMDGPU/GlobalISel: Fix introducing f16 fmed3 for gfx8 2022-01-19 10:43:21 -05:00
Matt Arsenault
adab71711e AMDGPU/GlobalISel: Fix legalize failure on i65 ctpop 2022-01-19 10:26:28 -05:00
Matt Arsenault
b965617ccc GlobalISel: Fix assert on unmerge to different element of casted vector
This was failing if a G_UNMERGE_VALUES produced a different element
type than the cast result type.
2022-01-19 10:13:31 -05:00
Matt Arsenault
7f26a1027f AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
Arbitrary stack pointers are accessed using MUBUF instructions with
the voffset field, which is interpreted as the swizzled address. We
want to fold fold into the MUBUF form to use the SP in the SGPR
offset, and previously we were special casing the interpretation of
the pointer value if the access memory operand said it was relative to
the stack pointer.

690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved
the DAG path to special casing copies from SGPRs. This is not an
entirely sound approach, since it's still changing the interpretation
of pointer values based the context.

Introduce a new pseudo which corresponds to the wave-to-vector address
transform. This way the memory instruction has consistent semantics
where the incoming pointer is always interpreted as a vector address,
and we're not obligated to optimize into the MUBUF offset-only
addressing mode. The DAG should probably have an equivalent pseudo.

This should fix some correctness issues, and folding this into
addressing modes will be a future optimization patch.
2022-01-19 10:13:31 -05:00
Jay Foad
0bc14a0a98 [AMDGPU] Tweak some compares in wqm.ll test
This prevents the compares from being optimized away when D86578 lands,
which seems unintended. Also fixed some unused results.
2022-01-19 12:42:56 +00:00
Piotr Sobczak
8dfb417e67 [AMDGPU] Fix missing waitcnt issue
Ignore out of order counters when merging brackets. The fact that
there was a pending event in the old state does not guarantee that
the waitcnt was generated, so we still need to conservatively re-process
the block.

The patch fixes a correctness issue where the block was not re-processed
and the waitcnt not inserted in consequence.

Differential Revision: https://reviews.llvm.org/D117544
2022-01-19 10:54:44 +01:00
Jay Foad
7af959673e [AMDGPU] Tweak some compares in wave32.ll test
This prevents the compares from being optimized away when D86578 lands,
which seems unintended.
2022-01-19 08:14:06 +00:00
Matt Arsenault
e3dd47f987 AMDGPU: Fix using deprecated buffer intrinsics in test 2022-01-18 19:02:47 -05:00
Matt Arsenault
da72822763 GlobalISel: Fix CSEMIRBuilder mishandling constant folds of vectors
This was ignoring the requested result register, resulting in a
missing def when this happened in the IRTranslator. Fixes some crashes
and verifier errors at -O0.

Alternatively we could pass DstOps to the constant fold functions.
2022-01-18 17:21:02 -05:00
Matt Arsenault
42098c4a30 GlobalISel: Fix legalization error where CSE leaves behind dead defs
If the conversion artifact introduced in the unmerge of cast of merge
combine already existed in the function, this would introduce dead
copies which kept the old casts around, neither of which were deleted,
and would fail legalization.

This would fail as follows:

The G_UNMERGE_VALUES of the G_SEXT of the G_BUILD_VECTOR would
introduce a G_SEXT for each of the scalars.

Some of the required G_SEXTs already existed in the function, so CSE
moves them up in the function and introduces a copy to the original
result register.

The introduced CSE copies are dead, since the originally G_SEXTs were
already directly used. These copies add a use to the illegal G_SEXTs,
so they are not deleted.

The artifact combiner does not see the defs that need to be updated,
since it was hidden inside the CSE builder.

I see 2 potential fixes, and opted for the mechanically simpler one,
which is to just not insert the cast if the result operand isn't
used. Alternatively, we could not insert the cast directly into the
result register, and use replaceRegOrBuildCopy similar to the case
where there is no conversion.

I suspect this is a wider problem in the artifact combiner.
2022-01-18 17:04:40 -05:00
Matt Arsenault
82de129ab8 AMDGPU: Remove llvm.amdgcn.alignbit and handle bitcode upgrade to fshr 2022-01-18 14:08:36 -05:00
Matt Arsenault
de1600a1d9 AMDGPU: Avoid enabling kernel workitem IDs with reqd_work_group_size 2022-01-18 13:52:04 -05:00
Matt Arsenault
984451eafc PostRAPseudos: Don't preserve kills on some implicit copy operands
This fixes a verifier error I ran into at -O0. A subregister copy had
an implicit kill of an overlapping superregister, which was partially
redefined by the copy. The preserved implicit operand killed
subregisters made live earlier in the sequence. AMDGPU already uses
similar logic for whether to preserve the kill of the superregister on
the final instruction if there's overlap.
2022-01-18 13:52:04 -05:00
Matt Arsenault
f5ff1cab43 AMDGPU/GlobalISel: Regenerate base test checks 2022-01-18 11:26:47 -05:00
Vang Thao
10ed1eca24 [MachineSink] Allow sinking of constant or ignorable physreg uses
For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see if a physical
register use can be ignored for sinking.

Also perform same constant and ignorable physical register check when considering sinking in loops.

https://reviews.llvm.org/D116053
2022-01-18 14:17:40 +00:00
Christudasan Devadasan
56a5d78893 [AMDGPU] Disable optimizeEndCf at -O0
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116819
2022-01-18 02:48:52 -05:00
Carl Ritson
ed4d8fdafd [AMDGPU] Autogenerate wqm.ll
Switch wqm.ll to be autogenerated.
Replace gfx6 and gfx8 targets with gfx9 (wave64) and gfx10 (wave32).

Reviewed By: kmitropoulou

Differential Revision: https://reviews.llvm.org/D117455
2022-01-18 16:04:40 +09:00
Matt Arsenault
dc2457c8cf AMDGPU: Fix crashing on calls to C functions from graphics contexts
If we had one of the shader calling conventions calling a default
calling convention callee, this would crash when the caller did not
have anything to pass to the workitem ID.

This is illegal, but we still need to produce something
sensible. llvm-reduce likes to replace calls to intrinsics with calls
to null or undef, so this does appear and is helpful to avoid hard
erroring.

Pass undef in this case, as already happened for the other implicit
arguments. It might make sense to define the behavior here and pass
null for the pointers, and -1 for the workitem ID. We do have extra
bits in the workitem ID, so this wouldn't conflict with a valid value.
2022-01-17 10:13:25 -05:00
Matt Arsenault
9392b40d4b AMDGPU/GlobalISel: Fix selection of constant 32-bit addrspace loads
Unfortunately the selection patterns still rely on the address space
from the memory operand instead of using the pointer type. Add this
address space to the list of cases supported by global-like loads.

Alternatively we would have to adjust the address space of the memory
operand to deviate from the underlying IR value, which looks ugly and
is more work in the legalizer.

This doesn't come up in the DAG path because it uses a different
selection strategy where the cast is inserted during the addressing
mode matching.
2022-01-17 10:06:33 -05:00
Matt Arsenault
c3a74183a5 AMDGPU/GlobalISel: Fix legalization failure for s65 shifts
This was trying to clamp s65 down to s32, which wasn't handled so we
need to promote all the way to s128 first. Having to order the
legalization rules in just the right way is rather dissatisfying, but
I'm not sure how smart the legalizer should be in trying to interpret
the rules.
2022-01-17 10:04:41 -05:00
Matt Arsenault
d97fb55ff3 AMDGPU/GlobalISel: Add failing ABI lowering testcases 2022-01-17 09:38:35 -05:00
Matt Arsenault
e09f98a69a AMDGPU: Fix LiveVariables error after optimizing VGPR ranges
This was not removing the block from the live set depending on the
specific depth first visit order. Fixes a verifier error in the OpenCL
conformance tests.
2022-01-17 09:38:35 -05:00
Matt Arsenault
81004269e5 AMDGPU/GlobalISel: Fix test not matching test name
This was testing an s48 load instead of an s64 load as intended.
2022-01-17 09:38:35 -05:00
Nikita Popov
873a7ee7e4 [MachineInstr] Don't include debug uses in bundle header (PR52817)
Following the recommendation in
https://github.com/llvm/llvm-project/issues/52817#issuecomment-1007635426,
this excludes debug instructions when finalizing the bundle. As uses
in debug instructions don't have effects, they will no longer be
included in the BUNDLE header.

Fixes https://github.com/llvm/llvm-project/issues/52817.

Differential Revision: https://reviews.llvm.org/D116945
2022-01-17 10:43:21 +01:00
Craig Topper
454256ef4f [AMDGPU] Correct the known bits calculation for MUL_I24.
I'm not entirely sure, but based on how ComputeNumSignBits handles
ISD::MUL, I believe this code was miscounting the number of sign
bits.

As an example of an incorrect result let's say that countMinSignBits
returned 1 for the left hand side and 24 for the right hand side.
LHSValBits would be 23 and RHSValBits would be 0 and the sum would
be 23. This would cause the code to set 9 high bits as zero/one. Now
suppose the real values for the left side is 0x800000 and the right
hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits
not 9.

The number of valid bits for the left and right operands is now
the number of non-sign bits + 1. If the sum of the valid bits of
the left and right sides exceeds 32, then the result may overflow and we
can't say anything about the sign of the result. If the sum is 32
or less then it won't overflow and we know the result has at least
1 sign bit.

For the previous example, the code will now calculate the left
side valid bits as 24 and the right side as 1. The sum will be 25
and the sign bits will be 32 - 25 + 1 which is 8, the correct value.

Differential Revision: https://reviews.llvm.org/D116469
2022-01-14 08:54:54 -08:00