5813 Commits

Author SHA1 Message Date
Simon Pilgrim
efd0d66269 [AMDGPU] Add regression test cases reported on D136042 2022-10-17 14:54:27 +01:00
Simon Pilgrim
0aa9a7f8d9 [AMDGPU] Regenerate bfe-combine.ll and bfe-patterns.ll 2022-10-17 14:41:14 +01:00
Jay Foad
0c22f4f5fe [AMDGPU] Common up some generated checks in fnearbyint.ll
Also remove -mattr=-flat-for-global which is not needed for generated
checks.
2022-10-17 11:02:19 +01:00
Peter Rong
c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change includes both documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86

This patch fixes issue #57452.

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
Anshil Gandhi
94ac8f3a8c [BranchRelaxation] Fix test for duplicate branch instruction
This patch is a follow up for D134557, inserting a check
for a duplicate unconditional branch to fall through.

Differential Revision: https://reviews.llvm.org/D135975
2022-10-14 12:21:26 -06:00
Sander de Smalen
02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
Leon Clark
6370bc2435 Add f16 nearbyint support.
Enable lowering of FNEARBYINT for f16 and extend existing tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135124
2022-10-14 08:05:24 +01:00
Matt Arsenault
99dff82118 AMDGPU: Fix failing test with expensive checks
Fixes failure after d383adec4d3914492e67267462e6f00fdd4934af
2022-10-13 23:34:20 -07:00
Anshil Gandhi
d383adec4d [BranchRelaxation] Fall through only if block has no unconditional branches
Prior to inserting an unconditional branch from X to its
fall through basic block, check if X has any terminators to
avoid inserting additional branches.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134557
2022-10-13 22:48:41 -06:00
Leon Clark
98852a0f3d Precommit for SWDEV-353076: Add check directives to existing tests.
Add FileCheck directives to existing tests in preparation for new tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135788
2022-10-13 08:02:37 +01:00
Matt Arsenault
838fd611b7 AMDGPU: Fix assertion on <1 x i16> vectors
Fixes issue 58331.
2022-10-12 17:25:24 -07:00
Matt Arsenault
575eed3dac AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs
If inline asm has a VGPR def, it must have come from a VGPR write
somewhere inside the asm. This should be further extended to all
read after write hazards.
2022-10-12 17:25:24 -07:00
Joe Nash
5f095e4751 [AMDGPU] Add GFX11 tests for fcmp and ballot. NFC
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D135782
2022-10-12 15:56:54 -04:00
Craig Topper
ac9209751a Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"
This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad.

Getting a lit test failures on AMDGPU but I can't reproduce it so far.
Reverting to investigate.
2022-10-11 16:30:40 -07:00
Craig Topper
0148df8157 [DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.

This pattern shows up when type legalizing wide multiplies involving
a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
2022-10-11 16:20:55 -07:00
Jessica Paquette
0f1a51e173 [GlobalISel] Allow vectors in redundant or + add combines
We support KnownBits for vectors, so we can enable these.

https://godbolt.org/z/r9a9W4Gj1

Differential Revision: https://reviews.llvm.org/D135719
2022-10-11 15:31:09 -07:00
Abinav Puthan Purayil
3d9f011a9c [AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later
Unfortunately, we have a broken handling of this in the runtime of rocm
5.3. The runtime is expected to handle this correctly when v5 becomes
the default.

Differential Revision: https://reviews.llvm.org/D134714
2022-10-11 23:28:43 +05:30
Pierre van Houtryve
4d815bfae0 [GISel] Add redundant bitcast folding combine
Simply folds away bitcasts that cancel each other.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135146
2022-10-11 15:03:08 +00:00
Weining Lu
42b70793a1 Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC"
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html

k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.

m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.

ZB: An address that is held in a general-purpose register. The offset
is zero.

ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.

Note:
The INLINEASM SDNode flags in below tests are updated because the new
introduced enum `Constraint_k` is added before `Constraint_m`.
  llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/X86/callbr-asm-kill.mir

This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.

Differential Revision: https://reviews.llvm.org/D134638
2022-10-11 19:51:48 +08:00
Joe Nash
8a7d4993b7 [AMDGPU] Fix True16 patterns for cmp on GFX11
These patterns should have a True16 version and a non-true16 version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135609
2022-10-10 16:41:06 -04:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Matt Arsenault
74ef03d38a AMDGPU: Update SlotIndexes independently of LiveIntervals
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.

SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
2022-10-07 13:15:15 -07:00
Pierre van Houtryve
36c3833783 [GISel] Add Trunc/Lshr/BuildVector Folding
Similar to the current "Trunc/BuildVector" folding - which folds low element extracts of BuildVectors, folds hi element extracts done using bitshifts.

For D134354

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135148
2022-10-07 08:44:03 +00:00
Jeffrey Byrnes
8e5b96cce9 [AMDGPU] Add test coverage to ensure first regallocfast only allocates SGPR
Register allocation is split into two passes, and the expected behavior is that the first pass only should only work on virtual SGPRs. Whereas the second pass works on virtual VGPRs. This adds a test case which breaks if the first pass allocates VGPRs.

Differential Revision: https://reviews.llvm.org/D135331
2022-10-06 14:31:51 -07:00
Pierre van Houtryve
3ec0085c3f [DAG] Update isKnownNeverNaN for FMA/FMAD
We can still get a NaN even if none of the operands are NaN,
e.g. from +inf/-inf. D50804 didn't catch that.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134854
2022-10-06 06:52:36 +00:00
Pierre van Houtryve
bb71079e30 [AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization
Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135149
2022-10-06 06:48:53 +00:00
Carl Ritson
c316332e17 [Sink] Allow sinking of invariant loads across critical edges
Invariant loads can always be sunk.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D135133
2022-10-06 09:21:12 +09:00
jeff
cebec42089 [DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine
Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load.

This patch identifies such a pattern and combines it into a load.

Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd
2022-10-04 12:16:00 -07:00
Pierre van Houtryve
75b292cb14 [AMDGPU][DAG] Fix insert_vector_elt lowering for 8 bit elements
The bitmask used to extract the bits assumed 16 bit elements and wasn't taking the size of the elements into account.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135156
2022-10-04 14:48:15 +00:00
Pierre van Houtryve
c93104073c [AMDGPU] Always lower SHUFFLE_VECTOR
Make it illegal, remove InstructionSelector logic for it

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134967
2022-10-04 14:23:17 +00:00
Jay Foad
af947d9fcb [ISel] Fix crash in new FMA DAG combine
Fix a crash in the FMA combine added by D132837 and amended by D134810.
In cases where the newly created node could be folded, the combiner
would fail this assertion:

llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.

Differential Revision: https://reviews.llvm.org/D135150
2022-10-04 15:13:18 +01:00
Thomas Symalla
82cac65dd2 [NFC][AMDGPU] Pre-commit test for D134418. 2022-10-04 14:30:56 +02:00
jeff
f4e6149d82 [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)
If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead.

Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945
2022-10-03 12:58:29 -07:00
Carl Ritson
a35013bec6 [AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the
same SGPR can cause incorrect execution of subsequent SALU reads
of the SGPR.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D134151
2022-10-01 16:21:24 +09:00
Jeffrey Byrnes
5a61340eb4 [AMDGPU] Fix tests in f6a2e6afed2 2022-09-30 12:53:08 -07:00
Jeffrey Byrnes
f6a2e6afed [AMDGPU] Precommit test case for D133584 2022-09-30 12:43:23 -07:00
Arthur Eubanks
e23aee7175 [test] Update some legacy PM tests 2022-09-30 11:31:02 -07:00
Joe Nash
50dfd3e9e4 [AMDGPU] Add test for FMAC_e64 dpp combine. NFC. 2022-09-30 12:48:12 -04:00
Pierre van Houtryve
d8258508d4 [AMDGPU][GISel] Update isCanonicalized
Recognize more opcodes in the function.
Fixes some regressions introduced in D134857 for fdiv.f16 too.

Depends on D134857

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D134862
2022-09-30 14:13:35 +00:00
Pierre van Houtryve
7388520d1c [GISel] Add more cases to isKnownNeverNaN
Make it even with the DAG implementation as of D134854

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D134857
2022-09-30 14:10:56 +00:00
Pierre van Houtryve
653beae5a1 [AMDGPU][GISel] Add Identity BUILD_VECTOR Combines
Folds-away BUILD_VECTOR-related noops in the post-legalizer combiner.

Depends on D134433

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134953
2022-09-30 14:07:13 +00:00
Pierre van Houtryve
9a67a6b72a [AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR
Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal.
Also removes RegBankInfo's scalarization of small BUILD_VECTORs,
replacing it with InstructionSelector logic instead.

This allows for V2S16 BUILD_VECTOR instructions to survive
all the way to ISel so we can select FMA/MAD_MIX instructions
in D134354.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134433
2022-09-30 14:04:53 +00:00
Thomas Symalla
a41dde2c62 [AMDGPU] Add use check in v_fma combine.
In D132837, an existing v_fma combine was extended to regard nested
fma instructions. Originally, the inner FMA was checked for being used
only once. In its current state, this check is missing, which causes
some regressions.

In this patch, this check was added.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134856
2022-09-29 12:25:03 +02:00
Thomas Symalla
a41764810f [NFC][AMDGPU] Pre-commit FMA test. 2022-09-29 09:54:06 +02:00
Pierre van Houtryve
682c7c77f5 [AMDGPU] Update mad-mix* CodeGen tests
- Use `fneg %a` instead of `fsub -0.0, %a`
  - This is for D134354 as we don't currently support folding `fsub -0.0, %a` into `fneg` on GISel.
    Also, `fneg` is the canonical way to do the negation.
- Switch to `update_llc_test_checks`-generated tests.
  - Better test coverage
  - Easier to update
  - Easier to see changes in future diffs
- Remove unnecessary CL arguments in RUN lines

Motive for the patch: Preparation for D134354 - we would like to
put GISel tests in this file as well. Fixing the lack of `fneg` and
switching to generated testing makes it much easier.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134793
2022-09-29 07:11:34 +00:00
Abinav Puthan Purayil
3759398b4b [AMDGPU] Report minimum scratch size in code object v5 and later by default
This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.

Differential Revision: https://reviews.llvm.org/D128346
2022-09-29 09:52:45 +05:30
Jessica Paquette
1eb49bbab6 [GlobalISel][CallLowering] Use hasRetAttr for return flags on CallBases
Given something like this:

```
declare signext i16 @signext_callee()
define i32 @caller() {
  %res = call i16 @signext_callee()
  ...
}
```

CallLowering would miss that signext_callee's return value is sign extended,
because it isn't on the call.

Use hasRetAttr on the CallBase to allow us to catch this.

(This now inserts G_ASSERT_SEXT/G_ASSERT_ZEXT like in the original review.)

Differential Revision: https://reviews.llvm.org/D86228
2022-09-28 19:38:24 -07:00
Jay Foad
2c12a04bba [ISel] Fix DAG divergence after new FMA combine
D132837 introduced a new DAG combine that used MorphNodeTo to morph an
FMUL into an FMA. It turns out that MorphNodeTo does not properly update
the divergence bit for users of the morphed node, causing an assertion
failure on the new test case:

llc: SelectionDAG.cpp:10486: void llvm::SelectionDAG::VerifyDAGDivergence(): Assertion `calculateDivergence(N) == N->isDivergent() && "Divergence bit inconsistency detected"' failed.

Fixing MorphNodeTo to propagate the divergence bit is tricky because of
the way it is used to select machine instructions, so use getNode and
ReplaceAllUsesOfValueWith instead.

Differential Revision: https://reviews.llvm.org/D134810
2022-09-28 19:41:51 +01:00
Baptiste
b556726ccc [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
One of the conditions to flush the vmcnt counter in loop preheaders is: The loop
contains a use of a vgpr that is defined out of the loop. The code currently
checks if a waitcnt is needed by looking at the score of that vgpr in the score
brackets. This is not enough and may cause the generation of an unnecessary
vmcnt flush. This patch fixes that case.

Differential Revision: https://reviews.llvm.org/D130313
2022-09-28 13:05:50 -04:00
Jon Chesterfield
35f2584ef9 [amdgpu] Error, instead of miscompile, anonymous kernels using lds
The association between kernel and struct is done by symbol name.
This doesn't work robustly for anonymous kernels as shown by the modified
test case.

An alternative association between function and struct can be constructed
if necessary, probably though metadata, but on the basis that we currently
miscompile anonymous kernels and that they are difficult to construct from
application code and difficult to call from the runtime, this patch makes
it a fatal error for now.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134741
2022-09-28 16:30:04 +01:00