92 Commits

Author SHA1 Message Date
Jay Foad
95258419f6
[AMDGPU] Use AMDGPU::isIntrinsicAlwaysUniform in isSDNodeAlwaysUniform (#87085)
This is mostly just a simplification, but tests show a slight codegen
improvement in code using the deprecated amdgcn.icmp/fcmp intrinsics.
2024-03-30 08:01:18 +00:00
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
Diana Picus
20e9e4f797 [AMDGPU] si-wqm: Skip only LiveMask COPY
si-wqm sometimes needs to save the LiveMask in the entry block. Later
on, while looking for a place to enter WQM/WWM, it unconditionally
skips over the first COPY instruction in the entry block. This is
incorrect for functions where the LiveMask doesn't need to be saved, and
therefore the first COPY is more likely a COPY from a function argument
and might need to be in some non-exact mode.

This patch fixes the issue by also checking that the source of the COPY
is the EXEC register.

This produces different code in 3 of the existing tests:

In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than
outside. This is benign.

In wave32.ll, we end up with an extra register copy. This is because
the first COPY in the block is now part of the WWM block, so
si-pre-allocate-wwm-regs will allocate a new register for its
destination (when it was outside of the WWM region, the register
allocator could just re-use the same register). We might be able to
improve this in si-pre-allocate-wwm-regs but I haven't looked into it.

The same thing happens in dual-source-blend-export.ll, but for that
one it's harder to see because of the scheduling changes. I've uploaded
the before/after si-wqm output for it here:
https://reviews.llvm.org/differential/diff/553445/

Differential Revision: https://reviews.llvm.org/D158841
2023-11-10 09:30:44 +01:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Ivan Kosarev
f04aa1f814
[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002) 2023-10-05 14:22:29 +03:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Konstantina Mitropoulou
2c5d1b5ab7 [DAGCombiner] Reassociate the operands from (OR (OR(CMP1, CMP2)), CMP3) to (OR (OR(CMP1, CMP3)), CMP2)
This happens when CMP1 and CMP3 have the same predicate (or CMP2 and CMP3 have
the same predicate).

This helps optimizations such as the fololowing one:
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156215
2023-08-08 20:08:01 -07:00
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Simon Pilgrim
e9caa37e9c [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

A future patch will propose the equivalent shl narrowing combine.

Differential Revision: https://reviews.llvm.org/D146121
2023-07-17 15:50:09 +01:00
Christudasan Devadasan
7a98f084c4 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196
2023-07-07 23:14:32 +05:30
Jay Foad
f2c164c815 [AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.

Differential Revision: https://reviews.llvm.org/D153537
2023-07-04 12:22:38 +01:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
pvanhout
b3b3cb2d2f [AMDGPU] Less aggressively break large PHIs
In some cases, breaking large PHIs can very negatively affect
performance (3x more instructions observed in a particular test case).

This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases.
e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason.

Fixes SWDEV-392803
Fixes SWDEV-393781
Fixes SWDEV-394228

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147786
2023-04-14 15:41:26 +02:00
Nuno Lopes
047efda60d Revert "[test] Remove occurences of br undef in CodeGen/AMDGPU tests"
This reverts commit 18c594c176036b7ffcd8439ed9c4b08d2085a244.
Build bots broke
2023-04-04 13:45:56 +01:00
Kriti Gupta
18c594c176 [test] Remove occurences of br undef in CodeGen/AMDGPU tests
Differential Revision: https://reviews.llvm.org/D145622
2023-04-04 13:15:44 +01:00
pvanhout
d892521076 [AMDGPU] Break-up large PHIs for DAGISel
DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen.
This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases.

This scalarization/phi "break-up" can be easily tuned/disabled through CL options in case it's not beneficial for some users.
It's also only enabled for DAGIsel and GlobalISel handles PHIs much better (as it works on the whole function).

This can both scalarize (break a vector into its elements) and simplify (break a vector into smaller, more manageable subvectors) PHIs.

Fixes SWDEV-321581

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D143731
2023-03-28 09:38:47 +02:00
Max Kazantsev
6b03ce374e [LICM] Simplify (X < A && X < B) into (X < MIN(A, B)) if MIN(A, B) is loop-invariant
We don't do this transform in InstCombine in general case for arbitrary values, because cost of
AND and 2 ICMP's isn't higher than of MIN and ICMP. However, LICM also has a notion
about the loop structure. This transform becomes profitable if `A` and `B` are loop-invariant and
`X` is not: by doing this, we can compute min outside the loop.

Differential Revision: https://reviews.llvm.org/D143726
Reviewed By: nikic
2023-03-10 17:36:52 +07:00
Max Kazantsev
279f0c02ad [Test] Regenerate tests using update_llc_test_checks.py 2023-03-10 11:34:16 +07:00
Christudasan Devadasan
29247824f5 [AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by
`PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during
emitPrologue/emitEpilogue require the exec bits flipping to avoid
clobbering the inactive lanes of VGPRs used for SGPR spilling.
Currently, these spill instructions are referenced from the SP at
function entry and when the callee performs a stack realignment,
they ended up getting incorrect stack offsets. Even if we try to
adjust the offsets, the FP-SP becomes a runtime entity with dynamic
stack realignment and the offsets would still be inaccurate.

To fix it, use FP as the frame base in the spill instructions
whenever the function has FP. The offsets obtained for the CS
objects would always be the right values from FP.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134949
2022-12-17 11:52:36 +05:30
Christudasan Devadasan
b25b4c0ab4 [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124195
2022-12-17 11:49:41 +05:30
Matt Arsenault
d85e849ff4 AMDGPU: Convert some assorted tests to opaque pointers 2022-12-01 21:40:30 -05:00
Alexander Timofeev
2e29b0138c [AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.
Since the divergence-driven instruction selection has been enabled for AMDGPU,
 all the uniform instructions are expected to be selected to SALU form, except those not having one.
 VGPR to SGPR copies appear in MIR to connect values producers and consumers. This change implements an algorithm
 that evolves a reasonable tradeoff between the profit achieved from keeping the uniform instructions in SALU form
 and overhead introduced by the data transfer between the VGPRs and SGPRs.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128252
2022-07-14 23:59:02 +02:00
David Stuttard
77851cc1cf [AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different.
c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make
it not apply to pre gfx1030

Differential Revision: https://reviews.llvm.org/D127869
2022-06-16 10:39:06 +01:00
Stanislav Mekhanoshin
c97436f8b6 [AMDGPU] Use null for dead sdst operand
Differential Revision: https://reviews.llvm.org/D127542
2022-06-13 14:41:40 -07:00
Venkata Ramanaiah Nalamothu
04fff547e2 [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652
2022-03-09 12:18:02 +05:30
Carl Ritson
565af157ef [AMDGPU] Extend pre-emit peephole to redundantly masked VCC
Extend pre-emit peephole for S_CBRANCH_VCC[N]Z to eliminate
redundant S_AND operations against EXEC for V_CMP results in VCC.
These occur after after register allocation when VCC has been
selected as the comparison destination.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D120202
2022-02-25 10:18:31 +09:00
Jay Foad
f510045d82 [CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].

Differential Revision: https://reviews.llvm.org/D117298
2022-02-18 16:10:56 +00:00
Jay Foad
8faad29634 Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.

Apparently it is not safe to modify the condition even if it passes the
hasOneUse test, because StructurizeCFG might have other references to
the condition that are not manifest in the IR use-def chains.
2022-01-31 14:55:36 +00:00
Jay Foad
a6b54ddaba [Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an
xor i1 instruction, and it since it generally runs late in the pipeline,
instcombine does not clean up the xor-of-cmp pattern.

Differential Revision: https://reviews.llvm.org/D118478
2022-01-31 10:44:17 +00:00
Jay Foad
7af959673e [AMDGPU] Tweak some compares in wave32.ll test
This prevents the compares from being optimized away when D86578 lands,
which seems unintended.
2022-01-19 08:14:06 +00:00
Ron Lieberman
09b53296cf Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

 Failed amdgpu runtime buildbot # 3514
2021-12-22 11:39:28 -05:00
RamNalamothu
9075009d1f [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652
2021-12-22 20:51:12 +05:30
RamNalamothu
18f9351223 [AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slows down the loader,
and is only used to simplify disassembly.

Use '--symbolize-operands' with llvm-objdump to improve readability of the
branch target operands in disassembly.

Fixes: SWDEV-312223

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D114273
2021-11-20 10:32:41 +05:30
Jay Foad
a70bbb5f7a [AMDGPU] Simplify 64-bit division/remainder expansion
The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.

Differential Revision: https://reviews.llvm.org/D113679
2021-11-12 15:48:41 +00:00
Stanislav Mekhanoshin
c80d8a8cea [AMDGPU] MachineLICM cannot hoist VALU
MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.

That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefor I think it shall only be enabled after D107677 even
though it does not depend on it.

Differential Revision: https://reviews.llvm.org/D107859
2021-10-20 11:47:24 -07:00
Stanislav Mekhanoshin
2cfda6a691 [AMDGPU] Fold immediates in the optimizeCompareInstr
Peephole works before the first SIFoldOperands so most of
the immediates are in registers.

Differential Revision: https://reviews.llvm.org/D109186
2021-09-02 17:23:26 -07:00
Stanislav Mekhanoshin
f3645c792a [AMDGPU] Use S_BITCMP1_* to replace AND in optimizeCompareInstr
Differential Revision: https://reviews.llvm.org/D109082
2021-09-01 15:59:12 -07:00
Stanislav Mekhanoshin
bf77b11277 [AMDGPU] Introduce optimizeCompareInstr
The following patterns are currently handled:

s_cmp_eq_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_u64 (s_and_b64 $src, 1), 1 => s_and_b64 $src, 1
s_cmp_ge_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_ge_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_lg_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_u64 (s_and_b64 $src, 1), 0 => s_and_b64 $src, 1
s_cmp_gt_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_gt_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1

Differential Revision: https://reviews.llvm.org/D109031
2021-09-01 15:57:05 -07:00
alex-t
e3cbf1d437 [AMDGPU] enable scalar compare in truncate selection
Currently, the truncate selection dag node is expanded as a bitwise AND plus compare to 1.  This change enables scalar comparison in the pattern if the truncate node is uniform.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108925
2021-09-01 23:35:11 +03:00
alex-t
ed0f4415f0 [AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form
             dependent of the SDNode divergence flag.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D106079
2021-08-25 18:30:49 +03:00
Simon Pilgrim
34dc4f24f2 Revert rG939291041bb35b8088e3b61be2b8b3bc950f64a7 "[AMDGPU] Regenerate wave32.ll test checks"
This still breaks buildbots
2021-07-25 15:59:26 +01:00
Simon Pilgrim
939291041b [AMDGPU] Regenerate wave32.ll test checks
To simplify diff in future patch
2021-07-25 15:13:09 +01:00
Sebastian Neubauer
96e1fcb1e0 [AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and
v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU
instruction.

Differential Revision: https://reviews.llvm.org/D103322
2021-06-07 16:09:48 +02:00
Simon Pilgrim
81606ab856 Revert rGd70cbd1ce9b426f2c7e0e0f900769bbcbb300a95 "[AMDGPU] Regenerate wave32.ll tests"
This is failing on buildbots but not locally - not sure why
2021-05-18 12:15:38 +01:00
Simon Pilgrim
d70cbd1ce9 [AMDGPU] Regenerate wave32.ll tests
Keep the manual GFX10DEFWAVE checks for VGPRBlocks
2021-05-18 12:03:31 +01:00
Jay Foad
6fec0a34ce [AMDGPU] Fix typo in regular expression checks. NFC. 2021-04-06 12:29:48 +01:00
Dmitry Preobrazhensky
cd953434f2 [AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffices
Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645.

Differential Revision: https://reviews.llvm.org/D99413
2021-04-01 14:21:00 +03:00
Simon Pilgrim
4a68740547 Revert rG3b635253ddd0106c88051cff3540d8eb90bee22f "[AMDGPU] Regenerate wave32.ll test checks"
Breaks on some buildbots.
2021-03-17 11:47:09 +00:00