48002 Commits

Author SHA1 Message Date
Philip Reames
33314693f5 Revert "[RISCV][InsertVSETVLI] Avoid VL toggles for extractelement patterns"
This reverts commit 657d20dc75252f0c8415ada5214affccc3c98efe.  A correctness problem was reported against the review and the fix warrants re-review.
2023-05-10 10:58:46 -07:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Yingwei Zheng
e5532fb493
[RISCV] Enable signed truncation check transforms for i8
This patch enables signed truncation check transforms for i8 on rv32 when XVT is i64 and Zbb is enabled.

It is a small improvement of D149977.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D150177
2023-05-10 22:53:24 +08:00
Matt Devereau
004bf170c6 [AArch64] Emit FNMADD instead of FNEG(FMADD)
Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.

Differential Revision: https://reviews.llvm.org/D149260
2023-05-10 12:45:54 +00:00
Jonas Paulsson
655f0fc4b9 Reapply "[SystemZ] Bugfix in expansion of memmem operations."
The new test case showed that the NoPHIs flag needs to be cleared.

Original commit message:

[SystemZ] Bugfix in expansion of memmem operations.

Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be
marked to do so.

Original patch by uweigand.

Reviewed by: uweigand

Differential Revision: https://reviews.llvm.org/D150251

Fixes: https://github.com/llvm/llvm-project/issues/62572
2023-05-10 12:40:57 +02:00
Jonas Paulsson
dfa42a69b8 Revert "[SystemZ] Bugfix in expansion of memmem operations."
Sorry - mir test fails with expensive checks on build bot.

Seems to relate to the fact that there are no PHIs in the .mir input, but after
they are created the verifyer reports "Found PHI instruction with NoPHIs property
set".

This reverts commit 00454a17f361d677d5423905c888daca1a80661a.
2023-05-10 11:34:55 +02:00
Jonas Paulsson
00454a17f3 [SystemZ] Bugfix in expansion of memmem operations.
Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be
marked to do so.

Original patch by uweigand.

Reviewed by: uweigand

Differential Revision: https://reviews.llvm.org/D150251

Fixes: https://github.com/llvm/llvm-project/issues/62572
2023-05-10 11:05:13 +02:00
Matt Arsenault
62eac3e068 GlobalISel: Fold out G_FPTRUNC(G_FPEXT) 2023-05-10 08:01:27 +01:00
Serguei Katkov
92663cd464 [X86] Add test cases for fminimum/fmaximum with vector zero operands. 2023-05-10 11:54:51 +07:00
Chen Zheng
fb45493562 [DebugLine] save one debug line entry for empty prologue
Reland D147506 after fixing the failure in bot
https://lab.llvm.org/buildbot/#/builders/247/builds/4125

Some debuggers like DBX on AIX assume the address in debug line
entries is always incremental. But clang generates two entries (entry
for file scope line and entry for prologue end) with same address if
prologue is empty

And if the prologue is empty, seems the first debug line entry for the
function is unnecessary(i.e. removing the first entry won't impact the
behavior in GDB on Linux), so I implement this for all debuggers.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D147506
2023-05-10 01:21:02 +00:00
Aaron Ballman
f08b94d5c2 Further amend 945f6e65be0d20b3446e7c1537c64151de618af4
This addresses the ARM issue found by:
https://lab.llvm.org/buildbot/#/builders/109/builds/63726

(This test wouldn't run for me locally, hence missing it in the last fix.)
2023-05-09 16:16:07 -04:00
Aaron Ballman
b97527f144 Fix test failure from 945f6e65be0d20b3446e7c1537c64151de618af4
It seems we were testing the behavior of the debug messages!
2023-05-09 16:09:55 -04:00
Sami Tolvanen
e9569748de [CodeGen][KCFI] Move cfi-type lowering to TargetLowering
KCFI machine function passes transform indirect calls with a
cfi-type attribute into architecture-specific type checks bundled
together with the calls. Instead of having a separate pass for each
architecture, add a generic machine function pass for KCFI and
move the architecture-specific code that emits the actual check to
TargetLowering. This avoids unnecessary duplication and makes it
easier to add KCFI support to other architectures.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D149915
2023-05-09 18:38:54 +00:00
Thomas Symalla
60a4cb7076 [NFC][AMDGPU] Add option to test. 2023-05-09 17:24:00 +02:00
Thomas Symalla
80f442e1ed [NFC][AMDGPU] Pre-commit test.
Pre-commit fold-fabs.ll.
2023-05-09 14:53:44 +02:00
Tom Dohrmann
f6154364f6 fix stack probe lowering for x86_intrcc
The x86_intrcc calling convention will build two STACKALLOC_W_PROBING machine instructions if the function takes an error code. This is caused by an additional call to emitSPUpdate in llvm/lib/Target/X86/X86FrameLowering.cpp:1650. Previously only the first STACKALLOC_W_PROBING machine instruction was properly handled, the second one was simply ignored. This lead to miscompilations where the stack pointer wasn't properly updated (see https://github.com/rust-lang/rust/issues/109918). This patch fixes this by handling all STACKALLOC_W_PROBING machine instructions.

To be honest I don't quite understand why this didn't lead to more noticeable miscompilations previously.

This is my first time contributing to LLVM.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D150033
2023-05-09 16:31:42 +08:00
pvanhout
0d1864a754 [AMDGPU] Recompute liveness in SIOptimizeExecMaskingPreRA
Instead of ad-hoc updating liveness, recompute it completely for the affected register.

This does not affect any existing test and fixes an edge case that
caused a "Non-empty but used interval" error in the register allocator
due to how the pass updated liveranges. It created a "isolated" live-through
segment.

Overall this change just seems to be a net positive with no side effect observed. There may be a compile time impact but it's expected to be minimal.

Fixes SWDEV-388279

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D150105
2023-05-09 09:02:22 +02:00
Weining Lu
161716a713 [LoongArch] Support fcc* (condition flag) registers in inlineasm clobbers
Differential Revision: https://reviews.llvm.org/D150089
2023-05-09 14:55:50 +08:00
Amara Emerson
e1472db58e [GlobalISel] Implement commuting shl (add/or x, c1), c2 -> add/or (shl x, c2), c1 << c2
There's a target hook that's called in DAGCombiner that we stub here, I'll
implement the equivalent override for AArch64 in a subsequent patch since it's
used by different shift combine.

This change by itself has minor code size improvements on arm64 -Os CTMark:
Program                                       size.__text
                                              outputg181ppyy output8av1cxfn diff
consumer-typeset/consumer-typeset             410648.00      410648.00       0.0%
tramp3d-v4/tramp3d-v4                         364176.00      364176.00       0.0%
kimwitu++/kc                                  449216.00      449212.00      -0.0%
7zip/7zip-benchmark                           576128.00      576120.00      -0.0%
sqlite3/sqlite3                               285108.00      285100.00      -0.0%
SPASS/SPASS                                   411720.00      411688.00      -0.0%
ClamAV/clamscan                               379868.00      379764.00      -0.0%
Bullet/bullet                                 452064.00      451928.00      -0.0%
mafft/pairlocalalign                          246184.00      246108.00      -0.0%
lencod/lencod                                 428524.00      428152.00      -0.1%
                           Geomean difference                               -0.0%

Differential Revision: https://reviews.llvm.org/D150086
2023-05-08 22:37:43 -07:00
Alan Zhao
f4999d3535 Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.

The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635
2023-05-08 16:27:59 -07:00
Craig Topper
6b429a9cf9 [RISCV] Improve RV64 codegen for i32 ISD::SADDO when RHS is constant.
This uses the same sequence we get from LegalizeDAG for i32 on RV32, but modified
to use W instructions.

When the RHS is constant one of the setccs simplifies to a constant and the xor will either
be an xori with 1 or get removed.

When the RHS is not a constant it was not an obvious improvement and it was a regression
when used with a branch. So I've restricted to the constant case.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D150135
2023-05-08 13:02:19 -07:00
Stefan Pintilie
be95b4dec2 [PowerPC] Look through OR, AND, XOR instructions when checking a clear.
This patch adds the additional step of looking through AND, OR, XOR
instructions when we check the number of leading zeros.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D149223
2023-05-08 14:25:20 -04:00
Craig Topper
62e6082dcc [RISCV] Implement shouldTransformSignedTruncationCheck.
This helps avoid constant materialization for the patterns
InstCombine emits for something like INT_MIN <= X && x <= INT_MAX.

See top of changed test files for more detailed explanation.

I've enabled this for i16 when Zbb is enabled. sext.b did not seem
to be a benefit due to the constants folding into addi/sltiu.

This an alternative to https://reviews.llvm.org/D149814

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D149977
2023-05-08 09:52:52 -07:00
Alvin Wong
6b996282ce [X86][CodeGen] Do not add offset for memory reference using symbol
In the past, D71436 added writing the `offset` operator for some
legitimate cases. However, for memory references in Intel syntax, the
`offset` operator (`[offset sym]`) appears to be superfluous at best,
possibly wrong and contradictory at worst.

This patch bypasses writing the `offset` operator in
`X86AsmPrinter::PrintIntelMemReference` which affects exactly this
case. A similar code flow exists in `X86IntelInstPrinter.cpp` -
`X86IntelInstPrinter::printMemReference`.

The motivation for fixing this output is to allow us to reject the
confusing `call [offset fn_ref]` syntax in MC, as discussed in D149579.

Depends on D149579

Differential Revision: https://reviews.llvm.org/D150047
2023-05-09 00:07:40 +08:00
Simon Pilgrim
d53b76e3dd [X86] vector-reduce-or-bool.ll - add common AVX prefix
Share with AVX512 to reduce duplication
2023-05-08 15:30:41 +01:00
Simon Pilgrim
55f8083f7b [X86] avx512-insert-extract.ll - add nounwind to silence cfi noise 2023-05-08 14:13:25 +01:00
sgokhale
1ddfd1c818 [CodeGen][ShrinkWrap] Split restore point
Try to reland D42600

Differential Revision: https://reviews.llvm.org/D42600
2023-05-08 13:21:07 +05:30
sgokhale
7cba800104 [CodeGen] Autogen tests as prerequisite for D42600
Autogenerating tests as suggested in D42600
2023-05-08 12:25:51 +05:30
Noah Goldstein
3e998ede64 [X86] Lower used (atomicrmw xor p, SignBit) as (atomicrmw add p, SignBit)
`(xor X, SignBit)` == `(add X, SignBit)`. For atomics whose result is
used, the `add` option is preferable because of the `xadd` instruction
which allows us to avoid either a CAS loop or a `btc; setcc; shl`.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149689
2023-05-07 19:11:53 -05:00
Noah Goldstein
3886985217 [X86] Add tests for (atomicrmw xor p, Imm); NFC
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149687
2023-05-07 19:11:52 -05:00
Manoj Gupta
4157625cea Revert "[AArch64] Emit FNMADD instead of FNEG(FMADD)"
This reverts commit ea228bd0bd0173ffd4aac497a312a852e8f7ffad.

Cuases a crash on AArch64. Testcase provided at D149260.
2023-05-07 16:38:08 -07:00
Amara Emerson
345e419483 [NFC][AArch64][GlobalISel] Add a gisel run line for shift-logic.ll 2023-05-07 15:50:59 -07:00
David Green
b774f14841 [DAG] Calculate the number of sign bits for constant BUILD_VECTOR directly.
For constant BUILD_VECTORs the operands need to be legal types. This can mean
that when the number of sign bits is calculated it may look that the entire
constant and inefficiently produce less sign bits than it could. For example i8
vectors could use i32 elements, for which 0x000000ff would be incorrectly
limited to 1 sign bit as the original value has 24 sign bits. This makes it
look at the constant directly, truncated to the correct type for the element so
that it can correctly return 8.

Differential Revision: https://reviews.llvm.org/D149956
2023-05-07 22:31:10 +01:00
Simon Pilgrim
17dd1ad14b [X86] lowerShuffleAsElementInsertion - fold to or(vzext_movl(scalar_to_vector(zext(x))), and(constant, mask))
The logic in this function is a bit of a mess, but masking a vector constant should allow us to OR the zero-extended i8/i16 scalar value in place.

We can do more here - reusing the OR pattern if the relevant unused elements are known zero etc. but this is enough to address a regression from D127115.
2023-05-07 20:58:14 +01:00
Simon Pilgrim
d02848b2ab [X86] or-with-overflow.ll - adjust or_i64_ri constant to not constant fold the icmp
Better KnownBits handling of the icmp and/or an upcoming USUBSAT fold would constant fold this test away and prevent us testing for a cleared overflow flag.
2023-05-06 22:42:14 +01:00
Simon Pilgrim
b7116ba8b0 [DAG] computeOverflowForUnsignedAdd - use ConstantRange::unsignedAddMayOverflow as fallback
Replaces the more specific uadd_ov case
2023-05-06 22:03:38 +01:00
Simon Pilgrim
8f82d8ee76 [DAG] visitSUBSAT - fold subsat(x,y) -> sub(x,y) if it never overflows 2023-05-06 15:55:04 +01:00
Simon Pilgrim
05a57fd18c [X86] Add tests showing failure to simplify ssubsat/usubsat to sub 2023-05-06 15:55:04 +01:00
Simon Pilgrim
08c1150d4c [DAG] Add computeOverflowForSignedSub/computeOverflowForUnsignedSub/computeOverflowForSub
Match the addition variants (although computeOverflowForUnsignedSub is really just a placeholder), and use this in DAGCombiner::visitSUBO
2023-05-06 15:55:04 +01:00
Jay Foad
3551e0f345 [RegisterCoalescer] Fix problem with IMPLICIT_DEF live-in to an invoke
Give up on erasing an IMPLICIT_DEF if it might be live-in to a call
instruction in a basic block with EH pad successors. This fixes a
liveness bug that will be diagnosed by MachineVerifer when D149947
lands.

Differential Revision: https://reviews.llvm.org/D149954
2023-05-06 15:16:54 +01:00
Jay Foad
226ff45214 [X86] Generate checks for 2012-01-10-UndefExceptionEdge
Also add -verify-machineinstrs to make it easier to catch a
MachineVerifier failure introduced by D149947.

Differential Revision: https://reviews.llvm.org/D149953
2023-05-06 15:16:54 +01:00
Simon Pilgrim
3fb067f7ba [DAG] visitADDSAT - fold saddsat(x,y) -> add(x,y) if it never overflows
Extend existing uaddsat(x,y) fold
2023-05-06 14:18:23 +01:00
Simon Pilgrim
7395f6ae78 [DAG] Add computeOverflowForSignedAdd and computeOverflowForAdd wrapper
Add basic computeOverflowForSignedAdd helper to recognise that sadd overflow can't occur if both operands have more that one sign bit.

Add computeOverflowForAdd wrapper that calls computeOverflowForSignedAdd/computeOverflowForUnsignedAdd depending on the IsSigned argument, and use this in DAGCombiner::visitADDO
2023-05-06 13:33:14 +01:00
Simon Pilgrim
5e029f0142 [X86] xaluo.ll - add test coverage showing the failure to recognise when saddo/ssubo will not overflow
sadd/ssub with both operands with more than one sign bit will not overflow

Alive2: https://alive2.llvm.org/ce/z/a8HmNp
2023-05-06 13:33:14 +01:00
Simon Pilgrim
d24c179682 [X86] Regenerate xaluo.ll with common CHECK prefix 2023-05-06 13:33:14 +01:00
Fangrui Song
8afd831b45 ms inline asm: recognize case-insensitive JMP and CALL as TargetLowering::C_Address
In a `__asm` block, a symbol reference is usually a memory constraint
(indirect TargetLowering::C_Memory) [LOOP]. CALL and JUMP instructions are special
that `__asm call k` can be an address constraint, if `k` is a function.

Clang always gives us indirect TargetLowering::C_Memory and need to convert it
to direct TargetLowering::C_Address. D133914 implements this conversion, but
does not consider JMP or case-insensitive CALL. This patch implements the missing
cases, so that `__asm jmp k` (`jmp ${0:P}`) will correctly lower to `jmp _k`
instead of `jmp dword ptr [_k]`.

(`__asm call k` lowered to `call dword ptr ${0:P}` and is fixed by D149695 to
lower to `call ${0:P}` instead.)

[LOOP]: Some instructions like LOOP{,E,NE} and Jcc always use an address
constraint (`loop _k` instead of `loop dword ptr [_k]`).

After this patch and D149579, all the following cases will be correct.
```
int k(int);
int (*kptr)(int);
...
__asm call k; // correct without this patch
__asm CALL k; // correct, but needs this patch to be compatible with D149579
__asm jmp k;  // correct, but needs this patch to be compatible with D149579
__asm call kptr; // will be fixed by D149579.  "Broken case" in clang/test/CodeGen/ms-inline-asm-functions.c
__asm jmp kptr;  // will be fixed by this patch and D149579
```

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D149920
2023-05-05 15:32:32 -07:00
Craig Topper
9c377c53da [RISCV] Copy lack-of-signed-truncation-check.ll and signed-truncation-check.ll from AArch6/X86. NFC
This is a more exhaustive set of tests for the same issue D149814
is trying to solve.
2023-05-05 11:15:39 -07:00
David Green
6e7840dd42 [AArch64] Extend fp64 top zeroing peephole to all instructions
D147235 added a fold to remove instructions that zero the upper half of a
register if the instruction already implicitly zeros the register. As far as I
can tell this applies to all instructions that define a FPR64 register in
AArch64. This patch switches to a check for the register class. The full list
of instructions is:
  BSPv8i8
  FMOVD0
  ABSv1i64
  ABSv2i32
  ABSv4i16
  ABSv8i8
  ADDHNv2i64_v2i32
  ADDHNv4i32_v4i16
  ADDHNv8i16_v8i8
  ADDPv2i32
  ADDPv2i64p
  ADDPv4i16
  ADDPv8i8
  ADDv1i64
  ADDv2i32
  ADDv4i16
  ADDv8i8
  ANDv8i8
  BF16DOTlanev4bf16
  BFDOTv4bf16
  BICv2i32
  BICv4i16
  BICv8i8
  BIFv8i8
  BITv8i8
  BSLv8i8
  CLASTA_VPZ_D
  CLASTB_VPZ_D
  CLSv2i32
  CLSv4i16
  CLSv8i8
  CLZv2i32
  CLZv4i16
  CLZv8i8
  CMEQv1i64
  CMEQv1i64rz
  CMEQv2i32
  CMEQv2i32rz
  CMEQv4i16
  CMEQv4i16rz
  CMEQv8i8
  CMEQv8i8rz
  CMGEv1i64
  CMGEv1i64rz
  CMGEv2i32
  CMGEv2i32rz
  CMGEv4i16
  CMGEv4i16rz
  CMGEv8i8
  CMGEv8i8rz
  CMGTv1i64
  CMGTv1i64rz
  CMGTv2i32
  CMGTv2i32rz
  CMGTv4i16
  CMGTv4i16rz
  CMGTv8i8
  CMGTv8i8rz
  CMHIv1i64
  CMHIv2i32
  CMHIv4i16
  CMHIv8i8
  CMHSv1i64
  CMHSv2i32
  CMHSv4i16
  CMHSv8i8
  CMLEv1i64rz
  CMLEv2i32rz
  CMLEv4i16rz
  CMLEv8i8rz
  CMLTv1i64rz
  CMLTv2i32rz
  CMLTv4i16rz
  CMLTv8i8rz
  CMTSTv1i64
  CMTSTv2i32
  CMTSTv4i16
  CMTSTv8i8
  CNTv8i8
  DUPi64
  DUPv2i32gpr
  DUPv2i32lane
  DUPv4i16gpr
  DUPv4i16lane
  DUPv8i8gpr
  DUPv8i8lane
  EORv8i8
  EXTv8i8
  FABD64
  FABDv2f32
  FABDv4f16
  FABSDr
  FABSv2f32
  FABSv4f16
  FACGE64
  FACGEv2f32
  FACGEv4f16
  FACGT64
  FACGTv2f32
  FACGTv4f16
  FADDDrr
  FADDPv2f32
  FADDPv2i64p
  FADDPv4f16
  FADDv2f32
  FADDv4f16
  FCADDv2f32
  FCADDv4f16
  FCMEQ64
  FCMEQv1i64rz
  FCMEQv2f32
  FCMEQv2i32rz
  FCMEQv4f16
  FCMEQv4i16rz
  FCMGE64
  FCMGEv1i64rz
  FCMGEv2f32
  FCMGEv2i32rz
  FCMGEv4f16
  FCMGEv4i16rz
  FCMGT64
  FCMGTv1i64rz
  FCMGTv2f32
  FCMGTv2i32rz
  FCMGTv4f16
  FCMGTv4i16rz
  FCMLAv2f32
  FCMLAv4f16
  FCMLAv4f16_indexed
  FCMLEv1i64rz
  FCMLEv2i32rz
  FCMLEv4i16rz
  FCMLTv1i64rz
  FCMLTv2i32rz
  FCMLTv4i16rz
  FCSELDrrr
  FCVTASv1i64
  FCVTASv2f32
  FCVTASv4f16
  FCVTAUv1i64
  FCVTAUv2f32
  FCVTAUv4f16
  FCVTDHr
  FCVTDSr
  FCVTMSv1i64
  FCVTMSv2f32
  FCVTMSv4f16
  FCVTMUv1i64
  FCVTMUv2f32
  FCVTMUv4f16
  FCVTNSv1i64
  FCVTNSv2f32
  FCVTNSv4f16
  FCVTNUv1i64
  FCVTNUv2f32
  FCVTNUv4f16
  FCVTNv2i32
  FCVTNv4i16
  FCVTPSv1i64
  FCVTPSv2f32
  FCVTPSv4f16
  FCVTPUv1i64
  FCVTPUv2f32
  FCVTPUv4f16
  FCVTXNv2f32
  FCVTZSd
  FCVTZSv1i64
  FCVTZSv2f32
  FCVTZSv2i32_shift
  FCVTZSv4f16
  FCVTZSv4i16_shift
  FCVTZUd
  FCVTZUv1i64
  FCVTZUv2f32
  FCVTZUv2i32_shift
  FCVTZUv4f16
  FCVTZUv4i16_shift
  FDIVDrr
  FDIVv2f32
  FDIVv4f16
  FMADDDrrr
  FMAXDrr
  FMAXNMDrr
  FMAXNMPv2f32
  FMAXNMPv2i64p
  FMAXNMPv4f16
  FMAXNMv2f32
  FMAXNMv4f16
  FMAXPv2f32
  FMAXPv2i64p
  FMAXPv4f16
  FMAXv2f32
  FMAXv4f16
  FMINDrr
  FMINNMDrr
  FMINNMPv2f32
  FMINNMPv2i64p
  FMINNMPv4f16
  FMINNMv2f32
  FMINNMv4f16
  FMINPv2f32
  FMINPv2i64p
  FMINPv4f16
  FMINv2f32
  FMINv4f16
  FMLAL2lanev4f16
  FMLAL2v4f16
  FMLALlanev4f16
  FMLALv4f16
  FMLAv1i64_indexed
  FMLAv2f32
  FMLAv2i32_indexed
  FMLAv4f16
  FMLAv4i16_indexed
  FMLSL2lanev4f16
  FMLSL2v4f16
  FMLSLlanev4f16
  FMLSLv4f16
  FMLSv1i64_indexed
  FMLSv2f32
  FMLSv2i32_indexed
  FMLSv4f16
  FMLSv4i16_indexed
  FMOVDi
  FMOVDr
  FMOVXDr
  FMOVv2f32_ns
  FMOVv4f16_ns
  FMSUBDrrr
  FMULDrr
  FMULX64
  FMULXv1i64_indexed
  FMULXv2f32
  FMULXv2i32_indexed
  FMULXv4f16
  FMULXv4i16_indexed
  FMULv1i64_indexed
  FMULv2f32
  FMULv2i32_indexed
  FMULv4f16
  FMULv4i16_indexed
  FNEGDr
  FNEGv2f32
  FNEGv4f16
  FNMADDDrrr
  FNMSUBDrrr
  FNMULDrr
  FRECPEv1i64
  FRECPEv2f32
  FRECPEv4f16
  FRECPS64
  FRECPSv2f32
  FRECPSv4f16
  FRECPXv1i64
  FRINT32XDr
  FRINT32Xv2f32
  FRINT32ZDr
  FRINT32Zv2f32
  FRINT64XDr
  FRINT64Xv2f32
  FRINT64ZDr
  FRINT64Zv2f32
  FRINTADr
  FRINTAv2f32
  FRINTAv4f16
  FRINTIDr
  FRINTIv2f32
  FRINTIv4f16
  FRINTMDr
  FRINTMv2f32
  FRINTMv4f16
  FRINTNDr
  FRINTNv2f32
  FRINTNv4f16
  FRINTPDr
  FRINTPv2f32
  FRINTPv4f16
  FRINTXDr
  FRINTXv2f32
  FRINTXv4f16
  FRINTZDr
  FRINTZv2f32
  FRINTZv4f16
  FRSQRTEv1i64
  FRSQRTEv2f32
  FRSQRTEv4f16
  FRSQRTS64
  FRSQRTSv2f32
  FRSQRTSv4f16
  FSQRTDr
  FSQRTv2f32
  FSQRTv4f16
  FSUBDrr
  FSUBv2f32
  FSUBv4f16
  LASTA_VPZ_D
  LASTB_VPZ_D
  LD1Onev1d
  LD1Onev2s
  LD1Onev4h
  LD1Onev8b
  LD1Rv1d
  LD1Rv2s
  LD1Rv4h
  LD1Rv8b
  LDAPURdi
  LDNPDi
  LDPDi
  LDRDl
  LDRDroW
  LDRDroX
  LDRDui
  LDURDi
  MLAv2i32
  MLAv2i32_indexed
  MLAv4i16
  MLAv4i16_indexed
  MLAv8i8
  MLSv2i32
  MLSv2i32_indexed
  MLSv4i16
  MLSv4i16_indexed
  MLSv8i8
  MOVID
  MOVIv2i32
  MOVIv2s_msl
  MOVIv4i16
  MOVIv8b_ns
  MULv2i32
  MULv2i32_indexed
  MULv4i16
  MULv4i16_indexed
  MULv8i8
  MVNIv2i32
  MVNIv2s_msl
  MVNIv4i16
  NEGv1i64
  NEGv2i32
  NEGv4i16
  NEGv8i8
  NOTv8i8
  ORNv8i8
  ORRv2i32
  ORRv4i16
  ORRv8i8
  PMULv8i8
  RADDHNv2i64_v2i32
  RADDHNv4i32_v4i16
  RADDHNv8i16_v8i8
  RBITv8i8
  REV16v8i8
  REV32v4i16
  REV32v8i8
  REV64v2i32
  REV64v4i16
  REV64v8i8
  RSHRNv2i32_shift
  RSHRNv4i16_shift
  RSHRNv8i8_shift
  RSUBHNv2i64_v2i32
  RSUBHNv4i32_v4i16
  RSUBHNv8i16_v8i8
  SABAv2i32
  SABAv4i16
  SABAv8i8
  SABDv2i32
  SABDv4i16
  SABDv8i8
  SADALPv2i32_v1i64
  SADALPv4i16_v2i32
  SADALPv8i8_v4i16
  SADDLPv2i32_v1i64
  SADDLPv4i16_v2i32
  SADDLPv8i8_v4i16
  SADDLVv4i32v
  SCVTFSWDri
  SCVTFSXDri
  SCVTFUWDri
  SCVTFUXDri
  SCVTFd
  SCVTFv1i64
  SCVTFv2f32
  SCVTFv2i32_shift
  SCVTFv4f16
  SCVTFv4i16_shift
  SDOTlanev8i8
  SDOTv8i8
  SHADDv2i32
  SHADDv4i16
  SHADDv8i8
  SHLd
  SHLv2i32_shift
  SHLv4i16_shift
  SHLv8i8_shift
  SHRNv2i32_shift
  SHRNv4i16_shift
  SHRNv8i8_shift
  SHSUBv2i32
  SHSUBv4i16
  SHSUBv8i8
  SLId
  SLIv2i32_shift
  SLIv4i16_shift
  SLIv8i8_shift
  SMAXPv2i32
  SMAXPv4i16
  SMAXPv8i8
  SMAXv2i32
  SMAXv4i16
  SMAXv8i8
  SMINPv2i32
  SMINPv4i16
  SMINPv8i8
  SMINv2i32
  SMINv4i16
  SMINv8i8
  SQABSv1i64
  SQABSv2i32
  SQABSv4i16
  SQABSv8i8
  SQADDv1i64
  SQADDv2i32
  SQADDv4i16
  SQADDv8i8
  SQDMLALi32
  SQDMLALv1i64_indexed
  SQDMLSLi32
  SQDMLSLv1i64_indexed
  SQDMULHv2i32
  SQDMULHv2i32_indexed
  SQDMULHv4i16
  SQDMULHv4i16_indexed
  SQDMULLi32
  SQDMULLv1i64_indexed
  SQNEGv1i64
  SQNEGv2i32
  SQNEGv4i16
  SQNEGv8i8
  SQRDMLAHv2i32
  SQRDMLAHv2i32_indexed
  SQRDMLAHv4i16
  SQRDMLAHv4i16_indexed
  SQRDMLSHv2i32
  SQRDMLSHv2i32_indexed
  SQRDMLSHv4i16
  SQRDMLSHv4i16_indexed
  SQRDMULHv2i32
  SQRDMULHv2i32_indexed
  SQRDMULHv4i16
  SQRDMULHv4i16_indexed
  SQRSHLv1i64
  SQRSHLv2i32
  SQRSHLv4i16
  SQRSHLv8i8
  SQRSHRNv2i32_shift
  SQRSHRNv4i16_shift
  SQRSHRNv8i8_shift
  SQRSHRUNv2i32_shift
  SQRSHRUNv4i16_shift
  SQRSHRUNv8i8_shift
  SQSHLUd
  SQSHLUv2i32_shift
  SQSHLUv4i16_shift
  SQSHLUv8i8_shift
  SQSHLd
  SQSHLv1i64
  SQSHLv2i32
  SQSHLv2i32_shift
  SQSHLv4i16
  SQSHLv4i16_shift
  SQSHLv8i8
  SQSHLv8i8_shift
  SQSHRNv2i32_shift
  SQSHRNv4i16_shift
  SQSHRNv8i8_shift
  SQSHRUNv2i32_shift
  SQSHRUNv4i16_shift
  SQSHRUNv8i8_shift
  SQSUBv1i64
  SQSUBv2i32
  SQSUBv4i16
  SQSUBv8i8
  SQXTNv2i32
  SQXTNv4i16
  SQXTNv8i8
  SQXTUNv2i32
  SQXTUNv4i16
  SQXTUNv8i8
  SRHADDv2i32
  SRHADDv4i16
  SRHADDv8i8
  SRId
  SRIv2i32_shift
  SRIv4i16_shift
  SRIv8i8_shift
  SRSHLv1i64
  SRSHLv2i32
  SRSHLv4i16
  SRSHLv8i8
  SRSHRd
  SRSHRv2i32_shift
  SRSHRv4i16_shift
  SRSHRv8i8_shift
  SRSRAd
  SRSRAv2i32_shift
  SRSRAv4i16_shift
  SRSRAv8i8_shift
  SSHLv1i64
  SSHLv2i32
  SSHLv4i16
  SSHLv8i8
  SSHRd
  SSHRv2i32_shift
  SSHRv4i16_shift
  SSHRv8i8_shift
  SSRAd
  SSRAv2i32_shift
  SSRAv4i16_shift
  SSRAv8i8_shift
  SUBHNv2i64_v2i32
  SUBHNv4i32_v4i16
  SUBHNv8i16_v8i8
  SUBv1i64
  SUBv2i32
  SUBv4i16
  SUBv8i8
  SUDOTlanev8i8
  SUQADDv1i64
  SUQADDv2i32
  SUQADDv4i16
  SUQADDv8i8
  TBLv8i8Four
  TBLv8i8One
  TBLv8i8Three
  TBLv8i8Two
  TBXv8i8Four
  TBXv8i8One
  TBXv8i8Three
  TBXv8i8Two
  TRN1v2i32
  TRN1v4i16
  TRN1v8i8
  TRN2v2i32
  TRN2v4i16
  TRN2v8i8
  UABAv2i32
  UABAv4i16
  UABAv8i8
  UABDv2i32
  UABDv4i16
  UABDv8i8
  UADALPv2i32_v1i64
  UADALPv4i16_v2i32
  UADALPv8i8_v4i16
  UADDLPv2i32_v1i64
  UADDLPv4i16_v2i32
  UADDLPv8i8_v4i16
  UADDLVv4i32v
  UCVTFSWDri
  UCVTFSXDri
  UCVTFUWDri
  UCVTFUXDri
  UCVTFd
  UCVTFv1i64
  UCVTFv2f32
  UCVTFv2i32_shift
  UCVTFv4f16
  UCVTFv4i16_shift
  UDOTlanev8i8
  UDOTv8i8
  UHADDv2i32
  UHADDv4i16
  UHADDv8i8
  UHSUBv2i32
  UHSUBv4i16
  UHSUBv8i8
  UMAXPv2i32
  UMAXPv4i16
  UMAXPv8i8
  UMAXv2i32
  UMAXv4i16
  UMAXv8i8
  UMINPv2i32
  UMINPv4i16
  UMINPv8i8
  UMINv2i32
  UMINv4i16
  UMINv8i8
  UQADDv1i64
  UQADDv2i32
  UQADDv4i16
  UQADDv8i8
  UQRSHLv1i64
  UQRSHLv2i32
  UQRSHLv4i16
  UQRSHLv8i8
  UQRSHRNv2i32_shift
  UQRSHRNv4i16_shift
  UQRSHRNv8i8_shift
  UQSHLd
  UQSHLv1i64
  UQSHLv2i32
  UQSHLv2i32_shift
  UQSHLv4i16
  UQSHLv4i16_shift
  UQSHLv8i8
  UQSHLv8i8_shift
  UQSHRNv2i32_shift
  UQSHRNv4i16_shift
  UQSHRNv8i8_shift
  UQSUBv1i64
  UQSUBv2i32
  UQSUBv4i16
  UQSUBv8i8
  UQXTNv2i32
  UQXTNv4i16
  UQXTNv8i8
  URECPEv2i32
  URHADDv2i32
  URHADDv4i16
  URHADDv8i8
  URSHLv1i64
  URSHLv2i32
  URSHLv4i16
  URSHLv8i8
  URSHRd
  URSHRv2i32_shift
  URSHRv4i16_shift
  URSHRv8i8_shift
  URSQRTEv2i32
  URSRAd
  URSRAv2i32_shift
  URSRAv4i16_shift
  URSRAv8i8_shift
  USDOTlanev8i8
  USDOTv8i8
  USHLv1i64
  USHLv2i32
  USHLv4i16
  USHLv8i8
  USHRd
  USHRv2i32_shift
  USHRv4i16_shift
  USHRv8i8_shift
  USQADDv1i64
  USQADDv2i32
  USQADDv4i16
  USQADDv8i8
  USRAd
  USRAv2i32_shift
  USRAv4i16_shift
  USRAv8i8_shift
  UZP1v2i32
  UZP1v4i16
  UZP1v8i8
  UZP2v2i32
  UZP2v4i16
  UZP2v8i8
  XTNv2i32
  XTNv4i16
  XTNv8i8
  ZIP1v2i32
  ZIP1v4i16
  ZIP1v8i8
  ZIP2v2i32
  ZIP2v4i16
  ZIP2v8i8
2023-05-05 17:26:53 +01:00
Ronak Chauhan
5f0b92e580 [AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149332
2023-05-05 21:12:10 +05:30