llvm-project

Author	SHA1	Message	Date
Philip Reames	33314693f5	Revert "[RISCV][InsertVSETVLI] Avoid VL toggles for extractelement patterns" This reverts commit 657d20dc75252f0c8415ada5214affccc3c98efe. A correctness problem was reported against the review and the fix warrants re-review.	2023-05-10 10:58:46 -07:00
Konstantin Zhuravlyov	9d05727972	AMDGPU: Add basic gfx942 target Differential Revision: https://reviews.llvm.org/D149983	2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov	1fc70210a6	AMDGPU: Add basic gfx941 target Differential Revision: https://reviews.llvm.org/D149982	2023-05-10 11:51:06 -04:00
Yingwei Zheng	e5532fb493	[RISCV] Enable signed truncation check transforms for i8 This patch enables signed truncation check transforms for i8 on rv32 when XVT is i64 and Zbb is enabled. It is a small improvement of D149977. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D150177	2023-05-10 22:53:24 +08:00
Matt Devereau	004bf170c6	[AArch64] Emit FNMADD instead of FNEG(FMADD) Emit FNMADD instead of FNEG(FMADD) for optimization levels above Oz when fast-math flags (nsz+contract) permit it. Differential Revision: https://reviews.llvm.org/D149260	2023-05-10 12:45:54 +00:00
Jonas Paulsson	655f0fc4b9	Reapply "[SystemZ] Bugfix in expansion of memmem operations." The new test case showed that the NoPHIs flag needs to be cleared. Original commit message: [SystemZ] Bugfix in expansion of memmem operations. Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be marked to do so. Original patch by uweigand. Reviewed by: uweigand Differential Revision: https://reviews.llvm.org/D150251 Fixes: https://github.com/llvm/llvm-project/issues/62572	2023-05-10 12:40:57 +02:00
Jonas Paulsson	dfa42a69b8	Revert "[SystemZ] Bugfix in expansion of memmem operations." Sorry - mir test fails with expensive checks on build bot. Seems to relate to the fact that there are no PHIs in the .mir input, but after they are created the verifier reports "Found PHI instruction with NoPHIs property set". This reverts commit 00454a17f361d677d5423905c888daca1a80661a.	2023-05-10 11:34:55 +02:00
Jonas Paulsson	00454a17f3	[SystemZ] Bugfix in expansion of memmem operations. Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be marked to do so. Original patch by uweigand. Reviewed by: uweigand Differential Revision: https://reviews.llvm.org/D150251 Fixes: https://github.com/llvm/llvm-project/issues/62572	2023-05-10 11:05:13 +02:00
Matt Arsenault	62eac3e068	GlobalISel: Fold out G_FPTRUNC(G_FPEXT)	2023-05-10 08:01:27 +01:00
Serguei Katkov	92663cd464	[X86] Add test cases for fminimum/fmaximum with vector zero operands.	2023-05-10 11:54:51 +07:00
Chen Zheng	fb45493562	[DebugLine] save one debug line entry for empty prologue Reland D147506 after fixing the failure in bot https://lab.llvm.org/buildbot/#/builders/247/builds/4125 Some debuggers like DBX on AIX assume the address in debug line entries is always incremental. But clang generates two entries (entry for file scope line and entry for prologue end) with the same address if the prologue is empty. And if the prologue is empty, it seems the first debug line entry for the function is unnecessary (i.e. removing the first entry won't impact the behavior in GDB on Linux), so I implement this for all debuggers. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D147506	2023-05-10 01:21:02 +00:00
Aaron Ballman	f08b94d5c2	Further amend 945f6e65be0d20b3446e7c1537c64151de618af4 This addresses the ARM issue found by: https://lab.llvm.org/buildbot/#/builders/109/builds/63726 (This test wouldn't run for me locally, hence missing it in the last fix.)	2023-05-09 16:16:07 -04:00
Aaron Ballman	b97527f144	Fix test failure from 945f6e65be0d20b3446e7c1537c64151de618af4 It seems we were testing the behavior of the debug messages!	2023-05-09 16:09:55 -04:00
Sami Tolvanen	e9569748de	[CodeGen][KCFI] Move cfi-type lowering to TargetLowering KCFI machine function passes transform indirect calls with a cfi-type attribute into architecture-specific type checks bundled together with the calls. Instead of having a separate pass for each architecture, add a generic machine function pass for KCFI and move the architecture-specific code that emits the actual check to TargetLowering. This avoids unnecessary duplication and makes it easier to add KCFI support to other architectures. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D149915	2023-05-09 18:38:54 +00:00
Thomas Symalla	60a4cb7076	[NFC][AMDGPU] Add option to test.	2023-05-09 17:24:00 +02:00
Thomas Symalla	80f442e1ed	[NFC][AMDGPU] Pre-commit test. Pre-commit fold-fabs.ll.	2023-05-09 14:53:44 +02:00
Tom Dohrmann	f6154364f6	fix stack probe lowering for x86_intrcc The x86_intrcc calling convention will build two STACKALLOC_W_PROBING machine instructions if the function takes an error code. This is caused by an additional call to emitSPUpdate in llvm/lib/Target/X86/X86FrameLowering.cpp:1650. Previously only the first STACKALLOC_W_PROBING machine instruction was properly handled, the second one was simply ignored. This led to miscompilations where the stack pointer wasn't properly updated (see https://github.com/rust-lang/rust/issues/109918). This patch fixes this by handling all STACKALLOC_W_PROBING machine instructions. To be honest I don't quite understand why this didn't lead to more noticeable miscompilations previously. This is my first time contributing to LLVM. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D150033	2023-05-09 16:31:42 +08:00
pvanhout	0d1864a754	[AMDGPU] Recompute liveness in SIOptimizeExecMaskingPreRA Instead of ad-hoc updating liveness, recompute it completely for the affected register. This does not affect any existing test and fixes an edge case that caused a "Non-empty but used interval" error in the register allocator due to how the pass updated liveranges. It created a "isolated" live-through segment. Overall this change just seems to be a net positive with no side effect observed. There may be a compile time impact but it's expected to be minimal. Fixes SWDEV-388279 Reviewed By: critson Differential Revision: https://reviews.llvm.org/D150105	2023-05-09 09:02:22 +02:00
Weining Lu	161716a713	[LoongArch] Support fcc* (condition flag) registers in inlineasm clobbers Differential Revision: https://reviews.llvm.org/D150089	2023-05-09 14:55:50 +08:00
Amara Emerson	e1472db58e	[GlobalISel] Implement commuting shl (add/or x, c1), c2 -> add/or (shl x, c2), c1 << c2 There's a target hook that's called in DAGCombiner that we stub here; I'll implement the equivalent override for AArch64 in a subsequent patch since it's used by a different shift combine. This change by itself has minor code size improvements on arm64 -Os CTMark (columns: Program, size.__text outputg181ppyy, size.__text output8av1cxfn, diff): consumer-typeset/consumer-typeset 410648.00 410648.00 0.0%; tramp3d-v4/tramp3d-v4 364176.00 364176.00 0.0%; kimwitu++/kc 449216.00 449212.00 -0.0%; 7zip/7zip-benchmark 576128.00 576120.00 -0.0%; sqlite3/sqlite3 285108.00 285100.00 -0.0%; SPASS/SPASS 411720.00 411688.00 -0.0%; ClamAV/clamscan 379868.00 379764.00 -0.0%; Bullet/bullet 452064.00 451928.00 -0.0%; mafft/pairlocalalign 246184.00 246108.00 -0.0%; lencod/lencod 428524.00 428152.00 -0.1%; Geomean difference -0.0%. Differential Revision: https://reviews.llvm.org/D150086	2023-05-08 22:37:43 -07:00
Alan Zhao	f4999d3535	Revert "[CodeGen][ShrinkWrap] Split restore point" This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4. The original commit causes a Chrome build assertion failure with ThinLTO: https://crbug.com/1443635	2023-05-08 16:27:59 -07:00
Craig Topper	6b429a9cf9	[RISCV] Improve RV64 codegen for i32 ISD::SADDO when RHS is constant. This uses the same sequence we get from LegalizeDAG for i32 on RV32, but modified to use W instructions. When the RHS is constant one of the setccs simplifies to a constant and the xor will either be an xori with 1 or get removed. When the RHS is not a constant it was not an obvious improvement and it was a regression when used with a branch. So I've restricted to the constant case. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D150135	2023-05-08 13:02:19 -07:00
Stefan Pintilie	be95b4dec2	[PowerPC] Look through OR, AND, XOR instructions when checking a clear. This patch adds the additional step of looking through AND, OR, XOR instructions when we check the number of leading zeros. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D149223	2023-05-08 14:25:20 -04:00
Craig Topper	62e6082dcc	[RISCV] Implement shouldTransformSignedTruncationCheck. This helps avoid constant materialization for the patterns InstCombine emits for something like INT_MIN <= X && X <= INT_MAX. See top of changed test files for more detailed explanation. I've enabled this for i16 when Zbb is enabled. sext.b did not seem to be a benefit due to the constants folding into addi/sltiu. This is an alternative to https://reviews.llvm.org/D149814 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D149977	2023-05-08 09:52:52 -07:00
Alvin Wong	6b996282ce	[X86][CodeGen] Do not add `offset` for memory reference using symbol In the past, D71436 added writing the `offset` operator for some legitimate cases. However, for memory references in Intel syntax, the `offset` operator (`[offset sym]`) appears to be superfluous at best, possibly wrong and contradictory at worst. This patch bypasses writing the `offset` operator in `X86AsmPrinter::PrintIntelMemReference` which affects exactly this case. A similar code flow exists in `X86IntelInstPrinter.cpp` - `X86IntelInstPrinter::printMemReference`. The motivation for fixing this output is to allow us to reject the confusing `call [offset fn_ref]` syntax in MC, as discussed in D149579. Depends on D149579 Differential Revision: https://reviews.llvm.org/D150047	2023-05-09 00:07:40 +08:00
Simon Pilgrim	d53b76e3dd	[X86] vector-reduce-or-bool.ll - add common AVX prefix Share with AVX512 to reduce duplication	2023-05-08 15:30:41 +01:00
Simon Pilgrim	55f8083f7b	[X86] avx512-insert-extract.ll - add nounwind to silence cfi noise	2023-05-08 14:13:25 +01:00
sgokhale	1ddfd1c818	[CodeGen][ShrinkWrap] Split restore point Try to reland D42600 Differential Revision: https://reviews.llvm.org/D42600	2023-05-08 13:21:07 +05:30
sgokhale	7cba800104	[CodeGen] Autogen tests as prerequisite for D42600 Autogenerating tests as suggested in D42600	2023-05-08 12:25:51 +05:30
Noah Goldstein	3e998ede64	[X86] Lower used `(atomicrmw xor p, SignBit)` as `(atomicrmw add p, SignBit)` `(xor X, SignBit)` == `(add X, SignBit)`. For atomics whose result is used, the `add` option is preferable because of the `xadd` instruction which allows us to avoid either a CAS loop or a `btc; setcc; shl`. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D149689	2023-05-07 19:11:53 -05:00
Noah Goldstein	3886985217	[X86] Add tests for `(atomicrmw xor p, Imm)`; NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D149687	2023-05-07 19:11:52 -05:00
Manoj Gupta	4157625cea	Revert "[AArch64] Emit FNMADD instead of FNEG(FMADD)" This reverts commit ea228bd0bd0173ffd4aac497a312a852e8f7ffad. Causes a crash on AArch64. Testcase provided at D149260.	2023-05-07 16:38:08 -07:00
Amara Emerson	345e419483	[NFC][AArch64][GlobalISel] Add a gisel run line for shift-logic.ll	2023-05-07 15:50:59 -07:00
David Green	b774f14841	[DAG] Calculate the number of sign bits for constant BUILD_VECTOR directly. For constant BUILD_VECTORs the operands need to be legal types. This can mean that when the number of sign bits is calculated it may look at the entire constant and inefficiently produce fewer sign bits than it could. For example i8 vectors could use i32 elements, for which 0x000000ff would be incorrectly limited to 1 sign bit as the original value has 24 sign bits. This makes it look at the constant directly, truncated to the correct type for the element so that it can correctly return 8. Differential Revision: https://reviews.llvm.org/D149956	2023-05-07 22:31:10 +01:00
Simon Pilgrim	17dd1ad14b	[X86] lowerShuffleAsElementInsertion - fold to or(vzext_movl(scalar_to_vector(zext(x))), and(constant, mask)) The logic in this function is a bit of a mess, but masking a vector constant should allow us to OR the zero-extended i8/i16 scalar value in place. We can do more here - reusing the OR pattern if the relevant unused elements are known zero etc. but this is enough to address a regression from D127115.	2023-05-07 20:58:14 +01:00
Simon Pilgrim	d02848b2ab	[X86] or-with-overflow.ll - adjust or_i64_ri constant to not constant fold the icmp Better KnownBits handling of the icmp and/or an upcoming USUBSAT fold would constant fold this test away and prevent us testing for a cleared overflow flag.	2023-05-06 22:42:14 +01:00
Simon Pilgrim	b7116ba8b0	[DAG] computeOverflowForUnsignedAdd - use ConstantRange::unsignedAddMayOverflow as fallback Replaces the more specific uadd_ov case	2023-05-06 22:03:38 +01:00
Simon Pilgrim	8f82d8ee76	[DAG] visitSUBSAT - fold subsat(x,y) -> sub(x,y) if it never overflows	2023-05-06 15:55:04 +01:00
Simon Pilgrim	05a57fd18c	[X86] Add tests showing failure to simplify ssubsat/usubsat to sub	2023-05-06 15:55:04 +01:00
Simon Pilgrim	08c1150d4c	[DAG] Add computeOverflowForSignedSub/computeOverflowForUnsignedSub/computeOverflowForSub Match the addition variants (although computeOverflowForUnsignedSub is really just a placeholder), and use this in DAGCombiner::visitSUBO	2023-05-06 15:55:04 +01:00
Jay Foad	3551e0f345	[RegisterCoalescer] Fix problem with IMPLICIT_DEF live-in to an invoke Give up on erasing an IMPLICIT_DEF if it might be live-in to a call instruction in a basic block with EH pad successors. This fixes a liveness bug that will be diagnosed by MachineVerifier when D149947 lands. Differential Revision: https://reviews.llvm.org/D149954	2023-05-06 15:16:54 +01:00
Jay Foad	226ff45214	[X86] Generate checks for 2012-01-10-UndefExceptionEdge Also add -verify-machineinstrs to make it easier to catch a MachineVerifier failure introduced by D149947. Differential Revision: https://reviews.llvm.org/D149953	2023-05-06 15:16:54 +01:00
Simon Pilgrim	3fb067f7ba	[DAG] visitADDSAT - fold saddsat(x,y) -> add(x,y) if it never overflows Extend existing uaddsat(x,y) fold	2023-05-06 14:18:23 +01:00
Simon Pilgrim	7395f6ae78	[DAG] Add computeOverflowForSignedAdd and computeOverflowForAdd wrapper Add basic computeOverflowForSignedAdd helper to recognise that sadd overflow can't occur if both operands have more than one sign bit. Add computeOverflowForAdd wrapper that calls computeOverflowForSignedAdd/computeOverflowForUnsignedAdd depending on the IsSigned argument, and use this in DAGCombiner::visitADDO	2023-05-06 13:33:14 +01:00
Simon Pilgrim	5e029f0142	[X86] xaluo.ll - add test coverage showing the failure to recognise when saddo/ssubo will not overflow sadd/ssub with both operands with more than one sign bit will not overflow Alive2: https://alive2.llvm.org/ce/z/a8HmNp	2023-05-06 13:33:14 +01:00
Simon Pilgrim	d24c179682	[X86] Regenerate xaluo.ll with common CHECK prefix	2023-05-06 13:33:14 +01:00
Fangrui Song	8afd831b45	ms inline asm: recognize case-insensitive JMP and CALL as TargetLowering::C_Address In a `__asm` block, a symbol reference is usually a memory constraint (indirect TargetLowering::C_Memory) [LOOP]. CALL and JMP instructions are special in that `__asm call k` can be an address constraint if `k` is a function. Clang always gives us indirect TargetLowering::C_Memory, and we need to convert it to direct TargetLowering::C_Address. D133914 implements this conversion, but does not consider JMP or case-insensitive CALL. This patch implements the missing cases, so that `__asm jmp k` (`jmp ${0:P}`) will correctly lower to `jmp _k` instead of `jmp dword ptr [_k]`. (`__asm call k` lowered to `call dword ptr ${0:P}` and is fixed by D149695 to lower to `call ${0:P}` instead.) [LOOP]: Some instructions like LOOP{,E,NE} and Jcc always use an address constraint (`loop _k` instead of `loop dword ptr [_k]`). After this patch and D149579, all the following cases will be correct. ``` int k(int); int (*kptr)(int); ... __asm call k; // correct without this patch __asm CALL k; // correct, but needs this patch to be compatible with D149579 __asm jmp k; // correct, but needs this patch to be compatible with D149579 __asm call kptr; // will be fixed by D149579. "Broken case" in clang/test/CodeGen/ms-inline-asm-functions.c __asm jmp kptr; // will be fixed by this patch and D149579 ``` Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D149920	2023-05-05 15:32:32 -07:00
Craig Topper	9c377c53da	[RISCV] Copy lack-of-signed-truncation-check.ll and signed-truncation-check.ll from AArch64/X86. NFC This is a more exhaustive set of tests for the same issue D149814 is trying to solve.	2023-05-05 11:15:39 -07:00
David Green	6e7840dd42	[AArch64] Extend fp64 top zeroing peephole to all instructions D147235 added a fold to remove instructions that zero the upper half of a register if the instruction already implicitly zeros the register. As far as I can tell this applies to all instructions that define a FPR64 register in AArch64. This patch switches to a check for the register class. The full list of instructions is: BSPv8i8 FMOVD0 ABSv1i64 ABSv2i32 ABSv4i16 ABSv8i8 ADDHNv2i64_v2i32 ADDHNv4i32_v4i16 ADDHNv8i16_v8i8 ADDPv2i32 ADDPv2i64p ADDPv4i16 ADDPv8i8 ADDv1i64 ADDv2i32 ADDv4i16 ADDv8i8 ANDv8i8 BF16DOTlanev4bf16 BFDOTv4bf16 BICv2i32 BICv4i16 BICv8i8 BIFv8i8 BITv8i8 BSLv8i8 CLASTA_VPZ_D CLASTB_VPZ_D CLSv2i32 CLSv4i16 CLSv8i8 CLZv2i32 CLZv4i16 CLZv8i8 CMEQv1i64 CMEQv1i64rz CMEQv2i32 CMEQv2i32rz CMEQv4i16 CMEQv4i16rz CMEQv8i8 CMEQv8i8rz CMGEv1i64 CMGEv1i64rz CMGEv2i32 CMGEv2i32rz CMGEv4i16 CMGEv4i16rz CMGEv8i8 CMGEv8i8rz CMGTv1i64 CMGTv1i64rz CMGTv2i32 CMGTv2i32rz CMGTv4i16 CMGTv4i16rz CMGTv8i8 CMGTv8i8rz CMHIv1i64 CMHIv2i32 CMHIv4i16 CMHIv8i8 CMHSv1i64 CMHSv2i32 CMHSv4i16 CMHSv8i8 CMLEv1i64rz CMLEv2i32rz CMLEv4i16rz CMLEv8i8rz CMLTv1i64rz CMLTv2i32rz CMLTv4i16rz CMLTv8i8rz CMTSTv1i64 CMTSTv2i32 CMTSTv4i16 CMTSTv8i8 CNTv8i8 DUPi64 DUPv2i32gpr DUPv2i32lane DUPv4i16gpr DUPv4i16lane DUPv8i8gpr DUPv8i8lane EORv8i8 EXTv8i8 FABD64 FABDv2f32 FABDv4f16 FABSDr FABSv2f32 FABSv4f16 FACGE64 FACGEv2f32 FACGEv4f16 FACGT64 FACGTv2f32 FACGTv4f16 FADDDrr FADDPv2f32 FADDPv2i64p FADDPv4f16 FADDv2f32 FADDv4f16 FCADDv2f32 FCADDv4f16 FCMEQ64 FCMEQv1i64rz FCMEQv2f32 FCMEQv2i32rz FCMEQv4f16 FCMEQv4i16rz FCMGE64 FCMGEv1i64rz FCMGEv2f32 FCMGEv2i32rz FCMGEv4f16 FCMGEv4i16rz FCMGT64 FCMGTv1i64rz FCMGTv2f32 FCMGTv2i32rz FCMGTv4f16 FCMGTv4i16rz FCMLAv2f32 FCMLAv4f16 FCMLAv4f16_indexed FCMLEv1i64rz FCMLEv2i32rz FCMLEv4i16rz FCMLTv1i64rz FCMLTv2i32rz FCMLTv4i16rz FCSELDrrr FCVTASv1i64 FCVTASv2f32 FCVTASv4f16 FCVTAUv1i64 FCVTAUv2f32 FCVTAUv4f16 FCVTDHr FCVTDSr FCVTMSv1i64 FCVTMSv2f32 FCVTMSv4f16 FCVTMUv1i64 
FCVTMUv2f32 FCVTMUv4f16 FCVTNSv1i64 FCVTNSv2f32 FCVTNSv4f16 FCVTNUv1i64 FCVTNUv2f32 FCVTNUv4f16 FCVTNv2i32 FCVTNv4i16 FCVTPSv1i64 FCVTPSv2f32 FCVTPSv4f16 FCVTPUv1i64 FCVTPUv2f32 FCVTPUv4f16 FCVTXNv2f32 FCVTZSd FCVTZSv1i64 FCVTZSv2f32 FCVTZSv2i32_shift FCVTZSv4f16 FCVTZSv4i16_shift FCVTZUd FCVTZUv1i64 FCVTZUv2f32 FCVTZUv2i32_shift FCVTZUv4f16 FCVTZUv4i16_shift FDIVDrr FDIVv2f32 FDIVv4f16 FMADDDrrr FMAXDrr FMAXNMDrr FMAXNMPv2f32 FMAXNMPv2i64p FMAXNMPv4f16 FMAXNMv2f32 FMAXNMv4f16 FMAXPv2f32 FMAXPv2i64p FMAXPv4f16 FMAXv2f32 FMAXv4f16 FMINDrr FMINNMDrr FMINNMPv2f32 FMINNMPv2i64p FMINNMPv4f16 FMINNMv2f32 FMINNMv4f16 FMINPv2f32 FMINPv2i64p FMINPv4f16 FMINv2f32 FMINv4f16 FMLAL2lanev4f16 FMLAL2v4f16 FMLALlanev4f16 FMLALv4f16 FMLAv1i64_indexed FMLAv2f32 FMLAv2i32_indexed FMLAv4f16 FMLAv4i16_indexed FMLSL2lanev4f16 FMLSL2v4f16 FMLSLlanev4f16 FMLSLv4f16 FMLSv1i64_indexed FMLSv2f32 FMLSv2i32_indexed FMLSv4f16 FMLSv4i16_indexed FMOVDi FMOVDr FMOVXDr FMOVv2f32_ns FMOVv4f16_ns FMSUBDrrr FMULDrr FMULX64 FMULXv1i64_indexed FMULXv2f32 FMULXv2i32_indexed FMULXv4f16 FMULXv4i16_indexed FMULv1i64_indexed FMULv2f32 FMULv2i32_indexed FMULv4f16 FMULv4i16_indexed FNEGDr FNEGv2f32 FNEGv4f16 FNMADDDrrr FNMSUBDrrr FNMULDrr FRECPEv1i64 FRECPEv2f32 FRECPEv4f16 FRECPS64 FRECPSv2f32 FRECPSv4f16 FRECPXv1i64 FRINT32XDr FRINT32Xv2f32 FRINT32ZDr FRINT32Zv2f32 FRINT64XDr FRINT64Xv2f32 FRINT64ZDr FRINT64Zv2f32 FRINTADr FRINTAv2f32 FRINTAv4f16 FRINTIDr FRINTIv2f32 FRINTIv4f16 FRINTMDr FRINTMv2f32 FRINTMv4f16 FRINTNDr FRINTNv2f32 FRINTNv4f16 FRINTPDr FRINTPv2f32 FRINTPv4f16 FRINTXDr FRINTXv2f32 FRINTXv4f16 FRINTZDr FRINTZv2f32 FRINTZv4f16 FRSQRTEv1i64 FRSQRTEv2f32 FRSQRTEv4f16 FRSQRTS64 FRSQRTSv2f32 FRSQRTSv4f16 FSQRTDr FSQRTv2f32 FSQRTv4f16 FSUBDrr FSUBv2f32 FSUBv4f16 LASTA_VPZ_D LASTB_VPZ_D LD1Onev1d LD1Onev2s LD1Onev4h LD1Onev8b LD1Rv1d LD1Rv2s LD1Rv4h LD1Rv8b LDAPURdi LDNPDi LDPDi LDRDl LDRDroW LDRDroX LDRDui LDURDi MLAv2i32 MLAv2i32_indexed MLAv4i16 MLAv4i16_indexed MLAv8i8 MLSv2i32 MLSv2i32_indexed 
MLSv4i16 MLSv4i16_indexed MLSv8i8 MOVID MOVIv2i32 MOVIv2s_msl MOVIv4i16 MOVIv8b_ns MULv2i32 MULv2i32_indexed MULv4i16 MULv4i16_indexed MULv8i8 MVNIv2i32 MVNIv2s_msl MVNIv4i16 NEGv1i64 NEGv2i32 NEGv4i16 NEGv8i8 NOTv8i8 ORNv8i8 ORRv2i32 ORRv4i16 ORRv8i8 PMULv8i8 RADDHNv2i64_v2i32 RADDHNv4i32_v4i16 RADDHNv8i16_v8i8 RBITv8i8 REV16v8i8 REV32v4i16 REV32v8i8 REV64v2i32 REV64v4i16 REV64v8i8 RSHRNv2i32_shift RSHRNv4i16_shift RSHRNv8i8_shift RSUBHNv2i64_v2i32 RSUBHNv4i32_v4i16 RSUBHNv8i16_v8i8 SABAv2i32 SABAv4i16 SABAv8i8 SABDv2i32 SABDv4i16 SABDv8i8 SADALPv2i32_v1i64 SADALPv4i16_v2i32 SADALPv8i8_v4i16 SADDLPv2i32_v1i64 SADDLPv4i16_v2i32 SADDLPv8i8_v4i16 SADDLVv4i32v SCVTFSWDri SCVTFSXDri SCVTFUWDri SCVTFUXDri SCVTFd SCVTFv1i64 SCVTFv2f32 SCVTFv2i32_shift SCVTFv4f16 SCVTFv4i16_shift SDOTlanev8i8 SDOTv8i8 SHADDv2i32 SHADDv4i16 SHADDv8i8 SHLd SHLv2i32_shift SHLv4i16_shift SHLv8i8_shift SHRNv2i32_shift SHRNv4i16_shift SHRNv8i8_shift SHSUBv2i32 SHSUBv4i16 SHSUBv8i8 SLId SLIv2i32_shift SLIv4i16_shift SLIv8i8_shift SMAXPv2i32 SMAXPv4i16 SMAXPv8i8 SMAXv2i32 SMAXv4i16 SMAXv8i8 SMINPv2i32 SMINPv4i16 SMINPv8i8 SMINv2i32 SMINv4i16 SMINv8i8 SQABSv1i64 SQABSv2i32 SQABSv4i16 SQABSv8i8 SQADDv1i64 SQADDv2i32 SQADDv4i16 SQADDv8i8 SQDMLALi32 SQDMLALv1i64_indexed SQDMLSLi32 SQDMLSLv1i64_indexed SQDMULHv2i32 SQDMULHv2i32_indexed SQDMULHv4i16 SQDMULHv4i16_indexed SQDMULLi32 SQDMULLv1i64_indexed SQNEGv1i64 SQNEGv2i32 SQNEGv4i16 SQNEGv8i8 SQRDMLAHv2i32 SQRDMLAHv2i32_indexed SQRDMLAHv4i16 SQRDMLAHv4i16_indexed SQRDMLSHv2i32 SQRDMLSHv2i32_indexed SQRDMLSHv4i16 SQRDMLSHv4i16_indexed SQRDMULHv2i32 SQRDMULHv2i32_indexed SQRDMULHv4i16 SQRDMULHv4i16_indexed SQRSHLv1i64 SQRSHLv2i32 SQRSHLv4i16 SQRSHLv8i8 SQRSHRNv2i32_shift SQRSHRNv4i16_shift SQRSHRNv8i8_shift SQRSHRUNv2i32_shift SQRSHRUNv4i16_shift SQRSHRUNv8i8_shift SQSHLUd SQSHLUv2i32_shift SQSHLUv4i16_shift SQSHLUv8i8_shift SQSHLd SQSHLv1i64 SQSHLv2i32 SQSHLv2i32_shift SQSHLv4i16 SQSHLv4i16_shift SQSHLv8i8 SQSHLv8i8_shift SQSHRNv2i32_shift 
SQSHRNv4i16_shift SQSHRNv8i8_shift SQSHRUNv2i32_shift SQSHRUNv4i16_shift SQSHRUNv8i8_shift SQSUBv1i64 SQSUBv2i32 SQSUBv4i16 SQSUBv8i8 SQXTNv2i32 SQXTNv4i16 SQXTNv8i8 SQXTUNv2i32 SQXTUNv4i16 SQXTUNv8i8 SRHADDv2i32 SRHADDv4i16 SRHADDv8i8 SRId SRIv2i32_shift SRIv4i16_shift SRIv8i8_shift SRSHLv1i64 SRSHLv2i32 SRSHLv4i16 SRSHLv8i8 SRSHRd SRSHRv2i32_shift SRSHRv4i16_shift SRSHRv8i8_shift SRSRAd SRSRAv2i32_shift SRSRAv4i16_shift SRSRAv8i8_shift SSHLv1i64 SSHLv2i32 SSHLv4i16 SSHLv8i8 SSHRd SSHRv2i32_shift SSHRv4i16_shift SSHRv8i8_shift SSRAd SSRAv2i32_shift SSRAv4i16_shift SSRAv8i8_shift SUBHNv2i64_v2i32 SUBHNv4i32_v4i16 SUBHNv8i16_v8i8 SUBv1i64 SUBv2i32 SUBv4i16 SUBv8i8 SUDOTlanev8i8 SUQADDv1i64 SUQADDv2i32 SUQADDv4i16 SUQADDv8i8 TBLv8i8Four TBLv8i8One TBLv8i8Three TBLv8i8Two TBXv8i8Four TBXv8i8One TBXv8i8Three TBXv8i8Two TRN1v2i32 TRN1v4i16 TRN1v8i8 TRN2v2i32 TRN2v4i16 TRN2v8i8 UABAv2i32 UABAv4i16 UABAv8i8 UABDv2i32 UABDv4i16 UABDv8i8 UADALPv2i32_v1i64 UADALPv4i16_v2i32 UADALPv8i8_v4i16 UADDLPv2i32_v1i64 UADDLPv4i16_v2i32 UADDLPv8i8_v4i16 UADDLVv4i32v UCVTFSWDri UCVTFSXDri UCVTFUWDri UCVTFUXDri UCVTFd UCVTFv1i64 UCVTFv2f32 UCVTFv2i32_shift UCVTFv4f16 UCVTFv4i16_shift UDOTlanev8i8 UDOTv8i8 UHADDv2i32 UHADDv4i16 UHADDv8i8 UHSUBv2i32 UHSUBv4i16 UHSUBv8i8 UMAXPv2i32 UMAXPv4i16 UMAXPv8i8 UMAXv2i32 UMAXv4i16 UMAXv8i8 UMINPv2i32 UMINPv4i16 UMINPv8i8 UMINv2i32 UMINv4i16 UMINv8i8 UQADDv1i64 UQADDv2i32 UQADDv4i16 UQADDv8i8 UQRSHLv1i64 UQRSHLv2i32 UQRSHLv4i16 UQRSHLv8i8 UQRSHRNv2i32_shift UQRSHRNv4i16_shift UQRSHRNv8i8_shift UQSHLd UQSHLv1i64 UQSHLv2i32 UQSHLv2i32_shift UQSHLv4i16 UQSHLv4i16_shift UQSHLv8i8 UQSHLv8i8_shift UQSHRNv2i32_shift UQSHRNv4i16_shift UQSHRNv8i8_shift UQSUBv1i64 UQSUBv2i32 UQSUBv4i16 UQSUBv8i8 UQXTNv2i32 UQXTNv4i16 UQXTNv8i8 URECPEv2i32 URHADDv2i32 URHADDv4i16 URHADDv8i8 URSHLv1i64 URSHLv2i32 URSHLv4i16 URSHLv8i8 URSHRd URSHRv2i32_shift URSHRv4i16_shift URSHRv8i8_shift URSQRTEv2i32 URSRAd URSRAv2i32_shift URSRAv4i16_shift URSRAv8i8_shift USDOTlanev8i8 
USDOTv8i8 USHLv1i64 USHLv2i32 USHLv4i16 USHLv8i8 USHRd USHRv2i32_shift USHRv4i16_shift USHRv8i8_shift USQADDv1i64 USQADDv2i32 USQADDv4i16 USQADDv8i8 USRAd USRAv2i32_shift USRAv4i16_shift USRAv8i8_shift UZP1v2i32 UZP1v4i16 UZP1v8i8 UZP2v2i32 UZP2v4i16 UZP2v8i8 XTNv2i32 XTNv4i16 XTNv8i8 ZIP1v2i32 ZIP1v4i16 ZIP1v8i8 ZIP2v2i32 ZIP2v4i16 ZIP2v8i8	2023-05-05 17:26:53 +01:00
Ronak Chauhan	5f0b92e580	[AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149332	2023-05-05 21:12:10 +05:30


48002 Commits