llvm-project

Author	SHA1	Message	Date
Fangrui Song	8afd831b45	ms inline asm: recognize case-insensitive JMP and CALL as TargetLowering::C_Address In a `__asm` block, a symbol reference is usually a memory constraint (indirect TargetLowering::C_Memory) [LOOP]. CALL and JMP instructions are special in that `__asm call k` can be an address constraint, if `k` is a function. Clang always gives us indirect TargetLowering::C_Memory, which we need to convert to direct TargetLowering::C_Address. D133914 implements this conversion, but does not consider JMP or case-insensitive CALL. This patch implements the missing cases, so that `__asm jmp k` (`jmp ${0:P}`) will correctly lower to `jmp _k` instead of `jmp dword ptr [_k]`. (`__asm call k` lowered to `call dword ptr ${0:P}` and is fixed by D149695 to lower to `call ${0:P}` instead.) [LOOP]: Some instructions like LOOP{,E,NE} and Jcc always use an address constraint (`loop _k` instead of `loop dword ptr [_k]`). After this patch and D149579, all the following cases will be correct. ``` int k(int); int (*kptr)(int); ... __asm call k; // correct without this patch __asm CALL k; // correct, but needs this patch to be compatible with D149579 __asm jmp k; // correct, but needs this patch to be compatible with D149579 __asm call kptr; // will be fixed by D149579. "Broken case" in clang/test/CodeGen/ms-inline-asm-functions.c __asm jmp kptr; // will be fixed by this patch and D149579 ``` Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D149920	2023-05-05 15:32:32 -07:00
Craig Topper	9c377c53da	[RISCV] Copy lack-of-signed-truncation-check.ll and signed-truncation-check.ll from AArch64/X86. NFC This is a more exhaustive set of tests for the same issue D149814 is trying to solve.	2023-05-05 11:15:39 -07:00
David Green	6e7840dd42	[AArch64] Extend fp64 top zeroing peephole to all instructions D147235 added a fold to remove instructions that zero the upper half of a register if the instruction already implicitly zeros the register. As far as I can tell this applies to all instructions that define a FPR64 register in AArch64. This patch switches to a check for the register class. The full list of instructions is: BSPv8i8 FMOVD0 ABSv1i64 ABSv2i32 ABSv4i16 ABSv8i8 ADDHNv2i64_v2i32 ADDHNv4i32_v4i16 ADDHNv8i16_v8i8 ADDPv2i32 ADDPv2i64p ADDPv4i16 ADDPv8i8 ADDv1i64 ADDv2i32 ADDv4i16 ADDv8i8 ANDv8i8 BF16DOTlanev4bf16 BFDOTv4bf16 BICv2i32 BICv4i16 BICv8i8 BIFv8i8 BITv8i8 BSLv8i8 CLASTA_VPZ_D CLASTB_VPZ_D CLSv2i32 CLSv4i16 CLSv8i8 CLZv2i32 CLZv4i16 CLZv8i8 CMEQv1i64 CMEQv1i64rz CMEQv2i32 CMEQv2i32rz CMEQv4i16 CMEQv4i16rz CMEQv8i8 CMEQv8i8rz CMGEv1i64 CMGEv1i64rz CMGEv2i32 CMGEv2i32rz CMGEv4i16 CMGEv4i16rz CMGEv8i8 CMGEv8i8rz CMGTv1i64 CMGTv1i64rz CMGTv2i32 CMGTv2i32rz CMGTv4i16 CMGTv4i16rz CMGTv8i8 CMGTv8i8rz CMHIv1i64 CMHIv2i32 CMHIv4i16 CMHIv8i8 CMHSv1i64 CMHSv2i32 CMHSv4i16 CMHSv8i8 CMLEv1i64rz CMLEv2i32rz CMLEv4i16rz CMLEv8i8rz CMLTv1i64rz CMLTv2i32rz CMLTv4i16rz CMLTv8i8rz CMTSTv1i64 CMTSTv2i32 CMTSTv4i16 CMTSTv8i8 CNTv8i8 DUPi64 DUPv2i32gpr DUPv2i32lane DUPv4i16gpr DUPv4i16lane DUPv8i8gpr DUPv8i8lane EORv8i8 EXTv8i8 FABD64 FABDv2f32 FABDv4f16 FABSDr FABSv2f32 FABSv4f16 FACGE64 FACGEv2f32 FACGEv4f16 FACGT64 FACGTv2f32 FACGTv4f16 FADDDrr FADDPv2f32 FADDPv2i64p FADDPv4f16 FADDv2f32 FADDv4f16 FCADDv2f32 FCADDv4f16 FCMEQ64 FCMEQv1i64rz FCMEQv2f32 FCMEQv2i32rz FCMEQv4f16 FCMEQv4i16rz FCMGE64 FCMGEv1i64rz FCMGEv2f32 FCMGEv2i32rz FCMGEv4f16 FCMGEv4i16rz FCMGT64 FCMGTv1i64rz FCMGTv2f32 FCMGTv2i32rz FCMGTv4f16 FCMGTv4i16rz FCMLAv2f32 FCMLAv4f16 FCMLAv4f16_indexed FCMLEv1i64rz FCMLEv2i32rz FCMLEv4i16rz FCMLTv1i64rz FCMLTv2i32rz FCMLTv4i16rz FCSELDrrr FCVTASv1i64 FCVTASv2f32 FCVTASv4f16 FCVTAUv1i64 FCVTAUv2f32 FCVTAUv4f16 FCVTDHr FCVTDSr FCVTMSv1i64 FCVTMSv2f32 FCVTMSv4f16 FCVTMUv1i64 
FCVTMUv2f32 FCVTMUv4f16 FCVTNSv1i64 FCVTNSv2f32 FCVTNSv4f16 FCVTNUv1i64 FCVTNUv2f32 FCVTNUv4f16 FCVTNv2i32 FCVTNv4i16 FCVTPSv1i64 FCVTPSv2f32 FCVTPSv4f16 FCVTPUv1i64 FCVTPUv2f32 FCVTPUv4f16 FCVTXNv2f32 FCVTZSd FCVTZSv1i64 FCVTZSv2f32 FCVTZSv2i32_shift FCVTZSv4f16 FCVTZSv4i16_shift FCVTZUd FCVTZUv1i64 FCVTZUv2f32 FCVTZUv2i32_shift FCVTZUv4f16 FCVTZUv4i16_shift FDIVDrr FDIVv2f32 FDIVv4f16 FMADDDrrr FMAXDrr FMAXNMDrr FMAXNMPv2f32 FMAXNMPv2i64p FMAXNMPv4f16 FMAXNMv2f32 FMAXNMv4f16 FMAXPv2f32 FMAXPv2i64p FMAXPv4f16 FMAXv2f32 FMAXv4f16 FMINDrr FMINNMDrr FMINNMPv2f32 FMINNMPv2i64p FMINNMPv4f16 FMINNMv2f32 FMINNMv4f16 FMINPv2f32 FMINPv2i64p FMINPv4f16 FMINv2f32 FMINv4f16 FMLAL2lanev4f16 FMLAL2v4f16 FMLALlanev4f16 FMLALv4f16 FMLAv1i64_indexed FMLAv2f32 FMLAv2i32_indexed FMLAv4f16 FMLAv4i16_indexed FMLSL2lanev4f16 FMLSL2v4f16 FMLSLlanev4f16 FMLSLv4f16 FMLSv1i64_indexed FMLSv2f32 FMLSv2i32_indexed FMLSv4f16 FMLSv4i16_indexed FMOVDi FMOVDr FMOVXDr FMOVv2f32_ns FMOVv4f16_ns FMSUBDrrr FMULDrr FMULX64 FMULXv1i64_indexed FMULXv2f32 FMULXv2i32_indexed FMULXv4f16 FMULXv4i16_indexed FMULv1i64_indexed FMULv2f32 FMULv2i32_indexed FMULv4f16 FMULv4i16_indexed FNEGDr FNEGv2f32 FNEGv4f16 FNMADDDrrr FNMSUBDrrr FNMULDrr FRECPEv1i64 FRECPEv2f32 FRECPEv4f16 FRECPS64 FRECPSv2f32 FRECPSv4f16 FRECPXv1i64 FRINT32XDr FRINT32Xv2f32 FRINT32ZDr FRINT32Zv2f32 FRINT64XDr FRINT64Xv2f32 FRINT64ZDr FRINT64Zv2f32 FRINTADr FRINTAv2f32 FRINTAv4f16 FRINTIDr FRINTIv2f32 FRINTIv4f16 FRINTMDr FRINTMv2f32 FRINTMv4f16 FRINTNDr FRINTNv2f32 FRINTNv4f16 FRINTPDr FRINTPv2f32 FRINTPv4f16 FRINTXDr FRINTXv2f32 FRINTXv4f16 FRINTZDr FRINTZv2f32 FRINTZv4f16 FRSQRTEv1i64 FRSQRTEv2f32 FRSQRTEv4f16 FRSQRTS64 FRSQRTSv2f32 FRSQRTSv4f16 FSQRTDr FSQRTv2f32 FSQRTv4f16 FSUBDrr FSUBv2f32 FSUBv4f16 LASTA_VPZ_D LASTB_VPZ_D LD1Onev1d LD1Onev2s LD1Onev4h LD1Onev8b LD1Rv1d LD1Rv2s LD1Rv4h LD1Rv8b LDAPURdi LDNPDi LDPDi LDRDl LDRDroW LDRDroX LDRDui LDURDi MLAv2i32 MLAv2i32_indexed MLAv4i16 MLAv4i16_indexed MLAv8i8 MLSv2i32 MLSv2i32_indexed 
MLSv4i16 MLSv4i16_indexed MLSv8i8 MOVID MOVIv2i32 MOVIv2s_msl MOVIv4i16 MOVIv8b_ns MULv2i32 MULv2i32_indexed MULv4i16 MULv4i16_indexed MULv8i8 MVNIv2i32 MVNIv2s_msl MVNIv4i16 NEGv1i64 NEGv2i32 NEGv4i16 NEGv8i8 NOTv8i8 ORNv8i8 ORRv2i32 ORRv4i16 ORRv8i8 PMULv8i8 RADDHNv2i64_v2i32 RADDHNv4i32_v4i16 RADDHNv8i16_v8i8 RBITv8i8 REV16v8i8 REV32v4i16 REV32v8i8 REV64v2i32 REV64v4i16 REV64v8i8 RSHRNv2i32_shift RSHRNv4i16_shift RSHRNv8i8_shift RSUBHNv2i64_v2i32 RSUBHNv4i32_v4i16 RSUBHNv8i16_v8i8 SABAv2i32 SABAv4i16 SABAv8i8 SABDv2i32 SABDv4i16 SABDv8i8 SADALPv2i32_v1i64 SADALPv4i16_v2i32 SADALPv8i8_v4i16 SADDLPv2i32_v1i64 SADDLPv4i16_v2i32 SADDLPv8i8_v4i16 SADDLVv4i32v SCVTFSWDri SCVTFSXDri SCVTFUWDri SCVTFUXDri SCVTFd SCVTFv1i64 SCVTFv2f32 SCVTFv2i32_shift SCVTFv4f16 SCVTFv4i16_shift SDOTlanev8i8 SDOTv8i8 SHADDv2i32 SHADDv4i16 SHADDv8i8 SHLd SHLv2i32_shift SHLv4i16_shift SHLv8i8_shift SHRNv2i32_shift SHRNv4i16_shift SHRNv8i8_shift SHSUBv2i32 SHSUBv4i16 SHSUBv8i8 SLId SLIv2i32_shift SLIv4i16_shift SLIv8i8_shift SMAXPv2i32 SMAXPv4i16 SMAXPv8i8 SMAXv2i32 SMAXv4i16 SMAXv8i8 SMINPv2i32 SMINPv4i16 SMINPv8i8 SMINv2i32 SMINv4i16 SMINv8i8 SQABSv1i64 SQABSv2i32 SQABSv4i16 SQABSv8i8 SQADDv1i64 SQADDv2i32 SQADDv4i16 SQADDv8i8 SQDMLALi32 SQDMLALv1i64_indexed SQDMLSLi32 SQDMLSLv1i64_indexed SQDMULHv2i32 SQDMULHv2i32_indexed SQDMULHv4i16 SQDMULHv4i16_indexed SQDMULLi32 SQDMULLv1i64_indexed SQNEGv1i64 SQNEGv2i32 SQNEGv4i16 SQNEGv8i8 SQRDMLAHv2i32 SQRDMLAHv2i32_indexed SQRDMLAHv4i16 SQRDMLAHv4i16_indexed SQRDMLSHv2i32 SQRDMLSHv2i32_indexed SQRDMLSHv4i16 SQRDMLSHv4i16_indexed SQRDMULHv2i32 SQRDMULHv2i32_indexed SQRDMULHv4i16 SQRDMULHv4i16_indexed SQRSHLv1i64 SQRSHLv2i32 SQRSHLv4i16 SQRSHLv8i8 SQRSHRNv2i32_shift SQRSHRNv4i16_shift SQRSHRNv8i8_shift SQRSHRUNv2i32_shift SQRSHRUNv4i16_shift SQRSHRUNv8i8_shift SQSHLUd SQSHLUv2i32_shift SQSHLUv4i16_shift SQSHLUv8i8_shift SQSHLd SQSHLv1i64 SQSHLv2i32 SQSHLv2i32_shift SQSHLv4i16 SQSHLv4i16_shift SQSHLv8i8 SQSHLv8i8_shift SQSHRNv2i32_shift 
SQSHRNv4i16_shift SQSHRNv8i8_shift SQSHRUNv2i32_shift SQSHRUNv4i16_shift SQSHRUNv8i8_shift SQSUBv1i64 SQSUBv2i32 SQSUBv4i16 SQSUBv8i8 SQXTNv2i32 SQXTNv4i16 SQXTNv8i8 SQXTUNv2i32 SQXTUNv4i16 SQXTUNv8i8 SRHADDv2i32 SRHADDv4i16 SRHADDv8i8 SRId SRIv2i32_shift SRIv4i16_shift SRIv8i8_shift SRSHLv1i64 SRSHLv2i32 SRSHLv4i16 SRSHLv8i8 SRSHRd SRSHRv2i32_shift SRSHRv4i16_shift SRSHRv8i8_shift SRSRAd SRSRAv2i32_shift SRSRAv4i16_shift SRSRAv8i8_shift SSHLv1i64 SSHLv2i32 SSHLv4i16 SSHLv8i8 SSHRd SSHRv2i32_shift SSHRv4i16_shift SSHRv8i8_shift SSRAd SSRAv2i32_shift SSRAv4i16_shift SSRAv8i8_shift SUBHNv2i64_v2i32 SUBHNv4i32_v4i16 SUBHNv8i16_v8i8 SUBv1i64 SUBv2i32 SUBv4i16 SUBv8i8 SUDOTlanev8i8 SUQADDv1i64 SUQADDv2i32 SUQADDv4i16 SUQADDv8i8 TBLv8i8Four TBLv8i8One TBLv8i8Three TBLv8i8Two TBXv8i8Four TBXv8i8One TBXv8i8Three TBXv8i8Two TRN1v2i32 TRN1v4i16 TRN1v8i8 TRN2v2i32 TRN2v4i16 TRN2v8i8 UABAv2i32 UABAv4i16 UABAv8i8 UABDv2i32 UABDv4i16 UABDv8i8 UADALPv2i32_v1i64 UADALPv4i16_v2i32 UADALPv8i8_v4i16 UADDLPv2i32_v1i64 UADDLPv4i16_v2i32 UADDLPv8i8_v4i16 UADDLVv4i32v UCVTFSWDri UCVTFSXDri UCVTFUWDri UCVTFUXDri UCVTFd UCVTFv1i64 UCVTFv2f32 UCVTFv2i32_shift UCVTFv4f16 UCVTFv4i16_shift UDOTlanev8i8 UDOTv8i8 UHADDv2i32 UHADDv4i16 UHADDv8i8 UHSUBv2i32 UHSUBv4i16 UHSUBv8i8 UMAXPv2i32 UMAXPv4i16 UMAXPv8i8 UMAXv2i32 UMAXv4i16 UMAXv8i8 UMINPv2i32 UMINPv4i16 UMINPv8i8 UMINv2i32 UMINv4i16 UMINv8i8 UQADDv1i64 UQADDv2i32 UQADDv4i16 UQADDv8i8 UQRSHLv1i64 UQRSHLv2i32 UQRSHLv4i16 UQRSHLv8i8 UQRSHRNv2i32_shift UQRSHRNv4i16_shift UQRSHRNv8i8_shift UQSHLd UQSHLv1i64 UQSHLv2i32 UQSHLv2i32_shift UQSHLv4i16 UQSHLv4i16_shift UQSHLv8i8 UQSHLv8i8_shift UQSHRNv2i32_shift UQSHRNv4i16_shift UQSHRNv8i8_shift UQSUBv1i64 UQSUBv2i32 UQSUBv4i16 UQSUBv8i8 UQXTNv2i32 UQXTNv4i16 UQXTNv8i8 URECPEv2i32 URHADDv2i32 URHADDv4i16 URHADDv8i8 URSHLv1i64 URSHLv2i32 URSHLv4i16 URSHLv8i8 URSHRd URSHRv2i32_shift URSHRv4i16_shift URSHRv8i8_shift URSQRTEv2i32 URSRAd URSRAv2i32_shift URSRAv4i16_shift URSRAv8i8_shift USDOTlanev8i8 
USDOTv8i8 USHLv1i64 USHLv2i32 USHLv4i16 USHLv8i8 USHRd USHRv2i32_shift USHRv4i16_shift USHRv8i8_shift USQADDv1i64 USQADDv2i32 USQADDv4i16 USQADDv8i8 USRAd USRAv2i32_shift USRAv4i16_shift USRAv8i8_shift UZP1v2i32 UZP1v4i16 UZP1v8i8 UZP2v2i32 UZP2v4i16 UZP2v8i8 XTNv2i32 XTNv4i16 XTNv8i8 ZIP1v2i32 ZIP1v4i16 ZIP1v8i8 ZIP2v2i32 ZIP2v4i16 ZIP2v8i8	2023-05-05 17:26:53 +01:00
Ronak Chauhan	5f0b92e580	[AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149332	2023-05-05 21:12:10 +05:30
Thomas Lively	72a72315b0	[WebAssembly] Mark @llvm.wasm.shuffle lane indices as immediates This intrinsic is meant to lower directly to the i8x16.shuffle instruction, which takes its lane index arguments as immediates. The ISel for the intrinsic assumed that the lane index arguments were constants, so bitcode that "incorrectly" used this intrinsic with non-immediate arguments caused an assertion failure in the backend. Avoid the crash by defining the lane index arguments to be immediates, matching the underlying instruction. Update ISel accordingly. This change means that the bitcode that previously caused a crash will now fail to validate. Fixes #55559. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D149898	2023-05-05 08:12:41 -07:00
David Green	44e7b8aaf1	[AArch64] Tests for implicit zero patterns. NFC See D149616	2023-05-05 15:12:50 +01:00
Jingu Kang	b18161d785	[AArch64] Handle vector with two different values If a vector has two different values and it can be split into two sub vectors of the same length, generate two DUP and CONCAT_VECTORS/VECTOR_SHUFFLE. For example, t22: v16i8 = BUILD_VECTOR t23, t23, t23, t23, t23, t23, t23, t23, t24, t24, t24, t24, t24, t24, t24, t24 ==> t26: v8i8 = AArch64ISD::DUP t23 t28: v8i8 = AArch64ISD::DUP t24 t29: v16i8 = concat_vectors t26, t28 Differential Revision: https://reviews.llvm.org/D148347	2023-05-05 14:42:59 +01:00
Matt Devereau	ea228bd0bd	[AArch64] Emit FNMADD instead of FNEG(FMADD) Emit FNMADD instead of FNEG(FMADD) for optimization levels above Oz when fast-math flags (nsz+contract) permit it. Differential Revision: https://reviews.llvm.org/D149260	2023-05-05 13:35:51 +00:00
Serguei Katkov	5b7f8d9da5	[X86] Add tests for fminimum/fmaximum for vector operands.	2023-05-05 18:42:58 +07:00
Matt Devereau	f9ff2468af	Revert "[AArch64] Emit FNMADD instead of FNEG(FMADD)" This reverts commit caa95c2408677d7af8c7be4da203ea9271854f46.	2023-05-05 10:50:23 +00:00
Simon Pilgrim	9a1cb8a856	[X86] Add abds/abdu lowering for scalar i8/i16/i32/i64 types The next step will be to begin adding generic legalization/lowering support	2023-05-05 11:49:33 +01:00
Matt Devereau	d9acb2aa91	Revert "Add AArch64 requirement for aarch64_fnmadd.ll" This reverts commit a9919db65a1afa71ac62631d51711383c17d43fc.	2023-05-05 10:49:06 +00:00
Serguei Katkov	50cd2ff7bc	[X86] Avoid usage constant -1 for fminimum/fmaximum lowering Instead of equality comparison of value to preferred zero we can check just the sign of value and if sign is set we should put this value as second operand for minimum and first operand for maximum. In this case FMIN/FMAX will choose the right result for 0.f and -0.f comparison. This allows us: 1. avoid loading of big 64-bit constant for fminimum. 2. for double on non-64-bit platform we need to check only high part of value. 3. test against zero to check sign takes less size of instruction Additionally, if we know that any of value is guaranteed to be non-zero we should not care about 0.f and -0.f comparison. Reviewed By: e-kud Differential Revision: https://reviews.llvm.org/D149812	2023-05-05 16:24:33 +07:00
Nicolai Hähnle	ef13308b26	AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors v2: - simplify the escape to TableGen patterns Differential Revision: https://reviews.llvm.org/D149841	2023-05-05 10:55:18 +02:00
Serguei Katkov	96e09fef3c	[X86] Avoid usage constant NaN for fminimum/fmaximum lowering After applying FMIN/FMAX, if any of operands is NaN, the second operand will be the result. So all we need is to check whether first operand is NaN and return it or result of FMIN/FMAX. So we avoid usage of constant NaN in the lowering. Additionally we can avoid handling NaN after FMIN/FMAX if we are sure that first operand is not NaN. Reviewed By: e-kud Differential Revision: https://reviews.llvm.org/D149729	2023-05-05 15:42:54 +07:00
Matt Devereau	a9919db65a	Add AArch64 requirement for aarch64_fnmadd.ll	2023-05-05 08:36:05 +00:00
Matt Devereau	caa95c2408	[AArch64] Emit FNMADD instead of FNEG(FMADD) Emit FNMADD instead of FNEG(FMADD) for optimization levels above Oz when fast-math flags (nsz+contract) permit it. Differential Revision: https://reviews.llvm.org/D149260	2023-05-05 08:14:17 +00:00
Fangrui Song	89e02c7aea	[test] Update DirectX/min_vec_size.ll after shufflevector mask vector poison change	2023-05-04 22:30:36 -07:00
Craig Topper	38007dd394	[RISCV] Promote i1 shuffles to i8 shuffles. Otherwise I think we extract and use a build_vector. There may be some more improvements that can be made and there might be some cases that we should do something different for, but this seemed like a decent starting point. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D149724	2023-05-04 19:44:43 -07:00
Craig Topper	fe9f557578	[DAGCombiner][RISCV] Enable reassociation for VP_FMA in visitFADDForFMACombine. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D149911	2023-05-04 17:20:58 -07:00
Craig Topper	f10bcf6f9d	[RISCV] Add vp.icmp/fcmp to RISCVTargetLowering::canSplatOperand.	2023-05-04 16:56:14 -07:00
Yeting Kuo	287aa6c453	[DAGCombiner] Use generalized pattern match for visitFSUBForFMACombine. The patch makes visitFSUBForFMACombine serve vp.fsub too. It helps DAGCombiner to fuse vp.fsub and vp.fmul patterns to vp.fma. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D149821	2023-05-04 22:02:32 +08:00
Evgenii Kudriashov	a82d27a9a6	[X86] Support llvm.{min,max}imum.f{16,32,64} Addresses https://github.com/llvm/llvm-project/issues/53353 Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D145634	2023-05-04 21:04:48 +08:00
Evgenii Kudriashov	62f1d91727	[NFC][X86] Remove cfi instructions and unused attributes from half.ll test Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D149114	2023-05-04 21:04:48 +08:00
Joseph Huber	f05ce9045a	[NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors This patch mostly adapts the existing AMDGPUCtorDtorLoweringPass for use by the Nvidia backend. This pass transforms the ctor / dtor list into a kernel call that can be used to invoke those functions. Furthermore, we emit globals such that the names and addresses of these constructor functions can be found by the driver. Unfortunately, since NVPTX has no way to emit variables at a named section, nor a functioning linker to provide the begin / end symbols, we need to mangle these names and have an external application find them. This work is related to the work in D149398 and D149340. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D149451	2023-05-04 07:13:00 -05:00
Nicolai Hähnle	909095a880	AMDGPU: Precommit test showing codegen weakness The code sequence on gfx9 has a lot of useless v_bfi instructions. Differential Revision: https://reviews.llvm.org/D149840	2023-05-04 14:11:04 +02:00
Luke Lau	d9683a70fe	[RISCV] Fix extract_vector_elt on i1 at idx 0 being inverted It looks like the intention here is to truncate a XLenVT -> i1, in which case we should be emitting snez instead of sneq if I'm understanding correctly. Reviewed By: jacquesguan, frasercrmck Differential Revision: https://reviews.llvm.org/D149732	2023-05-04 11:45:35 +01:00
Tom Weaver	1d8ab713ad	Revert "[DebugLine] save one debug line entry for empty prologue" This reverts commit b48a8233f5e230e46182bf5c523ceb6a04cec8f5. This change caused https://lab.llvm.org/buildbot/#/builders/247/builds/4125 to start failing, please address the failures before resubmitting.	2023-05-04 11:08:58 +01:00
Luke Lau	9e9bf1e3ed	[RISCV] Use setcc to truncate results in widenVectorOpsToi8 To avoid an unnecessary vand.vi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D149771	2023-05-04 10:49:27 +01:00
Chen Zheng	b48a8233f5	[DebugLine] save one debug line entry for empty prologue Some debuggers like DBX on AIX assume the address in debug line entries is always incremental. But clang generates two entries (entry for file scope line and entry for prologue end) with the same address if the prologue is empty. And if the prologue is empty, it seems the first debug line entry for the function is unnecessary (i.e. removing the first entry won't impact the behavior in GDB on Linux), so I implement this for all debuggers. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D147506	2023-05-04 04:37:34 +00:00
Shao-Ce SUN	2dc0fa050e	[RISCV][CodeGen] Support Zdinx on RV64 codegen This patch was split from D122918 . Co-Author: @liaolucy @realqhc Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D149665	2023-05-04 09:00:40 +08:00
Thomas Lively	abdb5e041c	[WebAssembly] Remove incorrect result from wasm64 store_lane instructions The wasm64 versions of the v128.storeX_lane instructions were incorrectly defined as returning a v128 value, which resulted in spurious drop instructions being emitted and causing validation to fail. This was not caught earlier because wasm64 has been experimental and not well tested. Update the relevant test file to test both wasm32 and wasm64. Fixes #62443. Differential Revision: https://reviews.llvm.org/D149780	2023-05-03 16:00:20 -07:00
Krzysztof Drewniak	fc05b7f0d0	[AMDGPU] Add gfx940 to fp64 atomic tests in global ISel This changes the test in GlobalISel, which makes it match the test elsewhere. Differential Revision: https://reviews.llvm.org/D149795	2023-05-03 22:40:16 +00:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers"" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Mateja Marjanovic	cf76074a36	[AMDGPU][GlobalISel] Check exact width in get*ClassForBitWidth and widen if necessary Instead of checking if the given bitwidth is less or equal to a bitwidth of an existing RegClass, check if it has the exact same value. For LLVM vector types that don't have a corresponding Register Class, widen them during legalization. That goes for G_EXTRACT_VECTOR_ELT, G_INSERT_VECTOR_ELT and G_BUILD_VECTOR. Differential revision: https://reviews.llvm.org/D148096 Reviewers: foad, arsenm	2023-05-03 17:32:24 +02:00
Mateja Marjanovic	6175ec0bb6	Revert "[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR" This reverts commit b25c7cafcbe1b52ea2d1ff5e5c2f13674b5f297d.	2023-05-03 17:28:01 +02:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Mateja Marjanovic	b25c7cafcb	[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR Widen the vector operand type in G_BUILD_VECTOR, G_INSERT_VECTOR_ELT, G_EXTRACT_VECTOR_ELT to the nearest larger RegClass.	2023-05-03 17:14:38 +02:00
Sp00ph	8e46ac3623	[AArch64] Add more efficient bitwise vector reductions. Improves the codegen for VECREDUCE_{AND,OR,XOR} operations on AArch64. Currently, these are fully scalarized, except if the vector is a <N x i1>. This patch improves the codegen down to O(log(N)) where N is the length of the vector for vectors whose elements are not i1, by repeatedly applying the bitwise operations to the two halves of the vector. <N x i1> bitwise reductions are handled using VECREDUCE_{UMAX,UMIN,ADD} instead. I had to update quite a few codegen tests with these changes, with a general downward trend in instruction count. Since the vector reductions already have tests, I haven't added any new tests myself. Differential Revision: https://reviews.llvm.org/D148185	2023-05-03 15:56:16 +01:00
Philip Reames	53710b43a0	[RISCV] Use vslidedown for undef sub-sequences in generic build_vector This is a follow up to D149263 which extends the generic vslide1down handling to use vslidedown (without the one) for undef elements, and in particular for undef sub-sequences. This both removes the domain crossing, and for undef subsequences results in fewer instructions over all. Differential Revision: https://reviews.llvm.org/D149658#inline-1446673	2023-05-03 07:52:29 -07:00
Philip Reames	9fc5af1b84	[RISCV] Use vslide1down lowering for two element non-constant build_vectors When the values are in GPRs, the vslide1down lowering is always better. We need to greatly improve the splat-and-mask cost model to handle constants in a meaningful way, so for now, limit this to non-constant vectors. This does send the "partially constant" case down the vslide1down path. This could cause some regressions, though I don't see any in practice. The cost modeling for the general case is annoyingly tricky. We have a great amount of inconsistency around immediate operands, and as a result, the exact constant and exact lowering choice matters a lot. I'm hoping that we get a "good enough" result without modeling this exactly, but we may need to do something analogous to getIntMatCost (i.e. a search w/costing). Differential Revision: https://reviews.llvm.org/D149667	2023-05-03 07:35:23 -07:00
WuXinlong	9f0d725744	[RISCV] Add MC support of RISCV zcmt Extension This patch adds the instructions of the zcmt extension. [[ https://github.com/riscv/riscv-code-size-reduction/releases/tag/v1.0.0-RC5.7 \| spec is here ]] It includes two instructions (cm.jt & cm.jalt) and a CSR Reg JVT co-author: @Scott Egerton Reviewed By: kito-cheng, craig.topper Differential Revision: https://reviews.llvm.org/D133863	2023-05-03 22:06:37 +08:00
David Green	b96967ad17	[AArch64] Combine concat through rshrn This tries to push the concat in trunc(concat(rshr, rshr)) into the leaves, so that we can generate rshrn(concat). This helps improve the codegen for small types, using the existing rshrn patterns. Differential Revision: https://reviews.llvm.org/D149636	2023-05-03 14:48:50 +01:00
David Green	15723e6f8c	[AArch64] Additional tests for rshrn patterns. NFC See D149636	2023-05-03 13:15:27 +01:00
Florian Hahn	4e2b4f97a0	[ShrinkWrap] Use underlying object to rule out stack access. Allow shrink-wrapping past memory accesses that only access globals or function arguments. This patch uses getUnderlyingObject to try to identify the accessed object by a given memory operand. If it is a global or an argument, it does not access the stack of the current function and should not block shrink wrapping. Note that the caller's stack may get accessed when passing an argument via the stack, but not the stack of the current function. This addresses part of the TODO from D63152. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D149668	2023-05-03 09:28:07 +01:00
pvanhout	415956fe7e	[llvm-readobj][AMDGPU] Bypass MD verification for PAL Small split change from D146023. Migrate elf-notes to v4 and fix llvm-readobj to work with PAL metadata. Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D146119	2023-05-03 08:45:24 +02:00
Florian Hahn	bc1c95d973	[ShrinkWrap] Add tests with loads from byval/inalloca/preallocated args. Extra test coverage for D149668.	2023-05-02 20:41:58 +01:00
Simon Pilgrim	edce93c9d8	[X86] Lower abdu(lhs, rhs) -> or(usubsat(lhs,rhs), usubsat(rhs,lhs)) Adds pre-SSE4 v8i16 abdu handling - we already have something similar for umax(x,y) -> add(x,usubsat(y,x)) / umin(x,y) -> sub(x,usubsat(x,y)) (I'm starting to look at adding generic TargetLowering expandABD() handling and came across this missed opportunity). Inspiration: http://0x80.pl/notesen/2018-03-11-sse-abs-unsigned.html Alive2: https://alive2.llvm.org/ce/z/gMhaTa	2023-05-02 19:54:04 +01:00
Florian Hahn	7f8dee5c54	[X86] Remove stale checks after a30c17aba9.	2023-05-02 18:24:50 +01:00
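
The identity behind commit edce93c9d8 above, abdu(lhs, rhs) -> or(usubsat(lhs,rhs), usubsat(rhs,lhs)), can be checked with a small scalar model. This is only an illustrative sketch, not the X86 lowering code: the helper names `usubsat`, `abdu` and `abdu_reference` and the 16-bit mask are assumptions chosen here to mirror a single v8i16 lane.

```python
MASK16 = 0xFFFF  # assumption: model one 16-bit lane, as in the v8i16 case


def usubsat(a: int, b: int) -> int:
    """Unsigned saturating subtract: clamps to 0 instead of wrapping."""
    return (a - b) & MASK16 if a > b else 0


def abdu(lhs: int, rhs: int) -> int:
    """Unsigned absolute difference via or(usubsat(lhs, rhs), usubsat(rhs, lhs)).

    At most one of the two saturating differences is non-zero,
    so OR-ing them selects |lhs - rhs|.
    """
    return usubsat(lhs, rhs) | usubsat(rhs, lhs)


def abdu_reference(lhs: int, rhs: int) -> int:
    """Straightforward reference using arbitrary-precision integers."""
    return abs(lhs - rhs)


if __name__ == "__main__":
    # Spot-check a sparse grid of 16-bit operands against the reference.
    for a in range(0, 0x10000, 257):
        for b in range(0, 0x10000, 263):
            assert abdu(a, b) == abdu_reference(a, b)
    print("identity holds on the sampled grid")
```

The OR works because one of the two operands is always clamped to zero; the same reasoning applies lane-wise to vectors.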


47956 Commits