47956 Commits

Author SHA1 Message Date
Fangrui Song
8afd831b45 ms inline asm: recognize case-insensitive JMP and CALL as TargetLowering::C_Address
In a `__asm` block, a symbol reference is usually a memory constraint
(indirect TargetLowering::C_Memory) [LOOP]. CALL and JMP instructions are special
in that `__asm call k` can be an address constraint, if `k` is a function.

Clang always gives us indirect TargetLowering::C_Memory and we need to convert it
to direct TargetLowering::C_Address. D133914 implements this conversion, but
does not consider JMP or case-insensitive CALL. This patch implements the missing
cases, so that `__asm jmp k` (`jmp ${0:P}`) will correctly lower to `jmp _k`
instead of `jmp dword ptr [_k]`.

(`__asm call k` lowered to `call dword ptr ${0:P}` and is fixed by D149695 to
lower to `call ${0:P}` instead.)

[LOOP]: Some instructions like LOOP{,E,NE} and Jcc always use an address
constraint (`loop _k` instead of `loop dword ptr [_k]`).

After this patch and D149579, all the following cases will be correct.
```
int k(int);
int (*kptr)(int);
...
__asm call k; // correct without this patch
__asm CALL k; // correct, but needs this patch to be compatible with D149579
__asm jmp k;  // correct, but needs this patch to be compatible with D149579
__asm call kptr; // will be fixed by D149579.  "Broken case" in clang/test/CodeGen/ms-inline-asm-functions.c
__asm jmp kptr;  // will be fixed by this patch and D149579
```
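
The case-insensitive recognition described above can be sketched as follows. This is only an illustration in plain C, not the code from the patch: the helper names `starts_with_mnemonic` and `is_call_or_jmp` are hypothetical, and the sketch assumes the mnemonic is the first token of the asm string.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an LLVM API): returns true if asm_str begins
 * with the given lowercase mnemonic, compared case-insensitively, and the
 * mnemonic is followed by whitespace or the end of the string. */
static bool starts_with_mnemonic(const char *asm_str, const char *mnemonic) {
    size_t i = 0;
    for (; mnemonic[i] != '\0'; ++i) {
        /* A shorter asm_str stops here: its '\0' never equals a letter. */
        if (tolower((unsigned char)asm_str[i]) !=
            tolower((unsigned char)mnemonic[i]))
            return false;
    }
    /* Reject longer mnemonics such as "jmpx". */
    return asm_str[i] == '\0' || isspace((unsigned char)asm_str[i]);
}

/* Hypothetical predicate: does this asm statement start with CALL or JMP
 * in any letter case, after optional leading whitespace? */
static bool is_call_or_jmp(const char *asm_str) {
    while (isspace((unsigned char)*asm_str))
        ++asm_str;
    return starts_with_mnemonic(asm_str, "call") ||
           starts_with_mnemonic(asm_str, "jmp");
}
```

With such a predicate, `CALL k`, `call k`, and `jmp k` would all be treated alike, while `mov eax, k` would keep its memory constraint.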

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D149920
2023-05-05 15:32:32 -07:00
Craig Topper
9c377c53da [RISCV] Copy lack-of-signed-truncation-check.ll and signed-truncation-check.ll from AArch64/X86. NFC
This is a more exhaustive set of tests for the same issue D149814
is trying to solve.
2023-05-05 11:15:39 -07:00
David Green
6e7840dd42 [AArch64] Extend fp64 top zeroing peephole to all instructions
D147235 added a fold to remove instructions that zero the upper half of a
register if the instruction already implicitly zeros the register. As far as I
can tell this applies to all instructions that define a FPR64 register in
AArch64. This patch switches to a check for the register class. The full list
of instructions is:
  BSPv8i8
  FMOVD0
  ABSv1i64
  ABSv2i32
  ABSv4i16
  ABSv8i8
  ADDHNv2i64_v2i32
  ADDHNv4i32_v4i16
  ADDHNv8i16_v8i8
  ADDPv2i32
  ADDPv2i64p
  ADDPv4i16
  ADDPv8i8
  ADDv1i64
  ADDv2i32
  ADDv4i16
  ADDv8i8
  ANDv8i8
  BF16DOTlanev4bf16
  BFDOTv4bf16
  BICv2i32
  BICv4i16
  BICv8i8
  BIFv8i8
  BITv8i8
  BSLv8i8
  CLASTA_VPZ_D
  CLASTB_VPZ_D
  CLSv2i32
  CLSv4i16
  CLSv8i8
  CLZv2i32
  CLZv4i16
  CLZv8i8
  CMEQv1i64
  CMEQv1i64rz
  CMEQv2i32
  CMEQv2i32rz
  CMEQv4i16
  CMEQv4i16rz
  CMEQv8i8
  CMEQv8i8rz
  CMGEv1i64
  CMGEv1i64rz
  CMGEv2i32
  CMGEv2i32rz
  CMGEv4i16
  CMGEv4i16rz
  CMGEv8i8
  CMGEv8i8rz
  CMGTv1i64
  CMGTv1i64rz
  CMGTv2i32
  CMGTv2i32rz
  CMGTv4i16
  CMGTv4i16rz
  CMGTv8i8
  CMGTv8i8rz
  CMHIv1i64
  CMHIv2i32
  CMHIv4i16
  CMHIv8i8
  CMHSv1i64
  CMHSv2i32
  CMHSv4i16
  CMHSv8i8
  CMLEv1i64rz
  CMLEv2i32rz
  CMLEv4i16rz
  CMLEv8i8rz
  CMLTv1i64rz
  CMLTv2i32rz
  CMLTv4i16rz
  CMLTv8i8rz
  CMTSTv1i64
  CMTSTv2i32
  CMTSTv4i16
  CMTSTv8i8
  CNTv8i8
  DUPi64
  DUPv2i32gpr
  DUPv2i32lane
  DUPv4i16gpr
  DUPv4i16lane
  DUPv8i8gpr
  DUPv8i8lane
  EORv8i8
  EXTv8i8
  FABD64
  FABDv2f32
  FABDv4f16
  FABSDr
  FABSv2f32
  FABSv4f16
  FACGE64
  FACGEv2f32
  FACGEv4f16
  FACGT64
  FACGTv2f32
  FACGTv4f16
  FADDDrr
  FADDPv2f32
  FADDPv2i64p
  FADDPv4f16
  FADDv2f32
  FADDv4f16
  FCADDv2f32
  FCADDv4f16
  FCMEQ64
  FCMEQv1i64rz
  FCMEQv2f32
  FCMEQv2i32rz
  FCMEQv4f16
  FCMEQv4i16rz
  FCMGE64
  FCMGEv1i64rz
  FCMGEv2f32
  FCMGEv2i32rz
  FCMGEv4f16
  FCMGEv4i16rz
  FCMGT64
  FCMGTv1i64rz
  FCMGTv2f32
  FCMGTv2i32rz
  FCMGTv4f16
  FCMGTv4i16rz
  FCMLAv2f32
  FCMLAv4f16
  FCMLAv4f16_indexed
  FCMLEv1i64rz
  FCMLEv2i32rz
  FCMLEv4i16rz
  FCMLTv1i64rz
  FCMLTv2i32rz
  FCMLTv4i16rz
  FCSELDrrr
  FCVTASv1i64
  FCVTASv2f32
  FCVTASv4f16
  FCVTAUv1i64
  FCVTAUv2f32
  FCVTAUv4f16
  FCVTDHr
  FCVTDSr
  FCVTMSv1i64
  FCVTMSv2f32
  FCVTMSv4f16
  FCVTMUv1i64
  FCVTMUv2f32
  FCVTMUv4f16
  FCVTNSv1i64
  FCVTNSv2f32
  FCVTNSv4f16
  FCVTNUv1i64
  FCVTNUv2f32
  FCVTNUv4f16
  FCVTNv2i32
  FCVTNv4i16
  FCVTPSv1i64
  FCVTPSv2f32
  FCVTPSv4f16
  FCVTPUv1i64
  FCVTPUv2f32
  FCVTPUv4f16
  FCVTXNv2f32
  FCVTZSd
  FCVTZSv1i64
  FCVTZSv2f32
  FCVTZSv2i32_shift
  FCVTZSv4f16
  FCVTZSv4i16_shift
  FCVTZUd
  FCVTZUv1i64
  FCVTZUv2f32
  FCVTZUv2i32_shift
  FCVTZUv4f16
  FCVTZUv4i16_shift
  FDIVDrr
  FDIVv2f32
  FDIVv4f16
  FMADDDrrr
  FMAXDrr
  FMAXNMDrr
  FMAXNMPv2f32
  FMAXNMPv2i64p
  FMAXNMPv4f16
  FMAXNMv2f32
  FMAXNMv4f16
  FMAXPv2f32
  FMAXPv2i64p
  FMAXPv4f16
  FMAXv2f32
  FMAXv4f16
  FMINDrr
  FMINNMDrr
  FMINNMPv2f32
  FMINNMPv2i64p
  FMINNMPv4f16
  FMINNMv2f32
  FMINNMv4f16
  FMINPv2f32
  FMINPv2i64p
  FMINPv4f16
  FMINv2f32
  FMINv4f16
  FMLAL2lanev4f16
  FMLAL2v4f16
  FMLALlanev4f16
  FMLALv4f16
  FMLAv1i64_indexed
  FMLAv2f32
  FMLAv2i32_indexed
  FMLAv4f16
  FMLAv4i16_indexed
  FMLSL2lanev4f16
  FMLSL2v4f16
  FMLSLlanev4f16
  FMLSLv4f16
  FMLSv1i64_indexed
  FMLSv2f32
  FMLSv2i32_indexed
  FMLSv4f16
  FMLSv4i16_indexed
  FMOVDi
  FMOVDr
  FMOVXDr
  FMOVv2f32_ns
  FMOVv4f16_ns
  FMSUBDrrr
  FMULDrr
  FMULX64
  FMULXv1i64_indexed
  FMULXv2f32
  FMULXv2i32_indexed
  FMULXv4f16
  FMULXv4i16_indexed
  FMULv1i64_indexed
  FMULv2f32
  FMULv2i32_indexed
  FMULv4f16
  FMULv4i16_indexed
  FNEGDr
  FNEGv2f32
  FNEGv4f16
  FNMADDDrrr
  FNMSUBDrrr
  FNMULDrr
  FRECPEv1i64
  FRECPEv2f32
  FRECPEv4f16
  FRECPS64
  FRECPSv2f32
  FRECPSv4f16
  FRECPXv1i64
  FRINT32XDr
  FRINT32Xv2f32
  FRINT32ZDr
  FRINT32Zv2f32
  FRINT64XDr
  FRINT64Xv2f32
  FRINT64ZDr
  FRINT64Zv2f32
  FRINTADr
  FRINTAv2f32
  FRINTAv4f16
  FRINTIDr
  FRINTIv2f32
  FRINTIv4f16
  FRINTMDr
  FRINTMv2f32
  FRINTMv4f16
  FRINTNDr
  FRINTNv2f32
  FRINTNv4f16
  FRINTPDr
  FRINTPv2f32
  FRINTPv4f16
  FRINTXDr
  FRINTXv2f32
  FRINTXv4f16
  FRINTZDr
  FRINTZv2f32
  FRINTZv4f16
  FRSQRTEv1i64
  FRSQRTEv2f32
  FRSQRTEv4f16
  FRSQRTS64
  FRSQRTSv2f32
  FRSQRTSv4f16
  FSQRTDr
  FSQRTv2f32
  FSQRTv4f16
  FSUBDrr
  FSUBv2f32
  FSUBv4f16
  LASTA_VPZ_D
  LASTB_VPZ_D
  LD1Onev1d
  LD1Onev2s
  LD1Onev4h
  LD1Onev8b
  LD1Rv1d
  LD1Rv2s
  LD1Rv4h
  LD1Rv8b
  LDAPURdi
  LDNPDi
  LDPDi
  LDRDl
  LDRDroW
  LDRDroX
  LDRDui
  LDURDi
  MLAv2i32
  MLAv2i32_indexed
  MLAv4i16
  MLAv4i16_indexed
  MLAv8i8
  MLSv2i32
  MLSv2i32_indexed
  MLSv4i16
  MLSv4i16_indexed
  MLSv8i8
  MOVID
  MOVIv2i32
  MOVIv2s_msl
  MOVIv4i16
  MOVIv8b_ns
  MULv2i32
  MULv2i32_indexed
  MULv4i16
  MULv4i16_indexed
  MULv8i8
  MVNIv2i32
  MVNIv2s_msl
  MVNIv4i16
  NEGv1i64
  NEGv2i32
  NEGv4i16
  NEGv8i8
  NOTv8i8
  ORNv8i8
  ORRv2i32
  ORRv4i16
  ORRv8i8
  PMULv8i8
  RADDHNv2i64_v2i32
  RADDHNv4i32_v4i16
  RADDHNv8i16_v8i8
  RBITv8i8
  REV16v8i8
  REV32v4i16
  REV32v8i8
  REV64v2i32
  REV64v4i16
  REV64v8i8
  RSHRNv2i32_shift
  RSHRNv4i16_shift
  RSHRNv8i8_shift
  RSUBHNv2i64_v2i32
  RSUBHNv4i32_v4i16
  RSUBHNv8i16_v8i8
  SABAv2i32
  SABAv4i16
  SABAv8i8
  SABDv2i32
  SABDv4i16
  SABDv8i8
  SADALPv2i32_v1i64
  SADALPv4i16_v2i32
  SADALPv8i8_v4i16
  SADDLPv2i32_v1i64
  SADDLPv4i16_v2i32
  SADDLPv8i8_v4i16
  SADDLVv4i32v
  SCVTFSWDri
  SCVTFSXDri
  SCVTFUWDri
  SCVTFUXDri
  SCVTFd
  SCVTFv1i64
  SCVTFv2f32
  SCVTFv2i32_shift
  SCVTFv4f16
  SCVTFv4i16_shift
  SDOTlanev8i8
  SDOTv8i8
  SHADDv2i32
  SHADDv4i16
  SHADDv8i8
  SHLd
  SHLv2i32_shift
  SHLv4i16_shift
  SHLv8i8_shift
  SHRNv2i32_shift
  SHRNv4i16_shift
  SHRNv8i8_shift
  SHSUBv2i32
  SHSUBv4i16
  SHSUBv8i8
  SLId
  SLIv2i32_shift
  SLIv4i16_shift
  SLIv8i8_shift
  SMAXPv2i32
  SMAXPv4i16
  SMAXPv8i8
  SMAXv2i32
  SMAXv4i16
  SMAXv8i8
  SMINPv2i32
  SMINPv4i16
  SMINPv8i8
  SMINv2i32
  SMINv4i16
  SMINv8i8
  SQABSv1i64
  SQABSv2i32
  SQABSv4i16
  SQABSv8i8
  SQADDv1i64
  SQADDv2i32
  SQADDv4i16
  SQADDv8i8
  SQDMLALi32
  SQDMLALv1i64_indexed
  SQDMLSLi32
  SQDMLSLv1i64_indexed
  SQDMULHv2i32
  SQDMULHv2i32_indexed
  SQDMULHv4i16
  SQDMULHv4i16_indexed
  SQDMULLi32
  SQDMULLv1i64_indexed
  SQNEGv1i64
  SQNEGv2i32
  SQNEGv4i16
  SQNEGv8i8
  SQRDMLAHv2i32
  SQRDMLAHv2i32_indexed
  SQRDMLAHv4i16
  SQRDMLAHv4i16_indexed
  SQRDMLSHv2i32
  SQRDMLSHv2i32_indexed
  SQRDMLSHv4i16
  SQRDMLSHv4i16_indexed
  SQRDMULHv2i32
  SQRDMULHv2i32_indexed
  SQRDMULHv4i16
  SQRDMULHv4i16_indexed
  SQRSHLv1i64
  SQRSHLv2i32
  SQRSHLv4i16
  SQRSHLv8i8
  SQRSHRNv2i32_shift
  SQRSHRNv4i16_shift
  SQRSHRNv8i8_shift
  SQRSHRUNv2i32_shift
  SQRSHRUNv4i16_shift
  SQRSHRUNv8i8_shift
  SQSHLUd
  SQSHLUv2i32_shift
  SQSHLUv4i16_shift
  SQSHLUv8i8_shift
  SQSHLd
  SQSHLv1i64
  SQSHLv2i32
  SQSHLv2i32_shift
  SQSHLv4i16
  SQSHLv4i16_shift
  SQSHLv8i8
  SQSHLv8i8_shift
  SQSHRNv2i32_shift
  SQSHRNv4i16_shift
  SQSHRNv8i8_shift
  SQSHRUNv2i32_shift
  SQSHRUNv4i16_shift
  SQSHRUNv8i8_shift
  SQSUBv1i64
  SQSUBv2i32
  SQSUBv4i16
  SQSUBv8i8
  SQXTNv2i32
  SQXTNv4i16
  SQXTNv8i8
  SQXTUNv2i32
  SQXTUNv4i16
  SQXTUNv8i8
  SRHADDv2i32
  SRHADDv4i16
  SRHADDv8i8
  SRId
  SRIv2i32_shift
  SRIv4i16_shift
  SRIv8i8_shift
  SRSHLv1i64
  SRSHLv2i32
  SRSHLv4i16
  SRSHLv8i8
  SRSHRd
  SRSHRv2i32_shift
  SRSHRv4i16_shift
  SRSHRv8i8_shift
  SRSRAd
  SRSRAv2i32_shift
  SRSRAv4i16_shift
  SRSRAv8i8_shift
  SSHLv1i64
  SSHLv2i32
  SSHLv4i16
  SSHLv8i8
  SSHRd
  SSHRv2i32_shift
  SSHRv4i16_shift
  SSHRv8i8_shift
  SSRAd
  SSRAv2i32_shift
  SSRAv4i16_shift
  SSRAv8i8_shift
  SUBHNv2i64_v2i32
  SUBHNv4i32_v4i16
  SUBHNv8i16_v8i8
  SUBv1i64
  SUBv2i32
  SUBv4i16
  SUBv8i8
  SUDOTlanev8i8
  SUQADDv1i64
  SUQADDv2i32
  SUQADDv4i16
  SUQADDv8i8
  TBLv8i8Four
  TBLv8i8One
  TBLv8i8Three
  TBLv8i8Two
  TBXv8i8Four
  TBXv8i8One
  TBXv8i8Three
  TBXv8i8Two
  TRN1v2i32
  TRN1v4i16
  TRN1v8i8
  TRN2v2i32
  TRN2v4i16
  TRN2v8i8
  UABAv2i32
  UABAv4i16
  UABAv8i8
  UABDv2i32
  UABDv4i16
  UABDv8i8
  UADALPv2i32_v1i64
  UADALPv4i16_v2i32
  UADALPv8i8_v4i16
  UADDLPv2i32_v1i64
  UADDLPv4i16_v2i32
  UADDLPv8i8_v4i16
  UADDLVv4i32v
  UCVTFSWDri
  UCVTFSXDri
  UCVTFUWDri
  UCVTFUXDri
  UCVTFd
  UCVTFv1i64
  UCVTFv2f32
  UCVTFv2i32_shift
  UCVTFv4f16
  UCVTFv4i16_shift
  UDOTlanev8i8
  UDOTv8i8
  UHADDv2i32
  UHADDv4i16
  UHADDv8i8
  UHSUBv2i32
  UHSUBv4i16
  UHSUBv8i8
  UMAXPv2i32
  UMAXPv4i16
  UMAXPv8i8
  UMAXv2i32
  UMAXv4i16
  UMAXv8i8
  UMINPv2i32
  UMINPv4i16
  UMINPv8i8
  UMINv2i32
  UMINv4i16
  UMINv8i8
  UQADDv1i64
  UQADDv2i32
  UQADDv4i16
  UQADDv8i8
  UQRSHLv1i64
  UQRSHLv2i32
  UQRSHLv4i16
  UQRSHLv8i8
  UQRSHRNv2i32_shift
  UQRSHRNv4i16_shift
  UQRSHRNv8i8_shift
  UQSHLd
  UQSHLv1i64
  UQSHLv2i32
  UQSHLv2i32_shift
  UQSHLv4i16
  UQSHLv4i16_shift
  UQSHLv8i8
  UQSHLv8i8_shift
  UQSHRNv2i32_shift
  UQSHRNv4i16_shift
  UQSHRNv8i8_shift
  UQSUBv1i64
  UQSUBv2i32
  UQSUBv4i16
  UQSUBv8i8
  UQXTNv2i32
  UQXTNv4i16
  UQXTNv8i8
  URECPEv2i32
  URHADDv2i32
  URHADDv4i16
  URHADDv8i8
  URSHLv1i64
  URSHLv2i32
  URSHLv4i16
  URSHLv8i8
  URSHRd
  URSHRv2i32_shift
  URSHRv4i16_shift
  URSHRv8i8_shift
  URSQRTEv2i32
  URSRAd
  URSRAv2i32_shift
  URSRAv4i16_shift
  URSRAv8i8_shift
  USDOTlanev8i8
  USDOTv8i8
  USHLv1i64
  USHLv2i32
  USHLv4i16
  USHLv8i8
  USHRd
  USHRv2i32_shift
  USHRv4i16_shift
  USHRv8i8_shift
  USQADDv1i64
  USQADDv2i32
  USQADDv4i16
  USQADDv8i8
  USRAd
  USRAv2i32_shift
  USRAv4i16_shift
  USRAv8i8_shift
  UZP1v2i32
  UZP1v4i16
  UZP1v8i8
  UZP2v2i32
  UZP2v4i16
  UZP2v8i8
  XTNv2i32
  XTNv4i16
  XTNv8i8
  ZIP1v2i32
  ZIP1v4i16
  ZIP1v8i8
  ZIP2v2i32
  ZIP2v4i16
  ZIP2v8i8
2023-05-05 17:26:53 +01:00
Ronak Chauhan
5f0b92e580 [AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149332
2023-05-05 21:12:10 +05:30
Thomas Lively
72a72315b0 [WebAssembly] Mark @llvm.wasm.shuffle lane indices as immediates
This intrinsic is meant to lower directly to the i8x16.shuffle instruction,
which takes its lane index arguments as immediates. The ISel for the intrinsic
assumed that the lane index arguments were constants, so bitcode that
"incorrectly" used this intrinsic with non-immediate arguments caused an
assertion failure in the backend.

Avoid the crash by defining the lane index arguments to be immediates, matching
the underlying instruction. Update ISel accordingly. This change means that the
bitcode that previously caused a crash will now fail to validate.

Fixes #55559.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D149898
2023-05-05 08:12:41 -07:00
David Green
44e7b8aaf1 [AArch64] Tests for implicit zero patterns. NFC
See D149616
2023-05-05 15:12:50 +01:00
Jingu Kang
b18161d785 [AArch64] Handle vector with two different values
If a vector has two different values and can be split into two subvectors
of the same length, generate two DUPs and a CONCAT_VECTORS/VECTOR_SHUFFLE.
For example,

 t22: v16i8 = BUILD_VECTOR t23, t23, t23, t23, t23, t23, t23, t23,
                           t24, t24, t24, t24, t24, t24, t24, t24
==>
   t26: v8i8 = AArch64ISD::DUP t23
   t28: v8i8 = AArch64ISD::DUP t24
 t29: v16i8 = concat_vectors t26, t28
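
The precondition for this transform can be modelled on a plain array. The sketch below is an assumption-laden illustration, not the SelectionDAG code: `is_two_half_splat` is a hypothetical name, and elements are modelled as bytes rather than SDValues.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: true if the first half of elts is one repeated value,
 * the second half is another repeated value, and the two values differ.
 * Such a vector could be built as concat(dup(a), dup(b)). */
static bool is_two_half_splat(const uint8_t *elts, size_t n) {
    if (n < 2 || n % 2 != 0)
        return false;
    size_t half = n / 2;
    for (size_t i = 0; i < half; ++i) {
        if (elts[i] != elts[0] || elts[half + i] != elts[half])
            return false;
    }
    /* A single repeated value is an ordinary splat, not this case. */
    return elts[0] != elts[half];
}
```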

Differential Revision: https://reviews.llvm.org/D148347
2023-05-05 14:42:59 +01:00
Matt Devereau
ea228bd0bd [AArch64] Emit FNMADD instead of FNEG(FMADD)
Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.

Differential Revision: https://reviews.llvm.org/D149260
2023-05-05 13:35:51 +00:00
Serguei Katkov
5b7f8d9da5 [X86] Add tests for fminimum/fmaximum for vector operands.
2023-05-05 18:42:58 +07:00
Matt Devereau
f9ff2468af Revert "[AArch64] Emit FNMADD instead of FNEG(FMADD)"
This reverts commit caa95c2408677d7af8c7be4da203ea9271854f46.
2023-05-05 10:50:23 +00:00
Simon Pilgrim
9a1cb8a856 [X86] Add abds/abdu lowering for scalar i8/i16/i32/i64 types
The next step will be to begin adding generic legalization/lowering support
2023-05-05 11:49:33 +01:00
Matt Devereau
d9acb2aa91 Revert "Add AArch64 requirement for aarch64_fnmadd.ll"
This reverts commit a9919db65a1afa71ac62631d51711383c17d43fc.
2023-05-05 10:49:06 +00:00
Serguei Katkov
50cd2ff7bc [X86] Avoid usage constant -1 for fminimum/fmaximum lowering
Instead of comparing the value for equality with the preferred zero, we can check just
the sign of the value. If the sign is set, we put this value as the second operand for minimum
and the first operand for maximum.
In this case FMIN/FMAX will choose the right result for the 0.f and -0.f comparison.

This allows us to:
1. avoid loading a big 64-bit constant for fminimum.
2. check only the high part of the value for double on non-64-bit platforms.
3. test against zero to check the sign, which takes a smaller instruction.

Additionally, if we know that either value is guaranteed to be non-zero,
we do not need to care about the 0.f and -0.f comparison.
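
The operand ordering above can be checked with a small scalar model. This is a sketch under assumptions, not the lowering itself: `x86_fmin` and `x86_fmax` are hypothetical stand-ins for the x86 FMIN/FMAX behaviour of returning the second operand when the comparison is false (which covers the case where both operands are zeros), and NaN handling is left out.

```c
#include <math.h>

/* Assumed scalar model of x86 FMIN/FMAX: the second operand is returned
 * whenever the comparison is false, e.g. for 0.f versus -0.f. */
static float x86_fmin(float a, float b) { return a < b ? a : b; }
static float x86_fmax(float a, float b) { return a > b ? a : b; }

/* Hypothetical model of the sign-based ordering for fminimum: a value with
 * the sign bit set goes second, so -0.f wins over 0.f. */
static float fminimum_zero_order(float x, float y) {
    return signbit(x) ? x86_fmin(y, x) : x86_fmin(x, y);
}

/* For fmaximum, a value with the sign bit set goes first, so 0.f wins
 * over -0.f. */
static float fmaximum_zero_order(float x, float y) {
    return signbit(x) ? x86_fmax(x, y) : x86_fmax(y, x);
}
```

Only the sign of one operand is inspected, which is why no comparison against a zero constant is needed.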

Reviewed By: e-kud
Differential Revision: https://reviews.llvm.org/D149812
2023-05-05 16:24:33 +07:00
Nicolai Hähnle
ef13308b26 AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors
v2:
- simplify the escape to TableGen patterns

Differential Revision: https://reviews.llvm.org/D149841
2023-05-05 10:55:18 +02:00
Serguei Katkov
96e09fef3c [X86] Avoid usage constant NaN for fminimum/fmaximum lowering
After applying FMIN/FMAX, if either operand is NaN, the second operand will be the result.
So all we need is to check whether the first operand is NaN and return either it or the result of FMIN/FMAX.

This way we avoid using a constant NaN in the lowering.

Additionally, we can avoid handling NaN after FMIN/FMAX if we are sure that the first operand is not NaN.
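
A scalar sketch of this NaN handling, under the same kind of assumption: `minss_like` is a hypothetical model of FMIN that returns its second operand whenever the comparison is false, and `fminimum_nan_model` is an illustrative name, not a function from the patch.

```c
#include <math.h>

/* Assumed model of x86 FMIN: any comparison involving NaN is false, so the
 * second operand is returned when either input is NaN. */
static float minss_like(float a, float b) { return a < b ? a : b; }

/* Hypothetical model of the NaN propagation: a NaN in the second operand
 * already comes out of FMIN, so only the first operand needs a check. */
static float fminimum_nan_model(float x, float y) {
    float r = minss_like(x, y);
    return isnan(x) ? x : r;
}
```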

Reviewed By: e-kud
Differential Revision: https://reviews.llvm.org/D149729
2023-05-05 15:42:54 +07:00
Matt Devereau
a9919db65a Add AArch64 requirement for aarch64_fnmadd.ll
2023-05-05 08:36:05 +00:00
Matt Devereau
caa95c2408 [AArch64] Emit FNMADD instead of FNEG(FMADD)
Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.

Differential Revision: https://reviews.llvm.org/D149260
2023-05-05 08:14:17 +00:00
Fangrui Song
89e02c7aea [test] Update DirectX/min_vec_size.ll after shufflevector mask vector poison change
2023-05-04 22:30:36 -07:00
Craig Topper
38007dd394 [RISCV] Promote i1 shuffles to i8 shuffles.
Otherwise I think we extract and use a build_vector. There may be
some more improvements that can be made and there might be some
cases that we should do something different for, but this seemed like a
decent starting point.

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D149724
2023-05-04 19:44:43 -07:00
Craig Topper
fe9f557578 [DAGCombiner][RISCV] Enable reassociation for VP_FMA in visitFADDForFMACombine.
Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D149911
2023-05-04 17:20:58 -07:00
Craig Topper
f10bcf6f9d [RISCV] Add vp.icmp/fcmp to RISCVTargetLowering::canSplatOperand.
2023-05-04 16:56:14 -07:00
Yeting Kuo
287aa6c453 [DAGCombiner] Use generalized pattern match for visitFSUBForFMACombine.
The patch makes visitFSUBForFMACombine serve vp.fsub too. It helps DAGCombiner
to fuse vp.fsub and vp.fmul patterns to vp.fma.

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D149821
2023-05-04 22:02:32 +08:00
Evgenii Kudriashov
a82d27a9a6 [X86] Support llvm.{min,max}imum.f{16,32,64}
Addresses https://github.com/llvm/llvm-project/issues/53353

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D145634
2023-05-04 21:04:48 +08:00
Evgenii Kudriashov
62f1d91727 [NFC][X86] Remove cfi instructions and unused attributes from half.ll test
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149114
2023-05-04 21:04:48 +08:00
Joseph Huber
f05ce9045a [NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors
This patch mostly adapts the existing AMDGPUCtorDtorLoweringPass for use
by the Nvidia backend. This pass transforms the ctor / dtor list into a
kernel call that can be used to invoke those functions. Furthermore, we
emit globals such that the names and addresses of these constructor
functions can be found by the driver. Unfortunately, since NVPTX has no
way to emit variables at a named section, nor a functioning linker to
provide the begin / end symbols, we need to mangle these names and have
an external application find them.

This work is related to the work in D149398 and D149340.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D149451
2023-05-04 07:13:00 -05:00
Nicolai Hähnle
909095a880 AMDGPU: Precommit test showing codegen weakness
The code sequence on gfx9 has a lot of useless v_bfi instructions.

Differential Revision: https://reviews.llvm.org/D149840
2023-05-04 14:11:04 +02:00
Luke Lau
d9683a70fe [RISCV] Fix extract_vector_elt on i1 at idx 0 being inverted
It looks like the intention here is to truncate a XLenVT -> i1, in
which case we should be emitting snez instead of seqz if I'm understanding
correctly.

Reviewed By: jacquesguan, frasercrmck

Differential Revision: https://reviews.llvm.org/D149732
2023-05-04 11:45:35 +01:00
Tom Weaver
1d8ab713ad Revert "[DebugLine] save one debug line entry for empty prologue"
This reverts commit b48a8233f5e230e46182bf5c523ceb6a04cec8f5.

This change caused https://lab.llvm.org/buildbot/#/builders/247/builds/4125
to start failing, please address the failures before resubmitting.
2023-05-04 11:08:58 +01:00
Luke Lau
9e9bf1e3ed [RISCV] Use setcc to truncate results in widenVectorOpsToi8
To avoid an unnecessary vand.vi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D149771
2023-05-04 10:49:27 +01:00
Chen Zheng
b48a8233f5 [DebugLine] save one debug line entry for empty prologue
Some debuggers like DBX on AIX assume the address in debug line
entries is always incremental. But clang generates two entries (an entry
for the file scope line and an entry for the prologue end) with the same address if
the prologue is empty.

If the prologue is empty, the first debug line entry for the
function seems unnecessary (i.e. removing the first entry won't impact the
behavior in GDB on Linux), so I implement this for all debuggers.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D147506
2023-05-04 04:37:34 +00:00
Shao-Ce SUN
2dc0fa050e [RISCV][CodeGen] Support Zdinx on RV64 codegen
This patch was split from D122918. Co-Author: @liaolucy @realqhc

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D149665
2023-05-04 09:00:40 +08:00
Thomas Lively
abdb5e041c [WebAssembly] Remove incorrect result from wasm64 store_lane instructions
The wasm64 versions of the v128.storeX_lane instructions were incorrectly defined
as returning a v128 value, which resulted in spurious drop instructions being
emitted and causing validation to fail. This was not caught earlier because
wasm64 has been experimental and not well tested. Update the relevant test file
to test both wasm32 and wasm64.

Fixes #62443.

Differential Revision: https://reviews.llvm.org/D149780
2023-05-03 16:00:20 -07:00
Krzysztof Drewniak
fc05b7f0d0 [AMDGPU] Add gfx940 to fp64 atomic tests in global ISel
This changes the test in GlobalISel, which makes it match the test
elsewhere.

Differential Revision: https://reviews.llvm.org/D149795
2023-05-03 22:40:16 +00:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers"
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Mateja Marjanovic
cf76074a36 [AMDGPU][GlobalISel] Check exact width in get*ClassForBitWidth and widen if necessary
Instead of checking if the given bitwidth is less than or equal to the bitwidth of an existing RegClass,
check if it has the exact same value.

For LLVM vector types that don't have a corresponding Register Class, widen them during legalization.
That goes for G_EXTRACT_VECTOR_ELT, G_INSERT_VECTOR_ELT and G_BUILD_VECTOR.

Differential revision: https://reviews.llvm.org/D148096
Reviewers: foad, arsenm
2023-05-03 17:32:24 +02:00
Mateja Marjanovic
6175ec0bb6 Revert "[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR"
This reverts commit b25c7cafcbe1b52ea2d1ff5e5c2f13674b5f297d.
2023-05-03 17:28:01 +02:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them (as ptr addrspace(8)) instead of <4 x i32> as resource
arguments. These pointers are 128 bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, addr_max] value from applying to them.

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
Mateja Marjanovic
b25c7cafcb [AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR
Widen the vector operand type in G_BUILD_VECTOR, G_INSERT_VECTOR_ELT,
G_EXTRACT_VECTOR_ELT to the nearest larger RegClass.
2023-05-03 17:14:38 +02:00
Sp00ph
8e46ac3623 [AArch64] Add more efficient bitwise vector reductions.
Improves the codegen for VECREDUCE_{AND,OR,XOR} operations on AArch64.
Currently, these are fully scalarized, except if the vector is a <N x i1>. This
patch improves the codegen down to O(log(N)) where N is the length of the
vector for vectors whose elements are not i1, by repeatedly applying the
bitwise operations to the two halves of the vector. <N x i1> bitwise reductions
are handled using VECREDUCE_{UMAX,UMIN,ADD} instead.

I had to update quite a few codegen tests with these changes, with a general
downward trend in instruction count. Since the vector reductions already have
tests, I haven't added any new tests myself.
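
The halving strategy can be illustrated on a plain array. This is a scalar sketch under stated assumptions, not the AArch64 lowering: `reduce_xor_halving` is a hypothetical name, the length is assumed to be a power of two, and each pass of the outer loop stands in for one vector-wide operation on two halves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a log2(n)-step XOR reduction: each outer iteration
 * combines the upper half into the lower half, as one vector operation
 * would. n is assumed to be a power of two; v is modified in place. */
static uint32_t reduce_xor_halving(uint32_t *v, size_t n) {
    for (size_t half = n / 2; half >= 1; half /= 2) {
        for (size_t i = 0; i < half; ++i)
            v[i] ^= v[i + half];
    }
    return v[0];
}
```

For eight elements the outer loop runs three times (half = 4, 2, 1), compared with seven dependent operations in a fully scalarized reduction. The same shape applies to AND and OR.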

Differential Revision: https://reviews.llvm.org/D148185
2023-05-03 15:56:16 +01:00
Philip Reames
53710b43a0 [RISCV] Use vslidedown for undef sub-sequences in generic build_vector
This is a follow-up to D149263 which extends the generic vslide1down handling to use vslidedown (without the one) for undef elements, and in particular for undef sub-sequences. This both removes the domain crossing and, for undef sub-sequences, results in fewer instructions overall.

Differential Revision: https://reviews.llvm.org/D149658#inline-1446673
2023-05-03 07:52:29 -07:00
Philip Reames
9fc5af1b84 [RISCV] Use vslide1down lowering for two element non-constant build_vectors
When the values are in GPRs, the vslide1down lowering is always better. We need to greatly improve the splat-and-mask cost model to handle constants in a meaningful way, so for now, limit this to non-constant vectors.

This does send the "partially constant" case down the vslide1down path. This could cause some regressions, though I don't see any in practice.

The cost modeling for the general case is annoyingly tricky. We have a great amount of inconsistency around immediate operands, and as a result, the exact constant and exact lowering choice matters a lot. I'm hoping that we get a "good enough" result without modeling this exactly, but we may need to do something analogous to getIntMatCost (i.e. a search w/costing).

Differential Revision: https://reviews.llvm.org/D149667
2023-05-03 07:35:23 -07:00
WuXinlong
9f0d725744 [RISCV] Add MC support of RISCV zcmt Extension
This patch adds the instructions of the Zcmt extension, which includes
two instructions (cm.jt and cm.jalt) and a CSR register, JVT.
The spec is here: https://github.com/riscv/riscv-code-size-reduction/releases/tag/v1.0.0-RC5.7

co-author: @Scott Egerton

Reviewed By: kito-cheng, craig.topper

Differential Revision: https://reviews.llvm.org/D133863
2023-05-03 22:06:37 +08:00
David Green
b96967ad17 [AArch64] Combine concat through rshrn
This tries to push the concat in trunc(concat(rshr, rshr)) into the leaves, so
that we can generate rshrn(concat). This helps improve the codegen for small
types, using the existing rshrn patterns.

Differential Revision: https://reviews.llvm.org/D149636
2023-05-03 14:48:50 +01:00
David Green
15723e6f8c [AArch64] Additional tests for rshrn patterns. NFC
See D149636
2023-05-03 13:15:27 +01:00
Florian Hahn
4e2b4f97a0 [ShrinkWrap] Use underlying object to rule out stack access.
Allow shrink-wrapping past memory accesses that only access globals or
function arguments. This patch uses getUnderlyingObject to try to
identify the accessed object by a given memory operand. If it is a
global or an argument, it does not access the stack of the current
function and should not block shrink wrapping.

Note that the caller's stack may get accessed when passing an argument
via the stack, but not the stack of the current function.

This addresses part of the TODO from D63152.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D149668
2023-05-03 09:28:07 +01:00
pvanhout
415956fe7e [llvm-readobj][AMDGPU] Bypass MD verification for PAL
Small split change from D146023.

Migrate elf-notes to v4 and fix llvm-readobj to work with PAL metadata.

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D146119
2023-05-03 08:45:24 +02:00
Florian Hahn
bc1c95d973 [ShrinkWrap] Add tests with loads from byval/inalloca/preallocated args.
Extra test coverage for D149668.
2023-05-02 20:41:58 +01:00
Simon Pilgrim
edce93c9d8 [X86] Lower abdu(lhs, rhs) -> or(usubsat(lhs,rhs), usubsat(rhs,lhs))
Adds pre-SSE4 v8i16 abdu handling - we already have something similar for umax(x,y) -> add(x,usubsat(y,x)) / umin(x,y) -> sub(x,usubsat(x,y))

(I'm starting to look at adding generic TargetLowering expandABD() handling and came across this missed opportunity).

Inspiration: http://0x80.pl/notesen/2018-03-11-sse-abs-unsigned.html

Alive2: https://alive2.llvm.org/ce/z/gMhaTa
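
The identity in the title can be checked on scalars. This is a minimal sketch of one 16-bit lane, not the X86 lowering code; `usubsat16` and `abdu16` are hypothetical names.

```c
#include <stdint.h>

/* Unsigned saturating subtract: clamps at 0 instead of wrapping. */
static uint16_t usubsat16(uint16_t a, uint16_t b) {
    return a > b ? (uint16_t)(a - b) : 0;
}

/* Unsigned absolute difference via two saturating subtracts. At most one of
 * the two terms is non-zero, so OR combines them without interference. */
static uint16_t abdu16(uint16_t lhs, uint16_t rhs) {
    return (uint16_t)(usubsat16(lhs, rhs) | usubsat16(rhs, lhs));
}
```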
2023-05-02 19:54:04 +01:00
Florian Hahn
7f8dee5c54 [X86] Remove stale checks after a30c17aba9.
2023-05-02 18:24:50 +01:00