80705 Commits

Author SHA1 Message Date
David Majnemer
5c12434906 [X86] Emit comments explaining the immediate in vfpclass
This makes the assembly a lot more readable at a glance.

As an example:
```
  vfpclasspd $4, %zmm0, %k0 # k0 = isNegativeZero(zmm0)
```
2024-10-29 19:54:34 +00:00
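A rough C++ sketch of how a comment like the one in the vfpclass commit above could be derived from the immediate. The bit-to-class mapping is assumed from the Intel SDM description of VFPCLASS. The function name `describeFPClassImm` and every predicate name other than `isNegativeZero` are hypothetical and are not taken from the LLVM source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds a human-readable predicate for a VFPCLASS imm8.
// Assumed bit layout (bit 0 first): QNaN, +0, -0, +Inf, -Inf,
// denormal, negative finite, SNaN. Names are illustrative only.
std::string describeFPClassImm(std::uint8_t Imm, const std::string &Src) {
  assert(!Src.empty() && "expected a register name such as zmm0");
  static const char *const Names[8] = {
      "isQuietNaN",         "isPositiveZero", "isNegativeZero",
      "isPositiveInfinity", "isNegativeInfinity", "isSubnormal",
      "isNegativeFinite",   "isSignalingNaN"};
  std::string Out;
  for (unsigned Bit = 0; Bit < 8; ++Bit) {
    if (!(Imm & (1u << Bit)))
      continue; // this class is not tested by the immediate
    if (!Out.empty())
      Out += " | ";
    Out += std::string(Names[Bit]) + "(" + Src + ")";
  }
  // An immediate of 0 tests no class at all.
  return Out.empty() ? "false" : Out;
}
```

With this sketch, `describeFPClassImm(4, "zmm0")` yields the `isNegativeZero(zmm0)` text shown in the commit's example.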
Maryam Moghadas
8a0cb9ac86
[PowerPC] Add custom lowering for ssubo (#111748)
This patch improves the codegen for the ssubo node for i32 in 64-bit
mode by custom lowering it.
2024-10-29 15:43:05 -04:00
Adam Yang
3a1228a543
[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic (#111888)
partially fixes #70103

### Changes
* Added int_spv_group_memory_barrier_with_group_sync intrinsic in
IntrinsicsSPIRV.td
* Added lowering for int_spv_group_memory_barrier_with_group_sync in
SPIRVInstructionSelector.cpp
* Added SPIRV backend test case

### Related PRs
* [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111883](https://github.com/llvm/llvm-project/pull/111883)
* [[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111884](https://github.com/llvm/llvm-project/pull/111884)
2024-10-29 12:40:01 -07:00
Min-Yih Hsu
ba65710908
[RISCV] Avoid redundant SchedRead on _TIED VPseudos (#113940)
_TIED and _MASK_TIED pseudos have one less operand compared to other
pseudos, thus we shouldn't attach the same number of SchedRead for these
instructions.

I don't think we have a way to (explicitly) check scheduling classes. So
I only test this patch with existing tests.
2024-10-29 10:49:35 -07:00
Harald van Dijk
950ee75909
[RISC-V] Fix check of minimum vlen. (#114055)
If we have a minimum vlen, we were adjusting StackSize to change the
unit from vscale to bytes, and then calculating the required padding
size for alignment in bytes. However, we then used that padding size as
an offset in vscale units, resulting in misplaced stack objects.

While it would be possible to adjust the object offsets by dividing
AlignmentPadding by ST.getRealMinVLen() / RISCV::RVVBitsPerBlock, we can
simplify the calculation a bit if instead we adjust the alignment to be
in vscale units.

@topperc This fixes a bug I am seeing after #110312, but I am not 100%
certain I am understanding the code correctly, could you please see if
this makes sense to you?
2024-10-29 17:30:30 +00:00
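The unit mix-up described in the RISC-V minimum-vlen fix above can be illustrated with a small C++ sketch. It is not the code from the patch: `paddingMixedUnits`, `paddingVScaleUnits` and the local `offsetToAlignment` helper are hypothetical names, the constant 64 is assumed to match `RISCV::RVVBitsPerBlock`, and the numbers in the note below are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t RVVBitsPerBlock = 64; // assumed value

// Distance from Value up to the next multiple of Align.
std::uint64_t offsetToAlignment(std::uint64_t Value, std::uint64_t Align) {
  assert(Align != 0);
  return (Align - Value % Align) % Align;
}

// Problematic shape: the size is converted to bytes, so the padding that
// comes back is in bytes, yet a caller might apply it as vscale units.
std::uint64_t paddingMixedUnits(std::uint64_t SizeVScale,
                                std::uint64_t AlignBytes,
                                std::uint64_t MinVLen) {
  std::uint64_t VScale = std::max<std::uint64_t>(MinVLen / RVVBitsPerBlock, 1);
  return offsetToAlignment(SizeVScale * VScale, AlignBytes);
}

// Consistent shape: convert the alignment to vscale units instead, so the
// padding is already in the same unit as the object offsets.
std::uint64_t paddingVScaleUnits(std::uint64_t SizeVScale,
                                 std::uint64_t AlignBytes,
                                 std::uint64_t MinVLen) {
  std::uint64_t VScale = std::max<std::uint64_t>(MinVLen / RVVBitsPerBlock, 1);
  std::uint64_t AlignVScale = AlignBytes / VScale;
  if (AlignVScale == 0)
    return 0; // alignment is already guaranteed by the minimum vscale
  return offsetToAlignment(SizeVScale, AlignVScale);
}
```

For a size of 24 vscale units, a 64-byte alignment and a minimum VLEN of 128, the first function returns 16 (bytes). Applied as vscale units that gives (24 + 16) * 2 = 80 bytes, which is not 64-byte aligned. The second returns 8 vscale units, and (24 + 8) * 2 = 64 bytes.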
Adam Yang
9a5b3a1bbc
[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic (#111884)
fixes #112974
partially fixes #70103

### Changes
- Added new tablegen based way of lowering dx intrinsics to DXIL ops.
- Added int_dx_group_memory_barrier_with_group_sync intrinsic in
IntrinsicsDirectX.td
- Added expansion for int_dx_group_memory_barrier_with_group_sync in
DXILIntrinsicExpansion.cpp
- Added DXIL backend test case

### Related PRs
* [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111883](https://github.com/llvm/llvm-project/pull/111883)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
2024-10-29 10:17:35 -07:00
Craig Topper
b1d0fe095b [RISCV] Remove trailing whitespace. NFC 2024-10-29 10:09:28 -07:00
Jubilee
f53889ffca
[RISCV] Allow crypto features to imply dependents (#112659)
This relationship is a logical dependency.

Note Zvbc and Zvknhb. They are explicitly called out in the spec as
requiring 64 bits:
-
56ed7952d1/doc/vector/riscv-crypto-spec-vector.adoc
2024-10-29 10:07:20 -07:00
SpencerAbson
2a9dd8af5a
[AArch64] Add assembly/disassembly for zeroing SVE FCVT{X} and BFCVT (#113916)
This patch adds assembly/disassembly support for the following SVE2.2
instructions

    - FCVT (zeroing)
    - FCVTX (zeroing)
    - BFCVT (zeroing)
    
In accordance with:
https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions
2024-10-29 16:55:19 +00:00
Jay Foad
a156362e93
[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544)
Fixes #54201
2024-10-29 14:59:37 +00:00
Sarah Spall
75e7ba8c0b
[HLSL] Re-implement countbits with the correct return type (#113189)
Restricts hlsl countbits to always return a uint32.
Implements a lowering from llvm.ctpop, which has an overloaded return
type, to the dxil cbits op, which always returns uint32.
Closes #112779
2024-10-29 07:56:05 -07:00
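The return-type rule from the countbits commit above can be modeled in plain C++. This is only a semantic sketch: the template `countbits` here is a hypothetical stand-in, not the HLSL header or the DXIL lowering, and it handles unsigned scalar inputs only.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Population count whose result is always a 32-bit unsigned value,
// regardless of whether the input is 16, 32 or 64 bits wide.
template <typename T>
std::uint32_t countbits(T Value) {
  static_assert(std::is_unsigned<T>::value,
                "sketch only handles unsigned inputs");
  std::size_t Count = std::bitset<sizeof(T) * 8>(Value).count();
  assert(Count <= sizeof(T) * 8);
  return static_cast<std::uint32_t>(Count);
}
```

The point of the sketch is the signature: a 64-bit input still produces a `std::uint32_t`, whereas an operation with an overloaded return type would hand back a 64-bit count.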
Shilei Tian
e268398fa8
[NFC][AMDGPU] Use !foreach to replace explicit list of registers (#114005) 2024-10-29 10:50:06 -04:00
Momchil Velikov
b6a84e77b6
[AArch64] Add assembly/disassembly for FMOP4A (widening, 4-way) instructions (#113347)
The new instructions are described in
https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions
2024-10-29 14:36:07 +00:00
Matt Arsenault
88e23eb2cf
DAG: Fix legalization of vector addrspacecasts (#113964) 2024-10-29 08:08:50 -05:00
Lukacma
3c2d77185e
[AARCH64] Add assembly/disassembly for FMMLA instructions (#113313)
This patch adds assembly/disassembly for the following instructions:
FMMLA (widening, FP16 to FP32)
FMMLA (widening, FP8 to FP16)
FMMLA (widening, FP8 to FP32)

According to [1]

[1] https://developer.arm.com/documentation/ddi0602
2024-10-29 13:02:46 +00:00
Momchil Velikov
ec427df2b9
[AArch64] Add assembly/disassembly for FMOP4{A,S} (non-widening) half-precision instructions (#113343)
The new instructions are described in
https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions
2024-10-29 11:50:29 +00:00
Jay Foad
2443549b85
[IR] Remove some uses of StructType::setBody. NFC. (#113685)
It is simple to create the struct body up front, now that we have
transitioned to opaque pointers.
2024-10-29 11:44:53 +00:00
Lukacma
98c8d64353
[AArch64] Add assembly/disassembly for BFSCALE instructions (#113538)
This patch adds assembly/disassembly for the following instructions:
   BFSCALE (multiple and single vector)
   BFSCALE (multiple vectors)

As specified in https://developer.arm.com/documentation/ddi0602/2024-09

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2024-10-29 11:08:36 +00:00
Benjamin Maxwell
c3260c65e8
[IR] Add llvm.sincos intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.

The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).

```
declare { float, float }          @llvm.sincos.f32(float  %Val)
declare { double, double }        @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincos.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincos.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float>  %Val)
```

The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
2024-10-29 10:52:20 +00:00
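As a reading aid for the `llvm.sincos` entry above, the f32 form behaves roughly like the following C++ function. This is a semantic model only, under the assumption that the struct fields are ordered sine first, as the declarations suggest. `SinCos` and `sincosF32` are hypothetical names, and nothing here reflects how the legalization or lowering is implemented.

```cpp
#include <cmath>

// Mirrors the { float, float } result of @llvm.sincos.f32.
struct SinCos {
  float Sin;
  float Cos;
};

// Computes both results from a single input value.
SinCos sincosF32(float Val) {
  return {std::sin(Val), std::cos(Val)};
}
```

A caller would read the two results as `sincosF32(X).Sin` and `sincosF32(X).Cos`, much as IR would use `extractvalue` on the returned struct.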
CarolineConcatto
8d38fbf2f0
[LLVM][AArch64] Add assembly/disassembly for SVE Integer Unary Arithmetic Predicated instructions (#113670)

This patch adds the following instructions:

SVE bitwise unary operations (predicated)
CLS, CLZ, CNT, CNOT, FABS, FNEG, NOT

SVE integer unary operations (predicated)
SXT{B,H,W}, UXT{B,H,W}, ABS, NEG

SVE2 integer unary operations (predicated)
URECPE, URSQRTE, SQABS, SQNEG

According to https://developer.arm.com/documentation/ddi0602

Co-authored-by: Spencer Abson <Spencer.Abson@arm.com>
2024-10-29 09:09:55 +00:00
CarolineConcatto
d4197f3ac1
[LLVM][AArch64] Add assembly/disassembly for MUL/BFMUL SME instructions (#113535)
According to https://developer.arm.com/documentation/ddi0602

Co-authored-by: Momchil-Velikov <Momchil.Velikov@arm.com>
2024-10-29 09:09:13 +00:00
Alex Bradbury
7544d3af0e
[RISCV] Mark RVB23U64 and RVB23S64 as non-experimental (#113918)
The specification was recently ratified

<https://github.com/riscv/riscv-profiles/blob/main/src/rvb23-profile.adoc>.
2024-10-29 07:57:34 +00:00
Craig Topper
3f4468faaa
[RISCV] Teach expandRV32ZdinxStore to handle memoperand not being present. (#113981)
I received a report that the outliner drops memoperands and causes this
code to crash. Handle this by only copying the memoperand if it exists.

Similarly for expandRV32ZdinxLoad.
2024-10-28 22:37:47 -07:00
Craig Topper
635c344dfb
[X86] Add vector_compress patterns with a zero vector passthru. (#113970)
We can use the kz form to automatically zero the extra elements.

Fixes #113263.
2024-10-28 19:59:00 -07:00
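The effect of a compress with a zero passthru, as in the X86 vector_compress commit above, can be sketched on scalars in C++. This models the semantics only (selected elements packed to the front, the rest zero) and is not the X86 pattern or instruction. `compressZero` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Packs the elements of Vec whose mask bit is set into the low positions
// of the result. All remaining positions are zero, which is what a
// zero-masking form would provide without a separate passthru operand.
template <typename T, std::size_t N>
std::array<T, N> compressZero(const std::array<T, N> &Vec,
                              const std::array<bool, N> &Mask) {
  std::array<T, N> Out{}; // value-initialized, so every element starts at 0
  std::size_t Next = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I])
      Out[Next++] = Vec[I];
  assert(Next <= N);
  return Out;
}
```

For the input `{10, 20, 30, 40}` with mask `{0, 1, 0, 1}`, the result is `{20, 40, 0, 0}`.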
joaosaffran
481bce018e
Adding splitdouble HLSL function (#109331)
- Adding hlsl `splitdouble` intrinsics
- Adding DXIL lowering
- Adding SPIRV lowering
- Adding test

Fixes: #108901

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2024-10-28 13:26:59 -07:00
David Green
8274be509e [AArch64] Remove header dependencies of AArch64ISelLowering.h. NFC
This patch aims to reduce the includes used by AArch64ISelLowering, allowing it
to be included by unittests so that they can reference the AArch64ISD nodes.
It:
 - Moves the inclusion of AArch64SMEAttributes.h to the uses.
 - Moves LowerPtrAuthGlobalAddressStatically to a static function, so that
   AArch64PACKey is not required in the header.
 - Moves the definitions of getExceptionPointerRegister to the cpp file, to
   remove the reference of AArch64::X0.
2024-10-28 18:53:37 +00:00
David Green
5a5b78a84e [AArch64][GlobalISel] Lower aarch64.neon.smull/umull intrinsics.
As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64
instructions.
2024-10-28 18:51:10 +00:00
Shilei Tian
4cf128512b
[NFC][AMDGPU] Use C++17 structured bindings as much as possible (#113939)
This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`.
There are five uses of `std::tie` remaining because they can't be
replaced with C++17 structured bindings.
2024-10-28 13:55:57 -04:00
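For readers unfamiliar with the distinction in the structured-bindings commit above, here is a small C++17 sketch. The commit does not say why the five remaining `std::tie` uses could not be converted; assigning into variables that already exist is one common reason, shown here purely as an assumption. `divMod`, `sumQuotRem` and `lastRemainder` are hypothetical helpers.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Returns quotient and remainder as a pair.
std::pair<int, int> divMod(int A, int B) {
  assert(B != 0);
  return {A / B, A % B};
}

// A structured binding declares two new names in one statement.
int sumQuotRem(int A, int B) {
  auto [Quot, Rem] = divMod(A, B);
  return Quot + Rem;
}

// std::tie assigns into existing variables, which a structured binding
// cannot do because it always introduces new names.
int lastRemainder(int A, int B, int C) {
  int Quot = 0, Rem = 0;
  std::tie(Quot, Rem) = divMod(A, B);
  std::tie(Quot, Rem) = divMod(Quot, C); // reuses Quot and Rem
  return Rem;
}
```

Here `sumQuotRem(17, 5)` returns 5 and `lastRemainder(17, 5, 2)` returns 1.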
Ellis Hoag
6ab26eab4f
Check hasOptSize() in shouldOptimizeForSize() (#112626) 2024-10-28 09:45:03 -07:00
Min-Yih Hsu
ab5d3c9d35
[RISCV] Assign different scheduling classes for VMADC/VMSBC (#113009)
Split the scheduling classes of VMADC/VMSBC away from that of VADC/VSBC,
because the former are technically mask-producing instructions rather
than normal vector arithmetic instructions, and so might have different
performance characteristics on some processors.

This is effectively NFC.
2024-10-28 09:37:54 -07:00
CarolineConcatto
106259510f
[AArch64]Add convert and multiply-add SIMD&FP assembly/disassembly instructions (#113296)

This patch adds the following instructions:
Conversion between floating-point and integer:
  FCVT{AS, AU, MS, MU, NS, NU, PS, PU, ZS, ZU}
  {S,U}CVTF
Advanced SIMD three-register extension:
  FMMLA

According to https://developer.arm.com/documentation/ddi0602

Co-authored-by: Marian Lukac <marian.lukac@arm.com>
Co-authored-by: Spencer Abson <spencer.abson@arm.com>
2024-10-28 16:36:02 +00:00
Aiden Grossman
eb53d08bce
[llvm-exegesis] Add Pfm Counters for SapphireRapids (#113847)
This patch adds the appropriate hookups in X86PfmCounters.td for
SapphireRapids. This is mostly to fix errors when some of my jobs that
only really need dummy counters get scheduled on sapphire rapids
machines, but figured I might as well do it properly while here. I do
not have hardware access to test this currently, but this matches
exactly with what is in the libpfm source code.
2024-10-28 09:07:14 -07:00
Craig Topper
5ac3f3c45c
[RISCV] Add DestEEW = EEW1 to VMADC. (#113013)
It was present on VMSBC but not VMADC. Reorder the instructions to avoid
duplicate 'let' statements.
2024-10-28 09:06:12 -07:00
SpencerAbson
ce0368eb84
[AArch64] Add assembly/disassembly for PMLAL/PMULL instructions (#113564)
This patch adds assembly/disassembly for the following SVE_AES2
instructions

    -  PMLAL
    -  PMULL
- In accordance with:
https://developer.arm.com/documentation/ddi0602/latest/
2024-10-28 13:55:16 +00:00
Benjamin Maxwell
ddd463be7e
[AArch64] Add getStreamingHazardSize() to AArch64Subtarget (#113679)
This is defined by the `-aarch64-streaming-hazard-size` option or its
alias `-aarch64-stack-hazard-size` (the original name). It has been
renamed to be more general as this option will (for the time being) be
used to detect if the current target has streaming mode memory hazards.

---------

Co-authored-by: Hari Limaye <hari.limaye@arm.com>
2024-10-28 13:01:22 +00:00
Alex Bradbury
ba7555e640
[RISCV] Mark the RVA23S64 and RVA23U64 profiles as non-experimental (#113826)
All of the extensions used by these profiles are themselves
non-experimental, and RVA23 was just ratified

<https://riscv.org/announcements/2024/10/risc-v-announces-ratification-of-the-rva23-profile-standard/>.

<https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc>

We lack a way of expressing `Ss1p13` (supervisor architecture 1.13), but
this is a problem we have for RVA22 (Ss1p12) and RVA20 (Ss1p11) so I
don't feel it's a blocker.
2024-10-28 12:56:47 +00:00
dong-miao
75c75fc16e
[RISCV]Add svvptc extension (#113882) 2024-10-28 22:54:51 +11:00
Luke Lau
0cbccb13d6
[RISCV] Remove support for pre-RA vsetvli insertion (#110796)
Now that LLVM 19.1.0 has been out for a while with post-vector-RA
vsetvli insertion enabled by default, this proposes to remove the flag
that restores the old pre-RA behaviour so we only have one configuration
going forward.

That flag was mainly meant as a fallback in case users ran into issues,
but I haven't seen anything reported so far.
2024-10-28 11:31:18 +00:00
SpencerAbson
64148944c5
[AArch64] Add assembly/disassembly for zeroing SVE2 integer instructions (#113473)
This patch adds assembly/disassembly for the following SVE2.2
instructions

    -  SQABS  (zeroing)
    -  SQNEG  (zeroing)
    -  URECPE (zeroing)
    -  URSQRTE (zeroing)

- Refactor the existing merging forms to remove the now redundant bit 17
argument.
- In accordance with:
https://developer.arm.com/documentation/ddi0602/latest/
2024-10-28 10:41:07 +00:00
Jonas Paulsson
09160a9821
[SystemZ] Silence compiler warning (#113894)
Use SystemZ::NoRegister instead of 0 in
SystemZTargetLowering::getRegisterByName().
2024-10-28 11:32:39 +01:00
Mirko Brkušanin
fa4790e404
[AMDGPU][MC] Fix disassembler for VIMAGE when non-first vaddr is v0 (#113569)
For disassembler tables we use *V1_V4* variants for VIMAGE and then
remove unused vaddr fields. *V1_V1* variant, which has every vaddr
field other than vaddr0 set to 0, was also enabled and caused confusion
when decoding cases which used v0 (whose encoded value is 0)
2024-10-28 10:43:18 +01:00
Luke Lau
96f5c68350
[RISCV] Lower @llvm.experimental.vector.compress for zvfhmin/zvfbfmin (#113770)
This is a follow-up to #113291 and handles f16/bf16 with zvfhmin and
zvfbfmin.
2024-10-28 09:37:06 +00:00
Alex Bradbury
43a5719d9f
[RISCV] Use Sha extension in RVA23S64 profile (#113823)
In the ratified version of the RVA23S64 definition, the Sha extension is
now used to group together the set of hypervisor related extensions.

<https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc>
2024-10-28 09:22:09 +00:00
Oliver Stannard
dff114b356
[ARM] Optimise non-ABI frame pointers (#110286)
With -fomit-frame-pointer, even if we set up a frame pointer for other
reasons (e.g. variable-sized or over-aligned stack allocations), we
don't need to create an ABI-compliant frame record. This means that we
can save all of the general-purpose registers in one push, instead of
splitting it to ensure that the frame pointer and link register are
adjacent on the stack, saving two instructions per function.
2024-10-28 09:01:06 +00:00
Jack Styles
86f76c3b17
[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171)
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.

For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
2024-10-28 08:22:38 +00:00
Fabian Ritter
a4fd3dba6e
[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332)
When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in
LowerMemIntrinsics.cpp, the loop consists of a single load/store pair
per iteration. We can improve performance in some cases by emitting
multiple load/store pairs per iteration. This patch achieves that by
increasing the width of the loop lowering type in the GCN target and
letting legalization split the resulting too-wide access pairs into
multiple legal access pairs.

This change only affects lowered memcpys and memmoves with large (>=
1024 bytes) constant lengths. Smaller constant lengths are handled by
ISel directly; non-constant lengths would be slowed down by this change
if the dynamic length was smaller or slightly larger than what an
unrolled iteration copies.

The chosen default unroll factor is the result of microbenchmarks on
gfx1030. This change leads to speedups of 15-38% for global memory and
1.9-5.8x for scratch in these microbenchmarks.

Part of SWDEV-455845.
2024-10-28 09:04:19 +01:00
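The loop shape described in the AMDGPU LowerMemIntrinsics commit above can be sketched in ordinary C++. This is an analogy on the host, not the IR the pass emits: `copyWide` is a hypothetical name and the 64-byte `LoopBytes` is an arbitrary illustrative width, not the default chosen in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies Len bytes using a main loop that moves one wide chunk per
// iteration, followed by a byte-wise residual loop for the tail.
void copyWide(unsigned char *Dst, const unsigned char *Src, std::size_t Len) {
  assert(Len == 0 || (Dst != nullptr && Src != nullptr));
  constexpr std::size_t LoopBytes = 64; // illustrative width only
  std::size_t I = 0;
  // Main loop: one wide access pair per iteration. In the commit's scheme,
  // legalization would split such an access into several legal ones.
  for (; I + LoopBytes <= Len; I += LoopBytes)
    std::memcpy(Dst + I, Src + I, LoopBytes);
  // Residual loop: handles whatever the wide loop could not cover.
  for (; I < Len; ++I)
    Dst[I] = Src[I];
}
```

The sketch also hints at the trade-off the commit mentions for non-constant lengths: when `Len` is smaller than or only slightly larger than `LoopBytes`, most of the work falls to the slow residual loop.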
Alex Bradbury
35f6cc6af0
[RISCV] Add the Sha extension (#113820)
This was introduced in the now-ratified RVA23 profile (and also added to
the RVA22 text) as a simple way of referring to H plus the set of
supervisor extensions required by RVA23.
https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc

This patch simply defines the extension. The next patch will adjust the
RVA23 profile to use it, and at that point I think we will be ready to
mark RVA23 as non-experimental.

Note that I haven't made it so if you enable all extensions that
constitute Sha, Sha is implied. Per #76893 (adding 'B'), the concern is
making this implication might break older external assemblers. Perhaps
this is less of a concern given the relative frequency of
`-march=${foo}_zba_zbb_zbs` vs the collection of H extensions. If we did
want to add that implication, we'd probably want to add it in a separate
patch so it can be easily reverted if found to cause problems.
2024-10-28 07:42:33 +00:00
Phoebe Wang
fd85761208
[X86][BF16] Customize VSELECT for BF16 under AVX-NECONVERT (#113322)
Fixes: https://godbolt.org/z/9abGnE8zs
2024-10-28 15:15:49 +08:00
Freddy Ye
d3f70db51c
[X86][MC] Support instructions of MSR_IMM (#113524)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-28 12:59:51 +08:00
Freddy Ye
5aa1275d03
[X86] Support SM4 EVEX version intrinsics/instructions. (#113402)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-28 10:46:16 +08:00