llvm-project

Author	SHA1	Message	Date
David Majnemer	5c12434906	[X86] Emit comments explaining the immediate in vfpclass This makes the assembly a lot more readable at a glance. As an example: ``` vfpclasspd $4, %zmm0, %k0 # k0 = isNegativeZero(zmm0) ```	2024-10-29 19:54:34 +00:00
Maryam Moghadas	8a0cb9ac86	[PowerPC] Add custom lowering for ssubo (#111748 ) This patch improves codegen for the i32 ssubo node in 64-bit mode by custom lowering it.	2024-10-29 15:43:05 -04:00
Adam Yang	3a1228a543	[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic (#111888 ) partially fixes #70103 ### Changes * Added int_spv_group_memory_barrier_with_group_sync intrinsic in IntrinsicsSPIRV.td * Added lowering for int_spv_group_memory_barrier_with_group_sync in SPIRVInstructionSelector.cpp * Added SPIRV backend test case ### Related PRs * [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic #111883](https://github.com/llvm/llvm-project/pull/111883) * [[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic #111884](https://github.com/llvm/llvm-project/pull/111884)	2024-10-29 12:40:01 -07:00
Min-Yih Hsu	ba65710908	[RISCV] Avoid redundant SchedRead on _TIED VPseudos (#113940 ) _TIED and _MASK_TIED pseudos have one less operand compared to other pseudos, thus we shouldn't attach the same number of SchedRead for these instructions. I don't think we have a way to (explicitly) check scheduling classes. So I only test this patch with existing tests.	2024-10-29 10:49:35 -07:00
Harald van Dijk	950ee75909	[RISC-V] Fix check of minimum vlen. (#114055 ) If we have a minimum vlen, we were adjusting StackSize to change the unit from vscale to bytes, and then calculating the required padding size for alignment in bytes. However, we then used that padding size as an offset in vscale units, resulting in misplaced stack objects. While it would be possible to adjust the object offsets by dividing AlignmentPadding by ST.getRealMinVLen() / RISCV::RVVBitsPerBlock, we can simplify the calculation a bit if instead we adjust the alignment to be in vscale units. @topperc This fixes a bug I am seeing after #110312, but I am not 100% certain I am understanding the code correctly, could you please see if this makes sense to you?	2024-10-29 17:30:30 +00:00
Adam Yang	9a5b3a1bbc	[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic (#111884 ) fixes #112974 partially fixes #70103 ### Changes - Added new tablegen based way of lowering dx intrinsics to DXIL ops. - Added int_dx_group_memory_barrier_with_group_sync intrinsic in IntrinsicsDirectX.td - Added expansion for int_dx_group_memory_barrier_with_group_sync in DXILIntrinsicExpansion.cpp - Added DXIL backend test case ### Related PRs * [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic #111883](https://github.com/llvm/llvm-project/pull/111883) * [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic #111888](https://github.com/llvm/llvm-project/pull/111888)	2024-10-29 10:17:35 -07:00
Craig Topper	b1d0fe095b	[RISCV] Remove trailing whitespace. NFC	2024-10-29 10:09:28 -07:00
Jubilee	f53889ffca	[RISCV] Allow crypto features to imply dependents (#112659 ) This relationship is a logical dependency. Note Zvbc and Zvknhb. They are explicitly called out in the spec as requiring 64 bits: - `56ed7952d1/doc/vector/riscv-crypto-spec-vector.adoc`	2024-10-29 10:07:20 -07:00
SpencerAbson	2a9dd8af5a	[AArch64] Add assembly/disassembly for zeroing SVE FCVT{X} and BFCVT (#113916 ) This patch adds assembly/disassembly support for the following SVE2.2 instructions - FCVT (zeroing) - FCVTX (zeroing) - BFCVT (zeroing) In accordance with: https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions	2024-10-29 16:55:19 +00:00
Jay Foad	a156362e93	[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544 ) Fixes #54201	2024-10-29 14:59:37 +00:00
Sarah Spall	75e7ba8c0b	[HLSL] Re-implement countbits with the correct return type (#113189 ) Restricts hlsl countbits to always return a uint32. Implements a lowering from llvm.ctpop which has an overloaded return type to dxil cbits op which always returns uint32. Closes #112779	2024-10-29 07:56:05 -07:00
Shilei Tian	e268398fa8	[NFC][AMDGPU] Use `!foreach` to replace explicit list of registers (#114005 )	2024-10-29 10:50:06 -04:00
Momchil Velikov	b6a84e77b6	[AArch64] Add assembly/disassembly for FMOP4A (widening, 4-way) instructions (#113347 ) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions	2024-10-29 14:36:07 +00:00
Matt Arsenault	88e23eb2cf	DAG: Fix legalization of vector addrspacecasts (#113964 )	2024-10-29 08:08:50 -05:00
Lukacma	3c2d77185e	[AARCH64] Add assembly/disassembly for FMMLA instructions (#113313 ) This patch adds assembly/disassembly for the following instructions: FMMLA (widening, FP16 to FP32) FMMLA (widening, FP8 to FP16) FMMLA (widening, FP8 to FP32) According to [1] [1] https://developer.arm.com/documentation/ddi0602	2024-10-29 13:02:46 +00:00
Momchil Velikov	ec427df2b9	[AArch64] Add assembly/disassembly for FMOP4{A,S} (non-widening) half-precision instructions (#113343 ) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions	2024-10-29 11:50:29 +00:00
Jay Foad	2443549b85	[IR] Remove some uses of StructType::setBody. NFC. (#113685 ) It is simple to create the struct body up front, now that we have transitioned to opaque pointers.	2024-10-29 11:44:53 +00:00
Lukacma	98c8d64353	[AArch64] Add assembly/disassembly for BFSCALE instructions (#113538 ) This patch adds assembly/disassembly for the following instructions: BFSCALE (multiple and single vector) BFSCALE (multiple vectors) As specified in https://developer.arm.com/documentation/ddi0602/2024-09 Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>	2024-10-29 11:08:36 +00:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
CarolineConcatto	8d38fbf2f0	[LLVM][AArch64] Add assembly/disassembly for SVE Integer Unary Arithmetic Predicated instructions (#113670 ) This patch adds the following instructions: SVE bitwise unary operations (predicated) CLS, CLZ, CNT, CNOT, FABS, FNEG, NOT SVE integer unary operations (predicated) SXT{B,H,W}, UXT{B,H,W}, ABS, NEG SVE2 integer unary operations (predicated) URECPE, URSQRTE, SQABS, SQNEG According to https://developer.arm.com/documentation/ddi0602 Co-authored-by: Spencer Abson <Spencer.Abson@arm.com>	2024-10-29 09:09:55 +00:00
CarolineConcatto	d4197f3ac1	[LLVM][AArch64] Add assembly/disassembly for MUL/BFMUL SME instructions (#113535 ) According to https://developer.arm.com/documentation/ddi0602 Co-authored-by: Momchil-Velikov <Momchil.Velikov@arm.com>	2024-10-29 09:09:13 +00:00
Alex Bradbury	7544d3af0e	[RISCV] Mark RVB23U64 and RVB23S64 as non-experimental (#113918 ) The specification was recently ratified <https://github.com/riscv/riscv-profiles/blob/main/src/rvb23-profile.adoc>.	2024-10-29 07:57:34 +00:00
Craig Topper	3f4468faaa	[RISCV] Teach expandRV32ZdinxStore to handle memoperand not being present. (#113981 ) I received a report that the outliner drops memoperands and causes this code to crash. Handle this by only copying the memoperand if it exists. Similar for expandRV32ZdinxLoad	2024-10-28 22:37:47 -07:00
Craig Topper	635c344dfb	[X86] Add vector_compress patterns with a zero vector passthru. (#113970 ) We can use the kz form to automatically zero the extra elements. Fixes #113263.	2024-10-28 19:59:00 -07:00
joaosaffran	481bce018e	Adding splitdouble HLSL function (#109331 ) - Adding hlsl `splitdouble` intrinsics - Adding DXIL lowering - Adding SPIRV lowering - Adding test Fixes: #108901 --------- Co-authored-by: Joao Saffran <jderezende@microsoft.com>	2024-10-28 13:26:59 -07:00
David Green	8274be509e	[AArch64] Remove header dependencies of AArch64ISelLowering.h. NFC This patch aims to reduce the includes used by AArch64ISelLowering, allowing it to be included by unittests so that they can reference the AArch64ISD nodes. It: - Moves the inclusion of AArch64SMEAttributes.h to the uses. - Moves LowerPtrAuthGlobalAddressStatically to a static function, so that AArch64PACKey is not required in the header. - Moves the definitions of getExceptionPointerRegister to the cpp file, to remove the reference to AArch64::X0.	2024-10-28 18:53:37 +00:00
David Green	5a5b78a84e	[AArch64][GlobalISel] Lower aarch64.neon.smull/umull intrinsics. As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64 instructions.	2024-10-28 18:51:10 +00:00
Shilei Tian	4cf128512b	[NFC][AMDGPU] Use C++17 structured bindings as much as possible (#113939 ) This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`. There are five uses of `std::tie` remaining because they can't be replaced with C++17 structured bindings.	2024-10-28 13:55:57 -04:00
Ellis Hoag	6ab26eab4f	Check hasOptSize() in shouldOptimizeForSize() (#112626 )	2024-10-28 09:45:03 -07:00
Min-Yih Hsu	ab5d3c9d35	[RISCV] Assign different scheduling classes for VMADC/VMSBC (#113009 ) Split the scheduling classes of VMADC/VMSBC away from that of VADC/VSBC, because the former are technically mask-producing instructions rather than normal vector arithmetic and might have different performance characteristics on some processors. This is effectively NFC.	2024-10-28 09:37:54 -07:00
CarolineConcatto	106259510f	[AArch64]Add convert and multiply-add SIMD&FP assembly/disassembly instructions (#113296 ) This patch adds the following instructions: Conversion between floating-point and integer: FCVT{AS, AU, MS, MU, NS, NU, PS, PU, ZS, ZU} {S,U}CVTF Advanced SIMD three-register extension: FMMLA According to https://developer.arm.com/documentation/ddi0602 Co-authored-by: Marian Lukac <marian.lukac@arm.com> Co-authored-by: Spencer Abson <spencer.abson@arm.com>	2024-10-28 16:36:02 +00:00
Aiden Grossman	eb53d08bce	[llvm-exegesis] Add Pfm Counters for SapphireRapids (#113847 ) This patch adds the appropriate hookups in X86PfmCounters.td for SapphireRapids. This is mostly to fix errors when some of my jobs that only really need dummy counters get scheduled on sapphire rapids machines, but figured I might as well do it properly while here. I do not have hardware access to test this currently, but this matches exactly with what is in the libpfm source code.	2024-10-28 09:07:14 -07:00
Craig Topper	5ac3f3c45c	[RISCV] Add DestEEW = EEW1 to VMADC. (#113013 ) It was present on VMSBC but not VMADC. Reorder the instructions to avoid duplicate 'let' statements.	2024-10-28 09:06:12 -07:00
SpencerAbson	ce0368eb84	[AArch64] Add assembly/disassembly for PMLAL/PMULL instructions (#113564 ) This patch adds assembly/disassembly for the following SVE_AES2 instructions - PMLAL - PMULL In accordance with: https://developer.arm.com/documentation/ddi0602/latest/	2024-10-28 13:55:16 +00:00
Benjamin Maxwell	ddd463be7e	[AArch64] Add getStreamingHazardSize() to AArch64Subtarget (#113679 ) This is defined by the `-aarch64-streaming-hazard-size` option or its alias `-aarch64-stack-hazard-size` (the original name). It has been renamed to be more general as this option will (for the time being) be used to detect if the current target has streaming mode memory hazards. --------- Co-authored-by: Hari Limaye <hari.limaye@arm.com>	2024-10-28 13:01:22 +00:00
Alex Bradbury	ba7555e640	[RISCV] Mark the RVA23S64 and RVA23U64 profiles as non-experimental (#113826 ) All of the extensions used by these profiles are themselves non-experimental, and RVA23 was just ratified <https://riscv.org/announcements/2024/10/risc-v-announces-ratification-of-the-rva23-profile-standard/>. <https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc> We lack a way of expressing `Ss1p13` (supervisor architecture 1.13), but this is a problem we have for RVA22 (Ss1p12) and RVA20 (Ss1p11) so I don't feel it's a blocker.	2024-10-28 12:56:47 +00:00
dong-miao	75c75fc16e	[RISCV]Add svvptc extension (#113882 )	2024-10-28 22:54:51 +11:00
Luke Lau	0cbccb13d6	[RISCV] Remove support for pre-RA vsetvli insertion (#110796 ) Now that LLVM 19.1.0 has been out for a while with post-vector-RA vsetvli insertion enabled by default, this proposes to remove the flag that restores the old pre-RA behaviour so we only have one configuration going forward. That flag was mainly meant as a fallback in case users ran into issues, but I haven't seen anything reported so far.	2024-10-28 11:31:18 +00:00
SpencerAbson	64148944c5	[AArch64] Add assembly/disassembly for zeroing SVE2 integer instructions (#113473 ) This patch adds assembly/disassembly for the following SVE2.2 instructions - SQABS (zeroing) - SQNEG (zeroing) - URECPE (zeroing) - URSQRTE (zeroing) - Refactor the existing merging forms to remove the now redundant bit 17 argument. - In accordance with: https://developer.arm.com/documentation/ddi0602/latest/	2024-10-28 10:41:07 +00:00
Jonas Paulsson	09160a9821	[SystemZ] Silence compiler warning (#113894 ) Use SystemZ::NoRegister instead of 0 in SystemZTargetLowering::getRegisterByName().	2024-10-28 11:32:39 +01:00
Mirko Brkušanin	fa4790e404	[AMDGPU][MC] Fix disassembler for VIMAGE when non-first vaddr is v0 (#113569 ) For disassembler tables we use V1_V4 variants for VIMAGE and then remove unused vaddr fields. V1_V1 variant, which has every vaddr field other than vaddr0 set to 0, was also enabled and caused confusion when decoding cases which used v0 (whose encoded value is 0)	2024-10-28 10:43:18 +01:00
Luke Lau	96f5c68350	[RISCV] Lower @llvm.experimental.vector.compress for zvfhmin/zvfbfmin (#113770 ) This is a follow up to #113291 and handles f16/bf16 with zvfhmin and zvfbfmin.	2024-10-28 09:37:06 +00:00
Alex Bradbury	43a5719d9f	[RISCV] Use Sha extension in RVA23S64 profile (#113823 ) In the ratified version of the RVA23S64 definition, the Sha extension is now used to group together the set of hypervisor related extensions. <https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc>	2024-10-28 09:22:09 +00:00
Oliver Stannard	dff114b356	[ARM] Optimise non-ABI frame pointers (#110286 ) With -fomit-frame-pointer, even if we set up a frame pointer for other reasons (e.g. variable-sized or over-aligned stack allocations), we don't need to create an ABI-compliant frame record. This means that we can save all of the general-purpose registers in one push, instead of splitting it to ensure that the frame pointer and link register are adjacent on the stack, saving two instructions per function.	2024-10-28 09:01:06 +00:00
Jack Styles	86f76c3b17	[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171 ) As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced, `DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that the PC has been used with the signing instruction. This change includes three commits - Libunwind support for the newly introduced DWARF Instruction - CodeGen Support for the DWARF Instructions - Reversing the changes made in #96377. Due to `DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed immediately after the signing instruction, this would mean the CFI Instruction location was not consistent with the generated location when not using FEAT_PAuthLR. The commit reverses the changes and makes the location consistent across the different branch protection options. While this does have a code size effect, this is a negligible one. For the ABI information, see here: `853286c7ab/aadwarf64/aadwarf64.rst (id23)`	2024-10-28 08:22:38 +00:00
Fabian Ritter	a4fd3dba6e	[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332 ) When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in LowerMemIntrinsics.cpp, the loop consists of a single load/store pair per iteration. We can improve performance in some cases by emitting multiple load/store pairs per iteration. This patch achieves that by increasing the width of the loop lowering type in the GCN target and letting legalization split the resulting too-wide access pairs into multiple legal access pairs. This change only affects lowered memcpys and memmoves with large (>= 1024 bytes) constant lengths. Smaller constant lengths are handled by ISel directly; non-constant lengths would be slowed down by this change if the dynamic length was smaller or slightly larger than what an unrolled iteration copies. The chosen default unroll factor is the result of microbenchmarks on gfx1030. This change leads to speedups of 15-38% for global memory and 1.9-5.8x for scratch in these microbenchmarks. Part of SWDEV-455845.	2024-10-28 09:04:19 +01:00
Alex Bradbury	35f6cc6af0	[RISCV] Add the Sha extension (#113820 ) This was introduced in the now-ratified RVA23 profile (and also added to the RVA22 text) as a simple way of referring to H plus the set of supervisor extensions required by RVA23. https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc This patch simply defines the extension. The next patch will adjust the RVA23 profile to use it, and at that point I think we will be ready to mark RVA23 as non-experimental. Note that I haven't made it so if you enable all extensions that constitute Sha, Sha is implied. Per #76893 (adding 'B'), the concern is making this implication might break older external assemblers. Perhaps this is less of a concern given the relative frequency of `-march=${foo}_zba_zbb_zbs` vs the collection of H extensions. If we did want to add that implication, we'd probably want to add it in a separate patch so it can be easily reverted if found to cause problems.	2024-10-28 07:42:33 +00:00
Phoebe Wang	fd85761208	[X86][BF16] Customize VSELECT for BF16 under AVX-NECONVERT (#113322 ) Fixes: https://godbolt.org/z/9abGnE8zs	2024-10-28 15:15:49 +08:00
Freddy Ye	d3f70db51c	[X86][MC] Support instructions of MSR_IMM (#113524 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-28 12:59:51 +08:00
Freddy Ye	5aa1275d03	[X86] Support SM4 EVEX version intrinsics/instructions. (#113402 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-28 10:46:16 +08:00
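
The `llvm.sincos` commit listed above (c3260c65e8) describes an intrinsic that takes one floating-point value and returns both the sine and the cosine as a struct. A minimal C++ sketch of that shape follows. The names `SinCosResult` and `sincosSketch` are hypothetical and chosen only for illustration; the sketch calls `std::sin` and `std::cos` separately and does not reflect how the actual lowering is implemented.

```cpp
#include <cmath>

// Hypothetical stand-in for the { float, float } struct returned by
// @llvm.sincos.f32; the field names are illustrative only.
struct SinCosResult {
  float Sin;
  float Cos;
};

// Returns the sine and cosine of one input together, mirroring the
// intrinsic's signature. This is an assumption-level model: it simply
// evaluates std::sin and std::cos independently.
SinCosResult sincosSketch(float Val) {
  return {std::sin(Val), std::cos(Val)};
}
```

A caller would read both results from the returned struct, for example `SinCosResult R = sincosSketch(X);` followed by uses of `R.Sin` and `R.Cos`.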

80705 Commits