llvm-project

Author	SHA1	Message	Date
David Majnemer	5c12434906	[X86] Emit comments explaining the immediate in vfpclass This makes the assembly a lot more readable at a glance. As an example: ``` vfpclasspd $4, %zmm0, %k0 # k0 = isNegativeZero(zmm0) ```	2024-10-29 19:54:34 +00:00
Maryam Moghadas	8a0cb9ac86	[PowerPC] Add custom lowering for ssubo (#111748 ) This patch is to improve the codegen for ssubo node for i32 in 64-bit mode by custom lowering.	2024-10-29 15:43:05 -04:00
Adam Yang	3a1228a543	[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic (#111888 ) partially fixes #70103 ### Changes * Added int_spv_group_memory_barrier_with_group_sync intrinsic in IntrinsicsSPIRV.td * Added lowering for int_spv_group_memory_barrier_with_group_sync in SPIRVInstructionSelector.cpp * Added SPIRV backend test case ### Related PRs * [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic #111883](https://github.com/llvm/llvm-project/pull/111883) * [[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic #111884](https://github.com/llvm/llvm-project/pull/111884)	2024-10-29 12:40:01 -07:00
Harald van Dijk	950ee75909	[RISC-V] Fix check of minimum vlen. (#114055 ) If we have a minimum vlen, we were adjusting StackSize to change the unit from vscale to bytes, and then calculating the required padding size for alignment in bytes. However, we then used that padding size as an offset in vscale units, resulting in misplaced stack objects. While it would be possible to adjust the object offsets by dividing AlignmentPadding by ST.getRealMinVLen() / RISCV::RVVBitsPerBlock, we can simplify the calculation a bit if instead we adjust the alignment to be in vscale units. @topperc This fixes a bug I am seeing after #110312, but I am not 100% certain I am understanding the code correctly, could you please see if this makes sense to you?	2024-10-29 17:30:30 +00:00
Afanasyev Ivan	4e1b9d34f9	[mir-strip-debug] Fix debug location info strip for bundled instructions (#113676 ) Fix bug that `mir-strip-debug` pass does not remove debug location from bundled instructions. Problem arises during testing that debug info does not affect optimization passes output (`llvm-lit` with ` -Dllc="llc -debugify-and-strip-all-safe"`), when pass operates on MIR with bundled instructions + memory operands. Let mir test check looks like: ``` CHECK-NEXT: BUNDLE { CHECK-NEXT: $r3 = LD $r1, $r2 :: (load (s64) from %ir.a, !tbaa !2) CHECK-NEXT: } ``` So as `mir-strip-debug` pass does not process bundled instructions, running `llc -debugify-and-strip-all-safe` on the test will produce the following output: ``` BUNDLE { $r3 = LD $r1, $r2, debug-location !DILocation(line: 3, column: 1, scope: <0x608cb2b99b10>) :: (load (s64) from %ir.a, !tbaa !2) } ``` And test will fail, but it shouldn't. Seems like the root cause is that `mir-strip-debug` pass should remove debug location from bundled instructions.	2024-10-29 10:26:15 -07:00
Adam Yang	9a5b3a1bbc	[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic (#111884 ) fixes #112974 partially fixes #70103 ### Changes - Added new tablegen based way of lowering dx intrinsics to DXIL ops. - Added int_dx_group_memory_barrier_with_group_sync intrinsic in IntrinsicsDirectX.td - Added expansion for int_dx_group_memory_barrier_with_group_sync in DXILIntrinsicExpansion.cpp` - Added DXIL backend test case ### Related PRs * [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic #111883](https://github.com/llvm/llvm-project/pull/111883) * [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic #111888](https://github.com/llvm/llvm-project/pull/111888)	2024-10-29 10:17:35 -07:00
Jay Foad	a156362e93	[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544 ) Fixes #54201	2024-10-29 14:59:37 +00:00
Sarah Spall	75e7ba8c0b	[HLSL] Re-implement countbits with the correct return type (#113189 ) Restricts hlsl countbits to always return a uint32. Implements a lowering from llvm.ctpop which has an overloaded return type to dxil cbits op which always returns uint32. Closes #112779	2024-10-29 07:56:05 -07:00
Matt Arsenault	88e23eb2cf	DAG: Fix legalization of vector addrspacecasts (#113964 )	2024-10-29 08:08:50 -05:00
Simon Pilgrim	32aa782ea2	[PowerPC] copysignl.ll - regenerate to reduce the diff in #111269	2024-10-29 10:57:43 +00:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Alex Bradbury	7544d3af0e	[RISCV] Mark RVB23U64 and RVB23S64 as non-experimental (#113918 ) The specification was recently ratified <https://github.com/riscv/riscv-profiles/blob/main/src/rvb23-profile.adoc>.	2024-10-29 07:57:34 +00:00
Craig Topper	635c344dfb	[X86] Add vector_compress patterns with a zero vector passthru. (#113970 ) We can use the kz form to automatically zero the extra elements. Fixes #113263.	2024-10-28 19:59:00 -07:00
Matt Arsenault	1ceccbb0dd	VirtRegRewriter: Add implicit register defs for live out undef lanes (#112679 ) If an undef subregister def is live into another block, we need to maintain a physreg def to track the liveness of those lanes. This would manifest a verifier error after branch folding, when the cloned tail block use no longer had a def. We need to detect interference with other assigned intervals to avoid clobbering the undef lanes defined in other intervals, since the undef def didn't count as interference. This is pretty ugly and adds a new dependency on LiveRegMatrix, keeping it live for one more pass. It also adds a lot of implicit operand spam (we really should have a better representation for this). There is a missing verifier check for this situation. Added an xfailed test that demonstrates this. We may also be able to revert the changes in 47d3cbcf842a036c20c3f1c74255cdfc213f41c2. It might be better to insert an IMPLICIT_DEF before the instruction rather than using the implicit-def operand. Fixes #98474	2024-10-28 17:33:53 -07:00
Simon Pilgrim	c5edecbb4b	[X86] Regenerate scmp/ucmp test checks with vpternlog comments	2024-10-28 21:36:59 +00:00
joaosaffran	481bce018e	Adding splitdouble HLSL function (#109331 ) - Adding hlsl `splitdouble` intrinsics - Adding DXIL lowering - Adding SPIRV lowering - Adding test Fixes: #108901 --------- Co-authored-by: Joao Saffran <jderezende@microsoft.com>	2024-10-28 13:26:59 -07:00
David Green	5a5b78a84e	[AArch64][GlobalISel] Lower aarch64.neon.smull/umull intrinsics. As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64 instructions.	2024-10-28 18:51:10 +00:00
Simon Pilgrim	670512b5c3	[AArch64] Regenerate srem-lkk.ll to add missing asm comments Reduces diff in #112588	2024-10-28 16:00:45 +00:00
Alex Bradbury	ba7555e640	[RISCV] Mark the RVA23S64 and RVA23U64 profiles as non-experimental (#113826 ) All of the extensions used by these profile are themselves non-experimental, and RVA23 was just ratified <https://riscv.org/announcements/2024/10/risc-v-announces-ratification-of-the-rva23-profile-standard/>. <https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc> We lack a way of expressing `Ss1p13` (supervisor architecture 1.13), but this is a problem we have for RVA22 (Ss1p12) and RVA20 (Ss1p11) so I don't feel it's a blocker.	2024-10-28 12:56:47 +00:00
dong-miao	75c75fc16e	[RISCV]Add svvptc extension (#113882 )	2024-10-28 22:54:51 +11:00
Simon Pilgrim	056cf936a7	[DAG] Fold (and X, (bswap/bitreverse (not Y))) -> (and X, (not (bswap/bitreverse Y))) (#112547 ) On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT Fixes #112425	2024-10-28 11:52:44 +00:00
Luke Lau	0cbccb13d6	[RISCV] Remove support for pre-RA vsetvli insertion (#110796 ) Now that LLVM 19.1.0 has been out for a while with post-vector-RA vsetvli insertion enabled by default, this proposes to remove the flag that restores the old pre-RA behaviour so we only have one configuration going forward. That flag was mainly meant as a fallback in case users ran into issues, but I haven't seen anything reported so far.	2024-10-28 11:31:18 +00:00
Luke Lau	96f5c68350	[RISCV] Lower @llvm.experimental.vector.compress for zvfhmin/zvfbfmin (#113770 ) This is a follow up to #113291 and handles f16/bf16 with zvfhmin and zvfbmin.	2024-10-28 09:37:06 +00:00
Alex Bradbury	43a5719d9f	[RISCV] Use Sha extension in RVA23S64 profile (#113823 ) In the ratified version of the RVA23S64 definition, the Sha extension is now used to group together the set of hypervisor related extensions. <https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc>	2024-10-28 09:22:09 +00:00
Oliver Stannard	dff114b356	[ARM] Optimise non-ABI frame pointers (#110286 ) With -fomit-frame-pointer, even if we set up a frame pointer for other reasons (e.g. variable-sized or over-aligned stack allocations), we don't need to create an ABI-compliant frame record. This means that we can save all of the general-purpose registers in one push, instead of splitting it to ensure that the frame pointer and link register are adjacent on the stack, saving two instructions per function.	2024-10-28 09:01:06 +00:00
Jack Styles	86f76c3b17	[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171 ) As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced, `DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that the PC has been used with the signing instruction. This change includes three commits - Libunwind support for the newly introduced DWARF Instruction - CodeGen Support for the DWARF Instructions - Reversing the changes made in #96377. Due to `DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed immediately after the signing instruction, this would mean the CFI Instruction location was not consistent with the generated location when not using FEAT_PAuthLR. The commit reverses the changes and makes the location consistent across the different branch protection options. While this does have a code size effect, this is a negligible one. For the ABI information, see here: `853286c7ab/aadwarf64/aadwarf64.rst (id23)`	2024-10-28 08:22:38 +00:00
Fabian Ritter	a4fd3dba6e	[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332 ) When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in LowerMemIntrinsics.cpp, the loop consists of a single load/store pair per iteration. We can improve performance in some cases by emitting multiple load/store pairs per iteration. This patch achieves that by increasing the width of the loop lowering type in the GCN target and letting legalization split the resulting too-wide access pairs into multiple legal access pairs. This change only affects lowered memcpys and memmoves with large (>= 1024 bytes) constant lengths. Smaller constant lengths are handled by ISel directly; non-constant lengths would be slowed down by this change if the dynamic length was smaller or slightly larger than what an unrolled iteration copies. The chosen default unroll factor is the result of microbenchmarks on gfx1030. This change leads to speedups of 15-38% for global memory and 1.9-5.8x for scratch in these microbenchmarks. Part of SWDEV-455845.	2024-10-28 09:04:19 +01:00
Alex Bradbury	35f6cc6af0	[RISCV] Add the Sha extension (#113820 ) This was introduced in the now-ratified RVA23 profile (and also added to the RVA22 text) as a simple way of referring to H plus the set of supervisor extensions required by RVA23. https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc This patch simply defines the extension. The next patch will adjust the RVA23 profile to use it, and at that point I think we will be ready to mark RVA23 as non-experimental. Note that I haven't made it so if you enable all extensions that constitute Sha, Sha is implied. Per #76893 (adding 'B'), the concern is making this implication might break older external assemblers. Perhaps this is less of a concern given the relative frequency of `-march=${foo}_zba_zbb_zbs` vs the collection of H extensions. If we did want to add that implication, we'd probably want to add it in a separate patch so it can be easily reverted if found to cause problems.	2024-10-28 07:42:33 +00:00
Phoebe Wang	fd85761208	[X86][BF16] Customize VSELECT for BF16 under AVX-NECONVERT (#113322 ) Fixes: https://godbolt.org/z/9abGnE8zs	2024-10-28 15:15:49 +08:00
Serge Pavlov	819abe412d	[Test] Fix usage of constrained intrinsics (#113523 ) Some tests contain errors in constrained intrinsic usage, such as missed or extra type parameters, wrong type parameters order and some other. --------- Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>	2024-10-28 14:07:32 +07:00
Freddy Ye	5aa1275d03	[X86] Support SM4 EVEX version intrinsics/instructions. (#113402 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-28 10:46:16 +08:00
Phoebe Wang	40fffba9b2	[X86][AVX10.2] Fix wrong predicates for BF16 feature (#113800 ) Since AVX10.2, we need to enable 128/256-bit vector by default and check for 512 feature for 512-bit vector.	2024-10-28 09:54:29 +08:00
Thorsten Schütt	7b3da7b3b2	[GlobalISel][AArch64] Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR for SVE (#110561 ) Credits: https://github.com/llvm/llvm-project/pull/72976 LLVM ERROR: cannot select: %3:zpr(<vscale x 2 x s64>) = G_MUL %0:fpr, %1:fpr (in function: xmulnxv2i64) ;; mul define void @xmulnxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, ptr %p) { entry: %c = mul <vscale x 2 x i64> %a, %b store <vscale x 2 x i64> %c, ptr %p, align 16 ret void } define void @mulnxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, ptr %p) { entry: %c = mul <vscale x 4 x i32> %a, %b store <vscale x 4 x i32> %c, ptr %p, align 16 ret void } define void @mulnxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, ptr %p) { entry: %c = mul <vscale x 8 x i16> %a, %b store <vscale x 8 x i16> %c, ptr %p, align 16 ret void } define void @mulnxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, ptr %p) { entry: %c = mul <vscale x 16 x i8> %a, %b store <vscale x 16 x i8> %c, ptr %p, align 16 ret void }	2024-10-27 23:14:07 +01:00
Aiden Grossman	7c9cf0c6f0	[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Emit error on bad option combinatons This patch makes it so that specifying all or none for -pgo-analysis-map along with an explicit option causes an error as this set of options does not really make sense.	2024-10-26 08:15:34 +00:00
Philip Reames	5ad500ca4a	[RISCV] Coverage for a few missed vector idioms	2024-10-25 16:28:36 -07:00
Aiden Grossman	38caf282ab	[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Add none and all options to PGO Map (#111221 ) This patch adds none and all options to the -pgo-analysis-map flag, which do basically what they say on the tin. The none option is added to enable forcing the pgo-analysis-map by overriding an earlier invocation of the flag. The all option is just added for convenience.	2024-10-25 15:39:52 -07:00
Dan Gohman	1bc2cd98c5	[WebAssembly] Enable nontrapping-fptoint and bulk-memory by default. (#112049 ) We were prepared to enable these features [back in February], but they got pulled for what appear to be unrelated reasons. So let's have another try at enabling them! Another motivation here is that it'd be convenient for the [Lime1 proposal] if "lime1" is close to a subset of "generic" (missing only for extended-const). [back in February]: https://github.com/WebAssembly/tool-conventions/issues/158#issuecomment-1931119512 [Lime1 proposal]: https://github.com/llvm/llvm-project/pull/112035	2024-10-25 13:52:51 -07:00
Gaëtan Bossu	a0c318938a	[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573 ) Both are based on MachineLICMBase, and the functionality there is "switched" based on a PreRegAlloc flag. This commit is simply about trusting the original value of that flag, defined by the `MachineLICM` and `EarlyMachineLICM` classes. The `PreRegAlloc` flag used to be overwritten it based on MRI.isSSA(), which is un-reliable due to how it is inferred by the MIRParser. I see that we can now define isSSA in MIR (thanks @gargaroff ), meaning the fix isn’t really needed anymore, but redefining that flag still feels wrong. Note that I'm looking into upstreaming more changes to MachineLICM, see [the discourse thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).	2024-10-25 11:19:22 -07:00
Alex Bradbury	cbdfb18794	[RISCV] Add Supm extension to RVA23 profiles (#113619 ) This is mandatory for both RVA23U64 and RVA23S64 in the ratified version of the specification <https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc>.	2024-10-25 15:39:07 +01:00
Alex Bradbury	2c0b34852a	[RISCV] Mark pointer masking extensions as non-experimental (#113618 ) These extensions were ratified very recently. <https://lf-riscv.atlassian.net/wiki/spaces/HOME/pages/16154732/Ratified+Extensions> I've ensured we have definitions for all extensions in the document <https://drive.google.com/file/d/159QffOTbi3EEbdkKndYRZ2c46D25ZLmO/view?usp=drive_link>. There are no additional CSRs.	2024-10-25 12:24:50 +01:00
Oliver Stannard	376d7b27fa	[ARM] Optimise byval arguments in tail-calls We don't need to copy byval arguments to tail calls via a temporary, if we can prove that we are not copying from the outgoing argument area. This patch does this when the source if the argument is one of: * Memory in the local stack frame, which can't be used for tail-call arguments. * A global variable. We can also avoid doing the copy completely if the source and destination are the same memory location, which is the case when the caller and callee have the same signature, and pass some arguments through unmodified.	2024-10-25 09:34:09 +01:00
Oliver Stannard	914a3990d1	[ARM] Avoid clobbering byval arguments when passing to tail-calls When passing byval arguments to tail-calls, we need to store them into the stack memory in which this the caller received it's arguments. If any of the outgoing arguments are forwarded from incoming byval arguments, then the source of the copy is from the same stack memory. This can result in the copy corrupting a value which is still to be read. The fix is to first make a copy of the outgoing byval arguments in local stack space, and then copy them to their final location. This fixes the correctness issue, but results in extra copying, which could be optimised.	2024-10-25 09:34:09 +01:00
Oliver Stannard	78ec2e2ed5	[ARM] Allow tail calls with byval args Byval arguments which are passed partially in registers get stored into the local stack frame, but it is valid to tail-call them because the part which gets spilled is always re-loaded into registers before doing the tail-call, so it's OK for the spill area to be deallocated.	2024-10-25 09:34:08 +01:00
Oliver Stannard	82e6472197	[ARM] Allow functions with sret returns to be tail-called It is valid to tail-call a function which returns through an sret argument, as long as we have an incoming sret pointer to pass on.	2024-10-25 09:34:08 +01:00
Oliver Stannard	c1eb790cd2	[ARM] Tail-calls do not require caller and callee arguments to match The ARM backend was checking that the outgoing values for a tail-call matched the incoming argument values of the caller. This isn't necessary, because the caller can change the values in both registers and the stack before doing the tail-call. The actual limitation is that the callee can't need more stack space for it's arguments than the caller does. This is needed for code using the musttail attribute, as well as enabling tail calls as an optimisation in more cases.	2024-10-25 09:34:08 +01:00
Oliver Stannard	e3f218096c	[ARM] Re-generate a test	2024-10-25 09:34:07 +01:00
dong-miao	ed6ddffb58	[RISCV] Add Smrnmi extension (#111668 ) This commit has completed the Extension for Resumable Non Maskable Interrupts, adding four CRSs and one Trap-Return instruction. Specification link:["Smrnmi" Extension](https://github.com/riscv/riscv-isa-manual/blob/main/src/rnmi.adoc) --------- Co-authored-by: Sam Elliott <sam@lenary.co.uk>	2024-10-25 18:41:21 +11:00
Freddy Ye	c4248fa3ed	[X86] Support MOVRS and AVX10.2 instructions. (#113274 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-25 09:00:19 +08:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add arch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Jun Wang	19b0453361	[AMDGPU][MC] Fix disassembler problem for image_atomic with TFE (#112622 ) For image_atomic instructions with TFE, in some cases (e.g., when dmask=3) the disassembler produces dst register with wrong size (e.g., image_atomic_smin v5, v1, s[8:15] dmask:0x3 tfe, instead of v[5:7]). This patch fixes the VDataDwords values for image atomic instructions.	2024-10-24 16:19:18 -07:00

1 2 3 4 5 ...

55694 Commits