It is generally better to allow the target-independent combines to run before creating AArch64-specific nodes (provided they don't make things worse). This moves the generation of BSL nodes to lowering, not a combine, so that intermediate nodes are more likely to be optimized. There is a small change in the constant handling to detect legalized build-vector arguments correctly.
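For reference, the kind of input that ends up as a BSL is a bitwise select; a minimal C sketch (not taken from the patch; the function name is illustrative):
```
#include <stdint.h>

/* Bitwise select: (a & mask) | (b & ~mask). When vectorized for AArch64
   this maps onto the BSL instruction. */
void select_lanes(uint32_t *restrict out, const uint32_t *restrict a,
                  const uint32_t *restrict b, const uint32_t *restrict mask,
                  int n) {
  for (int i = 0; i < n; i++)
    out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
}
```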
Fixes #149380, but not directly; #151856 contained a direct fix for expanding the pseudos.
This patch reworks how VG is handled around streaming mode changes.
Previously, for functions with streaming mode changes, we would:
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
mode changes
Additionally, for locally streaming functions, we would:
- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
around streaming mode changes
In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.
So the new scheme in this patch is to:
In functions with streaming mode changes (including locally streaming functions):
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated
- This will be in the function epilogue, where VG is always the same as
the entry VG
- Explicitly reference the incoming VG expressions for SVE callee-saves
in functions with streaming mode changes
- Ensure the CFA is not described in terms of VG in functions with
streaming mode changes
A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b
But the TL;DR is that, following this scheme, SME unwinding can be implemented with minimal changes to existing unwinders. All an unwinder needs to do is initialize VG to `CNTD` at the start of unwinding; everything else is handled by standard opcodes (which don't need changes to handle VG).
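As a rough illustration (a hedged sketch assuming a compiler with SME support and the ACLE `__arm_streaming` keyword; the function names are made up), a function containing a streaming mode change now saves the incoming VG once in the prologue and drops the CFI rule once in the epilogue:
```
/* Hedged sketch: calling a streaming function from a non-streaming one
   introduces a streaming mode change, so the prologue saves the incoming
   VG (described with .cfi_offset vg, <offset>) and the epilogue emits
   .cfi_restore vg after the save slot has been deallocated. */
void streaming_callee(void) __arm_streaming;

void caller(void) {
  streaming_callee();  /* smstart/smstop around this call change VG */
}
```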
This updates the fp<->int tests to include some store(fptoi) and itofp(load) test cases. It also cuts down on the number of large vector cases that do not test anything new.
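The new cases correspond to patterns of roughly this shape (a minimal C sketch, not the generated tests themselves):
```
/* store(fptosi): convert, then store. */
void fp_to_int(int *restrict out, const float *restrict in, int n) {
  for (int i = 0; i < n; i++)
    out[i] = (int)in[i];
}

/* sitofp(load): load, then convert. */
void int_to_fp(float *restrict out, const int *restrict in, int n) {
  for (int i = 0; i < n; i++)
    out[i] = (float)in[i];
}
```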
Fix incorrect super-register lookup when copying from $wzr on subtargets
that lack zero-cycle zeroing but support 64-bit zero-cycle moves.
When copying from $wzr, we used the wrong register class to lookup the
super-register, causing
$w0 = COPY $wzr
to get expanded as
$x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,
rather than the correct
$x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.
## Short Summary
This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). This is
currently not enabled by default (but aims to be by LLVM 22). The goal
is for this new pass to more optimally place ZA saves/restores and to
work with exception handling.
## Long Description
This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in a
later patch. For now, this is under the flag `-aarch64-new-sme-abi`,
however, we intend for this to replace the current SelectionDAG
implementation once complete.
The approach taken here is to mark instructions as needing ZA to be in a
specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).
We already add ZA register uses/definitions to machine instructions, so
no extra work is needed to mark these.
Calls need to be marked by gluing AArch64ISD::INOUT_ZA_USE or
AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.
These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points we need to insert code to set up or restore a ZA
save (or initialize ZA).
To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.
These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the ZA state that matches the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.
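To make the states concrete, here is a hedged C sketch using the ACLE ZA keywords (assuming `__arm_new("za")` and `__arm_inout("za")` are available; the callee names are illustrative only):
```
void private_za_callee(void);                   /* private ZA */
void shared_za_callee(void) __arm_inout("za");  /* shared ZA  */

void uses_za(void) __arm_new("za") {
  shared_za_callee();   /* requires ZA in the ACTIVE state          */
  private_za_callee();  /* requires LOCAL_SAVED: a lazy save is set
                           up around this call                      */
  shared_za_callee();   /* back to ACTIVE: restore ZA if the callee
                           committed the lazy save                  */
}
```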
An llvm.vector.reduce.fadd(float, <1 x float>) will be translated to
G_VECREDUCE_SEQ_FADD with two scalar operands, which is illegal
according to the verifier. This makes sure we generate a fadd/fmul
instead.
LiveVariables will mark instructions with their implicit subregister
uses. However, it will also mark a register as an implicit use when the
instruction's own definition is a subregister of it, i.e. `$r3 = OP val,
implicit-def $r0_r1_r2_r3, ..., implicit $r2_r3`, even if that register
is otherwise unused; this defines $r3 on the same line it is used.
This change ensures such uses are marked without `implicit`, i.e. `$r3 =
OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Reland of #142941
Squashed with fixes for #150004, #149585
This matches gather-like patterns where values are loaded per lane into
NEON registers, and replaces them with loads into two separate
registers, which are then combined with a zip instruction. This
decreases the critical path length and improves memory-level
parallelism.
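The targeted pattern looks roughly like this in C with NEON intrinsics (a hedged sketch; the combine itself operates on the selected machine instructions, not on source like this):
```
#include <arm_neon.h>

/* Four lanes loaded from non-contiguous addresses into one register.
   The combine instead loads pairs into two registers and merges them
   with a zip, shortening the dependency chain. */
float32x4_t gather4(const float *p, int i0, int i1, int i2, int i3) {
  float32x4_t v = vdupq_n_f32(0.0f);
  v = vsetq_lane_f32(p[i0], v, 0);
  v = vsetq_lane_f32(p[i1], v, 1);
  v = vsetq_lane_f32(p[i2], v, 2);
  v = vsetq_lane_f32(p[i3], v, 3);
  return v;
}
```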
rdar://151851094
The patch adds patterns to select the EXT_ZZI_CONSTRUCTIVE pseudo
instead of the EXT_ZZI destructive instruction for vector_splice. This
only works when the two inputs to vector_splice are identical.
Given that registers aren't tied anymore, this gives the register
allocator more freedom and a lot of MOVs get replaced with MOVPRFX.
In some cases, however, we could have just chosen the same input and
output register, but regalloc preferred not to. This means some test
cases now have more instructions: there is now a MOVPRFX where
previously no MOV was needed.
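A splice with identical inputs corresponds to a rotate of a single vector; a hedged ACLE sketch (assumes SVE support and `arm_sve.h`):
```
#include <arm_sve.h>

/* vector_splice with both inputs equal, i.e. rotate v by one element.
   With the constructive pseudo this can be selected as MOVPRFX + EXT
   instead of tying the destination to a source register. */
svfloat32_t rotate_by_one(svfloat32_t v) {
  return svext_f32(v, v, 1);
}
```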
This reverts commit 16314eb7312dab38d721c70f247f2117e9800704 as the test cases
are failing under EXPENSIVE_CHECKS. Scalar vecreduce.fadd is not valid in
GISel.
They use extract shuffles for fixed vectors, and
llvm.vector.splice intrinsics for scalable vectors.
In the previous tests using ld+extract+st, the extract was optimized
away and replaced by a smaller load at the right offset. This meant
we didn't really test the vector_splice ISD node.
It will get expanded into MOVPRFX_ZZ and EXT_ZZI by the
AArch64ExpandPseudo pass. This instruction takes a single Z register as
input, as opposed to the existing destructive EXT_ZZI instruction.
Note this patch only defines the pseudo; it isn't used in any ISel
pattern yet. It will later be used for vector.extract.
TargetFrameIndex shouldn't be used as an operand to a target-independent
node such as a load. This causes ISel issues.
#81635 fixed a similar issue with this code using a TargetConstant
instead of a Constant.
Fixes #142314.
For an <8 x i32> -> <2 x i128> bitcast, which under AArch64 is split into
two halves, the scalar i128 remainder was causing problems, leading to a
crash with invalid vector types. This makes sure they are handled
correctly in fewerElementsBitcast.
Cause:
1. An `implicit_def` inside a bundle does not count as a definition of
the register for the MachineInstr verifier.
2. As a result, the related register is left without a definition,
resulting in `Bad machine code: Using an undefined physical register`
from the MachineInstr verifier.
Fixes https://github.com/llvm/llvm-project/issues/139102
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
`f16` is passed and returned in vector registers on both x86 and AArch64,
the same calling convention as `f32`, so it is a straightforward type to
support. The calling convention support already exists, added as part of
a6065f0fa55a ("Arm64EC entry/exit thunks, consolidated. (#79067)").
Thus, add mangling and remove the error in order to make `half` work.
MSVC does not yet support `_Float16`, so for now this will remain an
LLVM-only extension.
Fixes the `f16` portion of
https://github.com/llvm/llvm-project/issues/94434
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it became clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
Since the `dontcall-*` attributes are checked both by
`FastISel`/`GlobalISel` and by `SelectionDAGBuilder`, and both `FastISel`
and `GlobalISel` bail out for calls on Arm64EC only AFTER doing the
check, we ended up emitting duplicate copies of this error.
This change moves the checking for `dontcall-*` in `FastISel` and
`GlobalISel` to after the call has been successfully lowered.
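For reference, Clang emits `dontcall-error`/`dontcall-warn` for the `error`/`warning` attributes; a minimal C sketch of a call that previously produced the diagnostic twice on Arm64EC (function names are illustrative):
```
__attribute__((error("do not call this"))) void forbidden(void);

void caller(void) {
  forbidden();  /* previously diagnosed twice on Arm64EC */
}
```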
For loops such as this:
```
struct foo {
double a, b;
};
void foo(struct foo *dst, struct foo *src, int n) {
for (int i = 0; i < n; i++) {
dst[i].a += src[i].a * 3.2;
dst[i].b += src[i].b * 3.2;
}
}
```
the complex deinterleaving pass will spot that the deinterleaving
associated with the structured loads cancels out the interleaving
associated with the structured stores. This happens even though
they are not truly "complex" numbers because the pass can handle
symmetric operations too. This is great because it means we can
then perform normal loads and stores instead. However, we can also
do the same for higher interleave factors, e.g. 4:
```
struct foo {
double a, b, c, d;
};
void foo(struct foo *dst, struct foo *src, int n) {
for (int i = 0; i < n; i++) {
dst[i].a += src[i].a * 3.2;
dst[i].b += src[i].b * 3.2;
dst[i].c += src[i].c * 3.2;
dst[i].d += src[i].d * 3.2;
}
}
```
This PR extends the pass to effectively treat such structures as
a set of complex numbers, i.e.
```
struct foo_alt {
std::complex<double> x, y;
};
```
with equivalence between members:
```
foo_alt.x.real == foo.a
foo_alt.x.imag == foo.b
foo_alt.y.real == foo.c
foo_alt.y.imag == foo.d
```
I've written the code to handle sets with arbitrary numbers of
complex values, but since we only support interleave factors
between 2 and 4 I've restricted the sets to 1 or 2 complex
numbers. Also, for now I've restricted support for interleave
factors of 4 to purely symmetric operations only. However, it
could also be extended to handle complex multiplications,
reductions, etc.
Fixes: https://github.com/llvm/llvm-project/issues/144795
We only do conditional streaming mode changes in two cases:
- Around calls in streaming-compatible functions that don't have a
streaming body
- At the entry/exit of streaming-compatible functions with a streaming
body
In both cases, the condition depends on the entry pstate.sm value. Given
this, we don't need to emit calls to __arm_sme_state at every mode
change.
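For example (a hedged sketch assuming the ACLE `__arm_streaming_compatible` keyword; the names are illustrative), both calls below need a conditional mode change, and the condition is the same entry pstate.sm value both times:
```
void non_streaming_callee(void);

void sc_fn(void) __arm_streaming_compatible {
  /* If pstate.sm was 1 on entry, we must smstop before each call and
     smstart after it; if it was 0, no change is needed. */
  non_streaming_callee();
  non_streaming_callee();
}
```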
This patch handles this by placing an "AArch64ISD::ENTRY_PSTATE_SM" node
in the entry block and copying the result to a register. The register is
then used whenever we need to emit a conditional streaming mode change.
The "ENTRY_PSTATE_SM" node expands to a call to "__arm_sme_state" only
if (after SelectionDAG) the function is determined to have
streaming-mode changes.
This has two main advantages:
1. It allows back-to-back conditional smstart/stop pairs to be folded
2. It has the correct behaviour for EH landing pads
- These are entered with pstate.sm = 0, and should switch mode based on
the entry pstate.sm
- Note: This is not fully implemented yet
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
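A hedged C sketch of the per-type coverage (the actual tests are LLVM IR; `_Float16` availability depends on the target, and f128 is covered where `long double` maps to it):
```
#include <math.h>

long      lrint_f16(_Float16 x)     { return lrintf((float)x); }
long      lrint_f32(float x)        { return lrintf(x); }
long      lrint_f64(double x)       { return lrint(x); }
long long llrint_f64(double x)      { return llrint(x); }
long      lrint_f128(long double x) { return lrintl(x); }
```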
In each of the following groups we keep the first test:
- extadd[us]_v8i8_i16, test_vaddl_[us]8, [us]addl8h
- extadd[us]_v4i16_i32, test_vaddl_[us]16, [us]addl4s
- extadd[us]_v2i32_i64, test_vaddl_[us]32, [us]addl2d
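Each group exercises the same widening-add pattern; for the i8 -> i16 case, roughly (a hedged NEON sketch):
```
#include <arm_neon.h>

/* Widening add of two 8 x i8 vectors into an 8 x i16 result (uaddl). */
uint16x8_t extaddu_v8i8_i16(uint8x8_t a, uint8x8_t b) {
  return vaddl_u8(a, b);
}
```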
So far, GlobalISel's G_PTR_ADD combines have ignored MIFlags like nuw, nusw,
and inbounds. That was in many cases unnecessarily conservative and in others
unsound, since reassociations re-used the existing G_PTR_ADD instructions
without invalidating their flags. This patch aims to improve that.
I've checked the transforms in this PR with Alive2 on corresponding middle-end
IR constructs.
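The kind of reassociation in question, written as C pointer arithmetic (a hedged sketch; the actual combines operate on G_PTR_ADD):
```
#include <stddef.h>

/* (p + a) + b can be reassociated to p + (a + b); the combine must not
   simply keep the nuw/nusw/inbounds flags of the original additions on
   the rewritten ones without justification. */
int *reassoc(int *p, size_t a, size_t b) {
  int *q = p + a;  /* first G_PTR_ADD  */
  return q + b;    /* second G_PTR_ADD */
}
```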
A longer-term goal would be to encapsulate the logic that determines which
GEP/ISD::PTRADD/G_PTR_ADD flags can be preserved in which case, since this
occurs in similar forms in the middle end, the SelectionDAG combines, and the
GlobalISel combines here.
For SWDEV-516125.
Now, why would we want to do this?
There are a small number of places where this helps:
1. It helps peephole optimization by requiring less flag checking.
2. It allows expressions such as `x - 0x80000000 < 0` to be folded to a
`cmp` of x against a register holding that value (see the sketch below).
3. We can refine the other passes over time for this.
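A minimal sketch of item 2 (illustrative C, not from the patch):
```
#include <stdint.h>

/* The subtract-then-sign-test can be folded into a single compare of x
   against a register holding 0x80000000. */
int below_bias(uint32_t x) {
  return (int32_t)(x - 0x80000000u) < 0;  /* equivalent to x < 0x80000000u */
}
```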
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
The histogram DAG combine went into an infinite loop of creating the
same histogram node due to an incorrect use of the `refineUniformBase`
and `refineIndexType` APIs.
These APIs take SDValues by reference (SDValue&) and return `true` if
they were "refined" (i.e., set to new values).
Previously, this DAG combine would create the `Ops` array (used to
create the new histogram node) before calling the `refine*` APIs. Since
creating the array copies the SDValues, the refined values were not used
when building the new histogram node.
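A simplified C analogy of the ordering bug (the real code passes `SDValue&`; here a pointer plays that role, and the names are made up):
```
#include <stdbool.h>

/* 'refine' updates *v in place and reports whether it did. Copying v
   into ops[] before calling refine leaves the stale value in ops[],
   which mirrors the bug described above. */
static bool refine(int *v) { *v += 1; return true; }

void build_node(int v) {
  int ops[1] = { v };  /* copy taken too early     */
  refine(&v);          /* v updated, ops[0] is not */
  /* ... the new node would be built from the stale ops[0] ... */
  (void)ops;
}
```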
Reproducer: https://godbolt.org/z/hsGWhTaqY (it will timeout)
This prevents cases where some of the operands match from hitting
verifier errors due to kill flags. These nodes should have been removed
earlier in most cases.
Fixes the direct issue from #149380. #151855 cleans up the codegen.