llvm-project

Author	SHA1	Message	Date
Theodoros Theodoridis	d15b7a83a7	[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829 ) Limit the re-association of BOps with multiple users to FP and Vector arithmetic.	2025-08-14 11:56:55 +01:00
Simon Pilgrim	c96d0da62b	[X86] lowerShuffleAsLanePermuteAndPermute - ensure we've simplified the demanded shuffle mask elts before testing for a matching shuffle (#153554 ) When lowering using sublane shuffles, we can sometimes end up with the same mask as we started with. We already bail in these occasions, but we weren't fully simplifying the new shuffle mask before testing if it matched. Fixes #153457	2025-08-14 10:47:11 +01:00
tangaac	9315d701eb	[LoongArch] Optimize inserting extracted element for v4i64/v8i32 (#152629 )	2025-08-14 17:06:50 +08:00
Björn Pettersson	5e7924a3cb	[SelectionDAG] Handle more opcodes in isGuaranteedNotToBeUndefOrPoison (#147019 ) Add special handling of EXTRACT_SUBVECTOR, INSERT_SUBVECTOR, EXTRACT_VECTOR_ELT, INSERT_VECTOR_ELT and SCALAR_TO_VECTOR in isGuaranteedNotToBeUndefOrPoison. Make use of DemandedElts to improve the analysis and only check relevant elements for each operand. Also start using DemandedElts in the recursive calls that check isGuaranteedNotToBeUndefOrPoison for all operands for operations that do not create undef/poison. We can do that for a number of elementwise operations for which the DemandedElts can be applied to every operand (e.g. ADD, OR, BITREVERSE, TRUNCATE).	2025-08-14 09:05:15 +00:00
Shoreshen	04aebbfbe2	[AMDGPU] Delete AMDGPU Unify Metadata pass (#153548 ) Fixes #153150	2025-08-14 16:16:32 +08:00
Piotr Fusik	18782db4c9	[RISCV] Improve instruction selection for most significant bit extraction (#151687 ) (seteq (and X, 1<<XLEN-1), 0) -> (xori (srli X, XLEN-1), 1) (seteq (and X, 1<<31), 0) -> (xori (srliw X, 31), 1) // RV64 (setlt X, 0) -> (srli X, XLEN-1) // SRLI is compressible (setlt (sext X), 0) -> (srliw X, 31) // RV64	2025-08-14 09:59:43 +02:00
XChy	f393f2a61e	[BranchFolding] Avoid moving blocks to fall through to an indirect target (#152916 ) Depend on #152591 to fix https://github.com/llvm/llvm-project/issues/149023. Similar to an EH pad, there is no real advantage in "falling through" to an indirect target of an INLINEASM_BR. And multiple indirect targets of inline asm at the end of a function may be rotated infinitely. Therefore, this patch avoids such optimization on indirect target of inline asm as fall through.	2025-08-14 16:18:36 +09:00
quic_hchandel	71b066e3a2	[RISCV] Add CodeGen support for qc.insbi and qc.insb insert instructions (#152447 ) This patch adds CodeGen support for the qc.insbi and qc.insb instructions defined in the Qualcomm uC Xqcibm extension. qc.insbi and qc.insb insert bits into the destination register from an immediate and a register operand respectively. A sequence of `xor`, `and` & `xor`, under the appropriate conditions, is converted to `qc.insbi` or `qc.insb`, depending on the immediate's value.	2025-08-14 12:08:28 +05:30
paperchalice	b671979b7e	[NVPTX] Remove `UnsafeFPMath` uses (#151479 ) Remove `UnsafeFPMath` in NVPTX part, it blocks some bugfixes related to clang and the ultimate goal is to remove `resetTargetOptions` method in `TargetMachine`, see FIXME in `resetTargetOptions`. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract	2025-08-14 08:42:29 +08:00
David Green	c5105c1e0a	[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364 ) For a <8 x i32> -> <2 x i128> bitcast, which under aarch64 is split into two halves, the scalar i128 remainder was causing problems, leading to a crash with invalid vector types. This makes sure they are handled correctly in fewerElementsBitcast.	2025-08-13 22:27:53 +01:00
zhijian lin	4936fc5a56	[PowerPC][NFC] Pre-commit test case: use millicode for strlen instead of libcall (#153466 ) Add a test case showing that a lib call is used for strlen.	2025-08-13 16:34:29 -04:00
David Green	06d2d1e156	[ARM] Protect against odd sized vectors in isVTRNMask and friends (#153413 ) Fixes the issue reported on #153138, where odd-sized vectors would cause the checks to iterate off the end of the mask.	2025-08-13 20:57:46 +01:00
Amy Kwan	63cc2e390d	[PowerPC][CodeGen] Expand ISD::AssertNoFPClass for ppc_fp128 (#152357 ) 780054d3ff18075a6bc433029f336931792b1d2d added support for `ISD::AssertNoFPClass`. This ISD node can be used with the `ppc_fp128` type, which is really just two `f64s` and requires expanding when used with `ISD::AssertNoFPClass`. Without the support for expanding the result, we get an assertion because the legalizer does not know how to expand the results of `ppc_fp128` with `ISD::AssertNoFPClass`. ``` ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3> LLVM ERROR: Do not know how to expand the result of this operator! ``` Thus, this patch aims to add support for the expand so we no longer assert. This fixes #151375.	2025-08-13 15:00:32 -04:00
Robert Imschweiler	d21feb5e66	[AMDGPU] Fix crash for inline-asm inputs of type MVT::Other (#153425 )	2025-08-13 17:27:31 +02:00
Mikhail R. Gadelha	489a41d474	[RISCV][VLOPT] Added support for the zvbc and the remaining zvbb instructions (#153234 ) Follow-up PR to #153071, adding the remaining zvbb instructions (VBREV8_V and VREV8_V), plus the zvbc instruction (VCLMUL_VV, VCLMUL_VX, VCLMULH_VV, VCLMULH_VX).	2025-08-13 14:43:25 +00:00
Petar Avramovic	4d4966d481	AMDGPU/GlobalISel: Add regbanklegalize rules for ptr-add (#153175 )	2025-08-13 15:49:48 +02:00
Sergey Kachkov	bdddff2488	[RISCV][RVV] Prohibit conversion of scalar store to single-element vse if vmv.x.s has multiple uses (#152112 ) Godbolt example: https://godbolt.org/z/ThdfP475a In the example, a single-element vse is used to store the reduction result instead of a scalar store ([this optimization was introduced by this patch](https://reviews.llvm.org/D109482)). However, vmv.x.s can't be eliminated here because it has other uses (e.g. CopyToReg), so it seems more profitable to use a scalar store (we already have the store value in a scalar register, and can save one vsetvli which is likely to be required for single-element vse). The proposed solution is to do this transform only if vmv.x.s has one use (in the store instruction).	2025-08-13 13:10:27 +03:00
Diana Picus	420a5de1a4	[AMDGPU] Ignore inactive VGPRs in .vgpr_count (#149052 ) When using the `amdgcn.init.whole.wave` intrinsic, we add dummy VGPR arguments with the purpose of preserving their inactive lanes. The pattern may look something like this: ``` entry: call amdgcn.init.whole.wave branch to shader or tail shader: $vInactive = IMPLICIT_DEF ; Tells regalloc it's safe to use the active lanes actual code... tail: call amdgcn.cs.chain [...], implicit $vInactive ``` We should not report these VGPRs in the `.vgpr_count` metadata. This patch achieves that goal by ignoring meta instructions and calls. This should be safe since if those registers are actually used in any other context, they will be counted there. The same reasoning applies in the general case, so we don't explicitly check for the existence of `init.whole.wave`. This is a reworked version of #133242, which was reverted in #144039 and split into smaller bits.	2025-08-13 10:47:00 +02:00
Jasmine Tang	d32793ca6e	Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360 ) Reverts llvm/llvm-project#149461 The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the Emscripten test suite has failed. This PR applies a revert so I can take a closer look at it Test case link: https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js` Original comment report: https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746	2025-08-13 07:41:44 +00:00
Shoreshen	db96363c0a	[AMDGPU] Avoid putting implicit_def into a bundle that breaks the reg's liveness (#142563 ) Cause: 1. An `implicit_def` inside a bundle does not count as a definition of the reg in the machine verifier 2. Including the `implicit_def` therefore leaves the related reg undefined, resulting in `Bad machine code: Using an undefined physical register` in the machine verifier Fixes https://github.com/llvm/llvm-project/issues/139102 --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-08-13 10:41:44 +08:00
Alex MacLean	9e6b29137b	[NVPTX] miscellaneous minor cleanup (NFC) (#152329 )	2025-08-12 18:15:01 -07:00
Stanislav Mekhanoshin	d0ee82040c	[AMDGPU] Add s_barrier_init\|join\|leave instructions (#153296 )	2025-08-12 15:07:07 -07:00
Adam Yang	8710571aba	[AMDGPU] Fixed llvm-debuginfo-analyzer for AMDGPU. (#145125 ) Constructing Target triple with `ObjectFile::makeTriple` instead of just with `Arch` and leaving the rest unknown. Also creating the subtarget with the `CPU`. AMDGPU needs the full triple and `CPU` to disassemble correctly. To run a full test, also fixed a failure in `SIPreAllocateWWMRegs` with the `$noreg` operand in `DBG_VALUE`. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-12 22:04:52 +00:00
Farzon Lotfi	1ca8ad29db	[SPIRV] Create a new OpSelect selector and fix register types. (#152311 ) Fixes #135572 Two problems cause this issue. First, register types are copied from older registers instead of evaluating the SPIR-V types. Second, the way OpSelect is defined in SPIRVInstrInfo.td, we always default to integer for TernOpTyped. There seems to be a problem of multiple matches in the getMatchTable, so when executeMatchTable runs we aren't getting the right OpSelect. Correcting the tablegen wasn't very easy, so instead this creates an emitter for Select that evaluates the register types. This passes the original llvm/test/CodeGen/SPIRV/instructions/select.ll tests and the new float ones added in issue-135572-emit-float-opselect.ll	2025-08-12 17:43:30 -04:00
Trevor Gross	919021b0df	[Arm64EC] Add support for `half` (#152843 ) `f16` is passed and returned in vector registers on both x86 and AArch64, the same calling convention as `f32`, so it is a straightforward type to support. The calling convention support already exists, added as part of a6065f0fa55a ("Arm64EC entry/exit thunks, consolidated. (#79067)"). Thus, add mangling and remove the error in order to make `half` work. MSVC does not yet support `_Float16`, so for now this will remain an LLVM-only extension. Fixes the `f16` portion of https://github.com/llvm/llvm-project/issues/94434	2025-08-12 14:15:52 -07:00
Min-Yih Hsu	ca05058b49	[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612 ) Turn the following deinterleaved load patterns ``` %l = masked.load(%ptr, /mask=/110110110110, /passthru=/poison) %f0 = shufflevector %l, [0, 3, 6, 9] %f1 = shufflevector %l, [1, 4, 7, 10] %f2 = shufflevector %l, [2, 5, 8, 11] ``` into ``` %s = riscv.vlsseg2(/passthru=/poison, %ptr, /mask=/1111) %f0 = extractvalue %s, 0 %f1 = extractvalue %s, 1 %f2 = poison ``` The mask `110110110110` is regarded as 'gap mask' since it effectively skips the entire third field / component. Similarly, turning the following snippet ``` %l = masked.load(%ptr, /mask=/110000110000, /passthru=/poison) %f0 = shufflevector %l, [0, 3, 6, 9] %f1 = shufflevector %l, [1, 4, 7, 10] ``` into ``` %s = riscv.vlsseg2(/passthru=/poison, %ptr, /mask=/1010) %f0 = extractvalue %s, 0 %f1 = extractvalue %s, 1 ``` Right now this patch only tries to detect gap mask from a constant mask supplied to a masked.load/vp.load.	2025-08-12 14:08:18 -07:00
Philip Reames	4d629f9744	[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226 ) In review of bbde6b, I had originally proposed that we support the legacy text format. As review evolved, it became clear this had been a bad idea (too much complexity), but in order to let that patch finally move forward, I approved the change with the variant. This change undoes the variant, and updates all the tests to just use the array form.	2025-08-12 11:23:05 -07:00
Sam Elliott	9b93ccbcbe	[RISCV] Fix Immediate Check for Xqcibi UGT (#153141 ) The check should be about unsigned 16-bit immediates, not signed ones. This is not a bug per-se, as the old codegen was correct for the uint16_max case, it just didn't end up using `qc.e.bgeui`, which we would prefer it did.	2025-08-12 11:06:00 -07:00
Daniel Paoliello	c430e06fb5	[win][arm64ec] Fix duplicate errors with the dontcall attribute (#152810 ) Since the `dontcall-` attributes are checked both by `FastISel`/`GlobalISel` and `SelectionDAGBuilder`, and both `FastISel` and `GlobalISel` bail for calls on Arm64EC AFTER doing the check, we ended up emitting duplicate copies of this error. This change moves the checking for `dontcall-` in `FastISel` and `GlobalISel` to after the call has been successfully lowered.	2025-08-12 11:05:07 -07:00
Jasmine Tang	348f01f89c	[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461 ) Fixes https://github.com/llvm/llvm-project/issues/149230 Previously, even with simd enabled via `-mattr=+simd128`, the compiler cannot utilize v128 to optimize loads and setcc of i128, instead legalizing it to consecutive i64s. This PR then adds support for setcc of i128 by converting them to v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16 bytes or more (when simd128 is present). The check for enabling this optimization is if the comparison operand is either a load or an integer in i128, with the comparison code being either `EQ \| NE`, without `NoImplicitFloat` function flag. Inspiration taken from RISCV's isel lowering.	2025-08-12 11:04:37 -07:00
Mikhail R. Gadelha	d7c7fbd20d	Pre-commit tests for PR adding more instruction to the vlopt pass	2025-08-12 14:19:44 -03:00
Kane Wang	74fbdbf91f	[RISCV][GISel][NFC] Add MIR legalizer tests for G_UADDE (rv32 & rv64) (#152827 ) Add MIR tests that exercise legalization of the G_UADDE (unsigned add with extend/carry) operation for RISC-V targets.	2025-08-12 09:59:17 -07:00
Farzon Lotfi	544562ebc2	[DirectX] Remove lifetime intrinsics and run Dead Store Elimination (#152636 ) Fixes #151764 This fix has two parts. First, we track all lifetime intrinsics, and if they are users of an alloca of a target extension like dx.RawBuffer, we eliminate those memory intrinsics when we visit the alloca. We do step one to allow us to use the Dead Store Elimination pass. This removes the alloca and simplifies the use of the target extension back to using just the global. That keeps things in a form the DXILBitcodeWriter is expecting. To pull this off we needed to bring back the legacy pass manager plumbing for the DSE pass and hook it up into the DirectX backend. The net impact of this change is that the DML shader pass rate went from 89.72% (4268 successful compilations) to 90.98% (4328 successful compilations).	2025-08-12 12:42:08 -04:00
Orlando Cazalet-Hyams	54f92c7806	[RemoveDIs][AMDGPU] Replace defunct getAssignmentMarkers call (#153212 ) Not quite NFC as it looks like the original intrinsic-handling code never got updated to use records. This was never caught because that code wasn't tested. I've adjusted an existing test so the behaviour is now covered.	2025-08-12 17:20:38 +01:00
Nathan Gauër	6abbfcae6e	[SPIR-V] Fix OpVectorShuffle undef emission (#151993 ) When an undef/poison value is lowered as an immediate, it becomes -1. When reaching the backend, the -1 was printed as an operand to OpVectorShuffle instead of the proper 0xFFFFFFFF. From the SPIR-V spec: A Component literal may also be FFFFFFFF, which means the corresponding result component has no source and is undefined. The existing tests were passing `spirv-val` because the binary format was used as output, meaning the `-1` was lowered to `0xFFFFFFFF`. But when the text format is used, `-1` is emitted as-is, which is wrong. Fixes #151691	2025-08-12 15:50:48 +00:00
Dan Salvato	b09b05a83e	[M68k] Fix incorrect boolean content type (#152572 ) M68k's SETCC instruction (`scc`) distinctly fills the destination byte with all 1s. If boolean contents are set to `ZeroOrOneBooleanContent`, LLVM can mistakenly think the destination holds `0x01` instead of `0xff` and emit broken code as a result. This change corrects the boolean content type to `ZeroOrNegativeOneBooleanContent`. For example, this IR: ```llvm define dso_local signext range(i8 0, 2) i8 @testBool(i32 noundef %a) local_unnamed_addr #0 { entry: %cmp = icmp eq i32 %a, 4660 %. = zext i1 %cmp to i8 ret i8 %. } ``` would previously build as: ```asm testBool: ; @testBool cmpi.l #4660, (4,%sp) seq %d0 and.l #255, %d0 rts ``` Notice the `zext` is erroneously not clearing the low bits, and thus the register returns with 255 instead of 1. This patch fixes the issue: ```asm testBool: ; @testBool cmpi.l #4660, (4,%sp) seq %d0 and.l #1, %d0 rts ``` Most of the tests containing `scc` suffered from the same value error as described above, so those tests have been updated to match the new output (which also logically corrects them).	2025-08-12 08:46:41 -07:00
Koakuma	111219ed27	[SPARC] Use FMA instructions when we have UA2007 (#148434 )	2025-08-12 22:46:00 +07:00
Mikhail R. Gadelha	d455d45654	[RISCV][VLOPT] Added support for several vector crypto instructions (#153071 ) This PR adds support for the following instructions to the RISC-V VLOptimizer: vandn.vx, vandn.vv, vbrev.v, vclz.v, vcpop.v, vctz.v, vror.vi, vror.vx, vror.vv, vrol.vx, vrol.vv.	2025-08-12 12:05:03 -03:00
Elizaveta Noskova	bbde6be841	[llvm] Support multiple save/restore points in mir (#119357 ) Currently MIR supports only one save point and one restore point specification: ``` savePoint: '%bb.1' restorePoint: '%bb.2' ``` This patch makes it possible to have multiple save and multiple restore points in MIR: ``` savePoints: - point: '%bb.1' restorePoints: - point: '%bb.2' ``` Shrink-Wrap points split Part 3. RFC: https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581 Part 1: https://github.com/llvm/llvm-project/pull/117862 Part 2: https://github.com/llvm/llvm-project/pull/119355 Part 4: https://github.com/llvm/llvm-project/pull/119358 Part 5: https://github.com/llvm/llvm-project/pull/119359	2025-08-12 16:34:29 +03:00
Ricardo Jesus	ef5e65d27b	[AArch64] Fix stp kill when merging forward. (#152994 ) As an alternative to #149177, iterate through all instructions in `AArch64LoadStoreOptimizer`.	2025-08-12 14:19:43 +01:00
Petar Avramovic	f88be47fbf	AMDGPU/GlobalISel: Switch a few tests to new-reg-bank-select (#153174 )	2025-08-12 15:03:31 +02:00
David Green	5d099c2831	[AArch64][GlobalISel] Add 128bit insert and extract vector test coverage. NFC	2025-08-12 13:50:36 +01:00
XChy	2a49719525	[SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591 ) Partially fix #149023. The original code `MRI.def_begin(Reg)->getParent()` may return the incorrect MI, as the physical register `Reg` may have multiple definitions. This patch selects the correct MI to verify by comparing the MBB of each definition. New testcase hangs with -O1/2/3 enabled. The BranchFolding may be to blame.	2025-08-12 12:37:56 +00:00
Andrei Safronov	48da8489f2	[Xtensa] Add esp32/esp8266 cpus implementation. (#152409 ) Add Xtensa esp32 and esp8266 cpus. Implement target parser to recognise Xtensa hardware features.	2025-08-12 15:17:36 +03:00
zhijian lin	598f21e9fc	[PowerPC] Set CallFrameSize in the PPCReduceCRLogicals pass when inserting a new block (#151017 ) [[CodeGen] Store call frame size in MachineBasicBlock](https://reviews.llvm.org/D156113) notes that when a basic block is split in the middle of a call sequence, the call frame size may not be zero, so `setCallFrameSize` needs to be called for the new MachineBasicBlock. However, the function `splitMBB(BlockSplitInfo &BSI)` in llvm/lib/Target/PowerPC/PPCReduceCRLogicals.cpp does not call `setCallFrameSize` for the new MachineBasicBlock `NewMBB`; this patch adds that call. The patch fixes the crash mentioned in https://github.com/llvm/llvm-project/pull/144594#issuecomment-2993736654	2025-08-12 20:30:28 +09:00
Jim M. R. Teichgräber	5d54a576fe	[AMDGPU] AMDGPULateCodeGenPrepare Legacy PM: replace `setPreservesAll()` with `setPreservesCFG()` (#148167 ) This PR depends on #148165; the first commit (90f1d0a881a21a8b4f192622d798c290770fda63) belongs to that PR. The changes are distinct, so separate PRs seemed like the best option. I don't have commit access, so I couldn't use user-branches to mark the dependency. AMDGPULateCodeGenPrepare actually performs changes that invalidate Uniformity Analysis, so use `setPreservesCFG()` to mark this, instead of `setPreservesAll()`, which wrongly includes preserving Uniformity Analysis. Note that before #148165, this would still have preserved Uniformity Analysis, hence the dependency. In addition, `amdgpu/llc-pipeline.ll` needs to be changed when both changes are in effect, but those changes would make the test fail if the PRs weren't based on one another. Note on why this hasn't caused issues so far: It just so happens that AMDGPULateCodeGenPrepare is always immediately followed by AMDGPUUnifyDivergentExitNodes, which does invalidate most analyses, including Uniformity. And because UnifyDivergentExitNodes only looks at terminators, and LateCGP seemingly does not replace uniform values with divergent values, or divergent values with uniform values, and it only inserts new values that are not looked at by UnifyDivergentExitNodes, this bug remained hidden. --- I ran `git-clang-format` on my changes. I tested them using the `check-llvm` target; no unexpected failures occurred after I made the change to `amdgpu/llc-pipeline.ll`.	2025-08-12 19:40:02 +09:00
David Sherwood	7f763d9b48	[AArch64] Support symmetric complex deinterleaving with higher factors (#151295 ) For loops such as this: ``` struct foo { double a, b; }; void foo(struct foo *dst, struct foo *src, int n) { for (int i = 0; i < n; i++) { dst[i].a += src[i].a * 3.2; dst[i].b += src[i].b * 3.2; } } ``` the complex deinterleaving pass will spot that the deinterleaving associated with the structured loads cancels out the interleaving associated with the structured stores. This happens even though they are not truly "complex" numbers because the pass can handle symmetric operations too. This is great because it means we can then perform normal loads and stores instead. However, we can also do the same for higher interleave factors, e.g. 4: ``` struct foo { double a, b, c, d; }; void foo(struct foo *dst, struct foo *src, int n) { for (int i = 0; i < n; i++) { dst[i].a += src[i].a * 3.2; dst[i].b += src[i].b * 3.2; dst[i].c += src[i].c * 3.2; dst[i].d += src[i].d * 3.2; } } ``` This PR extends the pass to effectively treat such structures as a set of complex numbers, i.e. ``` struct foo_alt { std::complex<double> x, y; }; ``` with equivalence between members: ``` foo_alt.x.real == foo.a foo_alt.x.imag == foo.b foo_alt.y.real == foo.c foo_alt.y.imag == foo.d ``` I've written the code to handle sets with arbitrary numbers of complex values, but since we only support interleave factors between 2 and 4 I've restricted the sets to 1 or 2 complex numbers. Also, for now I've restricted support for interleave factors of 4 to purely symmetric operations only. However, it could also be extended to handle complex multiplications, reductions, etc. Fixes: https://github.com/llvm/llvm-project/issues/144795	2025-08-12 11:05:15 +01:00
Seraphimt	296e057d0b	[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::FMA/FMAD + tests (#152187 ) In SelectionDAG::canCreateUndefOrPoison add case ISD::FMA/FMAD + tests. Fixing #147693 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-12 17:17:46 +09:00
Benjamin Maxwell	d0c9599c41	[AArch64][SME] Use entry pstate.sm for conditional streaming-mode changes (#152169 ) We only do conditional streaming mode changes in two cases: - Around calls in streaming-compatible functions that don't have a streaming body - At the entry/exit of streaming-compatible functions with a streaming body In both cases, the condition depends on the entry pstate.sm value. Given this, we don't need to emit calls to __arm_sme_state at every mode change. This patch handles this by placing a "AArch64ISD::ENTRY_PSTATE_SM" node in the entry block and copying the result to a register. The register is then used whenever we need to emit a conditional streaming mode change. The "ENTRY_PSTATE_SM" node expands to a call to "__arm_sme_state" only if (after SelectionDAG) the function is determined to have streaming-mode changes. This has two main advantages: 1. It allows back-to-back conditional smstart/stop pairs to be folded 2. It has the correct behaviour for EH landing pads - These are entered with pstate.sm = 0, and should switch mode based on the entry pstate.sm - Note: This is not fully implemented yet	2025-08-12 09:15:30 +01:00
Fabian Ritter	e9ece175f9	[AMDGPU][GISel] Only fold flat offsets if they are inbounds (#153001 ) For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, ISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from address computations with the inbounds flag: If the address computation does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Relevant tests are in fold-gep-offset.ll. Analogous to #132353 for SDAG (which is not yet in a mergeable state, its progress is currently blocked by #146076). Fixes SWDEV-516125 for GISel.	2025-08-12 10:14:20 +02:00
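
Note on 18782db4c9 ([RISCV] Improve instruction selection for most significant bit extraction): the four rewrites listed in that message can be sanity-checked with a small model of the shift instructions. The following is only a sketch, assuming RV64 (XLEN = 64); the helper names `srli`, `srliw`, `as_signed` and `rewrites_hold` are made up for illustration and are not LLVM APIs.

```python
# Illustrative model only: helper names are hypothetical, not LLVM APIs.
XLEN = 64                      # assumption: RV64
MASK = (1 << XLEN) - 1


def srli(x, shamt):
    """Logical right shift of an XLEN-bit register value."""
    return (x & MASK) >> shamt


def srliw(x, shamt):
    """RV64 SRLIW: shift the low 32 bits right, then sign-extend the 32-bit result."""
    r = (x & 0xFFFFFFFF) >> shamt
    if r & 0x80000000:
        r |= MASK & ~0xFFFFFFFF
    return r


def as_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def rewrites_hold(x):
    """Return True if each rewritten form matches its original pattern for x."""
    x &= MASK
    return (
        # (seteq (and X, 1<<XLEN-1), 0) -> (xori (srli X, XLEN-1), 1)
        int((x & (1 << (XLEN - 1))) == 0) == (srli(x, XLEN - 1) ^ 1)
        # (seteq (and X, 1<<31), 0) -> (xori (srliw X, 31), 1)
        and int((x & (1 << 31)) == 0) == (srliw(x, 31) ^ 1)
        # (setlt X, 0) -> (srli X, XLEN-1)
        and int(as_signed(x, XLEN) < 0) == srli(x, XLEN - 1)
        # (setlt (sext X), 0) -> (srliw X, 31), with X a 32-bit value
        and int(as_signed(x, 32) < 0) == srliw(x, 31)
    )


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 1 << 63, MASK, 0x123456789ABCDEF0]
print(all(rewrites_hold(v) for v in samples))
```

The check covers only a handful of boundary values; it illustrates why the shifts isolate the sign bit and is not a substitute for the tests in the patch itself.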

60437 Commits