llvm-project

Author	SHA1	Message	Date
Nikita Popov	7ea7ccd24d	[PowerPC][AIX] Specify pointer info and alignment for stack store (#144526 ) When lowering call arguments to stack, specify a stack MPI, as well as the stack alignment, instead of using the defaults (which would be an unknown location with ABI alignment). I believe the asm diffs are just changes in scheduling.	2025-06-18 10:50:17 +02:00
Matt Arsenault	7b9d10d2e6	PowerPC: Fix using long double libm functions for f128 intrinsics (#144382 ) This wasn't setting the correct libcall names, which default to the l suffixed libm names.	2025-06-18 13:26:15 +09:00
Matt Arsenault	af49a650e1	PowerPC: Add baseline tests for more f128 libcall handling (#144381 ) Some of these incorrectly call the l suffixed version of libm functions and others assert.	2025-06-18 13:23:17 +09:00
Nikita Popov	76ea1db174	[PowerPC] Split test into assembly and MIR variants (NFC) So that both can be generated.	2025-06-17 15:16:24 +02:00
Nikita Popov	3451cd5d20	[PowerPC] Regenerate MIR test checks (NFC)	2025-06-17 15:04:16 +02:00
Nikita Popov	49c6235d1f	[PowerPC] Regenerate MIR test checks (NFC)	2025-06-17 12:52:00 +02:00
zhijian lin	ea73fc5f07	[PowerPC] fixed mtvsrbmi.ll test case error caused by running update_llc_test_checks.py (#144075 ) Fixed the mtvsrbmi.ll test case error that was caused by running update_llc_test_checks.py.	2025-06-13 09:38:54 -04:00
zhijian lin	9c2e0bd59c	[PowerPC][NFC] Pre-commit test case for checking whether the `mtvsrbmi` power10 instruction is not used (#143956 ) Verify whether the generated assembly for the following function includes the mtvsrbmi instruction: `vector unsigned char v00FF() { vector unsigned char x = { 0xFF, 0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0 }; return x; }`	2025-06-13 09:19:10 -04:00
zhijian lin	85a9f2e148	[PowerPC] enable AtomicExpandImpl::expandAtomicCmpXchg for powerpc (#142395 ) In PowerPC, the AtomicCmpXchgInst is lowered to ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++ atomic_compare_exchange_weak_explicit, the generated assembly includes a "reservation lost" loop — i.e., it branches back and retries if the stwcx. (store-conditional) fails. This differs from GCC’s codegen, which does not include that loop for weak compare-exchange. Since PowerPC uses LL/SC-style atomic instructions, the patch enables AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak attribute is properly respected, and the "reservation lost" loop is removed for weak operations. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-06-13 09:14:48 -04:00
Fangrui Song	28bda77843	Introduce MCAsmInfo::UsesSetToEquateSymbol and prefer = to .set Introduce MCAsmInfo::UsesSetToEquateSymbol to control the preferred syntax for symbol equating. We now favor the more readable and common `symbol = expression` syntax over `.set`. This aligns with pre- https://reviews.llvm.org/D44256 behavior. On Apple platforms, this resolves a clang -S vs -c behavior difference (resolves #104623). For targets whose = support is unconfirmed, UsesSetToEquateSymbol is set to false. This also minimizes test updates. Pull Request: https://github.com/llvm/llvm-project/pull/142289	2025-06-11 22:19:31 -07:00
Tony Varghese	7a0c9f607a	[NFC][PowerPC] Pre-commit test case for exploitation of xxeval for the pattern ternary(A,X,or(B,C)) (#143693 ) Pre-commit test case for exploitation of `xxeval` for ternary operations of the pattern `ternary(A,X,or(B,C))`. Exploitation of `xxeval` to be added later. Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-06-11 14:26:15 -04:00
RolandF77	5d6218d311	[PowerPC] extend smaller splats into bigger splats (with fix) (#142194 ) For pwr9, xxspltib is a byte splat with a range -128 to 127 - it can be used with a following vector extend sign to make splats of i16, i32, or i64 element size. For pwr8, vspltisw with a following vector extend sign can be used to make splats of i64 elements in the range -16 to 15. Add check for P8 to make sure the 64-bit vector ops are there.	2025-06-09 14:01:38 -04:00
Lei Huang	649020c680	[PowerPC] Change default for auto gen stxvp for cpu=future (#142826 ) For cpu=future, we want to auto generate stxvp instructions by default.	2025-06-09 12:34:50 -04:00
zhijian lin	a91b0d2780	[PowerPC] hoist xxspltiw instruction out of the loop with FMA mutation pass. (#111696 ) Summary: The patch fixes the issue [[PowerPC] missing VSX FMA Mutation optimize in some case for option -schedule-ppc-vsx-fma-mutation-early #111906](https://github.com/llvm/llvm-project/issues/111906) In certain cases, the Register Coalescer pass—which eliminates COPY instructions—can interfere with the PowerPC VSX FMA Mutation pass. Specifically, it can prevent the mutation of a COPY adjacent to an XSMADDADP into a single XSMADDMDP instruction. As a result, the xxspltiw instruction is not hoisted out of the loop as expected, leading to missed optimization opportunities. To address this, the patch ensures that the `VSX FMA Mutation` pass runs before the `Register Coalescer` pass when the -schedule-ppc-vsx-fma-mutation-early option is enabled.	2025-06-05 09:41:51 -04:00
Nikita Popov	d74831efeb	Revert "[SDAG] Fix fmaximum legalization errors (#142170 )" This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5. I also made the incorrect assumption that we know both values are +/-0.0 here as well. Revert for now.	2025-06-04 14:35:30 +02:00
Nikita Popov	42605b8aa3	Revert "[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732 )" This reverts commit 54da543a14da6dd0e594875241494949cb659b08. I made a logic error here with the assumption that both values are known to be +/-0.0.	2025-06-04 14:22:19 +02:00
Nikita Popov	54da543a14	[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732 ) When ordering signed zero, only check the sign of one of the values. We already know at this point that both values must be +/-0.0, so it is sufficient to check one of them to correctly order them. For example, for fmaximum, if we know LHS is `+0.0` then we can always select LHS, value of RHS does not matter. If LHS is `-0.0` we can always select RHS, value of RHS doesn't matter.	2025-06-04 10:41:30 +02:00
Tony Varghese	52cf598c78	[NFC][PowerPC] Add testcases for locking down the xxeval instruction support for ternary operators (#141601 ) NFC patch to add testcases for locking down the support of ternary operators using the `xxsel` instructions. Currently ternary operators are supported by emitting `xxsel` instructions instead of `xxeval`. Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-06-03 11:07:58 -04:00
Simon Tatham	56acb06bc6	[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562 ) In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or more labels / basic blocks in the containing function which the assembly code might jump to. If you're also compiling with branch target enforcement via BTI, then previously listing a basic block as a possible jump destination of an asm goto would cause a BTI instruction to be placed at the start of the block, in case the assembly code used an _indirect_ branch instruction (i.e. to a destination address read from a register) to jump to that location. Now it doesn't do that any more: branches to destination labels from the assembly code are assumed to be direct branches (to a relative offset encoded in the instruction), which don't require a BTI at their destination. This change was proposed in https://discourse.llvm.org/t/85845 and there seemed to be no disagreement. The rationale is: 1. it brings clang's handling of asm goto in Arm and AArch64 in line with gcc's, which didn't generate BTIs at the target labels in the first place. 2. it improves performance in the Linux kernel, which uses a lot of 'asm goto' in which the assembly language just contains a NOP, and the label's address is saved elsewhere to let the kernel self-modify at run time to swap between the original NOP and a direct branch to the label. This allows hot code paths to be instrumented for debugging, at only the cost of a NOP when the instrumentation is turned off, instead of the larger cost of an indirect branch. In this situation a BTI is unnecessary (if the branch happens it's direct), and since the code paths are hot, also a noticeable performance hit. Implementation: `SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each target `MachineBasicBlock`. Previously it also called `setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return true, which caused a BTI to be added in the Arm backends. Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more on asm goto targets, but `hasAddressTaken()` also checks the flag set by `setIsInlineAsmBrIndirectTarget()`. So call sites that were using `hasAddressTaken()` don't need to be modified. But the Arm backends don't call `hasAddressTaken()` any more: instead they test two more specific query functions that cover all the reasons `hasAddressTaken()` might have returned true _except_ being an asm goto target. Testing: The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual change, where it expects not to see a `bti` instruction after `[[LABEL]]`. The rest of the test changes are all churn, due to the flags on basic blocks changing. Actual output code hasn't changed in any of the existing tests, only comments and diagnostics. Further work: `RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp` also call `hasAddressTaken()` in a way that might benefit from using the same more specific check I've put in `ARMBranchTargets.cpp` and `AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit I've only changed the Arm backends, and left those alone.	2025-06-03 08:44:13 +01:00
Lei Huang	05f1ca7d17	[PowerPC] Spill and restore DMR register (#141530 ) Add spilling and restoring of DMR registers.	2025-06-02 13:11:39 -04:00
Yingwei Zheng	1984c7539e	[ValueTracking] Do not use FMF from fcmp (#142266 ) This patch introduces an FMF parameter for `matchDecomposedSelectPattern` to pass FMF flags from select, instead of fcmp. Closes https://github.com/llvm/llvm-project/issues/137998. Closes https://github.com/llvm/llvm-project/issues/141017.	2025-06-02 18:21:14 +08:00
Nikita Popov	58cc1675ec	[SDAG] Fix fmaximum legalization errors (#142170 ) FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero handling. This is problematic, because it assumes the equivalent integer type is legal. Many targets have legal fp128, but illegal i128, so this results in legalization failures. Fix this by replacing IS_FPCLASS with checking the bitcast to integer instead. In that case it is sufficient to use any legal integer type, as we're just interested in the sign bit. This can be obtained via a stack temporary cast. There is existing FloatSignAsInt functionality used for legalization of FABS and similar we can use for this purpose. Fixes https://github.com/llvm/llvm-project/issues/139380. Fixes https://github.com/llvm/llvm-project/issues/139381. Fixes https://github.com/llvm/llvm-project/issues/140445.	2025-06-02 10:14:33 +02:00
Hubert Tong	8f486254e4	Revert "[PowerPC] extend smaller splats into bigger splats (#141282 )" The subject commit causes the build to ICE on AIX: https://lab.llvm.org/buildbot/#/builders/64/builds/3890/steps/5/logs/stdio This reverts commit 7fa365843d9f99e75c38a6107e8511b324950e74.	2025-05-29 01:10:55 -04:00
RolandF77	7fa365843d	[PowerPC] extend smaller splats into bigger splats (#141282 ) For pwr9, xxspltib is a byte splat with a range -128 to 127 - it can be used with a following vector extend sign to make splats of i16, i32, or i64 element size. For pwr8, vspltisw with a following vector extend sign can be used to make splats of i64 elements in the range -16 to 15.	2025-05-28 10:11:28 -04:00
Ruiling, Song	3e47d8deba	MachineScheduler: Reset next cluster candidate for each node (#139513 ) When a node is picked, we should reset its next cluster candidate to null before releasing its successors/predecessors.	2025-05-28 14:53:46 +08:00
Nico Weber	04a96c6900	[PowerPC] Attempt to fix test added in #141263	2025-05-27 17:40:35 -04:00
zhijian lin	7b1a6a8a90	[PowerPC ][NFC] Add a test case for the function atomic_compare_exchange_weak (#141263 ) Add test case to test the generated asm of the function atomic_compare_exchange_weak	2025-05-27 16:36:39 -04:00
Jon Roelofs	714096c132	[LLVM] Skip dumping inline SDag children (#141359 ) If they're simple enough to render inline, we don't need to dump them again in the recursive walk.	2025-05-26 19:40:01 -07:00
Lei Huang	4b09eedf7b	[PowerPC] Update DMF VSX ACC data transfer instructions (#138897 ) For cpu=future, acc registers no longer overlap VSRs and are prefixed with `dm`. The original xxmfacc/xxmtacc instructions are now extended mnemonics for their dm* equivalents.	2025-05-26 12:47:12 -04:00
Shimin Cui	b1017a4b84	Use getSignedTargetConstant for offset (#141149 ) This is to fix an assertion failure with PeepholePPC64. The load/store offset can be negative. A reduced case from one of our failures is added as well.	2025-05-26 11:08:13 -04:00
Maryam Moghadas	a54300b32c	[PowerPC] Add load/store support for v2048i1 and DMF cryptography instructions (#136145 ) This commit adds support for loading and storing v2048i1 DMR pairs and introduces Dense Math Facility cryptography instructions: DMSHA2HASH, DMSHA3HASH, and DMXXSHAPAD, along with their corresponding intrinsics and tests.	2025-05-26 10:59:35 -04:00
RolandF77	bbca78fbcb	[PowerPC] vector shift word/double by element size - 1 use all ones (#139794 ) Vector shift word or double requires a shift amount vector of 31 or 63 which is too big for splat immediate and requires a multi-instruction sequence. However the PPC instructions only use 5 or 6 bits of the shift amount vector elements so an all ones mask, which we can generate efficiently, works.	2025-05-23 10:49:37 -04:00
Jay Foad	1f0c178411	Fix typo "redudant"	2025-05-22 15:42:22 +01:00
RolandF77	99f0309669	[PowerPC] catch v2i64 shift left by 1 is add case (#138772 ) Catch missing case in PPC BE for v2i64 x << 1 and generate x + x.	2025-05-13 11:26:46 -04:00
zhijian lin	41647412c6	[PowerPC] Fix a LowerADDSUBO_CARRY error when converting carry bit for usubo_carry (#137809 ) In PowerPC, if a borrow occurs during a subtraction, the carry bit is zero (unset). The carry bit is set if no borrow occurs. For ISD::USUBO_CARRY, the nodes produce two results: the normal result of the addition or subtraction, and a boolean value that is 1 if and only if there is an outgoing carry or borrow. Therefore, we need to convert a 1 (which indicates a borrow in ISD::USUBO_CARRY) to 0 to match PowerPC's definition of borrow. Similarly, we need to convert a 0 (no borrow in ISD::USUBO_CARRY) to 1 for PowerPC. To perform this conversion, we use XOR 1 instead of XOR `DAG.getAllOnesConstant(DL, CarryOp.getValueType())`.	2025-04-30 10:39:09 -04:00
Vikram Hegde	53a8b89003	[CodeGen][NewPM] Port "ShrinkWrap" pass to NPM (#129880 )	2025-04-30 13:11:17 +05:30
Maryam Moghadas	82a1d5078d	[PowerPC] Add dense math half-precision floating-point outer-product accumulate to DMR instructions (#133272 ) This patch adds the following Dense Math Facility 16-bit half-precision floating-point calculation instructions: dmxvf16gerx2, dmxvf16gerx2pp, dmxvf16gerx2pn, dmxvf16gerx2np, dmxvf16gerx2nn, pmdmxvf16gerx2, pmdmxvf16gerx2pp, pmdmxvf16gerx2pn, pmdmxvf16gerx2np, pmdmxvf16gerx2nn, along with their corresponding intrinsics and tests.	2025-04-28 16:03:10 -04:00
RolandF77	a903c7b7f5	[PowerPC] Intrinsics and tests for dmr insert/extract (#135653 ) Add some intrinsics and LIT tests for PPC dmr insert/extract instructions.	2025-04-24 11:27:22 -04:00
zhijian lin	3e605b1e1d	[NFC] Add a pre-commit test case for #111696 (#136730 ) Add a pre-commit test case for https://github.com/llvm/llvm-project/pull/111696, which checks that the ppc-vsx-fma-mutate pass, when run with -schedule-ppc-vsx-fma-mutation-early, does not hoist the instruction `xxspltiw vs2, 1170469888` out of the loop. --------- Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>	2025-04-24 10:37:24 -04:00
Sergei Barannikov	5080a0251f	[CodeGenPrepare] Unfold slow ctpop when used in power-of-two test (#102731 ) DAG combiner already does this transformation, but in some cases it does not have a chance because either CodeGenPrepare or SelectionDAGBuilder move icmp to a different basic block. https://alive2.llvm.org/ce/z/ARzh99 Fixes #94829 Pull Request: https://github.com/llvm/llvm-project/pull/102731	2025-04-23 08:54:10 +03:00
zhijian lin	afda4c295b	Reland [SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#136701 ) This patch addresses the signed/zero extension of poison by using a poison value of the extended type instead of a constant zero of the extended type.	2025-04-22 17:36:41 -04:00
Maryam Moghadas	c40d3a411c	[PowerPC] Add dense math bfloat16 floating-point outer-product accumulate to DMR instructions (#133109 ) This patch adds the following Dense Math Facility bfloat16 floating-point calculation instructions: dmxvbf16gerx2, dmxvbf16gerx2pp, dmxvbf16gerx2pn, dmxvbf16gerx2np, dmxvbf16gerx2nn, pmdmxvbf16gerx2, pmdmxvbf16gerx2pp, pmdmxvbf16gerx2pn, pmdmxvbf16gerx2np, pmdmxvbf16gerx2nn, along with their corresponding intrinsics and tests.	2025-04-21 18:39:44 -04:00
Nico Weber	e18a77cfbe	Revert "[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741 )" This reverts commit f12078e72601e7c03e5d66afab034313caf8f791. Breaks `check-llvm`, see comments on https://github.com/llvm/llvm-project/pull/122741	2025-04-21 10:51:03 -04:00
zhijian lin	f12078e726	[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741 ) The PR will fix the issue https://github.com/llvm/llvm-project/issues/122728 This patch addresses the signed/zero extension of poison by using a poison value of the extended type instead of a constant zero of the extended type.	2025-04-21 10:02:21 -04:00
Yingwei Zheng	7e5317139d	[PowerPC] Pre-commit tests for PR130742. NFC. (#135606 ) Needed by https://github.com/llvm/llvm-project/pull/130742.	2025-04-17 17:52:49 +08:00
Matt Arsenault	393c783a10	LICM: Avoid looking at use list of constant data (#134690 ) The codegen test changes seem incidental. Either way, sms-grp-order.ll seems to already not hit the original issue.	2025-04-13 17:06:38 +02:00
Douglas Yung	b03aa291b8	Add 'REQUIRES: asserts' to test undef-args.ll added in #135247 to skip test when asserts are not present. Should fix bot failure: https://lab.llvm.org/buildbot/#/builders/202/builds/601	2025-04-11 02:18:10 +00:00
zhijian lin	5aeeebc1f4	[NFC] add a pre-commit test case for patch 122741 (#135247 ) [NFC] add a pre-commit test case for patch [Eliminating li of 0 into arg registers of unused arguments](https://github.com/llvm/llvm-project/pull/122741) The test case checks that extended poison values are lowered to undef, and also shows the redundant instructions that load 0 into argument registers for unused arguments.	2025-04-10 16:33:40 -04:00
zhijian lin	378ac572ac	Reland "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135056 ) A new ISD::POISON SDNode is introduced to represent the poison value in the IR, replacing the previous use of ISD::UNDEF	2025-04-10 11:29:14 -04:00
Lei Huang	3479c57466	PowerPC32:PIC: Update to bcl to fix branch prediction mis-predict issue (#134140 ) Update `bl` to `bcl 20, 31, .+4` for 32-bit PIC code generation so that the link stack is not corrupted, which would otherwise cause the branch predictor to mispredict. fixes: https://github.com/llvm/llvm-project/issues/128644	2025-04-07 15:50:21 -04:00


4138 Commits
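Commit 41647412c6 (LowerADDSUBO_CARRY) relies on the fact that PowerPC's carry bit after a subtraction is the complement of the borrow flag that `ISD::USUBO_CARRY` produces. The minimal Python model below illustrates that relationship for 32-bit operands. It assumes a PowerPC-style subtract-with-carry computes `a + ~b + CA`. All function names are hypothetical and are not LLVM APIs.

```python
MASK32 = (1 << 32) - 1

def usubo_carry(a: int, b: int, borrow_in: int) -> tuple[int, int]:
    """Generic USUBO_CARRY model: borrow_out is 1 iff a borrow occurs."""
    diff = a - b - borrow_in
    return diff & MASK32, 1 if diff < 0 else 0

def ppc_style_sub_with_carry(a: int, b: int, ca_in: int) -> tuple[int, int]:
    """Hypothetical PowerPC-style model: a + ~b + CA, CA out is the carry out of bit 31."""
    total = a + (~b & MASK32) + ca_in
    return total & MASK32, total >> 32

def lowered_usubo_carry(a: int, b: int, borrow_in: int) -> tuple[int, int]:
    # Convert the incoming borrow into a PowerPC carry with XOR 1 ...
    result, ca_out = ppc_style_sub_with_carry(a, b, borrow_in ^ 1)
    # ... and convert the outgoing carry back into a borrow the same way.
    return result, ca_out ^ 1
```

XOR with 1 flips only the low bit. It therefore maps 0 to 1 and 1 to 0 however wide the carry value's type is, which is the conversion the commit message asks for.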
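Commit 58cc1675ec legalizes the signed-zero handling in FMAXIMUM by checking the sign bit of the value's integer bit pattern, instead of using IS_FPCLASS. The Python sketch below models the `fmaximum` semantics for doubles: NaN propagates, and +0.0 orders above -0.0. It reads the sign bit through a bitcast done with `struct`. It shows the semantics only, not the SelectionDAG expansion, and the helper names are hypothetical. The sketch consults a single sign bit only after establishing `a == b`. According to the revert message of 42605b8aa3, the analogous "both values are +/-0.0" assumption did not hold in the legalization code.

```python
import math
import struct

def sign_bit(x: float) -> int:
    """Return the IEEE-754 sign bit of a double by bitcasting it to a 64-bit integer."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63

def fmaximum(a: float, b: float) -> float:
    # NaN in either operand propagates.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a > b:
        return a
    if b > a:
        return b
    # a == b here, which includes +0.0 vs -0.0: prefer the operand whose sign bit is clear.
    return a if sign_bit(a) == 0 else b
```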
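Commit bbca78fbcb depends on the vector shift word/doubleword instructions reading only the low 5 or 6 bits of each shift-amount element. Under that assumption, an all-ones element behaves exactly like 31 or 63, and it is cheap to materialize. The minimal per-lane Python model below illustrates this. The helper names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def shift_left_word_element(value: int, amount_element: int) -> int:
    """One 32-bit lane of a shift-left word: only the low 5 bits of the amount are used."""
    return (value << (amount_element & 0x1F)) & MASK32

def shift_left_dword_element(value: int, amount_element: int) -> int:
    """One 64-bit lane of a shift-left doubleword: only the low 6 bits of the amount are used."""
    return (value << (amount_element & 0x3F)) & MASK64
```

For example, `shift_left_word_element(1, 0xFFFFFFFF)` yields `0x80000000`, the same as shifting by 31.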
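Commits 7fa365843d and 5d6218d311 build wider splats from narrower ones. On pwr9, an `xxspltib` byte splat (range -128 to 127) followed by a vector sign extension produces i16, i32, or i64 splats. On pwr8, `vspltisw` followed by a sign extension produces i64 splats in the range -16 to 15. The Python sketch below models only two things: which constant splats fall in those ranges, and the fact that sign-extending the narrow element reproduces the wide value. `splat_strategy` and its return values are hypothetical, not the backend's selection code.

```python
def sign_extend(value: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of value as a two's-complement integer."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign

def splat_strategy(imm: int, elt_bits: int, cpu: str):
    """Return (narrow splat instruction, narrow element bits) or None (hypothetical helper)."""
    if cpu == "pwr9" and elt_bits in (16, 32, 64) and -128 <= imm <= 127:
        return ("xxspltib", 8)
    if cpu == "pwr8" and elt_bits == 64 and -16 <= imm <= 15:
        return ("vspltisw", 32)
    return None
```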
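Commit 5080a0251f unfolds a slow `ctpop` when it feeds a power-of-two test. The commit message does not state the exact replacement sequence. The Python sketch below instead checks, exhaustively at 8 bits, the standard bit-trick identities such rewrites are built on. Treat them as well-known equivalences rather than the pass's literal output.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def ctpop(x: int) -> int:
    return bin(x).count("1")

def is_pow2_unfolded(x: int) -> bool:
    # x & (x - 1) clears the lowest set bit; a power of two has exactly one set bit.
    return x != 0 and (x & ((x - 1) & MASK)) == 0

def is_pow2_xor_form(x: int) -> bool:
    # Another equivalent form: (x ^ (x - 1)) >u (x - 1), with wraparound at WIDTH bits.
    xm1 = (x - 1) & MASK
    return (x ^ xm1) > xm1

def is_pow2_or_zero_unfolded(x: int) -> bool:
    # Equivalent to ctpop(x) < 2.
    return (x & ((x - 1) & MASK)) == 0
```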
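Commit 85a9f2e148 concerns the difference between weak and strong compare-exchange on an LL/SC architecture such as PowerPC (`lwarx`/`stwcx.`). The Python sketch below is a toy, single-threaded model, not real atomics. A hypothetical `store_conditional_fails` callback stands in for a lost reservation. The strong form retries when the store-conditional fails (the "reservation lost" loop). The weak form may just report failure and leave any retry to the caller.

```python
from typing import Callable, Dict, Tuple

Memory = Dict[str, int]

def cmpxchg_weak(mem: Memory, addr: str, expected: int, desired: int,
                 store_conditional_fails: Callable[[], bool]) -> Tuple[int, bool]:
    loaded = mem[addr]                    # load-reserve (e.g. lwarx)
    if loaded != expected:
        return loaded, False
    if store_conditional_fails():         # store-conditional (e.g. stwcx.) lost the reservation
        return loaded, False              # weak: may fail spuriously, no retry loop
    mem[addr] = desired
    return loaded, True

def cmpxchg_strong(mem: Memory, addr: str, expected: int, desired: int,
                   store_conditional_fails: Callable[[], bool]) -> Tuple[int, bool]:
    while True:                           # the "reservation lost" retry loop
        loaded = mem[addr]
        if loaded != expected:
            return loaded, False
        if not store_conditional_fails():
            mem[addr] = desired
            return loaded, True
```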