llvm-project

Author	SHA1	Message	Date
Lei Huang	3a84aef64a	[PowerPC][NFC] auto gen checks vec rounding tests (#166435 ) Update tests to contain auto generated checks.	2025-11-05 10:24:16 -05:00
Lei Huang	a01e4da6d6	[PowerPC] Ensure correct codegen for MMA functions for cpu=future (#165791 ) Update MMA tests to add a run line for `cpu=future` to ensure MMA functionality is not broken by the newly introduced `wacc` register classes. Previous commits added definitions that use the new `wacc` registers; this one adds testing and fixes a few patterns that were missing.	2025-11-04 09:29:26 -05:00
Vigneshwar Jayakumar	469702c5d5	[LICM] Sink unused l-invariant loads in preheader. (#157559 ) Unused loop invariant loads were not sunk from the preheader to the exit block, increasing live range. This commit moves the sinkUnusedInvariant logic from indvarsimplify to LICM and adds functionality to sink unused loads that are not clobbered by the loop body.	2025-10-30 09:23:04 -05:00
Princeton Ferro	68e74f8f84	[DAGCombiner] Lower dynamic insertelt chain more efficiently (#162368 ) For an insertelt with a dynamic index, the default handling in DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the vector, lower the insertelt to a store, then load the modified vector back into temporaries. The vector store and load may be legalized into a sequence of smaller operations depending on the target. Let V = the vector size and L = the length of a chain of insertelts with dynamic indices. In the worst case, this chain will lower to O(VL) operations, which can increase code size dramatically. Instead, identify such chains, reserve one stack slot for the vector, and lower all of the insertelts to stores at once. This requires only O(V + L) operations. This change only affects the default lowering behavior.	2025-10-29 09:46:01 -07:00
Shimin Cui	531fd45e92	[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910 ) Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the number of total comparisons (`NumCmps`) satisfy: `(NumDests == 1 && NumCmps >= 3) \|\| (NumDests == 2 && NumCmps >= 5) \|\| (NumDests == 3 && NumCmps >= 6)` However, it was found that in some cases on PowerPC, for example when NumDests is 3 and the number of comparisons for each destination is 2, it is not profitable to lower the switch to a bit test. This adds an option to set the minimum of the largest number of comparisons required to use a bit test for switch lowering. --------- Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>	2025-10-28 10:24:32 -04:00
Lei Huang	2b3a76825f	[PowerPC] Update tlbie instruction implementation for ISA3.0+ (#162729 ) The instruction `tlbie` changed in ISA3.0. ISA V2.07: `tlbie RB,RS` ISA V3.0: `tlbie RB,RS,RIC,PRS,R`, with `tlbie RB,RS` aliased to `tlbie RB,RS,0,0,0`	2025-10-27 11:18:45 -04:00
paperchalice	c8f5c602c8	[test][PowerPC] Remove unsafe-fp-math uses (NFC) (#164817 ) Post cleanup for #164534.	2025-10-26 09:29:45 +08:00
paperchalice	3656f6f226	[CodeGen] Remove `-enable-unsafe-fp-math` option (#164559 ) Hope this can unblock #105746.	2025-10-22 15:40:31 +08:00
zhijian lin	7aa6c62bdb	[PowerPC] Hint branch `bne-` for atomic operation after the store-conditional (#152529 ) The branches emitted for atomic operations after the store-conditional are currently not hinted, even though they should be. According to the Power10 Processor Chip User’s Manual: ` “Without static prediction, if the lock is not acquired in the first iteration, the branch history mechanism works to update the prediction to predict taken; that is, predict lock acquisition failure and cause more lwarx traffic for the next iteration.”` This patch addresses the issue by adding explicit branch hints for atomic operations after the store-conditional.	2025-10-21 09:37:30 -04:00
paperchalice	26feb1a9f1	[PowerPC] Remove `UnsafeFPMath` uses (#154901 ) Try to remove `UnsafeFPMath` uses in PowerPC backend. These global flags block some improvements like https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797. Remove them incrementally. FP operations that may raise exceptions are replaced by constrained intrinsics. However, vector types are not supported by these intrinsics.	2025-10-21 19:01:34 +08:00
Himadhith	43364151d7	[NFC][PowerPC] Patch to add the remaining types v2i64, v8i16 and v16i8 into existing test file (#163201 ) The previous [NFC patch](https://github.com/llvm/llvm-project/pull/160476#top) addressed only the vector type `v4i32`; this is a continuation of that patch which adds the remaining 3 vector types that were left out. This should include the following operands: - `v2i64`: `A + vector {1, 1,}` - `v8i16`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1}` - `v16i8`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}` --------- Co-authored-by: himadhith <himadhith.v@ibm.com>	2025-10-16 11:02:09 +05:30
Tony Varghese	d83fe1201e	[PowerPC] Exploit xxeval instruction for operations of the form ternary(A, X, nor(B,C)), ternary(A, X, eqv(B,C)), ternary(A, X, nand(B,C)), ternary(A, X, not(B)) and ternary(A, X, not(C)) (#158096 ) Adds support for ternary equivalent operations of the form `ternary(A, X, nor(B,C))`, `ternary(A, X, eqv(B,C))`, `ternary(A, X, nand(B,C))`, `ternary(A, X, not(B))` and `ternary(A, X, not(C))` where `X=[xor(B,C)\| nor(B,C)\| eqv(B,C)\| not(B)\| not(C)\| and(B,C)\| nand(B,C)]`. This adds support for `v4i32, v2i64, v16i8, v8i16` operand types for the following patterns. List of xxeval equivalent ternary operations added and the corresponding imm value required: ``` ternary(A, and(B,C), nor(B,C)) 129 ternary(A, B, nor(B,C)) 131 ternary(A, C, nor(B,C)) 133 ternary(A, xor(B,C), nor(B,C)) 134 ternary(A, not(C), nor(B,C)) 138 ternary(A, not(B), nor(B,C)) 140 ternary(A, nand(B,C), nor(B,C)) 142 ternary(A, or(B,C), eqv(B,C)) 151 ternary(A, nor(B,C), eqv(B,C)) 152 ternary(A, not(C), eqv(B,C)) 154 ternary(A, nand(B,C), eqv(B,C)) 158 ternary(A, and(B,C), not(C)) 161 ternary(A, B, not(C)) 163 ternary(A, xor(B,C), not(C)) 166 ternary(A, or(B,C), not(C)) 167 ternary(A, not(B), not(C)) 172 ternary(A, nand(B,C), not(C)) 174 ternary(A, and(B,C), not(B)) 193 ternary(A, xor(B,C), not(B)) 198 ternary(A, or(B,C), not(B)) 199 ternary(A, nand(B,C), not(B)) 206 ternary(A, B, nand(B,C)) 227 ternary(A, C, nand(B,C)) 229 ternary(A, xor(B,C), nand(B,C)) 230 ternary(A, or(B,C), nand(B,C)) 231 ternary(A, eqv(B,C), nand(B,C)) 233 ``` eg. `xxeval XT, XA, XB, XC, 129` performs the ternary operation: `XA ? and(XB, XC) : nor(XB, XC)` and places the result in `XT`. 
This is the continuation of: - [[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top) - [[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C).](https://github.com/llvm/llvm-project/pull/152956#top) - [[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X, XOR(B,C)) and ternary(A,X, OR(B,C))](https://github.com/llvm/llvm-project/pull/157909#top) Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-10-15 10:58:42 +05:30
Tony Varghese	60ee515b8c	[PowerPC] Emit lxvkq and vsrq instructions for build vector patterns (#157625 ) ### Optimize BUILD_VECTOR having special quadword patterns This change optimizes `BUILD_VECTOR` operations by using the `lxvkq` or `xxspltib + vsrq` instructions to inline constants matching specific 128-bit patterns: - MSB set pattern: `0x8000_0000_0000_0000_0000_0000_0000_0000` - LSB set pattern: `0x0000_0000_0000_0000_0000_0000_0000_0001` ### Implementation Details The `lxvkq` instruction loads special quadword values into VSX registers: ```asm lxvkq XT, UIM # When UIM=16: loads 0x8000_0000_0000_0000_0000_0000_0000_0000 ``` The optimization reconstructs the 128-bit register pattern from `BUILD_VECTOR` operands, accounting for target endianness. For example, the MSB pattern can be represented as: - Big-Endian: `<i64 -9223372036854775808, i64 0>` - Little-Endian: `<i64 0, i64 -9223372036854775808>` Both produce the same register value: `0x8000_0000_0000_0000_0000_0000_0000_0000` ### MSB Pattern (`0x8000...0000`) All vector types (`v2i64`, `v4i32`, `v8i16`, `v16i8`) generate: ```asm lxvkq v2, 16 ``` ### LSB Pattern (`0x0000...0001`) All vector types generate: ```asm xxspltib v2, 255 vsrq v2, v2, v2 ``` --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-10-15 10:54:04 +05:30
paperchalice	dd44e63c8e	[DAGCombiner] Use `FlagInserter` in `visitFSQRT` (#163301 ) Propagate fast-math flags for TLI.getSqrtEstimate etc.	2025-10-15 09:03:15 +08:00
Himadhith	9bcf8f088b	[NFC][PowerPC] Lockdown instructions for floating point comparison with zero-vector (#162828 ) This NFC patch adds a new function which aids in emitting machine instructions for floating point vectors. This was previously not included in the test file as it currently only checks for integer vectors. --------- Co-authored-by: himadhith <himadhith.v@ibm.com>	2025-10-14 07:59:04 +05:30
小钟	e6358ab75c	Fix typo: IsGlobaLinkage -> IsGlobalLinkage in XCOFF (#161960 ) Corrects the spelling of 'IsGlobaLinkage' to 'IsGlobalLinkage' in XCOFF-related code, comments, and tests across the codebase.	2025-10-12 12:03:40 -07:00
AZero13	07eeb5f08d	[PowerPC] Lower ucmp using subtractions (#146446 ) Source: Hacker's delight, page 21. Using the carry, we can use contractions to use the ucmp.	2025-10-11 12:34:30 +09:00
Folkert de Vries	3f3d522ba7	[PowerPC] recognize `vmnsub` in older ppc versions (#155465 ) fixes https://github.com/llvm/llvm-project/issues/129432 Recognize expansion sequence of negate where it isn't legal in order to select multiply-subtract.	2025-10-06 12:40:57 -04:00
Matt Arsenault	c6e280e7ed	PeepholeOpt: Fix losing subregister indexes on full copies (#161310 ) Previously if we had a subregister extract reading from a full copy, the no-subregister incoming copy would overwrite the DefSubReg index of the folding context. There's one ugly rvv regression, but it's a downstream issue of this; an unnecessary same class reg-to-reg full copy was avoided.	2025-10-02 13:36:47 +09:00
paperchalice	c6d3b517ee	[DAGCombiner] Remove most `NoSignedZerosFPMath` uses (#161180 ) The two remaining uses are related to fneg and foldFPToIntToFP; some AMDGPU tests are duplicated and regenerated.	2025-09-30 11:44:34 +08:00
paperchalice	b0a755b2bf	[TargetLowering] Remove NoSignedZerosFPMath uses (#160975 ) Remove NoSignedZerosFPMath in TargetLowering part, users should always use instruction level fast math flags.	2025-09-29 14:33:56 +08:00
Himadhith	0e72c3da2a	[NFC] Lockdown instructions of vspltisw for addition of vector of 1s (#160476 ) This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}` in which the current code emits `vspltisw`. It can be made better with the use of a `2 cycle` instruction `xxleqv` over the current `4 cycle vspltisw`. --------- Co-authored-by: himadhith <himadhith.v@ibm.com>	2025-09-27 09:22:58 +05:30
Nikita Popov	8b824f3b3e	[PowerPC] Avoid working on deleted node in ext bool trunc combine (#160050 ) This code was already creating HandleSDNodes to handle the case where a node gets replaced with an equivalent node. However, the code before the handles are created also performs RAUW operations, which can end up CSEing and deleting nodes. Fix this issue by moving the handle creation earlier. Fixes https://github.com/llvm/llvm-project/issues/160040.	2025-09-22 21:37:13 +02:00
Tony Varghese	87129cf759	[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X, XOR(B,C)) and ternary(A,X, OR(B,C)) (#157909 ) Adds support for ternary equivalent operations of the form - `ternary(A, X, xor(B,C))` where `X=[and(B,C)\| nor(B,C)\| or(B,C)\| B \| C]`. - `ternary(A, X, or(B,C))` where `X = [and(B,C)\| eqv(B,C)\| not(B)\| not(C)\| nand(B,C)\| B \| C]`. The following are the patterns involved and the imm values: ``` ternary(A, and(B,C), xor(B,C)) 97 ternary(A, B, xor(B,C)) 99 ternary(A, C, xor(B,C)) 101 ternary(A, or(B,C), xor(B,C)) 103 ternary(A, nor(B,C), xor(B,C)) 104 ternary(A, and(B,C), or(B,C)) 113 ternary(A, B, or(B,C)) 115 ternary(A, C, or(B,C)) 117 ternary(A, eqv(B,C), or(B,C)) 121 ternary(A, not(C), or(B,C)) 122 ternary(A, not(B), or(B,C)) 124 ternary(A, nand(B,C), or(B,C)) 126 ``` eg. `xxeval XT, XA, XB, XC, 97` performs the ternary operation: `XA ? and(XB, XC) : xor(XB, XC)` and places the result in `XT`. This is the continuation of: - [[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top) - [[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C).](https://github.com/llvm/llvm-project/pull/152956#top) --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-22 19:09:47 +05:30
Matt Arsenault	5eebb58fb4	PPC: Fix regression for 32-bit ppc with 64-bit support (#159893 ) Fixes regression after e5bbaa9c8fb6e06dbcbd39404039cc5d31df4410. e5500 accidentally still had the 64bit feature applied instead of 64bit-support.	2025-09-20 02:31:38 +00:00
RolandF77	1eb575dcae	[PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398 ) The result type of the vector extend intrinsics generated by the BUILD_VECTOR lowering code should match how they are actually defined. Currently the result type is defaulting to the operand type there. This can conflict with calls to the same intrinsic from other paths.	2025-09-19 10:43:22 -04:00
zhijian lin	be6c4d933d	[PowerPC] using millicode call for strlen instead of lib call (#153600 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.	2025-09-19 10:02:21 -04:00
Paul Walker	b7e4edca3d	[LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331 ) The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.	2025-09-19 13:15:57 +01:00
Craig Topper	f209d63b04	[SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400 ) The PowerPC changes are caused by shifts created by different IR operations being CSEd now. This allows consecutive loads to be turned into vectors earlier. This has effects on the ordering of other combines and legalizations. This leads to some improvements and some regressions.	2025-09-16 10:26:49 -07:00
Lei Huang	b22448c9ba	[PowerPC] Add intrinsic definition for load and store with Right Length Left-justified (#148873 )	2025-09-16 12:36:28 -04:00
Matt Arsenault	e5bbaa9c8f	PPC: Split 64bit target feature into 64bit and 64bit-support (#157206 ) This was being used for 2 different purposes. The TargetMachine constructor prepends +64bit based on isPPC64 triples as a mode switch. The same feature name was also explicitly added to different processors, making it impossible to perform a pure feature check for whether 64-bit mode is enabled or not. i.e., checkFeatures("+64bit") would be true even for ppc32 triples. The comment in tablegen suggests it's relevant to track which processors support 64-bit mode independently of whether that's the active compile target, so replace that with a new feature.	2025-09-16 12:43:53 +09:00
zhijian lin	4bf0001c07	[PowerPC][NFC] Pre-commit test case: Implement a more efficient memcmp in cases where the length is known (#158367 ) The newly added test case will be used to verify a more efficient memcmp in cases where the length is known.	2025-09-15 10:26:01 -04:00
Tony Varghese	30010f49ca	[NFC][PowerPC] Pre-commit testcases for locking down the xxsel instructions for ternary(A, X, eqv(B,C)), ternary(A, X, not(C)), ternary(A, X, not(B)), ternary(A, X, nand(B,C)) and ternary(A, X, nor(B,C)) patterns (#158091 ) Pre-commit test case for exploitation of `xxsel` for ternary operations of the pattern. This adds support for v4i32, v2i64, v16i8 and v8i16 operand types for the following patterns. The following are the patterns involved in the change: ``` ternary(A, and(B,C), nor(B,C)) ternary(A, B, nor(B,C)) ternary(A, C, nor(B,C)) ternary(A, xor(B,C), nor(B,C)) ternary(A, not(C), nor(B,C)) ternary(A, not(B), nor(B,C)) ternary(A, nand(B,C), nor(B,C)) ternary(A, or(B,C), eqv(B,C)) ternary(A, nor(B,C), eqv(B,C)) ternary(A, not(C), eqv(B,C)) ternary(A, nand(B,C), eqv(B,C)) ternary(A, and(B,C), not(C)) ternary(A, B, not(C)) ternary(A, xor(B,C), not(C)) ternary(A, or(B,C), not(C)) ternary(A, not(B), not(C)) ternary(A, nand(B,C), not(C)) ternary(A, and(B,C), not(B)) ternary(A, xor(B,C), not(B)) ternary(A, or(B,C), not(B)) ternary(A, nand(B,C), not(B)) ternary(A, B, nand(B,C)) ternary(A, C, nand(B,C)) ternary(A, xor(B,C), nand(B,C)) ternary(A, or(B,C), nand(B,C)) ternary(A, eqv(B,C), nand(B,C)) ``` Exploitation of `xxeval` for the above patterns to be added as a follow up. Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-12 09:36:37 +05:30
Trevor Gross	a975e64239	[PowerPC] Extend and update the test for `half` support (NFC) (#152625 ) `f16` is more functional than just a storage type on the platform, though it does have some codegen issues [1]. To prepare for future changes, do the following nonfunctional updates to the existing `half` test: * Add tests for passing and returning the type directly. * Add tests showing bitcast behavior, which is currently incorrect but serves as a baseline. * Add tests for `fabs` and `copysign` (trivial operations that shouldn't require libcalls). * Add invocations for big-endian and for PPC32. * Rename the test to `half.ll` to reflect its status, which also matches other backends. [1]: https://github.com/llvm/llvm-project/issues/97975	2025-09-10 09:03:29 +00:00
Maryam Moghadas	2bd0d770af	[PowerPC] Support `-fpatchable-function-entry` on PPC64LE (#151569 ) This patch enables `-fpatchable-function-entry` on PPC64 little-endian Linux. It is mutually exclusive with existing XRay instrumentation on this target.	2025-09-09 16:43:18 -04:00
Florian Hahn	74ec38fad0	[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730 ) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: https://github.com/llvm/llvm-project/pull/156730	2025-09-05 08:45:13 +00:00
Himadhith	ffbd616210	[NFC][PowerPC] adding the options for register names and VSR to VR (#157007 ) NFC patch to add the flags -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr to the following test files ``` llvm/test/CodeGen/PowerPC/recipest.ll llvm/test/CodeGen/PowerPC/setcc-logic.ll llvm/test/CodeGen/PowerPC/vector-popcnt-128-ult-ugt.ll ``` Created this PR based on this discussion: https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675 Co-authored-by: himadhith <himadhith.v@ibm.com>	2025-09-05 10:27:02 +05:30
zhijian lin	36cb33bbca	support branch hint for AtomicExpandImpl::expandAtomicCmpXchg (#152366 ) The patch adds a branch hint for AtomicExpandImpl::expandAtomicCmpXchg. For example, PowerPC supports branch hints as follows: ``` loop: lwarx r6,0,r3 # load and reserve cmpw r4,r6 #1st 2 operands equal? bne- exit #skip if not stwcx. r5,0,r3 #store new value if still res’ved bne- loop #loop if lost reservation exit: mr r4,r6 #return value from storage ``` `-` hints not taken, `+` hints taken.	2025-09-02 09:33:28 -04:00
Himadhith	09350bd1c5	[NFC][PowerPC] adding the arguments for register names and VSR to VR (#155991 ) NFC patch to add the flags `-ppc-asm-full-reg-names --ppc-vsr-nums-as-vr` to the test file `llvm/test/CodeGen/PowerPC/check-zero-vector.ll`. Created this PR based on this discussion: https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675 Co-authored-by: himadhith <himadhith.v@ibm.com> Co-authored-by: Lei Huang <lei@ca.ibm.com>	2025-09-01 10:17:14 +05:30
Tony Varghese	3fc1aad65b	[PowerPC] Merge vsr(vsro(input, byte_shift), bit_shift) to vsrq(input, res_bit_shift) (#154388 ) This change implements a patfrag based pattern matching ~dag combiner~ that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR (Vector Shift Right)` instructions into a single `VSRQ (Vector Shift Right Quadword)` instruction on Power10+ processors. Vector right shift operations like `vec_srl(vec_sro(input, byte_shift), bit_shift)` generate two separate instructions `(VSRO + VSR)` when they could be optimised into a single `VSRQ `instruction that performs the equivalent operation. ``` vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift) where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift ``` Note: ``` vsro : Vector Shift Right by Octet VX-form - vsro VRT, VRA, VRB - The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32]. - Bytes shifted out of byte 15 are lost. - Zeros are supplied to the vacated bytes on the left. - The result is placed into VSR[VRT+32]. vsr : Vector Shift Right VX-form - vsr VRT, VRA, VRB - The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits. - Bits shifted out of bit 127 are lost. - Zeros are supplied to the vacated bits on the left. - The result is place into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined. vsrq : Vector Shift Right Quadword VX-form - vsrq VRT,VRA,VRB - Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32]. - src1 is shifted right by the number of bits specified in the low-order 7 bits of src2. - Bits shifted out the least-significant bit are lost. - Zeros are supplied to the vacated bits on the left. - The result is placed into VSR[VRT+32]. 
``` --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-01 10:14:12 +05:30
Tony Varghese	2e7ea9c945	[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C). (#152956 ) Adds support for ternary equivalent operations of the form `ternary(A, X, B)` and `ternary(A, X, C)` where `X=[and(B,C)\| nor(B,C)\| eqv(B,C)\| nand(B,C)]`. The following are the patterns involved and the imm values: \| Operation \| Immediate Value \| \|----------------------------\|---------------------\| \| ternary(A, and(B,C), B) \| 49 \| \| ternary(A, nor(B,C), B) \| 56 \| \| ternary(A, eqv(B,C), B) \| 57 \| \| ternary(A, nand(B,C), B) \| 62 \| \| \| \| \| ternary(A, and(B,C), C) \| 81 \| \| ternary(A, nor(B,C), C) \| 88 \| \| ternary(A, eqv(B,C), C) \| 89 \| \| ternary(A, nand(B,C), C) \| 94 \| eg. `xxeval XT, XA, XB, XC, 49` - performs `XA ? and(XB, XC) : B` and places the result in `XT`. This is the continuation of [[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top). --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-01 10:13:54 +05:30
paperchalice	19464d951a	[NFC] #155740 post cleanup (#155966 ) Remove all "approx-func-fp-math" in tests.	2025-08-29 12:45:38 +08:00
Maryam Moghadas	242d51afe5	[PowerPC] Add DMR and WACC COPY support (#149129 ) This patch updates PPCInstrInfo::copyPhysReg to support DMR and WACC register classes and extends the PPCVSXCopy pass to handle specific WACC copy patterns.	2025-08-27 11:07:24 -04:00
Simon Pilgrim	6aed01a2a7	[PowerPC] ppc64-P9-vabsd.ll - update v16i8 abdu test now that it vectorizes in the middle-end (#154712 ) The scalarized IR was written before improvements to SLP / cost models ensured that the abs intrinsic was easily vectorizable opt -O3 : https://zig.godbolt.org/z/39T65vh8M Now that it is we need a more useful llc test	2025-08-27 07:29:30 +00:00
Josh Stone	e6ae4e689c	[PowerPC] Indicate that PPC32PICGOT clobbers LR (#154654 ) This pseudo-instruction emits a local `bl` writing LR, so that must be saved and restored for the function to return to the right place. If not, we'll return to the inline `.long` that the `bl` stepped over. This fixes the `SIGILL` seen in rayon-rs/rayon#1268.	2025-08-25 15:31:27 -07:00
RolandF77	d1cbe6ed74	[PowerPC] Add DMF builtins for build and disassemble (#153097 ) Add support for PPC Dense Math builtins mma_build_dmr and mma_disassemble_dmr builtins.	2025-08-25 12:14:55 -04:00
Matt Arsenault	65d12622fa	RuntimeLibcalls: Add entries for stackprotector globals (#154930 ) Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.	2025-08-23 10:21:00 +09:00
DanilaZhebryakov	0a3ee7de9c	[PowerPC] fix bug affecting float to int32 conversion on LE PowerPC (#150194 ) When moving fcti results from float registers to normal registers through memory, even though MPI was adjusted to account for endianness, FIPtr was always adjusted for big-endian, which caused loads of wrong half of a value in little-endian mode.	2025-08-20 12:37:14 +02:00
Aditi Medhane	948abf1bf5	[PowerPC] Add BCDCOPYSIGN and BCDSETSIGN Instruction Support (#144874 ) Support the following BCD format conversion builtins for PowerPC. - `__builtin_bcdcopysign` – Conversion that returns the decimal value of the first parameter combined with the sign code of the second parameter. - `__builtin_bcdsetsign` – Conversion that sets the sign code of the input parameter in packed decimal format. > Note: This built-in function is valid only when all following conditions are met: > -qarch is set to utilize POWER9 technology. > The bcd.h file is included. ## Prototypes ```c vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char); vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char); ``` ## Usage Details `__builtin_bcdsetsign`: Returns the packed decimal value of the first parameter combined with the sign code. The sign code is set according to the following rules: - If the packed decimal value of the first parameter is positive, the following rules apply: - If the second parameter is 0, the sign code is set to 0xC. - If the second parameter is 1, the sign code is set to 0xF. - If the packed decimal value of the first parameter is negative, the sign code is set to 0xD. > notes: > The second parameter can only be 0 or 1. > You can determine whether a packed decimal value is positive or negative as follows: > - Packed decimal values with sign codes 0xA, 0xC, 0xE, or 0xF are interpreted as positive. > - Packed decimal values with sign codes 0xB or 0xD are interpreted as negative. --------- Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>	2025-08-19 14:47:27 +05:30
Theodoros Theodoridis	d15b7a83a7	[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829 ) Limit the re-association of BOps with multiple users to FP and Vector arithmetic.	2025-08-14 11:56:55 +01:00
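
The `xxeval` immediates listed in the ternary-pattern commits above (#152956, #157909, #158096) can be reproduced from a truth table. The following is a minimal Python sketch, not code from any of those patches. It assumes the immediate's most significant bit corresponds to A=B=C=0 and its least significant bit to A=B=C=1; that ordering is inferred from the values listed in the commit messages rather than quoted from the ISA. The helper names `xxeval_imm` and `ternary` are hypothetical.

```python
def xxeval_imm(f):
    """Build the 8-bit immediate whose truth table matches f(a, b, c).

    Assumed bit ordering: input combination (a, b, c) selects bit
    7 - (4*a + 2*b + c) of the immediate.
    """
    imm = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if f(a, b, c) & 1:
                    imm |= 1 << (7 - (4 * a + 2 * b + c))
    return imm


def ternary(x, y):
    """ternary(A, X, Y): select X(b, c) where a is 1, else Y(b, c)."""
    return lambda a, b, c: x(b, c) if a else y(b, c)


# Two-input bitwise operations on single bits (0 or 1).
AND = lambda b, c: b & c
OR = lambda b, c: b | c
XOR = lambda b, c: b ^ c
NOR = lambda b, c: 1 - (b | c)
NAND = lambda b, c: 1 - (b & c)
EQV = lambda b, c: 1 - (b ^ c)
B = lambda b, c: b
C = lambda b, c: c

if __name__ == "__main__":
    print(xxeval_imm(ternary(AND, NOR)))  # 129, as listed for #158096
    print(xxeval_imm(ternary(AND, XOR)))  # 97, as listed for #157909
    print(xxeval_imm(ternary(AND, B)))    # 49, as listed for #152956
```

Under that assumed ordering, the sketch reproduces the immediates given in the commit messages, which is a quick way to sanity-check a new pattern before adding it to a test.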

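The shift merge described in #154388 relies on the identity `vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift`. The Python sketch below checks that identity on 128-bit integers. It is a simplified model under stated assumptions: the shift amounts are passed as plain integers rather than being read from fields of a vector register, and the function names are hypothetical stand-ins for the instructions, not LLVM or ISA APIs.

```python
MASK128 = (1 << 128) - 1


def vsro(value, byte_shift):
    """Model: shift a 128-bit value right by 0..15 whole bytes."""
    return (value & MASK128) >> (8 * (byte_shift & 0xF))


def vsr(value, bit_shift):
    """Model: shift a 128-bit value right by 0..7 bits."""
    return (value & MASK128) >> (bit_shift & 0x7)


def vsrq(value, bit_shift):
    """Model: shift a 128-bit value right by 0..127 bits."""
    return (value & MASK128) >> (bit_shift & 0x7F)


def merged_shift(byte_shift, bit_shift):
    """Combined bit shift for the single quadword shift."""
    return byte_shift * 8 + bit_shift


def shifts_agree(value):
    """True if vsr(vsro(x)) equals vsrq(x) for every valid shift pair."""
    return all(
        vsr(vsro(value, by), bi) == vsrq(value, merged_shift(by, bi))
        for by in range(16)
        for bi in range(8)
    )


if __name__ == "__main__":
    sample = 0x0123456789ABCDEF_FEDCBA9876543210
    print(shifts_agree(sample))
```

The largest combined shift is 15 * 8 + 7 = 127, which fits in the low-order 7 bits that the commit message says `vsrq` reads.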

4185 Commits