llvm-project

Author	SHA1	Message	Date
RolandF77	1eb575dcae	[PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398 ) The result type of the vector extend intrinsics generated by the BUILD_VECTOR lowering code should match how they are actually defined. Currently the result type is defaulting to the operand type there. This can conflict with calls to the same intrinsic from other paths.	2025-09-19 10:43:22 -04:00
zhijian lin	be6c4d933d	[PowerPC] using milicode call for strlen instead of lib call (#153600 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.	2025-09-19 10:02:21 -04:00
Paul Walker	b7e4edca3d	[LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331 ) The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.	2025-09-19 13:15:57 +01:00
Craig Topper	f209d63b04	[SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400 ) The PowerPC changes are caused by shifts created by different IR operations being CSEd now. This allows consecutive loads to be turned into vectors earlier. This has effects on the ordering of other combines and legalizations. This leads to some improvements and some regressions.	2025-09-16 10:26:49 -07:00
Lei Huang	b22448c9ba	[PowerPC] Add intrinsic definition for load and store with Right Length Left-justified (#148873 )	2025-09-16 12:36:28 -04:00
Matt Arsenault	e5bbaa9c8f	PPC: Split 64bit target feature into 64bit and 64bit-support (#157206 ) This was being used for 2 different purposes. The TargetMachine constructor prepends +64bit based on isPPC64 triples as a mode switch. The same feature name was also explicitly added to different processors, making it impossible to perform a pure feature check for whether 64-bit mode is enabled or not. i.e., checkFeatures("+64bit") would be true even for ppc32 triples. The comment in tablegen suggests it's relevant to track which processors support 64-bit mode independently of whether that's the active compile target, so replace that with a new feature.	2025-09-16 12:43:53 +09:00
zhijian lin	4bf0001c07	[PowerPC][NFC] Pre-commit test case: Implement a more efficient memcmp in cases where the length is known (#158367 ) The newly added test case will be used to verify a more efficient memcmp in cases where the length is known.	2025-09-15 10:26:01 -04:00
Tony Varghese	30010f49ca	[NFC][PowerPC] Pre-commit testcases for locking down the xxsel instructions for ternary(A, X, eqv(B,C)), ternary(A, X, not(C)), ternary(A, X, not(B)), ternary(A, X, nand(B,C)) and ternary(A, X, nor(B,C)) patterns (#158091 ) Pre-commit test case for exploitation of `xxsel` for ternary operations of the pattern. This adds support for v4i32, v2i64, v16i8 and v8i16 operand types for the following patterns. The following are the patterns involved in the change: ``` ternary(A, and(B,C), nor(B,C)) ternary(A, B, nor(B,C)) ternary(A, C, nor(B,C)) ternary(A, xor(B,C), nor(B,C)) ternary(A, not(C), nor(B,C)) ternary(A, not(B), nor(B,C)) ternary(A, nand(B,C), nor(B,C)) ternary(A, or(B,C), eqv(B,C)) ternary(A, nor(B,C), eqv(B,C)) ternary(A, not(C), eqv(B,C)) ternary(A, nand(B,C), eqv(B,C)) ternary(A, and(B,C), not(C)) ternary(A, B, not(C)) ternary(A, xor(B,C), not(C)) ternary(A, or(B,C), not(C)) ternary(A, not(B), not(C)) ternary(A, nand(B,C), not(C)) ternary(A, and(B,C), not(B)) ternary(A, xor(B,C), not(B)) ternary(A, or(B,C), not(B)) ternary(A, nand(B,C), not(B)) ternary(A, B, nand(B,C)) ternary(A, C, nand(B,C)) ternary(A, xor(B,C), nand(B,C)) ternary(A, or(B,C), nand(B,C)) ternary(A, eqv(B,C), nand(B,C)) ``` Exploitation of `xxeval` for the above patterns to be added as a follow up. Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-12 09:36:37 +05:30
Trevor Gross	a975e64239	[PowerPC] Extend and update the test for `half` support (NFC) (#152625 ) `f16` is more functional than just a storage type on the platform, though it does have some codegen issues [1]. To prepare for future changes, do the following nonfunctional updates to the existing `half` test: * Add tests for passing and returning the type directly. * Add tests showing bitcast behavior, which is currently incorrect but serves as a baseline. * Add tests for `fabs` and `copysign` (trivial operations that shouldn't require libcalls). * Add invocations for big-endian and for PPC32. * Rename the test to `half.ll` to reflect its status, which also matches other backends. [1]: https://github.com/llvm/llvm-project/issues/97975	2025-09-10 09:03:29 +00:00
Maryam Moghadas	2bd0d770af	[PowerPC] Support `-fpatchable-function-entry` on PPC64LE (#151569 ) This patch enables `-fpatchable-function-entry` on PPC64 little-endian Linux. It is mutually exclusive with existing XRay instrumentation on this target.	2025-09-09 16:43:18 -04:00
Florian Hahn	74ec38fad0	[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730 ) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: https://github.com/llvm/llvm-project/pull/156730	2025-09-05 08:45:13 +00:00
Himadhith	ffbd616210	[NFC][PowerPC] adding the options for register names and VSR to VR (#157007 ) NFC patch to add the flags -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr to the following test files ``` llvm/test/CodeGen/PowerPC/recipest.ll llvm/test/CodeGen/PowerPC/setcc-logic.ll llvm/test/CodeGen/PowerPC/vector-popcnt-128-ult-ugt.ll ``` Created this PR based on this discussion: https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675 Co-authored-by: himadhith <himadhith.v@ibm.com>	2025-09-05 10:27:02 +05:30
zhijian lin	36cb33bbca	support branch hint for AtomicExpandImpl::expandAtomicCmpXchg (#152366 ) This patch adds a branch hint to AtomicExpandImpl::expandAtomicCmpXchg. For example, PowerPC supports branch hints as follows: ``` loop: lwarx r6,0,r3 # load and reserve cmpw r4,r6 #1st 2 operands equal? bne- exit #skip if not stwcx. r5,0,r3 #store new value if still res’ved bne- loop #loop if lost reservation exit: mr r4,r6 #return value from storage ``` `-` hints not taken, `+` hints taken.	2025-09-02 09:33:28 -04:00
Himadhith	09350bd1c5	[NFC][PowerPC] adding the arguments for register names and VSR to VR (#155991 ) NFC patch to add the flags `-ppc-asm-full-reg-names --ppc-vsr-nums-as-vr` to the test file `llvm/test/CodeGen/PowerPC/check-zero-vector.ll`. Created this PR based on this discussion: https://github.com/llvm/llvm-project/pull/151971#issuecomment-3234090675 Co-authored-by: himadhith <himadhith.v@ibm.com> Co-authored-by: Lei Huang <lei@ca.ibm.com>	2025-09-01 10:17:14 +05:30
Tony Varghese	3fc1aad65b	[PowerPC] Merge vsr(vsro(input, byte_shift), bit_shift) to vsrq(input, res_bit_shift) (#154388 ) This change implements a patfrag based pattern matching ~dag combiner~ that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR (Vector Shift Right)` instructions into a single `VSRQ (Vector Shift Right Quadword)` instruction on Power10+ processors. Vector right shift operations like `vec_srl(vec_sro(input, byte_shift), bit_shift)` generate two separate instructions `(VSRO + VSR)` when they could be optimised into a single `VSRQ` instruction that performs the equivalent operation. ``` vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift) where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift ``` Note: ``` vsro : Vector Shift Right by Octet VX-form - vsro VRT, VRA, VRB - The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32]. - Bytes shifted out of byte 15 are lost. - Zeros are supplied to the vacated bytes on the left. - The result is placed into VSR[VRT+32]. vsr : Vector Shift Right VX-form - vsr VRT, VRA, VRB - The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits. - Bits shifted out of bit 127 are lost. - Zeros are supplied to the vacated bits on the left. - The result is placed into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined. vsrq : Vector Shift Right Quadword VX-form - vsrq VRT,VRA,VRB - Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32]. - src1 is shifted right by the number of bits specified in the low-order 7 bits of src2. - Bits shifted out the least-significant bit are lost. - Zeros are supplied to the vacated bits on the left. - The result is placed into VSR[VRT+32]. 
``` --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-01 10:14:12 +05:30
Tony Varghese	2e7ea9c945	[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C). (#152956 ) Adds support for ternary equivalent operations of the form `ternary(A, X, B)` and `ternary(A, X, C)` where `X=[and(B,C)\| nor(B,C)\| eqv(B,C)\| nand(B,C)]`. The following are the patterns involved and the imm values: \| Operation \| Immediate Value \| \|----------------------------\|---------------------\| \| ternary(A, and(B,C), B) \| 49 \| \| ternary(A, nor(B,C), B) \| 56 \| \| ternary(A, eqv(B,C), B) \| 57 \| \| ternary(A, nand(B,C), B) \| 62 \| \| \| \| \| ternary(A, and(B,C), C) \| 81 \| \| ternary(A, nor(B,C), C) \| 88 \| \| ternary(A, eqv(B,C), C) \| 89 \| \| ternary(A, nand(B,C), C) \| 94 \| e.g. `xxeval XT, XA, XB, XC, 49` - performs `XA ? and(XB, XC) : XB` and places the result in `XT`. This is the continuation of [[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top). --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-09-01 10:13:54 +05:30
paperchalice	19464d951a	[NFC] #155740 post cleanup (#155966 ) Remove all "approx-func-fp-math" in tests.	2025-08-29 12:45:38 +08:00
Maryam Moghadas	242d51afe5	[PowerPC] Add DMR and WACC COPY support (#149129 ) This patch updates PPCInstrInfo::copyPhysReg to support DMR and WACC register classes and extends the PPCVSXCopy pass to handle specific WACC copy patterns.	2025-08-27 11:07:24 -04:00
Simon Pilgrim	6aed01a2a7	[PowerPC] ppc64-P9-vabsd.ll - update v16i8 abdu test now that it vectorizes in the middle-end (#154712 ) The scalarized IR was written before improvements to SLP / cost models ensured that the abs intrinsic was easily vectorizable opt -O3 : https://zig.godbolt.org/z/39T65vh8M Now that it is we need a more useful llc test	2025-08-27 07:29:30 +00:00
Josh Stone	e6ae4e689c	[PowerPC] Indicate that PPC32PICGOT clobbers LR (#154654 ) This pseudo-instruction emits a local `bl` writing LR, so that must be saved and restored for the function to return to the right place. If not, we'll return to the inline `.long` that the `bl` stepped over. This fixes the `SIGILL` seen in rayon-rs/rayon#1268.	2025-08-25 15:31:27 -07:00
RolandF77	d1cbe6ed74	[PowerPC] Add DMF builtins for build and disassemble (#153097 ) Add support for PPC Dense Math builtins mma_build_dmr and mma_disassemble_dmr builtins.	2025-08-25 12:14:55 -04:00
Matt Arsenault	65d12622fa	RuntimeLibcalls: Add entries for stackprotector globals (#154930 ) Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on Linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.	2025-08-23 10:21:00 +09:00
DanilaZhebryakov	0a3ee7de9c	[PowerPC] fix bug affecting float to int32 conversion on LE PowerPC (#150194 ) When moving fcti results from float registers to normal registers through memory, even though MPI was adjusted to account for endianness, FIPtr was always adjusted for big-endian, which caused loads of the wrong half of a value in little-endian mode.	2025-08-20 12:37:14 +02:00
Aditi Medhane	948abf1bf5	[PowerPC] Add BCDCOPYSIGN and BCDSETSIGN Instruction Support (#144874 ) Support the following BCD format conversion builtins for PowerPC. - `__builtin_bcdcopysign` – Conversion that returns the decimal value of the first parameter combined with the sign code of the second parameter. - `__builtin_bcdsetsign` – Conversion that sets the sign code of the input parameter in packed decimal format. > Note: This built-in function is valid only when all following conditions are met: > -qarch is set to utilize POWER9 technology. > The bcd.h file is included. ## Prototypes ```c vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char); vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char); ``` ## Usage Details `__builtin_bcdsetsign`: Returns the packed decimal value of the first parameter combined with the sign code. The sign code is set according to the following rules: - If the packed decimal value of the first parameter is positive, the following rules apply: - If the second parameter is 0, the sign code is set to 0xC. - If the second parameter is 1, the sign code is set to 0xF. - If the packed decimal value of the first parameter is negative, the sign code is set to 0xD. > notes: > The second parameter can only be 0 or 1. > You can determine whether a packed decimal value is positive or negative as follows: > - Packed decimal values with sign codes 0xA, 0xC, 0xE, or 0xF are interpreted as positive. > - Packed decimal values with sign codes 0xB or 0xD are interpreted as negative. --------- Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>	2025-08-19 14:47:27 +05:30
Theodoros Theodoridis	d15b7a83a7	[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829 ) Limit the re-association of BOps with multiple users to FP and Vector arithmetic.	2025-08-14 11:56:55 +01:00
zhijian lin	4936fc5a56	[PowerPC][NFC] Pre-commit test case: use millicode for strlen instead of libcal (#153466 ) Add a test case showing that a lib call is currently used for strlen.	2025-08-13 16:34:29 -04:00
Amy Kwan	63cc2e390d	[PowerPC][CodeGen] Expand ISD::AssertNoFPClass for ppc_fp128 (#152357 ) 780054d3ff18075a6bc433029f336931792b1d2d added support for `ISD::AssertNoFPClass`. This ISD node can be used with the `ppc_fp128` type, which is really just two `f64s` and requires expanding when used with `ISD::AssertNoFPClass`. Without the support for expanding the result, we get an assertion because the legalizer does not know how to expand the results of `ppc_fp128` with `ISD::AssertNoFPClass`. ``` ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3> LLVM ERROR: Do not know how to expand the result of this operator! ``` Thus, this patch aims to add support for the expand so we no longer assert. This fixes #151375.	2025-08-13 15:00:32 -04:00
Philip Reames	4d629f9744	[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226 ) In review of bbde6b, I had originally proposed that we support the legacy text format. As review evolved, it became clear this had been a bad idea (too much complexity), but in order to let that patch finally move forward, I approved the change with the variant. This change undoes the variant, and updates all the tests to just use the array form.	2025-08-12 11:23:05 -07:00
zhijian lin	598f21e9fc	[PowerPC] need to set CallFrameSize for the pass PPCReduceCRLogicals when insert a new block (#151017 ) [[CodeGen] Store call frame size in MachineBasicBlock](https://reviews.llvm.org/D156113) notes that when a basic block is split in the middle of a call sequence, the call frame size may be nonzero, so setCallFrameSize must be called on the new MachineBasicBlock. However, the function `splitMBB(BlockSplitInfo &BSI)` in llvm/lib/Target/PowerPC/PPCReduceCRLogicals.cpp does not call setCallFrameSize for the new MachineBasicBlock `NewMBB`. This patch adds that call, which fixes the crash mentioned in https://github.com/llvm/llvm-project/pull/144594#issuecomment-2993736654	2025-08-12 20:30:28 +09:00
Trevor Gross	00c4be3c9e	[Test] Add and update tests for `lrint`/`llrint` (NFC) (#152662 ) Many backends are missing either all tests for lrint, or specifically those for f16, which currently crashes for `softPromoteHalf` targets. For a number of popular backends, do the following: * Ensure f16, f32, f64, and f128 are all covered * Ensure both a 32- and 64-bit target are tested, if relevant * Add `nounwind` to clean up CFI output * Add a test covering the above if one did not exist * Always specify the integer type in intrinsic calls There are quite a few FIXMEs here, especially for `f16`, but much of this will be resolved in the near future.	2025-08-12 09:56:51 +09:00
Paul Murphy	5f864560a6	[PowerPC] fix lowering of SPILL_CRBIT on pwr9 and pwr10 (#146424 ) If a copy exists between creation of a crbit and a spill, machine-cp may delete the copy since it seems unaware of the relation between a cr and crbit. A fix was previously made for the generic ppc64 lowering. It should be applied to the pwr9 and pwr10 variants too. Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10 codegen too. This fixes #143989.	2025-08-08 09:24:22 +02:00
zhijian lin	093439c688	[PowerPC][AIX] Using milicode for memcmp instead of libcall (#147093 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __memcmp routine is a millicode implementation; we use millicode for the memcmp function instead of a library call to improve performance.	2025-08-07 13:13:56 -04:00
Sean Fertile	ab40909810	Implement the trampoline intrinsics and nest parameter for AIX. (#149388 ) We can expand the init intrinsic to create a descriptor for the nested procedure by combining the entry point and TOC pointer from the global descriptor with the nest argument. The normal indirect call sequence then calls the nested procedure through the descriptor like all other calls. Patch also implements support for a nest parameter by mapping it to gpr 11.	2025-08-06 12:15:27 -04:00
Simon Pilgrim	c4f6d34674	[DAG] getNode - fold (sext (trunc x)) -> x iff the upper bits are already signbits (#151945 ) Similar to what we already do for ZERO_EXTEND/ANY_EXTEND patterns.	2025-08-06 14:55:46 +01:00
zhijian lin	23b3203113	[POWERPC] Fixes an error in the handling of the MTVSRBMI instruction for big-endian (#151565 ) The patch fixes a bug introduced by the patch [[PowePC] using MTVSRBMI instruction instead of constant pool in power10+](https://github.com/llvm/llvm-project/pull/144084#top). The issue arose because the layout of vector register elements differs between little-endian and big-endian modes — specifically, the elements appear in reverse order. This led to incorrect behavior when loading constants using MTVSRBMI in big-endian configurations.	2025-08-06 09:36:37 -04:00
Himadhith	1f1b903a64	[NFC][PowerPC] Cleaning up test file and removing redundant front-end test (#151971 ) NFC patch to clean up extra lines of code in the file `llvm/test/CodeGen/PowerPC/check-zero-vector.ll` as the current one has loop unrolled. Also removing the file `clang/test/CodeGen/PowerPC/check-zero-vector.c` as the patch affects only the backend. Co-authored-by: himadhith <himadhith.v@ibm.com>	2025-08-06 15:59:47 +05:30
Sander de Smalen	ed5bd23867	Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408 )" This reverts commit bae8f1336db6a7f3288a7dcf253f2d484743b257. Some issues were found: * https://github.com/llvm/llvm-project/issues/151768 * https://github.com/llvm/llvm-project/issues/151592 * https://github.com/llvm/llvm-project/pull/134408#issuecomment-3145468321 * https://github.com/llvm/llvm-project/issues/151888#issuecomment-3149286820 I'll revert this for the time being while I investigate.	2025-08-04 12:07:30 +00:00
Amy Kwan	f48a8da342	[AIX] Handle arbitrary sized integers when lowering formal arguments passed on the stack (#149351 ) When arbitrary sized (non-simple type, or non-power of two types) integers are passed on the stack, these integers are not handled when lowering formal arguments on AIX as we always assume we will encounter simple type integers. However, it is possible for frontends to generate arbitrary sized immediate values in IR. Specifically in rustc, it will generate an integer value in LLVM IR for small structures that are less than a pointer size, which is done for optimization purposes for the Rust ABI. For example, if a Rust structure of three characters is passed into a function on the stack, ``` struct my_struct { field1: u8, field2: u8, field3: u8, } ``` This will generate an `i24` type in LLVM IR. Currently, it is not obvious for the backend to distinguish an integer versus something that wasn't an integer to begin with (such as a struct), and the latter case would not have an extend on the parameter. Thus, this PR allows us to perform a truncation and extend on integers, both non-simple and simple types.	2025-08-01 08:01:26 -04:00
Nikita Popov	0a41e7c87e	[LICM] Do not reassociate constant offset GEP (#151492 ) LICM tries to reassociate GEPs in order to hoist an invariant GEP. Currently, it also does this in the case where the GEP has a constant offset. This is usually undesirable. From a back-end perspective, constant GEPs are usually free because they can be folded into addressing modes, so this just increases register pressure. From a middle-end perspective, keeping constant offsets last in the chain makes it easier to analyze the relationship between multiple GEPs on the same base, especially after CSE. The worst that can happen here is if we start with something like ``` loop { p + 4x p + 4x + 1 p + 4x + 2 p + 4x + 3 } ``` And LICM converts it into: ``` p.1 = p + 1 p.2 = p + 2 p.3 = p + 3 loop { p + 4x p.1 + 4x p.2 + 4x p.3 + 4x } ``` Which is much worse than leaving it for CSE to convert to: ``` loop { p2 = p + 4*x p2 + 1 p2 + 2 p2 + 3 } ```	2025-08-01 09:43:15 +02:00
Sander de Smalen	bae8f1336d	Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408 ) This tries to reland #123632 (previously reverted by commit 6b1db79887df19bc8e8c946108966aa6021c8b87) This PR aims to fix coalescing of SUBREG_TO_REG when sub-register liveness tracking is enabled and this is now the so-manieth reincarnation of this effort :) This change is needed in order to enable subreg liveness tracking for AArch64, because without the implicit-def, Machine Copy Propagation would remove a 'redundant' copy because it doesn't realise that the top 32-bits of the register are zeroed, which subsequent instructions rely on. Changes compared to previous PR: * Rather than updating all instructions that define the source register (SrcReg) of the SUBREG_TO_REG, this new approach only updates instructions that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges are updated accordingly.	2025-07-30 14:42:24 +01:00
Tony Varghese	59c3fe6505	[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C)) (#141733 ) ## Description Adds support for ternary equivalent operations of the form `ternary(A, X, and(B,C))` where `X=[xor(B,C)\| nor(B,C)\| eqv(B,C)\| not(B)\| not(C)]`. List of `xxeval` equivalent ternary operations added and the corresponding `imm` value required: Ternary Operator\| Imm Value --\|-- ternary(A, xor(B,C), and(B,C)) \| 22 ternary(A, nor(B,C), and(B,C)) \| 24 ternary(A, eqv(B,C), and(B,C)) \| 25 ternary(A, not(C), and(B,C)) \| 26 ternary(A, not(B), and(B,C)) \| 28 e.g. `xxeval XT,XA,XB,XC,22` - performs `XA ? xor(XB, XC) : and(XB,XC)` and places the result in `XT`. Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-07-29 22:56:05 +05:30
Nikita Popov	fe0dbe0f29	[CodeGen] More consistently expand float ops by default (#150597 ) These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.	2025-07-28 09:46:00 +02:00
Simon Pilgrim	c37942df00	[DAG] visitFREEZE - limit freezing of multiple operands (#149797 ) This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084). The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison. The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes. I'm still working on supporting multiple operands as it's critical for topological DAG handling but need to get a fix in for trunk and 21.x. Fixes #148084	2025-07-22 15:40:55 +01:00
Guy David	cb6d1bbfcd	[PowerPC] Test SPE incompatibility with VSX (#147184 ) PPCSubtarget is not always initialized, depending on which passes are running, and in our downstream fork, -enable-matrix is the default configuration (regardless of whether matrix intrinsics are present in the IR), which triggers a fatal error in builtins-ppc-fpconstrained.c.	2025-07-17 00:29:38 +03:00
Matt Arsenault	3d50e1f3e8	RuntimeLibcalls: Add some tests for OpenBSD stack protectors (#147888 ) 7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests. Add a basic test copied from RISCV into all the backends on the OpenBSD page of supported architectures before I potentially break it in RuntimeLibcalls refactoring.	2025-07-15 15:50:54 +09:00
woruyu	b22b103c3d	[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::FCOPYSIGN (#148617 ) ### Summary This PR resolves https://github.com/llvm/llvm-project/issues/147694	2025-07-14 15:28:52 +01:00
Trevor Gross	0db197adef	[Test] Mark a number of libcall tests `nounwind` (#148329 ) Many tests for floating point libcalls include CFI directives, which isn't needed for the purpose of these tests. Mark some of the relevant test functions `nounwind` in order to remove this noise.	2025-07-12 11:57:28 +02:00
Himadhith	f9292c25cf	[NFC][PowerPC] Add test case for lockdown of vector compare greater than support for Zero vector comparisons (#147246 ) NFC patch to add testcase for locking down the support of Zero vector comparisons using the `vcmpgtuh (vector compare greater than unsigned halfword)` instruction. Currently `vcmpequh (vector compare equal unsigned halfword)` is in use. --------- Co-authored-by: himadhith <himadhith.v@ibm.com> Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>	2025-07-11 11:10:22 +05:30
Fangrui Song	68494ae072	[XRay] xray_fn_idx: fix alignment directive Use `emitValueToAlignment` as the section does not contain code. `emitCodeAlignment` would lead to ALIGN relocations on RISC-V and LoongArch with linker relaxation. In addition, change the alignment to wordsize, sufficient for the runtime requirement (`XRayFunctionSledIndex`). Related to #147322	2025-07-08 21:52:53 -07:00
Simon Pilgrim	d3d8ef7e41	[PowerPC] licm-xxsplti.ll - regenerate test checks	2025-07-07 15:19:18 +01:00
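
Note on the `xxeval` immediates listed in commits 2e7ea9c945 and 59c3fe6505: the values in those tables are consistent with reading the 8-bit immediate as a truth table over (A, B, C), most significant bit first, where `ternary(A, X, Y)` means `A ? X : Y`. The sketch below recomputes them under that assumption. The function and dictionary names are illustrative only and do not come from the LLVM sources.

```python
# Sketch: derive the 8-bit xxeval immediate for ternary(A, X, Y) = A ? X : Y.
# Assumption: bit i of the immediate, counted from the most significant bit,
# holds the result for the inputs (A, B, C) given by the 3-bit expansion of i.
# All names here are hypothetical helpers, not LLVM APIs.

OPS = {
    "and":   lambda b, c: b & c,
    "nor":   lambda b, c: 1 - (b | c),
    "eqv":   lambda b, c: 1 - (b ^ c),
    "nand":  lambda b, c: 1 - (b & c),
    "xor":   lambda b, c: b ^ c,
    "not_b": lambda b, c: 1 - b,
    "not_c": lambda b, c: 1 - c,
    "b":     lambda b, c: b,
    "c":     lambda b, c: c,
}


def xxeval_imm(x: str, y: str) -> int:
    """Return the immediate encoding A ? x(B, C) : y(B, C)."""
    imm = 0
    for i in range(8):
        a, b, c = (i >> 2) & 1, (i >> 1) & 1, i & 1
        bit = OPS[x](b, c) if a else OPS[y](b, c)
        imm |= bit << (7 - i)  # row i lands at bit position 7 - i
    return imm


if __name__ == "__main__":
    for x, y in [("and", "b"), ("nand", "c"), ("xor", "and"), ("not_b", "and")]:
        print(f"ternary(A, {x}, {y}) -> {xxeval_imm(x, y)}")
```

For instance, `xxeval_imm("and", "b")` evaluates to 49 and `xxeval_imm("xor", "and")` to 22, matching the first rows of the two tables.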


4160 Commits
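
Note on commit 3fc1aad65b: the merge rule `vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift` can be checked with a small model that treats a vector register as a 128-bit unsigned integer. This is a simplified sketch, not the backend's implementation: the real instructions read their shift amounts from fields of a second vector register, whereas the hypothetical helpers below take plain integers.

```python
# Sketch: model VSRO, VSR and VSRQ as logical right shifts on a 128-bit
# integer and check that the merged shift amount gives the same result.
# Assumption: shift amounts are passed as plain integers rather than being
# extracted from a vector register as the hardware does.

MASK128 = (1 << 128) - 1


def vsro(value: int, byte_shift: int) -> int:
    """Shift right by whole bytes; the byte count is a 4-bit field (0..15)."""
    return (value & MASK128) >> (8 * (byte_shift & 0xF))


def vsr(value: int, bit_shift: int) -> int:
    """Shift right by bits; the bit count is a 3-bit field (0..7)."""
    return (value & MASK128) >> (bit_shift & 0x7)


def vsrq(value: int, bit_shift: int) -> int:
    """Shift right by the low-order 7 bits of the shift amount (0..127)."""
    return (value & MASK128) >> (bit_shift & 0x7F)


def merged_shift(byte_shift: int, bit_shift: int) -> int:
    """Combined VSRQ shift amount from the commit message's formula."""
    return byte_shift * 8 + bit_shift


def check_all(value: int) -> bool:
    """Compare the two-step and one-step shifts for every legal amount."""
    return all(
        vsr(vsro(value, b), s) == vsrq(value, merged_shift(b, s))
        for b in range(16)
        for s in range(8)
    )


if __name__ == "__main__":
    sample = 0x0123456789ABCDEF_FEDCBA9876543210
    print(check_all(sample))
```

The largest combined amount is 15 * 8 + 7 = 127, which fits in the 7-bit field that `vsrq` reads, so no legal input pair overflows the merged shift.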