1959 Commits

Author SHA1 Message Date
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Aditi Medhane
fa50a684c5
[PowerPC] Add custom lowering for SADD overflow for i32 and i64 (#159255)
This patch improves the codegen for saddo on i32 and i64 in both 32-bit
and 64-bit modes by custom lowering. It implements signed-add overflow
detection using the `(x eqv y) & (sum xor x)`bit-level sequence.
2025-11-19 10:11:51 +05:30
Sergei Barannikov
43dacd07f6
[PowerPC] TableGen-erate SDNode descriptions (#168108)
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

The validation functionality has detected several issues, see
`PPCSelectionDAGInfo::verifyTargetNode()`.

Most of the nodes have a description in `*.td` files and were
successfully "imported". Those that don't have a description are listed
in the enum in `PPCSelectionDAGInfo.td`. These nodes are not validated.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/168108
2025-11-17 22:58:26 +00:00
zhijian lin
55aff64d2c
[PowerPC] fold i128 equality/inequality compares of two loads into a vectorized compare using vcmpequb.p when Altivec is available (#158657)
The patch add 16 bytes load size for function
PPCTTIImpl::enableMemCmpExpansion and fold i128 equality/inequality
compares of two loads into a vectorized compare using vcmpequb.p when
Altivec is available.

Rationale:
A scalar i128 SETCC (eq/ne) normally lowers to multiple scalar ops. On
VSX-capable subtargets, we can instead reinterpret the i128 loads as
v16i8 vectors and use the Altive vcmpequb.p instruction to perform a
full 128-bit equality check in a single vector compare.

Example Result:
This transformation replaces memcmp(a, b, 16) with two vector loads and
one vector compare instruction.
2025-11-13 10:06:36 -05:00
RolandF77
411ea8e9dd
[PowerPC] Lowering support for EVL type VP_LOAD/VP_STORE (#165910)
Map EVL type VP_LOAD/VP_STORE for fixed length vectors to PPC load/store
with length.
2025-11-07 10:27:46 -05:00
Shimin Cui
531fd45e92
[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910)
Currently it is considered suitable to lower to a bit test for a set of
switch case clusters when the the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`

However it is found for some cases on powerpc, for example, when
NumDests is 3, and the number of comparisons for each destination is all
2, it's not profitable to lower the switch to bit test. This is to add
an option to set the minimum of largest number of comparisons to use bit
test for switch lowering.

---------

Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>
2025-10-28 10:24:32 -04:00
paperchalice
15d11ebc84
[NFC] "unsafe-fp-math" post cleanup (code comments part) (#164582) 2025-10-22 11:07:23 +00:00
zhijian lin
7aa6c62bdb
[PowecPC] Hint branch bne- for atomic operation after the store-conditional (#152529)
The branches emitted for atomic operations after the store-conditional
are currently not hinted, even though they should be.

According to the Power10 Processor Chip User’s Manual:

` “Without static prediction, if the lock is not acquired in the first
iteration, the branch history mechanism works to update the prediction
to predict taken; that is, predict lock acquisition failure and cause
more lwarx traffic for the next iteration.”`

This patch addresses the issue by adding explicit branch hints for
atomic operations after the store-conditional.
2025-10-21 09:37:30 -04:00
paperchalice
26feb1a9f1
[PowerPC] Remove UnsafeFPMath uses (#154901)
Try to remove `UnsafeFPMath` uses in PowerPC backend. These global flags
block some improvements like
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797.
Remove them incrementally.

FP operations may raise exceptions are replaced by constrained
intrinsics. However, vector type is not supported by these intrinsics.
2025-10-21 19:01:34 +08:00
Tony Varghese
60ee515b8c
[PowerPC] Emit lxvkq and vsrq instructions for build vector patterns (#157625)
### Optimize BUILD_VECTOR having special quadword patterns

This change optimizes `BUILD_VECTOR` operations by using the `lxvkq` or
`xxpltib + vsrq` instructions to inline constants matching specific
128-bit patterns:
- **MSB set pattern**: `0x8000_0000_0000_0000_0000_0000_0000_0000`
- **LSB set pattern**: `0x0000_0000_0000_0000_0000_0000_0000_0001`

### Implementation Details

The `lxvkq` instruction loads special quadword values into VSX
registers:
```asm
lxvkq XT, UIM
# When UIM=16: loads 0x8000_0000_0000_0000_0000_0000_0000_0000
```

The optimization reconstructs the 128-bit register pattern from
`BUILD_VECTOR` operands, accounting for target endianness. For example,
the MSB pattern can be represented as:
- **Big-Endian**: `<i64 -9223372036854775808, i64 0>`
- **Little-Endian**: `<i64 0, i64 -9223372036854775808>`

Both produce the same register value:
`0x8000_0000_0000_0000_0000_0000_0000_0000`

### MSB Pattern (`0x8000...0000`)
All vector types (`v2i64`, `v4i32`, `v8i16`, `v16i8`) generate:
```asm
lxvkq v2, 16
```

### LSB Pattern (`0x0000...0001`)
All vector types generate:
```asm
xxspltib v2, 255
vsrq v2, v2, v2
```

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-10-15 10:54:04 +05:30
AZero13
07eeb5f08d
[PowerPC] Lower ucmp using subtractions (#146446)
Source: Hacker's delight, page 21.

Using the carry, we can use contractions to use the ucmp.
2025-10-11 12:34:30 +09:00
Nikita Popov
8b824f3b3e
[PowerPC] Avoid working on deleted node in ext bool trunc combine (#160050)
This code was already creating HandleSDNodes to handle the case where a
node gets replaced with an equivalent node. However, the code before the
handles are created also performs RAUW operations, which can end up
CSEing and deleting nodes.

Fix this issue by moving the handle creation earlier.

Fixes https://github.com/llvm/llvm-project/issues/160040.
2025-09-22 21:37:13 +02:00
RolandF77
1eb575dcae
[PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398)
The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently the result type is defaulting to the operand type there. This
can conflict with calls to the same intrinsic from other paths.
2025-09-19 10:43:22 -04:00
Kazu Hirata
d77aafbeee
[PowerPC] Remove an unnecessary cast (NFC) (#156599)
getSExtValue already returns int64_t.
2025-09-03 07:48:36 -07:00
Tony Varghese
3fc1aad65b
[PowerPC] Merge vsr(vsro(input, byte_shift), bit_shift) to vsrq(input, res_bit_shift) (#154388)
This change implements a patfrag based pattern matching ~dag combiner~
that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR
(Vector Shift Right)` instructions into a single `VSRQ (Vector Shift
Right Quadword)` instruction on Power10+ processors.

Vector right shift operations like `vec_srl(vec_sro(input, byte_shift),
bit_shift)` generate two separate instructions `(VSRO + VSR)` when they
could be optimised into a single `VSRQ `instruction that performs the
equivalent operation.

```
vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift) 
where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift
```

Note:
```
 vsro : Vector Shift Right by Octet VX-form
- vsro VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32].
	- Bytes shifted out of byte 15 are lost. 
	- Zeros are supplied to the vacated bytes on the left.
- The result is placed into VSR[VRT+32].

vsr : Vector Shift Right VX-form
- vsr VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits.
	- Bits shifted out of bit 127 are lost.
	- Zeros are supplied to the vacated bits on the left.
- The result is place into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined.

vsrq : Vector Shift Right Quadword VX-form
- vsrq VRT,VRA,VRB 
- Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32]. 
- src1 is shifted right by the number of bits specified in the low-order 7 bits of src2.
	- Bits shifted out the least-significant bit are lost. 
	- Zeros are supplied to the vacated bits on the left. 
	- The result is placed into VSR[VRT+32].
```

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-09-01 10:14:12 +05:30
RolandF77
d1cbe6ed74
[PowerPC] Add DMF builtins for build and disassemble (#153097)
Add support for PPC Dense Math builtins mma_build_dmr and
mma_disassemble_dmr builtins.
2025-08-25 12:14:55 -04:00
Matt Arsenault
65d12622fa
RuntimeLibcalls: Add entries for stackprotector globals (#154930)
Add entries for_stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard are used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
2025-08-23 10:21:00 +09:00
Kazu Hirata
11b4f110e0
[llvm] Remove unused includes of SmallSet.h (NFC) (#154893)
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h.  With that, we no longer need to
include SmallSet.h in many files.
2025-08-22 10:33:46 -07:00
DanilaZhebryakov
0a3ee7de9c
[PowerPC] fix bug affecting float to int32 conversion on LE PowerPC (#150194)
When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of wrong
half of a value in little-endian mode.
2025-08-20 12:37:14 +02:00
Nikita Popov
fea7e6934a
[PowerPC] Remove custom original type tracking (NFCI) (#154090)
The OrigTy is passed to CC lowering nowadays, so use it directly instead
of custom pre-analysis.
2025-08-20 12:36:23 +02:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Sergei Barannikov
aa2fe4eb3d
[PowerPC] Remove some unused SDNodes and FastISel workaround (NFC) (#153964)
These nodes have never been used since introduction in 2013/2015.
2025-08-16 17:01:03 +00:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from a i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Sean Fertile
ab40909810
Implement the trampoline intrinsics and nest parameter for AIX. (#149388)
We can expand the init intrinsic to create a descriptor for the nested
procedure by combining the entry point and TOC pointer from the global
descriptor with the nest argument. The normal indirect call sequence
then calls the nested procedure through the descriptor like all other
calls. Patch also implements support for a nest parameter by mapping it
to gpr 11.
2025-08-06 12:15:27 -04:00
zhijian lin
23b3203113
[POWERPC] Fixes an error in the handling of the MTVSRBMI instruction for big-endian (#151565)
The patch fixed a bug introduced patch [[PowePC] using MTVSRBMI
instruction instead of constant pool in
power10+](https://github.com/llvm/llvm-project/pull/144084#top).

The issue arose because the layout of vector register elements differs
between little-endian and big-endian modes — specifically, the elements
appear in reverse order. This led to incorrect behavior when loading
constants using MTVSRBMI in big-endian configurations.
2025-08-06 09:36:37 -04:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained by
not taking into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
2025-08-05 11:43:41 +01:00
Fangrui Song
d6c2e53151 MCSymbolXCOFF: Migrate away from classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 18:18:44 -07:00
Amy Kwan
f48a8da342
[AIX] Handle arbitrary sized integers when lowering formal arguments passed on the stack (#149351)
When arbitrary sized (non-simple type, or non-power of two types)
integers are passed on the stack, these integers are not handled when
lowering formal arguments on AIX as we always assume we will encounter
simple type integers.

However, it is possible for frontends to generate arbitrary sized
immediate values in IR. Specifically in rustc, it will generate an
integer value in LLVM IR for small structures that are less than a
pointer size, which is done for optimization purposes for the Rust ABI.
For example, if a Rust structure of three characters is passed into
function on the stack,
```
struct my_struct {
  field1: u8,
  field2: u8,
  field3: u8,
}
```
This will generate an `i24` type in LLVM IR.

Currently, it is not obvious for the backend to distinguish an integer
versus something that wasn't an integer to begin with (such as a
struct), and the latter case would not have an extend on the parameter.
Thus, this PR allows us to perform a truncation and extend on integers,
both non-simple and simple types.
2025-08-01 08:01:26 -04:00
Boyao Wang
697beb3f17
[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664)
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
2025-07-10 11:11:09 +08:00
Dominik Steenken
acdf1c7526
[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105)
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.

Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
2025-07-08 16:12:17 +01:00
Matt Arsenault
d8ef156379
DAG: Remove verifyReturnAddressArgumentIsConstant (#147240)
The intrinsic argument is already marked with immarg so non-constant
values are rejected by the IR verifier.
2025-07-07 16:28:47 +09:00
Kazu Hirata
f46c1d6bcc [PowerPC] Fix a warning
This patch fixes:

  llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9588:16: error: unused
  variable 'NumOps' [-Werror,-Wunused-variable]
2025-07-04 07:53:29 -07:00
zhijian lin
45909ec469
[PowePC] using MTVSRBMI instruction instead of constant pool in power10+ (#144084)
The instruction MTVSRBMI set 0x00(or 0xFF) to each byte of VSR based on
the bits mask. Using the instruction instead of constant pool can reduce
the asm code size and instructions in power10.
2025-07-04 10:07:03 -04:00
Jie Fu
25d52fbf96 [PowerPC] Prevent copying in loop variables (NFC)
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5769:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass)
                  ^
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5769:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass)
       ^~~~~~~~~~~~~~~~~~~~~
                  &
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6193:19: error: loop variable '[Reg, N]' creates a
copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6193:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
                  &
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6806:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair<unsigned int, llvm::SDValue> const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/data/llvm-project/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6806:8: note: use reference type 'std::pair<unsigned int, llvm::SDValue> const &' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
                  &
3 errors generated.
2025-06-29 10:21:00 +08:00
Kazu Hirata
bad5a740e1
[PowerPC] Use range-based for loops (NFC) (#146221) 2025-06-28 13:04:08 -07:00
Wael Yehia
735d721de4
[PowerPC] Fix handling of undefs in the PPC::isSplatShuffleMask query (#145149)
Currently, the query assumes that a single undef byte implies the rest of
the `EltSize - 1` bytes are undefs, but that's not always true.
e.g. isSplatShuffleMask(
<0,1,2,3,4,5,6,7,undef,undef,undef,undef,0,1,2,3>, 8) should return
false.

---------

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2025-06-23 13:22:33 -04:00
Matt Arsenault
48155f93dd
CodeGen: Emit error if getRegisterByName fails (#145194)
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
2025-06-23 16:33:35 +09:00
Nikita Popov
7ea7ccd24d
[PowerPC][AIX] Specify pointer info and alignment for stack store (#144526)
When lowering call arguments to stack, specify a stack MPI, as well as
the stack alignment, instead of using the defaults (which would be an
unknown location with ABI alignment).

I believe the asm diffs are just changes in scheduling.
2025-06-18 10:50:17 +02:00
zhijian lin
85a9f2e148
[PowerPC] enable AtomicExpandImpl::expandAtomicCmpXchg for powerpc (#142395)
In PowerPC, the AtomicCmpXchgInst is lowered to
ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle
the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop — i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.

Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-13 09:14:48 -04:00
Matt Arsenault
55858527da
PowerPC: Move runtime libcall configuration to RuntimeLibcallsInfo (#142542)
These should not be set in the TargetLowering constructor,
RuntimeLibcalls
needs to be accurate outside of codegen contexts.
2025-06-10 07:28:04 +09:00
RolandF77
5d6218d311
[PowerPC] extend smaller splats into bigger splats (with fix) (#142194)
For pwr9, xxspltib is a byte splat with a range -128 to 127 - it can be
used with a following vector extend sign to make splats of i16, i32, or
i64 element size. For pwr8, vspltisw with a following vector extend sign
can be used to make splats of i64 elements in the range -16 to 15.

Add check for P8 to make sure the 64-bit vector ops are there.
2025-06-09 14:01:38 -04:00
Lei Huang
649020c680
[PowerPC] Change default for auto gen stxvp for cpu=future (#142826)
For cpu=future, we want to auto generate stxvp instructions by default.
2025-06-09 12:34:50 -04:00
Hubert Tong
8f486254e4 Revert "[PowerPC] extend smaller splats into bigger splats (#141282)"
The subject commit causes the build to ICE on AIX:
https://lab.llvm.org/buildbot/#/builders/64/builds/3890/steps/5/logs/stdio

This reverts commit 7fa365843d9f99e75c38a6107e8511b324950e74.
2025-05-29 01:10:55 -04:00
RolandF77
7fa365843d
[PowerPC] extend smaller splats into bigger splats (#141282)
For pwr9, xxspltib is a byte splat with a range -128 to 127 - it can be
used with a following vector extend sign to make splats of i16, i32, or
i64 element size. For pwr8, vspltisw with a following vector extend sign
can be used to make splats of i64 elements in the range -16 to 15.
2025-05-28 10:11:28 -04:00
Kazu Hirata
9738373c0b [PowerPC] Fix warnings
This patch fixes:

  llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11897:8: error: unused
  variable 'IsV2048i1' [-Werror,-Wunused-variable]

  llvm/lib/Target/PowerPC/PPCISelLowering.cpp:12035:8: error: unused
  variable 'IsV2048i1' [-Werror,-Wunused-variable]
2025-05-26 08:55:51 -07:00
Maryam Moghadas
a54300b32c
[PowerPC] Add load/store support for v2048i1 and DMF cryptography instructions (#136145)
This commit adds support for loading and storing v2048i1 DMR pairs and
introduces Dense Math Facility cryptography instructions: DMSHA2HASH,
DMSHA3HASH, and DMXXSHAPAD, along with their corresponding intrinsics
and tests.
2025-05-26 10:59:35 -04:00
RolandF77
bbca78fbcb
[PowerPC] vector shift word/double by element size - 1 use all ones (#139794)
Vector shift word or double requires a shift amount vector of 31 or 63
which is too big for splat immediate and requires a multi-instruction
sequence. However the PPC instructions only use 5 or 6 bits of the shift
amount vector elements so an all ones mask, which we can generate
efficiently, works.
2025-05-23 10:49:37 -04:00
RolandF77
99f0309669
[PowerPC] catch v2i64 shift left by 1 is add case (#138772)
Catch missing case in PPC BE for v2i64 x << 1 and generate x + x.
2025-05-13 11:26:46 -04:00