4220 Commits

Author SHA1 Message Date
Frederik Harwath
5c05824d2b
[CodeGen] Rename expand-fp to expand-ir-insts (#172681)
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.

Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
2025-12-18 11:15:04 +00:00
Kevin Per
98b82f90df
[PowerPC]: Add check for cast when shufflevector (#172443)
The crash happens because the cast for `Mask =
cast<ShuffleVectorSDNode>(Res)->getMask();` fails for node `t197: v16i8
= vector_shuffle<16,17,18,19,4,5,6,7,8,9,10,11,u,u,u,u> t196, t196`.
However, both `LHS` and `RHS` are the same node, so
`DAG.getCommutedVectorShuffle` doesn't return a `ShuffleVectorSDNode`
and crashes. The fix is to add a check before the cast is performed.

Closes https://github.com/llvm/llvm-project/issues/172265
2025-12-18 17:14:01 +08:00
Frederik Harwath
71760f324f
[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680)
Both passes expand instructions at the IR level.
They use the same kind of instruction visitation
logic and contain significant code duplication e.g.
for scalarization.
2025-12-18 09:22:47 +01:00
Himadhith
c3e7a1ab8f
[NFC][PowerPC] Optimize vector compares for not equal to non zero vectors (#171635)
Lockdown instructions for vector compares `not equal to non-zero (Ex:
vec[i]!=7)`. Current implementation can be made better by removing the
negation and using the identity ``` 0XFFFF + 1 = 0 and 0 + 1 = 0 ```

Co-authored-by: himadhith <himadhith.v@ibm.com>
2025-12-12 11:23:14 +05:30
Nikita Popov
b7c0452a9a
[PowerPC][AIX] Specify correct ABI alignment for double (#144673)
Add `f64:32:64` to the data layout for AIX, to indicate that doubles
have a 32-bit ABI alignment and 64-bit preferred alignment.

Clang was already taking this into account, but it was not reflected in
LLVM's data layout.

A notable effect of this change is that `double` loads/stores with 4
byte alignment are no longer considered "unaligned" and avoid the
corresponding unaligned access legalization. I assume that this is
correct/desired for AIX. (The codegen previously already relied on this
in some places related to the call ABI simply by dint of assuming
certain stack locations were 8 byte aligned, even though they were only
actually 4 byte aligned.)

Fixes https://github.com/llvm/llvm-project/issues/133599.
2025-12-11 08:57:26 +01:00
Nikita Popov
5a24dfa339
[SDAG] Remove most non-canonical libcall handing (#171288)
This is a followup to https://github.com/llvm/llvm-project/pull/171114,
removing the handling for most libcalls that are already canonicalized
to intrinsics in the middle-end. The only remaining one is fabs, which
has more test coverage than the others.
2025-12-10 11:45:26 +01:00
paperchalice
c05ba635c4
[PowerPC] Use the same lowering rule for vector rounding instructions (#166307)
They should have the same lowering rule.
2025-12-09 00:49:14 +00:00
Sean Fertile
7dfe599bda
Fix VarArgs FixedStack object on AIX. (#170240)
Create a mutable aliased fixed stack object for the va_list when any of
the optional arguments are passed in gprs. Since we need to spill the
gpr registers into the parameter save area the stack object is not
immutable, and since the values will almost certainly be accessed
through the IR value for a va_list make the stack object aliased as
well.
2025-12-08 14:36:45 -05:00
zhijian lin
d1ad0856f8
Fix [PowerPC] llc crashed at -O1/O2/O3: Assertion `isImm() && "Wrong MachineOperand mutator"' failed. (#170548)
Fixed issue 
[[PowerPC] llc crashed at -O1/O2/O3: Assertion `isImm() && "Wrong
MachineOperand mutator"'
failed.](https://github.com/llvm/llvm-project/issues/167672)

the root cause of the crash, the IMM operand is in different operand num
of the instruction PPC::XXSPLTW and PPC::XXSPLTB/PPC::XXSPLTH.

and the patch also fix a potential bug that the new element index of
PPC::XXSPLTB/PPC::XXSPLTH/XXSPLTW use the same logic. It should be
different .We need to convert the element index into the proper unit
(byte for VSPLTB, halfword for VSPLTH, word for VSPLTW) because
PPC::XXSLDWI interprets its ShiftImm in 32-bit word units.
2025-12-08 11:16:55 -05:00
YunQiang Su
c6f45f51fb
PowerPC/VSX: Select FMINNUM and FMAXNUM (#135739)
In LangRef, we claim that FMINNUM and FMAXNUM should follow the minNum
and maxNum operators in IEEE754-2008.

PowerPC/VSX does have these instructions XSMINDP and XSMAXDP.

Now we use FMINNUM_IEEE and FMAXNUM_IEEE, since they are used by the
non-arch expand codes now.
In future, we may replace all FMINNUM_IEEE/FMAXNUM_IEEE with FMINNUM and
FMAXNUM.

---------

Co-authored-by: Your Name <you@example.com>
2025-12-08 13:18:52 +08:00
Maryam Moghadas
f650330665
[PowerPC] Add initial support for AMO load builtins (#168746)
This commit adds two Clang builtins for PowerPC AMO load operations:

__builtin_amo_lwat for 32-bit unsigned operations
__builtin_amo_ldat for 64-bit unsigned operations

Also adds an amo.h header that maps GCC's AMO functions to these Clang
builtins for compatibility.
2025-12-03 17:47:56 -05:00
Matt Arsenault
04c81a9973
CodeGen: Add LibcallLoweringInfo analysis pass (#168622)
The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.

This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (and keeps it reusable across
repeated compilations).

This also switches ExpandFp and PreISelIntrinsicLowering as
a sample function and module pass. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
2025-12-03 22:00:12 +01:00
Nikita Popov
d0f5a49fb6
[Support] Support debug counters in non-assertion builds (#170468)
This enables the use of debug counters in (non-assertion) release
builds. This is useful to enable debugging without having to switch to
an assertion-enabled build, which may not always be easy.

After some recent improvements, always supporting debug counters no
longer has measurable overhead.
2025-12-03 16:21:47 +01:00
zhijian lin
0c9c62adf1
[PowerPC ]convert (setcc (and X, 1), 0, eq) to XORI (and X, 1), 1 (#168384)
Convert `(setcc (and X, 1), 0, eq)` to `XORI (and X, 1), 1`  , it will save one instruction.
2025-11-25 13:16:39 -05:00
Sander de Smalen
e1b08731e5 Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit bb78728826ff57f3df859e79bfd857b5a175bb6d.
2025-11-25 11:01:27 +00:00
Sander de Smalen
bb78728826
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
A SUBREG_TO_REG instruction expresses that the top bits of the result
register are set to a certain value (e.g. 0).

The example below expresses that the result of %1 will have the top 32
bits zeroed and the lower 32bits being equal to the result of INSTR.
```
    %0:gpr32 = INSTR
    %1:gpr64 = SUBREG_TO_REG 0, %0, sub32
```
When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by
coalescing %0 into %1, it must keep the same semantics. Currently
however, the RegisterCoalescer would emit:
```
    %1.sub32:gpr64 = INSTR
```
which no longer expresses that the top 32-bits of the register are
defined (zeroed) by INSTR.

This may cause issues with e.g. machine copy propagation where the pass
may think it can remove a COPY-like instruction because the MIR says
only the bottom 32-bits are defined/used, even though other uses of the
register rely on the top 32-bits being zeroed by the COPY-like
instruction.

This PR changes the RegisterCoalescer to instead emit:
```
    undef %1.sub32:gpr64 = MOVimm32 42, implicit-def %1
```
to express that the entire contents of %1:gpr64 are defined by the
instruction.

This tries to reland #134408 which had to be reverted due to a few reported
failures.
2025-11-24 15:55:19 +00:00
hstk30-hw
a6cec3f3e5
Reland "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)" (#169219)
Reland d5f3ab8ec97786476a077b0c8e35c7c337dfddf2, fix testcases.
2025-11-24 09:27:25 +08:00
Aiden Grossman
d5f3ab8ec9 Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)"
This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.

This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.
2025-11-23 05:17:45 +00:00
hstk30-hw
0859ac5866
[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)
This maybe a bug which is introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9, and has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` cause they from
the `Copy` all.
2025-11-23 10:11:24 +08:00
Himadhith
e4a4bb0f6d
[PowerPC] Replace vspltisw+vadduwm instructions with xxleqv+vsubuwm for adding the vector {1, 1, 1, 1} (#160882)
This patch optimizes vector addition operations involving **`all-ones`**
vectors by leveraging the generation of vectors of -1s(using `xxleqv`,
which is cheaper than generating vectors of 1s(`vspltisw`). These are
the respective vector types.
`v2i64`: **`A + vector {1, 1}`**
`v4i32`: **`A + vector {1, 1, 1, 1}`**
`v8i16`: **`A + vector {1, 1, 1, 1, 1, 1, 1, 1}`**
`v16i8`: **`A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1}`**

The optimized version replaces `vspltisw (4 cycles)` with `xxleqv (2
cycles)` using the following identity:
`A - (-1) = A + 1`.

---------

Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
2025-11-21 12:26:58 +05:30
Matt Arsenault
253ed52436
DAG: Use poison for some vector result widening (#168290) 2025-11-19 16:49:43 -05:00
Aditi Medhane
fa50a684c5
[PowerPC] Add custom lowering for SADD overflow for i32 and i64 (#159255)
This patch improves the codegen for saddo on i32 and i64 in both 32-bit
and 64-bit modes by custom lowering. It implements signed-add overflow
detection using the `(x eqv y) & (sum xor x)`bit-level sequence.
2025-11-19 10:11:51 +05:30
Mikołaj Piróg
e7b41df10e
[SelectionDAGBuilder] Propagate fast-math flags to fpext (#167574)
As in title. Without this, fpext behaves in selectionDAG as always
having no fast-math flags.
2025-11-14 20:50:59 -08:00
Alexander Belyaev
7ee0e0f956 Revert "[LICM] Sink unused l-invariant loads in preheader. #157559"
This reverts commit 469702c5d5cc4fa18c3a962afb971950a084f373.

https://github.com/llvm/llvm-project/issues/168048
2025-11-14 14:51:33 +01:00
zhijian lin
55aff64d2c
[PowerPC] fold i128 equality/inequality compares of two loads into a vectorized compare using vcmpequb.p when Altivec is available (#158657)
The patch add 16 bytes load size for function
PPCTTIImpl::enableMemCmpExpansion and fold i128 equality/inequality
compares of two loads into a vectorized compare using vcmpequb.p when
Altivec is available.

Rationale:
A scalar i128 SETCC (eq/ne) normally lowers to multiple scalar ops. On
VSX-capable subtargets, we can instead reinterpret the i128 loads as
v16i8 vectors and use the Altive vcmpequb.p instruction to perform a
full 128-bit equality check in a single vector compare.

Example Result:
This transformation replaces memcmp(a, b, 16) with two vector loads and
one vector compare instruction.
2025-11-13 10:06:36 -05:00
Matt Arsenault
dfdada1b78
CodeGen: Remove target hook for terminal rule (#165962)
Enables the terminal rule for remaining targets
2025-11-12 21:12:19 +00:00
Lei Huang
c0ac0c47e4
[PowerPC] Add intrinsic support for xvrlw (#167349) 2025-11-12 12:25:49 -05:00
Matt Arsenault
7d9b7e8c7b
PPC: Mark xfailed sincospi test as unsupported with EXPENSIVE_CHECKS (#167639) 2025-11-12 05:26:40 +00:00
Matt Arsenault
d6c750b36a
PPC: Disable type checking in xfailed sincospi test (#167563)
This hangs in expensive_checks
2025-11-11 13:50:27 -08:00
zhijian lin
85d2b10838
[DAG] Make strictfp attribute only restricts for libm and make non-math optimizations possible (#165464)
the patch 

[Add strictfp attribute to prevent unwanted optimizations of libm
calls](https://reviews.llvm.org/D34163)


  add `I.isStrictFP()` into 
```
  if (!I.isNoBuiltin() && !I.isStrictFP() && !F->hasLocalLinkage() &&
        F->hasName() && LibInfo->getLibFunc(*F, Func) &&
        LibInfo->hasOptimizedCodeGen(Func)) 
```

it prevents the backend from optimizing even non-math libcalls such as
`strlen` and `memcmp` if a call has the strict floating-point attribute.
For example, it prevent converting strlen and memcmp to milicode call
__strlen and __memcmp.
2025-11-11 13:34:14 -05:00
Matt Arsenault
821d2825a4
RuntimeLibcalls: Remove incorrect sincospi from most targets (#166982)
sincospi/sincospif/sincospil does not appear to exist on common
targets. Darwin targets have __sincospi and __sincospif, so define
and use those implementations. I have no idea what version added
those calls, so I'm just guessing it's the same conditions as
__sincos_stret.

Most of this patch is working to preserve codegen when a vector
library is explicitly enabled. This only covers sleef and armpl,
as those are the only cases tested.

The multiple result libcalls have an aberrant process where the
legalizer looks for the scalar type's libcall in RuntimeLibcalls,
and then cross references TargetLibraryInfo to find a matching
vector call. This was unworkable in the sincospi case, since the
common case is there is no scalar call available. To preserve
codegen if the call is available, first try to match a libcall
with the vector type before falling back on the old scalar search.

Eventually all of this logic should be contained in RuntimeLibcalls,
without the link to TargetLibraryInfo. In principle we should perform
the same legalization logic as for an ordinary operation, trying
to find a matching subvector type with a libcall.
2025-11-10 11:05:08 -08:00
zhijian lin
ccedf259fc
[PowerPC] convert memmove to milicode call .___memmove64[PR] in 64-bit mode (#167334)
conversion of bl memmove call to milicode bl .___memmove64[PR] in
64--bit mode is broken , the patch fix the problem.
 
in the llvm/include/llvm/IR/RuntimeLibcalls.td, we do not need to define
the

 `def ___memmove64 : RuntimeLibcallImpl<MEMCPY>` in PPC64AIXCallList 
` def ___memmove32 : RuntimeLibcallImpl<MEMCPY>` in PPC32AIXCallList
 
 since there is function 
 
```
   /// Return a function impl compatible with RTLIB::MEMCPY, or
  /// RTLIB::Unsupported if fully unsupported.
  RTLIB::LibcallImpl getMemcpyImpl() const {
    RTLIB::LibcallImpl Memcpy = getLibcallImpl(RTLIB::MEMCPY);
    if (Memcpy == RTLIB::Unsupported) {
      // Fallback to memmove if memcpy isn't available.
      return getLibcallImpl(RTLIB::MEMMOVE);
    }

    return Memcpy;
  }
```
2025-11-10 10:30:40 -08:00
zhijian lin
69c8756af0
[NFC][PowerPC] Pre-commit adding test case: use millicode for memmove (#166961)
add test case to test lib call are used for the memmove milicode
2025-11-10 10:12:00 -05:00
RolandF77
411ea8e9dd
[PowerPC] Lowering support for EVL type VP_LOAD/VP_STORE (#165910)
Map EVL type VP_LOAD/VP_STORE for fixed length vectors to PPC load/store
with length.
2025-11-07 10:27:46 -05:00
Sean Fertile
43b69e760e
Filter out unemitted metadata before assertion in AIXAsmPrinter. (#165620)
Global annotations metadata would trigger an assertion during code
emission on AIX. Filter out globals that are in the "llvm.metadata"
section before reaching the assert. Adds a test to verify the metadata
is not emitted on either ELF or XCOFF targets.
2025-11-06 10:07:22 -05:00
Lei Huang
3a84aef64a
[PowerPC][NFC] auto gen checks vec rounding tests (#166435)
Update tests to contain auto generated checks.
2025-11-05 10:24:16 -05:00
Lei Huang
a01e4da6d6
[PowerPC] Ensure correct codgen for MMA functions for cpu=future (#165791)
Update MMA tests to add run line for `cpu=future` to ensure MMA
functionality is not broken with the new `wacc` register classes
introduced. Previous commit have added def for using the new `wacc`
registers, this just add in testing and fixes a few patterns that was
missing .
2025-11-04 09:29:26 -05:00
Vigneshwar Jayakumar
469702c5d5
[LICM] Sink unused l-invariant loads in preheader. (#157559)
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM also adds functionality to sink unused load that's not
clobbered by the loop body.
2025-10-30 09:23:04 -05:00
Princeton Ferro
68e74f8f84
[DAGCombiner] Lower dynamic insertelt chain more efficiently (#162368)
For an insertelt with a dynamic index, the default handling in
DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the
vector, lower the insertelt to a store, then load the modified vector
back into temporaries. The vector store and load may be legalized into a
sequence of smaller operations depending on the target.

Let V = the vector size and L = the length of a chain of insertelts with
dynamic indices. In the worse case, this chain will lower to O(VL)
operations, which can increase code size dramatically.

Instead, identify such chains, reserve one stack slot for the vector,
and lower all of the insertelts to stores at once. This requires only
O(V + L) operations. This change only affects the default lowering
behavior.
2025-10-29 09:46:01 -07:00
Shimin Cui
531fd45e92
[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910)
Currently it is considered suitable to lower to a bit test for a set of
switch case clusters when the the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`

However it is found for some cases on powerpc, for example, when
NumDests is 3, and the number of comparisons for each destination is all
2, it's not profitable to lower the switch to bit test. This is to add
an option to set the minimum of largest number of comparisons to use bit
test for switch lowering.

---------

Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>
2025-10-28 10:24:32 -04:00
Lei Huang
2b3a76825f
[PowerPC] Update tlbie instruction implementation for ISA3.0+ (#162729)
The instruction `tlbie` changed in ISA3.0.

ISA V2.07:  `tlbie RB,RS`
ISA V3.0: `tlbie RB,RS,RIC,PRS,R`, with `tlbie RB,RS` aliased to `tlbie
RB,RS,0,0,0`
2025-10-27 11:18:45 -04:00
paperchalice
c8f5c602c8
[test][PowerPC] Remove unsafe-fp-math uses (NFC) (#164817)
Post cleanup for #164534.
2025-10-26 09:29:45 +08:00
paperchalice
3656f6f226
[CodeGen] Remove -enable-unsafe-fp-math option (#164559)
Hope this can unblock #105746.
2025-10-22 15:40:31 +08:00
zhijian lin
7aa6c62bdb
[PowecPC] Hint branch bne- for atomic operation after the store-conditional (#152529)
The branches emitted for atomic operations after the store-conditional
are currently not hinted, even though they should be.

According to the Power10 Processor Chip User’s Manual:

` “Without static prediction, if the lock is not acquired in the first
iteration, the branch history mechanism works to update the prediction
to predict taken; that is, predict lock acquisition failure and cause
more lwarx traffic for the next iteration.”`

This patch addresses the issue by adding explicit branch hints for
atomic operations after the store-conditional.
2025-10-21 09:37:30 -04:00
paperchalice
26feb1a9f1
[PowerPC] Remove UnsafeFPMath uses (#154901)
Try to remove `UnsafeFPMath` uses in PowerPC backend. These global flags
block some improvements like
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797.
Remove them incrementally.

FP operations may raise exceptions are replaced by constrained
intrinsics. However, vector type is not supported by these intrinsics.
2025-10-21 19:01:34 +08:00
Himadhith
43364151d7
[NFC][PowerPC] Patch to add the remaining types v2i64, v8i16 and v16i8 into exisiting testfile (#163201)
The previous [NFC
patch](https://github.com/llvm/llvm-project/pull/160476#top) addressed
only the vector type `v4i32`, this is a continuation for the previous
patch which adds the remaining 3 vector types which were left out.

This should include the following operands:
- `v2i64`: `A + vector {1, 1,}`
- `v8i16`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1}`
- `v16i8`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}`

---------

Co-authored-by: himadhith <himadhith.v@ibm.com>
2025-10-16 11:02:09 +05:30
Tony Varghese
d83fe1201e
[PowerPC] Exploit xxeval instruction for operations of the form ternary(A, X, nor(B,C)), ternary(A, X, eqv(B,C)), ternary(A, X, nand(B,C)), ternary(A, X, not(B)) and ternary(A, X, not(C)) (#158096)
Adds support for ternary equivalent operations of the form `ternary(A,
X, nor(B,C))`, `ternary(A, X, eqv(B,C))`, `ternary(A, X, nand(B,C))`,
`ternary(A, X, not(B))` and `ternary(A, X, not(C))` where `X=[xor(B,C)|
nor(B,C)| eqv(B,C)| not(B)| not(C)| and(B,C)| nand(B,C)]`.

This adds support for `v4i32, v2i64, v16i8, v8i16` operand types for the
following patterns.
List of xxeval equivalent ternary operations added and the corresponding
imm value required:
```
ternary(A,  and(B,C),   nor(B,C))	129
ternary(A,  B,          nor(B,C))	131
ternary(A,  C,          nor(B,C))	133
ternary(A,  xor(B,C),   nor(B,C))	134
ternary(A,  not(C),     nor(B,C))	138
ternary(A,  not(B),     nor(B,C))	140
ternary(A,  nand(B,C),  nor(B,C))	142

ternary(A,  or(B,C),    eqv(B,C))	151
ternary(A,  nor(B,C),   eqv(B,C))	152
ternary(A,  not(C),     eqv(B,C))	154
ternary(A,  nand(B,C),  eqv(B,C))	158

ternary(A,  and(B,C),   not(C))	    161
ternary(A,  B,          not(C))	    163
ternary(A,  xor(B,C),   not(C))	    166
ternary(A,  or(B,C),    not(C))	    167
ternary(A,  not(B),     not(C))	    172
ternary(A,  nand(B,C),  not(C))	    174

ternary(A,  and(B,C),   not(B))	    193
ternary(A,  xor(B,C),   not(B))	    198
ternary(A,  or(B,C),    not(B))	    199
ternary(A,  nand(B,C),  not(B))	    206

ternary(A,  B,          nand(B,C))	227
ternary(A,  C,          nand(B,C))	229
ternary(A,  xor(B,C),   nand(B,C))	230
ternary(A,  or(B,C),    nand(B,C))	231
ternary(A,  eqv(B,C),   nand(B,C))	233
```

eg. `xxeval XT, XA, XB, XC, 129`

performs the ternary operation: `XA ? and(XB, XC) : nor(XB, XC)` and
places the result in `XT`.

This is the continuation of:
- [[PowerPC] Exploit xxeval instruction for ternary patterns -
ternary(A, X,
and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top)
- [[PowerPC] Exploit xxeval instruction for operations of the form
ternary(A,X,B) and
ternary(A,X,C).](https://github.com/llvm/llvm-project/pull/152956#top)
- [[PowerPC] Exploit xxeval instruction for operations of the form
ternary(A,X, XOR(B,C)) and ternary(A,X,
OR(B,C))](https://github.com/llvm/llvm-project/pull/157909#top)

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-10-15 10:58:42 +05:30
Tony Varghese
60ee515b8c
[PowerPC] Emit lxvkq and vsrq instructions for build vector patterns (#157625)
### Optimize BUILD_VECTOR having special quadword patterns

This change optimizes `BUILD_VECTOR` operations by using the `lxvkq` or
`xxpltib + vsrq` instructions to inline constants matching specific
128-bit patterns:
- **MSB set pattern**: `0x8000_0000_0000_0000_0000_0000_0000_0000`
- **LSB set pattern**: `0x0000_0000_0000_0000_0000_0000_0000_0001`

### Implementation Details

The `lxvkq` instruction loads special quadword values into VSX
registers:
```asm
lxvkq XT, UIM
# When UIM=16: loads 0x8000_0000_0000_0000_0000_0000_0000_0000
```

The optimization reconstructs the 128-bit register pattern from
`BUILD_VECTOR` operands, accounting for target endianness. For example,
the MSB pattern can be represented as:
- **Big-Endian**: `<i64 -9223372036854775808, i64 0>`
- **Little-Endian**: `<i64 0, i64 -9223372036854775808>`

Both produce the same register value:
`0x8000_0000_0000_0000_0000_0000_0000_0000`

### MSB Pattern (`0x8000...0000`)
All vector types (`v2i64`, `v4i32`, `v8i16`, `v16i8`) generate:
```asm
lxvkq v2, 16
```

### LSB Pattern (`0x0000...0001`)
All vector types generate:
```asm
xxspltib v2, 255
vsrq v2, v2, v2
```

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-10-15 10:54:04 +05:30
paperchalice
dd44e63c8e
[DAGCombiner] Use FlagInserter in visitFSQRT (#163301)
Propagate fast-math flags for TLI.getSqrtEstimate etc.
2025-10-15 09:03:15 +08:00
Himadhith
9bcf8f088b
[NFC][PowerPC] Lockdown instructions for floating point comparison with zero-vector (#162828)
This NFC patch adds a new function which aids in emitting machine
instructions for floating point vectors. This was previously not
included in the test file as it currently only checks for integer
vectors.

---------

Co-authored-by: himadhith <himadhith.v@ibm.com>
2025-10-14 07:59:04 +05:30