1692 Commits

Author SHA1 Message Date
Luke Lau
fe105347e2
[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting (#185605)
Currently a cttz.elts of e.g. nxv32i1 will get expanded to a reduction
of nxv32i64 or equivalent, but we can split it into two legal nxv16i1
cttz.elts once we have dedicated SelectionDAG nodes.

This implements the splitting for them the same way we implement type
splitting for vp.cttz.elts, i.e. check if the low result is VF, and if
so add it to the high result. It also implements operand
type promotion for NEON which needs to promote i1 vectors to something
larger first.
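
A scalar sketch of the splitting rule described above, assuming cttz.elts returns the element count when no element is set. The names `cttzElts` and `cttzEltsSplit` are hypothetical and model the semantics only, not the SelectionDAG code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of cttz.elts: index of the first set element, or the
// element count when no element is set (assumed semantics).
std::size_t cttzElts(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// Split model: count the low half first. Only when the low result equals
// the half's element count (nothing set) is the high result added to it.
std::size_t cttzEltsSplit(const std::vector<bool> &Mask) {
  assert(Mask.size() % 2 == 0 && "model assumes an even element count");
  const std::size_t Half = Mask.size() / 2;
  const std::vector<bool> Lo(Mask.begin(), Mask.begin() + Half);
  const std::vector<bool> Hi(Mask.begin() + Half, Mask.end());
  const std::size_t LoRes = cttzElts(Lo);
  return LoRes == Half ? Half + cttzElts(Hi) : LoRes;
}
```

For any even-sized mask the split form should agree with `cttzElts` on the whole vector.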

We also need to move expansion into LegalizeVectorOps so it doesn't get
expanded before type legalization can do splitting. This uses
LegalizeVectorOps in case the scalar reduction type, which depends on
the minimum bitwidth needed to store the result, still needs type
promotion.

The TTI costs should be updated after this to reflect the more efficient
codegen, but that is deferred to another PR.
2026-03-24 10:11:46 +00:00
Elvis Wang
494b98236f
[TargetLowering][RISCV] Using index type for step vector in expandVectorFindLastActive (#187984)
This patch changes the type of the step vector created by
`expandVectorFindLastActive` from `e8` to the index type of the target
machine.

This helps avoid index out-of-bounds issues when VLEN is large.
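
A scalar sketch of why a narrow step type is a problem, assuming the expansion selects a step vector 0, 1, 2, ... under the mask and then takes an unsigned maximum. `findLastActive` and `onlyLaneActive` are hypothetical helpers, not LLVM APIs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model: lane I holds the step value I (converted to StepT) when the mask
// is set and zero otherwise; the result is the unsigned maximum lane.
template <typename StepT>
std::uint64_t findLastActive(const std::vector<bool> &Mask) {
  StepT Max = 0;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    const StepT Step = static_cast<StepT>(I); // wraps if StepT is too narrow
    const StepT Lane = Mask[I] ? Step : StepT(0);
    Max = std::max(Max, Lane);
  }
  return Max;
}

// Builds a mask of NumLanes lanes with exactly one active lane.
std::vector<bool> onlyLaneActive(std::size_t NumLanes, std::size_t Lane) {
  std::vector<bool> Mask(NumLanes, false);
  Mask[Lane] = true;
  return Mask;
}
```

With 300 lanes and only lane 299 active, an 8-bit step type wraps the index to 43, while a 32-bit step type returns 299.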

Note that after this patch, there are still some issues in
expandVectorFindLastActive.
2026-03-24 09:37:25 +08:00
Craig Topper
0afc30f8d5
[TargetLowering] Add helper to create FSHR like operation in expandDIVREMByConstant. NFC (#187979) 2026-03-23 08:49:38 -07:00
Craig Topper
c75b8a1649
[TargetLowering] Avoid unnecessary nodes in the chunk loop in expandDIVREMByConstant (#187967)
We don't need an AND on the last iteration. If we shifted the dividend
due to trailing zeros in the divisor, we don't need a chunk that only
contains shifted in zeros.
2026-03-23 08:48:59 -07:00
Max Graey
7c6996fc8f
[ValueType][NFC] Add widenIntegerElementType method (#187816)
Fixes #187805
2026-03-23 09:43:47 +00:00
Craig Topper
f146677396
[TargetLowering] Refactor expandDIVREMByConstant to share more code. NFC (#187582)
Make the (1 << HBitWidth) % Divisor == 1 path a special case within
the recently added chunk summing algorithm. This allows us to
share the trailing zero shifting code.

While there make some comment improvements and avoid creating
unnecessary nodes.
2026-03-21 11:23:16 -07:00
Craig Topper
c1df6937ba
[TargetLowering] Use legally typed shifts to split chunks in expandDIVREMByConstant. (#187567)
This replaces LegalVT with HiLoVT and LegalWidth with HBitWidth as
they are the same for all current uses.
    
Then we rewrite the shifts to operate on LL and LH.
    
There's a slight regression on RISC-V due to different node creation
order leading to different DAG combine order. I have other refactoring
I'd like to explore then I may try to fix that.
2026-03-21 09:31:20 -07:00
Craig Topper
343b566b57
[TargetLowering] Move the MULH/MUL_LOHI legality checks to the beginning of BuildSDIV/UDIV. NFCI (#187780)
This groups the type and operation legality checks to the beginning. The
rest of the code can focus on the transformation.
2026-03-20 14:52:28 -07:00
Craig Topper
3eecb98b37
[TargetLowering] Separate some of the control for the i32->i64 optimization out of BuildUDIVPattern. (#187739)
Check the type before we call getOperationAction. Give BuildUDIVPattern
only AllowWiden and a WideSVT.

Update variable names and comments to avoid spreading "64" to too many
places.
2026-03-20 14:52:09 -07:00
Craig Topper
b6543c98d7 [TargetLowering] Make sure LL/LH are always initialized in expandDIVREMByConstant
This is a quick fix for some reported failures.
2026-03-19 11:48:07 -07:00
Shivam Gupta
796b218edd
[LegalizeTypes] Expand UDIV/UREM by constant via chunk summation (#146238)
This patch improves the lowering of 128-bit unsigned division and
remainder by constants (UDIV/UREM) by avoiding a fallback to libcall
(__udivti3/__umodti3) for specific divisors.

When a divisor D satisfies the condition (1 << ChunkWidth) % D == 1, the
128-bit value is split into fixed-width chunks (e.g., 30-bit) and summed
before applying a smaller UDIV/UREM. This transformation is based on the
"remainder by summing digits" trick described in Hacker’s Delight.

This fixes #137514 for some constants.
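
A scaled-down sketch of the chunk-summing idea, using a 64-bit input and 30-bit chunks instead of the 128-bit case the patch targets. The function name `uremByChunkSum` is hypothetical. Because (1 << 30) % D == 1, every chunk contributes its own value modulo D, so the sum of the chunks is congruent to the original value:

```cpp
#include <cassert>
#include <cstdint>

// Remainder of a 64-bit value modulo D, computed by summing 30-bit chunks.
// Only valid when (1 << 30) % D == 1, e.g. D = 3, 7, 9 or 11.
std::uint32_t uremByChunkSum(std::uint64_t X, std::uint32_t D) {
  constexpr unsigned ChunkWidth = 30;
  constexpr std::uint64_t ChunkMask = (std::uint64_t{1} << ChunkWidth) - 1;
  assert(D != 0 && (std::uint64_t{1} << ChunkWidth) % D == 1 &&
         "divisor must satisfy (1 << ChunkWidth) % D == 1");
  // At most three chunks (30 + 30 + 4 bits), so the sum fits in 32 bits.
  std::uint32_t Sum = 0;
  while (X != 0) {
    Sum += static_cast<std::uint32_t>(X & ChunkMask);
    X >>= ChunkWidth;
  }
  // A single narrow remainder replaces the wide one.
  return Sum % D;
}
```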
2026-03-19 17:58:54 +05:30
Craig Topper
291359be68
[SelectionDAG] Move the call to BuildExactSDIV and BuildExactUDIV to the top of BuildSDIV/BuildUDIV. (#187378)
This moves it above the type legality check. The legality check we use
for the main division by constant algorithm is probably not right for
BuildExactSDIV and BuildExactUDIV. These checks are largely about the
legality of MUL_LOHI/MULH which are not used for the exact case.

This patch removes the legal type check for the exact case. If we do
need a check it's probably better to have a specific version in
BuildExactSDIV and BuildExactUDIV.

I'm hoping to do some refactoring of the legality checks in
BuildSDIV/BuildUDIV so separating them makes this easier.
2026-03-18 19:54:47 -07:00
Demetrius Kanios
351501799a
[CodeGen] Improve getLoadExtAction and friends (#181104)
Alternative approach to the same goals as #162407

This takes `TargetLoweringBase::getLoadExtAction`, renames it to
`TargetLoweringBase::getLoadAction`, merges `getAtomicLoadExtAction`
into it, and adds more inputs for relevant information (alignment,
address space).

The `isLoadExtLegal[OrCustom]` helpers are also modified in a matching
manner.

This is fully backwards compatible, with the existing `setLoadExtAction`
working as before. But this allows targets to override a new hook to
allow the query to make more use of the information. The hook
`getCustomLoadAction` is called with all the parameters whenever the
table lookup yields `LegalizeAction::Custom`, and can return any other
action it wants.
2026-03-17 23:40:19 -07:00
Craig Topper
ded656b467
[TargetLowering][X86] Directly emit FSHR from expandDIVREMByConstant when Legal. (#186863) 2026-03-16 16:48:34 -07:00
AbdallahRashed
367569e667
[SelectionDAG] Use ExpandIntRes_CLMUL to expand vector CLMUL via narrower legal types (#184468)
Reuse the ExpandIntRes_CLMUL identity to expand vector
CLMUL/CLMULR/CLMULH on wider element types (vXi16, vXi32, vXi64) by
decomposing into half-element-width operations that eventually reach a
legal CLMUL type.

Three generic strategies in expandCLMUL:
1. Halve: halve element width (e.g. v8i16 -> v8i8 on AArch64)
2. Promote to double: zext to wider type if CLMUL is legal there (e.g.
x86)
3. Count widen: pad with undef to double element count (e.g. v4i16 ->
v8i16)

A helper canNarrowCLMULToLegal() guides strategy selection and prevents
circular expansion in the CLMULH bitreverse path.

Also add Custom BITREVERSE lowering for v4i16/v8i16 on AArch64 using
REV16+RBIT, which the CLMULH expansion relies on.

Fixes #183768
2026-03-09 23:21:44 +00:00
MITSUNARI Shigeo
3e24a39357
[SelectionDAG] Optimize 32-bit udiv with 33-bit magic constants on 64-bit targets (#181288)
This PR optimizes 32-bit unsigned division by constants when the magic
constant is 33 bits (IsAdd=true case in UnsignedDivisionByConstantInfo)
on 64-bit targets.

## Overview

Compiler optimization for constant division of `uint32_t` variables
(such as `x / 7`) is based on the method
proposed by Granlund and Montgomery in 1994 (hereafter referred to as
the GM method).
However, the GM method for the IsAdd=true case was optimized for 32-bit
CPUs, not 64-bit CPUs.

This patch provides optimizations specifically for 64-bit CPUs (such as
x86_64 and Apple M-series).
A simple benchmark demonstrates over 60% speedup on both Intel Xeon and
Apple M4 processors.

## The GM Method

The GM method for `x / 7` can be expressed in C code as follows,
where the constants `c` and `a` are magic numbers determined by the
divisor:

```cpp
uint32_t udiv_original(uint32_t x) {
    uint64_t v = uint64_t(x) * c;  // widen first so the product is 64-bit
    v >>= 32;
    uint32_t t = uint32_t(x) - uint32_t(v);
    t >>= 1;
    t += uint32_t(v);
    t >>= a - 33;
    return t;
}
```

For example, division by 7 on x86_64 generates 7 instructions:

```asm
movl    %edi, %eax
imulq   $613566757, %rax, %rax
shrq    $32, %rax
subl    %eax, %edi
shrl    %edi
addl    %edi, %eax
shrl    $2, %eax
```

## Proposed Solution

This patch generates the following optimized code:

```cpp
uint32_t udiv_optimized(uint32_t x) {
    uint128_t v = uint128_t(x) * ((c + 0x100000000) << (64 - a));
    return uint32_t(v >> 64);
}
```

Since a 64-bit right shift of a 128-bit variable extracts the upper 64
bits,
this code eliminates the need for shifts after multiplication.

The implementation pre-shifts the 33-bit magic constant `c = 2^32 +
Magic` left by `(64-a)` bits
and uses the high 64 bits of a 64 x 64 -> 128 bit multiplication
directly.
This eliminates the add/sub/shift sequence.

After optimization, division by 7 becomes 4 instructions (or 3 with
BMI2):

```asm
# Standard (4 instructions)
movl    %edi, %eax
movabsq $2635249153617166336, %rcx
mulq    %rcx
movq    %rdx, %rax

# With BMI2 (3 instructions)
movl    %edi, %edx
movabsq $2635249153617166336, %rax
mulxq   %rax, %rax, %rax
```
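
A runnable sketch of both forms for the divisor 7, using the constants visible in the assembly above (c = 613566757, a = 35). The function names are hypothetical, and `unsigned __int128` is a GCC/Clang extension assumed to be available on the 64-bit host:

```cpp
#include <cassert>
#include <cstdint>

// Magic numbers for dividing a uint32_t by 7.
constexpr std::uint64_t C = 613566757; // low 32 bits of the 33-bit magic
constexpr unsigned A = 35;             // total shift amount

// GM method, IsAdd=true form: multiply, then a sub/shift/add/shift fixup.
std::uint32_t udiv7Original(std::uint32_t X) {
  const std::uint64_t V = (std::uint64_t{X} * C) >> 32;
  std::uint32_t T = X - static_cast<std::uint32_t>(V);
  T >>= 1;
  T += static_cast<std::uint32_t>(V);
  T >>= A - 33;
  return T;
}

// Pre-shifted form: one 64 x 64 -> 128 multiply, result is the high half.
std::uint32_t udiv7Optimized(std::uint32_t X) {
  constexpr std::uint64_t PreShifted = (C + 0x100000000ULL) << (64 - A);
  const unsigned __int128 V = static_cast<unsigned __int128>(X) * PreShifted;
  return static_cast<std::uint32_t>(V >> 64);
}
```

Both functions should agree with `x / 7` for every 32-bit input; the pre-shifted constant stays below 2^62, so it fits in a 64-bit register.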
2026-03-06 15:18:34 -08:00
Craig Topper
98c46261d9
[TargetLowering][PowerPC] Don't unroll vector CLMUL when MUL is not supported. (#184238)
We can use the bittest lowering instead.
2026-03-03 17:25:54 -08:00
AbdallahRashed
3702733820
[SelectionDAG] Fix CLMULR/CLMULH expansion (#183537)
For v8i8 on AArch64, `expandCLMUL` picked the zext path (ExtVT=v8i16) since ZERO_EXTEND/SRL were legal, but CLMUL on v8i16 is not, resulting in a bit-by-bit expansion (~42 insns). Prefer the bitreverse path when CLMUL is legal on VT but not ExtVT.

v8i8 CLMULR: 42 → 4 instructions.

Fixes #182780
2026-02-27 15:16:52 +00:00
Simon Pilgrim
90b3fd7101
[DAG] Move (X +/- Y) & Y --> ~X & Y fold from visitAnd to SimplifyDemandedBits (#183270)
Add DemandedElts handling to allow better vector support

To prevent RISCV falling back to a mul call in known-never-zero.ll I've
had to tweak the (mul step_vector(C0), C1) to (step_vector(C0 * C1))
fold to only occur if C0 is already non-power-of-2, C0 * C1 is a
power-of-2 or the target has good mul support.
2026-02-26 11:26:00 +00:00
Craig Topper
ab0823c9c7
[TargetLowering][RISCV] Disable the special illegal type expansion of ISD::AVGFLOORU on RV32 (#181073)
RISC-V doesn't have a carry flag which makes the UADDO expansion
expensive to emulate.

I've disabled the code by checking if UADDO is not supported for the
type that will be legalized to. Unfortunately, we have custom lowering
of UADDO on RV64 so this doesn't disable this code there.
2026-02-25 23:26:16 -08:00
Nikita Popov
cfca635efc
[SelectionDAG] Fix fptoui.sat expansion using minnum/maxnum (#180178)
fptoui.sat can currently use a minnum/maxnum based expansion, which
relies on NaNs not being propagated. Specifically, it relies on
minnum(maxnum(NaN, 0), MAX) to return 0. However, if the input is sNaN,
then maxnum(sNaN, 0) is allowed to return qNaN, in which case the final
result will be MAX rather than 0.

This PR does the following changes:

* Support the fold for minimumnum/maximumnum, which guarantees that NaN
is not propagated even for sNaN, so it can use the old lowering. Test
this using Hexagon which has legal minimumnum but illegal minnum.
* For the minnum/maxnum case, remove the special unsigned case and
instead always insert the explicit NaN check. In that case the NaN
propagation semantics don't matter.
* This also means that we can support this expansion for
minimum/maximum.
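
A scalar sketch of the explicit-NaN-check form for a 32-bit unsigned result. `fpToUIntSat` is a hypothetical name, and a `double` input is assumed so that the maximum value 4294967295 is exactly representable:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating double -> uint32_t conversion with an explicit NaN check, so
// the NaN behaviour of the min/max operations no longer matters.
std::uint32_t fpToUIntSat(double X) {
  if (std::isnan(X))
    return 0; // NaN maps to 0 regardless of quiet/signaling flavour
  const double Max =
      static_cast<double>(std::numeric_limits<std::uint32_t>::max());
  const double Clamped = std::fmin(std::fmax(X, 0.0), Max);
  return static_cast<std::uint32_t>(Clamped); // in range, so well defined
}
```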
2026-02-25 12:23:42 +00:00
Carlos Alberto Enciso
bc9d5b01d3
[clang][DebugInfo] Add virtuality call-site target information in DWARF. (#182510)
Given the test case:

  struct CBase {
    virtual void foo();
  };

  void bar(CBase *Base) {
    Base->foo();
  }

and using '-emit-call-site-info' with llc, the following DWARF
is produced for the indirect call 'Base->foo()':

1$: DW_TAG_structure_type "CBase"
      ...
2$:   DW_TAG_subprogram "foo"
        ...

3$: DW_TAG_subprogram "bar"
      ...
4$:   DW_TAG_call_site
        ...

We add DW_AT_LLVM_virtual_call_origin to existing call-site
information, linking indirect calls to the function-declaration
they correspond to.

4$:   DW_TAG_call_site
        ...
        DW_AT_LLVM_virtual_call_origin (2$ "_ZN5CBase3fooEv")

The new attribute DW_AT_LLVM_virtual_call_origin helps to
address the ambiguity to any consumer due to the usage of
DW_AT_call_origin.

The functionality is available to all supported debuggers and
it is generated only for DWARF version 5 or greater.
2026-02-25 05:35:07 +00:00
Björn Pettersson
5e5e300d07
[SelectionDAG] Fix bug related to demanded bits/elts for BITCAST (#145902)
When we have a BITCAST and the source type is a vector with smaller
elements compared to the destination type, then we need to demand all
the source elements that make up the demanded elts for the result when
doing recursive calls to SimplifyDemandedBits,
SimplifyDemandedVectorElts and SimplifyMultipleUseDemandedBits. Problem
is that those simplifications are allowed to turn non-demanded elements
of a vector into POISON, so unless we demand all source elements that
make up the result there is a risk that the result would be more
poisonous (even for demanded elts) after the simplification.

The patch fixes some bugs in SimplifyMultipleUseDemandedBits and
SimplifyDemandedBits for situations when we did not consider the problem
described above. Now we make sure that we also demand vector elements
that "must not be turned into poison" even if those elements correspond
to bits that do not need to be defined according to the DemandedBits
mask.
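
A small model of the lane mapping involved, assuming each destination element is built from `Scale` consecutive narrower source elements and lanes are numbered from the low end. `demandedSrcElts` is a hypothetical helper, not the LLVM implementation:

```cpp
#include <cassert>
#include <cstdint>

// Maps a mask of demanded destination lanes to the source lanes that make
// them up. Every source lane feeding a demanded destination lane is kept,
// so none of them may be simplified to poison.
std::uint64_t demandedSrcElts(std::uint64_t DemandedDst, unsigned NumDstElts,
                              unsigned Scale) {
  assert(Scale != 0 && NumDstElts * Scale <= 64 &&
         "model is limited to 64 source lanes");
  std::uint64_t DemandedSrc = 0;
  for (unsigned I = 0; I < NumDstElts; ++I) {
    if (((DemandedDst >> I) & 1) == 0)
      continue;
    for (unsigned J = 0; J < Scale; ++J)
      DemandedSrc |= std::uint64_t{1} << (I * Scale + J);
  }
  return DemandedSrc;
}
```

For a v4i16 to v2i32 bitcast (Scale = 2), demanding only destination lane 1 demands source lanes 2 and 3.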

Fixes #138513
2026-02-23 14:38:07 +01:00
Craig Topper
2617cc5e82
[TargetLowering][RISCV] Avoid ISD::MUL in expandCLMUL if hasBitTest or MUL requires a library call. (#182389)
Scalar multiply is not part of the most basic RISC-V ISA. Use a
and+setcc+select for these targets.

The and+setcc+select is also beneficial for targets with bit test
instructions. RISC-V may not get the full benefit here due to
not having a cmove-like instruction without Zicond.

Co-authored-by: fbrv <Fabio.Baravalle@gmail.com>
2026-02-22 19:20:16 -08:00
Paul Kirth
ec8b9ca47d
Revert "[clang][DebugInfo] Add virtuality call-site target information in DWARF. (#167666)" (#182343)

This reverts commit 418ba6e8ae2cde7924388142b8ab90c636d2c21f.

The commit caused an ICE due to hitting unreachable in
llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1307

Fixes #182337
2026-02-19 12:19:11 -08:00
Simon Pilgrim
aa2dac40de
[DAG] SimplifyDemandedBits - fold FSHR(X,Y,Amt) -> SRL(Y,Amt) (#182294)
If a FSHR node's DemandedBits mask and maximum shift amount don't
demand any bits from the X upper register, then simplify to a SRL node.
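
A 32-bit scalar sketch of why the fold is safe. The bits of X land in the top Amt bits of the result, so if the demanded mask excludes them for every possible shift amount, a plain logical shift of Y gives the same demanded bits. `fshr32` and `fshrMatchesSrl` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift right: concatenate X (high) and Y (low), shift right by
// Amt modulo 32, and keep the low 32 bits.
std::uint32_t fshr32(std::uint32_t X, std::uint32_t Y, unsigned Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Y; // avoids an undefined shift by 32
  return (X << (32 - Amt)) | (Y >> Amt);
}

// True when FSHR and SRL agree on every demanded bit.
bool fshrMatchesSrl(std::uint32_t X, std::uint32_t Y, unsigned Amt,
                    std::uint32_t DemandedBits) {
  return (fshr32(X, Y, Amt) & DemandedBits) ==
         ((Y >> (Amt % 32)) & DemandedBits);
}
```

With a shift amount of at most 7, a demanded mask covering only the low 25 bits never sees X, while demanding all 32 bits does.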

FSHL is less useful but we could add it as a future patch if there's
interest

Based off a discussion on #182021
2026-02-19 18:29:34 +00:00
Hamza Hassanain
ca77001a1a
[ARM] Replace manual CLS expansion with ISD::CTLS (#178430)
Converts ARM scalar CLS intrinsics to use the unified ISD::CTLS node
instead of custom manual expansion. This addresses the issue
[#174337](https://github.com/llvm/llvm-project/issues/174337).

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2026-02-19 07:49:26 -08:00
Carlos Alberto Enciso
418ba6e8ae
[clang][DebugInfo] Add virtuality call-site target information in DWARF. (#167666)
Given the test case:

  struct CBase {
    virtual void foo();
  };

  void bar(CBase *Base) {
    Base->foo();
  }

and using '-emit-call-site-info' with llc, the following DWARF
is produced for the indirect call 'Base->foo()':

1$: DW_TAG_structure_type "CBase"
      ...
2$:   DW_TAG_subprogram "foo"
        ...

3$: DW_TAG_subprogram "bar"
      ...
4$:   DW_TAG_call_site
        ...

We add DW_AT_LLVM_virtual_call_origin to existing call-site
information, linking indirect calls to the function-declaration
they correspond to.

4$:   DW_TAG_call_site
        ...
        DW_AT_LLVM_virtual_call_origin (2$ "_ZN5CBase3fooEv")

The new attribute DW_AT_LLVM_virtual_call_origin helps to
address the ambiguity to any consumer due to the usage of
DW_AT_call_origin.

The functionality is available to all supported debuggers.
2026-02-19 14:48:59 +00:00
Simon Pilgrim
9b3470d56f
[DAG] expandCLMUL - unroll vector clmul if vector multiplies are not supported (#182041)
Fixes powerpc cases reported on #182039

I'm hoping #177566 can be adapted to improve upon this.
2026-02-18 19:03:36 +00:00
Nikita Popov
c4721872af Revert "[Clang][inlineasm] Add special support for "rm" output constraints (#92040)"
This change landed without approval.

This reverts commit 45e666a8531c1148bdb170b9a120f99e1500c427.
This reverts commit a636dd4c37f12594275de2fe180ca35bc04d76ea.
2026-02-14 15:59:04 +01:00
Bill Wendling
45e666a853
[Clang][inlineasm] Add special support for "rm" output constraints (#92040)
Clang isn't able to support multiple constraints on inputs and outputs,
like "rm". Instead, it picks the "safest" one to use, i.e. the memory
constraint for "rm". This leads to obviously horrible code:

  asm __volatile__ ("pushf\n\t"
                    "popq %0"
                    : "=rm" (x));

is compiled to:

        pushf
        popq    -8(%rsp)
        movq    -8(%rsp), %rax

It gets worse when inlined into other functions, because it may
introduce a stack where none is needed.

With this change, Clang now generates IR for the more optimistic choice
("r"). All but the fast register allocator are able to fold registers if
it turns out that register pressure is too high.

This leaves the fast register allocator. The fast register allocator, as
the name suggests, is built for execution speed, not code quality. Thus,
we add special processing to convert the "optimistic" IR into the
"conservative" choice (again at the IR level), which we know it can
handle.

We focus on "rm" for the initial commit, but that can be expanded in the
future for other constraints where Clang generates ++ungood code (like
"g").

Fixes: https://github.com/llvm/llvm-project/issues/20571
2026-02-14 05:02:24 -08:00
Björn Pettersson
6420099bcc
[SelectionDAG] Make sure demanded lanes for AND/MUL-by-zero are frozen (#180727)
DAGCombiner can fold a chain of INSERT_VECTOR_ELT into a vector AND/OR
operation. This patch adds protection to avoid that we end up making the
vector more poisonous by freezing the source vector when the elements
that should be set to 0/-1 may be poison in the source vector.

The patch also fixes a bug in SimplifyDemandedVectorElts for
MUL/MULHU/MULHS/AND that could result in making the vector more
poisonous. Problem was that we skipped demanding elements from Op0 that
were known to be zero in Op1. But that could result in elements being
simplified into poison when simplifying Op0, and then the result would
be poison and not zero after the MUL/MULHU/MULHS/AND. The solution is to
defensively make sure that we demand all the elements originally
demanded also when simplifying Op0.

These bugs were found when analysing the miscompiles in
https://github.com/llvm/llvm-project/issues/179448

Main culprit in #179448 seems to have been the bug in DAGCombiner. The
bug in SimplifyDemandedVectorElts surfaced when fixing the DAGCombiner,
as that fix typically introduces the (AND (FREEZE x), y) pattern that
wasn't handled correctly in SimplifyDemandedVectorElts.

Also fixes #180409.
Also fixes #176682.
2026-02-12 10:58:29 +01:00
yingopq
1e42c76d61
[Mips] Fix cttz.i32 fails to lower on mips16 (#179633)
MIPS16 cannot handle constant pools created by CTTZ table lookup
expansion. This causes "Cannot select" errors when trying to select
MipsISD::Lo nodes for constant pool addresses.
    
Modify the table lookup conditions to check ConstantPool operation
status, and only set ConstantPool to Custom in non-MIPS16 mode in MIPS
backend.
    
This ensures MIPS16 uses the ISD::CTPOP instead of attempting
unsupported constant pool operations.

Fix #61055.
2026-02-11 16:27:57 +08:00
Craig Topper
1d1a34ff3e
[TargetLowering] Avoid creating a VTList until we know we need it. NFC (#180599)
Since I was in the area, also use SDValue::getValue() to shorten getting
result 1.
2026-02-09 20:16:08 +00:00
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
ZhaoQi
38e280d8a4
[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)
When creating new nodes with illegal types after type legalization, we
should try to use promoted type to avoid creating nodes with illegal
types.

Fixes: https://github.com/llvm/llvm-project/issues/177155
2026-02-03 09:56:20 +00:00
niqiangpro-cell
603b625b21
[Analysis] Add Intrinsics::CLMUL case to cost calculations to getIntrinsicInstrCost / getTypeBasedIntrinsicInstrCost (#176552)
This patch adds a case in getIntrinsicInstrCost and
getTypeBasedIntrinsicInstrCost in
llvm/include/llvm/CodeGen/BasicTTIImpl.h for Intrinsic::clmul. This
patch uses TLI->isOperationLegalOrCustom to check if the instruction is
cheap. If not cheap, it sums up the cost of the arithmetic operations
(AND, SHIFT, XOR) multiplied by the bit width.
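
A scalar sketch of the generic expansion being costed, assuming one shift, one AND and one XOR per bit of the operand width. `clmul32` is a hypothetical name for a 32-bit carry-less multiply that keeps the low 32 bits:

```cpp
#include <cassert>
#include <cstdint>

// Carry-less multiply: XOR together the copies of A shifted by each set
// bit position of B. Per bit this costs a shift, an AND and an XOR.
std::uint32_t clmul32(std::uint32_t A, std::uint32_t B) {
  std::uint32_t Result = 0;
  for (unsigned I = 0; I < 32; ++I) {
    // All-ones when bit I of B is set, zero otherwise.
    const std::uint32_t Mask = std::uint32_t{0} - ((B >> I) & 1);
    Result ^= (A << I) & Mask;
  }
  return Result;
}
```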

Fixes #176354
2026-02-01 12:56:41 +00:00
Osama Abdelkader
aad7259ff6
[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030)
This change improves memset code generation for non-zero values on
AArch64 by using NEON's DUP instruction instead of
the less efficient multiplication with 0x01010101 pattern.
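
For reference, a scalar sketch of the multiplication pattern being replaced, plus the overlapping-store idea for a non-power-of-two size. This models the behaviour in portable C++ only and does not show the NEON code generation; `splatByteMul` and `memset7Overlapping` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Replicates one byte into all four bytes of a 32-bit value by
// multiplying with 0x01010101.
std::uint32_t splatByteMul(std::uint8_t Value) {
  return std::uint32_t{Value} * 0x01010101u;
}

// Sets 7 bytes with two 4-byte stores that overlap at byte 3. The eighth
// byte of the buffer is left at zero to show that it is not touched.
std::array<unsigned char, 8> memset7Overlapping(std::uint8_t Value) {
  std::array<unsigned char, 8> Buf{}; // zero-initialized
  const std::uint32_t Splat = splatByteMul(Value);
  std::memcpy(Buf.data(), &Splat, 4);     // bytes 0..3
  std::memcpy(Buf.data() + 3, &Splat, 4); // bytes 3..6
  return Buf;
}
```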

For small sizes, the value is extracted from a larger DUP. For
non-power-of-two sizes, overlapping stores are used in some cases.

TargetLowering::findOptimalMemOpLowering is modified to allow explicitly
specifying the size of the constant in cases where the constant is
larger than the store operations.

Fixes #165949
2026-01-29 13:03:38 -08:00
Anikesh Parashar
fd45140ed6
[DAG] SimplifyDemandedBits - ICMP_SLT(X,0) - only sign mask of X is required (#164946)
Resolves #164589
2026-01-28 17:30:23 +00:00
valadaptive
cdc6a84c14
TargetLowering: Allow FMINNUM/FMAXNUM to lower to FMINIMUM/FMAXIMUM even without nsz (#177828)
This restriction was originally added in
https://reviews.llvm.org/D143256, with the given justification:

> Currently, in TargetLowering, if the target does not support fminnum,
we lower to fminimum if neither operand could be a NaN. But this isn't
quite correct because fminnum and fminimum treat +/-0 differently; so,
we need to prove that one of the operands isn't a zero.

As far as I can tell, this was never correct. Before
https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum`
were nondeterministic with regards to signed zero, so it's always been
perfectly legal to lower them to operations that order signed zeroes.
2026-01-25 18:24:12 -05:00
Simon Pilgrim
15cd9f736b
[DAG] expandIntMINMAX - use getOppositeSignednessMinMaxOpcode helper to flip min/max signedness. NFC. (#177450) 2026-01-22 20:38:35 +00:00
Luke Lau
cee36b23cc
[IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693)
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.

Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.

@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.

In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.

The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offsets isn't through getShuffleCost because it
can't handle variable masks, but instead just through
getIntrinsicInstCost.
2026-01-21 10:58:40 +00:00
Simon Pilgrim
c7af813b52
[DAG] expandCLMUL - if a target supports CLMUL+CLMULH then CLMULR can be merged from the results (#176644)
If a target supports CLMUL + CLMULH, then we can funnel shift the
results together to form CLMULR.
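
An 8-bit sketch of the merge, under the assumption that CLMULH is the high half of the full carry-less product and CLMULR is the product shifted right by one bit less than the element width. All function names here are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Full 16-bit carry-less product of two 8-bit values.
std::uint16_t clmulFull8(std::uint8_t A, std::uint8_t B) {
  std::uint16_t P = 0;
  for (unsigned I = 0; I < 8; ++I)
    if ((B >> I) & 1)
      P ^= static_cast<std::uint16_t>(std::uint16_t{A} << I);
  return P;
}

// Low and high halves of the product.
std::uint8_t clmul8(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(clmulFull8(A, B));
}
std::uint8_t clmulh8(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(clmulFull8(A, B) >> 8);
}

// Reference CLMULR under the assumed definition: product bits [14:7].
std::uint8_t clmulr8Ref(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(clmulFull8(A, B) >> 7);
}

// CLMULR rebuilt from the two halves with a funnel shift left by one:
// the high half moves up one bit and the top bit of the low half fills in.
std::uint8_t clmulr8FromParts(std::uint8_t A, std::uint8_t B) {
  const std::uint8_t Lo = clmul8(A, B);
  const std::uint8_t Hi = clmulh8(A, B);
  return static_cast<std::uint8_t>((Hi << 1) | (Lo >> 7));
}
```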

Helps x86 PCLMUL targets particularly
2026-01-18 21:17:36 +00:00
Valeriy Savchenko
9391d46389
[SelectionDAG] Eliminate redundant setcc on comparison results (#171431)
When comparisons produce all-zeros or all-ones in scalars or per lane in
vectors, comparing results of such comparisons against 0 is an identity
operation. This change eliminates redundant comparison instructions
after another comparison operation.
2026-01-16 16:45:19 +00:00
Phoebe Wang
e83021ab16
[SelectionDAG][InlineAsm] Check VT isSimple before getSimpleVT (#176323)
Fixes: #170024
2026-01-16 19:57:52 +08:00
Florian Hahn
68a04c1ada
[SelDag] Use BoolVT size when expanding find-last-active, if larger. (#175971)
On some targets, BoolVT may have been widened earlier. In those cases,
choosing StepVT to be smaller can cause crashes when widening the
mis-matched select. Without the fix, the new test
@extract_last_active_v4i32_penryn crashes when trying to widen.

It also improves codegen for other cases.

PR: https://github.com/llvm/llvm-project/pull/175971
2026-01-14 20:46:16 +00:00
DaKnig
aa299269ea
[SDAG] (setcc (sub nsw a, b), zero, s??) -> (setcc a, b, s??) (#175459)
This often happens when the dag combiner produces sign/zero extends and
realizes that nsw/nuw can be added, for example in the case of `(abds
(sext a), (sext b))`

alive2:
- slt, nsw: [link](https://alive2.llvm.org/ce/z/cgjMSx)
- sgt, nsw: [link](https://alive2.llvm.org/ce/z/JP7h2f)
- sle, nsw: [link](https://alive2.llvm.org/ce/z/n5Wuc_)
- sge, nsw: [link](https://alive2.llvm.org/ce/z/Eps53-)
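
As a complement to the alive2 proofs, a brute-force sketch over all 8-bit inputs. The function names are hypothetical. Pairs whose difference overflows are skipped in the first check because nsw makes that result poison, and the second helper shows what goes wrong once the subtraction is allowed to wrap:

```cpp
#include <cassert>

// Checks (setcc (sub nsw a, b), 0, cc) == (setcc a, b, cc) for the four
// signed predicates over every i8 pair whose difference does not overflow.
bool subNswFoldHoldsForI8() {
  for (int A = -128; A <= 127; ++A) {
    for (int B = -128; B <= 127; ++B) {
      const int Sub = A - B;
      if (Sub < -128 || Sub > 127)
        continue; // nsw violated: poison, nothing to check
      if ((Sub < 0) != (A < B) || (Sub > 0) != (A > B) ||
          (Sub <= 0) != (A <= B) || (Sub >= 0) != (A >= B))
        return false;
    }
  }
  return true;
}

// Same slt comparison, but with the difference wrapped to 8 bits as a
// plain sub (without nsw) would do.
bool sltFoldHoldsWithWrap(int A, int B) {
  const int Wrapped = (((A - B) + 128) % 256 + 256) % 256 - 128;
  return (Wrapped < 0) == (A < B);
}
```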
2026-01-13 17:00:00 +00:00
Liao Chunyu
b5401031d6
[DAG]Add ISD::SPLAT_VECTOR to TargetLowering::getNegatedExpression (#173967)
Fold splat_vector(fneg(X)) -> splat_vector(-X)
Call the getCheaperNegatedExpression function; ISD::SPLAT_VECTOR
returns NegatibleCost::Cheaper.
This optimization is applied only to the fneg instruction.
2026-01-09 18:07:10 +08:00
Florian Hahn
f444467a38
[ISel] Handle TypeWidenVector in expandVectorFindLastActive. (#174384)
When widening extract.last.active, the element count changes. Create a
step vector with only the original elements valid and zeros for padding.
Also widen the mask accordingly. This fixes a hang when lowering on X86,
where widening is required in some cases.

Fixes https://github.com/llvm/llvm-project/issues/171831.

PR: https://github.com/llvm/llvm-project/pull/174384
2026-01-06 12:40:34 +00:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
in line with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00