9450 Commits

Craig Topper
2b7b8bdc16
[X86] Accept the canonical form of a sign bit test in MatchVectorAllEqualTest. (#154421)
This function looks for (seteq (and (reduce_or), mask), 0). If the
mask is a sign bit, InstCombine will have turned it into (setgt
(reduce_or), -1). We should handle that case too (a scalar sketch of
the equivalence follows below).

I'm looking into adding the same canonicalization to SimplifySetCC and
this change is needed to prevent test regressions.
2025-08-20 09:09:55 -07:00
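A standalone C++ sketch (illustration only, not LLVM code) of the scalar equivalence the commit above relies on: a sign-bit test written as an AND-with-mask compare is the same predicate as a signed compare against -1, which is the canonical form InstCombine produces.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint32_t SignBit = 0x80000000u;
  const uint32_t Samples[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples) {
    bool MaskForm = (X & SignBit) == 0; // (seteq (and X, signbit), 0)
    bool Canonical = (int32_t)X > -1;   // (setgt X, -1)
    assert(MaskForm == Canonical);      // same predicate, so match both forms
  }
}
```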
Simon Pilgrim
d770567a51
[X86] SimplifyDemandedVectorEltsForTargetNode - don't split X86ISD::CVTTP2UI nodes without AVX512VL (#154504)
Unlike CVTTP2SI, CVTTP2UI is only available on AVX512 targets, so there
is no AVX1 variant to fall back to when we split a 512-bit vector; we
can only use the 128/256-bit variants if we have AVX512VL.

Fixes #154492
2025-08-20 12:18:10 +01:00
Simon Pilgrim
1359f72a03
[X86] canCreateUndefOrPoisonForTargetNode - add X86ISD::MOVMSK (#154321)
MOVMSK nodes don't create undef/poison when extracting the sign bits from the source operand.
2025-08-19 14:40:42 +01:00
Adam Nemet
350cb989b8
[X86] Explicitly widen larger than v4f16 to the legal v8f16 (NFC) (#153839)
This patch makes the current behavior explicit to prepare for adding VTs
for v[567]f16.

Right now these types are EVTs and hence don't fall under
getPreferredVectorAction and are simply widened to the next legal
power-of-two vector type. For SSE2 this is v8f16.

Without the preparatory patch, however, the behavior would change after
adding these types: getPreferredVectorAction would try to split them,
because that is the current behavior for any f16 vector type that is
not legal.

There is a lot more detail at
https://github.com/llvm/llvm-project/issues/152150, in particular how
splitting these new types leads to an inconsistency between
NumRegistersForVT and getTypeAction.

The patch ensures that after the new types are added they would continue
to be widened rather than split. Once the patch to enable v[567]f16
lands, it will be an NFC for x86.
2025-08-17 19:15:10 +00:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Simon Pilgrim
c96d0da62b
[X86] lowerShuffleAsLanePermuteAndPermute - ensure we've simplified the demanded shuffle mask elts before testing for a matching shuffle (#153554)
When lowering using sublane shuffles, we can sometimes end up with the
same mask as we started with. We already bail out in these cases, but we
weren't fully simplifying the new shuffle mask before testing whether it
matched (see the sketch below).

Fixes #153457
2025-08-14 10:47:11 +01:00
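A minimal sketch (my model of the issue, not the LLVM implementation) of why the bail-out needs a simplified mask: two masks that differ only in lanes nobody demands describe the same shuffle, so undemanded lanes should be canonicalized to an undef sentinel (-1, mirroring LLVM's SM_SentinelUndef) before comparing.

```cpp
#include <cassert>
#include <vector>

// Canonicalize undemanded lanes to the undef sentinel (-1).
static void simplifyMask(std::vector<int> &Mask, const std::vector<bool> &Demanded) {
  for (size_t I = 0; I != Mask.size(); ++I)
    if (!Demanded[I])
      Mask[I] = -1; // this lane's source is irrelevant
}

int main() {
  std::vector<int> Original = {0, 1, -1, -1};
  std::vector<int> Rebuilt  = {0, 1,  2,  3}; // differs only in undemanded lanes
  std::vector<bool> Demanded = {true, true, false, false};
  assert(Original != Rebuilt);  // naive comparison: looks like progress was made
  simplifyMask(Original, Demanded);
  simplifyMask(Rebuilt, Demanded);
  assert(Original == Rebuilt);  // simplified: the same shuffle, so bail out
}
```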
Abhishek Kaushik
11eeb4d133
[X86] combinePMULH - combine mulhu + srl (#132548)
Fixes #132166
2025-08-05 16:10:56 +05:30
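A scalar sketch (my reading of the pattern, not the LLVM combine itself) of the identity PMULH-style folds rest on: shifting the widened unsigned product right by 16 is exactly the high-half multiply of the two 16-bit inputs.

```cpp
#include <cassert>
#include <cstdint>

// Reference model of the mulhu node on i16.
static uint16_t mulhu16(uint16_t A, uint16_t B) {
  return (uint16_t)(((uint32_t)A * (uint32_t)B) >> 16);
}

int main() {
  for (uint32_t A = 0; A < 0x10000; A += 257)
    for (uint32_t B = 0; B < 0x10000; B += 509) {
      // srl(mul(zext(a), zext(b)), 16) == mulhu(a, b)
      uint32_t Widened = A * B;
      assert((Widened >> 16) == mulhu16((uint16_t)A, (uint16_t)B));
    }
}
```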
paperchalice
03e902cc68
[X86] Remove UnsafeFPMath uses (#151667)
Remove `UnsafeFPMath` uses in the X86 backend; it blocks some bugfixes
related to clang, and the ultimate goal is to remove the
`resetTargetOptions` method in `TargetMachine` (see the FIXME in
`resetTargetOptions`).
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract
2025-08-05 08:24:52 +08:00
woruyu
38bfe9ae56
[DAG] combineVSelectWithAllOnesOrZeros - missing freeze (#150388)
This PR resolves https://github.com/llvm/llvm-project/issues/150069

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 15:55:12 +01:00
AZero13
8e9e38acc8
[X86] Try to shrink i64 compares if the input has enough sign bits (#149719)
If there are enough sign bits in a 64-bit value, we can just compare the bottom 32 bits (a small demonstration follows below).

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-02 18:01:33 +01:00
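A sketch of the property the commit above exploits (my summary, not the actual DAG code): if a 64-bit value has at least 33 sign bits, it is the sign-extension of its low 32 bits, so a 64-bit compare of two such values gives the same result as a 32-bit compare of their truncations.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const int64_t Samples[] = {0, 1, -1, 42, -42, INT32_MAX, INT32_MIN};
  for (int64_t A : Samples)
    for (int64_t B : Samples) {
      // Each sample fits in i32, i.e. has >= 33 sign bits as an i64.
      assert(A == (int64_t)(int32_t)A && B == (int64_t)(int32_t)B);
      // The narrowed compares agree with the wide ones.
      assert((A < B) == ((int32_t)A < (int32_t)B));
      assert((A == B) == ((int32_t)A == (int32_t)B));
    }
}
```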
Phoebe Wang
740758a5fd
[X86][APX] Combine xor .., -1 into Cload/Cstore conditions (#151457)
Remove redundant NOT instruction: https://godbolt.org/z/jM89ejnsh
2025-07-31 15:12:47 +08:00
Phoebe Wang
743177c1ef
[X86][APX] Use TEST instruction for CLOAD/CSTORE (#151160) 2025-07-30 16:34:45 +08:00
Simon Pilgrim
3345582542
[X86] getTargetConstantBitsFromNode - early-out if the element bitsize doesn't align with the source bitsize (#150184)
As we use getTargetConstantBitsFromNode in a wider variety of places, we can no longer guarantee that all the sources match (or are legal) - better to early-out than assert.

Fixes #150117
2025-07-23 11:22:38 +01:00
Simon Pilgrim
d87bf79a23
[X86] isGuaranteedNotToBeUndefOrPoisonForTargetNode - X86ISD::GlobalBaseReg and X86ISD::Wrapper/WrapperRIP nodes are never poison (#149854)
Fixes #149841
2025-07-22 07:51:01 +01:00
Simon Pilgrim
069f0fea00
[X86] canCreateUndefOrPoisonForTargetNode - SSE PINSR/PEXTR vector element insert/extract are never out of bounds (#149822)
The immediate index is guaranteed to be treated as modulo the number of vector elements.
2025-07-22 07:50:17 +01:00
Tobias Decking
10b0dee97d
[X86] Ensure that bit reversals of byte vectors are properly lowered on pure GFNI targets (#148304)
Fixes #148238.

When GFNI is present, custom bit reversal lowerings for scalar integers
become active. They work by swapping the bytes in the scalar value and
then reversing bits in a vector of bytes. However, the custom bit
reversal lowering for a vector of bytes is disabled if GFNI is present
in isolation, resulting in messed-up code (the byte-swap decomposition
is sketched below).

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-18 19:14:34 +01:00
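A standalone sketch (illustrative, assuming a 32-bit scalar) of the decomposition the lowering described above uses: reversing the bits of a scalar equals swapping its bytes and then reversing the bits inside each byte, the part that GFNI can do byte-wise on a vector.

```cpp
#include <cassert>
#include <cstdint>

// Reference bit reversal within one byte.
static uint8_t reverseByte(uint8_t B) {
  uint8_t R = 0;
  for (int I = 0; I < 8; ++I)
    R |= ((B >> I) & 1) << (7 - I);
  return R;
}

// Reference full 32-bit bit reversal.
static uint32_t reverseBits(uint32_t X) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I)
    R |= ((X >> I) & 1u) << (31 - I);
  return R;
}

int main() {
  const uint32_t Samples[] = {0u, 1u, 0xDEADBEEFu, 0x80000000u, 0x12345678u};
  for (uint32_t X : Samples) {
    uint32_t Swapped = __builtin_bswap32(X); // step 1: swap bytes (GCC/Clang builtin)
    uint32_t ByteWise = 0;
    for (int I = 0; I < 4; ++I) {            // step 2: reverse bits within each byte
      uint8_t B = (Swapped >> (8 * I)) & 0xFF;
      ByteWise |= (uint32_t)reverseByte(B) << (8 * I);
    }
    assert(ByteWise == reverseBits(X));
  }
}
```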
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
David Green
0736f330b0
[DAG] Handle truncated splat in isBoolConstant (#145473)
This allows a truncated splat / buildvector in isBoolConstant, so that
certain not patterns can be recognized post-legalization and vselects
can be optimized.

An override for x86 avx512 predicated vectors is required to avoid an
infinite recursion from the code that detects zero vectors. From:
```
  // Check if the first operand is all zeros and Cond type is vXi1.
  // If this an avx512 target we can improve the use of zero masking by
  // swapping the operands and inverting the condition.
```
2025-07-10 20:59:34 +01:00
Simon Pilgrim
75656d8c11
[X86] combineStore - remove rangedata when converting 64-bit copies to f64 load/store (#147904)
We're changing from i64 to f64 - we can't retain any range metadata

Fixes #147781
2025-07-10 09:32:09 +01:00
woruyu
7edf6bfb54
[DAG][X86] Use pattern matching to simplify PSADBW+ADD combine (#147637)
This patch refactors the add(psadbw(x, 0), psadbw(y, 0)) -> psadbw(x + y, 0) combine to use SDPatternMatch matchers instead of manually checking opcodes and operands.

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-09 10:26:12 +01:00
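A reference-model sketch (my reading of the fold, not the LLVM code) of the PSADBW identity above: psadbw(x, 0) sums the bytes of x, so two such sums can be merged into one psadbw of the byte-wise add, provided the byte-wise add is known not to wrap - a precondition the real combine has to establish.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Reference psadbw on one 64-bit lane: sum of |a_i - b_i| over 8 bytes.
static uint64_t psadbw(uint64_t A, uint64_t B) {
  uint64_t Sum = 0;
  for (int I = 0; I < 8; ++I) {
    int AByte = (A >> (8 * I)) & 0xFF;
    int BByte = (B >> (8 * I)) & 0xFF;
    Sum += std::abs(AByte - BByte);
  }
  return Sum;
}

int main() {
  // All bytes of X and Y are < 0x80, so the byte-wise X + Y cannot wrap.
  uint64_t X = 0x0102030405060708ull;
  uint64_t Y = 0x1011121314151617ull;
  uint64_t ByteWiseAdd = 0;
  for (int I = 0; I < 8; ++I) {
    uint64_t AByte = (X >> (8 * I)) & 0xFF;
    uint64_t BByte = (Y >> (8 * I)) & 0xFF;
    ByteWiseAdd |= ((AByte + BByte) & 0xFF) << (8 * I);
  }
  // add(psadbw(x, 0), psadbw(y, 0)) == psadbw(x + y, 0)
  assert(psadbw(X, 0) + psadbw(Y, 0) == psadbw(ByteWiseAdd, 0));
}
```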
Simon Pilgrim
f0bc41181c
[X86] combineBasicSADPattern - pattern match various vXi8 ABDU patterns (#147570)
We were previously limited to abs(sub(zext(),zext())) patterns, but add
handling for a number of other abdu patterns until a topologically
sorted DAG allows us to rely on an ABDU node having already been
created (several equivalent forms are sketched below).

Now that we don't just match zext() sources, I've generalised the
createPSADBW helper to explicitly zext/truncate to the expected vXi8
source type - it still assumes the sources are correct for a PSADBW
node.

Fixes #143456
2025-07-09 08:08:22 +01:00
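A scalar check (illustrative, not the DAG matcher) of the unsigned absolute-difference forms referenced above: for unsigned bytes, several syntactically different patterns compute the same value and should all funnel into the same PSADBW lowering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

int main() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B) {
      uint8_t UA = (uint8_t)A, UB = (uint8_t)B;
      int ViaZext = std::abs((int)UA - (int)UB);           // abs(sub(zext, zext))
      int ViaMinMax = std::max(UA, UB) - std::min(UA, UB); // sub(umax, umin)
      int ViaSelect = UA > UB ? UA - UB : UB - UA;         // select(ugt, sub, sub)
      assert(ViaZext == ViaMinMax && ViaMinMax == ViaSelect);
    }
}
```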
woruyu
b0790e04a3
[DAG] combineVSelectWithAllOnesOrZeros - fold select Cond, 0, x -> and not(Cond), x (#147472)
### Summary
This patch extends the work from
[#145298](https://github.com/llvm/llvm-project/pull/145298) by removing
the now-unnecessary X86-specific combineVSelectWithLastZeros logic. That
combine is now correctly and more generally handled in the
target-independent combineVSelectWithAllOnesOrZeros.

This simplifies the X86 DAG combine logic and avoids duplication.

Fixes: [#144513](https://github.com/llvm/llvm-project/issues/144513)
Related for reference:
[#146831](https://github.com/llvm/llvm-project/pull/146831)
2025-07-08 14:45:40 +01:00
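A mask-level sketch (an assumed per-lane model, not the DAG code) of the fold in the commit above: when the condition is a vector of all-ones/all-zeros lanes, selecting 0 for true lanes and X for false lanes is exactly AND-NOT, which maps to x86's ANDN/PANDN.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint32_t CondLanes[] = {0xFFFFFFFFu, 0u}; // boolean vector lanes
  const uint32_t X = 0x12345678u;
  for (uint32_t Cond : CondLanes) {
    uint32_t Select = Cond ? 0u : X; // vselect Cond, 0, X (one lane)
    uint32_t AndNot = ~Cond & X;     // and not(Cond), X
    assert(Select == AndNot);
  }
}
```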
Simon Pilgrim
153c6db069
[X86] Merge detectZextAbsDiff into combineBasicSADPattern. NFC. (#147368)
detectZextAbsDiff had already been simplified a great deal when it was
converted to SDPatternMatch, and a future patch should allow us to match
to ISD::ABDU directly making it entirely redundant.
2025-07-08 07:19:08 +01:00
Matt Arsenault
d8ef156379
DAG: Remove verifyReturnAddressArgumentIsConstant (#147240)
The intrinsic argument is already marked with immarg, so non-constant
values are rejected by the IR verifier.
2025-07-07 16:28:47 +09:00
Phoebe Wang
eca05fde84
[X86] Switch operands order for FMINIMUMNUM/FMAXIMUMNUM (#147193)
When optimizing for NaN, switch the operand order for
FMINIMUMNUM/FMAXIMUMNUM (see the NaN-semantics sketch below).

Fixes #135313
2025-07-07 09:57:02 +08:00
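A scalar sketch of the semantics that make the operand swap legal. C's std::fmin follows the same quiet-NaN rule as IEEE minimumNumber, which FMINIMUMNUM models (they differ on signaling NaNs, which this sketch avoids): a NaN operand is dropped in favor of the other operand, so if one side is known NaN-free the operands can be commuted freely to pick the better codegen.

```cpp
#include <cassert>
#include <cmath>

int main() {
  double QNaN = std::nan("");
  assert(std::fmin(QNaN, 2.0) == 2.0); // the NaN operand is dropped...
  assert(std::fmin(2.0, QNaN) == 2.0); // ...regardless of operand order
  assert(std::fmin(1.0, 2.0) == std::fmin(2.0, 1.0)); // commutes when NaN-free
}
```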
Phoebe Wang
a438c60997
[X86][FP16] Do not customize WidenLowerNode for half if VLX not enabled (#146994)
#142763 tried to reuse an ISD node to work around the non-VLX lowering
problem, but it caused a new problem: https://godbolt.org/z/1hEGnddhY
2025-07-04 22:57:03 +08:00
Simon Pilgrim
043789519a [X86] combineShiftToPMULH - convert matching to use SDPatternMatch. NFC. 2025-07-04 11:13:29 +01:00
Simon Pilgrim
0f717044ff [X86] lowerX86FPLogicOp - use MVT::changeVectorElementTypeToInteger(). NFC. 2025-07-03 16:23:12 +01:00
Simon Pilgrim
f019c89008 [X86] foldXorTruncShiftIntoCmp - pull out repeated SDLoc. NFC. 2025-07-03 16:03:14 +01:00
Simon Pilgrim
51ff8f2f7e [X86] foldXor1SetCC - pull out repeated SDLoc. NFC. 2025-07-03 16:03:13 +01:00
Simon Pilgrim
a282c68580 [X86] combineX86AddSub - pull out repeated getOperand() call. NFC. 2025-07-03 16:03:13 +01:00
Simon Pilgrim
30eb97c584
[X86] commuteSelect - update to use SDPatternMatch. NFC. (#146868) 2025-07-03 14:51:36 +01:00
Simon Pilgrim
aa8e1bc0e9
[X86] Add BLEND/UNPCK shuffles to canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode (#146728)
None of these implicitly generate UNDEF/POISON
2025-07-02 18:38:00 +01:00
woruyu
bbcebec3af
[DAG] Refactor X86 combineVSelectWithAllOnesOrZeros fold into a generic DAG Combine (#145298)
This PR resolves https://github.com/llvm/llvm-project/issues/144513

The modification includes five patterns:
1. vselect Cond, 0, 0 → 0
2. vselect Cond, -1, 0 → bitcast Cond
3. vselect Cond, -1, x → or Cond, x
4. vselect Cond, x, 0 → and Cond, x
5. vselect Cond, 000..., X → andn Cond, X

Patterns 1-4 have been migrated to DAGCombine; pattern 5 stays in the
x86 code (a per-lane sketch of the migrated folds follows below).

The reason is that you cannot use the andn instruction directly in
DAGCombine; you can only use and+xor, which introduces optimization
order issues. For example, for select Cond, 0, x → (~Cond) & x in the
x86 backend, the backend will first check whether the cond node of
(~Cond) is a setcc node, and if so it will modify the comparison
operator of the condition. So the x86 backend cannot complete the andn
optimization. In short, I think it is a better choice to keep the
vselect Cond, 000..., X pattern instead of and+xor in DAGCombine.

As for the commits: the first is the code changes and x86 tests (note
1); the second is the tests for other backends (note 2).

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-02 15:07:48 +01:00
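A per-lane sketch (an assumed boolean-mask model, not the DAG code) of the four migrated folds: with all-ones/all-zeros condition lanes, each vselect pattern reduces to a pure bitwise operation.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint32_t X = 0xCAFEBABEu;
  for (uint32_t Cond : {0xFFFFFFFFu, 0u}) {
    assert((Cond ? 0u : 0u) == 0u);         // 1: vselect Cond, 0, 0  -> 0
    assert((Cond ? ~0u : 0u) == Cond);      // 2: vselect Cond, -1, 0 -> bitcast Cond
    assert((Cond ? ~0u : X) == (Cond | X)); // 3: vselect Cond, -1, X -> or Cond, X
    assert((Cond ? X : 0u) == (Cond & X));  // 4: vselect Cond, X, 0  -> and Cond, X
  }
}
```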
Matt Arsenault
dbe441e716
X86: Avoid some uses of getPointerTy (#146306)
In most contexts the pointer type is implied by the operation
and should be propagated; getPointerTy is for niche cases where
there is a synthesized value.
2025-07-02 22:14:16 +09:00
Simon Pilgrim
fd46e409a9
[X86] detectZextAbsDiff - use m_SpecificVectorElementVT matcher. NFC. (#146498) 2025-07-01 11:59:37 +01:00
Phoebe Wang
67b740bd73
[X86] Add diagnostic for fp128 inline assemble for 32-bit (#146458)
Suggested by Craig from #146259
2025-07-01 12:39:43 +08:00
Simon Pilgrim
372c808217
[X86] canCreateUndefOrPoisonForTargetNode - PCMPEQ/PCMPGT don't create poison/undef (#146116) 2025-06-28 17:01:10 +01:00
Simon Pilgrim
fe4b4033ed
[X86] lowerShuffleAsVTRUNC - use combineConcatVectorOps to catch more "cheap" concats (#145876) 2025-06-26 14:12:18 +01:00
Simon Pilgrim
db4dc88d06
[X86] combineEXTRACT_SUBVECTOR - remove unnecessary bitcast handling. (#145496)
We already aggressively fold extract_subvector(bitcast()) -> bitcast(extract_subvector())
2025-06-24 13:47:03 +01:00
Simon Pilgrim
594ebe6340
[X86] combineSelect - move vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) fold (#145475)
Move the OR(PSHUFB(),PSHUFB()) fold to reuse an existing
createShuffleMaskFromVSELECT result and ensure it is performed before
the combineX86ShufflesRecursively combine to prevent some hasOneUse
failures noticed in #133947 (combineX86ShufflesRecursively still
unnecessarily widens vectors in several locations).
2025-06-24 10:50:29 +01:00
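A reference-model sketch (my illustration, not the LLVM fold) of why vselect(C, pshufb(x), pshufb(y)) can become or(pshufb(x), pshufb(y)): PSHUFB writes zero to any lane whose control byte has the high bit set, so if each operand is zeroed exactly in the lanes the select would discard, merging the two results with OR is lossless.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<uint8_t, 16>;

// Reference PSHUFB: a set high bit in the control byte zeroes the lane,
// otherwise the low 4 bits index into Src.
static Bytes pshufb(const Bytes &Src, const Bytes &Ctrl) {
  Bytes R{};
  for (int I = 0; I < 16; ++I)
    R[I] = (Ctrl[I] & 0x80) ? 0 : Src[Ctrl[I] & 0x0F];
  return R;
}

int main() {
  Bytes X{}, Y{};
  for (int I = 0; I < 16; ++I) { X[I] = 0x10 + I; Y[I] = 0x20 + I; }

  Bytes CtrlX{}, CtrlY{};
  for (int I = 0; I < 16; ++I) {
    bool TakeX = (I % 2) == 0;            // the vselect condition, per lane
    CtrlX[I] = TakeX ? (uint8_t)I : 0x80; // zero the lanes the select discards
    CtrlY[I] = TakeX ? 0x80 : (uint8_t)I;
  }

  Bytes A = pshufb(X, CtrlX), B = pshufb(Y, CtrlY);
  for (int I = 0; I < 16; ++I) {
    uint8_t Sel = (I % 2) == 0 ? A[I] : B[I]; // vselect(C, A, B)
    assert((uint8_t)(A[I] | B[I]) == Sel);    // or(A, B) gives the same lanes
  }
}
```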
Matt Arsenault
48155f93dd
CodeGen: Emit error if getRegisterByName fails (#145194)
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
2025-06-23 16:33:35 +09:00
Abhishek Kaushik
cbfec48697
Revert "[X86][NFC] Use std::move to avoid copy" (#145215)
Reverts llvm/llvm-project#141455
2025-06-22 12:52:57 +05:30
Abhishek Kaushik
4c1a1009ad
[X86][NFC] Use std::move to avoid copy (#141455) 2025-06-21 22:04:41 +05:30
Simon Pilgrim
1753aba034
[X86] combineINSERT_SUBVECTOR - directly fold to X86ISD::SUBV_BROADCAST_LOAD to prevent vector split infinite loop (#145077)
This reverts #140919 / f1d03dedfbe87119cfcafb07e0e0f90ec291cb97 - which
could result in another fold trying to split the concatenation apart
again before it was folded to a SUBV_BROADCAST_LOAD
2025-06-21 00:38:30 +02:00
Simon Pilgrim
f8ee5774b6
[X86] combineConcatVectorOps - only concat AVX1 v4i64 shift-by-32 to a shuffle if the concat is free (#145043) 2025-06-20 18:09:07 +01:00
Simon Pilgrim
151ee0faad [X86] SimplifyDemandedVectorEltsForTargetNode - ensure X86ISD::VPERMILPV nodes use v2f64/v4f32 types
When reducing v4f64/v8f32 non-lane-crossing X86ISD::VPERMV nodes, we use X86ISD::VPERMILPV nodes for the 128-bit subvectors, but these are only available for fp types.

Fixes #145046
2025-06-20 17:03:30 +01:00
Simon Pilgrim
95c6c11c74
[X86] combineConcatVectorOps - only always concat logic ops on AVX512 targets (#145036)
We should only concat logic ops if at least one operand will freely
concatenate. We've now addressed the remaining regressions on AVX2
targets, but still have a number on AVX512 targets which can
aggressively use VPTERNLOG in many cases.
2025-06-20 15:51:04 +01:00
Matt Arsenault
1c35fe4e6b
RuntimeLibcalls: Pass in exception handling type (#144696)
All of the ABI options that influence libcall decisions need
to be passed in.
2025-06-19 19:08:52 +09:00
Simon Pilgrim
34a4894149 [X86] detectZextAbsDiff - use SDPatternMatch::m_Abs() matcher. NFC. 2025-06-18 13:21:09 +01:00