This function looks for (seteq (and (reduce_or), mask), 0). If
the mask is the sign bit, InstCombine will have turned it into (setgt
(reduce_or), -1). We should handle that case too.
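A rough sketch of the two equivalent forms (illustrative only; the helper
name and surrounding code are hypothetical, not the upstream check):
```
// Hypothetical helper: recognise both spellings of "the OR-reduction has its
// sign bit clear" so the later combine fires for either form.
static bool isReduceOrSignBitTest(SDValue SetCC) {
  if (SetCC.getOpcode() != ISD::SETCC)
    return false;
  ISD::CondCode CC = cast<CondCodeSDNode>(SetCC.getOperand(2))->get();
  SDValue LHS = SetCC.getOperand(0), RHS = SetCC.getOperand(1);

  // InstCombine-canonicalized form: (setgt (reduce_or X), -1).
  if (CC == ISD::SETGT && isAllOnesConstant(RHS))
    return LHS.getOpcode() == ISD::VECREDUCE_OR;

  // Original form: (seteq (and (reduce_or X), Mask), 0). Only the sign-bit
  // mask is shown here for symmetry; the real check accepts other masks.
  if (CC == ISD::SETEQ && isNullConstant(RHS) && LHS.getOpcode() == ISD::AND) {
    auto *Mask = dyn_cast<ConstantSDNode>(LHS.getOperand(1));
    return Mask && Mask->getAPIntValue().isSignMask() &&
           LHS.getOperand(0).getOpcode() == ISD::VECREDUCE_OR;
  }
  return false;
}
```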
I'm looking into adding the same canonicalization to SimplifySetCC and
this change is needed to prevent test regressions.
Unlike CVTTP2SI, CVTTP2UI is only available on AVX512 targets, so there is
no AVX1 variant to fall back to when we split a 512-bit vector, and we can
only use the 128/256-bit variants if we have AVX512VL.
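A minimal sketch of the constraint, assuming the usual X86 subtarget feature
queries (illustrative, not the actual lowering code):
```
// Unlike CVTTP2SI there is no AVX1 fallback, so after splitting a 512-bit
// vector the narrower CVTTP2UI forms are only usable with AVX512VL.
bool CanUseCVTTP2UI =
    Subtarget.hasAVX512() && (VT.is512BitVector() || Subtarget.hasVLX());
```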
Fixes#154492
This patch makes the current behavior explicit to prepare for adding VTs
for v[567]f16.
Right now these types are EVTs and hence don't fall under
getPreferredVectorAction; they are simply widened to the next legal
power-of-two vector type, which for SSE2 is v8f16.
Without the preparatory patch, however, the behavior would change after
adding these types: getPreferredVectorAction would try to split them,
because that is the current behavior for any f16 vector type that is not
legal.
There is a lot more detail at
https://github.com/llvm/llvm-project/issues/152150, in particular how
splitting these new types leads to an inconsistency between
NumRegistersForVT and getTypeAction.
The patch ensures that after the new types are added they will continue
to be widened rather than split. Once the patch to enable v[567]f16
lands, this change will be NFC for x86.
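A hypothetical sketch of what making the behavior explicit could look like,
using the TargetLoweringBase hook named above (the exact predicate in the
patch may differ):
```
TargetLoweringBase::LegalizeTypeAction
X86TargetLowering::getPreferredVectorAction(MVT VT) const {
  // Keep widening narrow, non-power-of-two f16 vectors (v5f16/v6f16/v7f16)
  // to the next power-of-two type (v8f16 on SSE2) instead of splitting them.
  if (VT.getVectorElementType() == MVT::f16 && !VT.isPow2VectorType() &&
      VT.getVectorNumElements() < 8)
    return TypeWidenVector;
  return TargetLoweringBase::getPreferredVectorAction(VT);
}
```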
When lowering using sublane shuffles, we can sometimes end up with the
same mask as we started with. We already bail out in these cases, but we
weren't fully simplifying the new shuffle mask before testing whether it
matched.
Fixes#153457
If there are enough sign bits in a 64-bit value, we can just compare the bottom 32 bits.
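A minimal sketch of the idea, assuming a setcc combine where N is the setcc
node with i64 operands LHS/RHS and condition code CC (illustrative only):
```
// If both i64 operands already have more than 32 sign bits, the comparison
// is fully determined by the low 32 bits, so compare those instead.
if (DAG.ComputeNumSignBits(LHS) > 32 && DAG.ComputeNumSignBits(RHS) > 32) {
  SDLoc DL(N);
  SDValue LHS32 = DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, LHS);
  SDValue RHS32 = DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, RHS);
  return DAG.getSetCC(DL, N->getValueType(0), LHS32, RHS32, CC);
}
```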
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
As we use getTargetConstantBitsFromNode in a wider variety of places, we can't guarantee that all the sources match (or are legal) anymore, so it's better to early out than to assert.
Fixes#150117
Fixes#148238.
When GFNI is present, custom bit reversal lowerings for scalar integers
become active. They work by swapping the bytes in the scalar value and
then reversing bits in a vector of bytes. However, the custom bit
reversal lowering for a vector of bytes is disabled if GFNI is present
in isolation, resulting in messed-up code.
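A simplified sketch of the scalar path described above (the real lowering
widens into a legal 128-bit vector rather than bitcasting directly; this is
only the shape of the idea):
```
// bitreverse(x) == per-byte bitreverse of bswap(x): swap the bytes, then
// reverse the bits within each byte via the byte-vector lowering.
SDValue Swapped = DAG.getNode(ISD::BSWAP, DL, VT, X);
MVT ByteVT = MVT::getVectorVT(MVT::i8, VT.getSizeInBits() / 8);
SDValue AsBytes = DAG.getBitcast(ByteVT, Swapped);
SDValue RevBytes = DAG.getNode(ISD::BITREVERSE, DL, ByteVT, AsBytes);
SDValue Result = DAG.getBitcast(VT, RevBytes);
```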
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
This allows truncated splat / buildvector values in isBoolConstant, so that
certain 'not' instructions can be recognized post-legalization and vselect
can optimize.
An override for x86 AVX512 predicated vectors is required to avoid
infinite recursion from the code that detects zero vectors. From:
```
// Check if the first operand is all zeros and Cond type is vXi1.
// If this an avx512 target we can improve the use of zero masking by
// swapping the operands and inverting the condition.
```
This patch refactors the add(psadbw(x, 0), psadbw(y, 0)) -> psadbw(x + y, 0) combine to use SDPatternMatch matchers instead of manually checking opcodes and operands.
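A rough sketch of the matcher-based shape, assuming SDPatternMatch's
sd_match/m_Add/m_Node/m_Zero/m_Value helpers; this is not the exact upstream
code, and the real combine also has to guard against the per-byte additions
overflowing:
```
using namespace SDPatternMatch;
SDValue X, Y;
if (sd_match(N, m_Add(m_Node(X86ISD::PSADBW, m_Value(X), m_Zero()),
                      m_Node(X86ISD::PSADBW, m_Value(Y), m_Zero())))) {
  SDValue Sum = DAG.getNode(ISD::ADD, DL, X.getValueType(), X, Y);
  return DAG.getNode(X86ISD::PSADBW, DL, N->getValueType(0), Sum,
                     DAG.getConstant(0, DL, X.getValueType()));
}
```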
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
We were previously limited to abs(sub(zext(),zext())) patterns, but this adds
handling for a number of other abdu patterns until a topologically sorted
DAG allows us to rely on an ABDU node having already been created.
Now that we don't just match zext() sources, I've generalised the
createPSADBW helper to explicitly zext/truncate to the expected vXi8
source type - it still assumes the sources are correct for a PSADBW
node.
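A hypothetical sketch of the generalised helper's behaviour (names and
parameters are illustrative, not the upstream implementation):
```
static SDValue createPSADBW(SelectionDAG &DAG, const SDLoc &DL, SDValue Op0,
                            SDValue Op1, MVT ByteVT /*e.g. MVT::v16i8*/) {
  // Sources are no longer guaranteed to be zext() values, so explicitly
  // zero-extend or truncate them to the vXi8 type PSADBW expects.
  Op0 = DAG.getZExtOrTrunc(Op0, DL, ByteVT);
  Op1 = DAG.getZExtOrTrunc(Op1, DL, ByteVT);
  // PSADBW produces one i64 sum per 64-bit lane of the byte vector.
  MVT SadVT = MVT::getVectorVT(MVT::i64, ByteVT.getVectorNumElements() / 8);
  return DAG.getNode(X86ISD::PSADBW, DL, SadVT, Op0, Op1);
}
```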
Fixes#143456
detectZextAbsDiff had already been simplified a great deal when it was
converted to SDPatternMatch, and a future patch should allow us to match
to ISD::ABDU directly, making it entirely redundant.
This PR resolves https://github.com/llvm/llvm-project/issues/144513.
The modification includes five patterns:
1. vselect Cond, 0, 0 → 0
2. vselect Cond, -1, 0 → bitcast Cond
3. vselect Cond, -1, x → or Cond, x
4. vselect Cond, x, 0 → and Cond, x
5. vselect Cond, 000..., X → andn Cond, X
Patterns 1-4 have been migrated to DAGCombine; pattern 5 stays in the x86
code. The reason is that the andn instruction cannot be used directly in
DAGCombine, only and+xor, which introduces optimization order issues. For
example, in the x86 backend, for select Cond, 0, x → (~Cond) & x, the
backend will first check whether the Cond node feeding (~Cond) is a setcc
node and, if so, modify the comparison operator of the condition. So the
x86 backend cannot complete the andn optimization. In short, I think it is
a better choice to keep the vselect Cond, 000..., X pattern instead of
and+xor in DAGCombine.
Regarding the commits: the first contains the code changes and the x86
tests (note 1), and the second contains the tests for other backends
(note 2).
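A rough sketch of how patterns 2-4 above can look as a generic DAGCombine,
assuming Cond has already been bitcast/sign-extended to the result type VT
(illustrative only; the real code handles the type and legality checks):
```
// Cond, TVal, FVal are the vselect operands; VT is the result type.
if (ISD::isBuildVectorAllOnes(TVal.getNode()) &&
    ISD::isBuildVectorAllZeros(FVal.getNode()))
  return DAG.getBitcast(VT, Cond);                   // pattern 2: -1/0 mask
if (ISD::isBuildVectorAllOnes(TVal.getNode()))
  return DAG.getNode(ISD::OR, DL, VT, Cond, FVal);   // pattern 3
if (ISD::isBuildVectorAllZeros(FVal.getNode()))
  return DAG.getNode(ISD::AND, DL, VT, Cond, TVal);  // pattern 4
```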
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
In most contexts the pointer type is implied by the operation
and should be propagated; getPointerTy is for niche cases where
there is a synthesized value.
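For illustration only (hypothetical snippet, not from the patch): prefer
propagating an existing pointer operand's type, and reserve getPointerTy
for values the backend synthesizes itself:
```
// Common case: the pointer type comes from the operation's own operand.
EVT PtrVT = Ptr.getValueType();
// Niche case: no pointer value exists yet, so ask the target for one.
EVT SynthPtrVT = TLI.getPointerTy(DAG.getDataLayout());
```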
Move the OR(PSHUFB(),PSHUFB()) fold to reuse an existing
createShuffleMaskFromVSELECT result and ensure it is performed before
the combineX86ShufflesRecursively combine to prevent some hasOneUse
failures noticed in #133947 (combineX86ShufflesRecursively still
unnecessarily widens vectors in several locations).
This reverts #140919 / f1d03dedfbe87119cfcafb07e0e0f90ec291cb97, which
could result in another fold trying to split the concatenation apart
again before it was folded to a SUBV_BROADCAST_LOAD.
When reducing v4f64/v8f32 non-lane-crossing X86ISD::VPERMV nodes, we use X86ISD::VPERMILPV nodes for the 128-bit case, but these are only available for fp types.
Fixes#145046
We should only concat logic ops if at least one operand will freely
concatenate. We've now addressed the remaining regressions on AVX2
targets, but still have a number on AVX512 targets, which can
aggressively use VPTERNLOG in many cases.