This transofrm loses information that can be useful for other transforms.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D141883
As per the test case from Steven Johnson in https://reviews.llvm.org/rGf8d9097168b7#1165311
we can indeed encounter such shuffles, that produce all-undef after folding,
before something else manages to optimize them away.
combineInsertEltToShuffle was performing 2 very different folds in the same call, merging "(insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) --> (vector_shuffle X, Y)" and "(insert_vector_elt V, (bitcast X from vector type), IdxC) --> bitcast(shuffle (bitcast V), (extended X), Mask)"
The folds are currently still attempted in the same order as before (just as 2 seperate calls) so there should be no change in behaviour.
First step towards some adjustments to mergeInsertEltWithShuffle for D127115.
This implements the fold (abs (sub nsw x, y)) -> abds(x, y). Providing
the sub is nsw this appears to be valid without the extensions that are
usually used for abds. https://alive2.llvm.org/ce/z/XHVaB3. The
equivalent abdu combine seems to not be valid.
Differential Revision: https://reviews.llvm.org/D141665
As noted in https://reviews.llvm.org/D141778#inline-1369900,
we fail to produce splat shuffles from certain sequences
of shuffles, that may have non-shuffles in the middle of seq.
There is a big pitfail to avoid here: just because `isSplatValue()`
says that all demanded elements are splat, we can't pick any random
one of them, because some of them could be undef! We must ignore those!
When we freeze operands of an operation that we are trying to freeze,
doing so may invalidate the original SDValue. We should just re-fetch
it from the ISD::FREEZE node, because if we bail, we'd hopefully just
revisit the node and do that again.
Fixes https://github.com/llvm/llvm-project/issues/59891
Differential Revision: https://reviews.llvm.org/D141256
Type legalization will insert a sign extend anyway. By doing it
early we can remove the zext. ComputeNumSignBits can't spot it
after type legalization because type legalization may expand
the abs to sra+xor+sub.
If the zext result type is larger than the type to be promoted to,
we'll promote to a legal type and then zext the rest of the way.
If the legal type is larger than the destination type we can promote
and then truncate.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D140509
This fixes a regression introduced by D127115 in test/CodeGen/PowerPC/store-forward-be64.ll
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D140993
faa35fc87370 fix the case of negative input shift. But when `c1`, `c2` is not the same side, it will also cause negative shift amount.
And that negative shift amount can't normalize by urem. So add one more bit size to normalize the last shift amount.
Fix: https://github.com/llvm/llvm-project/issues/59898
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D141363
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
This appears to be the root problematic pattern
for AArch64 regression in D140677.
We already do this, and many more, as target-specific X86 combines,
so this isn't causing much of an impact.
The original computation was both making assumptions that do not hold
in practice, and being overly pessimistic. We should just check
every possible split factor, and pick the best one.
Fixes https://github.com/llvm/llvm-project/issues/59781
This kind of pattern seems to come up as regressions
with better ZERO_EXTEND_VECTOR_INREG recognition.
For initial implementation, this is quite restricted
to the minimal viable transform, otherwise there are
too many regressions to be dealt with.
DAGCombiner replaces (load const_addr1) directly chained with (store
(val, const_addr2)) with val if address space stripped const_addr1 ==
const_addr2. The patch fixes the issue by checking address spaces as
well. However, it might makes sense to not to chain together side
effects that belong to different address spaces in the first place and
make SelectionDAG::root address space aware.
This mainly cleans up a few patterns that are legalized by scalarization
from a wide-element vector, but then are further split apart to build
a more narrow-sized-element vector. In particular this happens in some
cases for illegal ISD::ZERO_EXTEND_VECTOR_INREG.
Given a ISD::EXTRACT_VECTOR_ELT, which is a glorified bit sequence extract,
recursively analyse all of it's users. and try to model themselves as
bit sequence extractions. If all of them agree on the new, narrower element
type, and all of them can be modelled as ISD::EXTRACT_VECTOR_ELT's of that
new element type, do that, but only if unmodelled users are ISD::BUILD_VECTOR.
We might have sunk a bitcast into shuffle, and now it might be operating
on more fine-grained elements than what we'd match, so we must not be
dependent on whatever the granularity the shuffle happened to be in,
but transform it into the one canonical for us - with widest elements.
Sometimes we end up with a shuffles in DAG that would be
better represented as a `ISD::ZERO_EXTEND_VECTOR_INREG`,
and a failure to do so causes suboptimal codegen in a number of cases,
especially when we will then cast vector to scalar.
I acknowledge, the test changes here are rather underwhelming,
but as with all of codegen, it's always a yak shawing,
and this is the most stripped down version of the patch
that shows *some* effect without having insurmountable amount
of fallout to deal with. The next change resolves this regression.
The transformation will be extended in follow-ups.
Adding zero-ext support isn't as straight-forward, and it's easier
to to so in a new function, but this helper is useful there.
This does not change any existing behaviour.
Depending on the particular DAG, we might either create a `freeze`,
or not. And only in the former case, the cycle would be formed.
It would be nicer to have `ReplaceAllUsesOfValueWithIf()`,
like we have in IR, but we don't have that.
Fixes https://github.com/llvm/llvm-project/issues/59677
The original code was confusing. It was stripping poison-generating flags,
but the comments were saying that doing so was a TODO.
If the poison-generating flags are present, then even if all operands
are guaranteed not to be undef or poison, the whole operation may still
produce undef or poison. We can still deal with that case,
and we already do deal with it in fact, by also dropping those flags.
Refs. https://github.com/llvm/llvm-project/issues/59676
Lack of such operands implies that the op might be poison-producing due to
it's flags. We seem to drop them already, but the comments are confusing.
Fixes https://github.com/llvm/llvm-project/issues/59676
This may allow us to further simplify the vector,
and freezing the extracted result is still fine:
```
----------------------------------------
define i8 @src(<2 x i8> %src, i64 %idx) {
%0:
%i1 = freeze <2 x i8> %src
%i2 = extractelement <2 x i8> %i1, i64 %idx
ret i8 %i2
}
=>
define i8 @tgt(<2 x i8> %src, i64 %idx) {
%0:
%i1 = extractelement <2 x i8> %src, i64 %idx
%i2 = freeze i8 %i1
ret i8 %i2
}
Transformation seems to be correct!
```
BUT, there must not be other uses of that freeze,
see `@freeze_extractelement_extra_use`.
Also, looks like we are missing some ISEL-level handling for freeze.
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes LLVMMIRParser, LLVMGlobalISel, LLVMAsmPrinter, LLVMSelectionDAG.
This adds a fold which teaches the backend to fold
splat(bitcast(buildvector(x,..))) or
splat(bitcast(scalar_to_vector(x))) to a single splat.
This only handles lane 0 splats, which are only valid under LE, and
needs to be a little careful with the types it creates for the new
buildvector.
Differential Revision: https://reviews.llvm.org/D139611