This kind of pattern seems to come up as a regression with better
ZERO_EXTEND_VECTOR_INREG recognition.
For the initial implementation, this is restricted
to the minimal viable transform; otherwise there are
too many regressions to deal with.
DAGCombiner replaces (load const_addr1) directly chained with (store
(val, const_addr2)) with val if const_addr1 == const_addr2 once address
spaces are stripped. The patch fixes the issue by checking the address
spaces as well. However, it might make sense not to chain together side
effects that belong to different address spaces in the first place, and to
make SelectionDAG::root address-space aware.
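A rough IR sketch of the problematic shape (the address spaces and the fixed address here are made up for illustration): the store and the load use the same raw address but different address spaces, so forwarding %val to the load is not sound.
```
define i32 @same_address_different_addrspace(i32 %val) {
  ; store to raw address 16 in address space 1
  store i32 %val, ptr addrspace(1) inttoptr (i64 16 to ptr addrspace(1)), align 4
  ; load from raw address 16 in address space 3 -- distinct memory, must not be forwarded
  %v = load i32, ptr addrspace(3) inttoptr (i64 16 to ptr addrspace(3)), align 4
  ret i32 %v
}
```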
This mainly cleans up a few patterns that are legalized by scalarization
from a wide-element vector, but are then split apart further to build
a vector with narrower elements. In particular this happens in some
cases for illegal ISD::ZERO_EXTEND_VECTOR_INREG.
Given an ISD::EXTRACT_VECTOR_ELT, which is a glorified bit-sequence extract,
recursively analyse all of its users and try to model them as
bit-sequence extractions too. If they all agree on a new, narrower element
type, and each of them can be modelled as an ISD::EXTRACT_VECTOR_ELT of that
new element type, do so, but only if the unmodelled users are ISD::BUILD_VECTORs.
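Roughly, in IR terms (a hand-written sketch, not from the patch; the actual transform happens on the DAG and assumes little-endian lane numbering), the idea is to turn the single wide extract whose users are all bit-sequence extractions into narrow extracts directly:
```
define <2 x i32> @wide_extract(<2 x i64> %v) {
  %e  = extractelement <2 x i64> %v, i64 0
  %lo = trunc i64 %e to i32
  %sh = lshr i64 %e, 32
  %hi = trunc i64 %sh to i32
  %r0 = insertelement <2 x i32> poison, i32 %lo, i64 0
  %r1 = insertelement <2 x i32> %r0, i32 %hi, i64 1
  ret <2 x i32> %r1
}

; all users can be modelled as narrower extracts instead:
define <2 x i32> @narrow_extracts(<2 x i64> %v) {
  %n  = bitcast <2 x i64> %v to <4 x i32>
  %lo = extractelement <4 x i32> %n, i64 0
  %hi = extractelement <4 x i32> %n, i64 1
  %r0 = insertelement <2 x i32> poison, i32 %lo, i64 0
  %r1 = insertelement <2 x i32> %r0, i32 %hi, i64 1
  ret <2 x i32> %r1
}
```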
We might have sunk a bitcast into a shuffle, and the shuffle may now be
operating on more fine-grained elements than what we would match, so we must
not depend on whatever granularity the shuffle happens to be in,
but transform it into the form that is canonical for us: the one with the widest elements.
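For example (a hand-written sketch), the same blend can be written at i32 or i64 granularity; the canonical form for this analysis is the wide one:
```
define <2 x i64> @narrow_granularity(<4 x i32> %x, <4 x i32> %y) {
  %s = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  %r = bitcast <4 x i32> %s to <2 x i64>
  ret <2 x i64> %r
}

; same bits, expressed with the widest elements:
define <2 x i64> @wide_granularity(<4 x i32> %x, <4 x i32> %y) {
  %xw = bitcast <4 x i32> %x to <2 x i64>
  %yw = bitcast <4 x i32> %y to <2 x i64>
  %r  = shufflevector <2 x i64> %xw, <2 x i64> %yw, <2 x i32> <i32 0, i32 2>
  ret <2 x i64> %r
}
```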
Sometimes we end up with shuffles in the DAG that would be
better represented as an `ISD::ZERO_EXTEND_VECTOR_INREG`,
and failing to recognize that causes suboptimal codegen in a number of cases,
especially when we then cast the vector to a scalar.
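As a sketch of what such a shuffle looks like (hand-written; lane numbering assumes little-endian), interleaving the low elements with zeros and bitcasting is just a zero extension in disguise:
```
define <2 x i64> @zext_in_disguise(<4 x i32> %x) {
  %s = shufflevector <4 x i32> %x, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 1, i32 4>
  %z = bitcast <4 x i32> %s to <2 x i64>
  ret <2 x i64> %z
}

; equivalent to zero-extending the low half, i.e. ISD::ZERO_EXTEND_VECTOR_INREG:
define <2 x i64> @zext_explicit(<4 x i32> %x) {
  %lo = shufflevector <4 x i32> %x, <4 x i32> poison, <2 x i32> <i32 0, i32 1>
  %z  = zext <2 x i32> %lo to <2 x i64>
  ret <2 x i64> %z
}
```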
I acknowledge that the test changes here are rather underwhelming,
but as with all of codegen it's always yak shaving,
and this is the most stripped-down version of the patch
that shows *some* effect without an insurmountable amount
of fallout to deal with. The next change resolves this regression.
The transformation will be extended in follow-ups.
Adding zero-extend support isn't as straightforward, and it's easier
to do so in a new function, but this helper is useful there.
This does not change any existing behaviour.
Depending on the particular DAG, we might or might not create a `freeze`,
and only in the former case would the cycle be formed.
It would be nicer to have a `ReplaceAllUsesOfValueWithIf()`,
like we have in IR, but we don't have that.
Fixes https://github.com/llvm/llvm-project/issues/59677
The original code was confusing: it was stripping poison-generating flags,
but the comments said that doing so was still a TODO.
If poison-generating flags are present, then even if all operands
are guaranteed not to be undef or poison, the whole operation may still
produce undef or poison. We can still deal with that case,
and in fact we already do, by also dropping those flags.
Refs. https://github.com/llvm/llvm-project/issues/59676
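An IR-level illustration (hand-written, not from the patch) of why the flags must be dropped when the freeze is pushed through: with `nuw`/`nsw` kept, the result could still be poison even with frozen operands.
```
define i32 @freeze_of_flagged_add(i32 %x, i32 %y) {
  %a = add nuw nsw i32 %x, %y
  %f = freeze i32 %a
  ret i32 %f
}

; pushing the freeze onto the operands is only correct if the flags go away:
define i32 @freeze_pushed_flags_dropped(i32 %x, i32 %y) {
  %fx = freeze i32 %x
  %fy = freeze i32 %y
  %a  = add i32 %fx, %fy   ; no nuw/nsw, otherwise %a could still be poison
  ret i32 %a
}
```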
Lack of such operands implies that the op might be poison-producing due to
its flags. We seem to drop them already, but the comments are confusing.
Fixes https://github.com/llvm/llvm-project/issues/59676
This may allow us to further simplify the vector,
and freezing the extracted result is still fine:
```
----------------------------------------
define i8 @src(<2 x i8> %src, i64 %idx) {
%0:
%i1 = freeze <2 x i8> %src
%i2 = extractelement <2 x i8> %i1, i64 %idx
ret i8 %i2
}
=>
define i8 @tgt(<2 x i8> %src, i64 %idx) {
%0:
%i1 = extractelement <2 x i8> %src, i64 %idx
%i2 = freeze i8 %i1
ret i8 %i2
}
Transformation seems to be correct!
```
BUT there must not be other uses of that freeze;
see `@freeze_extractelement_extra_use`.
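For reference, a hypothetical sketch of such an extra-use case (not the actual test body): the frozen vector has a second user, so moving the freeze to the extracted scalar would not let us drop the vector freeze.
```
define i8 @freeze_extractelement_extra_use_sketch(<2 x i8> %src, i64 %idx, ptr %p) {
  %i1 = freeze <2 x i8> %src
  store <2 x i8> %i1, ptr %p                   ; extra use of the frozen vector
  %i2 = extractelement <2 x i8> %i1, i64 %idx
  ret i8 %i2
}
```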
Also, it looks like we are missing some ISel-level handling for freeze.
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes LLVMMIRParser, LLVMGlobalISel, LLVMAsmPrinter, LLVMSelectionDAG.
This teaches the backend to fold
splat(bitcast(buildvector(x,..))) or
splat(bitcast(scalar_to_vector(x))) into a single splat.
This only handles lane-0 splats, which are only valid under little-endian, and
it needs to be a little careful with the types it creates for the new
buildvector.
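In IR terms the fold looks roughly like this (a hand-written analogue; the commit works on the DAG nodes): the splat of lane 0 of the bitcast is just a splat of the truncated scalar, which only holds for lane 0 on little-endian.
```
define <8 x i16> @splat_of_bitcast_lane0(i32 %x) {
  %b = insertelement <4 x i32> poison, i32 %x, i64 0
  %c = bitcast <4 x i32> %b to <8 x i16>
  %s = shufflevector <8 x i16> %c, <8 x i16> poison, <8 x i32> zeroinitializer
  ret <8 x i16> %s
}

; can be rewritten as a single splat of the truncated scalar:
define <8 x i16> @splat_direct(i32 %x) {
  %t = trunc i32 %x to i16
  %b = insertelement <8 x i16> poison, i16 %t, i64 0
  %s = shufflevector <8 x i16> %b, <8 x i16> poison, <8 x i32> zeroinitializer
  ret <8 x i16> %s
}
```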
Differential Revision: https://reviews.llvm.org/D139611
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This fixes the miscompile in issue #58883.
The test demonstrates that we gave up on store merging in that example.
This change should be strictly safe (it just adds another clause
to avoid the transform), and based on the regression tests it does not
prohibit any existing valid optimizations. I want to believe
that it's also a sufficient fix (possibly overkill), but I'm not
sure how to prove that.
Differential Revision: https://reviews.llvm.org/D137791
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.
This patch attempts to replace such MUL nodes with additional uses of the LO result of the MUL_LOHI node.
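A sketch of the kind of input that can end up with both nodes (hand-written; on targets where the overflow intrinsic is legalized via MUL_LOHI):
```
define i64 @duplicate_mul(i64 %x, i64 %y, ptr %ovf) {
  ; legalizing the overflow check typically creates a UMUL_LOHI node ...
  %wo = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 %y)
  %ov = extractvalue { i64, i1 } %wo, 1
  store i1 %ov, ptr %ovf
  ; ... while this is a separate MUL node computing the same low half,
  ; which can instead reuse the LO result of the UMUL_LOHI
  %m = mul i64 %x, %y
  ret i64 %m
}

declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)
```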
Differential Revision: https://reviews.llvm.org/D138790
or (xor x, y), x --> or x, y
or (xor x, y), y --> or x, y
or (xor x, y), (and x, y) --> or x, y
or (xor x, y), (or x, y) --> or x, y
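For instance, the first fold as an alive2-style scalar pair (hand-written, analogous to the vector DAG patterns):
```
define i32 @src(i32 %x, i32 %y) {
  %xor = xor i32 %x, %y
  %or  = or i32 %xor, %x
  ret i32 %or
}
=>
define i32 @tgt(i32 %x, i32 %y) {
  %or = or i32 %x, %y
  ret i32 %or
}
```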
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D138401
This now allows folding an AND of an anyext masked_load into a
zext_masked_load even if the masked load has multiple users. Doing so
eliminates some redundant ANDs/MOVs for certain AArch64 SVE code.
I'm not sure if there are any cases where doing this could negatively affect
the other users of the masked_load. Looking at other optimizations of
masked loads, most don't apply if the load is used more than once, so it
doesn't look like this would interfere.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D137844
A target can report whether or not a misaligned access is 'fast' as defined
by the target. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but also not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
This was disabled to prevent regressions, which appear to occur only on AMDGPU (at least in our current lit tests), and which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.
Fixes #57872
Differential Revision: https://reviews.llvm.org/D136042
This bug was introduced with D136713 / 54eeadcf442df91aed0.
As an enhancement, we could cast the operands to the expected type,
but we need to make sure that is done correctly (zext vs. sext).
It's also possible (but seems unlikely) that an operand can have
a type larger than the result type.
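For instance (constants chosen purely for illustration), widening the same i8 value gives different results depending on the extension kind, which is why the cast would have to be chosen carefully:
```
define i16 @widen_zext() {
  %z = zext i8 -1 to i16   ; 255
  ret i16 %z
}

define i16 @widen_sext() {
  %s = sext i8 -1 to i16   ; 65535, i.e. -1
  ret i16 %s
}
```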
Fixes #58661