value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes LLVMMIRParser, LLVMGlobalISel, LLVMAsmPrinter, LLVMSelectionDAG.
This adds a fold which teaches the backend to fold
splat(bitcast(buildvector(x,..))) or
splat(bitcast(scalar_to_vector(x))) to a single splat.
This only handles lane 0 splats, which are only valid under LE, and
needs to be a little careful with the types it creates for the new
buildvector.
Differential Revision: https://reviews.llvm.org/D139611
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This fixes the miscompile in issue #58883.
The test demonstrates that we gave up on store merging in that example.
This change should be strictly safe (just adds another clause
to avoid the transform), and it does not prohibit any existing
valid optimizations based on regression tests. I want to believe
that it's also a sufficient fix (possibly overkill), but I'm not
sure how to prove that.
Differential Revision: https://reviews.llvm.org/D137791
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.
This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node
Differential Revision: https://reviews.llvm.org/D138790
or (xor x, y), x --> or x, y
or (xor x, y), y --> or x, y
or (xor x, y), (and x, y) --> or x, y
or (xor x, y), (or x, y) --> or x, y
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D138401
This now allows folding an AND of a anyext masked_load to a
zext_masked_load even if the masked load has multiple users. Doing is
eliminates some redundant ANDs/MOVs for certain AArch64 SVE code.
I'm not sure if there's any cases where doing this could negatively the
other users of the masked_load. Looking at other optimizations of
masked loads, most don't apply if the load is used more than once, so it
doesn't look like this would interfere.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D137844
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.
Fixes#57872
Differential Revision: https://reviews.llvm.org/D136042
This bug was introduced with D136713 / 54eeadcf442df91aed0 .
As an enhancement, we could cast operands to the expected type,
but we need to make sure that is done correctly (zext vs. sext).
It's also possible (but seems unlikely) that an operand can have
a type larger than the result type.
Fixes#58661
scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...}
We generally try to avoid ad-hoc vectorization in SDAG,
but the motivating case from issue #39482 escapes our
normal vectorization folds in IR. It seems like it should
always be a win to transform this pattern in cases where
we have the same vector type for input and output and the
target supports the vector operation. That avoids
transfers from vector to scalar and back.
In the x86 shift examples, we create the scalar-to-vector
node during legalization. I'm not sure if there's a more
general way to create the pattern for testing. (If so, I
could add tests for other targets.)
Differential Revision: https://reviews.llvm.org/D136713
This is not quite NFC because one of the users should
now avoid the DIVREM opcodes too, but I'm not sure
how to test that.
I used the same name as an analysis function in IR
in case we want to expand this to include other
operations.
Another potential use is proposed in D136713.
If the upper half of an abs() is all sign bits, then we can perform the abs() using just the lower half and then zero extend.
I've limited the DAG combine to only sign_extend_inreg (and free truncate/zero_extend) to minimise any later promotion issues, but for legalization a similar fold can use ComputeNumSignBits to be more aggressive.
Alive2: https://alive2.llvm.org/ce/z/y32fS4Fixes#43370
Differential Revision: https://reviews.llvm.org/D136559
This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d.
The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but
passes locally. I'm really confused.
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
The CanFoldNonConst doesn't work correctly with opaque constants
because getNode won't constant fold constants if one is opaque. Even
if the operation is AND/OR. This can lead to infinite loops.
This patch does the folding manually in the DAGCombine. Alternatively,
we could improve getNode but that seemed likely to have bigger impact
and possibly increase compile time for the additional checks. We wouldn't
want to directly constant fold because we need to preserve the opaque flag.
Fixes PR58511.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D136472
@foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it
Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops.
Differential Revision: https://reviews.llvm.org/D136081
Instead of checking that an operand is constant/opaque before calling getNode() and then checking that the result is a constant, just use FoldConstantArithmetic which will just early-out if the operands are not constant foldable.
The crash case comes from #58350. It have two stores, one store is type f32 and the other is v1f32.
When we try to merge these two stores on v1f32, the memVT is vector type so the old code will use ISD::EXTRACT_SUBVECTOR for type f32 also then compiler crash.
So this patch insert a build_vector for f32 store to generate v1f32 also when memVT is v1f32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135954
This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad.
Getting a lit test failures on AMDGPU but I can't reproduce it so far.
Reverting to investigate.
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
The first two parameters of addcarry are commutative. We may face a situation where both variant are present in the DAG, in which case we benefit from using just one.
Depends on D57302 and D33587
Reviewed By: RKSimon, chfast
Differential Revision: https://reviews.llvm.org/D57317
Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load.
This patch identifies such a pattern and combines it into a load.
Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd