Try to remove extra bitcasts around logicops if we're dealing with illegal types
Fixes the regressions in D145939
Differential Revision: https://reviews.llvm.org/D146032
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D145918
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D145918
[DAGCombiner] handle more store value forwarding
When lowering calls on target like PPC, some stack loads
will be generated for by value parameters. Node CALLSEQ_START
prevents such loads from being combined.
Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad
Reviewed By: nemanjai, RolandF
Differential Revision: https://reviews.llvm.org/D138899
Try to more aggressively narrow masks of extended values.
This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free.
This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc.
Differential Revision: https://reviews.llvm.org/D145866
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 .
DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequences of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s.
This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D144116
The merged store touches memory for other underlying objects, so mapping
the merged store to the first underlying object is not correct. For example
in https://github.com/llvm/llvm-project/issues/60744, the merged store is
not correctly analyzed as dependent with memory operations which are also
part of the merged store.
Fixes#60744
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D144711
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D145177
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
Differential Revision: https://reviews.llvm.org/D145143
It turns out that there are relatively trivial, albeit rare, cases that
require a MaxDepth of more than 16 (see added test). However, we want to
avoid having to rely on a large fixed MaxDepth.
Since these cases are relatively rare, apply the following strategy:
1. Start with a low MaxDepth of 16 - if the entry node was not
reached, we can return (the common case).
2. If the entry node was reached, exponentially increase MaxDepth up
to some large limit that should cover all cases and guard against
stack exhaustion.
This retains the better performance with a low MaxDepth in the common
case, and in complex cases backs off and retries. On a whole, this is
preferable vs. starting with a large MaxDepth which would unnecessarily
penalize the common case where a low MaxDepth is sufficient.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D145386
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in PromoteIntegerResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D144846
Move MaxDepth into the lambda, since it is not needed outside. This
fixes some compilers that complain about missing capture:
error C3493: 'MaxDepth' cannot be implicitly captured because no
default capture mode has been specified
Fixes: f693932fbea7 ("[SelectionDAG] Transitively copy NodeExtraInfo on RAUW")
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.
Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.
This fixes the remaining pcsections-atomics.ll tests on X86.
v2: Optimize copyExtraInfo() deep copy. For now we assume that only
NodeExtraInfo that have PCSections set require deep copy. Furthermore,
limit the depth of graph search while pre-populating the visited set,
assuming the to-be-replaced subgraph 'From' has limited complexity. An
assertion catches if the maximum depth needs to be increased.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144677
This is guarding a check for isTypeLegal so it should check is
LegalTypes.
Fixes PR61111.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D145139
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D136862
As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for
`dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it
appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so
it was kept and comments were updated instead.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144800
This removes the unused `FuncArgumentDbgValueKind::Addr` value originally added
by e24f5348798605a799c63ff09169d177d262cd37. The intent was to signal the
original intrinsic that marked a function argument, but the `Addr` part was
never used.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144794
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.
There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.
An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.
Differential Revision: https://reviews.llvm.org/D145065
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in SplitVectorResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144744
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.
Differential Revision: https://reviews.llvm.org/D144850
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.
Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.
This fixes the remaining pcsections-atomics.ll tests on X86.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144677
`(and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1))` can be lowered to
`(icmp eq/ne (and (sub A, (smin C0, C1)), (not (sub (smax C0, C1), (smin C0, C1)))), 0)`
generically if `(sub (smax C0, C1), (smin C0,C1))` is a power of 2.
This covers the existing case of `(and/or (icmp eq/ne A, C_Pow2),(icmp eq/ne A, -C_Pow2))`
as well as other cases.
Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/mLJiUW
NE: https://alive2.llvm.org/ce/z/TKnzUr
Differential Revision: https://reviews.llvm.org/D144283
This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
On targets without ADDCARRY or ADDE, we need to emit a separate
SETCC to determine carry from the low half to the high half. Usually
we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0).
This can reduce the live range of LHSLo.
On targets that lack ADDCARRY support we split a wide uaddo into
an ADD and a SETCC that both need to be split.
For (uaddo X, 1) we can observe that when the add overflows the result
will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this.
This improves D142071.
There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or
(lo(X) & hi(X)) == -1 before the addition. That would be closer to the
code before D142071.
Reviewed By: liaolucy
Differential Revision: https://reviews.llvm.org/D144614
The m_VScale() matcher is unusual in that it requires a DataLayout.
It is currently used to determine the size of the GEP type. However,
I believe it is sufficient to check for the canonical
<vscale x 1 x i8> form here -- I don't think there's a need to
recognize exotic variations like <vscale x 1 x i4> as a vscale
constant representation as well.
Differential Revision: https://reviews.llvm.org/D144566
The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144536
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.
GCC issues an error:
In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
66 | template <> struct is_bitmask_enum<Enum> : std::true_type {}; \
| ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
65420c8041f4 introduced an ICE in combineMinNumMaxNum(...) when
combineMinNumMaxNumImpl(...) returns an SDValue(). Make sure to check that a
value is returned before trying to perform an FNEG on it.
GitHub Issue: #60924
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144571
This partially reverts a regression introduced in 8f25e382c5b1 for
AArch64 targets. In particular, we restore the logic of `(abs (sub nsw
x, y)) -> abds(x, y)` for all targets except X86, which keeps the logic
introduced in 8f25e382c5b1. See also https://reviews.llvm.org/D142288.
Differential Revision: https://reviews.llvm.org/D144379
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
This patch adds 2 new intrinsics:
; Interleave two vectors into a wider vector
<vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)
; Deinterleave the odd and even lanes from a wider vector
{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)
The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.
The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.
The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:
using cf = std::complex<float>;
void foo(cf * dst, int N) {
for (int i=0; i<N; ++i)
dst[i] += cf(1.f, 2.f);
}
For this loop, LoopVectorize:
(1) Loads a wide vector (e.g. <8 x float>)
(2) Extracts odd lanes using shufflevector (leading to <4 x float>)
(3) Extracts even lanes using shufflevector (leading to <4 x float>)
(4) Performs the addition
(5) Interleaves the two <4 x float> vectors into a single <8 x float> using
shufflevector
(6) Stores the wide vector.
In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.
The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.
Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.
Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D141924