This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
On targets without ADDCARRY or ADDE, we need to emit a separate
SETCC to determine carry from the low half to the high half. Usually
we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0).
This can reduce the live range of LHSLo.
On targets that lack ADDCARRY support we split a wide uaddo into
an ADD and a SETCC that both need to be split.
For (uaddo X, 1) we can observe that when the add overflows the result
will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this.
This improves D142071.
There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or
(lo(X) & hi(X)) == -1 before the addition. That would be closer to the
code before D142071.
Reviewed By: liaolucy
Differential Revision: https://reviews.llvm.org/D144614
Where the new checks have been added, `SymmetricDifference` - still being built
- contains entries for variables present in `A` and not in `B`. If
`SymmetricDifference` is empty at this point it means the variables (map keys)
in `A` are a subset of those in `B`, so if `A` and `B` are the same size then
we know they're identical.
This reduces the number of instructions retired building some of the CTMark
projects in a ReleaseLTO-g configuration (geomean change -0.05% with the best
improvement being -0.24% for tramp3d-v4)
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D144621
The m_VScale() matcher is unusual in that it requires a DataLayout.
It is currently used to determine the size of the GEP type. However,
I believe it is sufficient to check for the canonical
<vscale x 1 x i8> form here -- I don't think there's a need to
recognize exotic variations like <vscale x 1 x i4> as a vscale
constant representation as well.
Differential Revision: https://reviews.llvm.org/D144566
The size lower bound is known - the `Join` map in both cases gets an entry for
each variable from both input maps (union).
This reduces the number of times the map grows, improving ReleaseLTO-g compile
time for CTMark projects by an average of around 0.2%.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D144486
The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144536
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.
GCC issues an error:
In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
66 | template <> struct is_bitmask_enum<Enum> : std::true_type {}; \
| ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
65420c8041f4 introduced an ICE in combineMinNumMaxNum(...) when
combineMinNumMaxNumImpl(...) returns an SDValue(). Make sure to check that a
value is returned before trying to perform an FNEG on it.
GitHub Issue: #60924
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144571
These checks show optimized instructions if an operand is known to be
(partially) zero.
Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a
Differential Revision: https://reviews.llvm.org/D140208
This partially reverts a regression introduced in 8f25e382c5b1 for
AArch64 targets. In particular, we restore the logic of `(abs (sub nsw
x, y)) -> abds(x, y)` for all targets except X86, which keeps the logic
introduced in 8f25e382c5b1. See also https://reviews.llvm.org/D142288.
Differential Revision: https://reviews.llvm.org/D144379
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
This patch adds 2 new intrinsics:
; Interleave two vectors into a wider vector
<vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)
; Deinterleave the odd and even lanes from a wider vector
{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)
The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.
The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.
The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:
using cf = std::complex<float>;
void foo(cf * dst, int N) {
for (int i=0; i<N; ++i)
dst[i] += cf(1.f, 2.f);
}
For this loop, LoopVectorize:
(1) Loads a wide vector (e.g. <8 x float>)
(2) Extracts odd lanes using shufflevector (leading to <4 x float>)
(3) Extracts even lanes using shufflevector (leading to <4 x float>)
(4) Performs the addition
(5) Interleaves the two <4 x float> vectors into a single <8 x float> using
shufflevector
(6) Stores the wide vector.
In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.
The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.
Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.
Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D141924
The uniformity analysis treated an undef argument to phi to be distinct from any
other argument, equivalent to calling PHINode::hasConstantValue() instead of
PHINode::hasConstantOrUndefValue(). Such a phi was reported as divergent. This
is different from the older divergence analysis which treats such a phi as
uniform. Fixed uniformity analysis to match the older behaviour.
The original behaviour was added to DivergenceAnalysis in D19013. But it is not
clear if relying on the undef value is safe. The defined values are not constant
per se; they just happen to be uniform and the non-constant uniform value may
not dominate the PHI.
Reviewed By: ruiling
Differential Revision: https://reviews.llvm.org/D144254
This fixes a corner case where we would skip doing an alias check because of a
>= vs > bug, due to the presence of a non-aliasing instruction, in this case
the load %safeld.
Fixes issue #59376
The extending loads combine tries to prefer sign-extends folding into loads vs
zexts, and in cases where a G_ZEXTLOAD is first used by a G_ZEXT, and then used
by a G_SEXT, it would select the G_SEXT even though the load is already
zero-extending.
Fixes issue #59630
This patch introduces a new type __externref_t that denotes a WebAssembly opaque
reference type. It also implements builtin __builtin_wasm_ref_null_extern(),
that returns a null value of __externref_t. This lays the ground work
for further builtins and reference types.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D122215
For in-order cores MachineCombiner makes better decisions when the critical path
is calculated only for the current basic block and does not take into account
other blocks from the trace.
This patch adds a virtual method to TargetInstrInfo to allow each target decide
which strategy to use.
Depends on D140541
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140542
Some of TargetLowering functions needed opcodes are often used in DAGCombiner.
The patch make those MatchContextClass classes have TargetLowering members and
pass specific opcodes for those TargetLowering functions.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D144075
Now that we've inserted a call to an intrinsic, we need to update
certain previous uses of CallBrInst values to use the value of this
intrinsic instead.
There are 3 cases to handle:
1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same
BasicBlock as the use of the callbr we're replacing.
2. The use is dominated by the direct destination.
3. The use is not dominated by the direct destination, and may or may
not be dominated by the indirect destination.
Part 2c of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.
Reviewed By: efriedma, void, jyknight
Differential Revision: https://reviews.llvm.org/D139970
We are about to allow different trace strategies for MachineCombiner. Make
the name of the ensemble strategy-neutral.
Depends on D140540
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140541
This strategy makes each trace local to the basic block. For in-order cores some
heuristics work better when we do local decisions. For example, MachineCombiner
may expect that instructions outside the current basic block do not lengthen
the critical path when we execute instructions in order or the core has a
small re-order buffer.
This patch only introduce the strategy, real use-case is added in the further
pathes.
Depends on D140539
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140540
Recommit bbdf24357932b064f2aa18ea1356b474e0220dde.
Original commit message:
If a chain of two selects share a true/false value and are controlled
by two setcc nodes, that are never both true, we can fold away one of
the selects. So, the following:
(select (setcc X, const0, eq), Y,
(select (setcc X, const1, eq), Z, Y))
Can be combined to:
select (setcc X, const1, eq) Z, Y
Differential Revision: https://reviews.llvm.org/D142535
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
Make forward declaration possible to reduce amount of dependencies and reduce
re-compilation burden caused by further patches.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140539