Or else InstCombine can incorrectly report that no change has been made.
This optimization doesn't really fit into InstCombine since it optimizes multiple instructions at once; there's likely a more comprehensive fix.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D146064
This patch adds support for recording BuildIds using the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
After merging the main part of the gather/buildvector code, the CreateShuffle
lambda can be removed and the ShuffleBuilder add functions can be used instead.
Also, part of the code from CreateShuffle is migrated to the
BaseShuffleAnalysis::createShuffle function for better code emission.
Differential Revision: https://reviews.llvm.org/D145988
This is a partial revert of D128830, restoring the previous
position of DeadArgElim in the fat LTO pipeline. The motivation
for this is a major code size regression observed in Rust and
illustrated in the PhaseOrdering test.
This is a conservative fix restoring the previous pipeline order.
The real problem is that the LTO pipeline is conceptually broken:
It doesn't have a CGSCC function simplification pipeline. The
inliner is just being run by itself. This wouldn't be a problem
if fat LTO used a standard design where ArgPromotion and DAE are
only run after functions have already been simplified by the
CGSCC inliner pipeline.
Differential Revision: https://reviews.llvm.org/D146051
In the degenerate case where the select is fed by an unsimplified
icmp with two constant operands, don't try to replace one constant
with another. Wait for the icmp to be simplified first instead.
Fixes https://github.com/llvm/llvm-project/issues/61361.
Because widenable conditions will eventually be lowered into a constant, instructions such
as `and`, `or` etc. will also be optimized away. Treat them as free.
This is important if we want guards represented as experimental.guard calls and guards in
their explicit form (a branch on an `and` with a widenable condition) to have the same cost
for the unroller and other similar passes.
Differential Revision: https://reviews.llvm.org/D146034
Reviewed By: nikic
operation for combined entries.
The vector factor after combining of the shuffle entries is defined by
the size of the mask, not by the vector factors of the original
entries. It therefore needs to be adjusted to emit correct code.
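
As an illustration of why the mask determines the result width, here is a
minimal Python model of a two-source shuffle. The `shuffle` helper and the
sample values are assumptions made for this sketch, not code from the patch.

```python
def shuffle(v1, v2, mask):
    """Model of a two-source shuffle: each mask index selects from v1 + v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


# Two entries with a vector factor of 2 each ...
entry_a = [10, 11]
entry_b = [20, 21]
# ... combined through a 4-element mask.
mask = [0, 2, 1, 3]
combined = shuffle(entry_a, entry_b, mask)
vector_factor = len(combined)
```

In this model the combined value has as many lanes as the mask, regardless of
the width of either input entry.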
Reapply with a fix for phi handling: For phis, we need to insert
into the incoming block, not above the phi. This is especially
tricky if there are multiple incoming values from the same
predecessor, because these must all use the same value.
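
The phi rule above can be sketched as follows. This is a simplified Python
model, not the actual implementation: `expand_phi_incoming`,
`make_instruction`, and the block names are hypothetical, and a phi is
modelled as a list of (value, predecessor) pairs.

```python
def expand_phi_incoming(incoming, make_instruction):
    """Replace each (constant, predecessor) phi entry with an instruction.

    `make_instruction(constant, pred)` is a hypothetical callback that
    materializes the instruction at the end of block `pred`.
    """
    per_pred = {}  # one materialized value per predecessor block
    rewritten = []
    for constant, pred in incoming:
        if pred not in per_pred:
            per_pred[pred] = make_instruction(constant, pred)
        # Every entry for the same predecessor reuses the same value.
        rewritten.append((per_pred[pred], pred))
    return rewritten


created = []


def make_instruction(constant, pred):
    name = f"%inst.{pred}"
    created.append((name, constant, pred))
    return name


# A switch with two edges into the same block yields two entries for "sw".
phi = [("C", "sw"), ("C", "sw"), ("C", "entry")]
rewritten = expand_phi_incoming(phi, make_instruction)
```

Caching per predecessor means only one instruction is created for "sw", and
both phi entries for that block refer to it.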
-----
LowerTypeTests replaces weak declarations with icmp+select
constant expressions. As this is not a relocatable expression,
it additionally promotes initializers using it to global ctors.
As part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
I would like to remove the select constant expression, of which LTT
is now the last user. This is a bit tricky, because we now need to
replace a constant with an instruction, which might require
converting intermediate constant expression users to instructions as
well.
We do this using the convertUsersOfConstantsToInstructions() helper.
However, it needs to be slightly extended to also support expansion
of ConstantAggregates. These are important in this context, because
the promotion of initializers to global ctors will produce stores
of such aggregates.
Differential Revision: https://reviews.llvm.org/D145247
This intrinsic is not supposed to live through lowering; eventually it should turn
into a `true` constant and be optimized away.
Differential Revision: https://reviews.llvm.org/D146027
Reviewed By: skatkov
There are a number of issues with the current code for converting
ule -> ult (etc) predicates for comparisons controlling finite loops:
* It sets nowrap flags, which may only hold for that particular
comparison, not globally. (PR60944)
* It doesn't check that the RHS is invariant. (I'm not sure this
can cause practical issues independently of the previous point.)
* It runs before simplifications that may be more profitable. (PR54191)
This patch moves the handling for this into computeExitLimitFromICmp(),
because it is somewhat tightly coupled with assumptions in that code,
and addresses the aforementioned issues.
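
To illustrate why a ule -> ult rewrite depends on a no-wrap assumption, here is
a small Python sketch that models 8-bit unsigned arithmetic. The helper names
and the 8-bit width are assumptions chosen for the example; this is not the
SCEV code.

```python
BITS = 8
MASK = (1 << BITS) - 1  # model 8-bit unsigned wraparound


def ule_as_ult(i: int, n: int) -> bool:
    """Rewrite `i <= n` as `i < n + 1`, with n + 1 computed in 8 bits."""
    return i < ((n + 1) & MASK)


def rewrite_is_sound(n: int) -> bool:
    """Check the rewrite against the original predicate for every 8-bit i."""
    return all((i <= n) == ule_as_ult(i, n) for i in range(MASK + 1))


# The rewrite holds whenever n + 1 does not wrap ...
sound_without_wrap = all(rewrite_is_sound(n) for n in range(MASK))
# ... and fails for n == 255, where n + 1 wraps to 0.
sound_with_wrap = rewrite_is_sound(MASK)
```

The rewrite is only valid when `n + 1` cannot wrap, which is a property of the
particular comparison rather than of the expression `n + 1` everywhere it
appears.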
Fixes https://github.com/llvm/llvm-project/issues/60944.
Fixes https://github.com/llvm/llvm-project/issues/54191.
Differential Revision: https://reviews.llvm.org/D145510
The TripCount liveins would currently be printed as badref in the vplan as they
are not allocated slots in the VPSlotTracker. This patch allocates them a slot
and adds them to the printed Live-Ins. It also makes a minor adjustment to
printing of Live-ins to reduce the empty lines when multiple Live-ins are
present.
Differential Revision: https://reviews.llvm.org/D145507
This slightly increases the costs of InsertElement instructions that are part
of a vector splat sequence, i.e. a load, InsertElement and a shuffle (load +
dup). The resulting LD1R is a high latency instruction, and this slight
increase in costs avoids SLP vectorisation for a couple of cases where this
isn't profitable.
Fixes: https://github.com/llvm/llvm-project/issues/61047
Differential Revision: https://reviews.llvm.org/D145578
D143767 will change the intrinsics used to lower floating-point
svadd_x, svmul_x and svsub_x builtins. As a result, the
combines added as part of D140200 will no longer fire in all cases.
This patch extends the existing combines for contraction to cover
fadd_u, fmul_u and fsub_u intrinsics.
Differential Revision: https://reviews.llvm.org/D144413
This is the follow-up to D144199 and a suggestion from D144045.
We make the use of loop info explicit via an InstCombine pass parameter
rather than semi-arbitrary via caching.
The only InstCombine transform that uses LoopInfo currently is a
GEP fold in visitGEPOfGEP(), so that shows up as a failure in the
dedicated test for the fold as well as several LoopVectorizer tests
that run extra passes.
I don't see any pass manager regression tests that actually check
for pass options, but this is intended to be NFC for the pass
pipeline behavior - we only try to use loop info where it would
have been used before via caching.
Differential Revision: https://reviews.llvm.org/D144274
DFAJumpThreading
JumpThreading
LibCallsShrink
LoopVectorize
SLPVectorizer
DeadStoreElimination
AggressiveDCE
CorrelatedValuePropagation
IndVarSimplify
These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.
Previously, only the very first gather/buildvector node could be probed for reshuffling of other nodes.
The compiler can do the same for other gather/buildvector nodes too; it just needs to check the
dependency and postpone the emission of dependent nodes if their origin nodes have not been emitted yet.
Part of D110978
Differential Revision: https://reviews.llvm.org/D144958
Otherwise %x <= 10 will be true on the first iteration, making the latch
dead. This makes the test more robust to CE becoming more powerful in
the future.
This caused verifier errors:
Instruction does not dominate all uses!
%8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1
%15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1>
in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z
(or register allocator crash when the verifier was disabled).
See comment on the code review.
> Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
> But the compiler may do the same for other gather/buildvector nodes too, just need to check the
> dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.
>
> Part of D110978
>
> Differential Revision: https://reviews.llvm.org/D144958
This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d.
It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.
We can handle logical AND/OR in the same way as arithmetic AND/OR; it only
requires freezing `RHS2`, for which we may introduce a new use that didn't
dynamically exist before.
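
The role of the freeze can be sketched with a small Python model of poison
propagation. The `POISON` sentinel and the helper functions are assumptions
made for illustration; they model `and`, `select` and `freeze` on i1 values
and are not code from the patch.

```python
POISON = object()  # sentinel modelling an LLVM poison value


def bitwise_and(a, b):
    """`and i1 a, b`: poison if either operand is poison."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def logical_and(a, b):
    """`select i1 a, i1 b, i1 false`: b is only observed when a is true."""
    if a is POISON:
        return POISON
    return b if a else False


def freeze(v, chosen=False):
    """`freeze`: replace poison with some fixed, non-poison value."""
    return chosen if v is POISON else v


# The logical form hides a poison right-hand side when the left side is false.
logical_result = logical_and(False, POISON)
# Using the right-hand side unconditionally would leak the poison ...
unfrozen_result = bitwise_and(False, POISON)
# ... unless it is frozen first.
frozen_result = bitwise_and(False, freeze(POISON))
```

In this model, freezing the second operand makes the unconditional form agree
with the logical form whenever the logical form is not poison.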
Differential Revision: https://reviews.llvm.org/D145771
Reviewed By: nikic
We don't do this transform in InstCombine in the general case for arbitrary values, because the cost of
an AND and 2 ICMPs isn't higher than that of a MIN and an ICMP. However, LICM also knows
about the loop structure. This transform becomes profitable if `A` and `B` are loop-invariant and
`X` is not: by doing this, we can compute the min outside the loop.
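
A minimal sketch of the idea in Python, assuming the condition has the shape
`X < A && X < B` (the exact predicates handled by the patch are not restated
here, and the function names are hypothetical):

```python
def in_bounds_two_compares(xs, a, b):
    """Original form: two compares and an `and` on every iteration."""
    return [x < a and x < b for x in xs]


def in_bounds_hoisted_min(xs, a, b):
    """Transformed form: min(a, b) is loop-invariant, so compute it once."""
    bound = min(a, b)  # hoisted out of the loop
    return [x < bound for x in xs]


original = in_bounds_two_compares(range(6), 4, 2)
transformed = in_bounds_hoisted_min(range(6), 4, 2)
```

Both forms compute the same results, but the second performs a single compare
per iteration once the min has been computed outside the loop.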
Differential Revision: https://reviews.llvm.org/D143726
Reviewed By: nikic
When threading operations over phis, we need to adjust the context
instruction to the terminator of the incoming block. This was
handled when threading icmps, but not when threading binops.
Fixes https://github.com/llvm/llvm-project/issues/61312.