We have a bunch of folds that basically perform X pred Y to ~Y pred ~X
for various special cases where this saves an instruction.
Generalize these folds to use isFreeToInvert(). We have to make sure
that we consume an instruction in either of the inversions, otherwise
we're just going to swap the icmp back and forth.
Fixes https://github.com/llvm/llvm-project/issues/74302.
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
The disjoint flag was recently added to IR in #72583
We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
Use isKnownNonNegative for information transfer. This can improve
results, in cases where ValueTracking can infer additional non-negative
info, e.g. for phi nodes.
This allows simplifying the check from
https://github.com/llvm/llvm-project/issues/63126 by
ConstraintElimination. It is also simplified by IndVarSimplify now; note
the changes in llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll,
due to this now being simplified earlier.
This patch adds exact flags for sext/zext idiom `shr (shl X, Y), Y`.
Alive2: https://alive2.llvm.org/ce/z/xYFpfB
We can generalize it to handle pattern `shr (shl X, Y), Z` with `Y u>=
Z` (e.g., non-splat vectors). But I don't think it's worth the effort.
This missed optimization is discovered with the help of
https://github.com/AliveToolkit/alive2/pull/962.
This PR fixes https://github.com/llvm/llvm-project/issues/55013 : the
max intrinsics is not generated for this simple loop case :
https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being
folded into a select, thus not generating the max intrinsics.
For the story :
Since LLVM 14, SCCP pass got smarter by folding sext into zext for
positive ranges : https://reviews.llvm.org/D81756. After this change,
InstCombine was sometimes unable to fold ICMP correctly as both of the
arguments pointed to mismatched zext/sext. To fix this, @rotateright
implemented this fix : https://reviews.llvm.org/D124419 that tries to
resolve the mismatch by knowing if the argument of a zext is positive
(in which case, it is like a sext) by using ValueTracking, however
ValueTracking is not smart enough to infer that the value is positive in
some cases. Recently, @nikic implemented #67982 which keeps the
information that a zext is non-negative. This PR simply uses this
information to do the folding accordingly.
TLDR : This PR uses the recent nneg tag on zext to fold the icmp
accordingly in instcombine.
This PR also contains test cases for sext/zext folding with InstCombine
as well as a x86 regression tests for the max/min case.
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have less teams than the
maximal length. It also allows us to move code from clangs codegen into
the runtime as we now know how large the reduction data is.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab.
Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle- or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
D152730 may add trivial pre-conditions of the form (ICMP_ULE, 0, B),
which won't be handled automatically by the constraint system, because
we don't add Var >= 0 for all variables in the unsigned system.
Handling the trivial condition explicitly here avoids having the
increase the number of rows in the system per variable.
https://alive2.llvm.org/ce/z/QC92ur
Depends on D152730.
Fixes#63125.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158776
This patch adds additional logic to add additional facts for A != B, if
A is a monotonically increasing induction phi. The motivating use case
for this is removing checks when using iterators with hardened libc++,
e.g. https://godbolt.org/z/zhKEP37vG.
The patch pulls in SCEV to detect AddRecs. If possible, the patch adds
the following facts for a AddRec phi PN with StartValue as incoming
value from the loo preheader and B being an upper bound for PN from a
condition in the loop header.
* (ICMP_UGE, PN, StartValue)
* (ICMP_ULT, PN, B) [if (ICMP_ULE, StartValue, B)]
The patch also adds an optional precondition to FactOrCheck (the new
DoesHold field) , which can be used to only add a fact if the
precondition holds at the point the fact is added to the constraint
system.
Depends on D151799.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152730
Adjust the pipeline slightly to move ConstraintElim just before the loop
simplification pipeline. This increases the number of cases where SCEV
should can preserved in the future.
This also enables slightly more opportunities, by benefiting from
earlier CFG simplifications, which allow more conditions to be added.
Reviewed By: nikic, antoniofrighetto
Differential Revision: https://reviews.llvm.org/D158843
Summary:
Argument attributes like NoAlias and ReadOnly could affect memoryssa and thus earlyCSE in the function simplification pipeline.
https://reviews.llvm.org/D145210 adjusted PostOrderFunctionAttrs placement and caused the argument attributes not referred for the use
in the pipeline. This work (initiated by @nikic) unconditionally performs argument attribute inference in the first function-attrs pass.
Reviewers:
aeubanks and nikic
Differential Revision:
https://reviews.llvm.org/D156397
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. It allows better to estimate the cost of the
node and emit better code.
Differential Revision: https://reviews.llvm.org/D157413
When eliminating a tail call, we modify the values of the arguments.
Therefore, if the byval parameter has a readonly attribute, we have to remove it. It is safe because,
from the perspective of a caller, the byval parameter is always treated as "readonly," even if the readonly attribute is removed.
Fixes#64289.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D156793
This reverts commit 1f37088679a5c2416707d477093950e48148d430 as it causes a
large regression in x264, and some other regressions in downstream embedded
benchmarks under LTO.
We currently only enable hoisting in the last SimplifyCFG run of
the function simplification pipeline. In particular this happens
after GVN, which means that instructions that were identical (and
thus hoistable) prior to GVN might no longer be so after it ran,
due to equality replacements (see the phase ordering test).
The history here is that D84108 restricted hoisting to the very
late (module optimization) pipeline only. Then D101468 went back
on that, and also performed it at the end of function simplification.
This patch goes one step further and allows it prior to GVN.
Importantly, we still don't perform hoisting before LoopRotate,
which was the original motivation for delaying it.
Differential Revision: https://reviews.llvm.org/D156532
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.
Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.
This means that the phi operand is set to poison even if for
critical edges without an intermediate block.
There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
InstCombine is a worklist-driven algorithm, which works roughly
as follows:
* All instructions are initially pushed to the worklist.
The initial order is in RPO program order.
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
worklist.
* When the use-count of an instruction decreases, it gets added
back to the worklist.
* And a few of other heuristics on when we should revisit
instructions.
On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.
In the vast majority of cases, InstCombine will reach a fix-point
within a single iteration: However, a second iteration is performed
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:
"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,
The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fix-point within one iteration (the second iteration verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.
In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fix point.
This patch removes the fixpoint iteration from InstCombine, and always
only perform a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.
This explicitly does accept that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.
In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixed point (for a well understand reason
we cannot / do not want to avoid).
Differential Revision: https://reviews.llvm.org/D154579