We have a bunch of folds that basically perform X pred Y to ~Y pred ~X
for various special cases where this saves an instruction.
Generalize these folds to use isFreeToInvert(). We have to make sure
that we consume an instruction in either of the inversions, otherwise
we're just going to swap the icmp back and forth.
Fixes https://github.com/llvm/llvm-project/issues/74302.
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
The disjoint flag was recently added to IR in #72583
We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. It allows better to estimate the cost of the
node and emit better code.
Differential Revision: https://reviews.llvm.org/D157413
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.
Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
This reverts commit eea9258648ce73507f6f85c395de978af659d498.
That commit triggered crashes in the following testcase:
$ cat reduced.c
typedef struct {
int a[8]
} b;
typedef struct {
b *c;
short d
} e;
void f() {
int g;
char *h;
e *i = f;
short j = i->d;
int a = i->c->a[0];
for (;;)
for (; g < a; g++) {
*h = j * i->d >> 8;
h++;
}
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.
Fixed clang test broken in initial land.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153815
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153815
This is the consolidation of D151644 and D151943 moved from
InstCombine to FunctionAttrs. This is based on discussion in the above
patches as well as D152081 (Attributor). This patch was written in a
way so it can have an immediate impact in currently active passes
(FunctionAttrs), but should be easy to port elsewhere (Attributor or
Inliner) if that makes more sense later on.
Some function attributes imply the attribute for all/some instructions
in the function. These attributes can be safely propagated to
callsites within the function that are missing the attribute. This can
be useful when 1) analyzing individual instructions in a function
and 2) if the original caller is later inlined, as if the attributes are
not propagated, they will be lost.
This patch implements propagation in a new class/file
`InferCallsiteAttrs` which can hypothetically be included elsewhere.
At the moment this patch infers the following:
Function Attributes:
- mustprogress
- nofree
- willreturn
- All memory attributes (readnone, readonly, writeonly, argmem,
etc...)
- The memory attributes are only propagated IFF the set of
pointers available to the callsite is the same as the set
available outside the caller (i.e no local memory arguments
from alloca or local malloc like functions).
Argument Attributes:
- noundef
- nonnull
- nofree
- readnone
- readonly
- writeonly
- nocapture
- nocapture is only propagated IFF the set of pointers
available to the callsite is the same as the set available
outside the caller and its guranteed that between the
callsite and function return, the state of any capture
pointers will not change (so the nocaptured gurantee of the
caller has been met by the instruction preceding the
callsite and will not changed).
Argument are only propagated to callsite arguments that are also function
arguments, but not derived values.
Return Attributes:
- noundef
- nonnull
Return attributes are only propagated if the callsite's return value
is used as the caller's return and execution is guranteed to pass from
callsite to return.
The compile time hit of this for -O3 and -O3+thinLTO is ~[.02, .37]%
regression. Proper LTO, however, has more significant regressions (up
to 3.92%):
https://llvm-compile-time-tracker.com/compare.php?from=94407e1bba9807193afde61c56b6125c0fc0b1d1&to=79feb6e78b818e33ec69abdc58c5f713d691554f&stat=instructions:u
Differential Revision: https://reviews.llvm.org/D152226
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.
This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.
Differential Revision: https://reviews.llvm.org/D148221
Metric: size..text
size..text results results0 diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0%
For all tests some extra code was optimized, GCC-C-execute has some more
inlining after
Differential Revision: https://reviews.llvm.org/D132261
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.
Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.
The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961
Compile-time improvements:
NewPM-O3: -1.04%
NewPM-ReleaseThinLTO: -0.59%
NewPM-ReleaseLTO-g: -0.97%
https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:uFixes#40306.
Reviewed By: lebedev.ri, nikic
Differential Revision: https://reviews.llvm.org/D115261
This puts lower insert indexes before higher. This is independent
of endian, so it requires an adjustment to a fold added with
4446f71ce392, but it makes that fold more robust.
That's also where this patch was suggested - D139668.
This matches what we already do in DAGCombiner, but there is one
more constraint because there's an existing canonicalization for
insert-of-scalar-constant. I'm not sure if that is still needed,
so it may be adjusted/removed as a follow-up.
This replaces patches that tried to convert related patterns to shuffles
(D138872, D138873, D138874 - reverted/abandoned) but caused codegen
problems and were questionable as a canonicalization because an
insertelement is a simpler op than a shuffle.
This detects a larger pattern -- insert-of-insert -- and replaces with
another insert, so this hopefully does not cause any problems.
As noted by TODO items in the code and tests, this could go a lot further.
But this is enough to reduce the motivating test from issue #17113.
Example proofs:
https://alive2.llvm.org/ce/z/NnUv3a
I drafted a version of this for AggressiveInstCombine, but it seems that
would uncover yet another phase ordering gap. If we do generalize this to
handle the full range of potential patterns, that may be worth looking at
again.
Differential Revision: https://reviews.llvm.org/D139668
This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461.
As discussed in the planned follow-on to this patch (D138874),
this and the subsequent patches in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
The first attempt was reverted because a clang test changed
unexpectedly - the file is already marked with a FIXME, so
I just updated it this time to pass.
Original commit message:
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue
The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.
In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask
Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC
The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.
Differential Revision: https://reviews.llvm.org/D138872
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue
The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.
In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask
Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC
The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.
Differential Revision: https://reviews.llvm.org/D138872
As part of legacy PM optimization pipeline removal.
This shouldn't be used in codegen pipelines so it should be ok to remove.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D137116
Two lit tests were found running something like this:
opt -O<n> -pass-locked-to-legacy-PM ...
The expand-atomicrmw-xchg-fp.ll seem to have used -O1 just to ensure
that the -atomic-expand pass were thinking that it wasn't running at
O0 level. Same thing can be ensured by using the -codegen-opt-level=1
option, making it possible to avoid using O1 in that test case.
In the vector-reductions-expanded.ll test case it was possible to
split the RUN line into using two opt invocations. First running
"opt -O2" using the new PM, and then running "opt -expand-reductions"
using the legacy PM.
I think that given this patch we get closer to removing code related
to 'AddOptimizationPasses' in opt.cpp.
Differential Revision: https://reviews.llvm.org/D137626
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.
Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.
Will open RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"
Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.