This enabled opaque pointers by default in LLVM. The effect of this
is twofold:
* If IR that contains *neither* explicit ptr nor %T* types is passed
to tools, we will now use opaque pointer mode, unless
-opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
It is possible to opt-out by calling setOpaquePointers(false) on
LLVMContext.
A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.
Differential Revision: https://reviews.llvm.org/D126689
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.
Differential Revision: https://reviews.llvm.org/D125299
This option was added in D89854. It prevents GVN from performing
load PRE in a loop, if doing so would require critical edge
splitting on the backedge. From the review:
> I know that GVN Load PRE negatively impacts peeling,
> loop predication, so the passes expecting that latch has
> a conditional branch.
In the PhaseOrdering test in this patch, splitting the backedge
negatively affects vectorization: After critical edge splitting,
the loop gets rotated, effectively peeling off the first loop
iteration. The effect is that the first element is handled
separately, then the bulk of the elements use a vectorized
reduction (but using unaligned, off-by-one memory accesses) and
then a tail of 15 elements is handled separately again.
It's probably worth noting that the loop load PRE from D99926 is
not affected by this change (as it does not need backedge
splitting). This is about normal load PRE that happens to occur
inside a loop.
Differential Revision: https://reviews.llvm.org/D126382
Use IRBuilder so that the newly created freeze instructions
automatically gets inserted back into the IC worklist.
The changed worklist processing order leads to some cosmetic
differences in tests.
Fixes https://github.com/llvm/llvm-project/issues/55619.
When computing the BECount for multi-exit loops, we need to combine
individual exit counts using umin_seq rather than umin. This is
because an earlier exit may exit on the first iteration, in which
case later exit expressions will not be evaluated and could be
poisonous. We cannot propagate potential poison values from later
exits.
In particular, this avoids the introduction of "branch on poison"
UB when optimizing multi-exit loops.
Differential Revision: https://reviews.llvm.org/D124910
The transform was wrong in 3 ways:
1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).
This fixes#54692 (modeled by the PhaseOrdering tests).
This is a minimal step to fix the bug, but we should likely invert
this and the sibling transform for the "is negative" pattern too.
The backend should be able to invert this back to a shift if that
leads to better codegen.
This is a reduced try of 3794cc0e9964 - that was reverted because
it could cause infinite loops by conflicting with the related
transforms in this block that create shifts.
This reverts commit 3794cc0e996481e10307b67c8436aa44e0d65d22.
This change is suspected of causing bots to hang at stage 2
compiles, so reverting to confirm and investigate.
The existing transform was wrong in 3 ways:
1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).
This fixes#54692 (modeled by the PhaseOrdering tests).
This is a minimal step to fix the bug, but we should likely invert
the sibling transform for the "is negative" pattern too.
The backend should be able to invert this back to a shift if that
leads to better codegen.
The tests (see C++ source in #54692) have multiple potential
optimizations/canonicalizations, but we should be consistent
since they are logically identical.
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
3. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.
This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.
The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.
Differential Revision: https://reviews.llvm.org/D114171
Given a shuffle with 4 elements size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.
Differential Revision: https://reviews.llvm.org/D123409
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.
This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.
The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.
Differential Revision: https://reviews.llvm.org/D114171
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes#53131.
Differential Revision: https://reviews.llvm.org/D121167
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A
There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes#54311.
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.
Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.
Differential Revision: https://reviews.llvm.org/D98152
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information.
This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D119965
Unfortunately, it seems we really do need to take the long route;
start from the "merge" block, find (all the) "dispatch" blocks,
and deal with each "dispatch" block separately, instead of simply
starting from each "dispatch" block like it would logically make sense,
otherwise we run into a number of other missing folds around
`switch` formation, missing sinking/hoisting and phase ordering.
This reverts commit 85628ce75b3084dc0f185a320152baf85b59aba7.
This reverts commit c5fff9095342a792bf4b9a077fe3c3a83c4e566c.
This reverts commit 34a98e1046e3aa55e5f26ab20a15e96b4034d25a.
This reverts commit 1e353f092288309d74d380367aa50bbd383780ed.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.
Differential Revision: https://reviews.llvm.org/D115955
The current `FoldTwoEntryPHINode()` is not quite designed correctly.
It starts from the merge point, and then tries to detect
the 'divergence' point.
Because of that, it is limited to the simple two-predecessor case,
where the PHI completely goes away. but that is rather pessimistic,
and it doesn't make much sense from the costmodel side of things.
For example if there is some other unrelated predecessor of
the merge point, we could split the merge point so that
the then/else blocks first branch to an empty block
and then to the merge point, and then we'd be able to speculate
the then/else code.
But if we'd instead simply start at the divergence point,
and look for the merge point, then we'll just natively support this case.
There's also the fact that `SpeculativelyExecuteBB()` already does
just that, but only if there is a single block to speculate,
and with a much more restrictive cost model.
But that also means we have code duplication.
Now, sadly, while this is as much NFCI as possible,
there is just no way to cleanly migrate to
the proper implementation. The results *are* going to be different
somewhat because of various phase ordering effects and SimplifyCFG
block iteration strategy.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.
Differential Revision: https://reviews.llvm.org/D115955
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.
Differential Revision: https://reviews.llvm.org/D115955
In D110057 we moved LoopFlatten to a LoopPassManager. This caused a performance
regression for our 64-bit targets (the 32-bit were unaffected), the pass is no
longer triggering for a motivating example. The reason is that the IR is just
very different than expected; we try to match loop statements and particular
uses of induction variables. The easiest is to just move LoopFlatten to a place
in the pipeline where the IR is as expected, which is just before
IndVarSimplify. This means we move it from LPM2 to LPM1, so that it actually
runs just a bit earlier from where it was running before. IndVarSimplify is
responsible for significant rewrites that are difficult to "look through" in
LoopFlatten.
Differential Revision: https://reviews.llvm.org/D116612
Instead of summing leading zeros on the input operands, multiply the
max possible values of those inputs and count the leading zeros of
the result. This can give us an extra zero bit (typically in cases
where one of the operands is a known constant).
This allows folding away the remaining 'add' ops in the motivating
bug (modeled in the PhaseOrdering IR test):
https://github.com/llvm/llvm-project/issues/48399Fixes#48399
Differential Revision: https://reviews.llvm.org/D115969
~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y
https://alive2.llvm.org/ce/z/JKlQ9x
This is similar to D111410 / 727e642e970d028049d ,
but it includes a 'not' of the signbit and so it
saves an instruction in the basic pattern.
DAGCombiner or target-specific folds can expand
this back into bit-hacks.
The diffs in the logical-select tests are not true
regressions - running early-cse and another round
of instcombine is expected in a normal opt pipeline,
and that reduces back to a minimal form as shown
in the duplicated PhaseOrdering test.
I have no understanding of the SystemZ diffs, so
I made the minimal edits suggested by FileCheck to
make that test pass again. That whole test file is
wrong though. It is running the entire optimizer (-O2)
to check IR, and then topping that by even running
codegen and checking asm. It needs to be split up.
Fixes#52631
The 1st test corresponds to a minimally optimized (mem2reg)
version of the example in:
issue #52631
The 2nd test copies an existing instcombine test with the
same pattern. If we canonicalize differently, we can miss
reducing to minimal form in a single invocation of
-instcombine, but that should not escape the normal opt
pipeline.
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
MergeFunctions (as well as HotColdSplitting an IROutliner) are
incorrectly scheduled under the new pass manager. The code makes
it look like they run towards the end of the module optimization
pipeline (as they should), while in reality the run at the start.
This is because the OptimizePM populated around them is only
scheduled later.
I'm fixing this by moving these three passes until after OptimizePM
to avoid splitting the function pass pipeline. It doesn't seem
important to me that some of the function passes run after these
late module passes.
Differential Revision: https://reviews.llvm.org/D115098