This reverts commit 43842b887e0a7b918bb2d6c9f672025b2c621f8a while I
investigate a buildbot failure.
It also reverts the follow-up commit
2883de05145fc5b4afb99b91f69ebb835af36af5.
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.
This is a more powerful version of a combine that already happens in
instrcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.
Differential Revision: https://reviews.llvm.org/D123494
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.
Differential Revision: https://reviews.llvm.org/D100486
The structure ArgPart and alias OffsetAndArgPart have been moved
into the anonymous namespace. NFC.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D124617
The condition should be 'ArgParts.size() > MaxElements', so that if we
have exactly 3 elements in the 'ArgParts' vector, the promotion should
be allowed because the 'MaxElement' threshold is not exceeded yet.
The default value for 'MaxElement' has been decreased to 2 in order
to avoid an actual change in argument promoting behavior. However,
this changes byval argument transformation behavior by allowing
adding not more than 2 arguments to the function instead of 3 allowed
before.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D124178
Remove one of the last remaining uses of ::needsVectorIV, preparing for
its removal. Now that usesScalars is available and based on the
information explicit in VPlan, there is no need to use the pre-computed
needsVectorIV.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123720
Some loop counters ('i', 'e') and variables ('type') were named not
in accordance with the code style and clang-tidy issues warnings
about the using of such variables. This patch renames the variables
and fixes some typos in the comments within the source file.
Differential Revision: https://reviews.llvm.org/D123662
Reland 3f2b76ec90b5f108272a3072a1345ba55d8ec75b with the test corrected
to require x86-registered-target.
Differential Revision: https://reviews.llvm.org/D120169
Reland commit 74273d575f9938d751a1c67862cffe553fe2de8b following a fix
for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr.
Differential Revision: https://reviews.llvm.org/D120169
When using opaque pointers, convert GEPs into offset representation
of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset.
This allows us to recognize equivalent address calculations even if
the GEPs don't use the same source element type.
This fixes an opaque pointer codegen regression seen in rustc.
Differential Revision: https://reviews.llvm.org/D124527
They can already be available, and even if not, DT/LI can be available.
We should not recompute them. Old PM is unchanged because it would
require changing dependencies, and we don't care enough about it.
Differential Revision: https://reviews.llvm.org/D124439
Reviewed By: nikic, aeubanks
isNoopAddrSpaceCast is expecting SrcAS is different from DestAS.
If the two AS are the same, consider ptrtoint/inttoptr as noop cast.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D123573
Previously all entries in global_ctors had to have the void()* type and
we'd skip evaluating bitcasted functions. With opaque pointers we may
see the function directly.
Fixes#55147.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D124553
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove ThreadSanitizerLegacyPass.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124209
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.
Differential Revision: https://reviews.llvm.org/D100486
When a block containing llvm.coro.id is cloned during CHR, it inserts an invalid
PHI node with token type to the beginning of the block containing llvm.coro.begin.
To avoid such case, we exclude regions with llvm.coro.id.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D124418
IRCE is a function pass that operates on loops. If there are no loops in
the function (as seen through LI), we should avoid computing the
remaining expensive analyses (such as BPI). Reordered the analyses
requests and early return if there are no loops. This is an NFC with
compile time improvement.
The same will be done in a follow-up patch for the loop vectorizer.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D124478
This relands commit 8f550368b169b0a3d4ebc3e65efec49ec1e0798a.
The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.
Differential Revision: https://reviews.llvm.org/D120169
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.
Reviewers: @StephenTozer, @djtodoro
Differential Revision: https://reviews.llvm.org/D120169
First of two patches that extends SCEV-based salvaging to enable
salvaging of dbg.value instrinsics that have multiple locations ops
before the Loop Strength Reduction pass.
The existing single-op SCEV-based salvaging can generate variadic
dbg.value intrinsics in order to salvage a dbg.value that has a single
location op. If a dbg.value has multiple location ops before LSR, and
LSR optimises away one or more of the location operands, then currently
no salvaging will be attempted.
Salvaging can now be added, but first this patch cleans up consistency
in both the code and comments, and applies some refactoring to make
application of the new salvaging implementation more straightforward.
- Use SCEVDbgValueBuilder for both types of recovery expressions:
IV-offset based and iteration count based.
- Combine the functions that write the final DIExpression.
- Move some static functions into member functions.
Reviewers: @Orlando
Differential Revision: https://reviews.llvm.org/D120168
Currently, two GEPs will only be combined if the result element
type of one is the same as the source element type of the other.
However, this means we may miss folding opportunities where the
second GEP could be rewritten using a different element type. This
is especially relevant for opaque pointers, where constant GEPs
often use i8 element type.
Address this by converting GEP indices to offsets, adding them,
and then converting them back to indices. The first (inner) GEP
is allowed to have variable indices as well, in which case only
the constant suffix is converted into an offset.
This should address the regression reported in
https://reviews.llvm.org/D123300#3467615.
Differential Revision: https://reviews.llvm.org/D124459
I found this bug when performing a two-stage build of clang with
Function Specialization enabled and tuned aggressively. The crash
appears only on release builds.
Fixes https://github.com/llvm/llvm-project/issues/55000.
Before accessing the contents of the ArgInfo iterator inside
SCCPInstVisitor::markArgInFuncSpecialization, we should be
checking that the iterator is valid.
Differential Revision: https://reviews.llvm.org/D124114
canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge,
respectively. However, it does not update the variable holding the
comparison predicate type after doing this. Later code fails to handle
the non-canonical predicate type (specifically, the swap of
ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized
from ugt to uge). This leads to the miscompile reported in PR53252. Fix
this by updating the comparison predicate after canonicalizing.
Fixes#53252
Differential Revision: https://reviews.llvm.org/D119690
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator.
3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch.
With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
We can always replace the undef elements in a vector constant
with regular constants to get rid of the freeze:
https://alive2.llvm.org/ce/z/nfRb4F
The select diffs show that we might do better by adjusting the
logic for a frozen select condition. We may also want to refine
the vector constant replacement to consider forming a splat.
Differential Revision: https://reviews.llvm.org/D123962
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This continues the push away from hard-coded knowledge about functions
towards attributes. We'll use this to annotate free(), realloc() and
cousins and obviate the hard-coded list of free functions.
Differential Revision: https://reviews.llvm.org/D123083
This reorganizes the code as a preparation for D123865:
* Use more descriptive names for variables
* Simplify a condition by use an already calculated value
for `MaxPeelCount`
* Remove a duplicate log entry
* Report basic values for loop costs
Differential Revision: https://reviews.llvm.org/D124388