ConstraintElimination currently only supports decomposing gep nusw with
non-negative indices (with "non-negative" possibly being enforced via
pre-condition).
Add support for gep nuw, which directly gives us the necessary
guarantees for the decomposition.
ValueMap automatically updates entries with the new value if they have
been RAUW. This can lead to instructions that are expected to not have
shape info to be added to the map (e.g. shufflevector as in the added
test case).
This leads to incorrect results. Originally it was used for transpose
optimizations, but they now all use updateShapeAndReplaceAllUsesWith,
which takes care of updating the shape info as needed.
This fixes a crash in the newly added test cases.
PR: https://github.com/llvm/llvm-project/pull/118282
This refactoring hoists the profitability check earlier in the pipeline,
so that for loops that are not profitable to transform there is no
iteration over the basic blocks or LoopStructure computation.
Motivated by PR #104659 that tweaks how the profitability of individual
branches is evaluated.
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
An example where this can occur are event handling loops.
Note that change does NOT change the default behavior.
The AddressTaken set used for CFI with regular LTO was being computed on
the ExportSummary regardless of whether any CFI metadata existed. In the
case of ThinLTO, the ExportSummary is the global summary index for the
target, and the lack of guard in this code meant this was being computed
on the ThinLTO index even when there was an empty regular LTO module,
since the backend is called on the combined module to generate the
expected output file (normally this is trivial as there is no IR).
Move the computation of the AddressTaken set into the condition checking
for CFI to avoid this overhead. This change resulted in a 20% speedup in
the thin link of a large target. It looks like the outer loop has
existed here for several years, but likely became a larger overhead
after the inner loop was added very recently in PR113987.
I will send a separate patch to refactor the ThinLTO backend handling to
avoid invoking the opt pipeline if the module is empty, in case there
are other summary-based analyses in some of the passes now or in the
future. This change is still desireable as by default regular LTO
modules contain summaries, or we can have split thin and regular LTO
modules, and if they don't involve CFI these would still unnecessarily
compute the AddressTaken set.
Check for nusw instead of inbounds when decomposing GEPs.
In this particular case, we can also look through multiple nusw
flags, because we will ultimately be working in the unsigned
constraint system.
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.
Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.
PR: https://github.com/llvm/llvm-project/pull/114305
Introduce llvm::CmpPredicate, an abstraction over a floating-point
predicate, and a pack of an integer predicate with samesign information,
in order to ease extending large portions of the codebase that take a
CmpInst::Predicate to respect the samesign flag.
We have chosen to demonstrate the utility of this new abstraction by
migrating parts of ValueTracking, InstructionSimplify, and InstCombine
from CmpInst::Predicate to llvm::CmpPredicate. There should be no
functional changes, as we don't perform any extra optimizations with
samesign in this patch, or use CmpPredicate::getMatching.
The design approach taken by this patch allows for unaudited callers of
APIs that take a llvm::CmpPredicate to silently drop the samesign
information; it does not pose a correctness issue, and allows us to
migrate the codebase piece-wise.
Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned
icmp of gep nusw nuw folds to unsigned samesign icmp of offsets.
Proofs: https://alive2.llvm.org/ce/z/VEwQY8
When `-import-declaration` option is enabled, declaration import is
supported for functions. https://github.com/llvm/llvm-project/pull/88024
has the context for this option.
This patch supports declaration import for global variables in
distributed ThinLTO. The motivating use case is to propagate `dso_local`
attribute of global variables across modules, to optimize global
variable access when a binary is built with
`-fno-direct-access-external-data`.
* With `-fdirect-access-external-data`, non thread-local global
variables will [have `dso_local`
attributes](fe3c23b439/clang/lib/CodeGen/CodeGenModule.cpp (L1730-L1746)).
This optimizes the global variable access as shown by
https://gcc.godbolt.org/z/vMzWcKdh3
Add an extra know to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.
PR: https://github.com/llvm/llvm-project/pull/118316
We had a separate fold that handled just the trivial case where we're
replacing exactly the argument of the select. Handle this in select
value equivalence by relaxing the infinite loop protection to allow a
replacement of a non-constant with a constant.
This also fixes https://github.com/llvm/llvm-project/issues/113301, as
the separate fold did not handle undef values correctly.
As vector element loads are free on SystemZ, this patch improves the cost
computation in getGatherCost() to reflect this.
getScalarizationOverhead() gets an optional parameter which can hold the actual
Values so that they in turn can be passed (by BasicTTIImpl) to
getVectorInstrCost().
SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and
typically return a 0 cost for it, with some exceptions.
* Move CoroCloner to its own header. For now, the header is located in llvm/lib/Transforms/Coroutines
* Change private to protected to allow inheritance
* Create CoroSwitchCloner and move some of the switch specific code into this cloner. More code will follow in later commits.
This manual constant folding was added in 2017 in
https://reviews.llvm.org/D29956, but since then it looks like IRBuilder
has learnt to fold it away itself.
I'm not sure at what point this happened, I just verified this by
stepping through the call to CreateVectorSplat in the debugger.
lowerDotProduct called above may already lower a matrix multiply and
mark it as procssed by adding it to FusedInsts. Don't try to process it
again in LowerMatrixMultiplyFused by checking if FusedInsts.
Without this change, we trigger an assertion when trying to erase the
same original matrix multiply twice.
This modification will enable the usage of `MergeFunctions` as a
standalone library. Currently, `MergeFunctions` can only be applied to
an entire module. By adopting this change, developers will gain the
flexibility to reuse the `MergeFunctions` code within their own
projects, choosing which functions to merge; hence, promoting code
reusability. Notice that this modification will not break backward
compatibility, because `MergeFunctions` will still work as a pass after
the modification.
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/111758
Update the test cases contains `any-of` printings from the
precomputeCost().
Origin message:
The any-of reduction contains phi and select instructions.
The select instruction might be optimized and removed in the vplan which
may cause VF difference between legacy and VPlan-based model. But if the
select instruction be removed, planContainsAdditionalSimplifications()
will catch it and disable the assertion.
Therefore, we can just remove the ayn-of reduction calculation in the
precomputeCost().
Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC
(#117109)"
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.
Another advantage to this, is that if users want to piggy-back on the
same RPC interface they need only declare theirs as extern with the same
symbol name, or make it weak to optionally use it if LIBC isn't
avaialb.e
`DenseMap::lookup` returns by value (because it default-creates the
returned value if the key isn't present in the map), which means that we
do a lot of copying here. Since we assert that something is present in
the returned value two lines below this call, it's safe to use `.at`
here instead.
Copying and then destroying dense maps here is responsible for 60% of
the time spent in LTO indexing in a large internal build.
AlwaysInliner doesn't use BFI itself, it only updates it. If BFI is not
already computed, it will spend time to first compute it, and then
update it. This is not necessary: If BFI is not available in the first
place, there is no need to update it.
This is mainly relevant in debug builds for IR that has a lot of
alwaysinline functions.
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction, if it does not move. In that case, the
program would be UB, if the aliasing property encoded in the metadata
does not hold. This makes use of the clarification re aliasing metadata
implying UB if the property does not hold: #116220
Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.
PR: https://github.com/llvm/llvm-project/pull/117716