DFAJumpThreading
JumpThreading
LibCallsShrink
LoopVectorize
SLPVectorizer
DeadStoreElimination
AggressiveDCE
CorrelatedValuePropagation
IndVarSimplify
These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.
This is wasteful, but only affects the legacy pass manager. Otherwise
a1b78fb929fccf96acaa0212cf68fee82298e747 would crash JT when running
with that PM. There are still a few users of the legacy PM out there
that are reluctant to migrate, numba in this case.
No test as we don't test legacy PM anymore.
Currently, JT creates and updates local instances of BPI/BFI. As a result, the global ones have to be invalidated if JT makes any changes.
In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (except in cases where it updates profile metadata), this is just a waste of time.
This patch is a rework of D124439. D124439 takes one step and replaces the local copies with global ones retrieved through AnalysisPassManager. Here we go one step further and don't create BPI/BFI if the only reason to create them is to keep them up to date. The overall logic is as follows. If there is a cached BPI/BFI, update it along with the transformations. If there is no existing BPI/BFI, create it only if it is required to update profile metadata.
Please note that if BPI/BFI exists on exit from JT (either cached or created), it is always up to date and there is no reason to invalidate it.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D136827
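
The caching policy described for D136827 can be sketched in standalone C++. This is only an illustration under stated assumptions: `Analysis`, `LazyAnalysis`, `getIfAvailable`, `getOrCreate` and `noteTransform` are hypothetical names that stand in for BPI/BFI and the pass's bookkeeping, not the actual LLVM interfaces.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a BPI/BFI-like analysis result.
struct Analysis {
  int Updates = 0; // number of CFG edits mirrored into this result
  void update() { ++Updates; }
};

// Hypothetical holder that follows the policy from the message above:
// reuse a cached result if there is one, create one only on demand.
class LazyAnalysis {
  Analysis *Cached;                // result cached elsewhere, may be null
  std::unique_ptr<Analysis> Owned; // created only when really needed

public:
  explicit LazyAnalysis(Analysis *CachedResult) : Cached(CachedResult) {}

  // Cheap path: returns the analysis only if it already exists.
  Analysis *getIfAvailable() { return Cached ? Cached : Owned.get(); }

  // Expensive path: reserved for code that must update profile metadata.
  Analysis &getOrCreate() {
    if (Analysis *A = getIfAvailable())
      return *A;
    assert(!Cached && "only reached when nothing is cached");
    Owned = std::make_unique<Analysis>();
    return *Owned;
  }

  // Called after each CFG transformation.
  void noteTransform(bool NeedsProfileUpdate) {
    Analysis *A = NeedsProfileUpdate ? &getOrCreate() : getIfAvailable();
    if (A)
      A->update(); // an existing result is always kept up to date
  }

  bool exists() { return getIfAvailable() != nullptr; }
};
```

With no cached result, a transformation that does not touch profile metadata leaves the holder empty, so nothing is computed just to be kept in sync.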
This patch causes debug value intrinsics outside of cloned blocks in the
Jump Threading pass to correctly point towards any derived values. If it cannot,
it kills them.
Reviewed By: probinson, StephenTozer
Differential Revision: https://reviews.llvm.org/D140404
Currently, JT creates and updates local instances of BPI/BFI. As a result, the global ones have to be invalidated if JT makes any changes.
In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (except in cases where it updates profile metadata), this is just a waste of time.
This patch is a rework of D124439. D124439 takes one step and replaces the local copies with global ones retrieved through AnalysisPassManager. Here we go one step further and don't create BPI/BFI if the only reason to create them is to keep them up to date. The overall logic is as follows. If there is a cached BPI/BFI, update it along with the transformations. If there is no existing BPI/BFI, create it only if it is required to update profile metadata.
Please note that if BPI/BFI exists on exit from JT (either cached or created), it is always up to date and there is no reason to invalidate it.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D136827
Jump threading can replace select and unconditional branch with
conditional branch, but when doing so loses profile information.
This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.
The first version was reverted due to an assertion failure caused by an
i32 overflow, which is fixed in this version.
Patch by Roman Paukner!
Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
This reverts commit 957952dbf2f34ed552e8e1f8c35eed17eee2ea38.
Addition in the newly added code can overflow. As a result, the
constructor of `BranchProbability()` can trigger an assertion. See
the discussion on https://reviews.llvm.org/D138132 for more details.
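
The message above only says that an addition overflowed and tripped an assertion in the `BranchProbability()` constructor. The following standalone C++ sketch shows one common way to avoid that kind of wrap-around when two 32-bit weights are summed. `trueProbability` is a hypothetical helper and is not claimed to be the fix that actually landed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

// Hypothetical helper: turn two 32-bit branch weights into a
// (numerator, denominator) pair whose parts both fit in 32 bits.
// The sum is formed in 64 bits so that it cannot wrap around.
std::pair<uint32_t, uint32_t> trueProbability(uint32_t TrueW,
                                              uint32_t FalseW) {
  uint64_t Sum = static_cast<uint64_t>(TrueW) + FalseW;
  if (Sum == 0)
    return {1, 2}; // assumption: no information means 50/50
  uint64_t Num = TrueW;
  // Scale both values down until the denominator fits in 32 bits.
  while (Sum > std::numeric_limits<uint32_t>::max()) {
    Sum >>= 1;
    Num >>= 1;
  }
  assert(Num <= Sum && "numerator must not exceed denominator");
  return {static_cast<uint32_t>(Num), static_cast<uint32_t>(Sum)};
}
```

Adding the two weights directly as `uint32_t` would wrap around for large inputs and could yield a denominator smaller than the numerator, which is the sort of state a probability constructor is likely to reject.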
This is a patch to fix duplicated dbg.values in the JumpThreading pass not
pointing towards their local value, and instead towards the variable in the
original block.
JumpThreadingPass::cloneInstructions is changed so that it handles metadata
operands as well as normal cloned values.
Reviewed By: jmorse, StephenTozer
Differential Revision: https://reviews.llvm.org/D140006
Jump threading can replace select and unconditional branch with
conditional branch, but when doing so loses profile information.
This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.
Patch by Roman Paukner!
Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
Currently, JT creates and updates local instances of BPI/BFI. As a result, the global ones have to be invalidated if JT makes any changes.
In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (except in cases where it updates profile metadata), this is just a waste of time.
This patch is a rework of D124439. D124439 takes one step and replaces the local copies with global ones retrieved through AnalysisPassManager. Here we go one step further and don't create BPI/BFI if the only reason to create them is to keep them up to date. The overall logic is as follows. If there is a cached BPI/BFI, update it along with the transformations. If there is no existing BPI/BFI, create it only if it is required to update profile metadata.
Please note that if BPI/BFI exists on exit from JT (either cached or created), it is always up to date and there is no reason to invalidate it.
Differential Revision: https://reviews.llvm.org/D136827
Do not duplicate a BB if it has a lot of PHI nodes.
If a threadable chain is too long then the number of duplicated PHI nodes
can add up, leading to a substantial increase in compile time when rewriting
the SSA.
Fixes https://github.com/llvm/llvm-project/issues/58203
Differential Revision: https://reviews.llvm.org/D136716
The threshold of 76 in this patch is reasonably high and reduces the compile
time of cldwat2m_macro.f90 in SPEC2017/cam4 from 80+min to <2min.
Change-Id: I153c89a8e0d89b206a5193dc1b908c67e320717e
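
A minimal sketch of the PHI-count guard described above, written as standalone C++ rather than against the LLVM API. `Opcode`, `countLeadingPhis` and `shouldSkipDuplication` are hypothetical names. The value 76 is taken from the message, while the strict `>` comparison is an assumption.

```cpp
#include <cassert>
#include <vector>

enum class Opcode { Phi, Other };

// 76 is the threshold quoted in the message above.
constexpr unsigned PhiThreshold = 76;

// PHI nodes sit at the start of a block, so stop at the first non-PHI.
unsigned countLeadingPhis(const std::vector<Opcode> &Block) {
  unsigned N = 0;
  for (Opcode Op : Block) {
    if (Op != Opcode::Phi)
      break;
    ++N;
  }
  return N;
}

// Assumption: the block is skipped when it has more PHIs than the threshold.
bool shouldSkipDuplication(const std::vector<Opcode> &Block,
                           unsigned Threshold = PhiThreshold) {
  return countLeadingPhis(Block) > Threshold;
}
```

The check is cheap because it only scans the PHI prefix of the block, so it can run before any cloning work is done.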
Use the getPredicateOnEdge method if the value is a non-local
compare-with-a-constant instruction, as it can give more precise
results than getConstantOnEdge.
Differential Revision: https://reviews.llvm.org/D131956
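
To illustrate why a predicate query can be more precise than a constant query, here is a toy range model in standalone C++. `Range`, `constantOnEdge` and `lessThanOnEdge` are hypothetical and only loosely mirror the idea behind getConstantOnEdge and getPredicateOnEdge. They are not the LazyValueInfo API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical half-open range [Lo, Hi) known for a value on one CFG edge.
struct Range {
  int64_t Lo, Hi;
};

// Constant-style query: succeeds only when the range holds a single value.
std::optional<int64_t> constantOnEdge(Range R) {
  if (R.Hi - R.Lo == 1)
    return R.Lo;
  return std::nullopt;
}

// Predicate-style query for `V < C` (signed): can be decided even when
// V itself is not a constant on the edge.
std::optional<bool> lessThanOnEdge(Range R, int64_t C) {
  assert(R.Lo < R.Hi && "empty range");
  if (R.Hi <= C)
    return true; // every value in the range is below C
  if (R.Lo >= C)
    return false; // no value in the range is below C
  return std::nullopt; // undecided
}
```

For a value known to lie in [0, 5), the constant query fails, but the comparison against 10 is still decided, which is enough to thread the edge.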
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency; it's not implemented by any backend, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
Since we can't change the destination of an indirectbr, we should skip
threading when the PredPredBB terminator is an indirectbr.
Differential Revision: https://reviews.llvm.org/D129193
SplitBlockPredecessors currently asserts if one of the predecessor
terminators is a callbr. This limitation was originally necessary,
because just like with indirectbr, it was not possible to replace
successors of a callbr. However, this is no longer the case since
D67252. As the requirement nowadays is that callbr must reference
all blockaddrs directly in the call arguments, and these get
automatically updated when setSuccessor() is called, we no longer
need this limitation.
The only thing we need to do here is use replaceSuccessorWith()
instead of replaceUsesOfWith(), because only the former does the
necessary blockaddr updating magic.
I believe there are other similar limitations that can be removed,
e.g. related to critical edge splitting.
Differential Revision: https://reviews.llvm.org/D129205
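
The difference between the two replacement strategies can be shown with a toy model in standalone C++. `CallBrModel` and its members are hypothetical. The model only encodes what the message states: a successor is also referenced by block-address call arguments, and going through setSuccessor() keeps those in sync.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Block {
  int Id;
};

// Hypothetical model of a callbr-like terminator: successors are also
// referenced as block-address arguments of the call.
struct CallBrModel {
  std::vector<Block *> Successors;
  std::vector<Block *> BlockAddrArgs;

  // Models a plain operand rewrite: only the successor slots change,
  // so the block-address arguments go stale.
  void replaceSuccessorOperandOnly(Block *Old, Block *New) {
    for (Block *&S : Successors)
      if (S == Old)
        S = New;
  }

  // Models setSuccessor(): matching block-address arguments follow along.
  void setSuccessor(std::size_t Idx, Block *New) {
    assert(Idx < Successors.size() && "successor index out of range");
    Block *Old = Successors[Idx];
    Successors[Idx] = New;
    for (Block *&A : BlockAddrArgs)
      if (A == Old)
        A = New;
  }

  // Models replaceSuccessorWith(): built on top of setSuccessor().
  void replaceSuccessorWith(Block *Old, Block *New) {
    for (std::size_t I = 0; I < Successors.size(); ++I)
      if (Successors[I] == Old)
        setSuccessor(I, New);
  }
};
```

In this model, only the setSuccessor()-based path leaves the successor list and the call arguments consistent with each other.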
This code requires the result to be an UndefValue/ConstantInt
anyway (checked by getKnownConstant), so we are only interested
in the case where this folds.
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
This whole part with recomputation of BPI and BFI looks redundant,
and we tried to get rid of it in D124439. Unfortunately, it causes
some hard-to-reproduce failures due to invalid state of analysis.
Until this is investigated and fixed, let's try to reuse at least
part of the available analyses.
DT is available at this point, and there is no need to recompute it.
Please revert if you see it causing *any* behavior changes.
All callers pass true.
select-unfold-freeze.ll is now a subset of select.ll so delete it.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D126501
JumpThreading may convert selects into branch instructions,
in which case the condition needs to be frozen (as branch on
poison is immediate undefined behavior, unlike select on poison).
The necessary code for this is already in place, this just enables
the option.
Differential Revision: https://reviews.llvm.org/D125869
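
The reasoning about select versus branch on poison can be modelled in a few lines of standalone C++. This is a toy semantics, not LLVM code: `std::nullopt` stands for poison, an exception stands for undefined behavior, and `selectOp`, `branchOp`, `freezeOp` and `unfoldSelect` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Toy model: std::nullopt stands for a poison value.
using Bit = std::optional<bool>;
using Int = std::optional<int>;

// A select on a poison condition simply yields poison.
Int selectOp(Bit Cond, Int T, Int F) {
  if (!Cond)
    return std::nullopt;
  return *Cond ? T : F;
}

// freeze turns poison into some arbitrary but fixed value (false here).
Bit freezeOp(Bit V) { return V ? V : Bit(false); }

// Branching on poison is undefined behavior, modelled as an exception.
Int branchOp(Bit Cond, Int T, Int F) {
  if (!Cond)
    throw std::logic_error("branch on poison");
  return *Cond ? T : F;
}

// Unfolding a select into a branch is safe once the condition is frozen.
Int unfoldSelect(Bit Cond, Int T, Int F) {
  return branchOp(freezeOp(Cond), T, F);
}
```

With a poison condition, `selectOp` returns poison and `unfoldSelect` returns one of the two arms, which is a valid refinement. Calling `branchOp` directly on the unfrozen condition would hit the modelled undefined behavior.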
This code is valid for any icmp, so we can safely look through a
freeze when trying to find one.
A caveat here is that replaceFoldableUses() may not end up replacing
any uses in this case. It might make sense to use the freeze as the
context instruction (rather than the terminator) if there is a
freeze, to ensure that it always gets folded. This would require
some changes to how replaceFoldableUses() works though, as it
currently assumes that the value is valid at the end of the block.
It's sufficient to just fold the icmp to true/false here, and then
let constant terminator folding take care of the rest.
It should be noted that while replaceFoldableUses() may not replace
all uses of the icmp, at least the use in the terminator we're
working on is always replaceable, so terminator constant folding
should be reliably enabled as a subsequent step.
This patch makes JumpThreading's ProcessImpliedCondition deal with frozen
conditions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84941
JumpThreading intentionally does not force updating of the DT
during optimization, because this may be expensive when many CFG
updates and DT calculations are interleaved.
We shouldn't be fetching the DT just for the purpose of calling
isGuaranteedNotToBeUndefOrPoison(), especially as DT availability
doesn't even show benefit in tests.
This change has caused non-reproducibility of a self-build of Clang
when using NewPM and providing profile data.
This reverts commit 35f38583d2f2484794f579bed69566b40e732206.
They can already be available, and even if not, DT/LI can be available.
We should not recompute them. Old PM is unchanged because it would
require changing dependencies, and we don't care enough about it.
Differential Revision: https://reviews.llvm.org/D124439
Reviewed By: nikic, aeubanks
After e734e8286b4b521d829aaddb6d1cbbd264953625, it is possible to end up in
a situation where an `indirectbr` is fed by a cast, which is in turn fed by
an operation which only produces integers.
`indirectbr` expects a block address, however these operations can't produce
that.
There were several asserts in `computeValueKnownInPredecessorsImpl` which check
that we're not looking for a block address if we're walking through something
which can never produce one.
Since it's now possible to hit these asserts, this changes them into actual
checks which return false if `Preference` is not `WantInteger`.
This adds a testcase which verifies that we don't crash anymore in these
situations.
Differential Revision: https://reviews.llvm.org/D99814
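
The change from assertions to real checks can be sketched as follows in standalone C++. `Producer`, `canProduceBlockAddress` and `mayYieldWantedConstant` are hypothetical. Only the `WantInteger` / `Preference` naming comes from the message above, and the sketch does not reproduce `computeValueKnownInPredecessorsImpl`.

```cpp
#include <cassert>

enum ConstantPreference { WantInteger, WantBlockAddress };

// Hypothetical classification of the operation feeding the terminator.
enum class Producer { BlockAddressConstant, IntegerCompare, IntegerBinaryOp };

bool canProduceBlockAddress(Producer P) {
  return P == Producer::BlockAddressConstant;
}

// Before the change this situation was assumed impossible, roughly:
//   assert(Preference == WantInteger && "integer-only operation");
// After the change it is an ordinary check that gives up gracefully.
bool mayYieldWantedConstant(Producer P, ConstantPreference Preference) {
  if (!canProduceBlockAddress(P) && Preference != WantInteger)
    return false; // walking through something that only produces integers
  return true;
}
```

An `indirectbr` fed through a cast by an integer compare now takes the early `return false` path instead of aborting the compiler.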
It seems the crashes we saw weren't caused by this (see comments on the review).
> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290
This reverts commit 4604695d7c20e72b551a1a5224f3de877cb41bd3.
It caused compiler crashes, see comment on the code review for repro.
> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290
This reverts commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff.
This is basically D108837 but for jump threading. Free instructions
should be ignored for the threading decision. JumpThreading already
skips some free instructions (like pointer bitcasts), but does not
skip various free intrinsics -- in fact, it currently gives them a
fairly large cost of 2.
Differential Revision: https://reviews.llvm.org/D110290
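
A rough standalone C++ sketch of the cost change described above. `Kind`, `costOf` and `duplicationCost` are hypothetical. The cost of 2 for free intrinsics is the figure from the message, while the cost of 1 for ordinary instructions is an assumption made for the example.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Ordinary, PointerBitcast, FreeIntrinsic };

// Per-instruction cost used for the threading decision.
unsigned costOf(Kind K, bool TreatFreeIntrinsicsAsFree) {
  switch (K) {
  case Kind::PointerBitcast:
    return 0; // already skipped as free
  case Kind::FreeIntrinsic:
    // Previously charged a cost of 2; ignored once treated as free.
    return TreatFreeIntrinsicsAsFree ? 0 : 2;
  case Kind::Ordinary:
    return 1; // assumption: unit cost for everything else
  }
  return 1;
}

// Total cost of duplicating a block when threading through it.
unsigned duplicationCost(const std::vector<Kind> &Block,
                         bool TreatFreeIntrinsicsAsFree) {
  unsigned Cost = 0;
  for (Kind K : Block)
    Cost += costOf(K, TreatFreeIntrinsicsAsFree);
  return Cost;
}
```

A block with one ordinary instruction, two free intrinsics and a pointer bitcast drops from a cost of 5 to a cost of 1 in this model, which can be the difference between staying under a duplication threshold or not.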
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
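
The API shape change can be illustrated with a toy type in standalone C++. `Tag`, `AAInfo` and `Inst` are hypothetical stand-ins for the metadata nodes, AAMDNodes and an instruction. The rule that a tag survives a merge only when both sides agree is an assumption for this sketch.

```cpp
#include <cassert>

struct Tag {
  int Id;
};

// Hypothetical stand-in for an AAMDNodes-like bundle of tags.
struct AAInfo {
  const Tag *TBAA = nullptr;
  const Tag *Scope = nullptr;

  // Separate merge step: keep a tag only when both sides agree on it.
  AAInfo merge(const AAInfo &Other) const {
    AAInfo Result;
    Result.TBAA = (TBAA == Other.TBAA) ? TBAA : nullptr;
    Result.Scope = (Scope == Other.Scope) ? Scope : nullptr;
    return Result;
  }
};

struct Inst {
  AAInfo Info;

  // Old style: fills a caller-provided structure, optionally merging.
  void getInfo(AAInfo &Out, bool Merge) const {
    Out = Merge ? Out.merge(Info) : Info;
  }

  // New style: return by value and let the caller merge explicitly.
  AAInfo getInfo() const { return Info; }
};
```

With the value-returning form, combining two instructions becomes a single expression such as `A.getInfo().merge(B.getInfo())` instead of a declaration followed by two calls.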
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling-probe-related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference is seen in build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477