Added a class to hold such common state. The goal is to both reduce the argument list of other utilities used by `computeImportForModule` (which will be brought as members in a subsequent patch), and to make it easy to extend such state later.
VPWidenRecipe only needs the opcode to widen, all other information
(flags, debug loc and operands) is already modeled directly via the
recipe.
This removes the remaining uses of the underlying instruction from
VPWidenRecipe::execute.
Currently we hoist conditions from widenable branch which are joined to
the widenable_condition by And operation. E.g if we have
br(WC && (c1 && c2)) we will operate with (c1 && c2) unsplitted.
This patch adds more flexibility to that mechanism by supporting work
with the list of checks parsed from the widenable branch.
On that stage patch doesn't change the logic of checks hoisting. In the
example above we will either hoist both checks [c1, c2] or none of them.
But in the future we would improve that logic analyzing each check
separately.
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D157689
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
various DL fields in sub classes.
Each recipe can have debug location and it shouldn't rely on reference
to the underlying LLVM IR instructions to retain it. See various recipes
that had separate DL fields already.
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158148
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
Debug and pseudo instructions aren't modeled in VPlan. Turn a check into
an assertion. This will help removing the direct use of Inst here in the
future.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.
This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predixcate wasn't
printed properly for VPReplicateRecipes.
Discussed/split off from D150398.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158992
Many AAs translated callee information to the call site explicitly but
they now all use the helper we already had for callee return to call
site return propagation. In a follow up the helper is going to be
extended to handle multiple callees.
Fold select (fcmp oeq x, 0), (fmul x, y), x => x
This cleans up a pattern left behind by denormal range checks under
denormals are zero.
The pattern starts out as something like:
x = x < smallest_normal ? x * K : x;
The comparison folds to an == 0 when the denormal mode treats input
denormals as zero. This makes library denormal checks free after
linked into DAZ enabled code.
alive2 is mostly happy with this, but there are some issues. First,
there are many reported failures in some of the negative tests that
happen to trigger some preexisting canonicalize introducing
combine. Second, alive2 is incorrectly asserting that denormals must
be flushed with the DAZ modes. It's allowed to drop a canonicalize.
https://reviews.llvm.org/D157030
Usually pointer tag will match tag in the shadow, so we can keep
inlining this check keeping the rest in outlined part.
It imroves performance by about 25%, but increases code size by 30%.
Existing outlining reduces performance by 30%, but saves code size by 80%.
So we still significantly reduce code size with minimal performance loss.
Reviewed By: fmayer
Differential Revision: https://reviews.llvm.org/D159172
A direct call to a function casted to a different type is still not
really an address taken event. We allow the user to opt out of these
now.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159149
Also removed them from the header. They are there for test-only. This
simplifies further refactoring (as well as code comprehension)
Differential Revision: https://reviews.llvm.org/D159308
Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both calliste and BB probe, we can leverage this to unify the callsite handling code.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D159169
Technically increases the number of instructions if the
result isn't cast back to float. Even in this case it's
still probably a better canonical form since it enables FP value
tracking.
https://reviews.llvm.org/D151939
In the past we sort of pretended float might be implementable
as a non-IEEE type but that never realistically would work. Exotic
FP types would need to be added to the IR. Turning these
into FP operations enables FP tracking optimizations.
https://reviews.llvm.org/D151937
This is a resurrection of D18874. This was previously wrong with
fneg conflated with fsub, but we now have a proper fneg instruction.
Additionally, I think it is now clearer that IR float=IEEE float,
and a different bit layout would require adding a different IR type.
https://reviews.llvm.org/D151934
When a loop has multiple reductions, each with an intermediate invariant
store, the order in which those reductions are processed is not considered.
This can result in the invariant stores outside the loop not preserving the
original order.
This patch sorts VPReductionPHIRecipes by the order in which they have
stores in the original loop before running
`InnerLoopVectorizer::fixReduction` function, and it helps to maintain
the correct order of stores.
Fixes https://github.com/llvm/llvm-project/issues/64047
Differential Revision: https://reviews.llvm.org/D157631
Allow specialization of functions with "dynamic" denormal modes to a
known IEEE or DAZ mode based on callers. This should make it possible
to implement a is-denormal-flushing-enabled test using
llvm.canonicalize and have it be free after LTO.
https://reviews.llvm.org/D156129
/data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable]
bool IsFuncHashMismatch = false;
^
1 error generated.
Accumulating the staleness metrics from per-link is less accurate than doing it from post-link time(assuming we use the offline profile mismatch as baseline), the reason is that there are some duplicated reports for the same functions, for example, one template function could be included in multiple TUs, but in post thin link time, only one function are kept(linkonce_odr) and others are marked as available-externally function. Hence, this change skips reporting the metrics for imported functions(available-externally).
I saw the post-link number is now very close to the offline number(dump the mismatched functions and count the metrics offline based on the entire profile), sightly smaller than offline number due to some missing inlined functions.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D156725
Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D158900
- Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors.
- Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile)
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D158891
As in per-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, the IR locations or pseudo probe parsed from it are incorrect and could mislead the matching later.
This change adds the support to extract the original IR location info from the inlined code, specifically, it make sure to skip all the inlined code that doesn't belong the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target(name).
Measured on some stale profile instances, all showed some perf improvements.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D156722
- rename `IRLocation` --> `IRAnchors`, `ProfileLocation` --> `ProfileAnchors`
- reorganize runOnFunction, fact out the finding IR anchors code into `findIRAnchors`
- introduce a new function `findProfileAnchors` to populate the profile related anchors, the result is saved into `ProfileAnchors`, it's later used for both mismatch report and matching, this can avoid to parse the `getBodySamples` and `getCallsiteSamples` for multiple times.
- move the `MatchedCallsiteLocs` stuffs from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics report are computed in one function.
- move all matching related into `runStaleProfileMatching`, and move all mismatching report into `countProfileMismatches`
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D158817
This allows to add facts even if no corresponding ICmp instruction
exists in the IR.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158837