34532 Commits

Author SHA1 Message Date
Alexey Bataev
8d933ea5ac [SLP][NFC]Use SmallDensetSet for lookup instead of ArrayRef, NFC. 2023-09-06 13:17:30 -07:00
Mircea Trofin
24a08592bc
[nfc][thinlto] Factor common state for computeImportForModule (#65427)
Added a class to hold such common state. The goal is to both reduce the argument list of other utilities used by `computeImportForModule` (which will be brought as members in a subsequent patch), and to make it easy to extend such state later.
2023-09-06 11:57:15 -07:00
Florian Hahn
785e7063b9
[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI).
VPWidenRecipe only needs the opcode to widen, all other information
(flags, debug loc and operands) is already modeled directly via the
recipe.

This removes the remaining uses of the underlying instruction from
VPWidenRecipe::execute.
2023-09-06 16:27:09 +01:00
Aleksandr Popov
0e0ff8573d [GuardWidening] Refactor to work with the list of checks to widen/hoist
Currently we hoist conditions from widenable branch which are joined to
the widenable_condition by And operation. E.g if we have
br(WC && (c1 && c2)) we will operate with (c1 && c2) unsplitted.

This patch adds more flexibility to that mechanism by supporting work
with the list of checks parsed from the widenable branch.

On that stage patch doesn't change the logic of checks hoisting. In the
example above we will either hoist both checks [c1, c2] or none of them.
But in the future we would improve that logic analyzing each check
separately.

Reviewed By: anna

Differential Revision: https://reviews.llvm.org/D157689
2023-09-06 00:46:48 +02:00
Alexey Bataev
09b8bbd6e0 [SLP][NFC]Reorder indeces instead of real values, NFC.
May save some memory/compile time.
2023-09-05 08:48:52 -07:00
Florian Hahn
165e24aa2a
[VPlan] Move DebugLoc to VPRecipeBase (NFCI).
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
various DL fields in sub classes.

Each recipe can have debug location and it shouldn't rely on reference
to the underlying LLVM IR instructions to retain it. See various recipes
that had separate DL fields already.
2023-09-05 15:45:16 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Florian Hahn
3fa1b254b7
[VPlan] Print blend recipe as operand directly, instead of IR PHI.
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
2023-09-04 12:35:58 +01:00
DianQK
7ded71b1e4
[JumpThreading] Invalidate LVI after combineMetadataForCSE. 2023-09-04 11:50:14 +08:00
Florian Hahn
19d286bca0
[VPlan] Assert that inst isnt' a debug or pseudo inst (NFCI).
Debug and pseudo instructions aren't modeled in VPlan. Turn a check into
an assertion. This will help removing the direct use of Inst here in the
future.
2023-09-03 21:31:31 +01:00
Nuno Lopes
5a3fd5f3f5 [LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment 2023-09-03 16:35:05 +01:00
Florian Hahn
fd66195777
[VPlan] Manage compare predicates in VPRecipeWithIRFlags.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.

This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predixcate wasn't
printed properly for VPReplicateRecipes.

Discussed/split off from D150398.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158992
2023-09-02 21:45:24 +01:00
Kazu Hirata
83e6931827 [llvm] Use llvm::is_contained (NFC) 2023-09-02 09:32:46 -07:00
Kazu Hirata
6da470d7f8 [llvm] Use range-based for loops (NFC) 2023-09-02 09:32:45 -07:00
Johannes Doerfert
9207a90be5 [Attributor] Do not expand dead indirect call sites 2023-09-01 22:14:38 -07:00
Johannes Doerfert
a8ac969b10 [Attributor][NFC] Use common helper to avoid duplication
Many AAs translated callee information to the call site explicitly but
they now all use the helper we already had for callee return to call
site return propagation. In a follow up the helper is going to be
extended to handle multiple callees.
2023-09-01 21:04:03 -07:00
Johannes Doerfert
ac0d3869c5 [Attributor][NFC] Simplify the helper APIs
We have various helpers to propagate information. This patch cleans up
the API to allow less template parameters and more uniform handling.
2023-09-01 21:04:02 -07:00
Johannes Doerfert
6b95126b9b [Attributor][NFC] Rename AACallSiteReturnedFromReturned
In a follow up we'll use it for more than "callee return" -> "call site
return" deduction, effectively allowing "callee" -> "call site".
2023-09-01 21:04:02 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Johannes Doerfert
37642714ed [Attributor][FIX] Support non-0 AS for function pointers 2023-09-01 17:17:51 -07:00
Noah Goldstein
54ec8bcaf8 Recommit "[InstCombine] Expand foldSelectICmpAndOr -> foldSelectICmpAndBinOp to work for more binops" (3rd Try)
Fixed bug that assumed binop was commutative.
Was re-reviewed by nikic and chapuni

Differential Revision: https://reviews.llvm.org/D148414
2023-09-01 17:15:51 -05:00
Christoph Stiller
3af4590506 [InstCombine] Contracting x^2 + 2*x*y + y^2 to (x + y)^2 (float)
Resolves https://github.com/llvm/llvm-project/issues/61296 if https://reviews.llvm.org/D156026 didn't suffice.

Reviewed By: goldstein.w.n

Differential Revision: https://reviews.llvm.org/D158079
2023-09-01 15:02:12 -05:00
Matt Arsenault
5ae881ff0a InstCombine: Fold out scale-if-denormal pattern
Fold select (fcmp oeq x, 0), (fmul x, y), x => x

This cleans up a pattern left behind by denormal range checks under
denormals are zero.

The pattern starts out as something like:
  x = x < smallest_normal ? x * K : x;

The comparison folds to an == 0 when the denormal mode treats input
denormals as zero. This makes library denormal checks free after
linked into DAZ enabled code.

alive2 is mostly happy with this, but there are some issues. First,
there are many reported failures in some of the negative tests that
happen to trigger some preexisting canonicalize introducing
combine. Second, alive2 is incorrectly asserting that denormals must
be flushed with the DAZ modes. It's allowed to drop a canonicalize.

https://reviews.llvm.org/D157030
2023-09-01 07:47:12 -04:00
Vitaly Buka
be601928e1 [HWASAN] Inline fast pass of instrumentMemAccessOutline
Usually pointer tag will match tag in the shadow, so we can keep
inlining this check keeping the rest in outlined part.

It imroves performance by about 25%, but increases code size by 30%.
Existing outlining reduces performance by 30%, but saves code size by 80%.
So we still significantly reduce code size with minimal performance loss.

Reviewed By: fmayer

Differential Revision: https://reviews.llvm.org/D159172
2023-08-31 21:26:48 -07:00
Johannes Doerfert
8dd3b4581c [Attributor][NFC] Clean include order 2023-08-31 19:32:52 -07:00
Johannes Doerfert
209496b766 [Core] Allow hasAddressTaken to ignore "casted direct calls"
A direct call to a function casted to a different type is still not
really an address taken event. We allow the user to opt out of these
now.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159149
2023-08-31 19:32:52 -07:00
Mircea Trofin
a479dd1242 [nfc][thinlto] Mark some functions explicitly as "Test"
Also removed them from the header. They are there for test-only. This
simplifies further refactoring (as well as code comprehension)

Differential Revision: https://reviews.llvm.org/D159308
2023-08-31 16:30:18 -07:00
wlei
f14a5ff635 [CSSPGO] Refactoring findIRAnchors
Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both calliste and BB probe, we can leverage this to unify the callsite handling code.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D159169
2023-08-31 16:25:47 -07:00
Matt Arsenault
70aede228a InstCombine: Recognize fneg(fabs) as bitcasted integer
Technically increases the number of instructions if the
result isn't cast back to float. Even in this case it's
still probably a better canonical form since it enables FP value
tracking.

https://reviews.llvm.org/D151939
2023-08-31 19:07:36 -04:00
Matt Arsenault
5c0da5839d InstCombine: Recognize fabs as bitcasted integer
In the past we sort of pretended float might be implementable
as a non-IEEE type but that never realistically would work. Exotic
FP types would need to be added to the IR. Turning these
into FP operations enables FP tracking optimizations.

https://reviews.llvm.org/D151937
2023-08-31 19:03:48 -04:00
Matt Arsenault
50a9b3d8a5 InstCombine: Recognize fneg when performed as bitcasted integer
This is a resurrection of D18874. This was previously wrong with
fneg conflated with fsub, but we now have a proper fneg instruction.
Additionally, I think it is now clearer that IR float=IEEE float,
and a different bit layout would require adding a different IR type.

https://reviews.llvm.org/D151934
2023-08-31 18:59:34 -04:00
Vitaly Buka
c6aaf2e521 [NFC][HWASAN] Extract insertShadowTagCheck()
Reviewed By: fmayer

Differential Revision: https://reviews.llvm.org/D159165
2023-08-31 13:22:51 -07:00
Vitaly Buka
b80fa58bdc [NFC][hwasan] Rename local variable 2023-08-31 12:25:46 -07:00
Vitaly Buka
bb637396db [test][HWASAN] Precommit -hwasan-inline-fast-path-checks tests
Reviewed By: fmayer

Differential Revision: https://reviews.llvm.org/D159157
2023-08-31 11:24:36 -07:00
Igor Kirillov
ac65fb8699 [LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions.
When a loop has multiple reductions, each with an intermediate invariant
store, the order in which those reductions are processed is not considered.
This can result in the invariant stores outside the loop not preserving the
original order.
This patch sorts VPReductionPHIRecipes by the order in which they have
stores in the original loop before running
`InnerLoopVectorizer::fixReduction` function, and it helps to maintain
the correct order of stores.

Fixes https://github.com/llvm/llvm-project/issues/64047

Differential Revision: https://reviews.llvm.org/D157631
2023-08-31 16:21:44 +00:00
Matt Arsenault
9536bbe464 Attributor: Don't pass ArrayRef by const reference 2023-08-31 08:41:08 -04:00
Matt Arsenault
850ec7bbb1 Attributor: Try to propagate concrete denormal-fp-math{-f32}
Allow specialization of functions with "dynamic" denormal modes to a
known IEEE or DAZ mode based on callers. This should make it possible
to implement a is-denormal-flushing-enabled test using
llvm.canonicalize and have it be free after LTO.

https://reviews.llvm.org/D156129
2023-08-31 08:26:32 -04:00
Fraser Cormack
e0c60bff8c [InferAddressSpaces][NFC] Fix code formatting 2023-08-31 12:19:13 +01:00
Jie Fu
3b51881dd5 [CSSPGO] Silence -Wunused-but-set-variable warning without asserts (NFC)
/data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable]
  bool IsFuncHashMismatch = false;
       ^
1 error generated.
2023-08-31 09:58:29 +08:00
wlei
4bb6bbb9bf [CSSPGO] Skip reporting staleness metrics for imported functions
Accumulating the staleness metrics from per-link is less accurate than doing it from post-link time(assuming we use the offline profile mismatch as baseline), the reason is that there are some duplicated reports for the same functions, for example, one template function could be included in multiple TUs, but in post thin link time, only one function are kept(linkonce_odr) and others are marked as available-externally function. Hence, this change skips reporting the metrics for imported functions(available-externally).

I saw the post-link number is now very close to the offline number(dump the mismatched functions and count the metrics offline based on the entire profile), sightly smaller than offline number due to some missing inlined functions.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D156725
2023-08-30 18:00:23 -07:00
wlei
3365cd4544 [CSSPGO] Compute checksum mismatch recursively on nested profile
Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D158900
2023-08-30 18:00:23 -07:00
wlei
62a3f6c96e [CSSPGO] Retire FlattenProfileForMatching
- Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors.

- Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile)

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D158891
2023-08-30 18:00:23 -07:00
wlei
062af2e763 [CSSPGO] Support stale profile matching for LTO
As in per-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, the IR locations or pseudo probe parsed from it are incorrect and could mislead the matching later.

This change adds the support to extract the original IR location info from the inlined code, specifically, it make sure to skip all the inlined code that doesn't belong the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target(name).

Measured on some stale profile instances, all showed some perf improvements.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D156722
2023-08-30 18:00:23 -07:00
wlei
148cceb0d6 [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction
- rename `IRLocation` --> `IRAnchors`,  `ProfileLocation` --> `ProfileAnchors`
- reorganize runOnFunction, fact out the finding IR anchors code into `findIRAnchors`
- introduce a new function `findProfileAnchors` to populate the profile related anchors, the result is saved into `ProfileAnchors`, it's later used for both mismatch report and matching, this can avoid to parse the `getBodySamples` and `getCallsiteSamples` for multiple times.
- move the `MatchedCallsiteLocs` stuffs from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics report are computed in one function.
- move all matching related into `runStaleProfileMatching`, and move all mismatching report into `countProfileMismatches`

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D158817
2023-08-30 18:00:23 -07:00
Philip Reames
aada8f2e54 [slp] Tweak debug costing output to include VL
This makes it much easier to understand which vector length is being considered when the same set of nodes are evaluated at multiple vector lengths.
2023-08-30 09:13:19 -07:00
Florian Hahn
e544d9cc36
[VPlan] Remove unused VPBuilder::insert member (NFC). 2023-08-30 16:35:55 +01:00
Florian Hahn
cd9563ae17
[VPlan] Remove unused VPInstruction::clone member (NFC). 2023-08-30 15:53:39 +01:00
Mikhail Goncharov
74f4daef04 fix unused variable warnings in conditionals
for 92023b15099012a657da07ebf49dd7d94a260f84
2023-08-30 14:36:42 +02:00
Florian Hahn
4a5bcbd560
[ConstraintElim] Store conditional facts as (Predicate, Op0, Op1).
This allows to add facts even if no corresponding ICmp instruction
exists in the IR.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D158837
2023-08-30 10:54:28 +01:00