28947 Commits

Author SHA1 Message Date
Hongtao Yu
042cefd2b5 [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.
std::hash returns a 64bit hash code while previously we were using only lower 32 bits which caused hash collision for large workloads.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113688
2021-11-16 11:01:52 -08:00
Sanjay Patel
8fce94f916 [InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2
If C is a high-bit mask:
(trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?)

If C is low-bit mask:
(trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?)

If C is not-of-power-of-2 (one clear bit):
(trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?)

This extends the fold added with:
acabad9ff6bf (https://alive2.llvm.org/ce/z/aFr7qV)

Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold.

Here are Alive2 generalizations for these folds:
https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch)
https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch)
https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask)
https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit)

Differential Revision: https://reviews.llvm.org/D112634
2021-11-16 09:27:30 -05:00
Alexey Bataev
900cc1a226 [SLP]Improve cost of the gather nodes.
No need to count the final shuffle cost for the constants, gathering of
the constants is just a constant vector + extra inserts, if required.

Differential Revision: https://reviews.llvm.org/D113770
2021-11-16 06:25:07 -08:00
Alexey Bataev
cdf8a53c1d [SLP]Fix windows build, NFC.
Need to put `IndexIdx` var to the list of captures.
2021-11-16 06:09:51 -08:00
Alexey Bataev
aa9bbb64be [SLP]Adjust GEP indices types when trying to build entries.
Need to adjust the types of GEPs indices when building the tree
entries/operands. Otherwise some of the nodes might differ and
vectorizer is unable to correctly find them and count their cost.

Differential Revision: https://reviews.llvm.org/D113792
2021-11-16 05:44:33 -08:00
Sander.DeSmalen@arm.com
305816ff1e [IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast.
rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed
an issue with IndVarSimplification for a loop where a i32 phi node is
no longer replaced by a widened (i64) phi node, because the SCEVs of a
sign-extend no longer folded the same way. I'm unsure how to properly
explain this because it's all rather complicated, but in short: SCEVs
don't fold as nicely as they used to and this caused a difference.

While investigating this, I found that IndVarSimplify can actually
optimise the case in the way we want to if it chooses the widened IV to
be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough,
there is some level of indeterminism in the way the algorithm works,
it just picks the sign of the 'first' zext/sext user, where the order of
the users-iterator is not guaranteed to be the same on each invocation
of the pass (e.g. shown by first running loop-rotate, which puts the
users in a different order).

While I think the fix is valid in the sense that consistently picking
_any_ order is better than having an nondeterministic order, I can
use a bit of advice from people more familiar in this area of the
code-base.

For example, I'm not sure if this fix is hiding another issue where the
IndVarSimplify pass could actually draw the same conclusions (i.e. that
it only needs an i64 phi node) if it does a bit more work, regardless
of whether it chooses the induction variable to be signed or unsigned.

I'm also not sure if choosing signed is better than unsigned, or whether
that just happens to be beneficial only in this individual case.

Any feedback would be much appreciated!

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112573
2021-11-16 12:41:04 +00:00
Arthur Eubanks
19867de9e7 [NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation.  D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304
2021-11-15 14:44:53 -08:00
Philip Reames
8f95e915cd [unroll-runtime] Relax two profitability limitations on multi-exit unrolling
This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change.  If anyone sees actually interesting code differences out of this, please let me know.  I'm not expecting this to have much impact at all.

Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block.  We can only hit this with a poorly documented debug flag.

Case 2 - Why should we treat epilog cases differently from prolog cases?  Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled?

Sorry for the lack of tests here.  These are both *exceedingly* narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags.  Note that the legality cases for each are already exercised with force flags.
2021-11-15 13:00:14 -08:00
Philip Reames
423da61835 [runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC]
All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller.  The assert use has little value with the restructured code and is simply dropped.
2021-11-15 11:39:07 -08:00
Stanislav Mekhanoshin
e785f4ab6a [PatternMatch] Add m_BinOp/m_c_BinOp with specific opcode
Differential Revision: https://reviews.llvm.org/D113508
2021-11-15 11:24:27 -08:00
Philip Reames
e99902a872 [runtime-unroll] Restructure if-clause to improve readability [NFC] 2021-11-15 11:13:27 -08:00
Alexey Bataev
224e46d355 [SLP][DOT][NFCI]Output all scalars for the splats, not only the first one. 2021-11-15 10:54:26 -08:00
Mehrnoosh Heidarpour
7daa95c8fa [InstCombine] Fold (A^B)|~A-->~(A&B)
https://alive2.llvm.org/ce/z/2v6rhF

Fixes:
https://llvm.org/PR52478

Differential Revision: https://reviews.llvm.org/D113783
2021-11-15 12:29:37 -05:00
Alexey Bataev
036207d5f2 [SLP]Improve splat detection.
A bunch of scalars can be treated as a splat not only if all elements
are the same but also if some of them are undefvalues.

Differential Revision: https://reviews.llvm.org/D113774
2021-11-15 07:50:34 -08:00
Alexey Bataev
b85152f8b1 [SLP][NFC]Use isa_and_nonnull and fix comment, NFC. 2021-11-15 06:49:33 -08:00
ksyx
72b5138d37 Revert "[GVN][NFC] Remove redundant check"
This reverts commit c35e8185d8c170c20e28956e0c9f3c1be895fefb.

mstorsjo reported in the revision thread that one VNCoercion assertion
is violated and seemly in relate to this commit. As per "If a test case
that demonstrates a problem is reported in the commit thread, please
revert and investigate offline", this commit is reverted.
2021-11-15 09:14:13 -05:00
Alexey Bataev
6fb5bed7d1 [SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics.
If the vector intrinsic has scalar argument, we currently still create
a tree entry for this argument. This entry is not used, just consumes
resources and increases the cost of the tree.

Differential Revision: https://reviews.llvm.org/D113806
2021-11-15 06:11:19 -08:00
Sander de Smalen
f835fe8ef7 [LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason.
The interface is a convenience function to ask if a block requires
predication when widening, but it's important that there are two
separate concepts to consider:
(A) The block was predicated in the original loop.
(B) The block was unpredicated in the original loop, but requires
    predication because of tail folding.

In the case of (B) we know that at least one lane of the vector will
be executed, which means we can implementing a load from a uniform address
with a scalar load + splat (D112552). In the case of predication because
of (A), we cannot do this, because the scalar load itself requires
predication.

The name 'blockNeedsPredication' does not make the distinction between
(A) and (B), hence the reason to rename it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D113392
2021-11-15 08:04:20 +00:00
Kazu Hirata
feb40a3a47 [llvm] Use range-based for loops with instructions (NFC) 2021-11-14 19:40:48 -08:00
Kazu Hirata
d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Mircea Trofin
a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Kazu Hirata
7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Kazu Hirata
098e935174 [llvm] Use range-based for loops with CallBase::args (NFC) 2021-11-14 09:32:36 -08:00
Mircea Trofin
0662a3612c [NFC][InlineFunction] Renamed some vars to conform to coding style 2021-11-14 07:26:44 -08:00
Kazu Hirata
7505b7045f [llvm] Use GetElementPtrInst::indices (NFC) 2021-11-13 21:43:28 -08:00
ksyx
c35e8185d8
[GVN][NFC] Remove redundant check
The if-check above deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that
LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows
StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0,
while it could be meaningless to have a type with nonpositive size, so that
the check could be removed.

Part of revision D100179
Reviewed By: nikic
2021-11-13 15:59:43 -05:00
Philip Reames
37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Philip Reames
de2fed6152 [unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.

With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.

However, the real motivation for this change isn't performance.  Its readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
2021-11-12 11:40:50 -08:00
Joel E. Denny
c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Alexey Bataev
352c46e707 [SLP]Improve vectorization of split loads.
Need to fix ther cost estimation for split loads, since we look at the
subregs already, no need to permute them, need just to estimate
subregister insert, if it is smaller than the real register. Also, using
split loads, it might be profitable already to vectorize smaller trees
with gathering of the loads.

Differential Revision: https://reviews.llvm.org/D107188
2021-11-12 06:13:22 -08:00
Nikita Popov
986416251b [InstCombine] Drop redundant fold for and/or of icmp eq/ne (NFCI)
This handles a special case of foldAndOrOfICmpsUsingRanges()
with two equality predicates.
2021-11-11 20:25:40 +01:00
Nikita Popov
84e273cced [InstCombine] Handle undefs in and of icmp eq zero fold
For the scalar/splat case, this fold is subsumed by
foldLogOpOfMaskedICmps(). However, the conjugated fold for "or"
also supports splats with undef. Make both code paths consistent
by using m_ZeroInt() for the "and" implementation as well.

https://alive2.llvm.org/ce/z/tN63cu
https://alive2.llvm.org/ce/z/ufB_Ue
2021-11-11 19:07:07 +01:00
Nikita Popov
0242a6adf7 [InstCombine] Support splat vectors in some or of icmp folds
Replace m_ConstantInt() with m_APInt() in order to support splat
constants in addition to scalar integers.
2021-11-10 22:59:09 +01:00
Nikita Popov
861adaf2ad [InstCombine] Support splat vectors in some and of icmp folds
Replace m_ConstantInt() with m_APInt() to support splat vectors
in addition to scalar integers.
2021-11-10 22:37:54 +01:00
Nikita Popov
58ebc79a64 [InstCombine] Strip offset when folding and/or of icmps
When folding and/or of icmps, look through add of a constant and
adjust the icmp range instead. Effectively, this decomposes
X + C1 < C2 style range checks back into a normal range. This allows
us to fold comparisons involving two range checks or one range check
and some other condition. We had a fold for a really specific case
of this (or of range check and eq, and only one one side!) while
this handles it in fully generality.

Differential Revision: https://reviews.llvm.org/D113510
2021-11-10 22:01:52 +01:00
Stanislav Mekhanoshin
5731381594 [InstCombine] Relax and reorganize one use checks in the ~(a | b) & c
Since there is just a single check for LHS in ~(A | B) & C | ...
transforms and multiple RHS checks inside with more coming I am
removing m_OneUse checks for LHS and adding new checks for RHS.
This is non essential as long as there is total benefit.

In addition (~(A | B) & C) | (~(A | C) & B) --> (B ^ C) & ~A
checks were overly restrictive, it should be good without any
additional checks.

Differential Revision: https://reviews.llvm.org/D113141
2021-11-10 10:14:12 -08:00
Sanjay Patel
67299aa84f [InstCombine] add check for integer source type from cast to prevent crash
A problem was noted in the post-commit review for
c36b7e21bd8f04a44d6 / D113035 :

If the source type is not integer or integer vector,
then we could crash when trying to ComputeNumSignBits().
2021-11-10 09:44:55 -05:00
Florian Hahn
93931d78cf
[LV] Do not rely on InductionDescriptor::getCastInsts. (NFC)
Now that CastDef is passed as VPValue, there is no need to access
ID.getCastInsts, as CastDef can instead be checked.
2021-11-10 13:03:44 +00:00
Florian Hahn
e7f1232cb7
[LV] Move optimized IV recipes to phi section of header after sinking.
Unfortunately sinking recipes for first-order recurrences relies on
the original position of recipes. So if a recipes needs to be sunk after
an optimized induction, it needs to stay in the original position, until
sinking is done. This is causing PR52460.

To fix the crash, keep the recipes in the original position until
sink-after is done.

Post-commit follow-up to c45045bfd04af9 to address PR52460.
2021-11-10 11:41:08 +00:00
Kerry McLaughlin
6f16ee5e14 Revert "[LoopVectorize] Extract the last lane from a uniform store"
This reverts commit 0d748b4d32cbddf58a1ff83f3ff178ec1ad49edc.
This is causing some failures when building Spec2017 with scalable
vectors. Reverting to investigate.
2021-11-10 11:21:19 +00:00
Dmitry Makogon
62f86d4f95 Reapply 5ec2386 "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs""
This reverts commit 7cd273c339cfe8427404f881ae280bd9fae6ff78.

Several patches with tests fixes have been applied:
0cada82f0a30e5ae22dce66b58604ab9b47a3897 "[Test] Remove incorrect test in GVN"
97cb13615d6d9df254e3c0f3deef9eaedfe189b6 "[Test] Separate IndVars test into AArch64 and X86 parts"
985cc490f17d28b20392ee214895d947b85120ef "[Test] Remove separated test in IndVars",
and test failures caused by 5ec2386 should be resolved now.
2021-11-10 17:36:14 +07:00
Itay Bookstein
f9059efa0d [InstCombine] Extend stacksave/restore elimination
Previously, InstCombine detected a pair of llvm.stacksave/stackrestore
instructions that are adjacent modulo debug instructions in order to
eliminate the llvm.stackrestore. This precludes situations where
intervening instructions (e.g. loads) preclude the llvm.stacksave and
llvm.stackrestore from becoming adjacent. This commit extends the logic
and allows for eliminating the llvm.stackrestore when the range of
instructions between them does not include any alloca or side-effect
causing instructions.

Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D113105
2021-11-10 10:41:58 +02:00
Itay Bookstein
fe7491d32f [InstCombine][NFC] Refactor llvm.stackrestore handling
Hoist the instruction classification logic outside the loop
in preparation for reuse in a future commit.

Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D113464
2021-11-10 10:41:56 +02:00
Joseph Huber
e52937eba0 [OpenMP] Use AAAssumptionInfo to get assumptions in OpenMPOpt
This patch uses the abstract attributor introduced in D111054 to get the
assumption values instead of the `hasAssumption` function. This also
calls it so assumption information should propagate throug the device
where applicabile.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111445
2021-11-09 17:39:21 -05:00
Joseph Huber
b8a825b483 [Attributor] Introduce AAAssumptionInfo to propagate assumptions
This patch introduces a new abstract attributor instance that propagates
assumption information from functions. Conceptually, if a function is
only called by functions that have certain assumptions, then we can
apply the same assumptions to that function. This problem is similar to
calculating the dominator set, but the assumptions are merged instead of
nodes.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111054
2021-11-09 17:39:18 -05:00
Kostya Serebryany
b7f3a4f4fa [sancov] add tracing for loads and store
add tracing for loads and stores.

The primary goal is to have more options for data-flow-guided fuzzing,
i.e. use data flow insights to perform better mutations or more agressive corpus expansion.
But the feature is general puspose, could be used for other things too.

Pipe the flag though clang and clang driver, same as for the other SanitizerCoverage flags.
While at it, change some plain arrays into std::array.

Tests: clang flags test, LLVM IR test, compiler-rt executable test.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D113447
2021-11-09 14:35:13 -08:00
Nikita Popov
0aabdad1ef [InstCombine] Combine code for and/or of icmps (NFC)
The implementation for and/or is the same, apart from the choice
of exactIntersectWith() vs exactUnionWith(). Extract a common
function to make future extension easier.
2021-11-09 21:18:31 +01:00
Nikita Popov
bb12dedede [InstCombine] Refactor and/or of icmp with constant (NFCI)
Rather than testing for many specific combinations of predicates
and values, compute the exact icmp regions for both comparisons
and check whether they union/intersect exactly. If they do,
construct the equivalent icmp for the new range. Assuming that the
existing code handled all possible cases, this should be NFC.

Differential Revision: https://reviews.llvm.org/D113367
2021-11-09 21:05:46 +01:00
Stanislav Mekhanoshin
791baf38e1 [InstCombine] Fuse checks for LHS (~(A | B) & C) | ... NFC.
Differential Revision: https://reviews.llvm.org/D113132
2021-11-09 11:31:22 -08:00
Sanjay Patel
d5c002bdc7 [InstCombine] fix code comment to match code; NFC 2021-11-09 14:27:29 -05:00