2128 Commits

Author SHA1 Message Date
Florian Hahn
217e0f3971
[SCEV] Add initial pattern matching for SCEV constants. (NFC) (#119389)
Add initial pattern matching for SCEV constants. Follow-up patches will
add additional matchers for various SCEV expressions.

This patch only converts a few instances to use the new matchers to make
sure everything builds as expected for now.

PR: https://github.com/llvm/llvm-project/pull/119389
2024-12-13 10:36:30 +00:00
Akshay Deodhar
82c93b6f19
[SCEV] Simplify SCEVExpr for PHI to SCEV for operand if operands are identical (#115945)
Helps SCEV analyze some special phi nodes, allowing the computation of
loop trip count in cases like the following:

https://godbolt.org/z/xGs1d81TW
2024-12-06 15:26:45 +05:30
Yingwei Zheng
f7ef0721d6
[SCEV] Do not allow refinement in the rewriting of BEValue (#117152)
See the following case:
```
; bin/opt -passes="print<scalar-evolution>" test.ll --disable-output
define i32 @widget() {
b:
  br label %b1

b1:                                              ; preds = %b5, %b
  %phi = phi i32 [ 0, %b ], [ %udiv6, %b5 ]
  %phi2 = phi i32 [ 1, %b ], [ %add, %b5 ]
  %icmp = icmp eq i32 %phi, 0
  br i1 %icmp, label %b3, label %b8

b3:                                              ; preds = %b1
  %udiv = udiv i32 10, %phi2
  %urem = urem i32 %udiv, 10
  %icmp4 = icmp eq i32 %urem, 0
  br i1 %icmp4, label %b7, label %b5

b5:                                              ; preds = %b3
  %udiv6 = udiv i32 %phi2, 0
  %add = add i32 %phi2, 1
  br label %b1

b7:                                              ; preds = %b3
  ret i32 5

b8:                                              ; preds = %b1
  ret i32 7
}
```
```
%phi2 = phi i32 [ 1, %b ], [ %add, %b5 ] -->  {1,+,1}<nuw><nsw><%b1>
%udiv6 = udiv i32 %phi2, 0 --> ({1,+,1}<nuw><nsw><%b1> /u 0)
%phi = phi i32 [ 0, %b ], [ %udiv6, %b5 ] --> ({0,+,1}<nuw><nsw><%b1> /u 0)
```
`ScalarEvolution::createAddRecFromPHI` gives a wrong SCEV result for
`%phi`:

d7d6fb1804/llvm/lib/Analysis/ScalarEvolution.cpp (L5926-L5950)
It converts `phi(0, ({1,+,1}<nuw><nsw><%b1> /u 0))` into `phi(0 / 0,
({1,+,1}<nuw><nsw><%b1> /u 0))`. Then it simplifies the expr into
`{0,+,1}<nuw><nsw><%b1> /u 0`.

As we did in
acd700a24b,
this patch disallows udiv simplification if we cannot prove that the
denominator is a well-defined non-zero value.

Fixes https://github.com/llvm/llvm-project/issues/117133.
2024-12-01 20:11:09 +08:00
Yingwei Zheng
458dfbd855
[SCEV] Fix sext handling for getConstantMultiple (#117093)
Counterexample: 219 is a multiple of 73. But `sext i8 219 to i16 =
65499` is not.
Fixes https://github.com/llvm/llvm-project/issues/116483.
2024-11-21 17:23:04 +08:00
Ramkumar Ramachandra
2b5214b9e1
IR: de-duplicate two CmpInst routines (NFC) (#116866)
De-duplicate the functions getSignedPredicate and getUnsignedPredicate,
nearly identical versions of which were present in CmpInst and ICmpInst,
creating less confusion.
2024-11-20 09:30:35 +00:00
Florian Hahn
feb9b3701b
[SCEV] Address post-commit comments for #113915.
Address post-commit comments for
https://github.com/llvm/llvm-project/pull/113915.
2024-11-17 19:31:56 +00:00
Julian Nagele
7c8e05aa45
[SCEV] Collect and merge loop guards through PHI nodes with multiple incoming values (#113915)
This patch aims to strengthen collection of loop guards by processing
PHI nodes with multiple incoming values as follows: collect guards for
all incoming values/blocks and try to merge them into a single one for
the PHI node.

The goal is to determine tighter bounds on the trip counts of scalar
tail loops after vectorization, helping to avoid unnecessary transforms.
In particular we'd like to avoid vectorizing scalar tails of
hand-vectorized loops, for example in
[Transforms/PhaseOrdering/X86/pr38280.ll](231e03ba7e/llvm/test/Transforms/PhaseOrdering/X86/pr38280.ll),
discovered via https://github.com/llvm/llvm-project/pull/108190

Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=a55248789ed3f653740e0723d016203b9d585f26&to=500e4c46e79f60b93b11a752698c520e345948e3&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/113915
2024-11-15 10:03:08 +00:00
Yingwei Zheng
0b9f1cc024
[SCEV] Disallow simplifying phi(undef, X) to X (#115109)
See the following case:
```
@GlobIntONE = global i32 0, align 4

define ptr @src() {
entry:
  br label %for.body.peel.begin

for.body.peel.begin:                              ; preds = %entry
  br label %for.body.peel

for.body.peel:                                    ; preds = %for.body.peel.begin
  br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel

cleanup.loopexit.peel:                            ; preds = %for.body.peel
  br label %cleanup.peel

cleanup.peel:                                     ; preds = %cleanup.loopexit.peel, %for.body.peel
  %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]
  br i1 true, label %for.body.peel.next, label %cleanup7

for.body.peel.next:                               ; preds = %cleanup.peel
  br label %for.body.peel.next1

for.body.peel.next1:                              ; preds = %for.body.peel.next
  br label %entry.peel.newph

entry.peel.newph:                                 ; preds = %for.body.peel.next1
  br label %for.body

for.body:                                         ; preds = %cleanup, %entry.peel.newph
  %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ]
  br i1 false, label %cleanup, label %cleanup.loopexit

cleanup.loopexit:                                 ; preds = %for.body
  br label %cleanup

cleanup:                                          ; preds = %cleanup.loopexit, %for.body
  %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ]
  br i1 false, label %for.body, label %cleanup7.loopexit

cleanup7.loopexit:                                ; preds = %cleanup
  %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ]
  br label %cleanup7

cleanup7:                                         ; preds = %cleanup7.loopexit, %cleanup.peel
  %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ]
  ret ptr %retval.2.lcssa
}

define ptr @tgt() {
entry:
  br label %for.body.peel.begin

for.body.peel.begin:                              ; preds = %entry
  br label %for.body.peel

for.body.peel:                                    ; preds = %for.body.peel.begin
  br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel

cleanup.loopexit.peel:                            ; preds = %for.body.peel
  br label %cleanup.peel

cleanup.peel:                                     ; preds = %cleanup.loopexit.peel, %for.body.peel
  %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]
  br i1 true, label %for.body.peel.next, label %cleanup7

for.body.peel.next:                               ; preds = %cleanup.peel
  br label %for.body.peel.next1

for.body.peel.next1:                              ; preds = %for.body.peel.next
  br label %entry.peel.newph

entry.peel.newph:                                 ; preds = %for.body.peel.next1
  br label %for.body

for.body:                                         ; preds = %cleanup, %entry.peel.newph
  br i1 false, label %cleanup, label %cleanup.loopexit

cleanup.loopexit:                                 ; preds = %for.body
  br label %cleanup

cleanup:                                          ; preds = %cleanup.loopexit, %for.body
  br i1 false, label %for.body, label %cleanup7.loopexit

cleanup7.loopexit:                                ; preds = %cleanup
  %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ]
  br label %cleanup7

cleanup7:                                         ; preds = %cleanup7.loopexit, %cleanup.peel
  %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ]
  ret ptr %retval.2.lcssa
}
```
1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus,
`ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for
`%retval.2.peel`.
2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the
use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`.
3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand`
reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [
@GlobIntONE, %cleanup.loopexit.peel ]` to generate code for
`@GlobIntONE`. It is incorrect.

This patch disallows simplifying `phi(undef, X)` to `X` by setting
`CanUseUndef` to false.
Closes https://github.com/llvm/llvm-project/issues/114879.
2024-11-07 15:53:51 +08:00
Kazu Hirata
b47849b4cb
[SCEV] Avoid repeated hash lookups (NFC) (#112656) 2024-10-17 07:46:32 -07:00
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Florian Hahn
7f06d8afb0
[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824)
Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if
there is an UDiv operand that may divide by 0 or poison

PR: https://github.com/llvm/llvm-project/pull/110824
2024-10-14 13:08:49 +01:00
David Sherwood
72f339de45
[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928)
There are a number of places where we call getSmallConstantMaxTripCount
without passing a vector of predicates:

getSmallBestKnownTC
isIndvarOverflowCheckKnownFalse
computeMaxVF
isMoreProfitable

I've changed all of these to now pass in a predicate vector so that
we get the benefit of making better vectorisation choices when we
know the max trip count for loops that require SCEV predicate checks.

I've tried to add tests that cover all the cases affected by these
changes.
2024-10-11 10:10:15 +01:00
Kazu Hirata
0614b3cfac
[Analysis] Simplify code with DenseMap::operator[] (NFC) (#111331) 2024-10-07 07:00:45 -07:00
Florian Hahn
6022a3a05f
[SCEV] Store predicates for EL/ENT in SmallVector.
Store predicates in ExitLimit and ExitNotTaken in a SmallVector instead
of a SmallPtrSet. This guarantees the predicates can be iterated on in a
predictable manner. This ensures the predicates can be printed and
generated in a predictable order.

This shifts de-duplication of predicates to construction time for
ExitLimit. ExitNotTaken just takes predicates from ExitLimit, so they
should also be free of duplicates.

This was exposed by 2f7ccaf4a8565628a4c7d2b5a49bb45478940be6
(https://github.com/llvm/llvm-project/pull/108777).

Should fix https://lab.llvm.org/buildbot/#/builders/110/builds/1494.
2024-09-28 21:24:17 +01:00
Florian Hahn
2f7ccaf4a8
[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777)
This can help in cases where pointer alignment info is missing, e.g.
https://github.com/llvm/llvm-project/pull/108210

The predicate is formed for the complex expression that's passed to
SolveLinEquationWithOverflow and the checks could probably be pushed
closer to the root nodes, which in some cases may be cheaper to check.


PR: https://github.com/llvm/llvm-project/pull/108777
2024-09-28 14:19:57 +01:00
Jeremy Morse
056a3f4673 [NFC] Reapply 3f37c517f, SmallDenseMap speedups
This time with 100% more building unit tests. Original commit message follows.

[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)

If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
2024-09-26 10:49:29 +01:00
Jeremy Morse
817e742ba5 Revert "[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)"
This reverts commit 3f37c517fbc40531571f8b9f951a8610b4789cd6.

Lo and behold, I missed a unit test
2024-09-25 14:31:30 +01:00
Jeremy Morse
3f37c517fb
[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)
If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
2024-09-25 14:22:23 +01:00
David Sherwood
02ee96eca9
[Analysis] Teach isDereferenceableAndAlignedInLoop about SCEV predicates (#106562)
Currently if a loop contains loads that we can prove at compile time
are dereferenceable when certain conditions are satisfied the function
isDereferenceableAndAlignedInLoop will still return false because
getSmallConstantMaxTripCount will return 0 when SCEV predicates
are required. This patch changes getSmallConstantMaxTripCount to take
an optional Predicates pointer argument so that we can permit
functions such as isDereferenceableAndAlignedInLoop to consider more
cases.
2024-09-23 09:56:37 +01:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Florian Hahn
4e6ec0bf6d
[SCEV] Replace redundant !Preds.empty() check with assert. (NFCI)
If there are no predicates, the predicated counts should not be
different to the non-predicated ones.
2024-09-19 13:53:30 +01:00
David Sherwood
270ee6549c
[Analysis][NFC] Clean-up in ScalarEvolution when copying predicates (#108851)
There are a few places in ScalarEvolution.cpp where we copy predicates
from one list to another and they have a similar pattern:

  for (const auto *P : ENT.Predicates)
    Predicates->push_back(P);

We can avoid the loop by writing them like this:

  Predicates->append(ENT.Predicates.begin(), ENT.Predicates.end());

which may end up being more efficient since we only have to try
reserving more space once.
2024-09-17 10:28:24 +01:00
Antonio Frighetto
e80f48986c [SCEV] BECount to zero if ((-C + (C smax %x)) /u %x), C > 0 holds
The SCEV expression `((-C + (C smax %x)) /u %x)` can be folded
to zero for any positive constant C.

Proof: https://alive2.llvm.org/ce/z/_dLm8C.
2024-09-05 17:01:56 +02:00
David Sherwood
df3d70b5a7
[Analysis] Add getPredicatedExitCount to ScalarEvolution (#105649)
Due to a reviewer request on PR #88385 I have created this patch
to add a getPredicatedExitCount function, which is similar to
getExitCount except that it uses the predicated backedge taken
information. With PR #88385 we will start to care about more
loops with multiple exits, and want the ability to query exit
counts for a particular exiting block. Such loops may require
predicates in order to be vectorised.

New tests added here:

Analysis/ScalarEvolution/predicated-exit-count.ll
2024-09-02 14:05:26 +01:00
David Sherwood
0caa909a3c
[Analysis][NFC] Use SmallVectorImpl consistently in ScalarEvolution (#105663)
Use SmallVectorImpl instead of SmallVector for function arguments
to give the caller greater flexibility in choice of initial size.
2024-08-27 09:29:14 +01:00
David Sherwood
d46812a7be
[Analysis] Teach ScalarEvolution::getRangeRef about more dereferenceable objects (#104778)
Whilst dealing with review comments on

https://github.com/llvm/llvm-project/pull/96752

I discovered that SCEV does not know about the dereferenceable attribute
on function arguments so I have updated getRangeRef to make use of it
by calling getPointerDereferenceableBytes.
2024-08-22 14:45:14 +01:00
Nikita Popov
6a84af704f
[LAA] Use computeConstantDifference() (#103725)
Use computeConstantDifference() instead of casting getMinusSCEV() to
SCEVConstant. This can be much faster in some cases, because
computeConstantDifference() computes the result without creating new
SCEV expressions.

This improves LTO/ThinLTO compile-time for lencod by more than 10%.

I've verified that computeConstantDifference() does not produce worse
results than the previous code for anything in llvm-test-suite. This
required raising the iteration cutoff to 6. I ended up increasing it to
8 just to be on the safe side (for code outside llvm-test-suite), and
because this doesn't materially affect compile-time anyway (we'll almost
always bail out earlier).
2024-08-16 12:52:57 +02:00
Kazu Hirata
1115dee248
[Analysis] Use range-based for loops (NFC) (#103540) 2024-08-14 08:23:04 -07:00
Nikita Popov
6da3361f50
[SCEV] Look through multiply in computeConstantDifference() (#103051)
Inside computeConstantDifference(), handle the case where both sides are
of the form `C * %x`, in which case we can strip off the common
multiplication (as long as we remember to multiply by it for the
following difference calculation).

There is an obvious alternative implementation here, which would be to
directly decompose multiplies inside the "Multiplicity" accumulation.
This does work, but I've found this to be both significantly slower
(because everything has to work on APInt) and more complex in
implementation (e.g. because we now need to match back the new More/Less
with an arbitrary factor) without providing more power in practice. As
such, I went for the simpler variant here.

This is the last step to make computeConstantDifference() sufficiently
powerful to replace existing uses of
`cast<SCEVConstant>(getMinusSCEV())` with it.
2024-08-14 09:37:38 +02:00
Jie Fu
b7ebb67b86 [SCEV] Fix -Wrange-loop-construct (NFC)
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:12009:21:
 error: loop variable '[S, Mul]' creates a copy from type 'const value_type' (aka 'const llvm::detail::DenseMapPair<const llvm::SCEV *, int>') [-Werror,-Wrange-loop-construct]
    for (const auto [S, Mul] : Multiplicity) {
                    ^
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:12009:10:
 note: use reference type 'const value_type &' (aka 'const llvm::detail::DenseMapPair<const llvm::SCEV *, int> &') to prevent copying
    for (const auto [S, Mul] : Multiplicity) {
         ^~~~~~~~~~~~~~~~~~~~~
                    &
2024-08-13 17:13:04 +08:00
Nikita Popov
306b9c7b48
[SCEV] Handle more add/addrec mixes in computeConstantDifference() (#101999)
computeConstantDifference() can currently look through addrecs with
identical steps, and then through adds with identical operands (apart
from constants).

However, it fails to handle minor variations, such as two nested add
recs, or an outer add with an inner addrec (rather than the other way
around).

This patch supports these cases by adding a loop over the
simplifications, limited to a small number of iterations. The motivation
is the same as in #101339, to make
computeConstantDifference() powerful enough to replace existing uses of
`dyn_cast<SCEVConstant>(getMinusSCEV())` with it. Though as the IR test
diff shows, other callers may also benefit.
2024-08-13 11:01:39 +02:00
Philip Reames
b812e57ac3
[SCEV] Consolidate code for proving wrap flags of controlling finite IVs (#101404)
The canAssumeNoSelfWrap routine in howManyLessThans was doing two subtly
inter-related things. First, it was proving no-self-wrap. This exactly
duplicates the existing logic in the caller. Second, it was establishing
the precondition for the nw->nsw/nuw inference. Specifically, we need to
know that *this* exit must be taken for the inference to be sound.
Otherwise, another (possible abnormal) exit could be taken in the
iteration where this IV would become poison.

This change moves all of that logic into the caller, and caches the
resulting nuw/nsw flags in the AddRec. This centralizes the logic in one
place, and makes it clear that it all depends on controlling the sole
exit.

We do loose a couple cases with SCEV predication. Specifically, if SCEV
predication was able to convert e.g. zext(addrec) into an addrec(zext)
using predication, but didn't record the nuw fact on the new addrec,
then the consuming code can no longer fix this up. I don't think this
case particularly matters.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-08-12 11:40:24 -07:00
Nikita Popov
3512bcc2e9 [SCEV] Fix incorrect extension in computeConstantDifference()
The Mul factor was zero-extended here, resulting in incorrect
results for integers larger than 64-bit.

As we currently only multiply by 1 or -1, just split this into
two cases -- there's no need for a full multiplication here.

Fixes https://github.com/llvm/llvm-project/issues/102597.
2024-08-12 15:24:37 +02:00
Kazu Hirata
b728f37121
[Analysis] Use llvm::set_is_subset (NFC) (#102766) 2024-08-10 16:33:41 -07:00
Kazu Hirata
e9a47a664a
[llvm] Construct SmallVector with ArrayRef (NFC) (#102712)
Without this patch, the constructor arguments come from
SmallVectorImpl, not ArrayRef.  This patch switches them to ArrayRef
so that we can construct SmallVector with a single argument.

Note that LLVM Programmer’s Manual prefers ArrayRef to SmallVectorImpl
for flexibility.
2024-08-09 21:39:13 -07:00
Nikita Popov
a9eb3fd79b [SCEV] Fix warning (NFC)
Produces -Wrange-loop-construct on some buildbots.
2024-08-02 14:48:29 +02:00
Nikita Popov
79af6892f8
[SCEV] Handle more adds in computeConstantDifference() (#101339)
Currently it only deals with the case where we're subtracting adds with
at most one non-constant operand. This patch extends it to cancel out
common operands for the subtraction of arbitrary add expressions.

The background here is that I want to replace a getMinusSCEV() call in
LAA with computeConstantDifference():

93fecc2577/llvm/lib/Analysis/LoopAccessAnalysis.cpp (L1602-L1603)

This particular call is very expensive in some cases (e.g. lencod with
LTO) and computeConstantDifference() could achieve this much more
cheaply, because it does not need to construct new SCEV expressions.

However, the current computeConstantDifference() implementation is too
weak for this and misses many basic cases. This is a step towards making
it more powerful while still keeping it pretty fast.
2024-08-02 13:43:02 +02:00
Nikita Popov
85c5265fea
[SCEV] Unify and optimize constant folding (NFC) (#101473)
Add a common constantFoldAndGroupOps() helper that takes care of
constant folding and grouping transforms that are common to all nary
ops. This moves the constant folding prior to grouping, which is more
efficient, and excludes any constant from the sort.

The constant folding has hooks for folding, identity constants and
absorber constants.

This gives a compile-time improvement for SCEV-heavy workloads like
lencod.
2024-08-02 09:55:02 +02:00
Philip Reames
f0944f4be0
[SCEV] Prove no-self-wrap from negative power of two step (#101416)
We have existing code which reasons about a step evenly dividing the
iteration space is a finite loop with a single exit implying
no-self-wrap. The sign of the step doesn't effect this.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-08-01 13:17:07 -07:00
Philip Reames
7583c484c8
[SCEV] Use power of two facts involving vscale when inferring wrap flags (#101380)
SCEV has logic for inferring wrap flags on AddRecs which are known to
control an exit based on whether the step is a power of two. This logic
only considered constants, and thus did not trigger for steps such as (4
x vscale) which are common in scalably vectorized loops.

The net effect is that we were very sensative to the preservation of
nsw/nuw flags on such IVs, and could not infer trip counts if they got
lost for any reason.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-07-31 14:18:20 -07:00
Nikita Popov
8d28d448e3 [SCEV] Fix outdated comment (NFC)
The EqCache parameter has been removed.
2024-07-30 14:37:52 +02:00
Johannes Reifferscheid
65697b1c7c
Remove value cache in SCEV comparator. (#100721)
The cache triggers almost never, and seems unlikely to help with
performance. However, when it does, it is likely to cause the comparator
to become inconsistent due to a bad interaction of the depth limit and
cache hits. This leads to crashes in debug builds. See the new unit test
for a reproducer.
2024-07-30 14:27:37 +02:00
Nikita Popov
5d833ee6ac [SCEV] Avoid unnecessary computeConstantDifference() call (NFC)
No need to do the second one if the first one already failed.
2024-07-30 14:04:43 +02:00
v01dXYZ
cff8d716bd
[SCEV] forgetValue: support (with-overflow-inst op0, op1) (#98015)
The use-def walk in forgetValue() was skipping instructions with
non-SCEVable types. However, SCEV may look past with.overflow
intrinsics returning aggregates.

Fixes #97586.
2024-07-09 09:14:33 +02:00
Florian Hahn
2f89d4a8c7
[SCEV] Split collecting and applying rewrite info from loop guards (NFC) (#97316)
Introduce a new LoopGuards class to track info from loop guards and split 
off collecting rewrite info to LoopGuards::collect. This allows users of 
applyLoopGuards to collect rewrite info once in cases where the same
loop guards are applied multiple times.

This is used to collect rewrite info once in howFarToZero, which saves a
bit of compile-time:
stage1-O3: -0.04%
stage1-ReleaseThinLTO: -0.02%
stage1-ReleaseLTO-g: -0.04%
stage2-O3: -0.02%

https://llvm-compile-time-tracker.com/compare.php?from=117b53ae38428ca66eaa886fb432e6f09db88fe4&to=4ffb7b2e1c99081ccebe6f236c48a0be2f64b6ff&stat=instructions:u

Notably this improves mafft by -0.9% with -O3, -0.11% with LTO and
-0.12% with stage2-O3.

PR: https://github.com/llvm/llvm-project/pull/97316
2024-07-02 19:32:28 +01:00
Nikita Popov
d23959b2f5 [SCEV] Cache DataLayout in class (NFC)
PR #96919 caused a minor compile-time regression, mostly because
SCEV now goes through an extra out-of-line function to fetch the
data layout, and does this a lot. Cache the DataLayout in SCEV
to avoid these repeated calls.
2024-06-28 12:19:27 +02:00
vaibhav
7e59b20034
[SCEV] Support addrec in right hand side in howManyLessThans (#92560)
Fixes #92554 (std::reverse will auto-vectorize now)

When calculating number of times a exit condition containing a
comparison is executed, we mostly assume that RHS of comparison should
be loop invariant, but it may be another add-recurrence.

~In that case, we can try the computation with `LHS = LHS - RHS` and
`RHS = 0`.~ (It is not valid unless proven that it doesn't wrap)

**Edit:**
We can calculate back edge count for loop structure like:

```cpp
left = left_start
right = right_start
while(left < right){
  // ...do something...
  left += s1; // the stride of left is s1 (> 0)
  right -= s2; // the stride of right is -s2 (s2 > 0)
}
// left and right converge somewhere in the middle of their start values
```
We can calculate the backedge-count as ceil((End - left_start) /u (s1-
(-s2)) where, End = max(left_start, right_start).

**Alive2**: https://alive2.llvm.org/ce/z/ggxx58
2024-06-25 14:33:57 -07:00
Kazu Hirata
1462605ab0
[Analysis] Use range-based for loops (NFC) (#96587) 2024-06-25 06:57:30 -07:00
Mohammed Keyvanzadeh
7b57a1b401
[llvm] format and terminate namespaces with closing comment (#94917)
Namespaces are terminated with a closing comment in the majority of the
codebase so do the same here for consistency. Also format code within
some namespaces to make clang-format happy.
2024-06-21 23:50:53 +03:30