Emit safety guards for ptr accesses when cross partition loads exist
which have a corresponding store to the same address in a different
partition. This will emit the necessary ptr checks for these accesses.
The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler, this
change was part of that enablement.
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..
Similar reasoning can be applied for decrements.
PR: https://github.com/llvm/llvm-project/pull/113126
Clean up unused triple/datalayout lines, strengthen RUN lines to include
-verify-loop-info/-verify-dom-info, and regenerate tests with
UpdateTestChecks where appropriate.
Tweak the LoopDistribute debug output to be prefixed with "LDist: ", get
it to be stable, and extend update_analyze_test_checks.py trivially to
support this output.
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.
When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.
Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.
Depends on https://github.com/llvm/llvm-project/pull/93498
PR: https://github.com/llvm/llvm-project/pull/93499
This test triple was removed earlier to fix build errors. To preserve the original test intention the triple is re-added with an explicit target requirement.
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.
To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.
Fixes#50940.
Fixes#51669.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134606
This updates the naming for the LAA printing pass to be in line with
most other analysis printing passes.
The old name has come up as confusing multiple times already, e.g. in
D131924.
Alternative to D116817.
This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.
This is the used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116935
9345ab3a4550 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to
create TruncTripCount if it is actually used.
Sink the TruncTripCount creating into ComputeEndCheck, so it is only
created when there's an actual check.
9345ab3a4550 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to
compute |Step| * Trip count if the result of the multiplication is
actually used.
Sink the multiplication into ComputeEndCheck, so it is only created
when there's an actual check.
9345ab3a4550 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to check
for overflows if the result of the multiplication is actually used.
Sink the Or for the overflow check into ComputeEndCheck, so it is only
created when there's an actual check.
Unsigned compares of the form <u 0 are always false. Do not create such
a redundant check in generateOverflowCheck.
The patch introduces a new lambda to create the check, so we can
exit early conveniently and skip creating some instructions feeding the
check.
I am planning to sink a few additional instructions as follow-ups, but I
would prefer to do this separately, to keep the changes and diff
smaller.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D116811
Currently generateOverflowCheck always creates code for Step being
negative and positive, followed by a select at the end depending on
Step's sign.
This patch updates the code to only create either the checks for step
being positive or negative, if the sign is known.
Follow-up to D116696.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D116747
This patch updates SCEVExpander::expandUnionPredicate to not create
redundant 'or false, x' instructions. While those are trivially
foldable, they can be easily avoided and hinder code that checks the
size/cost of the generated checks before further folds.
I am planning on look into a few other similar improvements to code
generated by SCEVExpander.
I remember a while ago @lebedev.ri working on doing some trivial folds
like that in IRBuilder itself, but there where concerns that such
changes may subtly break existing code.
Reviewed By: reames, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116696
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivational problem anyway.
Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.
This reverts commit f3190dedeef9da2109ea57e4cb372f295ff53b88.
This reverts commit 749581d21f2b3f53e4fca4eb8728c942d646893b.
This reverts commit f3df87d57e096143670e0fd396e81d43393a2dd2.
This reverts commit ab1dbcecd6f0969976fafd62af34730436ad5944.
Clang OpenMP codegen tests are failing.
This reverts commit 288f1f8abe5835180a0021f142043ee261ab3846.
This reverts commit cb90e5356ac1594e95fed8e208d6e0e9b6a87db1.
There's precedent for that in `CreateOr()`/`CreateAnd()`.
The motivation here is to avoid bloating the run-time check's IR
in `SCEVExpander::generateOverflowCheck()`.
Refs. https://reviews.llvm.org/D109368#3089809
While we could emit such a tautological `select`,
it will stick around until the next instsimplify invocation,
which may happen after we count the cost of this redundant `select`.
Which is precisely what happens with loop vectorization legality checks,
and that artificially increases the cost of said checks,
which is bad.
There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`.
Refs. https://reviews.llvm.org/D109368#3089809
This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.
The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is not reason to track FirstInst and return
it.
Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.
Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.
This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.
Fixes PR50296, PR50288.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D102266
This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.
The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.
Differential Revision: https://reviews.llvm.org/D109093
We'd special cased this logic to use pointer types for non-integral pointers, but there's no reason we can't do that for all pointer types. Doing it this was has a few advantages:
a) The code itself becomes more straight forward, and easier to test.
b) We avoid introducing ptrtoint into programs which didn't have them in the source.
c) The resulting codegen is easier to analyze and simplify (mostly due to lack of ptrtoint).
Note that there are some test diffs, but a) running them through instcombine helps a ton, and b) there's enough missing obvious transforms on both before and after IR that it's clear this isn't performance sensitive.
This is mostly motivated by cleaning up mentions of non-integrals to have a clearer idea of what we actually need to support.
Differential Revision: https://reviews.llvm.org/D104662
Currently, InsertNoopCastOfTo() would implicitly insert that cast,
but now that we have SCEVPtrToIntExpr, i'm hoping we could stop
InsertNoopCastOfTo() from doing that. But first all users must be fixed.
The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequence passes.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D89569