getUpper returns 1 more than the maxium value included in the range.
This may be 0. We should not use this in a umin. Instead we should
get the maximum value included in the range and use that for the umin.
Then convert that to Upper for the new range by adding 1.
The test was manually reduced from a downstream failure, but I couldn't
get it behave exactly the same way without more instructions. It should
be enough to show an incorrect range being calculated.
Fixes#176471
Fixes#173180
The crash occurs when a vector constant refines its value during
iterative analysis.
In `SCCPInstVisitor::visitCastInst`, the logic for folding constants
through a `CastInst` uses `markConstant`. This function is strictly
designed for initial assignments and contains an assertion that prevents
a lattice element from being updated with a different constant pointer.
During the analysis of loops or complex data flows, a vector constant
may "refine." For example:
First Pass: SCCP identifies a value as `<4 x i64> {poison, poison,
poison, 0}`.
Second Pass: The value refines to `<4 x i64> zeroinitializer`.
Because these are distinct `Constant*` objects, `markConstant` triggers
a `"Marking constant with different value"` assertion failure.
The call to `markConstant` is replaced with `mergeInValue`. Unlike the
former, `mergeInValue` is lattice-aware, it allows for valid refinement
(moving from a less-defined to a more-defined state) and gracefully
handles transitions to Overdefined if a true conflict occurs. This
brings BitCast handling in line with how Trunc and Ext instructions are
already safely handled in the same pass.
Co-authored-by: Milos Poletanovic <mpoletanovic@syrmia.com>
From the review in
https://github.com/llvm/llvm-project/pull/169527#discussion_r2567122387,
there are some users where we want to extend or truncate a ConstantRange
only if it's not already the destination bitwidth. Previously this
asserted, so this PR relaxes it to just be a no-op, similar to
IRBuilder::createZExt and friends.
As noted in the reproducer provided in
https://github.com/llvm/llvm-project/issues/164762#issuecomment-3554719231,
on RISC-V after LTO we sometimes have trip counts exposed to vectorized
loops. The loop vectorizer will have generated calls to
@llvm.experimental.get.vector.length, but there are [some
properties](https://llvm.org/docs/LangRef.html#id2399) about the
intrinsic we can use to simplify it:
- The result is always less than both Count and MaxLanes
- If Count <= MaxLanes, then the result is Count
This teaches SCCP to handle these cases with the intrinsic, which allows
some single-iteration-after-LTO loops to be unfolded.
#169293 is related and also simplifies the intrinsic in InstCombine via
computeKnownBits, but it can't fully remove the loop since
computeKnownBits only does limited reasoning on recurrences.
This patch adds support for constant propagation of individual structure
members through Phi nodes. Each member is handled independently,
allowing optimization opportunities when specific members become
constant.
Also, update the testcase since we are able to optimize the call
parameter now.
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
SCCP currently stores instructions whose lattice value has changed in a
worklist, and then updates their users in the main loop. This may result
in instructions unnecessarily being visited multiple times (as an
instruction will often use multiple other instructions). Additionally,
we'd often redundantly visit instructions that were already visited when
the containing block first became executable.
Instead, change the worklist to directly store the instructions that
need to be revisited. Additionally, do not add instructions to the
worklist that will already be covered by the main basic block walk.
This change is conceptually NFC, but is expected to produce minor
differences in practice, because the visitation order interacts with the
range widening limit.
Currently predicates are allocated on the heap and tracked with an
intrusive list. Use a bump pointer allocator instead, which is more
efficient. The list is no longer needed, as we don't have to track
predicates for freeing.
The bump pointer allocator is provided as a parameter for PredicateInfo
to allow reusing the same allocator for all functions during IPSCCP.
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
This patch makes getMRVFunctionsTracked return a reference.
runIPSCCP, the sole user of getMRVFunctionsTracked, just needs a
read-access to the map.
The missing "&" is most likely an oversight as two "sibling" functions
getTrackedRetVals and getTrackedGlobals return maps by const
reference.
try_emplace can default-construct values, so we do not need to do so
on our own. Plus, try_emplace(Key) is much shorter than
insert(std::make_pair(Key, Value()).
If the GEP is nusw/inbounds and has all-non-negative offsets infer nuw
as well.
This doesn't have measurable compile-time impact.
Proof: https://alive2.llvm.org/ce/z/ihztLy
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
Don't just leave the result as unknown. I think this currently
works out thanks to undef resolution, but the correct thing to
do is set it to overdefined explicitly.
IPSCCP can currently return worse results than SCCP for arguments that
are tracked interprocedurally, because information from attributes is
not used for them.
Fix this by intersecting in the attribute information when propagating
lattice values from calls.
Add NotConstant(Null) roots for nonnull arguments and then propagate
them through nuw/inbounds GEPs.
Having this functionality in SCCP is useful because it allows reliably
eliminating null comparisons, independently of how deeply nested they
are in selects/phis. This handles cases that would hit a cutoff in
ValueTracking otherwise.
The implementation is something of a MVP, there are a number of obvious
extensions (e.g. allocas are also non-null).
This is a confusingly named helper than means "is not unknown,
undef or constant". Prefer the more obvious ValueLattice API
instead. Most of these checks are for values which are forced to
overdefined by undef resolution, in which case only actual
overdefined values are relevant.
Add preliminary support for vectors of integers by using the
`ValueLatticeElement::asConstantRange()` helper instead of a custom
implementation, and relxing various integer type checks.
This enables just the part that works automatically, e.g. icmps with a
constant vector operand aren't supported yet.
The change in ssa.copy handling is because asConstantRange() returns an
unknown LV for empty range, while SCCP's getConstantRange() returned a
full range. I've made the change to preserve the existing behavior.
When performing some range operation (e.g. and) on a constant range that
includes undef, we currently just ignore the undef value, which is
obviously incorrect. Instead, we can do one of two things:
* Say that the result range also includes undef.
* Treat undef as a full range.
This patch goes with the second approach -- I'd expect it to be a bit
better overall, e.g. it allows preserving the fact that a zext of a
range with undef isn't a full range.
Fixes https://github.com/llvm/llvm-project/issues/93096.
I'd reverted this in 6c7805d5d1 after a bad stage. Original commit
messsage follows:
[NFC][RemoveDIs] Bulk update utilities to insert with iterators
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
I've also switched DemotePHIToStack to take an optional iterator: it needs
to take an iterator, and having a no-insert-location behaviour appears to
be important. The constructors for ICmpInst and FCmpInst have been updated
too. They're the only instructions that take block _references_ rather than
pointers for certain calls, and a future patch is going to make use of
default-null block insertion locations.
All of this should be NFC.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
I've also switched DemotePHIToStack to take an optional iterator: it needs
to take an iterator, and having a no-insert-location behaviour appears to
be important. The constructors for ICmpInst and FCmpInst have been updated
too. They're the only instructions that take block _references_ rather than
pointers for certain calls, and a future patch is going to make use of
default-null block insertion locations.
All of this should be NFC.