It's common to delete some instructions after using SCEVExpander,
while it is still live (but will not be used afterwards). In that
case, the AssertingVH may trigger. Replace it with a PoisoningVH
so that we only detect the case where the SCEVExpander actually is
used in a problematic fashion after the instruction removal.
The alternative would be to add clear() calls to more code paths.
Fixes https://github.com/llvm/llvm-project/issues/83404.
This patch trivially extends support for DbgValueInst recovery to
DPValues in LoopStrengthReduce; they are handled identically, so this is
mostly done by reusing the DbgValueInst code (using templates or
auto-parameter lambdas to reduce actual code duplication).
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
When normalizing a SCEV expression during expansion, there should be
no need for it to be invertible, as it will only be used for code
generation. This fixes a crash after 7f5b15ad150e.
Fixes https://github.com/llvm/llvm-project/issues/63678.
Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch
additional un-invertable cases. This fixes another mis-compile pointed
out by @peixin in D153004.
getExpr is missing a check to make sure the result is invertible.
This can lead to incorrect results, so return nullptr in those cases
like in other places in IVUsers.
Fixes#62660.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D153202
This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2.
The recommitted version now checks the PostIncLoopSets for all fixups
and returns nullptr if the result doesn't match for all fixups.
This reverts commit abfeda5af329b5889db709ff74506e20e0b569e9.
and fe19036e1266d2a90b44725c82b898134906e4c3.
The added assertion triggers during clang bootstrap builds. Revert while
I investigate.
GenerateTruncates at the moment creates extends/truncates for post-inc
uses of normalized expressions. For example, if an add rec of the form
{1,+,-1} is used outside the loop, the normalized form will use {1,+,-1}
instead of {0,+,-1}. When naively sign-extending the normalized
expression, it will get extended incorrectly to {1,+,-1} for the wider
type, if the backedge-taken count of the loop is 1.
To address this, the patch updates GenerateTruncates to check if the
LSRUse contains any fixups with PostIncLoops. If that's the case, first
de-normalize the expression, then perform the extend/truncate, then
normalize again.
There may be other places where similar checks are needed and the helper
can be generalized for those cases. I'd not be surprised if other subtle
mis-compiles are caused by this.
Fixes#38847.
Fixes#58039.
Fixes#62852.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153004
This is an attempt to reland D42600 and enabling this optimisation by default.
This also resolves the issue pointed out in the context of PGO build.
Differential Revision: https://reviews.llvm.org/D42600
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635
This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).
Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.
Co-authored-by: junbuml
Differential Revision: https://reviews.llvm.org/D42600
Fixes https://github.com/llvm/llvm-project/issues/61182.
LoopStrengthReduce may sometimes break LCSSA form when applying a rewrite
for an instruction used in a PHI.
It happens if:
- The PHI is in a loop exit block,
- The edge from the corresponding exiting block to that exit is critical,
- The PHI has at least two inputs coming from loop blocks,
- and the rewritten instruction is inserted in the loop.
In such case we split the critical edge and then replace PHI inputs
with the rewritten instruction. However ExitBlock is no longer
a loop exit, so LCSSA form is broken.
This patch fixes it by collecting all inserted instructions for PHIs
whose parent block is not a loop exit and then forming LCSSA for them.
Differential Revision: https://reviews.llvm.org/D146811
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
This patch relaxes the restriction that both reassociate operands must
be in the same block as the root instruction.
The comment indicates that the reason for this restriction was that the
operands not in the same block won't have a depth in the trace.
I believe this is outdated; if the operand is in a different block, it
must dominate the current block (otherwise it would need to be phi),
which in turn means the operand's block must be included in the current
rance, and depths must be available.
There's a test case (no_reassociate_different_block) added in
70520e2f1c5fc4 which shows that we have accurate depths for operands
defined in other blocks.
This allows reassociation of code that computes the final reduction
value after vectorization, among other things.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D141302
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encoding require a REX
prefix when using them resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occure when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.
I did extensive measurements with the llvm-test-suite wiht SPEC2006 +
SPEC2017 included, internal services showed similar patterns. Generally
there are a log of improvements but also a lot of regression. But on
average the allocation quality seems to improve at a small code size
regression.
Results for measuring static and dynamic instruction counts:
Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
Spills+FoldedSpills -5.6%
Reloads+FoldedReloads -4.2%
Copies -0.1%
Static / LLVM Statistics:
regalloc.NumSpills mean -1.6%, geomean -2.8%
regalloc.NumReloads mean -1.7%, geomean -3.1%
size..text mean +0.4%, geomean +0.4%
Static / LLVM Statistics:
mean -2.2%, geomean -3.1%) regalloc.NumSpills
mean -2.6%, geomean -3.9%) regalloc.NumReloads
mean +0.6%, geomean +0.6%) size..text
Static / LLVM Statistics:
regalloc.NumSpills mean -3.0%
regalloc.NumReloads mean -3.3%
size..text mean +0.3%, geomean +0.3%
Differential Revision: https://reviews.llvm.org/D133902
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.
Differential Revision: https://reviews.llvm.org/D125299
Relative to the previous landing attempt, this introduces an additional
flag on forgetMemoizedResults() to not remove SCEVUnknown phis from
the value map. The invalidation after BECount calculation wants to
leave these alone and skips them in its own use-def walk, but we can
still end up invalidating them via forgetMemoizedResults() if there
is another IR value with the same SCEV. This is intended as a temporary
workaround only, and the need for this should go away once the
getBackedgeTakenInfo() invalidation is refactored in the spirit of
D114263.
-----
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
without updating ExprValueMap. Add a helper to ensures they stay
consistent. The adjustment in forgetSymbolicName() explicitly
drops the old value from the map, so that we don't rely on it
being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
Differential Revision: https://reviews.llvm.org/D113349
Relative to the previous landing attempt, this makes
insertValueToMap() resilient against the value already being
present in the map -- previously I only checked this for the
createSimpleAffineAddRec() case, but the same issue can also
occur for the general createNodeForPHI(). In both cases, the
addrec may be constructed and added to the map in a recursive
query trying to create said addrec. In this case, this happens
due to the invalidation when the BE count is computed, which
ends up clearing out the symbolic name as well.
-----
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
without updating ExprValueMap. Add a helper to ensures they stay
consistent. The adjustment in forgetSymbolicName() explicitly
drops the old value from the map, so that we don't rely on it
being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
Differential Revision: https://reviews.llvm.org/D113349
This fixes a violation of the wrap flag rules introduced in c4048d8f. This was also noted in the (very old) PR23527.
The issue being fixed is that we assume the inbound flag on any GEP assumes that all users of *any* gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true.
In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when its legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics.
The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow ideas mentioned in PR51817.
It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before or after exploits that.
Differential Revision: https://reviews.llvm.org/D109789
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Recommitting with fix to MemoryDepChecker::isDependent.
Differential Revision: https://reviews.llvm.org/D104806