Comments attached to the `ScaledReg` field of `struct Formula` explains
that, `ScaledReg` must be non-null when `Scale` is non-zero.
This fixes up a code path where this invariant is violated. Also, add an
assert to ensure this invariant holds true.
Without this patch, compiler aborts with the attached test case.
Fixes#76504
As of rev ea222be0d, LLVMs assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Compiling via a textual assembly file produces single-byte
nop padding since ea222be0d but the built-in assembler will produce
multi-byte nops. This divergent behaviour is undesirable.
To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect, updated
numerous other tests to not contain check-lines for the explicit padding.
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one is corresponding to
i variable (add rdx, 4), the other - to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new: ; preds = %for.body.preheader
%unroll_iter = and i64 %N, -4
br label %for.body
; Loop:
for.body: ; preds = %for.body, %for.body.preheader.new
%lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
%i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
%inc.3 = add nuw i64 %i.07, 4
%lsr.iv.next = add nsw i64 %lsr.iv, -2
%niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7
; Exit blocks
for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body
%inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
%lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
%lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate inserted instructions that will
require LCSSA form update into SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions don't process twice.
Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation
when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
moved into separate duplicated-phis.ll test with explicit x86 target requirement
to fix bots which are not building this target.
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one is corresponding to
i variable (add rdx, 4), the other - to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new: ; preds = %for.body.preheader
%unroll_iter = and i64 %N, -4
br label %for.body
; Loop:
for.body: ; preds = %for.body, %for.body.preheader.new
%lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
%i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
%inc.3 = add nuw i64 %i.07, 4
%lsr.iv.next = add nsw i64 %lsr.iv, -2
%niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7
; Exit blocks
for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body
%inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
%lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
%lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate inserted instructions that will
require LCSSA form update into SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions don't process twice.
Fix#97510 .
Note that, for the new phi instruction `NewPH`, which replaces the old
phi `PH` and the cast `ShadowUse`, I choose to propagate the debug
location of `PH` to it, because the cast is eliminated according to the
optimization semantics.
If we have a urem expression, emitting it as a urem is significantly
better that letting the fully expansion kick in. We have the risk of a
udiv or mul which could have previously been shared, but loosing that
seems like a reasonable tradeoff for being able to round trip a urem w/o
modification.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
This patch canonicalizes constant expression GEPs to use i8 source
element type, aka ptradd. This is the ConstantFolding equivalent of the
InstCombine canonicalization introduced in #68882.
I believe all our optimizations working on constant expression GEPs
(like GlobalOpt etc) have already been switched to work on offsets, so I
don't expect any significant fallout from this change.
This is part of:
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699
It's common to delete some instructions after using SCEVExpander,
while it is still live (but will not be used afterwards). In that
case, the AssertingVH may trigger. Replace it with a PoisoningVH
so that we only detect the case where the SCEVExpander actually is
used in a problematic fashion after the instruction removal.
The alternative would be to add clear() calls to more code paths.
Fixes https://github.com/llvm/llvm-project/issues/83404.
This patch trivially extends support for DbgValueInst recovery to
DPValues in LoopStrengthReduce; they are handled identically, so this is
mostly done by reusing the DbgValueInst code (using templates or
auto-parameter lambdas to reduce actual code duplication).
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
When normalizing a SCEV expression during expansion, there should be
no need for it to be invertible, as it will only be used for code
generation. This fixes a crash after 7f5b15ad150e.
Fixes https://github.com/llvm/llvm-project/issues/63678.
Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch
additional un-invertable cases. This fixes another mis-compile pointed
out by @peixin in D153004.
getExpr is missing a check to make sure the result is invertible.
This can lead to incorrect results, so return nullptr in those cases
like in other places in IVUsers.
Fixes#62660.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D153202
This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2.
The recommitted version now checks the PostIncLoopSets for all fixups
and returns nullptr if the result doesn't match for all fixups.
This reverts commit abfeda5af329b5889db709ff74506e20e0b569e9.
and fe19036e1266d2a90b44725c82b898134906e4c3.
The added assertion triggers during clang bootstrap builds. Revert while
I investigate.
GenerateTruncates at the moment creates extends/truncates for post-inc
uses of normalized expressions. For example, if an add rec of the form
{1,+,-1} is used outside the loop, the normalized form will use {1,+,-1}
instead of {0,+,-1}. When naively sign-extending the normalized
expression, it will get extended incorrectly to {1,+,-1} for the wider
type, if the backedge-taken count of the loop is 1.
To address this, the patch updates GenerateTruncates to check if the
LSRUse contains any fixups with PostIncLoops. If that's the case, first
de-normalize the expression, then perform the extend/truncate, then
normalize again.
There may be other places where similar checks are needed and the helper
can be generalized for those cases. I'd not be surprised if other subtle
mis-compiles are caused by this.
Fixes#38847.
Fixes#58039.
Fixes#62852.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153004
This is an attempt to reland D42600 and enabling this optimisation by default.
This also resolves the issue pointed out in the context of PGO build.
Differential Revision: https://reviews.llvm.org/D42600
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635
This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).
Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.
Co-authored-by: junbuml
Differential Revision: https://reviews.llvm.org/D42600
Fixes https://github.com/llvm/llvm-project/issues/61182.
LoopStrengthReduce may sometimes break LCSSA form when applying a rewrite
for an instruction used in a PHI.
It happens if:
- The PHI is in a loop exit block,
- The edge from the corresponding exiting block to that exit is critical,
- The PHI has at least two inputs coming from loop blocks,
- and the rewritten instruction is inserted in the loop.
In such case we split the critical edge and then replace PHI inputs
with the rewritten instruction. However ExitBlock is no longer
a loop exit, so LCSSA form is broken.
This patch fixes it by collecting all inserted instructions for PHIs
whose parent block is not a loop exit and then forming LCSSA for them.
Differential Revision: https://reviews.llvm.org/D146811
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
This patch relaxes the restriction that both reassociate operands must
be in the same block as the root instruction.
The comment indicates that the reason for this restriction was that the
operands not in the same block won't have a depth in the trace.
I believe this is outdated; if the operand is in a different block, it
must dominate the current block (otherwise it would need to be phi),
which in turn means the operand's block must be included in the current
rance, and depths must be available.
There's a test case (no_reassociate_different_block) added in
70520e2f1c5fc4 which shows that we have accurate depths for operands
defined in other blocks.
This allows reassociation of code that computes the final reduction
value after vectorization, among other things.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D141302