Summary:
Depends on https://reviews.llvm.org/D59069.
https://bugs.llvm.org/show_bug.cgi?id=40979 describes a bug in which the
-coro-split pass would assert that a use was across a suspend point from
a definition. Normally this would mean that a value would "spill" across
a suspend point and thus need to be stored in the coroutine frame. However,
in this case the use was unreachable, and so it would not be necessary
to store the definition on the frame.
To prevent the assert, simply remove unreachable basic blocks from a
coroutine function before computing spills. This avoids the assert
reported in PR40979.
Reviewers: GorNishanov, tks2103
Reviewed By: GorNishanov
Subscribers: EricWF, jdoerfert, llvm-commits, lewissbaker
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59068
llvm-svn: 355852
Summary:
Extract the functionality of eliminating unreachable basic blocks
within a function, previously encapsulated within the
-unreachableblockelim pass, and make it available as a function within
BlockUtils.h. No functional change intended other than making the logic
reusable.
Exposing this logic makes it easier to implement
https://reviews.llvm.org/D59068, which fixes coroutines bug
https://bugs.llvm.org/show_bug.cgi?id=40979.
Reviewers: mkazantsev, wmi, davidxl, silvas, davide
Reviewed By: davide
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59069
llvm-svn: 355846
Fixes bug 38023: https://bugs.llvm.org/show_bug.cgi?id=38023
The SimplifyCFG pass will perform jump threading in some cases where
doing so is trivial and would simplify the CFG. When folding a series
of blocks with redundant conditional branches into an unconditional "critical
edge" block, it does not keep the debug location associated with the previous
conditional branch.
This patch fixes the bug described by copying the debug info from the
old conditional branch to the new unconditional branch instruction, and
adds a regression test for the SimplifyCFG pass that covers this case.
Patch by Stephen Tozer!
Differential Revision: https://reviews.llvm.org/D59206
llvm-svn: 355833
Fixes bug 37966: https://bugs.llvm.org/show_bug.cgi?id=37966
The Jump Threading pass will replace certain conditional branch
instructions with unconditional branches when it can prove that only one
branch can occur. Prior to this patch, it would not carry the debug
info from the old instruction to the new one.
This patch fixes the bug described by copying the debug info from the
conditional branch instruction to the new unconditional branch
instruction, and adds a regression test for the Jump Threading pass that
covers this case.
Patch by Stephen Tozer!
Differential Revision: https://reviews.llvm.org/D58963
llvm-svn: 355822
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.
This is sub-optimal because memcmp has to compute much more than
equality.
This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.
`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits
Differential Revision: https://reviews.llvm.org/D56593
llvm-svn: 355672
In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.
This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.
Differential Revision: https://reviews.llvm.org/D58770
llvm-svn: 355597
Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.
Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.
Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.
Reviewers: craig.topper, jyknight
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58929
llvm-svn: 355564
Summary:
They simply shuffle bits. MSan needs to do the same with shadow bits,
after making sure that the shuffle mask is fully initialized.
Reviewers: pcc, vitalybuka
Subscribers: hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58858
llvm-svn: 355348
I'm not too familiar with this pass, so there might be a better
solution, but this appears to fix the degenerate:
PR40930
PR40931
PR40932
PR40934
...without affecting any real-world code.
As we've seen in several other passes, when we have unreachable blocks,
they can contain semi-bogus IR and/or cause unexpected conditions. We
would not typically expect these patterns to make it this far, but we
have to guard against them anyway.
llvm-svn: 355337
I'm assuming that the nan propogation logic for InstructonSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw.
Differential Revision: https://reviews.llvm.org/D58836
llvm-svn: 355222
This patch fixes an issue where we would compute an unnecessarily small alignment during scalar promotion when no store is not to be guaranteed to execute, but we've proven load speculation safety. Since speculating a load requires proving the existing alignment is valid at the new location (see Loads.cpp), we can use the alignment fact from the load.
For non-atomics, this is a performance problem. For atomics, this is a correctness issue, though an *incredibly* rare one to see in practice. For atomics, we might not be able to lower an improperly aligned load or store (i.e. i32 align 1). If such an instruction makes it all the way to codegen, we *may* fail to codegen the operation, or we may simply generate a slow call to a library function. The part that makes this super hard to see in practice is that the memory location actually *is* well aligned, and instcombine knows that. So, to see a failure, you have to have a) hit the bug in LICM, b) somehow hit a depth limit in InstCombine/ValueTracking to avoid fixing the alignment, and c) then have generated an instruction which fails codegen rather than simply emitting a slow libcall. All around, pretty hard to hit.
Differential Revision: https://reviews.llvm.org/D58809
llvm-svn: 355217
An idempotent atomicrmw is one that does not change memory in the process of execution. We have already added handling for the various integer operations; this patch extends the same handling to floating point operations which were recently added to IR.
Note: At the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load. As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future.
Differential Revision: https://reviews.llvm.org/D58251
llvm-svn: 355210
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.
llvm-svn: 355205
Summary:
ConstIntInfoVec contains elements extracted from the previous function.
In new PM, releaseMemory() is not called and the dangling elements can
cause segfault in findConstantInsertionPoint.
Rename releaseMemory() to cleanup() to deliver the idea that it is
mandatory and call cleanup() in ConstantHoistingPass::runImpl to fix
this.
Reviewers: ormris, zzheng, dmgreen, wmi
Reviewed By: ormris, wmi
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58589
llvm-svn: 355174
Summary:
These sorts of blocks often contain calls to noreturn functions, like
longjmp, throw, or trap. If they don't end the program, they are
"interesting" from the perspective of sanitizer coverage, so we should
instrument them. This was discussed in https://reviews.llvm.org/D57982.
Reviewers: kcc, vitalybuka
Subscribers: llvm-commits, craig.topper, efriedma, morehouse, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58740
llvm-svn: 355152
The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. We use a 8-byte hash to log the function symbol name.
In this pass, we add three global variables:
(1) an order file buffer: a circular buffer at its own llvm section.
(2) a bitmap for each module: one byte for each function to say if the function is already executed.
(3) a global index to the order file buffer.
At the function prologue, if the function has not been executed (by checking the bitmap), log the function hash, then atomically increase the index.
Differential Revision: https://reviews.llvm.org/D57463
llvm-svn: 355133
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 355131
This is part of a transform that may be done in the backend:
D13757
...but it should always be beneficial to fold this sooner in IR
for all targets.
https://rise4fun.com/Alive/vaiW
Name: sext add nsw
%add = add nsw i8 %i, C0
%ext = sext i8 %add to i32
%r = add i32 %ext, C1
=>
%s = sext i8 %i to i32
%r = add i32 %s, sext(C0)+C1
Name: zext add nuw
%add = add nuw i8 %i, C0
%ext = zext i8 %add to i16
%r = add i16 %ext, C1
=>
%s = zext i8 %i to i16
%r = add i16 %s, zext(C0)+C1
llvm-svn: 355118
Summary:
It is mentioned in the document of DTU that "It is illegal to submit any update that has already been submitted, i.e., you are supposed not to insert an existent edge or delete a nonexistent edge." It is dangerous to violet this rule because DomTree and PostDomTree occasionally crash on this scenario.
This patch fixes `MergeBlockIntoPredecessor`, making it conformant to this precondition.
Reviewers: kuhar, brzycki, chandlerc
Reviewed By: brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58444
llvm-svn: 355105
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
Summary:
I hadn't realized that instrumentation runs before inlining, so we can't
use the function as the comdat group. Doing so can create relocations
against discarded sections when references to discarded __profc_
variables are inlined into functions outside the function's comdat
group.
In the future, perhaps we should consider standardizing the comdat group
names that ELF and COFF use. It will save object file size, since
__profv_$sym won't appear in the symbol table again.
Reviewers: xur, vsk
Subscribers: eraman, hiraditya, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58737
llvm-svn: 355044
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of no-where, hence it will not insert phis needed
after inserting a Def.
However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.
But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this usecase does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave corectly given the scenario.
Resolves PR40749 and PR40754.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58652
llvm-svn: 355040
Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.
rdar://48142697
llvm-svn: 354931
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.
In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.
A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use
This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 354930
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.
Differential Revision: https://reviews.llvm.org/D58616
llvm-svn: 354790
add A, sext(B) --> sub A, zext(B)
We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.
The backend should be prepared for this change after:
D57401
rL353433
This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.
The seeming regression in the fuzzer test was discussed in:
D58359
We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.
llvm-svn: 354748
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D58096
llvm-svn: 354670
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.
Logic changes:
Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.
Interface changes:
The second semantic of `applyUpdates` is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.
Reviewers: kuhar, brzycki, dmgreen, grosser
Reviewed By: brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58170
llvm-svn: 354669
The correct edge being deleted is not to the unswitched exit block, but to the
original block before it was split. That's the key in the map, not the
value.
The insert is correct. The new edge is to the .split block.
The splitting turns OriginalBB into:
OriginalBB -> OriginalBB.split.
Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete
ParentBB->OriginalBB, not ParentBB->OriginalBB.split.
llvm-svn: 354656
Summary:
MemorySSA is not properly updated in LoopSimplifyCFG after recent changes. Use SplitBlock utility to resolve that and clear all updates once handleDeadExits is finished.
All updates that follow are removal of edges which are safe to handle via the removeEdge() API.
Also, deleting dead blocks is done correctly as is, i.e. delete from MemorySSA before updating the CFG and DT.
Reviewers: mkazantsev, rtereshin
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58524
llvm-svn: 354613