Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current support for barrier intrinsics is confusing and
incomplete: multiple intrinsics map to the same instruction, and the
intrinsic names do not clearly convey their semantics. Further, we lack
support for some variants. This change unifies the IR representation
into a single, consistently named set of intrinsics:
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
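As a sketch of what the upgrader does (hypothetical call sites; the
surrounding function and declarations are elided):
```llvm
; legacy IR accepted by the auto-upgrader:
  call void @llvm.nvvm.barrier0()
  call void @llvm.nvvm.barrier(i32 1, i32 64)

; rewritten to the unified intrinsics:
  call void @llvm.nvvm.barrier.cta.sync.aligned.all(i32 0)
  call void @llvm.nvvm.barrier.cta.sync.aligned.count(i32 1, i32 64)
```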
When the same source BB targets the same destination BB via different
conditions/edges, such as switch cases, we should use
prob[SrcBB->SuccIndx] instead of prob[SrcBB->DstBB] to get the
probability.
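For example (hypothetical IR), a switch can have several edges into
one successor:
```llvm
  switch i32 %x, label %default [
    i32 0, label %common
    i32 1, label %common
  ]
```
The two edges into %common can carry different probabilities; indexing
by successor index keeps them distinct, while indexing by destination
block conflates them.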
In unreachable code, constant PHI nodes may appear and be replaced by their
single value. As a result, instructions may become self-referencing. This
commit adds checks to avoid going into infinite recursion when handling
self-referencing compare instructions in `evaluateOnPredecessorEdge()`.
This LLVM defect was identified via the AMD Fuzzing project.
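A minimal sketch of the pattern (hypothetical IR; such self-references
can only appear in unreachable code):
```llvm
dead:                               ; no predecessors, unreachable
  %cmp = icmp eq i1 %cmp, false     ; compare referencing itself
  br i1 %cmp, label %a, label %b
```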
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, the non-intrinsic format is the default and has been enabled
for more than a year, so there's no need for this flag to exist.
(In some places I've downgraded the flag from "try" to explicitly
requesting the non-intrinsic format, so that tests that are explicitly
about the non-intrinsic format can be dealt with in their own commit.)
Although an unreachable BB is skipped by processBlock, its successor can
still be handled by processBlock, and maybeMergeBasicBlockIntoOnlyPred
may merge the two BBs and delete the unreachable BB. Then a dangling
pointer is left in the Unreachable set. This patch avoids merging a BB
into an unreachable predecessor.
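A sketch of the shape involved (hypothetical IR):
```llvm
define void @f() {
entry:
  ret void

dead:                         ; unreachable, so processBlock skips it
  br label %dead.succ

dead.succ:                    ; %dead is the only predecessor; merging
  ret void                    ; this block into %dead deletes %dead
}
```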
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction if it does not move. In that case, the
program would already be UB if the aliasing property encoded in the
metadata did not hold. This makes use of the clarification that
aliasing metadata implies UB if the property does not hold: #116220
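As a sketch (hypothetical IR) of a case where the metadata can be kept:
```llvm
  %v1 = load i32, ptr %p, !alias.scope !1, !noalias !2
  ; ...
  %v2 = load i32, ptr %p
; When %v2 is replaced by %v1 and %v1 does not move, its !alias.scope
; and !noalias can be kept: if the encoded property did not hold, the
; program was already UB at %v1.
```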
Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.
PR: https://github.com/llvm/llvm-project/pull/117716
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
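For reference, the textual change in IR output is mechanical; roughly:
```llvm
; old intrinsic form:
  call void @llvm.dbg.value(metadata i32 %x, metadata !10, metadata !DIExpression()), !dbg !14

; new record form:
    #dbg_value(i32 %x, !10, !DIExpression(), !14)
```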
Use ICmpInst::compare() where possible, and
ConstantFoldCompareInstOperands in other places. This only changes
places where either the fold is guaranteed to succeed, or the code
doesn't use the resulting compare if we fail to fold.
Fixes #76609
This patch does two things:
- relax the phi constraint in `CanRedirectPredsOfEmptyBBToSucc`
- guarantee the BB has multiple different predecessors to redirect, so
that we can handle the case without phis in the BB. Without this
change, relaxing the phi constraint alone could let us redirect the
CommonPred.
The motivation is consistent with JumpThreading: we always want the
branch to jump more directly to the destination, without passing
through the middle block. In this way, we can expose more optimization
opportunities.
An obvious example proposed by @dtcxzyw:
```llvm
define i32 @test(...) {
entry:
br i1 %c, label %do.end, label %if.then
if.then: ; preds = %entry
%call2 = call i32 @dummy()
%tobool3.not = icmp eq i32 %call2, 0
br i1 %tobool3.not, label %do.end, label %return
do.end: ; preds = %entry, %if.then
br label %return
return: ; preds = %if.then, %do.end
%retval.0 = phi i32 [ 0, %do.end ], [ %call2, %if.then ]
ret i32 %retval.0
}
```
`entry` can jump directly to `return` without passing through
`do.end`, and then the if-else pattern can be simplified further:
```llvm
define i32 @test(...) {
entry:
br i1 %c, label %return, label %if.then
if.then: ; preds = %entry
%call2 = call i32 @dummy()
br label %return
return: ; preds = %entry, %if.then
%retval.0 = phi i32 [ 0, %entry ], [ %call2, %if.then ]
ret i32 %retval.0
}
```
JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.
Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.
Fixes https://github.com/llvm/llvm-project/issues/79175.
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
With intrinsics representing debug-info, we just clone all the
intrinsics when inlining a function and don't think about it any
further. With non-instruction debug-info however we need to be a bit
more careful and manually move the debug-info from one place to another.
For the most part, this means keeping a "cursor" during block cloning of
where we last copied debug-info from, and performing debug-info copying
whenever we successfully clone another instruction.
There are several utilities in LLVM for doing this, all of which now
need to manually call cloneDebugInfo. The testing story for this is
not well covered, as previously we could rely on the normal
instruction-cloning mechanisms to do all the hard work. Thus, I've
added a few tests to explicitly test dbg.value behaviours, ahead of
them becoming not-instructions.
This patch makes jump-threading handle non-instruction debug-info
stored in DPValues the same way it currently updates dbg.values. This
involves re-targeting their operands when they are moved from one
block to another, as is done for dbg.values, and manually cloning them
when duplicating blocks. The SSAUpdater class also grows some
functions for SSA-updating DPValues in the same way as dbg.values.
All of this is largely covered by existing debug-info tests, except
for the cloning of DPValues attached to elidable instructions and
branches, where I've added a test to thread-debug-info.ll. Previously
we could rely on dbg.values being copied and cloned as normal
instructions are; now that we need to perform that operation
explicitly, I've added some explicit testing for it.
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
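For example (a sketch), a constant expression like the following must
now be expressed with instructions:
```llvm
@g = global i16 0

; no longer valid as a constant expression operand:
;   store i32 zext (i16 ptrtoint (ptr @g to i16) to i32), ptr %p
; now expressed via an instruction:
  %z = zext i16 ptrtoint (ptr @g to i16) to i32
  store i32 %z, ptr %p
```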
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
When evaluating comparisons in predecessors, phi operands are translated
into the predecessor. If the translation is across a backedge, this
means that the two operands of the icmp will be from two different loop
iterations, resulting in incorrect simplification.
Fix this by not performing the phi translation for phis in loop headers.
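A sketch of the hazard (hypothetical IR):
```llvm
loop:                                   ; loop header
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add i32 %iv, 1
  %cmp = icmp eq i32 %iv, %iv.next     ; always false
  br i1 %cmp, label %exit, label %loop
```
Translating %cmp across the backedge substitutes %iv.next for %iv,
yielding icmp eq %iv.next, %iv.next, which simplifies to true even
though the two operands now come from different loop iterations.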
Note: This is not a complete fix. If the
jump-threading-across-loop-headers option is enabled, the LoopHeaders
variable does not get populated. Additional changes will be needed to
fix that case.
Related to https://github.com/llvm/llvm-project/issues/70651.
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64-bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that we:
* Avoid big numbers close to UINT64_MAX, so users do not overflow/saturate when adding multiple frequencies together or when multiplying frequencies with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between the hottest and coldest blocks as much as possible to increase precision.
* If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end differentiable for hot blocks.
This reverts commit fc86d031fec5e47c6811efd3a871742ad244afdd.
This change broke the LLVM buildbot clang-aarch64-sve-vls-2stage:
https://lab.llvm.org/buildbot/#/builders/176/builds/5474
I am reverting this patch as the bot has been failing for more than a day without a fix.
isLegalToHoistInto() currently returns true for callbr instructions.
That means that a callbr with one successor will be considered a
proper loop preheader, which may result in instructions that use
the callbr return value being hoisted past it.
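A sketch of the problem (hypothetical IR):
```llvm
define void @f() {
entry:
  ; callbr with a single successor; %entry looked like a valid preheader
  %v = callbr i32 asm "", "=r"()
          to label %loop []

loop:
  %use = add i32 %v, 1   ; hoisting this into %entry would place it
                         ; before the callbr that defines %v
  br label %loop
}
```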
Fix this by adding callbr to isExceptionTerminator (with a rename
to isSpecialTerminator), which also fixes similar assumptions in
other places.
Fixes https://github.com/llvm/llvm-project/issues/64215.
Differential Revision: https://reviews.llvm.org/D158609
This is basically a copy-and-paste of the logic we have in
`computeKnownBits`, adapted for `isKnownNonZero`.
Differential Revision: https://reviews.llvm.org/D157801
Conservatively return unknown in this degenerate case. This is
hard to hit in practice, because such phis are usually optimized
away before they reach a getObjectSize() call.
Fixes https://github.com/llvm/llvm-project/issues/63013.
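One possible shape of such a degenerate phi (hypothetical IR, only
legal in unreachable code):
```llvm
dead:                                   ; unreachable self-loop
  %p = phi ptr [ %p, %dead ]
  %sz = call i64 @llvm.objectsize.i64.p0(ptr %p, i1 false, i1 true, i1 false)
  br label %dead
```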
I removed the conflict check from computeKnownBitsFromShiftOperator()
in D150648 assuming that this is now handled on the KnownBits side.
However, the nsw handling is still inside ValueTracking, so we
still need to handle conflicts there. Restore the check closer to
where it is relevant.
Fixes https://github.com/llvm/llvm-project/issues/62908.
Since D141386 changed the semantics of a violated !range from
immediate UB to poison, the !range metadata shouldn't be preserved even
if K dominates J.
If this patch is accepted, I plan to adjust the !nonnull metadata as
well.
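For instance (a sketch of the K/J case above):
```llvm
  %K = load i32, ptr %p, !range !0      ; K dominates J
  %J = load i32, ptr %p, !range !1
; Even though K dominates J, !range is no longer preserved on the
; combined instruction, because a violated !range now yields poison
; rather than immediate UB.

!0 = !{i32 0, i32 42}
!1 = !{i32 0, i32 10}
```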
BTW, I found that the !noundef metadata is not handled in
combineMetadata; is this intentional?
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D142687
Currently, JT creates and updates local instances of BPI/BFI. As a result, the global ones have to be invalidated if JT made any changes.
In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (besides the cases when it updates profile metadata), this is just a waste of time.
This patch is a rework of D124439. D124439 made one step and replaced the local copies with global ones retrieved through the analysis manager. Here we go one step further and don't create BPI/BFI if the only reason for creating them is to keep them up to date. The overall logic is the following: if there is a cached BPI/BFI, update it along with the transformations; if there is no existing BPI/BFI, create it only if it is required to update profile metadata.
Please note that if BPI/BFI exists on exit from JT (either cached or created), it is always up to date and there is no reason to invalidate it.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D136827