Functions using different constant expressions were incorrectly
merged, because a lot of state was missing from the comparison,
including the opcode, the comparison predicate, the GEP element
type, as well as the inbounds, inrange and nowrap poison flags.
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
Co-authored-by: Nikita Popov <github@npopov.com>
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
As part of RemoveDIs, we need instruction insertion to be done with
iterators rather than instruction pointers, so that we can communicate
some debug-info facts about the position. This patch is an entirely
mechanical replacement of Instruction * with BasicBlock::iterator, plus
using insertBefore to insert some instructions because we don't have
iterator-taking constructors yet.
Sadly it's not NFC because it causes dbg.value intrinsics / their
DPValue equivalents to shift location.
That patch was missing a recent change to the non-DPValue StoreInst overload.
Add it to the DPValue version. This is tested by
`llvm/tests/Transforms/Mem2Reg/dbg_declare_to_value_conversions.ll`,
which already has `--try-experimental-debuginfo-iterators`. It doesn't
fail currently because until #74090 lands the declares are not converted
to DPValues.
The tests will become "live" once #74090 lands (see for more info).
The intrinsic variants of these functions don't do anything to
dbg.declares so the non-instruction variants should ignore them too.
Tested in llvm/test/DebugInfo/duplicate_dbgvalue.ll, which has
`--try-experimental-debuginfo-iterators` added in #73504.
The tests will become "live" once #74090 lands (see for more info).
For the two remaining uses that did not explicitly specify it,
set UndefAllowed=false. In both cases, I believe that treating
undef as a full range is the correct behavior.
This simplifies an upcoming patch to support the RemoveDIs project (tracking
variable locations without using intrinsics).
Next in this series is #73500.
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison generating flags (nsw, nuw, exact, nneg, etc..)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expanded to reinfer
the flags on the CSE candidate if legal for all possible users.
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tenative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
Minor simplification applied to VFShape::getScalarShape,
VFShape::get, and VFABI::tryDemangleForVFABI methods.
Also, remove unnecessary `static_cast` in `SLPVectorizer.cpp`
Some final errors have turned up when doing stage2clang builds:
* We can insert before end(), which won't have an attached DPMarker,
thus we can have a nullptr DPMarker in Instruction::insertBefore. Add a
null check there.
* We need to use the iterator-returning form of getFirstNonPHI to ensure
we don't insert any PHIs behind debug-info at the start of a block.
The order of dbg.value intrinsics appearing in the output can affect the
order of tables in DWARF sections. This means that DPValues, our
dbg.value replacement, needs to obey the same ordering rules. For
dbg.values returned by findDbgUsers it's reverse order of creation (due
to how they're put on use-lists). Produce that order from findDbgUsers
for DPValues.
I've got a few IR files where the order of dbg.values flips, but it's a
fragile test -- ultimately it needs the number of times a DPValue is
handled by findDbgValues to be odd.
Avoid editing a range of DPValues and then remapping them. This occurs
when we try to de-duplicate dbg.values, but then re-use the same
iterator range. We can instead remap them, and then erase any
duplicates.
At the same time refactor the computation of seen-intrinsic hashes, and
account for a peculiarity of loop-rotates existing behaviour: it will
only deduplicate dbg.values that are immediately before the preheaders
branch instruction, not just any dbg.value in the preheader.
For example, this allows us to peel this loop with a `and`:
```
for (int i = 0; i < N; ++i) {
if (i % 2 == 0 && i < 3) // can peel based on || as well
f1();
f2();
```
into:
```
for (int i = 0; i < 3; ++i) { // peel three iterations
if (i % 2 == 0)
f1();
f2();
}
for (int i = 3; i < N; ++i)
f2();
```
If a suspend happens in the resume part (this can happen in the case of chained coroutines), and that's part of a loop, the pre-split CFG has the suspend block as an exit of that loop. PGO Counter Promotion will then try to commit the temporary counter to the global in that "exit" block (it also does that in the other loop exit BBs, which also includes
the "destroy" case). This interferes with symmetric transfer.
We don't need to commit the counter in the suspend case - it's not a loop exit from the perspective of the behavior of the program. The regular loop exit, together with the "destroy" case, completely cover any updates that may need to happen to the global counter.
This patch updates the last few places in LLVM using findDbgValues that
don't also collect and handle DPValue objects. This largely involves
instcombine and mem2reg changes, and are largely mechanical, calling
existing utilities on collections of DPValues instead of just
DbgValuesInsts.
A variety of tests have had RemoveDIs RUN lines added to them to cover
these behaviours. We have some technical debt of the instcombine sinking
code for DPValues not being implemented yet, so I've left FIXME stubs
indicating that we intend to cover tests with RemoveDIs but haven't yet.
CodeExtractor shifts dbg.value intrinsics out of the region being
extracted and updates them to be appropriate in the extracted function.
With new non-intrinsic variable locations, we need to manually do this
too, with DPValues.
Most of this patch shifts and refactors some utilities in
fixupDebugInfoPostExtraction so that we can add a single extra helper
lambda that iterates over DPValues and applies update-utilities. We also
have to assign the IsNewDbgInfoFormat flag in a bunch of places -- this
normally gets set the moment you insert a block into a function (or
function into a module), however a few blocks are constructed here
before being inserted, thus we have to do some manual setup.
Tested via LoopExtractor_alloca.ll, which invokes debugify.
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
Loop-rotate manually maintains dbg.value intrinsics -- it also needs to
manually maintain the replacement for dbg.value intrinsics, DPValue
objects. For the most part this patch adds parallel implementations
using the new type Some extra juggling is needed when loop-rotate hoists
loop-invariant instructions out of the loop: the DPValues attached to
such an instruction need to get rotated but not hoisted. Exercised by
the new test function invariant_hoist in dbgvalue.ll.
There's also a "don't insert duplicate debug intrinsics" facility in
LoopRotate. The value and correctness of this isn't clear, but to
continue preserving behaviour that's now tested in the "tak_dup"
function in dbgvalue.ll.
Other things in this patch include a helper DebugVariable constructor
for DPValues, a insertDebugValuesForPHIs handler for RemoveDIs
(exercised by the new tests), and beefing up the dbg.value checking in
dbgvalue.ll to ensure that each record is tested (and that there's an
implicit check-not).
With intrinsics representing debug-info, we just clone all the
intrinsics when inlining a function and don't think about it any
further. With non-instruction debug-info however we need to be a bit
more careful and manually move the debug-info from one place to another.
For the most part, this means keeping a "cursor" during block cloning of
where we last copied debug-info from, and performing debug-info copying
whenever we successfully clone another instruction.
There are several utilities in LLVM for doing this, all of which now
need to manually call cloneDebugInfo. The testing story for this is not
well covered as we could rely on normal instruction-cloning mechanisms
to do all the hard stuff. Thus, I've added a few tests to explicitly
test dbg.value behaviours, ahead of them becoming not-instructions.
This patch adds support for CloneBasicBlock duplicating the DPValues
attached to instructions, and adds facilities to remap them into their new
context. The plumbing to achieve this is fairly straightforwards and
mechanical.
I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch
and SimplifyCFG. The former only updates for the epilogue right now so I've
added CHECK lines just for the end of an unrolled loop (further updates
coming later). SimpleLoopUnswitch had no debug-info tests so I've added a
new one. The two modified parts of SimplifyCFG are covered by the two
modified SimplifyCFG tests.
These are scenarios where we have to do extra cloning for copying of
DPValues because they're no longer instructions, and remap them too.
The code in the CloneInstructionsIntoPredec... function modified by this
patch has a long history that dates back to 2011, see d715ec82b4ad12c59.
There, when folding branches, all dbg.value intrinsics seen when folding
would be saved and then re-inserted at the end of whatever was folded. Over
the last 12 years this behaviour has been preserved.
However, IMO it's bad behaviour. If we have:
inst1
dbg.value1
inst2
dbg.value2
And we fold that sequence into a different block, then we would want the
instructions and variable assignments to appear in the same order. However
because of this old behaviour, the dbg.values are sunk, and we get:
inst1
inst2
dbg.value1
dbg.value2
This clustering of dbg.values can make assignments to the same variable
invisible, as well as reducing the coverage of other assignments.
This patch relaxes the CloneInstructions... function and allows it to clone
and update dbg.values in-place, causing them to appear in the original
order in the destination block. I've added some extra dbg.values to the
updated test: without the changes to the pass, the dbg.values sink into a
blob ahead of the select. The RemoveDIs code can't cope with this right now
so I've removed the "--try..." flag, restored in a commit to land in a
couple of hours.
(Metadata changes to make the LLVM-IR parser not drop the debug-info for it
being out of date. The RemoveDIs related RUN line has been removed because
it was spuriously passing due to the debug-info being dropped).
We can determine the VF from a combination of the mangled name (which
indicates the arguments that take vectors) and the element sizes of
the arguments for the scalar function the mapping has been established
for.
The assert when demangling fails has been removed in favour of just
not adding the mapping, which prevents the crash seen in
https://github.com/llvm/llvm-project/issues/71892
This patch also stops using _LLVM_ as an ISA for scalable vector tests,
since there aren't defined rules for the way vector arguments should be
handled (e.g. packed vs. unpacked representation).
This patch makes jump-threading handle non-instruction debug-info stored
in DPValues in the same way that it updates dbg.values nowadays. This
involves re-targetting their operands as with dbg.values getting moved
from one block to another, and manually cloning them when duplicating
blocks. The SSAUpdater class also grows some functions for SSA-updating
DPValues in the same way as dbg.values.
All of this is largely covered by existing debug-info tests, except for
the cloning of DPValues attached to elidable instructions and branches,
where I've added a test to thread-debug-info.ll. Where previously we
could rely on dbg.values being copied and cloned as normal instructions
are, as we need to explicitly perform that operation now I've added some
explicit testing for it.
Loop deletion identifies dbg.value intrinsics in the loop, sets them to
undef/poison, and sinks them to the exit of the loop, to ensure that any
variable assignments that happen in a deleted loop are "optimised out".
This needs to be replicated for DPValues, the non-instruction
replacement for dbg.value intrinsics.
The movement API for DPValues is (deliberately) more limited than
dbg.values, which is tricky because inserting the collection of
dbg.values at an arbitrary iterator can insert a dbg.value in the middle
of a sequence of dbg.values. A big no-no for DPValues. This patch
replicates the order by inserting DPValues in reverse at the
head-iterator of the block, to ensure the same output as dbg.value mode.
Technically the order isn't important, but we're trying to ensure
identical outputs from optimisation passes right now.
Add more CHECK lines for dbg.values in diundef.ll to ensure that we
don't create any spurious dbg.values, and to ensure that sequences of
dbg.values come out of the optimisation in the correct order.
SCEV simplifying the subtraction may result in redundant compares that
are all OR'd together. Keep track of the generated operands in
SeenCompares, with the key being the pair of operands for the compare.
If we alrady generated the same compare previously, skip it.
LCSSA needs to manually update dbg.value intrinsic users of Values that
are having LCSSA PHIs inserted -- instrument it to do the same for
DPValues, the replacement for dbg.values.
This patch also contains an opportunistic fix replacing
instruction-insertion with iterator-insertion, necessary for
communicating debug-info in the future.
Instead of expanding the src/sink SCEV expressions and emitting an IR
sub to compute the difference, the subtraction can be directly be
performed by ScalarEvolution. This allows the subtraction to be
simplified by SCEV, which in turn can reduced the number of redundant
runtime check instructions generated.
It also allows to generate checks that are invariant w.r.t. an outer
loop, if he inner loop AddRecs have the same outer loop AddRec as start.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
THe freezes are introduced to avoid branch on undef/poison, if any of
the pointers may be poison. The same can be achieved by just freezing
the compare, which reduces the number of freezes needed. See
https://alive2.llvm.org/ce/z/NHa_ud
Note that the individual compares need to be frozen and it is not
sufficient to only freeze the resulting OR:
Result OR frozen only (UNSOUND): https://alive2.llvm.org/ce/z/YzFHQY
Individual conds frozen (SOUND): https://alive2.llvm.org/ce/z/5L6Z3f
This pass steps through a block forwards and backwards, identifying those
variable assignment records that are redundant, and erases them, saving us
a decent wedge of compile-time. This patch re-implements it to use the
replacement for DbgValueInsts, DPValues, in an almost identical way.
Alas the test I've added the try-remove-dis flag to is the only one I've
been able to find that manually runs this pass.
In present-day debug-info, when you delete all instructions, you delete
all their debug-info with it because debug-info is stored in
instructions. With debug-info stored in DPValue objects however,
deleting instructions causes DPValue objects to clump together into a
large blob of debug-info that hangs around in the block, as nothing has
explicitly deleted it.
To restore this behaviour, scatter calls to dropDbgValues around in
places that used to delete chunks of dbg.values, for example during
stripDebugInfo and in the code that deletes everything after an
Unreachable instruction. DCE is another example.
The tests with --try... added to them are new scenarios where we can now
correctly replicate the "normal" debug-info behaviour. Alas, there's no
explicit test for the opt -strip-debug option though (in dbg.value mode
or DPValue mode).
It might seem obvious, but it's not a good idea to convert a
debug-intrinsic instruction into an UnreachableInst, as this means
things operate differently with and without the -g option. However this
can happen due to the "mutate the next instruction" API calls we make.
With RemoveDIs eliminating debug intrinsics, this behaviour is at risk
of changing, hence this patch ensures we only ever mutate the next _non_
debuginfo instruction into an Unreachable.
The tests instrumented with the --try... flag all exercise this, I've
added some metadata to a SCCP test to ensure it's exercised.