It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Fixes issue #61981 by adjusting variable location offsets (in the DIExpression)
when splitting allocas.
Patch [4/4] to fix structured bindings in SROA.
NOTE: There's still a bug in mem2reg which generates incorrect locations in some
situations: if the variable fragment has an offset into the new (split) alloca,
mem2reg will fail to convert that into a bit shift (the location contains a
garbage offset). That's not addressed here.
insertNewDbgInst - Now takes the address-expression and FragmentInfo as
separate parameters because unlike dbg_declares dbg_assigns want those to go
to different places. dbg_assign records put the variable fragment info in the
value expression only (whereas dbg_declare has only one expression so puts it
there - ideally this information wouldn't live in DIExpression, but that's
another issue).
MigrateOne - Modified to correctly compute the necessary offsets and fragment
adjustments. The previous implementation produced bogus locations for variables
with non-zero offsets. The changes replace most of the body of this lambda, so
it might be easier to review in a split-diff view and focus on the change as a
whole than to compare it to the old implementation.
This uses calculateFragmentIntersect and extractLeadingOffset added in previous
patches in this series, and createOrReplaceFragment described below.
createOrReplaceFragment - Similar to DIExpression::createFragmentExpression
except for 3 important distinctions:
1. The new fragment isn't relative to an existing fragment.
2. There are no checks on the the operation types because it is assumed
the location this expression is computing is not implicit (i.e., it's
always safe to create a fragment because arithmetic operations apply
to the address computation, not to an implicit value computation).
3. Existing extract_bits are modified independetly of fragment changes
using \p BitExtractOffset. A change to the fragment offset or size
may affect a bit extract. But a bit extract offset can change
independently of the fragment dimensions.
Returns the new expression, or nullptr if one couldn't be created. Ideally
this is only used to signal that a bit-extract has become zero-sized (and thus
the new debug record has no size and can be dropped), however, it fails for
other reasons too - see the FIXME below.
FIXME: To keep the scope of this change focused on non-bitfield structured
bindings the function bails in situations that
DIExpression::createFragmentExpression fails. E.g. when fragment and bit
extract sizes differ. These limitations can be removed in the future.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
This patch simplifies instruction creation by replacing all overloads of
instruction constructors/Create methods that are identical other than
the Instruction *InsertBefore/BasicBlock *InsertAtEnd/BasicBlock::iterator
InsertBefore argument with a single version that takes an InsertPosition
argument. The InsertPosition class can be implicitly constructed from
any of the above, internally converting them to the appropriate
BasicBlock::iterator value which can then be used to insert the
instruction (or to not insert it if an invalid iterator is passed).
The upshot of this is that code will be deduplicated, and all callsites
will switch to calling the new unified version without any changes
needed to make the compiler happy. There is at least one exception to
this; the construction of InsertPosition is a user-defined conversion,
so any caller that was already relying on a different user-defined
conversion won't work. In all of LLVM and Clang this happens exactly
once: at clang/lib/CodeGen/CGExpr.cpp:123 we try to construct an alloca
with an AssertingVH<Instruction> argument, which must now be cast to an
Instruction* by using `&*`. If this is more common elsewhere, it could
be fixed by adding an appropriate constructor to InsertPosition.
This option was added in 3b79b2ab4e35353e63ba323a3de4b0a70c61a5f1 with
the comment:
> I had SROA implemented this way a long time ago and due to the
> overwhelming bugs that surfaced, moved to a much more relaxed
> variant. Richard Smith would like to understand the magnitude
> of this problem and it seems fairly harmless to keep some
> flag-controlled logic to get the extremely strict behavior here.
> I'll remove it if it doesn't prove useful.
As far as I know, it did not prove useful, so I'm removing it now.
With constant GEPs canonicalized to i8, GEPs that only temporarily go
out of bounds during the offset calculation do not naturally occur
anymore anyway.
Found this while trying to build a LLVM toolchain reproducibly from both
Debian 12 and FreeBSD 14. With these changes they come out bit-by-bit
identical.
Previously there was a mix of stable and unstable sorts for slices, now
only stable sorts are used.
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.
This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
Have DIBuilder conditionally insert either debug intrinsics or DbgRecord
depending on the module's IsNewDbgInfoFormat flag. The insertion methods
now return a `DbgInstPtr` (a `PointerUnion<Instruction *, DbgRecord
*>`).
Add a unittest for both modes (I couldn't find an existing test testing
insertion behaviours specifically).
This patch changes the existing assumption that DbgRecords are only ever
inserted if there's an instruction to insert-before because clang
currently inserts debug intrinsics while CodeGening (like any other
instruction) meaning it'll try inserting to the end of a block without a
terminator. We already have machinery in place to maintain the
DbgRecords when a terminator is removed - these become "trailing
DbgRecords" which are re-attached when a new instruction is inserted.
All I've done is allow this state to occur while inserting DbgRecords
too, i.e., it's not only removing terminators that causes this valid
transient state, but inserting DbgRecords into incomplete blocks too.
The C API will be updated in follow up patches.
---
Note: this doesn't mean clang is emitting DbgRecords yet, because the
modules it creates are still always in the old debug mode. That will
come in a future patch.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
If a gep has only one phi as one of its operands and the remaining
indexes are constant, we can unfold `gep ptr, (phi idx1, idx2)` to `phi
((gep ptr, idx1), (gep ptr, idx2))`.
Take care not to unfold recursive phis.
Followup to #80983.
This was initially was #83087. Initial PR did not handle allocas in
entry block that weren't at the beginning of the function, causing GEPs
to be inserted after the first chunk of allocas but potentially before
an alloca not at the beginning. Insert GEPs at the end of the entry
block instead since constants/arguments/static allocas can all be used
there.
This reverts commit 2eb63982e88b9ed8336158d35884b1a1d04a0f78.
This caused verifier error
```
Instruction does not dominate all uses!
```
for some projects using Halide.
The verifier error happens inside `Halide::Internal::CodeGen_LLVM::optimize_module`
and looks like a genuine SROA issue.
If a gep has only one phi as one of its operands and the remaining
indexes are constant, we can unfold `gep ptr, (phi idx1, idx2)` to `phi
((gep ptr, idx1), (gep ptr, idx2))`.
Take care not to unfold recursive phis.
Followup to #80983.
If a split memory access introduced by SROA accesses precisely a single
field of the original operation's !tbaa.struct, use the !tbaa tag for
the accessed field directly instead of the full !tbaa.struct.
InstCombine already had a similar logic.
Motivation for this and follow-on patches is to improve codegen for
libc++, where using memcpy limits optimizations, like vectorization for
code iteration over std::vector<std::complex<float>>:
https://godbolt.org/z/f3vqYos3c
Depends on https://github.com/llvm/llvm-project/pull/81285.
SROA currently supports converting a gep of select into select of gep if
the select is in the pointer operand. This patch expands support to
selects in an index operand.
This is intended to address the regression reported in
https://github.com/llvm/llvm-project/pull/68882#issuecomment-1924909922.
With this, I get a clean test suite running under RemoveDIs, the
non-intrinsic representation of debug-info, including under asan. We've
previously established that we generate identical binaries for some
large projects, so this i just edge-case cleanup. The changes:
* CodeGenPrepare fixups need to apply to dbg.assigns as well as
dbg.values (a dbg.assign is a dbg.value).
* Pin a test for constant-deletion to intrinsic debug-info: this very
rare scenario uses a different kill-location sigil in dbg.value mode to
RemoveDIs mode, which generates spurious test differences.
* Suppress a memory leak in a unit test: the code for dealing with
trailing debug-info in a block is necessarily fiddly, leading to this
leak when testing it. Developer-facing interfaces for moving
instructions around always deal with this behind the scenes.
* SROA, when replacing some vector-loads, needs to insert the
replacement loads ahead of any debug-info records so that their values
remain dominated by a definition. Set the head-bit indicating our
insertion should come before debug-info.
f9c2a341b9
causes regressions when we have a slice with integer vector type that is
the same size as the partition, and a ptr load/store slice that is not
the size of the element type.
Ref `vector-promotion.ll:ptrLoadStoreTys`.
Before the patch, we would only consider `<4 x i32>` as a candidate type
for vector promotion, and would find that it is a viable type for all
the slices.
After the patch, we now add `<2 x ptr>` as a candidate type due to slice
with user `store ptr %val0, ptr %obj, align 8` -- and flag that we
`HaveVecPtrTy`. The pre-existing behavior of this flag results in
removing the viable `<4 x i32>` and keeping only the unviable `<2 x
ptr>`, which results in a failure to promote.
The end result is failing to promote an alloca that was previously
promoted -- this does not appear to be the intent of that patch, which
has the goal of increasing promotions by providing more promotion
opportunities.
This PR preserves this behavior via a simple reorganization of the
implemention: try first the slice types with same size as the partition,
then, if there is no promotable type, try the `LoadStoreTys.`
SROA needs to update llvm.dbg.assign intrinsics when it migrates debug
info in response to alloca splitting; this patch updates the debug info
migration code to handle DPVAssigns as well, making use of generic code
to avoid duplication as much as possible.
This patch follows on from comments on
https://github.com/llvm/llvm-project/pull/73498, implementing the
proposed split of findDbgDeclares into two separate functions for
DbgDeclareInsts and DPVDeclares, which return containers rather than
taking containers by reference.
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Handle dbg.declares in SROA using DPValues.
In order to reduce duplication, the migrate-debug-info loop has been changed
to a generic lambda with some helper function overloads, which is called
for dbg.declares, dbg.assigns, and DPValues alike.
The tests will become "live" once #74090 lands (see for more info).
This simplifies an upcoming patch to support the RemoveDIs project (tracking
variable locations without using intrinsics).
Next in this series is #73500.
NFC cleanup towards removing method Type::getPointerTo.
* Remove unnecessary call to Type::getPointerTo
* Replace call to Type::getPointerTo with IRB.getPtrTy
This moves the SROA implementation from SROAPass into a separate SROA
class that is defined in the cpp file, and reduces the SROAPass class to
a thin NewPM wrapper. This allows to remove all implementation details
from the SROA header, and the SROALegacyPass can wrap the SROA class
instead of the NewPM SROAPass.
The trigger for this change is a GCC warning about visibility of
implementation details in the SROA header after D138238. Credits to
Nikita Popov for suggesting this reorganization.
Let `llvm.launder.invariant.group` intrinsic as well as instructions
operating on memory addresses, whose invariance may be broken by the
intrinsic, to be rewritten.
Fixes: https://github.com/llvm/llvm-project/issues/72035.
For volatile atomic, this may result in a verifier errors, if the
new alloca type is not legal for atomic accesses.
I've opted to disable this special case for volatile accesses in
general, as changing the size of the volatile access seems
dubious in any case.
Fixes https://github.com/llvm/llvm-project/issues/64721.
Unlike the load case, stores past the end of the alloca are
removed by SROA as undefined behavior. As such, there is no need
to handle this case when rewriting stores.
This patch adds a hidden CLI option "--sroa-max-alloca-slices", which is
an integer that controls the maximum number of alloca slices SROA can
consider before bailing out. This is useful because it may not be
profitable to split memcpys into (possibly tens of) thousands of loads/stores.
This also prevents an issue with exponential compile time explosion in passes
like DSE and MemCpyOpt caused by excessive alloca splitting.
Fixes https://github.com/rust-lang/rust/issues/88580.
Differential Revision: https://reviews.llvm.org/D159354
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, similar to D85725.
Fix https://gcc.godbolt.org/z/o561P9zj4
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158631