The CMake flag has been on by default for a month without any issues.
This makes the feature support in LLVM unconditional (but does not
enable the feature by default).
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or two more commits.
Patch 1/4 of a series adding bitcode support.
Store whether or not a function is using Key Instructions in its DISubprogram so
that we don't need to rely on the -mllvm flag -dwarf-use-key-instructions to
determine whether or not to interpret Key Instructions metadata to decide
is_stmt placement at DWARF emission time. This makes bitcode support simple and
enables well-defined mixing of non-key-instructions and key-instructions
functions in an LTO context.
This patch adds the bit (using DISubprogram::SubclassData1).
PRs 144104 and 144103 use it during DWARF emission. PR 144102 adds
bitcode support.
See pull request for overview of alternative attempts.
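For illustration, a minimal sketch of the emission-time check this enables; the accessor name is an assumption, not necessarily the final API:
```
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Decide at DWARF-emission time whether to interpret Key Instructions
// metadata, keying off the bit stored on the DISubprogram rather than
// the -mllvm flag.
static bool shouldUseKeyInstructions(const Function &F) {
  if (const DISubprogram *SP = F.getSubprogram())
    return SP->getKeyInstructionsEnabled(); // assumed accessor name
  return false;
}
```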
Of the 128 bits of a buffer descriptor, only 48 bits are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.
Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.
Reviewed By: krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139419
This commit introduces a new unit test that covers a block address
cloning bug. Specifically, this test covers the bug tracked in
http://github.com/llvm/llvm-project/issues/47769, which has since been
resolved.
Migrate their usage to the `AnyMem*Inst` family, and add an isAtomic()
query on the base class for that hierarchy. This matches the idioms we
use for e.g. isAtomic() on load, store, etc. instructions, the existing
isVolatile() idioms on mem* routines, and allows us to more easily share
code between atomic and non-atomic variants.
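As a minimal sketch of the idiom this enables (assuming the new isAtomic() on the AnyMem* base class; handler logic elided):
```
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// One code path covers memcpy/memmove and their element-wise atomic
// variants; the atomic-specific handling is gated on the new query.
static void visitMemTransfer(Instruction &I) {
  if (auto *MT = dyn_cast<AnyMemTransferInst>(&I)) {
    if (MT->isAtomic()) {
      // ...handling specific to the atomic variants...
    }
    // ...handling shared by both variants...
  }
}
```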
As with #138568, the goal here is to simplify the class hierarchy and
make it easier to reason about. I'm moving from easiest to hardest, and
will stop at some point when I hit "good enough". Longer term, I'd sorta
like to merge or reverse the naming on the plain Mem*Inst and the
AnyMem*Inst, but that's a much larger and more risky change. Not sure
I'm going to actually do that.
Add:
mapAtomInstance - map the atom group number to a new group.
RemapSourceAtom - apply the mapped atom group number to this instruction.
Modify:
CloneBasicBlock - Call mapAtomInstance on cloned instruction's DebugLocs
if MapAtoms is true (default). Setting to false could
lead to a degraded debugging experience. See code comment.
Optimisations like loop unroll that duplicate instructions need to remap source
atom groups so that each duplicated source construct instance is considered
distinct when determining is_stmt locations.
This commit adds the remapping functionality and a unittest.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
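As a sketch of how a duplicating transform picks this up (CloneBasicBlock is the existing API; the MapAtoms behaviour is as described above):
```
#include "llvm/Transforms/Utils/Cloning.h"
using namespace llvm;

// Duplicating a block during unrolling: with MapAtoms left at its default
// (true), CloneBasicBlock remaps the atom groups on the cloned
// instructions' DebugLocs, so each duplicate counts as a distinct source
// construct instance when is_stmt locations are determined.
static BasicBlock *cloneForUnroll(BasicBlock *BB) {
  ValueToValueMapTy VMap;
  return CloneBasicBlock(BB, VMap, ".unrolled", BB->getParent());
}
```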
d81d9e8b8604c85709de0a22bb8dd672a28f0401 changed SplitEdge() to make use
of ehAwareSplitEdge() for critical edges where the target is an eh pad.
However, the implementation is incorrect at least for landing pads. What
is currently produced for the code in the modified unit test is
something like this:
continue:
invoke void @sink()
to label %normal unwind label %new_bb
new_bb:
%cp = cleanuppad within %exception []
cleanupret from %cp unwind label %exception
exception:
%cleanup = landingpad i8 cleanup
br label %trivial-eh-handler
This mixes different exception handling mechanisms in a nonsensical way,
and is not well-formed IR. To actually "split" the landingpad edge (for
a rather loose definition of "split"), I think we'd have to generate
something along these lines:
exception.split:
%cleanup.split = landingpad i8 cleanup
br label %exception.cont
exception:
%cleanup.orig = landingpad i8 cleanup
br label %exception.cont
exception.cont:
%cleanup = phi i8 [ %cleanup.split, %exception.split ], [ %cleanup.orig,
%exception ]
I didn't bother actually implementing that, seeing as how nobody noticed
the existing codegen being broken in the last four years, so clearly
nobody actually needs this function to work with EH edges. Just return
nullptr instead.
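For callers, the practical consequence looks roughly like this (a sketch against the existing SplitEdge signature):
```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

// SplitEdge now returns nullptr rather than producing ill-formed IR when
// asked to split a critical edge whose target is a landingpad, so callers
// have to tolerate failure.
static bool trySplitEdge(BasicBlock *From, BasicBlock *To,
                         DominatorTree &DT, LoopInfo &LI) {
  if (BasicBlock *NewBB = SplitEdge(From, To, &DT, &LI)) {
    (void)NewBB; // ...continue the transform in the split block...
    return true;
  }
  return false; // unsplittable EH edge: bail out of the transform
}
```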
If we use `CodeExtractor` to extract block1 of the following function
into a new function,
```
define void @foo() !dbg !2 {
entry:
%1 = alloca i32, i64 1, align 4
%2 = alloca i32, i64 1, align 4
#dbg_declare(ptr %1, !8, !DIExpression(), !1)
br label %block1
block1:
store i32 1, ptr %1, align 4
store i32 2, ptr %2, align 4
#dbg_declare(ptr %2, !10, !DIExpression(), !1)
ret void
}
```
the extracted function will look like the one shown below (with some
irrelevant details removed).
```
define internal void @extracted(ptr %arg0, ptr %arg1) {
newFuncRoot:
br label %block1
block1:
store i32 1, ptr %arg0, align 4
store i32 2, ptr %arg1, align 4
ret void
}
```
You will notice that it has replaced the uses of values from the parent
function (%1 and %2) with the arguments of the new function. But it did
not do the same for the `#dbg_declare`, which was simply dropped because
its location pointed to a value outside of the new function. Similarly,
%arg0 has no debug record, although the value it replaced had one and we
could materialize one for it based on that.
This is not just a theoretical limitation. `CodeExtractor` is used to
create functions that implement many of the `OpenMP` constructs in
`OMPIRBuilder`. As a result of these limitations, the debug information
is missing from the created functions.
This PR tries to address this problem. It iterates over the inputs of the
extracted function and looks at their debug uses. If they are present
in the new function, it updates their location. Otherwise it materializes
a similar use in the new function.
Most of these changes are localized in `fixupDebugInfoPostExtraction`.
The only other change is to propagate the function inputs and the
replacement values to it.
---------
Co-authored-by: Tim Gymnich <tim@gymni.ch>
Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
The "preserve input debug-info format" flag allowed some tooling to opt
into not seeing the new debug records yet, and to not autoupgrade. This
was good at the time, but is unnecessary now that we'll be ditching
intrinsics shortly.
It also hides errors now: verify-uselistorder was hardcoding this flag
to on, and as a result it hasn't seen debug records before. Thus, we
missed a uselistorder variation: constant-expressions such as GEPs can
be contained within debug records and completely isolated from the value
hierarchy; see the metadata-use-uselistorder.ll test. These Values didn't
get ordered, but were legitimate uses of constants like "i64 0", and we
now run into difficulty handling that. The patch to AsmWriter now seeks
out Values to order even through debug-info.
Finally there are a few intrinsics-tests relying on this flag that we
can just delete, such as one in llvm-reduce and a few more in the
LocalTest unit tests. The fast-isel test was added in
https://reviews.llvm.org/D67703 explicitly to check the size of blocks
without debug-info, and in 1525abb9c94 the codepath it tests moved
towards being sunsetted. It'll be totally redundant once RemoveDIs is on
permanently.
Note that there's now no explicit test for the textual-IR autoupgrade
path. I submit that we can rely on the thousands of .ll files where
we've only bothered to update the outputs, not the inputs, to debug
records.
This PR fixes incorrect LCSSA PHI node generation when splitting critical
edges with both `PreserveLCSSA` and `MergeIdenticalEdges` enabled. The
bug caused PHI nodes in the split block to miss predecessors when
multiple identical edges were merged.
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual IR could be independent
of how the debug-info was represented inside a module, and whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.
We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.
There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Take
print-non-instruction-debug-info.ll: the test suite now checks for
debug records in all tests, and we don't want to check that we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.
A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing that intrinsics-in-memory get
salvaged; this isn't necessary now.
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values; dbg-records take the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent codegen-change issues that
get fixed by RemoveDIs.
Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
The previous implementation incorrectly calculated incoming values from
loop backedges, as demonstrated by the tests. The issue was that it did
not distinguish between live-in and live-out values for blocks. This
patch addresses the problem and fixes
https://github.com/llvm/llvm-project/pull/131761.
To avoid bloating storage in `R.Defines`, processing data has been moved
to a temporary map `BBInfos`. This change helps manage heap allocation
more efficiently and likely improves caching.
These tests demonstrate the issue in SSAUpdaterBulk when it calculates
incoming values from loop backedges.
The failures are marked with `EXPECT_NONFATAL_FAILURE`, which is the way
to designate an "expected fail" in the Google Test suite.
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare values between contextual profiles, and/or with flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as a multiplier for the counter values collected under that root.
We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:
* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
The profile format now has a separate section called "Contexts"; there will be a corresponding one for flat profiles. The root has a separate tag because, unlike all the other context nodes under it, it does not have a callsite ID, and it will gain additional fields in subsequent patches.
The rest of this patch amounts to a few refactorings in the reader/writer (for better reuse later) and test fixups.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or, with a deprecation warning, an Instruction * or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
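A small usage sketch under the updated methods (variable names invented):
```
#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// An iterator now flows straight into DIBuilder, with InsertPosition
// constructed implicitly from a BasicBlock::iterator.
static void declareVar(DIBuilder &DIB, AllocaInst *Alloca,
                       DILocalVariable *Var, DILocation *Loc) {
  BasicBlock::iterator InsertPt =
      Alloca->getParent()->getFirstInsertionPt();
  DIB.insertDeclare(Alloca, Var, DIB.createExpression(), Loc, InsertPt);
}
```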
When creating `EnumDecl`s from DWARF for Objective-C `NS_ENUM`s, the
Swift compiler tries to figure out if it should perform "swiftification"
of that enum (which involves renaming the enumerator cases, etc.). The
heuristic by which it determines whether to swiftify an enum is to check
the `enum_extensibility` attribute (because that's pretty much what
`NS_ENUM`s are). Currently LLDB fails to attach the
`EnumExtensibilityAttr` to `EnumDecl`s it creates (because there's not
enough info in DWARF to derive it), which means we have to fall back to
re-building Swift modules on-the-fly, slowing down expression evaluation
substantially. This happens around
4b3931c8ce/lib/ClangImporter/ImportEnumInfo.cpp (L37-L59)
To speed up Swift expression evaluation, this patch proposes encoding the
C/C++/Objective-C `enum_extensibility` attribute in DWARF via a new
`DW_AT_APPLE_enum_kind` attribute. This would currently only be used by
the LLDB Swift plugin, but it may be of interest to other language
plugins as well
(though I haven't come up with a concrete use-case for it outside of
Swift).
I'm open to naming suggestions for the various new attributes/attribute
constants proposed here. I tried to be as generic as possible in case we
want to extend it to other kinds of enum properties (e.g., flag
enums).
The new attribute would look as follows:
```
DW_TAG_enumeration_type
  DW_AT_type            (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Closed)
  DW_AT_name            ("ClosedEnum")
  DW_AT_byte_size       (0x04)
  DW_AT_decl_file       ("enum.c")
  DW_AT_decl_line       (23)

DW_TAG_enumeration_type
  DW_AT_type            (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Open)
  DW_AT_name            ("OpenEnum")
  DW_AT_byte_size       (0x04)
  DW_AT_decl_file       ("enum.c")
  DW_AT_decl_line       (27)
```
Absence of the attribute means the extensibility of the enum is unknown
and abides by whatever the language rules of that CU dictate.
This does feel like a big hammer for quite a specific use-case, so I'm
happy to discuss alternatives.
Alternatives considered:
* Re-using an existing DWARF attribute to express extensibility. E.g., a
`DW_TAG_enumeration_type` could have a `DW_AT_count` or
`DW_AT_upper_bound` indicating the number of enumerators, which could
imply closed-ness. I felt like a dedicated attribute (which could be
generalized further) seemed more applicable. But I'm open to re-using
existing attributes.
* Encoding the entire attribute string (i.e., `DW_TAG_LLVM_annotation
("enum_extensibility((open))")`) on the `DW_TAG_enumeration_type`. Then
in LLDB somehow parse that out into a `EnumExtensibilityAttr`. I haven't
found a great API in Clang to parse arbitrary strings into AST nodes
(the ones I've found required fully formed C++ constructs). Though if
someone knows of a good way to do this, happy to consider that too.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`
(see the sketch after this list).
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
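Touching on the helper-API bullet above, a minimal sketch of what stays stable for consumers:
```
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Helper queries keep working across the upgrade: an argument spelled
// `nocapture` in old IR reads back as captures(none) and still
// satisfies doesNotCapture().
static bool isNonCapturingArg(const CallBase &CB, unsigned ArgNo) {
  return CB.doesNotCapture(ArgNo);
}
```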
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.
This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.
PR: https://github.com/llvm/llvm-project/pull/124462
This is a follow-up to PR #122545, which enabled converting yaml to contextual profiles.
This change uses the lower level yaml APIs because:
- the mapping APIs that `llvm::yaml` offers don't work with `const` values, because they (the APIs) are designed to support both serialization and deserialization
- building a helper data structure would be an alternative, but it'd either be memory-consuming or an overly complex design, given the recursive nature of contextual profiles.
We have a textual representation of contextual profiles, mainly for test scenarios. This patch moves that from JSON to YAML. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parseable by the YAML reader.
A subsequent patch will address deserialization.
(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
Add an extra knob to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.
PR: https://github.com/llvm/llvm-project/pull/118316
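For illustration, a sketch of a backend tuning the new knob; the field is as described in this patch, and the budget values here are invented:
```
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Called from a target's getUnrollingPreferences hook: cap how costly
// the SCEV expansions emitted for runtime unroll checks may be.
static void tuneUnrollPrefs(TargetTransformInfo::UnrollingPreferences &UP,
                            bool CheapRuntimeChecks) {
  UP.SCEVExpansionBudget = CheapRuntimeChecks ? 4 : 1;
}
```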
This modification will enable the usage of `MergeFunctions` as a
standalone library. Currently, `MergeFunctions` can only be applied to
an entire module. By adopting this change, developers will gain the
flexibility to reuse the `MergeFunctions` code within their own
projects, choosing which functions to merge, thus promoting code
reusability. Note that this modification will not break backward
compatibility: `MergeFunctions` will still work as a pass.
Reorganize the code into phases:
* Analyze/normalize
* Create extracted function prototype
* Generate the new function's implementation
* Generate call to new function
* Connect call to original function's CFG
The motivation is #114669: optionally clone the selected code region
into the new function instead of moving it. The current structure made
it difficult to add such functionality since there was no obvious place
to do so, not helped by some functions doing more than their names
suggest. For instance, constructFunction modifies code outside the
constructed function, while function properties such as
setPersonalityFn are derived somewhere else. Another example is
emitCallAndSwitchStatement, which despite its name also inserts stores
for output parameters.
Many operations also implicitly depend on the order in which they are
applied, which this patch tries to reduce. For instance,
ExtractedFuncRetVals becomes the list of exit blocks, which also defines
the return value when leaving via that block.
function's return instructions and the switch can be generated
independently. Also, ExtractedFuncRetVals combines the lists
ExitBlocks and OldTargets, which were not always kept consistent with
each other or with NumExitBlocks. The method recomputeExitBlocks() will
update it when necessary.
The coding style partially contradicts the current coding standard; for
instance, some local variables start with lower-case letters. I updated
some, but not all, occurrences so that the diff keeps at least some
lines unchanged.
The patch [D96854](https://reviews.llvm.org/D96854) introduced some
confusion of function argument indexes; this is fixed here as well, hence
the patch is no longer NFC. Tested in a modified CodeExtractorTest.cpp.
Patch [D121061](https://reviews.llvm.org/D121061) introduced
AllocationBlock, but not all allocas were inserted there.
Effectively includes the following fixes:
1. ce73b1672a
2. 4aaa925786
3. Missing allocas, still unfixed
Originally submitted as https://reviews.llvm.org/D115218
The code extractor tries to apply the names of source input and output
values to function arguments. Not all input and output values get added
as arguments: some are instead placed inside of a struct passed to the
function. The existing renaming code skipped trying to set these
struct-packed arguments names (as there is no corresponding function
argument to rename), but it still incremented the iterator over the
function arguments. This could result in dereferencing an end iterator
if struct-packed inputs/outputs preceded non-struct-packed
inputs/outputs.
This patch rewrites this loop to avoid the end iterator dereference.
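A minimal sketch of the rewritten loop shape (isPassedViaStruct is a hypothetical stand-in for the real struct-packing check):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical predicate, declared only for illustration.
static bool isPassedViaStruct(const Value *Input);

// Only advance the argument iterator when the input actually maps to a
// function argument, and never walk past the end.
static void nameArguments(Function &NewF, ArrayRef<Value *> Inputs) {
  Function::arg_iterator ArgI = NewF.arg_begin(), ArgE = NewF.arg_end();
  for (Value *Input : Inputs) {
    if (isPassedViaStruct(Input))
      continue; // struct-packed: no argument to rename, no increment
    if (ArgI == ArgE)
      break; // defensive: never dereference the end iterator
    ArgI->setName(Input->getName());
    ++ArgI;
  }
}
```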
As pointed out in the review of #102133, SCEVExpander currently
incorrectly reuses GEP instructions that have poison-generating flags
set. Fix this by clearing the flags on the reused instruction.
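A sketch of the fix, using the existing Instruction::dropPoisonGeneratingFlags() API (names otherwise invented):
```
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Before reusing a previously emitted instruction, clear
// poison-generating flags (e.g. inbounds, nuw/nsw) that are not
// guaranteed to be valid at the new point of use.
static void makeReuseSafe(Value *Reused) {
  if (auto *I = dyn_cast<Instruction>(Reused))
    I->dropPoisonGeneratingFlags();
}
```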
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.
Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
- each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
- the contexts of the callee (at the inlined callsite) are moved to the caller.
- the callee context at the inlined callsite is deleted.
An overload of `llvm::promoteCallWithIfThenElse` that updates the contextual profile.
High-level, this is very simple: after creating the `if... then (direct call) else (indirect call)` structure, we instrument the new callsites and BBs (the instrumentation will help with tracking for other IPO transformations, and, ultimately, to match counter values before flattening to `MD_prof`).
In more detail:
- move the callsite instrumentation of the indirect call to the `else` BB, before the indirect call
- create a new callsite instrumentation for the direct call
- create instrumentation for both the `then` and `else` BBs - we could instrument just one (MST-style), but we're not running the binary with this instrumentation, and at most this would save some space (fewer counters tracked). For simplicity, we instrument both at this point
- update each context belonging to the caller by updating the counters, and moving the indirect callee to the new, direct callsite ID
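Roughly, the intended call shape (the exact parameter list of the new overload is an assumption here; it mirrors the existing promoteCallWithIfThenElse while threading the contextual profile through for the steps above):
```
#include "llvm/Analysis/CtxProfAnalysis.h"
#include "llvm/Transforms/Utils/CallPromotionUtils.h"
using namespace llvm;

// Promote the indirect call CB to an if-then-else with a direct call,
// keeping the contextual profile consistent with the new callsites.
static void promoteWithProfile(CallBase &CB, Function &DirectCallee,
                               PGOContextualProfile &CtxProf) {
  promoteCallWithIfThenElse(CB, DirectCallee, CtxProf);
}
```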
Issue #89287
These are the final few places in LLVM that use instruction pointers to
insert instructions -- use iterators instead, which is needed for
debug-info correctness in the future. Most of this is a gentle
scattering of getIterator calls or not deref-then-addrofing iterators.
libfuzzer does require a storage change to keep built instruction
positions in a container, though. The unit-test changes are very
straightforward.
This leaves us in a position where libfuzzer can't fuzz on either form
of debug-info, however I don't believe that fuzzing of debug-info is in
scope for the library.
I had put this in Transforms/Utils, but that doesn't actually make
sense if we want to populate these structures via an analysis pass.
Pull Request: https://github.com/llvm/llvm-project/pull/100621