This is a reland of #138434 except that:
- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
have been dropped because they caused a test failure under asan, and
- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
been improved with structured bindings.
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.
Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer
to "the low lane of $xmm0", and this is reflected in how DWARF is
interpreted too. However InstrRefBasedLDV tries to be smart and
interprets such a DBG_VALUE as a 128-bit reference. We then issue a
DW_OP_deref_size of 128 bits to the stack, which isn't permitted by
DWARF (it's larger than a pointer).
Solve this for now by not using DW_OP_deref_size if it would be illegal.
Instead we'll use DW_OP_deref, and the consumer will load the variable
type from the stack, which should be correct.
There's still a risk of imprecision when LLVM decides to use smaller or
larger value types than the source-variable type, which manifests as
too-little or too-much memory being read from the stack. However we
can't solve that without putting more type information in debug-info.
fixes#64093
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
This patch handles virtual registers in the VarLocBasedImpl of the
LiveDebugVariables pass, allowing it to be used on architectures that
depend on virtual registers in debugging, like NVPTX. It enables the
pass for NVPTX.
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
These would implicitly cast the register to `unsigned`. Switch most of
them to use printReg will give a more readable output. Change some
others to use Register::id() so we can eventually remove the implicit
cast to `unsigned`.
Now revised to actually make the unit test compile, which I'd been
ignoring. No actual functional change, it's a type difference.
Original commit message follows.
[DebugInfo][InstrRef] Index DebugVariables and some DILocations (#99318)
A lot of time in LiveDebugValues is spent computing DenseMap keys for
DebugVariables, and they're made up of three pointers, so are large.
This patch installs an index for them: for the SSA and value-to-location
mapping parts of InstrRefBasedLDV we don't need to access things like
the variable declaration or the inlining site, so just use a uint32_t
identifier for each variable fragment that's tracked. The compile-time
performance improvements are substantial (almost 0.4% on the tracker).
About 80% of this patch is just replacing DebugVariable references with
DebugVariableIDs instead, however there are some larger consequences. We
spend lots of time fetching DILocations when emitting DBG_VALUE
instructions, so index those with the DebugVariables: this means all
DILocations on all new DBG_VALUE instructions will normalise to the
first-seen DILocation for the variable (which should be fine).
We also used to keep an ordering of when each variable was seen first in
a DBG_* instruction, in the AllVarsNumbering collection, so that we can
emit new DBG_* instructions in a stable order. We can hang this off the
DebugVariable index instead, so AllVarsNumbering is deleted.
Finally, rather than ordering by AllVarsNumbering just before DBG_*
instructions are linked into the output MIR, store instructions along
with their DebugVariableID, so that they can be sorted by that instead.
A lot of time in LiveDebugValues is spent computing DenseMap keys for
DebugVariables, and they're made up of three pointers, so are large.
This patch installs an index for them: for the SSA and value-to-location
mapping parts of InstrRefBasedLDV we don't need to access things like
the variable declaration or the inlining site, so just use a uint32_t
identifier for each variable fragment that's tracked. The compile-time
performance improvements are substantial (almost 0.4% on the tracker).
About 80% of this patch is just replacing DebugVariable references with
DebugVariableIDs instead, however there are some larger consequences. We
spend lots of time fetching DILocations when emitting DBG_VALUE
instructions, so index those with the DebugVariables: this means all
DILocations on all new DBG_VALUE instructions will normalise to the
first-seen DILocation for the variable (which should be fine).
We also used to keep an ordering of when each variable was seen first in
a DBG_* instruction, in the AllVarsNumbering collection, so that we can
emit new DBG_* instructions in a stable order. We can hang this off the
DebugVariable index instead, so AllVarsNumbering is deleted.
Finally, rather than ordering by AllVarsNumbering just before DBG_*
instructions are linked into the output MIR, store instructions along
with their DebugVariableID, so that they can be sorted by that instead.
When resolving value-numbers to specific machine locations in the final
stages of LiveDebugValues, we've been producing a DenseMap containing
all the value-numbers we're interested in. However we never modify the
map keys as they're all pre-known. Thus, this is a suitable collection
to switch to a sorted vector that gets searched, rather than a DenseMap
that gets probed. The overall operation of LiveDebugValues isn't
affected at all.
This patch adjusts how some data is stored to avoid a number of
un-necessary DenseMap queries. There's no change to the compiler
behaviour, and it's measurably faster on the compile time tracker.
The BlockOrders vector in buildVLocValueMap collects the blocks over
which a variables value have to be determined: however the Cmp ordering
function makes two DenseMap queries to determine the RPO-order of blocks
being compared. And given that sorting is O(N log(N)) comparisons this
isn't fast. So instead, fetch the RPO-numbers of the block collection,
order those, and then map back to the blocks themselves.
The OrderToBB collection mapped an RPO-number to an MBB: it's completely
un-necessary to have DenseMap here, we can just use the RPO number as an
array index. Switch it to a SmallVector and deal with a few consequences
when iterating.
(And for good measure I've jammed in a couple of reserve calls).
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.
Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
Commit 1b531d54f623 (#74203) removed the usage of unique_ptrs of arrays
in favour of using vectors, but inadvertently increased peak memory
usage by removing the ability to deallocate vector memory that was no
longer needed mid-LDV.
In that same review, it was pointed out that `FuncValueTable` typedef
could be removed, since it was "just a vector".
This commit addresses both issues by making `FuncValueTable` a real data
structure, capable of mapping BBs to ValueTables and able to free
ValueTables as needed.
This reduces peak memory usage in the compiler by 10% in the benchmarks
flagged by the original review.
As a consequence, we had to remove a handful of instances of the
"declare-then-initialize" antipattern in unittests, as the
FuncValueTable class is no longer default-constructible.
These are usually difficult to reason about, and they were being used to
pass raw pointers around with array semantic (i.e., we were using
operator [] on raw pointers). To put it in InstrRef terminology: we were
passing a pointer to a ValueTable but using it as if it were a
FuncValueTable.
These could have easily been SmallVectors, which now allow us to have
reference semantics in some places, as well as simpler initialization.
In the future, we can use even more pass-by-reference with some extra
changes in the code.
This is a conservative workaround for broken liveness tracking of
SUBREG_TO_REG to speculatively fix all targets. The current reported
failures are on X86 only, but this issue should appear for all targets
that use SUBREG_TO_REG. The next minimally correct refinement would be
to disallow only implicit defs.
The coalescer now introduces implicit-defs of the super register to
track the dependency on other subregisters. If we see such an implicit
operand, we cannot simply treat the subregister def as the result
operand in case downstream users depend on the implicitly defined
parts. Really target implementations should be considering the
implicit defs and trying to interpret them appropriately (maybe with
some generic helpers). The full implicit def could possibly be
reported as the move result, rather than the subregister def but that
requires additional work.
Hopefully fixes#64060 as well.
This needs to be applied to the release branch.
https://reviews.llvm.org/D156346