This has gotten out of sync with the actual flag reduction. It's
not particularly important though, as it only seems to be used to
determine whether to do another round of delta reduction.
This hint map is not required whenever a new register is added, in fact,
at -O0, it is not used at all. Growing this map is quite expensive, as
SmallVectors are not trivially copyable.
Grow this map only when hints are actually added to avoid multiple grows
and grows when no hints are added at all.
This avoids another unserializable field. Move the DbgInfoAvailable
field into the AsmPrinter, which is only really a cache/convenience
bit for checking a direct IR module metadata check.
Most users of PseudoSourceValue.h only need PseudoSourceValue, not the
PseudoSourceValueManager. However, this header pulls in some very
expensive dependencies like ValueMap.h, which is only used for the
manager.
Split off the manager into a separate header and include it only where
used.
This creates a TargetMachine with the default options (from the command
line flags). This allows us to share a bit more code between tools.
Differential Revision: https://reviews.llvm.org/D141057
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156113
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D154281
Fix accidentally passing pointer to bool argument This was supposed to
be writing bitcode with preserved uselistorder, but instead was only
enabling it with LTO module summaries.
This patch replaces the uses of PointerUnion.is function by llvm::isa,
PointerUnion.get function by llvm::cast, and PointerUnion.dyn_cast by
llvm::dyn_cast_if_present. This is according to the FIXME in
the definition of the class PointerUnion.
This patch does not remove them as they are being used in other
subprojects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D148449
We add a field `IsOutlined` to indicate whether a MachineFunction
is outlined and set it true for outlined functions in MachineOutliner.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D146191
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
Use the existing mechanism to change the data layout using callbacks.
Before this patch, we had a callback type DataLayoutCallbackTy that receives
a single StringRef specifying the target triple, and optionally returns
the data layout string to be used. Module loaders (both IR and BC) then
apply the callback to potentially override the module's data layout,
after first having imported and parsed the data layout from the file.
We can't do the same to fix invalid data layouts, because the import will already
fail, before the callback has a chance to fix it.
Instead, module loaders now tentatively parse the data layout into a string,
wait until the target triple has been parsed, apply the override callback
to the imported string and only then parse the tentative string as a data layout.
Moreover, add the old data layout string S as second argument to the callback,
in addition to the already existing target triple argument.
S is either the default data layout string in case none is specified, or the data
layout string specified in the module, possibly after auto-upgrades (for the BitcodeReader).
This allows callbacks to inspect the old data layout string,
and fix it instead of setting a fixed data layout.
Also allow to pass data layout override callbacks to lazy bitcode module
loader functions.
Differential Revision: https://reviews.llvm.org/D140985
We required the test and input arguments for --print-delta-passes
which is unhelpful. Also, start printing the help output if no
arguments were supplied.
It looks like there's more sophisticated ways to accomplish this with
the opt library, but it was less work to manually emit these errors.
Previously, this unconditionally emitted text IR. I ran
into a bug that manifested in broken disassembly, so the
desired output was the bitcode format. If the input format
was binary bitcode, the requested output file ends in .bc,
or an explicit -output-bitcode option was used, emit bitcode.
In a testcase I'm working on, the old write out and count IR lines
was taking about 200-300ms per iteration. This drops it out of the
profile.
This doesn't account for everything, but it doesn't seem to matter.
We should probably try to account for metadata and constantexpr tree
depths.
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
I have a register allocator failure that only reproduces with IPRA
enabled, and requires the specific regmask if I want to only run the
one relevant pass. The printed custom regmask is enormous and I would
like to reduce it.
This reduces each individual bit in the mask, but it would probably be
better to start at register units and clear all aliasing fields at a
time. This would require stricter verification that all aliasing bits
are set in regmasks (although I would prefer to switch regmasks to use
register units in the first place).
Adds support for reading and writing LTO bitcode files.
- Emit a summary if the original bitcode file had a summary
- Use split LTO units if the original bitcode file used them.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127168
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
I'm a bit confused by what's actually stored for the allocation
hints. The MIR parser only handles the "simple" case where there's a
single hint. I don't really understand the assertion in
clearSimpleHint, or under what circumstances there are multiple hint
registers.
Many MIR reductions benefit from or require increasing the instruction
count. For example, unlike in the IR, you may need to insert a new
instruction to represent an undef. The current instruction reduction
pass works around this by sticking implicit defs on whatever
instruction happens to be first in the entry block block.
Other strategies I've applied manually include breaking instructions
with multiple defs into separate instructions, or breaking large
register defs into multiple subregister defs.
Make up a simple scoring system based on what I generally try to get
rid of first when manually reducing. Counts implicit defs as free
since reduction passes will be introducing them, although they
probably should count for something. It also might make more sense to
have a comparison the two functions, rather than having to compute a
contextless number. This isn't particularly well tested since overall
the MIR support isn't in a place where it is useful on the kinds of
testcases I want to throw at it.
There were two problems with directly copying the MMOs from the old
function. The MMOs are owned by the function's Allocator, so need to
be reallocated anyways (surprisingly I didn't notice breakage on
this). Second, the PseudoSourceValues are also allocated per function
and need to be reallocated.
The current testcase I'm trying to reduce only reproduces with IPRA
enabled and requires handling multiple functions.
The only real difference vs. the IR is the extra indirect to look for
the underlying MachineFunction, so treat the ReduceWorkItem as the
module instead of the function.
The ugliest piece of this is really the ugliness of
MachineModuleInfo. It not only tracks actual module state, but has a
number of transient fields used for isel and/or the asm printer. These
shouldn't do any harm for the use here, though they should be
separated out.
Just clone all the virtual registers instead of looking for def
operands. This preserves the register values used, simplifying the
rest of the code. This avoids needing to expose the register map to
target code.
Previously the specific values used for fixed frame indexes was in
reverse order in the cloned function from the original, and a map was
used to adjust all frame indexes to the potentially new values. Insert
the fixed objects in reverse to avoid this. This simplifies other
code, since now we don't need to track down all frame indexes. This
will allow targets that store frame indexes in MachineFunctionInfo to
simply copy the values.
Note this isn't directly observable in the test since the resulting
MIR print/parse can shuffle the IDs around (in particular the final
serialization implicitly strips out dead objects).
getSuccProbability was private for some reason, saying to go through
MachineBranchProbabilityInfo. There doesn't seem to be much point to
that, as that wrapper directly calls this.
Like other areas, some of these fields aren't handled by the MIR
printer/parser so aren't tested.
This didn't work at all before, and would assert on any frame
index. Also copy the other fields, which I believe should cover
everything. There are a few that are untested since MIR serialization
is apparently still missing them (isStatepointSpillSlot,
ObjectSSPLayout, and ObjectSExt/ObjectZExt).