This change removes the uint64_t constructor on LocationSize,
preventing implicit conversion, and updates the APIs that used it to
adapt to the change. Note that I'm adding a couple of explicit
conversion points on routines where passing in a fixed offset as an
integer seems likely to have well-understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize, which
works just fine for fixed values but loses information and fails
assertions if the TypeSize was scalable. This change breaks the first
link in that implicit conversion chain, since that seemed to be the
easier one to break.
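A minimal sketch of the failure mode and the now-required explicit conversion (illustrative; `takesSize` is a hypothetical call site, not one from the patch):

```cpp
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Support/TypeSize.h"
using namespace llvm;

void takesSize(LocationSize Size);

void sketch(TypeSize TS) {
  // Previously a TypeSize could reach a LocationSize through an implicit
  // uint64_t step, losing the scalable bit along the way. Callers must now
  // convert explicitly, e.g.:
  takesSize(LocationSize::precise(TS.getFixedValue()));
}
```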
This just moves the x86 implementation into generic code since it appears
to be suitable for any target. The heart of this transform is inside
foldMemoryOperand, so other targets won't actually kick in until they
implement that API. This removes one more piece to implement in the
process of enabling foldMemoryOperand.
This pass is designed to increase ILP by performing accumulation into
multiple registers. It currently supports only the S/UABAL accumulation
instruction, but can be extended to support additional instructions.
Reland of #126060 which was reverted due to a conflict with #131272.
This pattern shows up often in media libraries. The optimization should only
kick in for O3. Currently only supports a single family of accumulation
instructions, but can easily be expanded to support additional
instructions in the future.
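For illustration, the kind of loop this targets (a hypothetical sum-of-absolute-differences kernel, not taken from any particular library): a single accumulator serializes the whole loop into one S/UABAL chain, and distributing the sum across several registers breaks that dependency chain.

```cpp
#include <cstdint>
#include <cstdlib>

uint32_t sumAbsDiff(const uint8_t *A, const uint8_t *B, int N) {
  uint32_t Acc = 0;
  for (int I = 0; I < N; ++I)
    Acc += std::abs(A[I] - B[I]); // every iteration feeds the single Acc chain
  return Acc;
}
```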
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).
The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.
The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
We want special handling for IGLP instructions in the scheduler, but they
should still be treated as having side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
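A hedged sketch of the hook's shape; the name and default below are illustrative rather than the exact upstream API:

```cpp
namespace llvm { class MachineInstr; }

// The DAG builder asks the subclass whether an instruction must be treated
// as a global memory object (a barrier it cannot reorder across).
struct ScheduleDAGInstrsSketch {
  virtual bool isGlobalMemoryObject(const llvm::MachineInstr *MI) const {
    return true; // conservative default: keep the side-effect behavior
  }
  virtual ~ScheduleDAGInstrsSketch() = default;
};
// An AMDGPU-flavored override could return false for IGLP pseudos, letting
// the scheduler handle them specially while other passes still treat them
// as having side effects.
```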
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
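The post-patch hook looks roughly like this (a sketch based on `TargetInstrInfo::copyPhysReg`; the exact parameter spelling may differ):

```cpp
virtual void copyPhysReg(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator MI, const DebugLoc &DL,
                         MCRegister DestReg, MCRegister SrcReg, bool KillSrc,
                         bool RenamableDest = false,
                         bool RenamableSrc = false) const;
```

Targets can then propagate RenamableDest/RenamableSrc onto the operands they emit instead of losing the flag when the COPY is lowered.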
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards the TODO comment to remove `raw_string_ostream::str()`.
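A minimal example of the preferred pattern (names are illustrative):

```cpp
#include "llvm/Support/raw_ostream.h"
#include <string>
using namespace llvm;

std::string formatAnswer() {
  std::string Buf;            // raw_string_ostream does not own this buffer
  raw_string_ostream OS(Buf);
  OS << "answer = " << 42;
  return Buf;                 // read the buffer directly, not OS.str()
}
```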
This opens the door to reusing reassociation optimizations on
target-specific binary operations with a non-standard operand list.
This is effectively an NFC.
D33412/D33413 introduced this to support a clang pragma to set section
names for a symbol depending on whether it would be placed in
bss/data/rodata/text, which may not be known until the backend. However,
for text we know that only functions will go there, so just directly set
the section in clang instead of going through a completely separate
attribute.
Autoupgrade the "implicit-section-name" attribute to directly setting
the section on a Function.
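For illustration (section name hypothetical), the clang pragma this supports:

```cpp
#pragma clang section text=".hot_text"
void hotPath() {} // clang now sets ".hot_text" directly on the function
#pragma clang section text=""   // back to the default text section
```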
We split the target-dependent MachineCombiner patterns into their
respective targets' folders.
This makes MachineCombiner much more target-independent.
Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc
Reviewed By: topperc, mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/87991
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
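A short sketch of the resulting API surface, guarding getValue() with hasValue() as the patch does (the beforeOrAfter state is spelled `beforeOrAfterPointer()` in MemoryLocation.h):

```cpp
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

void sketch() {
  LocationSize Eight = LocationSize::precise(8);             // known 8-byte access
  LocationSize Unknown = LocationSize::beforeOrAfterPointer(); // replaces ~UINT64_C(0)
  if (Eight.hasValue()) {
    uint64_t Bytes = Eight.getValue(); // only valid when hasValue() is true
    (void)Bytes;
  }
  (void)Unknown;
}
```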
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantage on its own is not huge if
getMemOperandsWithOffsetWidth usually sees known sizes, but if the
values can come from an MMO it helps us be more accurate when they
are unknown (and, in the future, scalable).
Follow-up on a post-commit review of 9468de4 (TargetInstrInfo: make
getOperandLatency return optional (NFC)) by Bjorn Pettersson, fixing a
couple of things that are not NFC:
- std::optional<T>::operator<= returns true if the first operand is
std::nullopt and the second operand is a T. Fix a couple of places
where we assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath,
returning InstrLatency instead of DefaultDefLatency. Fix one instance
not accounting for this behavior.
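The first gotcha in miniature: `std::nullopt` compares less than any engaged value, so `<=` against a real latency is true, not false.

```cpp
#include <cassert>
#include <optional>

int main() {
  std::optional<unsigned> UnknownLatency; // std::nullopt
  assert(UnknownLatency <= 0u);           // holds: nullopt < any engaged value
  // Code that assumed "unknown <= x" would be false took the wrong branch.
}
```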
In commit b05335989239 ("[X86InstrInfo] support memfold on spillable
inline asm (#70832)"), I had a last-minute fix to update the
memoperands. I originally did this in the parent foldInlineAsmMemOperand
call and updated the MIR test via update_mir_test_checks.py, but then
decided to move it to the child call of foldInlineAsmMemOperand.
But I forgot to rerun update_mir_test_checks.py. That last-minute change
caused the same memoperand to be added twice when recursion occurred
(for tied operands). I happened to get lucky that trailing content
omitted from the CHECK line doesn't result in test failure.
Rerunning update_mir_test_checks.py on the MIR test added in that commit
produces updated output. This results in updates to the test that:
1. conflate additions to the test in child commits with simply updating
the test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
deduplicate memoperands when added). Example:
INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)
Fix the bug so that child commits don't carry additional unrelated test
changes (which would be wrong anyway) from simply running
update_mir_test_checks.py.
Link: #20571
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
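A hedged sketch of the new shape, with the parameter list abbreviated from `TargetInstrInfo`:

```cpp
#include <optional>
namespace llvm { class InstrItineraryData; class MachineInstr; }

struct TargetInstrInfoLatencySketch {
  // Returns std::nullopt instead of the old -1 (or other negative sentinels).
  virtual std::optional<unsigned>
  getOperandLatency(const llvm::InstrItineraryData *ItinData,
                    const llvm::MachineInstr &DefMI, unsigned DefIdx,
                    const llvm::MachineInstr &UseMI, unsigned UseIdx) const {
    return std::nullopt;
  }
  virtual ~TargetInstrInfoLatencySketch() = default;
};
```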
This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.
Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
Link: https://github.com/llvm/llvm-project/issues/20571
Don't blindly copy the original flags from the pre-reassociated
instructions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
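A hedged sketch of the intended flag handling (the patch does this inside a `setSpecialOperandAttr` override; the helper below is a standalone illustration):

```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Keep only the flags common to both source instructions, and drop the
// integer poison flags, which reassociation cannot safely preserve.
void combineReassocFlags(MachineInstr &NewMI, const MachineInstr &MI1,
                         const MachineInstr &MI2) {
  NewMI.setFlags(MI1.getFlags() & MI2.getFlags()); // intersection of FP flags
  NewMI.clearFlag(MachineInstr::NoSWrap);          // nsw is not reassoc-safe
  NewMI.clearFlag(MachineInstr::NoUWrap);          // nuw is not reassoc-safe
}
```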
Fixes #72777.
foldMemoryOperand looks at pairs of instructions (generally a load to
virt reg then use of the virtreg, or def of a virtreg then a store) and
attempts to combine them. This can reduce register pressure.
A prior commit added the ability to mark such a MachineOperand as
foldable. In terms of INLINEASM, this means that "rm" was used (rather
than just "r") to denote that the INLINEASM may use a memory operand
rather than a register operand. This effectively undoes decisions made
by the instruction selection framework. Callers will be added in the
register allocation frameworks. This has been tested with all of the
above (which will come as follow up patches).
Thanks to @topperc, who suggested this at last year's LLVM US Dev
Meeting, and @qcolombet, who confirmed this was the right approach.
Link: https://github.com/llvm/llvm-project/issues/20571
When using the inline asm constraint string "rm" (or "g"), we generally
would like the compiler to choose "r", but it is permitted to choose "m"
if there's register pressure. This is distinct from "r" in which the
register is not permitted to be spilled to the stack.
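For illustration, a use of the constraint in C (x86 flavored):

```cpp
// "+rm" lets the compiler keep X in a register normally, but fall back to a
// stack slot under register pressure.
int increment(int X) {
  asm("incl %0" : "+rm"(X));
  return X;
}
```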
The decision of which to use must be made at some point. Currently, the
instruction selection frameworks (ISELs) make the choice, and the
register allocators had better be able to handle the result.
Steal a bit from Storage when using register operands to disambiguate
between the two cases. Add helpers/getters/setters, and print in MIR
when such a register is foldable.
The getter will later be used by the register allocation frameworks (and
asserted by the ISELs) while the setters will be used by the instruction
selection frameworks.
Link: https://github.com/llvm/llvm-project/issues/20571
Commit 28b9126879 introduced the path cloning format in the
basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Every two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
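A hedged sketch of the three validity checks; `blockForBBId` and `hasIndirectBranch` are hypothetical stand-ins, not the actual BasicBlockSections helpers:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
using namespace llvm;

const MachineBasicBlock *blockForBBId(unsigned BBId); // hypothetical lookup
bool hasIndirectBranch(const MachineBasicBlock &MBB); // hypothetical predicate

bool isValidCloningPath(ArrayRef<unsigned> Path) {
  const MachineBasicBlock *Prev = nullptr;
  for (unsigned I = 0, E = Path.size(); I != E; ++I) {
    const MachineBasicBlock *MBB = blockForBBId(Path[I]);
    if (!MBB)
      return false;                     // 1. bb id must map to an existing block
    if (Prev && !Prev->isSuccessor(MBB))
      return false;                     // 2. consecutive ids must be CFG successors
    if (hasIndirectBranch(*MBB) && I + 1 != E)
      return false;                     // 3. indirect branches only in the last block
    Prev = MBB;
  }
  return true;
}
```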
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to better see what's changed. Will
squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC").
Fix the TODOs added in commit 93bd428742f9 ("[InlineAsm] refactor
InlineAsm class NFC (#65649)").
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch any of
this code that uses bitwise operations.
Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.
I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.
I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.
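An illustrative shape for the bitfield packing (field widths hypothetical; the point is replacing hand-rolled shifts over a raw `unsigned`):

```cpp
struct AsmOperandFlagSketch {
  unsigned Kind : 3;         // register use/def, memory, immediate, clobber...
  unsigned NumOperands : 13; // how many operands this flag word covers
  unsigned Payload : 16;     // constraint-specific data (e.g. memory kind)
};
static_assert(sizeof(AsmOperandFlagSketch) == sizeof(unsigned),
              "still round-trips through the old unsigned encoding");
```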
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D159242
Because unconditional branch relaxation on AArch64 grows the stack to
spill a register, splitting a function would cause the red zone to be
overwritten. Explicitly disable MFS for such functions.
Differential Revision: https://reviews.llvm.org/D157127
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.
The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
/// trivially rematerializable, taking into consideration its operands.
However, it implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which tests whether it is safe to rematerialize the MachineInstr.
The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`. That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).
By making this a single interface, we can override the interface to do either.
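A hedged sketch of the unified hook; `isSafeToRematGeneric` is a hypothetical stand-in for the old generic analysis:

```cpp
namespace llvm { class MachineInstr; }

bool isSafeToRematGeneric(const llvm::MachineInstr &MI); // hypothetical

struct TargetInstrInfoRematSketch {
  // Single overridable entry point: the default runs the generic safety
  // analysis, and a target override may now answer either way.
  virtual bool
  isReallyTriviallyReMaterializable(const llvm::MachineInstr &MI) const {
    return isSafeToRematGeneric(MI);
  }
  virtual ~TargetInstrInfoRematSketch() = default;
};
```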
Reviewed By: craig.topper, nemanjai
Differential Revision: https://reviews.llvm.org/D156520
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156113
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code; a few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
Replacing D143754. Right now LiveRangeSplitting during register
allocation uses the TargetOpcode::COPY instruction for splitting. For
the AMDGPU target that creates a problem, as we have both vector and
scalar copies. Vector copies perform a copy over a vector register, but
only on the lanes (threads) that are active. This is mostly sufficient;
however, we do run into cases where we have to copy the entire vector
register and not just the active-lane data. One major place where we
need that is live range splitting.
Allowing targets to use their own copy instructions (if defined) will
provide a lot of flexibility and ease in lowering these pseudo
instructions to correct MIR.
- Introduce a getTargetCopyOpcode() virtual function and use it to
generate copies in live range splitting.
- Replace the necessary MI.isCopy() checks with TII.isCopyInstr() in the
register allocator pipeline.
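A hedged sketch of the hook as described above (default shown; a target like AMDGPU would return a pseudo that copies all lanes, not just the active ones):

```cpp
#include "llvm/CodeGen/TargetOpcodes.h"

struct TargetInstrInfoCopySketch {
  virtual unsigned getTargetCopyOpcode() const {
    return llvm::TargetOpcode::COPY; // default used by live range splitting
  }
  virtual ~TargetInstrInfoCopySketch() = default;
};
```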
Reviewed By: arsenm, cdevadas, kparzysz
Differential Revision: https://reviews.llvm.org/D150388
In rare situations involving AVX intrinsics, it seems LLVM can be coaxed
into generating copies to arguments that look like this:
$xmm0 = VMOVAPSrr $xmm1, implicit-def $ymm0
CALL64 @something ymm0
This particular form of copy implicitly zeros the upper lanes of ymm0,
hence there's an implicit-def for the register in the copy. The X86
implementation of describeLoadedValue doesn't attempt to describe this sort
of copy which causes the generic implementation in
TargetInstrInfo::describeLoadedValue to fire an assertion saying it
expected the target hook to handle it.
Play it safe in the generic implementation and return the "no location /
value" return value, rather than asserting.
Differential Revision: https://reviews.llvm.org/D148626
PATCHABLE_* instructions expand to up to 36-byte sleds. Updating the
size of PATCHABLE instructions causes them to be outlined, so we need to
add a check to prevent the outliner from considering basic blocks that
contain PATCHABLE instructions.
Differential Revision: https://reviews.llvm.org/D147982
Due to a change in the APIs used to determine which instructions can be outlined, the check for label outlining was never hit. Instead, all labels were considered invisible, which is the opposite of the intended behavior and causes obscure crashes down the line. We now replicate the original behavior more closely, with explicit checks for known-good and known-bad instruction types.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D147178
Each target's `TargetInstrInfo` is responsible for announcing which code
patterns it is able to transform during the MachineCombiner pass.
Currently, these patterns are applied without preserving the debug
instruction number required by the InstrRef implementation of
LiveDebugValues. As such, we've seen a number of examples where debug
information is dropped for variables in InstrRef mode that were
otherwise available in VarLoc mode. This has been observed both in X86
and AArch examples.
This commit is an initial attempt at preserving said numbers by changing
the general (target agnostic) implementation of TargetInstrInfo: the
reassociation pattern must keep the debug number of the "top level"
instruction, i.e., the instruction whose value represents the final
value of the arithmetic expression. Intermediate values must have their
debug number dropped, as they have no equivalent value in the
unoptimized code.
Future work is required to update each target's
`TargetInstrInfo::genAlternativeCodeSequence` method.
Differential Revision: https://reviews.llvm.org/D145759
A pointer dereference was added (D141302) above an assert that checks
whether the pointer is null. This commit moves the assert above the
dereference and transforms it into an llvm_unreachable to better express
the intent that certain switch cases should never be reached.
Differential Revision: https://reviews.llvm.org/D145599
For in-order cores MachineCombiner makes better decisions when the critical path
is calculated only for the current basic block and does not take into account
other blocks from the trace.
This patch adds a virtual method to TargetInstrInfo to allow each target
to decide which strategy to use.
Depends on D140541
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140542
The motivation behind this patch is to unify some of the outliner logic across architectures. This looks nicer in general and makes fixing [issues like this](https://reviews.llvm.org/D124707#3483805) easier.
There are some notable changes here:
1. `isMetaInstruction()` is used directly instead of checking for specific meta-instructions like `IMPLICIT_DEF` or `KILL`. This was already done in the RISC-V implementation, but other architectures still did hardcoded checks.
- As an exception to this, CFI instructions are explicitly delegated to the target because RISC-V has different handling for those.
2. `isTargetIndex()` checks are replaced with an assert; none of the architectures supported actually use `MO_TargetIndex` at this point in time.
3. `isCFIIndex()` and `isFI()` checks are also replaced with asserts, since these operands should not exist in [any context](https://reviews.llvm.org/D122635#3447214) at this stage in the pipeline.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D125072
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.
Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.
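A small usage example of the new accessor (assuming an `MCInstrDesc` in hand):

```cpp
#include "llvm/MC/MCInstrDesc.h"
using namespace llvm;

unsigned countPredicateOperands(const MCInstrDesc &Desc) {
  unsigned N = 0;
  for (const MCOperandInfo &Op : Desc.operands()) // ArrayRef<MCOperandInfo>
    if (Op.isPredicate())
      ++N;
  return N;
}
```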
Differential Revision: https://reviews.llvm.org/D142213