In some corner cases, the cloned MI still retains an old slot index,
which leads to the compiler crashing. This patch updates the slot index
map before deleting the recycled MI.
https://github.com/llvm/llvm-project/issues/123165
This patch is in preparation to enable setting the MachineInstr::MIFlag
flags, i.e. FrameSetup/FrameDestroy, on callee saved register
spill/reload instructions in prologue/epilogue. This eventually helps in
setting the prologue_end and epilogue_begin markers more accurately.
The DWARF Spec in "6.4 Call Frame Information" says:
The code that allocates space on the call frame stack and performs the
save operation is called the subroutine’s prologue, and the code that
performs the restore operation and deallocates the frame is called its
epilogue.
which means the callee saved register spills and reloads are part of the
prologue (a.k.a. frame setup) and epilogue (a.k.a. frame destruction),
respectively. And, IIUC, the LLVM backend uses FrameSetup/FrameDestroy
flags to identify instructions that are part of call frame setup and
destruction.
In the trunk, while most targets consistently set
FrameSetup/FrameDestroy on save/restore call frame information (CFI)
instructions of callee saved registers, they do not consistently set
those flags on the actual callee saved register spill/reload
instructions.
I believe this patch provides a clean mechanism to set
FrameSetup/FrameDestroy flags on the actual callee saved register
spill/reload instructions as needed. And, by having a default argument of
MachineInstr::NoFlags for Flags, this patch is an NFC.
With this patch, the targets have to just pass FrameSetup/FrameDestroy
flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the
target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters
to set those flags on callee saved register spill/reload instructions.
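As a rough illustration of why a defaulted flags argument keeps existing callers unchanged, here is a minimal standalone sketch. `Instr`, the vector-based block, and the reduced `storeRegToStackSlot` signature are simplified stand-ins (assumptions for illustration), not the real LLVM interfaces; only the flag names come from the description above.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for MachineInstr::MIFlag; values are illustrative.
enum MIFlag : unsigned { NoFlags = 0, FrameSetup = 1u << 0, FrameDestroy = 1u << 1 };

// Hypothetical, heavily reduced model of a machine instruction.
struct Instr {
  unsigned Reg;
  int FrameIndex;
  unsigned Flags;
};

// Hypothetical hook: appends a spill of Reg to stack slot FrameIndex.
// Flags defaults to NoFlags, so callers written before the parameter
// existed compile and behave exactly as before.
inline void storeRegToStackSlot(std::vector<Instr> &Block, unsigned Reg,
                                int FrameIndex, unsigned Flags = NoFlags) {
  assert(FrameIndex >= 0 && "this model only handles non-negative slots");
  Block.push_back(Instr{Reg, FrameIndex, Flags});
}

// Hypothetical target code: spills callee saved registers in the prologue
// and tags each spill as part of frame setup.
inline void spillCalleeSavedRegisters(std::vector<Instr> &Block,
                                      const std::vector<unsigned> &CSRs) {
  int Slot = 0;
  for (unsigned Reg : CSRs)
    storeRegToStackSlot(Block, Reg, Slot++, FrameSetup);
}
```

A caller that omits the last argument still gets an untagged spill, while the prologue path opts in explicitly.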
Also, this patch makes it very easy to set the source line information
on callee saved register spill/reload instructions which is needed by
the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin
markers more accurately.
As per DwarfDebug.cpp implementation:
prologue_end is the first known non-DBG_VALUE and non-FrameSetup
location that marks the beginning of the function body
epilogue_begin is the first FrameDestroy location that has been seen in
the epilogue basic block
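The two definitions above can be sketched as simple scans over a block. This is a toy model, not the DwarfDebug.cpp code: `Instr`, `findPrologueEnd` and `findEpilogueBegin` are hypothetical names, and a line number of 0 is assumed to mean "no known location".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for MachineInstr::MIFlag; values are illustrative.
enum MIFlag : unsigned { NoFlags = 0, FrameSetup = 1u << 0, FrameDestroy = 1u << 1 };

// Hypothetical reduced instruction: Line == 0 models an unknown location.
struct Instr {
  bool IsDebugValue;
  unsigned Flags;
  unsigned Line;
};

// Index of the first instruction that is not a DBG_VALUE, is not tagged
// FrameSetup, and has a known location; -1 if there is none.
inline int findPrologueEnd(const std::vector<Instr> &Entry) {
  for (std::size_t I = 0; I < Entry.size(); ++I) {
    const Instr &MI = Entry[I];
    if (!MI.IsDebugValue && !(MI.Flags & FrameSetup) && MI.Line != 0)
      return static_cast<int>(I);
  }
  return -1;
}

// Index of the first FrameDestroy instruction with a known location in an
// epilogue block; -1 if there is none.
inline int findEpilogueBegin(const std::vector<Instr> &Exit) {
  for (std::size_t I = 0; I < Exit.size(); ++I) {
    const Instr &MI = Exit[I];
    if ((MI.Flags & FrameDestroy) && MI.Line != 0)
      return static_cast<int>(I);
  }
  return -1;
}
```

In this model an untagged spill that carries a line number would be picked as prologue_end too early, which is the inaccuracy the flags are meant to avoid.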
With this patch, targets only have to do the following to set the
source line information on callee saved register spill/reload
instructions, without hampering LLVM's efforts to avoid adding
source line information on the artificial code generated by the
compiler.
<Foo>InstrInfo::storeRegToStackSlot() {
  ...
  DebugLoc DL =
      Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I);
  ...
}
<Foo>InstrInfo::loadRegFromStackSlot() {
  ...
  DebugLoc DL =
      Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc();
  ...
}
While I understand this patch would break out-of-tree backend builds, I
think it is in the right direction.
One immediate use case that benefits from this patch: fixing #120553
becomes simpler.
The instruction selector uses the `MachineFunction::copySalvageSSA`
function to insert `DBG_PHIs` or identify a defining instruction for a
copy-like instruction when finalizing Instruction References.
AArch64 has the ORR instruction, a logical OR, with two relevant variants:
ORRWrr, the register-register form, and ORRWrs, the
register-shifted-register form.
An ORRWrs where the shift amount is 0 and the zero register ($wzr) is
used is considered a copy, for example:
`$w0 = ORRWrs $wzr, killed $w3, 0`
However, an ORRWrr with a zero register is not considered a copy:
`$w0 = ORRWrr $wzr, killed $w3`
This causes an issue because in aarch64-isel the instruction is the
ORRWrr variant (not a copy), but it is then changed to the ORRWrs variant
(a copy) before the livedebugvalues pass.
The resulting mismatch between the two passes leads to a crash in the
livedebugvalues pass.
This patch fixes the issue.
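The mismatch can be sketched with a toy predicate. `Inst`, the register numbering and both function names are hypothetical. The first predicate models the behaviour described above; the second shows one way to make both forms agree, which is an assumption and not necessarily how the patch does it.

```cpp
#include <cassert>

enum Opcode { ORRWrr, ORRWrs };
enum Reg : unsigned { WZR = 0, W0 = 1, W3 = 4 }; // illustrative numbering

// Hypothetical reduced instruction: Dst = Src1 | (Src2 << ShiftAmt).
struct Inst {
  Opcode Opc;
  Reg Dst, Src1, Src2;
  unsigned ShiftAmt; // only meaningful for ORRWrs
};

// Behaviour as described: only the shifted-register form with $wzr and a
// zero shift counts as a copy.
inline bool isCopyLikeAsDescribed(const Inst &MI) {
  return MI.Opc == ORRWrs && MI.Src1 == WZR && MI.ShiftAmt == 0;
}

// Assumed alternative: treat the plain register form with $wzr as a copy
// as well, so a pass running before the ORRWrr -> ORRWrs rewrite and a
// pass running after it reach the same answer.
inline bool isCopyLikeConsistent(const Inst &MI) {
  if (MI.Src1 != WZR)
    return false;
  return MI.Opc == ORRWrr || (MI.Opc == ORRWrs && MI.ShiftAmt == 0);
}
```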
Previously, with `-fzero-call-used-regs` clang/LLVM would incorrectly
emit Neon instructions in streaming functions, and streaming-compatible
functions without SVE.
With this change:
* In streaming functions, Z/p registers will be zeroed
* In streaming compatible functions w/o SVE, D registers will be zeroed
- (As Neon vector instructions are illegal including `movi v..`)
In case the first operand is a physical register with no register class, use
the second operand of the sub as the register class for the new virtual
register in genSubAdd2SubSub machine combine.
Move the emission of the checks performed on the authenticated LR value
during tail calls to AArch64AsmPrinter class, so that different checker
sequences can be reused by pseudo instructions expanded there.
This adds one more option to AuthCheckMethod enumeration, the generic
XPAC variant which is not restricted to checking the LR register.
This fixes issue #109250
The issue happens during the `MachineBlockPlacement` pass. The block,
whose address was previously not taken, is deemed redundant by the pass
and subsequently replaced using
`MachineBasicBlock::ReplaceUsesOfBlockWith` in `BranchFolding`.
ReplaceUsesOfBlockWith only replaces uses in the terminator. However,
`expandPostRAPseudo` introduces new block uses when expanding catchrets.
These uses do not get replaced, which results in undefined label errors
later on.
Marking the block address as taken prevents the replacement of the
block, without also replacing non-terminator uses.
This removes a redundant 'COPY' instruction that #81716 probably forgot
to remove.
This redundant COPY led to an issue because code in
LiveRangeSplitting expects that the instruction emitted by
`loadRegFromStackSlot` is an instruction that accesses memory, which
isn't the case for the COPY instruction.
- When `getOutliningCandidateInfo()` returns `std::nullopt` (meaning no
`OutlinedFunction` is created), there is no need to clear the input
argument, `RepeatedSequenceLocs`, as it's already being cleared in the
main loop of `findCandidates()`.
- Replaced `2` by `MinRepeats`, which I missed from
https://github.com/llvm/llvm-project/pull/105398
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.
- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also switched `FunctionList` to hold `OutlinedFunction` via
`unique_ptr` to ensure safe and efficient memory management, avoiding
unnecessary implicit copies.
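The two points above can be sketched in a small standalone model. The types are reduced stand-ins, and a null `unique_ptr` is used here to model "no `OutlinedFunction` is created" (an assumption made to keep the sketch short; the description above mentions `std::nullopt`).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reduced candidate: where a repeated sequence starts and its length.
struct Candidate {
  unsigned StartIdx;
  unsigned Len;
};

struct OutlinedFunction {
  std::vector<Candidate> Candidates;
  explicit OutlinedFunction(std::vector<Candidate> C) : Candidates(std::move(C)) {}
};

// MinRepeats is passed explicitly instead of being hard-coded to 2.
// Returns null when there are too few occurrences to form a function.
inline std::unique_ptr<OutlinedFunction>
getOutliningCandidateInfo(std::vector<Candidate> &RepeatedSequenceLocs,
                          unsigned MinRepeats) {
  assert(MinRepeats >= 1 && "at least one occurrence is required");
  if (RepeatedSequenceLocs.size() < MinRepeats)
    return nullptr;
  return std::make_unique<OutlinedFunction>(RepeatedSequenceLocs);
}

// FunctionList owns each OutlinedFunction through unique_ptr, so growing
// the list moves pointers rather than copying whole functions.
inline void findCandidates(std::vector<std::unique_ptr<OutlinedFunction>> &FunctionList,
                           std::vector<std::vector<Candidate>> Sequences,
                           unsigned MinRepeats) {
  for (std::vector<Candidate> &Locs : Sequences)
    if (auto OF = getOutliningCandidateInfo(Locs, MinRepeats))
      FunctionList.push_back(std::move(OF));
}
```

With `MinRepeats` of 2 a sequence seen once is dropped; with 1, as in the global outlining scenario described above, it is kept.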
This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
This reverts commit 43ffe2eed0d9f73789dbe213023733d164999306.
Reason: buildbot breakage starting at https://lab.llvm.org/buildbot/#/builders/85/builds/1102
I manually bisected and found that clang crashed with 43ffe2eed0d9f73789dbe213023733d164999306 but not the immediately preceding commit (33190490c667aaf8b08d5af8b8ce84524f856e80)
A case for this transformation: https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov w8, #56952
  movk w8, #15, lsl #16
  ldrb w0, [x0, x8]
into
  add x0, x0, 1036288
  ldrb w0, [x0, 3704]
Only LDRBBroX is supported in this first version.
Fix https://github.com/llvm/llvm-project/issues/71917
Note: This PR relands commit 32878c2065 with a fix for the crash reported in PR79756.
The crash is exposed when there is a MOVKWi instruction at the head of a block
without a preceding MOVZWi.
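The arithmetic behind the fold above can be sketched as follows. `SplitOffset` and `splitLargeOffset` are hypothetical names, and the legality checks are a simplification for a byte-sized load: the low part must fit an unsigned 12-bit load immediate and the high part must be a 12-bit value shifted left by 12, as accepted by ADD (immediate).

```cpp
#include <cassert>
#include <cstdint>

struct SplitOffset {
  uint64_t High; // added to the base with "add xN, xN, #High"
  uint64_t Low;  // used as the load's immediate offset
};

// Splits Offset into High + Low for a byte load. Returns false when the
// high part cannot be encoded as a 12-bit immediate shifted left by 12.
inline bool splitLargeOffset(uint64_t Offset, SplitOffset &Out) {
  const uint64_t Low = Offset & 0xfff; // low 12 bits
  const uint64_t High = Offset - Low;  // multiple of 4096
  if ((High >> 12) > 0xfff)
    return false;
  Out.High = High;
  Out.Low = Low;
  assert(Out.High + Out.Low == Offset && "split must preserve the offset");
  return true;
}
```

For the example above, the constant built by mov/movk is 56952 + 15 * 65536 = 1039992, which splits into 1036288 (253 * 4096) and 3704.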
This adds an implementation of AArch64InstrInfo::verifyInstruction, and
adds some basic verification of the immediate ranges of memory
operations using the information from getMemOpInfo.
Some extra memory operations have been added to getMemOpInfo, along with
the equivalent opcodes to getLoadStoreImmIdx to ensure we use the
correct index.
Please let us know if this starts reporting verification failures. Thanks.
This patch tries to clean up some of the existing values in
getMemOpInfo. All values should now be in bytes (not bits), and the
MinOffset/MaxOffset are now always represented unscaled (the immediate
that will be present in the final instruction).
Although I could not find a place where it altered codegen, the offset
of a post-index instruction will be 0, not scale*imm. An
IsPostIndexLdStOpcode method has been added to try and make sure that
case is handled properly.
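To make the convention concrete, here is a small sketch of a range check under those rules. `MemOpInfo` and `isLegalByteOffset` are hypothetical, not the getMemOpInfo interface: Scale is in bytes, and MinOffset/MaxOffset bound the immediate as it appears in the instruction, before multiplication by Scale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one memory opcode's addressing limits.
struct MemOpInfo {
  int64_t Scale;     // bytes the encoded immediate is multiplied by
  int64_t MinOffset; // smallest encodable immediate (not yet scaled)
  int64_t MaxOffset; // largest encodable immediate (not yet scaled)
};

// Returns true if ByteOffset can be encoded: it must be a multiple of
// Scale and the resulting immediate must lie in [MinOffset, MaxOffset].
inline bool isLegalByteOffset(const MemOpInfo &Info, int64_t ByteOffset) {
  assert(Info.Scale > 0 && "scale must be a positive number of bytes");
  if (ByteOffset % Info.Scale != 0)
    return false;
  const int64_t Imm = ByteOffset / Info.Scale;
  return Imm >= Info.MinOffset && Imm <= Info.MaxOffset;
}
```

For an 8-byte load with an unsigned 12-bit scaled immediate the entry would be `{8, 0, 4095}`; for a 9-bit signed unscaled form it would be `{1, -256, 255}`.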
This popped up while investigating
https://github.com/llvm/llvm-project/issues/96950
In a few places where we need the destination reg of an instruction we
were using a call that worked only by accident.
For the pauthtest ABI, there are a number of ptrauth-* options, including
ptrauth-returns. Use the "ptrauth-returns" function attribute to indicate
the need for LR signing with the B key for non-leaf functions, to avoid
using "sign-return-address" and "sign-return-address-key", which were
originally designed for pac-ret.
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
I've not added any new tests for these, because the original conditions
were wrong (they did not consider streaming mode) and we have tests for
the positive cases.
This patch refactors `AArch64InstrInfo::optimizePTestInstr` to simplify
the convoluted conditions and control flow, and to make it easier to add
the optimisation in
https://github.com/llvm/llvm-project/pull/81141
The following code assumes that RepeatedSequenceLocs is non-empty. Bail
out if there are fewer than 2 candidates left, as no outlining is
possible in that case. The same check is already present in all the
other places where elements from RepeatedSequenceLocs may be dropped.
This fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716
Modulo variable expansion is a technique that resolves overlap of
variable lifetimes by unrolling. The existing implementation resolves
the overlap by copying values with move instructions on processors with
ordinary registers such as Arm and x86. This method may result in a very
large number of move instructions, which can cause performance problems.
Modulo variable expansion is enabled by specifying -pipeliner-mve-cg. A
backend must implement some newly defined interfaces in
PipelinerLoopInfo. They were implemented for AArch64.
Discourse thread:
https://discourse.llvm.org/t/implementing-modulo-variable-expansion-for-machinepipeliner
FirstCand is a reference to RepeatedSequenceLocs[0]. However, that
vector is being modified a lot throughout the function, including one
place that reassigns the whole vector. I'm not sure whether this can
really happen in practice, but it doesn't seem unlikely that this could
lead to a use-after-free.
Avoid this by directly using RepeatedSequenceLocs[0] at the start of the
function (as a lot of other places already do) and only creating
FirstCand at the end where no more modifications take place.
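The pattern can be sketched in isolation. The function below is a toy (hypothetical name and filtering rule), but it shows the ordering: index into the vector while it is still being modified, and bind a reference only once no further modification happens.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical reduced candidate.
struct Candidate {
  unsigned StartIdx;
  unsigned Len;
};

// Drops candidates shorter than MinLen, then reports the length of the
// first survivor (0 if fewer than 2 candidates remain).
inline unsigned pruneAndGetFirstLen(std::vector<Candidate> &RepeatedSequenceLocs,
                                    unsigned MinLen) {
  assert(!RepeatedSequenceLocs.empty() && "caller must pass candidates");

  // Binding "Candidate &FirstCand = RepeatedSequenceLocs[0];" here would
  // be risky: the reassignment below replaces the vector's storage, so
  // the reference could dangle.
  std::vector<Candidate> Kept;
  for (const Candidate &C : RepeatedSequenceLocs)
    if (C.Len >= MinLen)
      Kept.push_back(C);
  RepeatedSequenceLocs = std::move(Kept); // reassigns the whole vector

  if (RepeatedSequenceLocs.size() < 2)
    return 0;

  // No more modifications from here on, so a reference is safe.
  const Candidate &FirstCand = RepeatedSequenceLocs[0];
  return FirstCand.Len;
}
```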
For streaming-compatible functions with only +sme, we can't use
a NEON ORR (aliased as 'mov') for copies of Q-registers, so
we need to use a spill/fill instead.
This also fixes the fill, which should use the post-incrementing
addressing mode.
This adds codegen support for the "ptrauth" operand bundles, which can
be used to augment indirect calls with the equivalent of an
`@llvm.ptrauth.auth` intrinsic call on the call target (possibly
preceded by an `@llvm.ptrauth.blend` on the auth discriminator if
applicable.)
This allows the generation of combined authenticating calls
on AArch64 (in the BLRA* PAuth instructions), while preventing
the raw just-authenticated function pointer from being
exposed to attackers.
This is done by threading a PtrAuthInfo descriptor through
the call lowering infrastructure, eventually selecting a BLRA
pseudo. The pseudo encapsulates the safe discriminator
computation, which, together with the real BLRA* call, gets emitted
in late pseudo expansion in AsmPrinter.
Note that this also applies to the other forms of indirect calls,
notably invokes, rvmarker, and tail calls. Tail-calls in particular
bring some additional complexity, with the intersecting register
constraints of BTI and PAC discriminator computation.
However, this doesn't currently support PAuth_LR tail-call variants.
This also adopts an x8+ allocation order for GPR64noip, matching
GPR64.
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, moving the TRI parameter forward and making it required. In addition, all call sites of these functions have been updated accordingly so that behaviour is unchanged.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
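A toy model of the new calling convention, to show why forcing the choice matters. `RegInfo`, `Operand` and the aliasing rule are hypothetical stand-ins, and the lookup is far simpler than the real one; the point is only that the caller must write either a register-info pointer or `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Register = unsigned;

// Hypothetical stand-in for TargetRegisterInfo. In this toy model,
// register N and register N + 100 alias each other (think W/X pairs).
struct RegInfo {
  bool regsOverlap(Register A, Register B) const {
    return A == B || A + 100 == B || B + 100 == A;
  }
};

struct Operand {
  Register Reg;
  bool IsDef;
};

// TRI is required and comes right after Reg. With a RegInfo the search is
// alias-aware; with an explicit nullptr only exact matches are found.
inline int findRegisterDefOperandIdx(const std::vector<Operand> &Ops,
                                     Register Reg, const RegInfo *TRI) {
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (!Ops[I].IsDef)
      continue;
    const bool Match = TRI ? TRI->regsOverlap(Ops[I].Reg, Reg) : Ops[I].Reg == Reg;
    if (Match)
      return static_cast<int>(I);
  }
  return -1;
}
```

With a defaulted parameter, the `nullptr` case could be selected by accident and silently miss the aliasing definition.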
https://github.com/llvm/llvm-project/pull/79940 put calls to
recomputeLiveIns into a loop, to repeatedly call the function until the
computation converges. However, this repeats a lot of code. This change
moves the loop into a function to simplify the handling.
Note that this changes the order in which recomputeLiveIns is called.
For example,
```
bool anyChange = false;
do {
  anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
} while (anyChange);
```
only begins to recompute the live-ins for LoopMBB after the computation
for ExitMBB has converged. With this change, all basic blocks have a
recomputation of the live-ins for each loop iteration. This can result in
fewer or more calls, depending on the situation.
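A minimal sketch of such a helper, assuming a toy `Block` whose `recomputeLiveIns` stand-in reports a change a fixed number of times. The helper name and the returned sweep count are illustrative and not taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical block: PendingUpdates models how many more times a
// recomputation will still change its live-in set.
struct Block {
  int PendingUpdates;
};

// Stand-in for recomputeLiveIns: returns true if the live-ins changed.
inline bool recomputeLiveIns(Block &MBB) {
  if (MBB.PendingUpdates == 0)
    return false;
  --MBB.PendingUpdates;
  return true;
}

// Recomputes live-ins for every block in each sweep and repeats until a
// whole sweep changes nothing. Returns the number of sweeps performed.
inline unsigned fullyRecomputeLiveIns(const std::vector<Block *> &MBBs) {
  unsigned Sweeps = 0;
  bool AnyChange;
  do {
    AnyChange = false;
    for (Block *MBB : MBBs) {
      assert(MBB && "null block");
      // No short-circuit: every block is visited even after a change.
      if (recomputeLiveIns(*MBB))
        AnyChange = true;
    }
    ++Sweeps;
  } while (AnyChange);
  return Sweeps;
}
```

Unlike the `||` form, which skips the second call whenever the first reports a change, this visits every block in every sweep.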
We split target-dependent MachineCombiner patterns into their target
folder.
This makes MachineCombiner much more target-independent.
Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc
Reviewed By: topperc, mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/87991
7dc20ab introduced an extra COPY when spilling and filling a PNR
register, which can't be elided as the input (PNR predicate) and output
(PPR predicate) register classes differ. The patch adds a new register
class that covers both PPR and PNR so that STR_PXI and LDR_PXI can
take either of them, removing the need for the copy.
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.
On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).
To reflect this, this patch:
- Enables aggressive folding of shifts into loads by default.
- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).
- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.
I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
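The cost rule described above fits in a few lines. The helper name is hypothetical, and the real heuristics weigh more than this single condition; the sketch only encodes "shifts by 1 or 4 cost extra on cores with AddrLSLSlow14, everything else is free".

```cpp
#include <cassert>

// Hypothetical helper: reports whether folding "lsl #ShiftAmt" into the
// addressing mode of an integer load is expected to cost extra.
inline bool shiftedAddressCostsExtra(unsigned ShiftAmt, bool HasAddrLSLSlow14) {
  assert(ShiftAmt <= 4 && "register-offset loads shift by at most 4");
  return HasAddrLSLSlow14 && (ShiftAmt == 1 || ShiftAmt == 4);
}
```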
Remove the getSizeOrUnknown call when a MachineMemOperand is created. For a
scalable TypeSize, the MemoryType created becomes a scalable_vector.
Two MMOs that have scalable memory accesses can then use the updated BasicAA,
which understands scalable LocationSize.
Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on its own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
We can add implicit defs/uses of the 'VG' register to the instructions
to prevent the register allocator from rematerializing values in between
streaming-mode changes, as the def/use of VG will further nail down the
ordering that comes out of ISel. This avoids the heavy-handed approach
to prevent any kind of rematerialization.
While we could add 'VG' as a Use to all SVE instructions, we only really
need to do this for instructions that are rematerializable, as the
smstart/smstop instructions and pseudos act as scheduling barriers which
is sufficient to prevent other instructions from being scheduled in
between the streaming-mode-changing call sequence. However, we may
revisit this in the future.