The Xqci 0.7.0 spec just came out, with some updates to Xqciint,
bringing it to v0.4. The main update of any relevance is that
`qc.c.mienter` and `qc.c.mienter.nest` now update both the stack pointer
and the frame pointer (before, they only updated the stack pointer).
They both remain compatible with the frame pointer convention.
This change bumps the Xqciint version, and ensures that we don't emit
the unneeded frame pointer adjustment instruction after
`qc.c.mienter(.nest)`.
This change adds support for `qci-nest` and `qci-nonest` interrupt
attribute values. Both of these are machine-mode interrupts, which use
instructions in Xqciint to push and pop A- and T-registers (and a few
others) from the stack.
In particular:
- `qci-nonest` uses `qc.c.mienter` to save registers at the start of the
function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qci-nest` uses `qc.c.mienter.nest` to save registers at the start of
the function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qc.c.mienter` and `qc.c.mienter.nest` both push registers ra, s0
(fp), t0-t6, and a0-a10 onto the stack (as well as some CSRs for the
interrupt context). The difference between these is that
`qc.c.mienter.nest` re-enables M-mode interrupts.
- `qc.c.mileaveret` will restore the registers that were saved by
`qc.c.mienter(.nest)`, and return from the interrupt.
These work for both standard M-mode interrupts and the non-maskable
interrupt CSRs added by Xqciint.
The `qc.c.mienter`, `qc.c.mienter.nest` and `qc.c.mileaveret`
instructions are compatible with push and pop instructions, in as much
as they (mostly) only spill the A- and T-registers, so we can use the
`Zcmp` or `Xqccmp` instructions to spill the S-registers. This
combination (`qci-(no)nest` and `Xqccmp`/`Zcmp`) is not implemented in
this change.
The `qc.c.mienter(.nest)` instructions have a specific register storage
order so they preserve the frame pointer convention linked list past the
current interrupt handler and into the interrupted code and frames if
frame pointers are enabled.
Co-authored-by: Pankaj Gode <quic_pgode@quicinc.com>
This adds support for Xqccmp to the following passes:
- Prolog Epilog Insertion - reusing much of the existing push/pop logic,
but extending it to cope with frame pointers and reorder the CFI
information correctly.
- Move Merger - extending it to support the `qc.` variants of the
double-move instructions.
- Push/Pop Optimizer - extending it to support the `qc.` variants of the
pop instructions.
The testing is based on existing Zcmp tests, but I have put them in
separate files as some of the Zcmp tests were getting quite long.
Previously we calculated the max register id. Then converted it to the number
of registers and encoding. Then converted number of registers to stack
size. Then saved number of registers, encoding, and stack size to
MachineFunctionInfo.
This patch removes the calculation of max register id, and instead
calculates the number of registers. The encoding is removed from
MachineFunctionInfo in favor of converting the number of registers to
encoding at the time of use.
Callee saved registers should always be phyiscal registers. They are
often passed directly to other functions that take MCRegister like
getMinimalPhysRegClass or TargetRegisterClass::contains.
Unfortunately, sometimes the MCRegister is compared to a Register which
gave an ambiguous comparison error when the MCRegister is on the LHS.
Adding a MCRegister==Register comparison operator created more ambiguous
comparison errors elsewhere. These cases were usually comparing against
a base or frame pointer register that is a physical register in a
Register. For those I added an explicit conversion of Register to
MCRegister to fix the error.
Found using https://github.com/codespell-project/codespell
```
codespell RISCV --write-changes \
--ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru
```
This is a tiny change that can save up to 16 bytes of stack allocation,
which is more beneficial on RV32 than RV64.
cm.push allocates multiples of 16 bytes, but only uses a subset of those
bytes for pushing callee-saved registers. Up to 12 (rv32) or 8 (rv64)
bytes are left unused, depending on how many registers are pushed.
Before this change, we told LLVM that the entire allocation was used, by
creating a fixed stack object which covered the whole allocation.
This change instead gives an accurate extent to the fixed stack object,
to only cover the registers that have been pushed. This allows the
PrologEpilogInserter to use any unused bytes for spills. Potentially
this saves an extra move of the stack pointer after the push, because
the push can allocate up to 48 more bytes than it needs for registers.
We cannot do the same change for save/restore, because the restore
routines restore in batches of `stackalign/(xlen/8)` registers, and we
don't want to clobber the saved values of registers that we didn't tell
the compiler we were saving/restoring - for instance `__riscv_restore_0`
is used by the compiler when it only wants to save `ra`, but will end up
restoring `ra` and `s0`.
The DwarfDebug.cpp implementation expects the epilogue instructions to
have source location of last non-debug instruction after which the epilogue
instructions are inserted. The epilogue_begin is set on location of the first
FrameDestroy instruction with source line information that has been seen in
the epilogue basic block.
In the trunk, the risc-v backend sets the epilogue_begin after the epilogue has
actually begun i.e. after callee saved register reloads and the source line
information is not set on those reload instructions. This is leading to #120553
where, while debugging, breaking on or single stepping to the epilogue_begin
location will make accessing the variables from wrong place as the FP has been
restored to the parent frame's FP.
To fix that, this patch sets FrameSetup/FrameDestroy flags on the callee saved
register spill/reload instructions which is actually correct. Then the
RISCVInstrInfo::loadRegFromStackSlot uses FrameDestroy flag to identify a
reload of the callee saved register in the epilogue and copies the source
line information from insert position instruction to that reload instruction.
Requires PR #120622Fixes#120553
Use the probe loop structure to allocate vector code in the stack as
well. We add the pseudo instruction RISCV::PROBED_STACKALLOC_RVV to
differentiate from the normal loop.
Without this patch ScalableVector frame index property is used before
assignment. More precisely, let's take a look at
RISCVFrameLowering::assignCalleeSavedSpillSlots. In this function we
divide callee saved registers on scalar and vector ones, based on
ScalableVector property of their frame indexes:
```
...
const auto &UnmanagedCSI = getUnmanagedCSI(*MF, CSI);
const auto &RVVCSI = getRVVCalleeSavedInfo(*MF, CSI);
...
```
But we assign ScalableVector property several lines below:
```
...
auto storeRegToStackSlot = [&](decltype(UnmanagedCSI) CSInfo) {
for (auto &CS : CSInfo) {
// Insert the spill to the stack frame.
Register Reg = CS.getReg();
const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
TII.storeRegToStackSlot(MBB, MI, Reg, !MBB.isLiveIn(Reg),
CS.getFrameIdx(), RC, TRI, Register());
}
};
storeRegToStackSlot(UnmanagedCSI);
...
```
Due to it, list of RVV callee saved registers will always be empty.
Currently this problem doesn't appear, but if you slightly change the
code and, for example, put some instructions between scalar and vector
spills, the resulting code will be ill formed.
Enable `-fstack-clash-protection` for RISCV and stack probe for function
prologues.
We probe the stack by creating a loop that allocates and probe the stack
in ProbeSize chunks.
We emit an unrolled probe loop for small allocations and emit a variable
length probe loop for bigger ones.
We can move the logic from adjustStackForRVV into adjustReg, which
results in the remaining logic being trivially inlined to the two
callers and allows a duplicate copy of the same logic in
eliminateFrameIndex to be pruned.
The Zcmp callee saved registers are already accounted for in
getCalleeSavedStackSize(). Subtracting RVPushStackSize subtracts
them a second time leading to incorrect stack offsets during frame
index elimination.
This should have been removed in
0de2b26942f890a6ec84cd75ac7abe3f6f2b2e37
when Zcmp handling was changed. Prior to that, RVPushStackSize was
not included in getCalleeSavedStackSize(). The commit message at the
time noted that Zcmp+RVV was likely broken.
This patch follows https://github.com/llvm/llvm-project/pull/112477.
Previously `-fsanitize=shadow-call-stack` (which get transform to
`Attribute::ShadowCallStack`) is used for enable both hardware and
software shadow stack, and another option `-force-sw-shadow-stack` is
needed if the user wants to use the software shadow stack where hardware
software shadow stack could be supported. It decouples both by using the
string attribute `hw-shadow-stack` to distinguish from the software
shadow stack attribute.
This patch fixes sp recovery in the epilogue in varargs functions when
fp register is presented and second sp adjustment is applied.
Source of the issue: https://github.com/llvm/llvm-project/pull/110809
Those registers are too fragmented in terms of usage, some are hard
coded and some are retrieved by calling function. Also some have
comments for alias name, some don't.
If we have a minimum vlen, we were adjusting StackSize to change the
unit from vscale to bytes, and then calculating the required padding
size for alignment in bytes. However, we then used that padding size as
an offset in vscale units, resulting in misplaced stack objects.
While it would be possible to adjust the object offsets by dividing
AlignmentPadding by ST.getRealMinVLen() / RISCV::RVVBitsPerBlock, we can
simplify the calculation a bit if instead we adjust the alignment to be
in vscale units.
@topperc This fixes a bug I am seeing after #110312, but I am not 100%
certain I am understanding the code correctly, could you please see if
this makes sense to you?
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
Currently, in the cases when fp register is presented and sp register is
adjusted at the second time, sp recovery in a function epilogue isn't
performed in the best way, for example:
```
lui a0, 2
sub sp, s0, a0
addi a0, a0, -2044
add sp, sp, a0
```
This patch improves sp register recovery in such cases and the code
snippet above becomes:
```
addi sp, s0, -2044
```
If we know vlen is a multiple of 16, we don't need any alignment
padding.
I wrote the code so that it would generate the minimum amount of padding
if the stack align was 32 or larger or if RVVBitsPerBlock was smaller
than half the stack alignment.
The grouped vector register is modeled as a single register, e.g. V2M2,
which is actually V2 and V3. We need to decompose the grouped vector
register(if any) to individual vector register when emitting CFIs in
prologue.
Fixed https://github.com/llvm/llvm-project/issues/94500
cm.push can't save X26 without also saving X27. This removes two other
checks for this case.
This causes CFI to be emitted since X27 is now explicitly a callee saved
register.
The affected tests use inline assembly to clobber X26 rather than the
whole range of s0-s10.
Currently the CFI offset for RVV registers are not handled entirely,
this patch add those information for either stack unwinding or
debugger to work correctly on RVV callee-saved stack object.
Depends On D154576
Differential Revision: https://reviews.llvm.org/D156846
This code was trying to save temporary argument registers in interrupt
handler functions that contain calls. With the exception that all FP
registers are saved including the normally callee saved registers.
If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.
It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.
All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.
gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.
The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
[RISCV] RISCV vector calling convention (1/2)
This is the vector calling convention based on
https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
the idea is to split between "scalar" callee-saved registers
and "vector" callee-saved registers. "scalar" ones remain the
original strategy, however, "vector" ones are handled together
with RVV objects.
The stack layout would be:
|--------------------------| <-- FP
| callee-allocated save |
| area for register varargs|
|--------------------------|
| callee-saved registers | <-- scalar callee-saved
| (scalar) |
|--------------------------|
| RVV alignment padding |
|--------------------------|
| callee-saved registers | <-- vector callee-saved
| (vector) |
|--------------------------|
| RVV objects |
|--------------------------|
| padding before RVV |
|--------------------------|
| scalar local variables |
|--------------------------| <-- BP
| variable size objects |
|--------------------------| <-- SP
Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).
Differential Revision: https://reviews.llvm.org/D154576
This an alternative to #84935 to fix the miscompile, but not be optimal.
The immediate for cm.push/pop must be a multiple of 16. For RVE, it
might not be. It's not easy to increase the stack size without messing
up cfa directives and maybe other things.
This patch rounds the stack size down to a multiple of 16 before
clamping it to 48. This causes an extra addi to be emitted to handle the
remainder.
Once this commited, I can commit #84989 to add verification for these
instructions being generated with valid offsets.
We've now got enough of these in tree that we can see which patterns
appear to be idiomatic. As such, extract a helper for checking
if we know the exact VLEN.
PEI previously used fake frame indices for these callee saved registers.
These fake frame indices are not register with MachineFrameInfo. This
required them to be deleted form CalleeSavedInfo after PEI to avoid
breaking later passes. See #79535
Unfortunately, removing the registers from CalleeSavedInfo pessimizes
Interprocedural Register Allocation. The RegUsageInfoCollector pass runs
after PEI and uses CalleeSavedInfo.
This patch replaces #79535 by properly creating fixed stack objects
through MachineFrameInfo. This changes the stack size and offsets
returned by MachineFrameInfo which requires changes to how
RISCVFrameLowering uses that information.
In addition to the individual object for each register, I've also create
a single large fixed object that covers the entire stack area covered by
cm.push or the libcalls. cm.push must always push a multiple of 16 bytes
and the save restore libcall pushes a multiple of stack align. I think
this leaves holes in the stack where we could spill other registers, but
it matches what we did previously. Maybe we can optimize this in the
future.
The only test changes are due to stack alignment handling after the
callee save registers. Since we now have the fixed objects, on the stack
the offset is non-zero when an aligned object is processed so the offset
gets rounded up, increasing the stack size.
I suspect we might need some more updates for RVV related code. There is
very little or maybe even no testing of RVV mixed with Zcmp and
save-restore.
This patch enable hardware shadow stack with `Zicifss` and
`mno-forced-sw-shadow-stack`. New feature forced-sw-shadow-stack
disables hardware shadow stack even when `Zicfiss` enabled.
The SavedRegs BitVector is checks against the CSR list later. We have
a separate CSR list for RVE that excludes X16-31 so we don't need
to filter here.
If it was needed, it would be needed for the next block of code too
which didn't have an RVE check.
Registers that are pushed/popped by Zcmp or libcalls have pre-defined
frame indices that are never allocated in MachineFrameInfo. They're
being used throughout PEI, but the rest of codegen doesn't work that way
and expects each frame index to be a valid index in MFI.
This patch keeps it local to PEI and removes them from the
CalleeSavedInfo list at the end of the pass.
Before this pass, any MIR testing post-PEI is broken and asserts (see
issue #79491).