We can move the logic from adjustStackForRVV into adjustReg, which
results in the remaining logic being trivially inlined to the two
callers and allows a duplicate copy of the same logic in
eliminateFrameIndex to be pruned.
The Zcmp callee saved registers are already accounted for in
getCalleeSavedStackSize(). Subtracting RVPushStackSize subtracts
them a second time leading to incorrect stack offsets during frame
index elimination.
This should have been removed in
0de2b26942f890a6ec84cd75ac7abe3f6f2b2e37
when Zcmp handling was changed. Prior to that, RVPushStackSize was
not included in getCalleeSavedStackSize(). The commit message at the
time noted that Zcmp+RVV was likely broken.
This patch follows https://github.com/llvm/llvm-project/pull/112477.
Previously `-fsanitize=shadow-call-stack` (which get transform to
`Attribute::ShadowCallStack`) is used for enable both hardware and
software shadow stack, and another option `-force-sw-shadow-stack` is
needed if the user wants to use the software shadow stack where hardware
software shadow stack could be supported. It decouples both by using the
string attribute `hw-shadow-stack` to distinguish from the software
shadow stack attribute.
This patch fixes sp recovery in the epilogue in varargs functions when
fp register is presented and second sp adjustment is applied.
Source of the issue: https://github.com/llvm/llvm-project/pull/110809
Those registers are too fragmented in terms of usage, some are hard
coded and some are retrieved by calling function. Also some have
comments for alias name, some don't.
If we have a minimum vlen, we were adjusting StackSize to change the
unit from vscale to bytes, and then calculating the required padding
size for alignment in bytes. However, we then used that padding size as
an offset in vscale units, resulting in misplaced stack objects.
While it would be possible to adjust the object offsets by dividing
AlignmentPadding by ST.getRealMinVLen() / RISCV::RVVBitsPerBlock, we can
simplify the calculation a bit if instead we adjust the alignment to be
in vscale units.
@topperc This fixes a bug I am seeing after #110312, but I am not 100%
certain I am understanding the code correctly, could you please see if
this makes sense to you?
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
Currently, in the cases when fp register is presented and sp register is
adjusted at the second time, sp recovery in a function epilogue isn't
performed in the best way, for example:
```
lui a0, 2
sub sp, s0, a0
addi a0, a0, -2044
add sp, sp, a0
```
This patch improves sp register recovery in such cases and the code
snippet above becomes:
```
addi sp, s0, -2044
```
If we know vlen is a multiple of 16, we don't need any alignment
padding.
I wrote the code so that it would generate the minimum amount of padding
if the stack align was 32 or larger or if RVVBitsPerBlock was smaller
than half the stack alignment.
The grouped vector register is modeled as a single register, e.g. V2M2,
which is actually V2 and V3. We need to decompose the grouped vector
register(if any) to individual vector register when emitting CFIs in
prologue.
Fixed https://github.com/llvm/llvm-project/issues/94500
cm.push can't save X26 without also saving X27. This removes two other
checks for this case.
This causes CFI to be emitted since X27 is now explicitly a callee saved
register.
The affected tests use inline assembly to clobber X26 rather than the
whole range of s0-s10.
Currently the CFI offset for RVV registers are not handled entirely,
this patch add those information for either stack unwinding or
debugger to work correctly on RVV callee-saved stack object.
Depends On D154576
Differential Revision: https://reviews.llvm.org/D156846
This code was trying to save temporary argument registers in interrupt
handler functions that contain calls. With the exception that all FP
registers are saved including the normally callee saved registers.
If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.
It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.
All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.
gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.
The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
[RISCV] RISCV vector calling convention (1/2)
This is the vector calling convention based on
https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
the idea is to split between "scalar" callee-saved registers
and "vector" callee-saved registers. "scalar" ones remain the
original strategy, however, "vector" ones are handled together
with RVV objects.
The stack layout would be:
|--------------------------| <-- FP
| callee-allocated save |
| area for register varargs|
|--------------------------|
| callee-saved registers | <-- scalar callee-saved
| (scalar) |
|--------------------------|
| RVV alignment padding |
|--------------------------|
| callee-saved registers | <-- vector callee-saved
| (vector) |
|--------------------------|
| RVV objects |
|--------------------------|
| padding before RVV |
|--------------------------|
| scalar local variables |
|--------------------------| <-- BP
| variable size objects |
|--------------------------| <-- SP
Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).
Differential Revision: https://reviews.llvm.org/D154576
This an alternative to #84935 to fix the miscompile, but not be optimal.
The immediate for cm.push/pop must be a multiple of 16. For RVE, it
might not be. It's not easy to increase the stack size without messing
up cfa directives and maybe other things.
This patch rounds the stack size down to a multiple of 16 before
clamping it to 48. This causes an extra addi to be emitted to handle the
remainder.
Once this commited, I can commit #84989 to add verification for these
instructions being generated with valid offsets.
We've now got enough of these in tree that we can see which patterns
appear to be idiomatic. As such, extract a helper for checking
if we know the exact VLEN.
PEI previously used fake frame indices for these callee saved registers.
These fake frame indices are not register with MachineFrameInfo. This
required them to be deleted form CalleeSavedInfo after PEI to avoid
breaking later passes. See #79535
Unfortunately, removing the registers from CalleeSavedInfo pessimizes
Interprocedural Register Allocation. The RegUsageInfoCollector pass runs
after PEI and uses CalleeSavedInfo.
This patch replaces #79535 by properly creating fixed stack objects
through MachineFrameInfo. This changes the stack size and offsets
returned by MachineFrameInfo which requires changes to how
RISCVFrameLowering uses that information.
In addition to the individual object for each register, I've also create
a single large fixed object that covers the entire stack area covered by
cm.push or the libcalls. cm.push must always push a multiple of 16 bytes
and the save restore libcall pushes a multiple of stack align. I think
this leaves holes in the stack where we could spill other registers, but
it matches what we did previously. Maybe we can optimize this in the
future.
The only test changes are due to stack alignment handling after the
callee save registers. Since we now have the fixed objects, on the stack
the offset is non-zero when an aligned object is processed so the offset
gets rounded up, increasing the stack size.
I suspect we might need some more updates for RVV related code. There is
very little or maybe even no testing of RVV mixed with Zcmp and
save-restore.
This patch enable hardware shadow stack with `Zicifss` and
`mno-forced-sw-shadow-stack`. New feature forced-sw-shadow-stack
disables hardware shadow stack even when `Zicfiss` enabled.
The SavedRegs BitVector is checks against the CSR list later. We have
a separate CSR list for RVE that excludes X16-31 so we don't need
to filter here.
If it was needed, it would be needed for the next block of code too
which didn't have an RVE check.
Registers that are pushed/popped by Zcmp or libcalls have pre-defined
frame indices that are never allocated in MachineFrameInfo. They're
being used throughout PEI, but the rest of codegen doesn't work that way
and expects each frame index to be a valid index in MFI.
This patch keeps it local to PEI and removes them from the
CalleeSavedInfo list at the end of the pass.
Before this pass, any MIR testing post-PEI is broken and asserts (see
issue #79491).
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.
If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.
Differential Revision: https://reviews.llvm.org/D70401
We check the opcode of the first instruction in the block where the
prologue is inserted without checking if the iterator points to any
instructions. When the basic block is empty, that causes a crash. One
way the prologue block can be empty is when it starts with a call to
__builtin_readcyclecounter on RV32 since that produces a loop.
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
The pr does two things. One is to fix internal compiler error when we
need to spill callee saves but none of them is GPR, another is to fix
wrong register number for pushed registers are {ra, s0-s11}.
When inserting prolgue/epilogue, we use the spimm of CM.PUSH/POP to
reduce the following offset for sp. Previously, we tried to use the free
space of the push stack to minimize the following sp-offset. But it's
useless, since the free space must be less than 16 and required stack should
be aligned to 16 before/after the adjustment.
addi sp, sp, 512 may be used to recover the sp in the epilogue
when stack size is larger than 2047(2^11 - 1), however, it can
not be compressed using C extension, and addi sp, sp, 496 is
able to be compressed, so try to use 496 as the ajust amount of
the fisrt sp if function doesn't need extra instructions after
adjust.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D159431
Before accessing "getOpcode" thorugh machine instruction, check if the iterator
has reached the end of Machine basic block otherwise we will crash at the
assertion `!NodePtr->isKnownSentinel()`.
The above assertion is hit in "Prologue/Epilogue Insertion & Frame Finalization
pass".
Reviewed By: craig.topper, wangpc
Differential Revision: https://reviews.llvm.org/D158256
For callee saved/restored operations, they mostly use the
following inst patterns,
sw rs2, offset(x2)
sd rs2, offset(x2)
fsw rs2, offset(x2)
fsd rs2, offset(x2)
lw rd, offset(x2)
ld rd, offset(x2)
flw rd, offset(x2)
fld rd, offset(x2)
and offset decides whether the instructions can be compressed.
now offset 2032 will be set by default if stacksize is bigger
than 2^12-1 to save and restore callee saved register, so it
will prevent all the callee saved/restored stack insts be
compressed.
Allocating proper offset for stack insts is useful to make
them be compressed.
Reviewed By: craig.topper, wangpc
Differential Revision: https://reviews.llvm.org/D157373
If save/restore libcall or Zcmp push/pop are enabled, callee-saved registers including ra would be
assigned a negative frame index (-1 for ra, -2 for s0, ..., -13 for s11) by RISCVRegisterInfo::hasReservedSpillSlot.
getLibCallID would find a callee-saved register that has max register id and with assigned negative frame index.
And use this max register to get which save/restore libcall should be selected.
In getMaxPushPopReg (for Zcmp push/pop, similar getLibCallID), it uses extra register class (PGPR) to
check the saved register is in ra, s0-s11 instead of checking the frame index is negative.
This patch just changes the checking for the saved register from `is contained in PGPR` to `has a negative frame index`.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D156393
Previously we used the number of registers needed saved and pushable as the
number of pushed registers. We also use pushed register number to caculate
the stack size. It is not correct because Zcmp pushes registers from $ra to the
max register needed saved and there is no gurantee that the needed saved
registers are a sequenced list from $ra.
There is an example about that. PushPopRegs should be 6 (ra,s0 - s4)= instead of 1.
```
; llc -mtriple=riscv32 -mattr=+zcmp
define void @foo() {
entry:
; Old: .cfi_def_cfa_offset 16
; New: .cfi_def_cfa_offset 32
tail call void asm sideeffect "li s4, 0", "~{s4}"()
ret void
}
```
Reviewed By: Jim, kito-cheng
Differential Revision: https://reviews.llvm.org/D156407
Issue mentioned: https://github.com/riscv/riscv-code-size-reduction/issues/182
The order of callee-saved registers stored by Zcmp push in memory is reversed.
Pseudo code for cm.push in https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.4-1/Zc.1.0.4-1.pdf
```
if (XLEN==32) bytes=4; else bytes=8;
addr=sp-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) {
//if register i is in xreg_list
if (xreg_list[i]) {
switch(bytes) {
4: asm("sw x[i], 0(addr)");
8: asm("sd x[i], 0(addr)");
}
addr-=bytes;
}
}
```
The placement order for push is s11, s10, ..., ra.
CFI offset should be calculed as reversed order for correct stack unwinding.
Reviewed By: fakepaper56, kito-cheng
Differential Revision: https://reviews.llvm.org/D156437
This patch readjusts the frame stack for the push and pop instructions
co-author: @Lukacma
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D134599
to help debug and report better diagnostics for functions like
relaxDwarfCallFrameFragment (D153167).
In MCStreamer, some emitCFI* functions already take a SMLoc argument. Add a
SMLoc argument to the remaining functions that generate a MCCFIInstruction.
Prior to this patch the SCS prologue used the following instruction
sequence.
```
s[w|d] ra, 0(gp)
addi gp, gp, [4|8]
```
The problem with this sequence is that an interrupt occurring between the
store and the increment could clobber the value just written to the SCS.
https://reviews.llvm.org/D84414#inline-813203 pointed out a similar
issues that could have affected the epilogue.
This patch changes the instruction sequence in the prologue to:
```
addi gp, gp, [4|8]
s[w|d] ra, -[4|8](gp)
```
The downside to this is that there is now a data dependency between the
add and the store.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D149099
Currently we don't emit any CFI instructions for the SCS register when
enabling SCS on RISCV. This causes problems when unwinding, since the
SCS register isn't being handled properly.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D145205