6 Commits

Author SHA1 Message Date
Iris Shi
dae5c4e1e7
[RISCV] Expand constant multiplication for targets without M extension (#137195)
Closes #137023

On RISC-V machines without a native multiply instruction (e.g., `rv32i`
base), multiplying a variable by a constant integer often compiles to a
call to a library routine like `__mul{s,d}i3`.

```assembly
	.globl __mulxi3
	.type  __mulxi3, @function
__mulxi3:
	mv     a2, a0
	mv     a0, zero
.L1:
	andi   a3, a1, 1
	beqz   a3, .L2
	add    a0, a0, a2
.L2:
	srli   a1, a1, 1
	slli   a2, a2, 1
	bnez   a1, .L1
	ret
```

This library function implements multiplication in software using a loop
of shifts and adds, processing the constant bit by bit. On rv32i, it
requires a minimum of 8 instructions (for multiply by `0`) and up to
about 200 instructions (by `0xffffffff`), involves heavy branching and
function call overhead.

When not optimizing for size, we could expand the constant
multiplication into a sequence of shift and add/sub instructions. For
now we use non-adjacent form for the shift and add/sub sequence, which
could save 1/2 - 2/3 instructions compared to a shl+add-only sequence.
2025-05-16 09:40:11 +08:00
Craig Topper
fc7fee8360
Revert "[RISCV] Allow spilling to unused Zcmp Stack (#125959)" (#137060)
This reverts commit 50cdf6cbc5035345507bb4d23fcb0292272754eb.

This patch causes miscompiles with vector and produces some odd code for
ilp32e.
2025-04-23 17:36:00 -07:00
Sam Elliott
50cdf6cbc5
[RISCV] Allow spilling to unused Zcmp Stack (#125959)
This is a tiny change that can save up to 16 bytes of stack allocation,
which is more beneficial on RV32 than RV64.

cm.push allocates multiples of 16 bytes, but only uses a subset of those
bytes for pushing callee-saved registers. Up to 12 (rv32) or 8 (rv64)
bytes are left unused, depending on how many registers are pushed.
Before this change, we told LLVM that the entire allocation was used, by
creating a fixed stack object which covered the whole allocation.

This change instead gives an accurate extent to the fixed stack object,
to only cover the registers that have been pushed. This allows the
PrologEpilogInserter to use any unused bytes for spills. Potentially
this saves an extra move of the stack pointer after the push, because
the push can allocate up to 48 more bytes than it needs for registers.

We cannot do the same change for save/restore, because the restore
routines restore in batches of `stackalign/(xlen/8)` registers, and we
don't want to clobber the saved values of registers that we didn't tell
the compiler we were saving/restoring - for instance `__riscv_restore_0`
is used by the compiler when it only wants to save `ra`, but will end up
restoring `ra` and `s0`.
2025-02-06 19:45:47 -08:00
dlav-sc
97982a8c60
[RISCV][CFI] add function epilogue cfi information (#110810)
This patch adds CFI instructions in the function epilogue.

Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret

After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret

This functionality is already present in `riscv-gcc`, but it’s not in
`clang` and this slightly impairs the `lldb` debugging experience, e.g.
backtrace.
2024-11-06 00:20:21 +03:00
Craig Topper
8a9c170170
[RISCV] Align stack size down to a multiple of 16 before using cm.push/pop. (#86073)
This an alternative to #84935 to fix the miscompile, but not be optimal.

The immediate for cm.push/pop must be a multiple of 16. For RVE, it
might not be. It's not easy to increase the stack size without messing
up cfa directives and maybe other things.

This patch rounds the stack size down to a multiple of 16 before
clamping it to 48. This causes an extra addi to be emitted to handle the
remainder.

Once this commited, I can commit #84989 to add verification for these
instructions being generated with valid offsets.
2024-03-26 21:37:19 -07:00
Nemanja Ivanovic
08dd645c15
[RISC-V] Bad immediate value for Zcmp instructions with E extension (#84925)
When we are using the Zcmp extension together with the E extension in
32-bit mode and we need to spill both callee-saved registers as well as
needing a couple of 32-bit stack slots, we emit a meaningless stack
adjustment with cm.push/cm.popret. Furthermore this leads to the stack
slot for the ra being clobbered so control returns to a random location.

This is just a pre-commit test so that the PR for the fix shows the
difference in code generation.
2024-03-12 16:26:49 +01:00