PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.
The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.
Differential Revision: https://reviews.llvm.org/D156491
The issue is uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead
of rejecting it outrightly.
Currently our AVRShiftExpand pass expands only 32-bit shifts, with the
assumption that other kinds of shifts (e.g. 64-bit ones) are
automatically reduced to 8-bit ones by LLVM during ISel.
However this is not always true and causes problems in the rust-lang runtime.
This commit changes the logic a bit, so that instead of expanding only
32-bit shifts, we expand shifts of all types except 8-bit and 16-bit.
This is not the most optimal solution, because 64-bit shifts can be
expanded to 32-bit shifts which has been deeply optimized.
I've checked the generated code using rustc + simavr, and all shifts
seem to behave correctly.
Spotted in the wild in rustc:
https://github.com/rust-lang/compiler-builtins/issues/523https://github.com/rust-lang/rust/issues/112140
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D154785
This patch fixes the failed test of verifyInstructionPredicates which is caused by verifyInstructionPredicates. verifyInstructionPredicates will add JMPk without checking the target predicate.
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D155570
Since ROLBRd needs an implicit R1 (on AVR) or an implicit R17 (on AVRTiny),
we split ROLBRd to ROLBRdR1 (on AVR) and ROLBRdR17 (on AVRTiny).
Reviewed By: aykevl, Patryk27
Differential Revision: https://reviews.llvm.org/D152248
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
The 'ELPM' instruction has three forms:
--------------------------
| form | feature |
| ----------- | -------- |
| ELPM | hasELPM |
| ELPM Rd, Z | hasELPMX |
| ELPM Rd, Z+ | hasELPMX |
--------------------------
The second form is always used in the expansion of pseudo instructions
LPMWRdZ/ELPMWRdZ. But for devices without ELPMX and with only ELPM,
only the first form can be used.
Reviewed By: aykevl, Miss_Grape
Differential Revision: https://reviews.llvm.org/D141264
Define and use a MachineOperand overload of isPlainlyKilled. This
improves codegen in a couple of tests because it catches the case where
MO does not kill Reg but another operand of the same instruction does.
Differential Revision: https://reviews.llvm.org/D147167
The 'LPM' instruction has three forms:
------------------------
| form | feature |
| ---------- | --------|
| LPM | hasLPM |
| LPM Rd, Z | hasLPMX |
| LPM Rd, Z+ | hasLPMX |
------------------------
The second form is always selected in ISelDAGToDAG, even on devices
without FeatureLPMX. This patch emits "LPM + MOV" on devices with
only FeatureLPM.
Reviewed By: jacquesguan
Differential Revision: https://reviews.llvm.org/D141246
The 'ELPM' instruction has three forms:
--------------------------
| form | feature |
| ----------- | -------- |
| ELPM | hasELPM |
| ELPM Rd, Z | hasELPMX |
| ELPM Rd, Z+ | hasELPMX |
--------------------------
The second form is always used in the expansion of the pseudo
instruction 'ELPMBRdZ'. But for devices without ELPMX but only
with ELPM, only the first form can be emitted.
Reviewed By: jacquesguan
Differential Revision: https://reviews.llvm.org/D141221
In AVRFrameLowering::spillCalleeSavedRegisters(), when a 16-bit
livein register is spilled, two PUSH instructions are generated
for the higher and lower 8-bit registers. But these two 8-bit
registers are marked as killed in the two PUSH instructions, so
any future use of them will cause a crash.
This patch fixes the above issue by adding the two sub 8-bit
registers to the livein list.
Fixes https://github.com/llvm/llvm-project/issues/56423
Reviewed By: jacquesguan
Differential Revision: https://reviews.llvm.org/D144720
All hardware address spaces on AVR can be freely cast between (they keep
the same bit pattern). They just aren't dereferenceable when they're in
a different address space as they really do point to a separate address
space.
This is supported in avr-gcc: https://godbolt.org/z/9Gfvhnhv9
avr-gcc also supports the `__memx` address space which is 24 bits. We
don't support this address space yet but I've added a safeguard just in
case.
Differential Revison: https://reviews.llvm.org/D142107
This pseudo-instruction stores two small (8-bit) registers into one wide
(16-bit) register. But apparently the order matters a lot to the
register allocator.
This patch changes the order of inserting the registers to optimize for
the best register allocation in the tests of shift32.ll. It might be
detrimental in other cases, but keeping the registers in the same
physical register seems like it would be a common case.
Differential Revision: https://reviews.llvm.org/D140573
This optimization turns shifts of almost a multiple of 8 into a shift
into the opposite direction. Unfortunately it doesn't compose well with
the other optimizations (I've tried) so it's separate from them.
Differential Revision: https://reviews.llvm.org/D140572
This uses a complicated shift sequence that avr-gcc also uses, but
extended to work over any number of bytes and in both directions
(logical shift left and logical shift right). Unfortunately it can't be
used for an arithmetic shift right: I've tried to come up with a
sequence but couldn't.
Differential Revision: https://reviews.llvm.org/D140571
This patch optimizes 32-bit constant shifts by renaming registers. This
is very effective as the compiler would otherwise need to do a lot of
single bit shift instructions. Instead, the registers are renamed at the
SSA level which means the register allocator will insert the necessary
mov instructions.
Unfortunately, the register allocator will insert some unnecessary movs
with the current code. This will be fixed in a later patch.
Differential Revision: https://reviews.llvm.org/D140570
32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.
I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and outputs two
16-bit registers.
This is the first commit in a series. When completed, shift instructions
will take around 31% less instructions on average for constant 32-bit
shifts, and is in all cases equal or better than the old behavior. It
also tends to match or outperform avr-gcc: the only cases where avr-gcc
does better is when it uses a loop to shift, or when the LLVM register
allocator inserts some unnecessary movs. But it even outperforms avr-gcc
in some cases where avr-gcc does not use a loop.
As a side effect, non-constant 32-bit shifts also become more efficient.
For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.
The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.
Differential Revision: https://reviews.llvm.org/D140569
Integer legalization already supported splitting the output integer of
llround and llrint, but did not support this for lround and lrint yet.
This is not a problem for 32-bit architectures, but for 8/16-bit
architectures like AVR it results in a crash like this:
ExpandIntegerResult #0: t7: i32 = lround t6
LLVM ERROR: Do not know how to expand the result of this operator!
This patch simply add lrint/lround to the list of ISD opcodes to expand.
Fixes https://github.com/llvm/llvm-project/issues/59573.
Differential Revision: https://reviews.llvm.org/D140822
These two symbols are declared in object files to indicate whether .data
needs to be copied from flash or .bss needs to be cleared. They are
supported on avr-gcc and reduce firmware size a bit, which is especially
important on very small chips.
I checked the behavior of avr-gcc and matched it as well as possible.
From my investigation, it seems to work as follows:
__do_copy_data is set when the compiler finds a data symbol:
* without a section name
* with a section name starting with ".data" or ".gnu.linkonce.d"
* with a section name starting with ".rodata" or ".gnu.linkonce.r" and
flash and RAM are in the same address space
__do_clear_bss is set when the compiler finds a data symbol:
* without a section name
* with a section name that starts with .bss
Simply checking whether the calculated section name starts with ".data",
".rodata" or ".bss" should result in the same behavior.
Fixes: https://github.com/llvm/llvm-project/issues/58857
Differential Revision: https://reviews.llvm.org/D140830
The 32-bit LDS/STS are not available on AVRTiny, so we have
to use their compact 16-bit form for memory access.
Reviewed By: aykevl
Differential Revision: https://reviews.llvm.org/D139687
These operands are illegal and rejected by avr-gcc.
subi r24, -lo8(symobl+offset)
sbci r25, -hi8(symobl+offset)
And their correct form should be
subi r24, lo8(-(symobl+offset))
sbci r25, hi8(-(symobl+offset))
Reviewed By: aykevl
Differential Revision: https://reviews.llvm.org/D140473
The attiny4/attiny5/attiny9/attiny10 have a slightly modified
instruction set that drops a number of useful instructions. This patch
makes sure to not emit them on these "reduced tiny" cores.
The affected instructions are:
* lds and sts (load/store directly from data)
* ldd and std (load/store with displacement)
* adiw and sbiw (add/sub register pairs)
* various other instructions that were emitted without checking
whether the chip actually supports them (movw, adiw, etc)
There is a variant on lds and sts on these chips, but it can only
address a limited portion of the address space and is mainly useful to
load/store I/O registers (as an extension to the in and out
instructions). I have not implemented it here, implementing it can be
done in a separate patch.
This patch is not optimal. I'm sure it can be improved a lot. For
example, we could teach the instruction selector to not select lddw/stdw
instructions so that the weird pointer adjustments are not necessary.
But for now I've focused just on correctness, not on code quality.
Updates: https://github.com/llvm/llvm-project/issues/53459
Differential Revision: https://reviews.llvm.org/D131867
This patch makes sure the compiler uses R16/R17 on avrtiny (attiny10
etc) instead of R0/R1.
Some notes:
* For the NEGW and ROLB instructions, it adds an explicit zero
register. This is necessary because the zero register is different
on avrtiny (and InstrInfo Uses lines need a fixed register).
* Not entirely sure about putting all tests in features/avr-tiny.ll,
but it doesn't seem like the "target-cpu"="attiny10" attribute
works.
Updates: https://github.com/llvm/llvm-project/issues/53459
Differential Revision: https://reviews.llvm.org/D138582
A scalar which exceeds 4 bytes should be returned via stack, other
than via registers, on an AVRTiny device.
Reviewed By: aykevl
Differential Revision: https://reviews.llvm.org/D138201
R1 is a reserved register, but LLVM gives the APIs to know when it is
used or not. So this patch uses these APIs to only save/clear/restore R1
in interrupts when necessary.
The main issue here was getting inline assembly to work. One could argue
that this is the job of Clang, but for consistency I've made sure that
R1 is always usable in inline assembly even if that means clearing it
when it might not be needed.
Information on inline assembly in AVR can be found here:
https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code
Essentially, this seems to suggest that r1 can be freely used in avr-gcc
inline assembly, even without specifying it as an input operand.
Differential Revision: https://reviews.llvm.org/D117426
The code to support the case when the register allocator has assigned
the same register to the src and the dst register operand isn't actually
needed:
* LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output
register, so the register allocator will make sure to allocate a
different register for the output register.
* LDDWRdYQ does not have an @earlyclobber, but the pointer register is
the fixed Y register which is reserved. The register allocator won't
use reserved registers for the output value.
This removes a special case in the code that makes the pseudo
instruction expansion pass more complicated than it needs to be.
Differential Revision: https://reviews.llvm.org/D131844