Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
Add three more special cases for loading registers with immediates.
The first allows values in the range of [-255, 255] to be loaded with
MOVEQ, even if the register is more than 8 bits and the sign extention
is unwanted. This is done by loading the bitwise complement of the
desired value, then performing a NOT instruction on the loaded register.
This special case is only used when a simple MOVEQ cannot be used, and
is only used for 32 bit data registers. Address registers cannot support
MOVEQ, and the two-instruction sequence is no faster or smaller than a
plain MOVE instruction when loading 16 bit immediates on the 68000, and
likely slower for more sophisticated microarchitectures. However, the
instruction sequence is both smaller and faster than the corresponding
MOVE instruction for 32 bit register widths.
The second special case is for zeroing address registers. This simply
expands to subtracting a register with itself, consuming one instruction
word rather than 2-3, with a small improvement in speed as well.
The last special case is for assigning sign-extended 16-bit values to a
full address register. This takes advantage of the fact that the movea.w
instruction sign extends the output, permitting the immediate to be
smaller. This is similar to using lea with a 16-bit address, which is
not added in this patch as 16-bit absolute addressing is not yet
implemented.
This is a v2 submission of #90817. It also creates a 'Data' test
directory to better align with the backend's tablegen layout.
\#92331 tried to make `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch trys to bring that back by:
1. reverts the
[revert](1579e9ca9c).
2. `createObjCARCContractPass` only on optimized builds.
Tests are updated to refelect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`
Signed-off-by: Peter Rong <PeterRong@meta.com>
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2.
This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7.
Causes major compile-time regressions for unoptimized builds.
The m68k backend will always emit external calls (including libcalls)
with
PC-relative PLT relocations, even when in non-pic mode or -fno-plt is
used.
This is unexpected, as other function calls are emitted with absolute
addressing, and a static code modes suggests that there is no PLT. It
also
leads to a miscompilation where the call instruction emitted expects an
immediate address, while the relocation emitted for that instruction is
PC-relative.
This miscompilation can even be seen in the default C function in
godbolt:
https://godbolt.org/z/zEoazovzo
Fix the issue by classifying external function references based upon the
pic
mode. This triggers a change in the static code model, making it more in
line
with the expected behaviour and allowing use of this backend in more
bare-metal
situations where a PLT does not exist.
The change avoids the issue where we emit a PLT32 relocation for an
absolute
call, and makes libcalls and other external calls use absolute
addressing modes
when a static code model is desired.
Further work should be done in instruction lowering and validation to
ensure
that miscompilations of the same type don't occur.
Add support for the moveq instruction, which is both faster and smaller
(1/2 to 1/3 the size) than a move with immediate to register.
This change introduces the instruction, along with a set of
pseudoinstructions to handle immediate moves to a register that is
lowered post-RA.
Pseudos are used as moveq can only write to the full register, which
makes
matching i8 and i16 immediate loads difficult in tablegen. Furthermore,
selecting moveq before RA constrains that immediate to be moved into a
data
register, which may not be optimal.
The bulk of this change are fixes to existing tests, which cover the new
functionality sufficiently.
Currently the bitwise NOT instruction is not recognized. Add support for
using NOT on data registers. This is a partial implementation that puts
NOT at the same level of support as NEG currently enjoys.
Using not rather than eori cuts the length of the encoded instruction
in half or in thirds, leading to a reduction of 4-10 cycles per
instruction, on the original 68000.
This change includes tests for both bitwise and arithmetic negation.
We lower overflow arithmetics to its M68kISD counterparts that produce
results of {i16/i32, i8} in which the second resut represents CCR. In
the event where we're certain there won't be an overflow, for instance
8 & 16-bit multiplications, we simply use zero in replacement of the
second result.
This patch replaces M68kISD::CMOV that takes this kind of zero or
all-ones CCR as condition value with its corresponding operand value.
M68k only has 16-bit x 16-bit -> 32-bit variant for multiplications
taking 16-bit operands. We still define two input operands for this
class of instructions, and tie the first operand to the result value.
The problem is that the two operands have different register classes
(DR32 and DR16) hence making these instructions communitive produces
invalid MachineInstr (though the final assembly will still be correct).
The codegen logic for overflow arithmetics (e.g. llvm.uadd.overflow)
was a mess; overflow multiplications were not even supported.
This patch clean up the legalization of overflow arithmetics and add
supports for common variants of overflow multiplications.
`M68k_RTD` is really similar to X86's stdcall, in which callee pops the
arguments from stack. In LLVM IR it can be written as `m68k_rtdcc`.
This patch also improves how ExpandPseudo Pass handles popping stack at
function returns in the absent of the RTD instruction.
Differential Revision: https://reviews.llvm.org/D149864
Previously when targeting 68020+, instruction selection attempted to
emit a 32-bit register-register multiplication, but failed at instruction
selection. With this, it succeeds.
Differential Revision: https://reviews.llvm.org/D152120
`TargetGlobalTLSAddress` is not considered and handled correctly when matching addressing mode, which leads to an incorrect result of instruction selection.
fixes#63162.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D153103
This patch introduces TLS (Thread-Local Storage) support to the LLVM m68k backend.
Reviewed By: glaubitz
Differential Revision: https://reviews.llvm.org/D144941
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
Instruction selection was failing when trying to zero extend a value
loaded from a PC-relative address. This adds support for zero extension
using the "program counter indirect with displacement" addressing mode.
It also adds a test with code that was previously failing to compile.
This fixes a compile error in Rust's libcore.
Differential Revision: https://reviews.llvm.org/D149034
If it couldn't fit the return value in two registers, this caused an
error during codegen. It seems this method is implemented in other
backends but not here, and allows it to pass return values in memory
when it isn't able to do so in registers.
Seems to fix compilation of Rust code with certain return types:
https://github.com/rust-lang/rust/issues/89498
Differential Revision: https://reviews.llvm.org/D148856
Note that technically both M68000/010 can use M68881, despite the fact
that usually only M68020 and newer ISAs are equipped with M68881/2.
M68040 and newer processors have builtin M68882.
Differential Revision: https://reviews.llvm.org/D147479
Ideally we want to lower ATOMIC_FENCE into `__sync_synchronize`.
However, libgcc doesn't implement that builtin as GCC simply generates an
inline assembly barrier whenever there needs to be a fence.
We use a similar way to lower ATOMIC_FENCE.
Differential Revision: https://reviews.llvm.org/D146996
Put the value into A0 instead of data registers. And remove the
redundant `RetCC_M68kCommon` as there aren't many rules shared between
existing CCs other than the pointer one.
This change is tested by existing tests.
I added a new pass, callbrprepare, to the pass pipelines in
commit a3a84c9e2511 ("[llvm] add CallBrPrepare pass to pipelines")
but did not test experimental backends.
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.
This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.
The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.
Fixes: #58168
Reviewed By: nickdesaulniers, thegameg
Differential Revision: https://reviews.llvm.org/D135488
I missed that a forward fix was out when reverting
0a652c540556a118bbd9386ed3ab7fd9e60a9754.
This reverts commit 488bea797e167e6bf5ddab5f7eea78031b575ba0.
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.
- Patch fixed to not reuse definitions from predecessors in EH landing pads.
- Late review suggestions (by MaskRay) have been addressed.
- M68k/pipeline.ll test updated.
- Init captures added in processBlock() to avoid capturing structured bindings.
- RISCV has this disabled for now.
Original commit message:
A new pass MachineLateInstrsCleanup is added to be run after PEI.
This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().
This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.
This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.
Differential Revision: https://reviews.llvm.org/D123394
Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
Currently per-function metadata consists of:
(start-pc, size, features)
This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D136078