In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.
This change adopts the same calculation for the remaining non-volatile
occurrences.
Recommitting after undefined behavior fix in D155093.
Differential Revision: https://reviews.llvm.org/D153800
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D154281
Currently when compiling for an execute-only target without movt then
EmitStructByval will generate a constant pool load which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.
Differential Revision: https://reviews.llvm.org/D154944
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.
Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.
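The Thumb-1 case can be sketched as a small model. This is not the code in the ARM backend: `split_bytes`, `expand_thumb1` and `simulate` are hypothetical names, and the instruction strings are only illustrative. It shows how a skipped zero byte still contributes 8 bits to the next emitted shift.

```python
def split_bytes(value):
    """Split a 32-bit immediate into four bytes, most significant first."""
    value &= 0xFFFFFFFF
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def expand_thumb1(value, reg="r0"):
    """Model a mov/lsl/add expansion that does not emit zero bytes."""
    insts = []
    pending_shift = 0  # shift accumulated since the last emitted byte
    started = False    # True once the first non-zero byte has been emitted
    for byte in split_bytes(value):
        if started:
            pending_shift += 8
        if byte == 0:
            continue  # skip the byte, but keep the shift it owes
        if not started:
            insts.append(f"movs {reg}, #{byte}")
            started = True
        else:
            insts.append(f"lsls {reg}, {reg}, #{pending_shift}")
            insts.append(f"adds {reg}, #{byte}")
            pending_shift = 0
    if not started:
        insts.append(f"movs {reg}, #0")
    elif pending_shift:
        # Trailing zero bytes still need one final shift.
        insts.append(f"lsls {reg}, {reg}, #{pending_shift}")
    return insts


def simulate(insts):
    """Evaluate the modelled sequence on a single 32-bit register."""
    reg = 0
    for inst in insts:
        op, operands = inst.split(" ", 1)
        imm = int(operands.rsplit("#", 1)[1])
        if op == "movs":
            reg = imm
        elif op == "lsls":
            reg = (reg << imm) & 0xFFFFFFFF
        elif op == "adds":
            reg = (reg + imm) & 0xFFFFFFFF
    return reg
```

In this model an immediate with no zero bytes still takes the full seven instructions, while `0x00120034` needs only a `movs`, one `lsls` by 16 and one `adds`.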
Differential Revision: https://reviews.llvm.org/D154943
Mark the tMOVi32imm pseudo instr as killing the flags register.
The pseudo instruction expands to a sequence of 7 movs/lsls/adds
instructions, which are all Thumb-1 flag setting instructions.
For a test case, take an existing arm test which checks for
"Don't CSE a cmp across a call that clobbers CPSR."
and retarget it at thumbv6m execute-only.
Reviewed By: stuij
Differential Revision: https://reviews.llvm.org/D154845
Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d
Currently in LowerConstantFP, when we compile for execute-only (XO) we don't
check which architecture we're compiling for (v6-M and earlier, or later than
v6-M). We shouldn't get here for v6-M, so put in an assert.
Reviewed By: simonwallis2, dmgreen
Differential Revision: https://reviews.llvm.org/D154506
In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't
uppercased. This wasn't spotted in development as macOS's HFS+ file system is
apparently often configured case-insensitive.
The ARM backend codebase is dotted with places where armv6-m will generate
constant pools. Now that we can generate execute-only code for armv6-m, we need
to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of
these.
Big stacks are one of the obvious places. In this patch we take care of two
sites:
1. take care of big stacks in prologue/epilogue
2. take care of save/tSTRspi nodes, which implicitly fixes
emitThumbRegPlusImmInReg, which is used in several frame lowering functions
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D154233
This reverts commit 92a9c30c61da7f973d55cd84fade424159b9cac9.
This has caused a test failure in the 2nd stage of Linaro's
Arm 32 bit buildbots.
LLVM::simplified-template-names.s
7: error: Simplified template DW_AT_name could not be reconstituted:
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8: original: f3<unsigned char, (unsigned char)'\x00'>
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9: reconstituted: f3<unsigned char, (unsigned char)'\x7f'>
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I suspect a load/store is slightly off.
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.
This change adopts the same calculation for the remaining non-volatile
occurrences.
Differential Revision: https://reviews.llvm.org/D153800
Fix https://github.com/llvm/llvm-project/issues/63579
```
% cat a.c
void foo() {}
% clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - | grep p2align
.p2align 1
% clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - | grep p2align
.p2align 1
```
Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at
least 4, so that loading the type hash before the function label will not cause
a misaligned access. This is especially important for -mno-unaligned-access
configurations that don't set `setMinFunctionAlignment` to 4 or greater.
With this patch, the generated assembly for the examples above will contain `.p2align 2`
before the type hash.
If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the
larger alignment will be used.
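The alignment rule described above can be modelled as follows. This is a sketch, not the clang/LLVM implementation: `function_alignment` and `p2align` are hypothetical helpers, and alignments are assumed to be powers of two expressed in bytes.

```python
def function_alignment(target_min, user_align=None, sanitized=False):
    """Return the modelled function alignment in bytes."""
    align = target_min
    if sanitized:
        # -fsanitize={function,kcfi}: the type hash before the label is loaded
        # as a 4-byte word, so require at least 4-byte alignment.
        align = max(align, 4)
    if user_align is not None:
        # __attribute__((aligned(N))) or -falign-functions=N: the larger wins.
        align = max(align, user_align)
    return align


def p2align(align):
    """Convert a power-of-two byte alignment to a .p2align operand."""
    return align.bit_length() - 1
```

With a Thumb minimum alignment of 2 bytes, the model gives `.p2align 1` without instrumentation and `.p2align 2` with it, matching the before and after output quoted above.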
Reviewed By: simon_tatham, samitolvanen
Differential Revision: https://reviews.llvm.org/D154125
The ExpandLibcallResult result was a bitcast and not the direct call
result, so we couldn't find the chain. Use the new separate chain
return value instead.
If we have a store of a load with no other uses in between, the store is
considered dead and is removed. So sometimes when legalizing a fixed
length vector store of an insert, we end up producing better code
through scalarization than without.
An example follows:
%a = load <4 x i64>, ptr %x
%b = insertelement <4 x i64> %a, i64 %y, i32 2
store <4 x i64> %b, ptr %x
If this is scalarized, then DAGCombine successfully removes 3 of the 4
stores which are considered dead, and on RISC-V we get:
sd a1, 16(a0)
However if we make the vector type legal (-mattr=+v), then we lose the
optimisation because we don't scalarize it.
This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
in between, replacing it with a scalar store of the inserted element.
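The equivalence behind this fold can be checked with a small memory model. This is only an illustration and assumes little-endian `<4 x i64>` storage with 8-byte elements; `vector_roundtrip` and `scalar_store` are hypothetical names, not DAGCombine functions.

```python
import struct


def vector_roundtrip(mem, base, value, index, num_elts=4):
    """Model: load <N x i64>, insertelement at index, store the whole vector."""
    elts = list(struct.unpack_from(f"<{num_elts}Q", mem, base))
    elts[index] = value
    struct.pack_into(f"<{num_elts}Q", mem, base, *elts)


def scalar_store(mem, base, value, index):
    """Model: store only the inserted element at byte offset 8 * index."""
    struct.pack_into("<Q", mem, base + 8 * index, value)
```

For index 2 the scalar store lands at byte offset 16, which is the offset used by the `sd a1, 16(a0)` instruction in the RISC-V output above.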
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152276
When we only have a 16-bit pc-relative branch instruction we generate
a table of addresses for a jump table. Currently this is placed inline,
but this won't work with execute-only memory. In this case generate
the jump table out-of-line.
Differential Revision: https://reviews.llvm.org/D153774
Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this
was only implemented for ~armv7+, effectively wherever MOVW/MOVT is
available. For long calls, we remove the check for MOVW/MOVT when
generating code for XO. The check was already redundant, as subtarget
initialization already verifies that XO is valid for the target, and targets that
generate valid XO code should be able to handle the (wrapper globaladdress)
node.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D153782
The CPSR register operands of the instructions constructed in ExpandTMOV32BitImm
were marked as kill instead of define. It is best to use the pre-existing
t1CondCodeOp function to construct the CPSR operands.
Reviewed By: simonwallis2
Differential Revision: https://reviews.llvm.org/D153763
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.
We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.
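The analysis described above can be sketched on a toy CFG. This is a model under stated assumptions, not the ShrinkWrap pass itself: the CFG is a dict of successor lists, `computes_stack_address` is a hypothetical set of blocks that compute a stack-relative address, and the function names are invented for illustration.

```python
def reverse_post_order(succs, entry):
    """Return the blocks reachable from entry in reverse post order."""
    seen, order = set(), []

    def visit(block):
        seen.add(block)
        for succ in succs.get(block, ()):
            if succ not in seen:
                visit(succ)
        order.append(block)

    visit(entry)
    return order[::-1]


def no_stack_address_used(succs, entry, computes_stack_address):
    """Map each block to whether NoStackAddressUsed holds at its exit."""
    preds = {}
    for block, block_succs in succs.items():
        for succ in block_succs:
            preds.setdefault(succ, []).append(block)

    state = {}
    for block in reverse_post_order(succs, entry):
        # A predecessor that has not been processed yet (a loop back edge)
        # counts as False, which is the conservative answer.
        on_entry = all(state.get(p, False) for p in preds.get(block, ()))
        state[block] = on_entry and block not in computes_stack_address
    return state
```

A single pass is enough because the result is never revisited: a loop header simply gets the conservative answer, as the description above accepts.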
Differential Revision: https://reviews.llvm.org/D152213
Switch and update some tests to use `update_llc_test_checks` to reduce
clutter in an upcoming change.
Differential Revision: https://reviews.llvm.org/D152215
Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE.
However, these instructions require the addresses to be aligned.
Unaligned loads/stores therefore should be ignored by this handling.
Differential Revision: https://reviews.llvm.org/D152790
The current implementation tries to handle the high and low halves
separately, but that's less efficient in most cases; use a wide SETCC
instead.
Differential Revision: https://reviews.llvm.org/D151358
Temporarily disabling the execute-only tests. We recently added codegen for
armv6-m, which is still in heavy development (D152795).
Disabling the tests while we're figuring out what's going on is probably the
least disruptive option, as a patch dependent on it also already landed.
[ARM] generate armv6m eXecute Only (XO) code for immediates, globals
Previously eXecute Only (XO) support was implemented for targets that support
MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449
XO prevents the compiler from generating data accesses to code sections. This
patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and
must resort to the following general pattern to avoid loads:
movs r3, :upper8_15:foo
lsls r3, #8
adds r3, :upper0_7:foo
lsls r3, #8
adds r3, :lower8_15:foo
lsls r3, #8
adds r3, :lower0_7:foo
ldr r3, [r3]
This is equivalent to the code pattern generated by GCC.
The above relocations are new to LLVM and have been implemented in a parent
patch: https://reviews.llvm.org/D149443.
This patch limits itself to implementing codegen for this pattern and enabling
XO for armv6-M in the backend.
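As a rough model of what the four relocation operators contribute, the sketch below splits a 32-bit address into bytes and rebuilds it with the same shift-and-add steps as the sequence above. The bit ranges are an assumption based on the operator names (`upper`/`lower` halfword, bits 8-15 or 0-7 within it); `reloc_bytes` and `rebuild` are hypothetical helpers.

```python
def reloc_bytes(address):
    """Split a 32-bit address into the byte each operator is assumed to select."""
    address &= 0xFFFFFFFF
    return {
        "upper8_15": (address >> 24) & 0xFF,
        "upper0_7": (address >> 16) & 0xFF,
        "lower8_15": (address >> 8) & 0xFF,
        "lower0_7": address & 0xFF,
    }


def rebuild(parts):
    """Recombine the bytes the way movs/lsls/adds/lsls/adds/lsls/adds would."""
    reg = parts["upper8_15"]                    # movs r3, :upper8_15:foo
    for name in ("upper0_7", "lower8_15", "lower0_7"):
        reg = (reg << 8) & 0xFFFFFFFF           # lsls r3, #8
        reg = (reg + parts[name]) & 0xFFFFFFFF  # adds r3, :<name>:foo
    return reg
```

Each `adds` only fills the low byte that the preceding `lsls` cleared, so no carries occur and the address is materialised without any load from a literal pool.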
Separate patches will follow for:
- switch tables
- replacing specific loads from constant islands which are spread out over the
ARM backend codebase. Amongst others: FastISel, call lowering, stack frames.
Reviewed By: john.brawn
Differential Revision: https://reviews.llvm.org/D152795
The `__DATA,xray_instr_map` section has label differences like
`.quad Lxray_sled_0-Ltmp0` that are represented as a pair of UNSIGNED and SUBTRACTOR relocations.
LLVM integrated assembler attempts to rewrite A-B into A-B'+offset where B' can
be included in the symbol table. B' is called an atom and should be a
non-temporary symbol in the same section. However, since `xray_instr_map` does
not define a non-temporary symbol, the SUBTRACTOR relocation will have no
associated symbol, and its `r_extern` value will be 0. Therefore, we will see
linker errors like:
error: SUBTRACTOR relocation must be extern at offset 0 of __DATA,xray_instr_map in a.o
To fix this issue, we need to define a non-temporary symbol in the section. We
can accomplish this by renaming `Lxray_sleds_start0` to `lxray_sleds_start0`
("L" to "l").
`lxray_sleds_start0` serves as the atom for this dead-strippable subsection.
With the `S_ATTR_LIVE_SUPPORT` attribute, `ld -dead_strip` will retain
subsections that reference live functions.
Special thanks to Oleksii Lozovskyi for reporting the issue and providing
initial analysis.
Differential Revision: https://reviews.llvm.org/D153239
Commit ec77747fbdca901e0fded58f940dae62e0f6b726 regenerated the check lines
without being very careful about which lines were updated. This attempts to fix
them to make sure the V7 and V8 lines are emitted as needed.
As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020),
PC-relative entries avoid dynamic relocations and can therefore make the
section read-only.
This is similar to D78082 and D78590. We cannot commit to support
compiler/runtime built at different versions, so just don't play with versions.
For Mach-O support (still incomplete), we use non-temporary `lxray_fn_idx[0-9]+`
symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR
relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to
reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`.
A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable
section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d).
Differential Revision: https://reviews.llvm.org/D152661
Add the `S_ATTR_LIVE_SUPPORT` attribute to the sections so that `ld -dead_strip`
will retain subsections that reference live functions, once we add linker
private "l" symbols as atoms.
AArch64 has five system registers intended to be useful as thread
pointers: one for each exception level which is RW at that level and
inaccessible to lower ones, and the special TPIDRRO_EL0 which is
readable but not writable at EL0. AArch32 has three, corresponding to
the AArch64 ones that aren't specific to EL2 or EL3.
Currently clang supports only a subset of these registers, and not
even a consistent subset between AArch64 and AArch32:
- For AArch64, clang permits you to choose between the four TPIDR_ELn
thread registers, but not the fifth one, TPIDRRO_EL0.
- In AArch32, on the other hand, the only thread register you can
choose (apart from 'none, use a function call') is TPIDRURO, which
corresponds to (the bottom 32 bits of) AArch64's TPIDRRO_EL0.
So there is no thread register that you can currently use in both
targets!
For custom and bare-metal purposes, users might very reasonably want
to use any of these thread registers. There's no reason they shouldn't
all be supported as options, even if the default choices follow
existing practice on typical operating systems.
This commit extends the range of values acceptable to the `-mtp=`
clang option, so that you can specify any of these registers by (the
lower-case version of) their official names in the ArmARM:
- For AArch64: tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3
- For AArch32: tpidrurw, tpidruro, tpidrprw
All existing values of the option are still supported and behave the
same as before. Defaults are also unchanged. No command line that
worked already should change behaviour as a result of this.
The new values for the `-mtp=` option have been agreed with Arm's gcc
developers (although I don't know whether they plan to implement them
in the near future).
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D152433
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
XRay instrumentation works for macOS running on Apple Silicon, but
codegen is untested there. I'm going to make changes affecting this
target, so get the XRay tests running on AArch64.
Data sections are going to become slightly different on x86_64 soon.
I do want the tests to be specific about symbol names, so instead of
having test check the common step, bifurcate tests a bit and check
the full symbol names.
As for ARM, XRay is not really supported on iOS at the moment, though
ARM is also really used there with modern phones. Nevertheless, codegen
tests exist and the output is going to change a little, so make it easier
to write the special case for iOS.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D145291
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read the floating-point environment, set it, or reset it to
some default state. They perform the same actions as the C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.
The new intrinsics specify the FP environment as a value of integer type, which
is convenient for most targets, where the FP state is the content of some
register. Some targets, however, use long representations. On X86 the size
of the FP environment is 256 bits, and even half of this size is not a legal
integer type. To facilitate legalization in such cases, two sets of DAG
nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider the FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when the
target has a long representation for the floating-point state.
Differential Revision: https://reviews.llvm.org/D71742
This is an attempt to reland D42600 and enable this optimisation by default.
This also resolves the issue pointed out in the context of PGO build.
Differential Revision: https://reviews.llvm.org/D42600
There are two motivations.
`-fno-pic -fstack-protector -mstack-protector-guard=global` created
`__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD.
This patch allows referencing the symbol indirectly with
-fno-direct-access-external-data.
Some Linux kernel folks want
`-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard`
created `__stack_chk_guard` to be referenced directly, avoiding
R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker).
https://github.com/llvm/llvm-project/issues/60116
Why they need this isn't so clear to me.
---
Add module flag "direct-access-external-data" and set the dso_local property of
the stack protector symbol. The module flag can benefit other LLVMCodeGen
synthesized symbols that are not represented in LLVM IR.
Nowadays, with `-fno-pic` being uncommon, ideally we should set
"direct-access-external-data" when it is true. However, doing so would require
~90 clang/test tests to be updated, which is too much.
As a compromise, we set "direct-access-external-data" only when it's different
from the implied default value.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D150841
We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.
This is similar to some of the special cases we have for expanding
setcc operands.
Differential Revision: https://reviews.llvm.org/D151182
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762