Depends on #96158 and #96159
Support the following relocations and assembly operators:
- `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`)
- `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`)
- `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`)
A `LOADgotAUTH` pseudo-instruction is introduced, which is later expanded
into an instruction sequence like the following.
```
adrp x16, :got_auth:sym
add x16, x16, :got_auth_lo12:sym
ldr x0, [x16]
autia x0, x16
```
If a re-sign is requested, as in the example below, the `LOADgotPAC` pseudo
is used, and the GOT load is lowered similarly to `LOADgotAUTH`.
```
@var = global i32 0
define ptr @resign_globalvar() {
ret ptr ptrauth (ptr @var, i32 3, i64 43)
}
```
Both SelectionDAG and GlobalISel are supported. For FastISel, we fall
back to SelectionDAG.
Tests starting with 'ptrauth-' have corresponding variants w/o this
prefix.
See also the specification:
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got
Enabled in clang using:
-fptrauth-indirect-gotos
and at the IR level using function attribute:
"ptrauth-indirect-gotos"
Signing uses IA and a per-function integer discriminator. The
discriminator isn't ABI-visible, and is currently:
ptrauth_string_discriminator("<function_name> blockaddress")
A sufficiently sophisticated frontend could benefit from per-indirectbr
discrimination, which would need additional machinery, such as allowing
"ptrauth" bundles on indirectbr. For our purposes, the simple scheme
above is sufficient.
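For illustration, a hypothetical snippet using the GNU "labels as values" extension (not taken from the patch); every taken label address below becomes a blockaddress that this scheme signs as described above:
```
// Hypothetical C/C++ example (GNU extension). Each &&label produces a
// blockaddress constant; under -fptrauth-indirect-gotos it is signed with
// the IA key and the per-function discriminator, and the indirect goto
// authenticates the pointer before branching.
int dispatch(int i) {
  static void *const targets[] = {&&handle_a, &&handle_b};
  goto *targets[i & 1];
handle_a:
  return 1;
handle_b:
  return 2;
}
```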
This approach doesn't support subtracting label addresses and using
the result as offsets, because each label address is signed.
Pointer arithmetic on signed pointers corrupts the signature bits,
and because label address expressions aren't typed beyond void*,
we can't do anything reliably intelligent with the arithmetic expressions.
Not signing addresses when they are used to form offsets would make it
easy to hijack control flow by overwriting the offset.
This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend,
while we evaluate either alternative implementations (e.g., lowering
blockaddress to a bb number, and indirectbr to a checked jump-table),
or better diagnostics (both at the frontend level and on unencodable
IR constants).
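For reference, a hypothetical example of the unsupported pattern (label-address differences used as offsets), which the basic-case diagnostic is aimed at:
```
// Hypothetical C/C++ example (GNU extension). Because both label addresses
// are signed, their difference mixes signature bits into the "offset" and
// the computed target is meaningless; the frontend diagnoses the basic
// form of this.
int interp(int i) {
  static const long offsets[] = {&&op_a - &&op_a, &&op_b - &&op_a};
  goto *((char *)&&op_a + offsets[i & 1]);
op_a:
  return 0;
op_b:
  return 1;
}
```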
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.
I moved the `ISD` dependencies into the CodeGen portion of the handling;
it's a little awkward, but it's the easiest solution I can think of for
now.
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall by setting its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine whether a symbol actually needs to
be kept.
This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step but do not want the overhead of
uncalled functions. (This trivially adds about a second to the link time.)
This change is an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics, which was discussed in this
RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This PR is just for Tan.
Now that the x86 tan backend has landed
(https://github.com/llvm/llvm-project/pull/90503), we can add other
backends, since the shared pieces are now in tree.
Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for
arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall
table.
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion
for f128, f16, and vector/NEON operations.
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` - Define
`G_FTAN` as a legal arm64 instruction.
Resolves #94755
There were a number of issues that needed to be addressed:
- i64 to bf16 conversion did not round correctly
- strict rounding needed to yield a chain
- FastISel did not have logic to bail out on bf16
Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option.
GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.
For FastISel, change `fastLowerCall` to bail out for calls affected by
-fno-plt.
For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in `call-rv-marker.ll` is therefore updated.
Note: the current -fno-plt -fpic implementation does not use GOT for a
preemptable function.
Link: #78275
Pull Request: https://github.com/llvm/llvm-project/pull/78890
For __arm_new("zt0") we need to have special setup code in the prologue.
For calls that don't preserve ZT0, we need to emit code to preserve ZT0
around the call.
This is only emitted by SelectionDAG ISel at the moment.
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of AArch64Arm64ECCallLowering is
to take the "normal" IR we'd generate for other targets and generate most
of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that relied on the incorrect pattern of always
using `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these with uses of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size are different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
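A sketch of the corrected pattern (hand-written for illustration; `getSequentialElementStride` is the helper added in this PR, while the surrounding loop is a simplified stand-in for the call sites being fixed and assumes all GEP indices are constants):
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Accumulate the byte offset of a GEP whose indices are all constants.
uint64_t constantGEPOffset(const GEPOperator *GEP, const DataLayout &DL) {
  uint64_t Offset = 0;
  for (gep_type_iterator GTI = gep_type_begin(GEP), E = gep_type_end(GEP);
       GTI != E; ++GTI) {
    uint64_t Idx = cast<ConstantInt>(GTI.getOperand())->getZExtValue();
    if (StructType *STy = GTI.getStructTypeOrNull()) {
      Offset += DL.getStructLayout(STy)->getElementOffset(Idx);
      continue;
    }
    // Previously: DL.getTypeAllocSize(GTI.getIndexedType()), which is wrong
    // for vectors because their elements are bit-packed rather than padded
    // out to the element's alloc size.
    Offset += GTI.getSequentialElementStride(DL) * Idx;
  }
  return Offset;
}
```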
---------
Co-authored-by: Nikita Popov <github@npopov.com>
When a shl is folded into a compare instruction, a miscompilation occurs
when the CMP instruction is also sign-extended. For the following IR:
%op3 = shl i8 %op2, 3
%tmp3 = icmp eq i8 %tmp2, %op3
it used to generate
cmp w8, w9, sxtb #3
which means sign-extend w9, shift left by 3, and then compare with the
value in w8. However, the original intent of the IR requires `%op2` to be
shifted left first and only then extended for the comparison. Moreover, if
sign extension is used instead of zero extension, the sample test
miscompiles. This PR fixes the issue by not folding the left shift into
the CMP instruction and by creating a zero-extended value rather than a
sign-extended value.
The WebKit Calling Convention was created specifically for the WebKit FTL.
FTL doesn't use LLVM anymore, and therefore this calling convention is
obsolete.
This commit removes the WebKit CC, its associated tests, and
documentation.
This patch fixes an issue where x18 is used for passing pointer parameters
with the nest attribute on Windows on AArch64. The x18 register is
reserved in the WoA ABI, so it can't be used for this purpose. This is
fixed by introducing a new Win64PCS calling convention that differs from
the standard AAPCS only in not using x18 for nest parameters.
This will make it easy for callers to see issues with, and fix up, calls
to createTargetMachine after a future change to the parameters of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
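An illustrative downstream migration (hypothetical code; only the enumerator spellings come from this change):
```
#include "llvm/Support/CodeGen.h"
using namespace llvm;

struct CompileOptions {
  // Before: CodeGenOpt::Level OptLevel = CodeGenOpt::Aggressive;
  CodeGenOptLevel OptLevel = CodeGenOptLevel::Aggressive;
  // Before: CodeGenFileType FileType = CGFT_ObjectFile;
  CodeGenFileType FileType = CodeGenFileType::ObjectFile;
};
```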
The patch D136361 disabled GlobalISel and FastISel for some SME functions, as
the saving and restoring of SM is not yet handled. Several tests were added
for fp128 fadd, which will be expanded to a libcall; they only happened to
work by accident and did not handle other cases such as f32/f64 frem
libcalls.
This extends the cases where GlobalISel / FastISel are disabled for functions
with SME attributes, under the assumption that it is difficult to reliably
tell what will become a libcall, so we should fall back for all such
functions until GlobalISel and/or FastISel can handle them.
Differential Revision: https://reviews.llvm.org/D158490
The term "next stack offset" is misleading because the next argument is
not necessarily allocated at this offset due to alignment constraints.
It also does not make much sense when allocating arguments at negative
offsets (introduced in a follow-up patch), because the returned offset
would be past the end of the next argument.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149566
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
For a similar reason as D148023: some applications make heavy use of
the CRC32 intrinsic (e.g., as part of a hash function) and therefore
benefit from avoiding frequent SelectionDAG fallbacks. In our
application, we get a 2% compile-time improvement.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D148917
The code closely follows the X86 back-end. Applications that make heavy
use of {i64, i64} returns, which use two registers, strongly benefit from
the reduced number of SelectionDAG fallbacks.
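For instance, a hypothetical source-level example of the kind of return this targets; under AAPCS64 the two 64-bit members come back in x0/x1:
```
// Hypothetical example: a small aggregate of two 64-bit values, returned
// in a register pair (x0/x1) under AAPCS64, which GlobalISel can now
// select instead of falling back to SelectionDAG.
struct Pair {
  long long First;
  long long Second;
};

Pair makePair(long long A, long long B) { return {A, B}; }
```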
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D148346
It turns out that the issue was unrelated to the code changes and was only
triggered by one of the tests. The SMEABI pass incorrectly marked the CFG
as preserved, even though it modified the CFG.
This reverts commit 8bcf5df3043a906c7124b70b59eda925eddd7319.
This patch ensures that GlobalISel and FastISel fall back to regular DAG ISel when:
* A function requires streaming-mode to be enabled at the start/end of the function.
This happens when the function has no streaming interface, but does have a streaming body.
* A function requires a lazy-save to be committed at the start of the function.
This happens if the function has the `aarch64_pstate_za_new` attribute.
* A call to a function requires a change in streaming-mode.
* A call to a function requires a lazy-save buffer to be set up.
Patch by @CarolineConcatto
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D136361
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D130884
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Relands 67504c95494ff05be2a613129110c9bcf17f6c13 with a fix for
32-bit builds.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
This reverts commit edb7ba714acba1d18a20d9f4986d2e38aee1d109.
This changes BLR_BTI to take variable_ops, meaning that we can accept
a register or a label. The pattern still expects one argument, so we'll
never get more than one. Then later we can check the type of the operand
to choose BL or BLR to emit.
(This is what BLR_RVMARKER does, but I missed this detail of it the first
time around.)
Also require NoSLSBLRMitigation, which I missed in the first version.
Some implementations of setjmp will end with a br instead of a ret.
This means that the next instruction after a call to setjmp must be
a "bti j" (j for jump) to make this work when branch target identification
is enabled.
The BTI extension was added in armv8.5-a but the bti instruction is in the
hint space. This means we can emit it for any architecture version as long
as branch target enforcement flags are passed.
The starting point for the hint number is 32 then call adds 2, jump adds 4.
Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see
at the start of functions).
The existing Arm command line option -mno-bti-at-return-twice has been
applied to AArch64 as well.
Support is added to SelectionDAG ISel and GlobalISel. FastISel will
defer to SelectionDAG.
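A hypothetical source-level illustration (assuming branch protection is enabled, e.g. via -mbranch-protection=bti, and -mno-bti-at-return-twice is not passed):
```
#include <csetjmp>

static std::jmp_buf Env;

// The comments describe the behaviour added here: the instruction emitted
// immediately after the `bl setjmp` is a "bti j" (hint #36), so a setjmp
// implementation that returns via `br` still lands on a valid branch
// target.
int run(void (*Body)()) {
  if (setjmp(Env))   // call site is followed by a "bti j"
    return 1;        // reached if Body() longjmps back
  Body();
  return 0;
}
```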
Based on the change done for M profile Arm in https://reviews.llvm.org/D112427
Fixes #48888
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D121707
Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.
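A rough sketch of the wrapper's intent (illustrative only; see D111968 and D111998 for the actual in-tree definitions):
```
#include <cstdint>

// Illustrative stand-in for llvm::isPowerOf2_64.
constexpr bool isPowerOf2_64(uint64_t Value) {
  return Value && !(Value & (Value - 1));
}

// True iff Value == -(2^k) for some k, i.e. -Value is a power of two.
constexpr bool isNegatedPowerOf2(uint64_t Value) {
  return isPowerOf2_64(-Value);
}

static_assert(isNegatedPowerOf2(uint64_t(-8)), "-8 is a negated power of 2");
static_assert(!isNegatedPowerOf2(8), "8 is not");
static_assert(!isNegatedPowerOf2(0), "0 is not");
```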
Differential Revision: https://reviews.llvm.org/D111998
This extends any frame record created in the function to include that
parameter, passed in X22.
The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:
* 0b0000 => a normal, non-extended record with just [FP, LR]
* 0b0001 => the extended record [X22, FP, LR]
* 0b1111 => kernel space, and a non-extended record.
All other values are currently reserved.
If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.
There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
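A non-authoritative sketch of how a stack-walking tool might consume the record described above (the layout and tag values follow this description; the function itself and its names are hypothetical, and it ignores arm64e pointer signing):
```
#include <cstdint>

struct FrameInfo {
  uint64_t CallerFP = 0;    // saved FP with the tag bits stripped
  uint64_t ReturnAddr = 0;  // saved LR
  uint64_t AsyncCtx = 0;    // X22 slot, only meaningful if HasAsyncCtx
  bool HasAsyncCtx = false;
};

// Assumes fp points at the saved-FP slot of an AArch64 frame record, so a
// normal record is [FP, LR] at fp, and the extended record [X22, FP, LR]
// places X22 immediately below fp.
FrameInfo readFrame(const uint64_t *fp) {
  FrameInfo FI;
  uint64_t RawFP = fp[0];
  unsigned Tag = unsigned(RawFP >> 60);   // bits 63:60
  FI.CallerFP = RawFP & ~(0xfULL << 60);
  FI.ReturnAddr = fp[1];
  if (Tag == 0b0001) {                    // extended record
    FI.AsyncCtx = fp[-1];                 // X22 stored just below saved FP
    FI.HasAsyncCtx = true;
  }
  // Tag 0b0000: normal record; 0b1111: kernel space, non-extended record.
  return FI;
}
```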