These are module level properties, and querying them through
a function-level subtarget context is confusing. Plus we don't
need an aliased name.
Continue change started in 91439817e8d19613ac6e25ca9abd5e7534a9d33b
When targeting architectures that do not support unaligned memory
accesses or when explictly pass -mno-unaligned-access, it requires the
compiler to expand each unaligned load/store into an inline sequences.
For 32-bit operations this typically involves:
1. 4× LDRB (or 2× LDRH),
2. multiple shift/or instructions
These sequences are emitted at every unaligned access site, and
therefore contribute significant code size in workloads that touch
packed or misaligned structures.
When compiling with -Oz and in combination with -mno-unaligned-access,
this patch lowers unaligned 32 bit and 64 bit loads and stores to below
AEABI heper calls:
```
__aeabi_uread4
__aeabi_uread8
__aeabi_uwrite4
__aeabi_uwrite8
```
And it provide a way to perform unaligned memory accesses on targets
that do not support them, such as ARMv6-M or when compiling with
-mno-unaligned-access. Although each use introduces a function call
making it less straightforward than using raw loads and stores the call
itself is often much smaller than the compiler emitted sequence of
multiple ldrb/strb operations. As a result, these helpers can greatly
reduce code-size providing they are invoked more than once across a
program.
1. Functions become smaller in AEABI mode once they contain more than a
few unaligned accesses.
2. The total image .text size becomes smaller whenever multiple
functions call the same helpers.
This PR is derived from https://reviews.llvm.org/D57595, with some minor
changes.
Co-authored-by: David Green
Libcall lowering decisions should come from the LibcallLoweringInfo
analysis. Query this through the DAG, so eventually the source
can be the analysis. For the moment this is just a wrapper around
the TargetLowering information.
Splits out change from https://github.com/llvm/llvm-project/pull/176015
Changes shouldExpandAtomicRMWInIR to take a constant argument: This is
to allow some other TargetLowering constant-argument functions to call
it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
The expansion on other targets than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check. Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).
Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the
underlying scalar type and use this to exit the pass early for targets
which do not require any expansions. This requires to change the
frem legalization action for all targets which do not want frem to
be expanded in this pass from "Expand" to "LibCall".
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.
Some nodes fail validation, those are enumerated in
`ARMSelectionDAGInfo::verifyTargetNode()`. Some of the bugs are easy to
fix, but probably they should be fixed separately, this patch is already big.
Part of #119709.
Pull Request: https://github.com/llvm/llvm-project/pull/168212
For tail-calls we want to re-use the caller stack-frame and potentially
need to copy stack arguments.
For large stack arguments, such as by-val structs, this can lead to
overwriting incoming stack arguments when preparing outgoing ones by
copying them. E.g., in cases like
%"struct.s1" = type { [19 x i32] }
define void @f0(ptr byval(%"struct.s1") %0, ptr %1) {
tail call void @f1(ptr %1, ptr byval(%"struct.s1") %0)
ret void
}
declare void @f1(ptr, ptr)
that swap arguments, the last bytes of %0 are on the stack, followed by
%1. To prepare the outgoing arguments, %0 needs to be copied and %1
needs to be loaded into r0. However, currently the copy of %0
overwrites the location of %1, resulting in loading garbage into r0.
We fix that by forcing the load to the pointer stack argument to happen
before the copy.
This avoids AArch64 legality rules depending on libcall
availability.
ARM, AArch64, and X86 all had custom lowering of fsincos which
all were just to emit calls to sincos_stret / sincosf_stret. This
messes with the cost heuristics around legality, because really
it's an expand/libcall cost and not a favorable custom.
This is a bit ugly, because we're emitting code trying to match the
C ABI lowered IR type for the aggregate return type. This now also
gives an easy way to lift the unhandled x86_32 darwin case, since
ARM already handled the return as sret case.
This patch does the same changes as D143001 for AArch64.
This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'. The
changes are similar to those in D114946 for AArch64.
Custom lowering and promotion are set for some FP16 strict ops to work
correctly.
This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.
Supports patchable-function-prefix with adjusted load offsets. Provides
an instruction size worst case estimate of how large the KCFI bundle is
so that range-limited instructions (e.g. cbz) know how big the indirect
calls can become.
ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
when r12 holds the call target. r3 gets spilled/reloaded if it is
being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
to aarch64's trap encoding.
Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index
Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
a compare.
- No trap encoding.
Update tests to validate all three sub targets.
As shown in #137101, fp16 lrint are not handled correctly on Arm. This
adds soft-half promotion for them, reusing the function that promotes a
value with operands (and can handle strict fp once that is added).
An inline asm constraint "Jr", in AArch32, means that if the input value
is a compile-time constant in the range -4095 to +4095, then it can be
inserted into the assembly language as an immediate operand, and
otherwise it will be placed in a register.
The comment in the Arm backend said "It is not clear what this
constraint is intended for". I believe the answer is that that range of
immediate values are the ones you can use in a LDR or STR instruction.
So it's suitable for cases like this:
asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");
in the same way that the "Ir" constraint is suitable for the immediate
in a data-processing instruction such as ADD or EOR.
Same deal we use for determining ucmp vs scmp.
Using selects on platforms that like selects is better than using usubo.
Rename function to be more general fitting this new description.
This hook replaces inline asm with LLVM intrinsics. It was intended to
match inline assembly implementations of bswap in libc headers and
replace them more optimizable implementations.
At this point, it has outlived its usefulness (see
https://github.com/llvm/llvm-project/issues/156571#issuecomment-3247638412),
as libc implementations no longer use inline assembly for this purpose.
Additionally, it breaks the "black box" property of inline assembly,
which some languages like Rust would like to guarantee.
Fixes https://github.com/llvm/llvm-project/issues/156571.
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.
To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.
`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.
While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
This is a sibling patch to #151612: passing gap masks to the renewal TLI
hooks for lowering interleaved stores that use shufflevector to do the
interleaving.
This is so that we don't expand to include unneeded 0 checks.
Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1
in Fast-ISEL.
Finally, Remove the README entry regarding this issue.
Add entries for_stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.
These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.
This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.
i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.
The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard are used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.