This patch does the same changes as D143001 for AArch64.
This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'. The
changes are similar to those in D114946 for AArch64.
Custom lowering and promotion are set for some FP16 strict ops to work
correctly.
This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
I'm not sure if this is the best way forward or not, but we have a lot
of issues with forgetting that shuffle_vectors can be scalar again and
again. (There is another example from the recent known-bits code added
recently). As a scalar-dst shuffle vector is just an extract, and a
scalar-source shuffle vector is just a build vector, this patch makes
scalar shuffle vector illegal and adjusts the irbuilder to create the
correct node as required.
Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole, it just requires that
transforms that create shuffles of new sizes to account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.
Supports patchable-function-prefix with adjusted load offsets. Provides
an instruction size worst case estimate of how large the KCFI bundle is
so that range-limited instructions (e.g. cbz) know how big the indirect
calls can become.
ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
when r12 holds the call target. r3 gets spilled/reloaded if it is
being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
to aarch64's trap encoding.
Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index
Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
a compare.
- No trap encoding.
Update tests to validate all three sub targets.
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel.
Related to https://github.com/llvm/llvm-project/issues/150515.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Based on top of #157211.
`FNEG` and `FABS` must preserve signalling NaNs, meaning they should not
convert to f32 to perform the operation. Instead legalize to `XOR` and
`AND`.
Fixes almost all of #104915
Added new register FPSCR_RM to correctly model interactions with
rounding mode control bits of fpscr and to avoid performance regressions
in normal non-strictfp case
This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
As shown in #137101, fp16 lrint are not handled correctly on Arm. This
adds soft-half promotion for them, reusing the function that promotes a
value with operands (and can handle strict fp once that is added).
In the register allocator we define non-trivial rematerialization as the
rematerlization of an instruction with virtual register uses.
We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overriden by
the target in `TargetTransformInfo::isReMaterializableImpl`. The
original reasoning for this given by the comment in the default
implementation is because we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.
https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.
However https://github.com/llvm/llvm-project/pull/160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
https://github.com/llvm/llvm-project/pull/160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.
With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default which reduces a significant
amount of spills and reloads across various targets.
For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:
- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
Previously if we had a subregister extract reading from a
full copy, the no-subregister incoming copy would overwrite
the DefSubReg index of the folding context.
There's one ugly rvv regression, but it's a downstream
issue of this; an unnecessary same class reg-to-reg full copy
was avoided.
The iOS 2.x ABI had R9 as a reserved register, 3.0 made it available,
but support for the 2.x ABI was never added to LLVM. We only use the 2.x
ABI on armv6 since before 3.0 armv6 was the only architecture supported
by iOS.
This allows removing a special case hack in ARM. ARM's implementation
of getExtractSubregLikeInputs has the strange property that it reports
a register with a class that does not support the reported subregister
index. We can however reconstrain the register to support this usage.
This is an alternative to #159600. I've included the test, but
the output is different. In this case version the VMOVSR is
replaced with an ordinary subregister extract copy.
This code doesn't work very well, but this makes it work when intrinsic
definitions are present. It now discounts functions declarations from
the set of attributes it looks at.
The code would have worked better before
0ab5b5b8581d9f2951575f7245824e6e4fc57dec when module-level attributes
could provide the information used to construct build-attributes.
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added printer pass.
(3) Make old pass manager use `ReachingDefInfoWrapperPass`
The commit in [1] changes the behavior of the Arm backend for the
attribute frame-pointer=all. Before [1], leaf functions marked with
frame-pointer=all were not emitting the frame-pointer.
After [1], frame-pointer=all started generating frame pointer for all
functions, including leaf functions.
However, the default behavior for the driver in clang is to emit the
command line option `-mframe-pointer=all` on Arm, if no options for
handling the frame pointer is specified at command line. This causes
observable regressions.
This patch addresses these regressions by configuring the driver so
to emit `-mframe-pointer=non-leaf` when targeting Arm.
Codegen tests dealing with frame pointer generation have been extended
to handle functions with a tail call, since this configuration was
missing.
[1] 4a2bd78f5b0d0661c23dff9c4b93a393a49dbf9a
This hook replaces inline asm with LLVM intrinsics. It was intended to
match inline assembly implementations of bswap in libc headers and
replace them more optimizable implementations.
At this point, it has outlived its usefulness (see
https://github.com/llvm/llvm-project/issues/156571#issuecomment-3247638412),
as libc implementations no longer use inline assembly for this purpose.
Additionally, it breaks the "black box" property of inline assembly,
which some languages like Rust would like to guarantee.
Fixes https://github.com/llvm/llvm-project/issues/156571.
The patch add branch hint for AtomicExpandImpl::expandAtomicCmpXchg, For
example: in PowerPC, it support branch hint as
```
loop:
lwarx r6,0,r3 # load and reserve
cmpw r4,r6 #1st 2 operands equal? bne- exit #skip if not
bne- exit #skip if not
stwcx. r5,0,r3 #store new value if still res’ved bne- loop #loop if lost reservation
bne- loop #loop if lost reservation
exit:
mr r4,r6 #return value from storage
```
`-` hints not taken,
`+` hints taken,
When using no-movt we don't use the pcrel version of the literal load.
This change also unifies logic with the ARM version of this function as
well,
which has:
```
if (!Subtarget.useMovt() || ForceELFGOTPIC) {
// For ELF non-PIC, use GOT PIC code sequence as well because R_ARM_GOT_ABS
// does not have assembler support.
if (TM.isPositionIndependent() || ForceELFGOTPIC)
expandLoadStackGuardBase(MI, ARM::LDRLIT_ga_pcrel, ARM::LDRi12);
else
expandLoadStackGuardBase(MI, ARM::LDRLIT_ga_abs, ARM::LDRi12);
return;
}
```
rdar://138334512
This is so that we don't expand to include unneeded 0 checks.
Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1
in Fast-ISEL.
Finally, Remove the README entry regarding this issue.
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.
Fixes: https://github.com/llvm/llvm-project/issues/151144
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.