This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.
- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also implemented `unique_ptr` for `OutlinedFunction` to ensure
safe and efficient memory management within `FunctionList`, avoiding
unnecessary implicit copies.
This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
The following code assumes that RepeatedSequenceLocs is non-empty. Bail
out if there are less than 2 candidates left, as no outlining is
possible in that case. The same check is already present in all the
other places where elements from RepeatedSequenceLocs may be dropped.
This fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716
FirstCand is a reference to RepeatedSequenceLocs[0]. However, that
vector is being modified a lot throughout the function, including one
place that reassigns the whole vector. I'm not sure whether this can
really happen in practice, but it doesn't seem unlikely that this could
lead to a use-after-free.
Avoid this by directly using RepeatedSequenceLocs[0] at the start of the
function (as a lot of other places already do) and only creating
FirstCand at the end where no more modifications take place.
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.
This makes possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.
Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
In some cases, the machine outliner needs to preserve LR across an
outlined call by pushing it onto the stack. Previously, this also
generated unwind table instructions, which is incorrect because EHABI
unwind tables cannot represent different stack frames a different points
in the function, so the extra unwind info applied to the entire
function.
The outliner code already avoided generating CFI instructions, but EHABI
unwind data is generated later from the actual instructions, so we need
to avoid using the FrameSetup and FrameDestroy flags to prevent unwind
data being generated.
Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make
getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a
couple of things that are not NFC:
- std::optional<T>::operator<= returns true if the first operand is a
std::nullopt and second operand is T. Fix a couple of places where we
assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath,
returning InstrLatency instead of DefaultDefLatency. Fix one instance
not accounting for this behavior.
Follow up on 9468de4 (TargetInstrInfo: make getOperandLatency return
optional (NFC)) to squelch a signedness warning on MSVC, reported by
Simon Pilgrim.
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
Both the Arm and X86 implementations of areLoadsFromSameBasePtr use a
switch over the machine opcode, and repeat the same logic for both
SDNode operands. We can avoid the duplicated logic (especially lengthy
in the X86 case) by just using a lambda. This could obviously be a
candidate for moving out to a separate helper function if there were
other users, but I've made the minimal change in this patch.
The ELF code from https://reviews.llvm.org/D112811 emits LDRLIT_ga_pcrel
when `TM.isPositionIndependent()` but uses a different condition
`Subtarget.isGVIndirectSymbol(GV)` (aka dso_preemptable on ELF targets).
This would cause incorrect access for dso_preemptable
`__stack_chk_guard` with the static relocation model.
Regarding whether `__stack_chk_guard` gets the dso_local specifier,
https://reviews.llvm.org/D150841 switched to
`M.getDirectAccessExternalData()` (implied by "PIC Level") instead of
`TM.getRelocationModel() == Reloc::Static`.
The result is that when non-zero "PIC Level" is used with static
relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking),
`__stack_chk_guard` accesses are incorrect.
```
ldr r0, .LCPI0_0
ldr r0, [r0]
ldr r0, [r0] // incorrectly dereferences __stack_chk_guard
...
.LCPI0_0:
.long __stack_chk_guard
```
To fix this, for dso_preemptable `__stack_chk_guard`, emit a GOT PIC
code sequence like for -fpic using `LDRLIT_ga_pcrel`:
```
ldr r0, .LCPI0_0
.LPC0_0:
add r0, pc, r0
ldr r0, [r0]
ldr r0, [r0]
...
LCPI0_0:
.Ltmp0:
.long __stack_chk_guard(GOT_PREL)-((.LPC0_0+8)-.Ltmp0)
```
Technically, `LDRLIT_ga_abs` with `R_ARM_GOT_ABS` could be used, but
`R_ARM_GOT_ABS` does not have GNU or integrated assembler support.
(Note, `.LCPI0_0: .long __stack_chk_guard@GOT` produces an
`R_ARM_GOT_BREL`, which is not desired).
This patch fixes#6499 while not changing behavior for the following
configurations:
```
run arm.linux.nopic --target=arm-linux-gnueabi -fno-pic
run arm.linux.pie --target=arm-linux-gnueabi -fpie
run arm.linux.pic --target=arm-linux-gnueabi -fpic
run armv6.darwin.nopic --target=armv6-apple-darwin -fno-pic
run armv6.darwin.dynamicnopic --target=armv6-apple-darwin -mdynamic-no-pic
run armv6.darwin.pic --target=armv6-apple-darwin -fpic
run armv7.darwin.nopic --target=armv7-apple-darwin -mcpu=cortex-a8 -fno-pic
run armv7.darwin.dynamicnopic --target=armv7-apple-darwin -mcpu=cortex-a8 -mdynamic-no-pic
run armv7.darwin.pic --target=armv7-apple-darwin -mcpu=cortex-a8 -fpic
run arm64.darwin.pic --target=arm64-apple-darwin
```
For Thumb-1 Execute-Only, expandLoadStackGuardBase generates a tMOVimm32 pseudo when calculating the stack offset.
It does this in a context where the CSPR maybe be live. tMOVimm32 may corrupt CPSR.
To fix this, generate save/restore CPSR around the tMOVimm32 using MRS/MSR to/from a scratch register.
expandLoadStackGuardBase this runs after register allocation, so the scratch register needs to be a physical register.
Use R12 as a scratch register, as is usual when expanding a pseudo.
MSR/MRS are some of the few v6-M instructions which operate on a high register.
New stack-guard test case added which was generating incorrect code without the save/restore CPSR.
Reviewed By: stuij
Differential Revision: https://reviews.llvm.org/D156968
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.
The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
/// trivially rematerializable, taking into consideration its operands.
It however implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.
The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`. That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).
By making this a single interface, we can override the interface to do either.
Reviewed By: craig.topper, nemanjai
Differential Revision: https://reviews.llvm.org/D156520
[ARM] generate armv6m eXecute Only (XO) code for immediates, globals
Previously eXecute Only (XO) support was implemented for targets that support
MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449
XO prevents the compiler from generating data accesses to code sections. This
patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and
must resort to the following general pattern to avoid loads:
movs r3, :upper8_15:foo
lsls r3, #8
adds r3, :upper0_7:foo
lsls r3, #8
adds r3, :lower8_15:foo
lsls r3, #8
adds r3, :lower0_7:foo
ldr r3, [r3]
This is equivalent to the code pattern generated by GCC.
The above relocations are new to LLVM and have been implemented in a parent
patch: https://reviews.llvm.org/D149443.
This patch limits itself to implementing codegen for this pattern and enabling
XO for armv6-M in the backend.
Separate patches will follow for:
- switch tables
- replacing specific loads from constant islands which are spread out over the
ARM backend codebase. Amongst others: FastISel, call lowering, stack frames.
Reviewed By: john.brawn
Differential Revision: https://reviews.llvm.org/D152795
AArch64 has five system registers intended to be useful as thread
pointers: one for each exception level which is RW at that level and
inaccessible to lower ones, and the special TPIDRRO_EL0 which is
readable but not writable at EL0. AArch32 has three, corresponding to
the AArch64 ones that aren't specific to EL2 or EL3.
Currently clang supports only a subset of these registers, and not
even a consistent subset between AArch64 and AArch32:
- For AArch64, clang permits you to choose between the four TPIDR_ELn
thread registers, but not the fifth one, TPIDRRO_EL0.
- In AArch32, on the other hand, the //only// thread register you can
choose (apart from 'none, use a function call') is TPIDRURO, which
corresponds to (the bottom 32 bits of) AArch64's TPIDRRO_EL0.
So there is no thread register that you can currently use in both
targets!
For custom and bare-metal purposes, users might very reasonably want
to use any of these thread registers. There's no reason they shouldn't
all be supported as options, even if the default choices follow
existing practice on typical operating systems.
This commit extends the range of values acceptable to the `-mtp=`
clang option, so that you can specify any of these registers by (the
lower-case version of) their official names in the ArmARM:
- For AArch64: tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3
- For AArch32: tpidrurw, tpidruro, tpidrprw
All existing values of the option are still supported and behave the
same as before. Defaults are also unchanged. No command line that
worked already should change behaviour as a result of this.
The new values for the `-mtp=` option have been agreed with Arm's gcc
developers (although I don't know whether they plan to implement them
in the near future).
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D152433
This reverts commit 0a762ec1b09d96734a3462f8792a5574d089b24d.
Some CPUs enable fp64 by default (such as cortex-m7). When specifying a
single-precision fpu with them like -mfpu=fpv5-sp-d16, the fp64 feature will
be disabled, but fpreg64 will not. We need to disable them both correctly under
clang in order for the backend to be able to use the reliably. In the meantime
this reverts 0a762ec1b09d96734 until that issue is fixed.
I think it's good practice to avoid having default ctors unless they're really
valid/useful. For OutlinedFunction the default ctor was used to represent a
bail-out value for getOutliningCandidateInfo(), so I changed the API to return
an optional<getOutliningCandidateInfo> instead which seems a tad cleaner.
Differential Revision: https://reviews.llvm.org/D146375
The motivation behind this patch is to unify some of the outliner logic across architectures. This looks nicer in general and makes fixing [issues like this](https://reviews.llvm.org/D124707#3483805) easier.
There are some notable changes here:
1. `isMetaInstruction()` is used directly instead of checking for specific meta-instructions like `IMPLICIT_DEF` or `KILL`. This was already done in the RISC-V implementation, but other architectures still did hardcoded checks.
- As an exception to this, CFI instructions are explicitly delegated to the target because RISC-V has different handling for those.
2. `isTargetIndex()` checks are replaced with an assert; none of the architectures supported actually use `MO_TargetIndex` at this point in time.
3. `isCFIIndex()` and `isFI()` checks are also replaced with asserts, since these operands should not exist in [any context](https://reviews.llvm.org/D122635#3447214) at this stage in the pipeline.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D125072
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.
Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.
Differential Revision: https://reviews.llvm.org/D142213
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
With D134950, targets get notified when a virtual register is created and/or
cloned. Targets can do the needful with the delegate callback. AMDGPU propagates
the virtual register flags maintained in the target file itself. They are useful
to identify a certain type of machine operands while inserting spill stores and
reloads. Since RegAllocFast spills the physical register itself, there is no way
its virtual register can be mapped back to retrieve the flags. It can be solved
by passing the virtual register as an additional argument. This argument has no
use when the spill interfaces are called during the greedy allocator or even the
PrologEpilogInserter and can pass a null register in such cases.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138656
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
This interface allows a target to reject a proposed
SMS schedule. For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged. For ARM,
schedules which exceed register pressure limits are
rejected.
Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.
Reapplication of D128941/(reversion:D132037) with small fix.
Differential Revision: https://reviews.llvm.org/D132170