759 Commits

Author SHA1 Message Date
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Shengchen Kan
550f0eb2ce [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 2024-01-26 20:50:58 +08:00
Anatoly Trosinenko
10bd69a4f7
[MachineOutliner] Refactor iterating over Candidate's instructions (#78972)
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.

This makes possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
2024-01-23 17:21:40 +03:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
ostannard
4888218d03
[ARM] Do not emit unwind tables when saving LR around outlined call (#69611)
In some cases, the machine outliner needs to preserve LR across an
outlined call by pushing it onto the stack. Previously, this also
generated unwind table instructions, which is incorrect because EHABI
unwind tables cannot represent different stack frames a different points
in the function, so the extra unwind info applied to the entire
function.

The outliner code already avoided generating CFI instructions, but EHABI
unwind data is generated later from the actual instructions, so we need
to avoid using the FrameSetup and FrameDestroy flags to prevent unwind
data being generated.
2023-12-14 14:46:13 +00:00
Ramkumar Ramachandra
e8dbe945f3
TargetInstrInfo, TargetSchedule: fix non-NFC parts of 9468de4 (#74338)
Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make
getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a
couple of things that are not NFC:

- std::optional<T>::operator<= returns true if the first operand is a
std::nullopt and second operand is T. Fix a couple of places where we
assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath,
returning InstrLatency instead of DefaultDefLatency. Fix one instance
not accounting for this behavior.
2023-12-05 08:18:17 +00:00
Simon Pilgrim
83e01ea1a5 Fix MSVC signed/unsigned mismatch warning. NFC. 2023-12-04 11:11:33 +00:00
Ramkumar Ramachandra
d222fa4521
TargetInstrInfo: squelch a signedness warning on MSVC (#74078)
Follow up on 9468de4 (TargetInstrInfo: make getOperandLatency return
optional (NFC)) to squelch a signedness warning on MSVC, reported by
Simon Pilgrim.
2023-12-01 16:08:41 +00:00
Ramkumar Ramachandra
9468de48fc
TargetInstrInfo: make getOperandLatency return optional (NFC) (#73769)
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.

This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
2023-12-01 11:29:19 +00:00
Alex Bradbury
5b3eb1bc22
[ARM][X86][NFC] Use lambda to avoid duplicate switches in areLoadsFromSameBasePtr (#72376)
Both the Arm and X86 implementations of areLoadsFromSameBasePtr use a
switch over the machine opcode, and repeat the same logic for both
SDNode operands. We can avoid the duplicated logic (especially lengthy
in the X86 case) by just using a lambda. This could obviously be a
candidate for moving out to a separate helper function if there were
other users, but I've made the minimal change in this patch.
2023-11-15 12:35:35 +00:00
Fangrui Song
5888dee7d0
[ARM,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#70014)
The ELF code from https://reviews.llvm.org/D112811 emits LDRLIT_ga_pcrel
when `TM.isPositionIndependent()` but uses a different condition
`Subtarget.isGVIndirectSymbol(GV)` (aka dso_preemptable on ELF targets).
This would cause incorrect access for dso_preemptable
`__stack_chk_guard` with the static relocation model.

Regarding whether `__stack_chk_guard` gets the dso_local specifier,
https://reviews.llvm.org/D150841 switched to
`M.getDirectAccessExternalData()` (implied by "PIC Level") instead of
`TM.getRelocationModel() == Reloc::Static`.

The result is that when non-zero "PIC Level" is used with static
relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking),
`__stack_chk_guard` accesses are incorrect.
```
        ldr     r0, .LCPI0_0
        ldr     r0, [r0]
        ldr     r0, [r0]   // incorrectly dereferences __stack_chk_guard
...
.LCPI0_0:
        .long   __stack_chk_guard
```

To fix this, for dso_preemptable `__stack_chk_guard`, emit a GOT PIC
code sequence like for -fpic using `LDRLIT_ga_pcrel`:
```
        ldr     r0, .LCPI0_0
.LPC0_0:
        add     r0, pc, r0
        ldr     r0, [r0]
        ldr     r0, [r0]
...
LCPI0_0:
.Ltmp0:
        .long   __stack_chk_guard(GOT_PREL)-((.LPC0_0+8)-.Ltmp0)
```

Technically, `LDRLIT_ga_abs` with `R_ARM_GOT_ABS` could be used, but
`R_ARM_GOT_ABS` does not have GNU or integrated assembler support.
(Note, `.LCPI0_0: .long __stack_chk_guard@GOT` produces an
`R_ARM_GOT_BREL`, which is not desired).

This patch fixes #6499 while not changing behavior for the following
configurations:
```
run arm.linux.nopic --target=arm-linux-gnueabi -fno-pic
run arm.linux.pie --target=arm-linux-gnueabi -fpie
run arm.linux.pic --target=arm-linux-gnueabi -fpic
run armv6.darwin.nopic --target=armv6-apple-darwin -fno-pic
run armv6.darwin.dynamicnopic --target=armv6-apple-darwin -mdynamic-no-pic
run armv6.darwin.pic --target=armv6-apple-darwin -fpic
run armv7.darwin.nopic --target=armv7-apple-darwin -mcpu=cortex-a8 -fno-pic
run armv7.darwin.dynamicnopic --target=armv7-apple-darwin -mcpu=cortex-a8 -mdynamic-no-pic
run armv7.darwin.pic --target=armv7-apple-darwin -mcpu=cortex-a8 -fpic
run arm64.darwin.pic --target=arm64-apple-darwin
```
2023-10-31 15:37:26 -07:00
Jie Fu
9d116b4b12 [ARM] Remove unused variable 'MF' in ARMBaseInstrInfo.cpp (NFC)
/Users/jiefu/llvm-project/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp:4986:24: error: unused variable 'MF' [-Werror,-Wunused-variable]
      MachineFunction &MF = *MBB.getParent();
                       ^
1 error generated.
2023-08-09 15:47:28 +08:00
Simon Wallis
33b9634394 [ARM] v6-M XO: save CPSR around LoadStackGuard
For Thumb-1 Execute-Only, expandLoadStackGuardBase generates a tMOVimm32 pseudo when calculating the stack offset.
It does this in a context where the CSPR maybe be live. tMOVimm32 may corrupt CPSR.
To fix this, generate save/restore CPSR around the tMOVimm32 using MRS/MSR to/from a scratch register.

expandLoadStackGuardBase this runs after register allocation, so the scratch register needs to be a physical register.
Use R12 as a scratch register, as is usual when expanding a pseudo.
MSR/MRS are some of the few v6-M instructions which operate on a high register.

New stack-guard test case added which was generating incorrect code without the save/restore CPSR.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D156968
2023-08-09 08:40:35 +01:00
Sander de Smalen
bbb95893de [TII] NFCI: Simplify the interface for isTriviallyReMaterializable
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.

The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.

  /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
  /// set, this hook lets the target specify whether the instruction is actually
  /// trivially rematerializable, taking into consideration its operands.

It however implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.

The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`.  That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).

By making this a single interface, we can override the interface to do either.

Reviewed By: craig.topper, nemanjai

Differential Revision: https://reviews.llvm.org/D156520
2023-08-07 13:01:06 +00:00
Ties Stuij
2273741ea2 [ARM] generate armv6m eXecute Only (XO) code
[ARM] generate armv6m eXecute Only (XO) code for immediates, globals

Previously eXecute Only (XO) support was implemented for targets that support
MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449

XO prevents the compiler from generating data accesses to code sections. This
patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and
must resort to the following general pattern to avoid loads:

    movs    r3, :upper8_15:foo
    lsls    r3, #8
    adds    r3, :upper0_7:foo
    lsls    r3, #8
    adds    r3, :lower8_15:foo
    lsls    r3, #8
    adds    r3, :lower0_7:foo
    ldr     r3, [r3]

This is equivalent to the code pattern generated by GCC.

The above relocations are new to LLVM and have been implemented in a parent
patch: https://reviews.llvm.org/D149443.

This patch limits itself to implementing codegen for this pattern and enabling
XO for armv6-M in the backend.

Separate patches will follow for:
- switch tables
- replacing specific loads from constant islands which are spread out over the
  ARM backend codebase. Amongst others: FastISel, call lowering, stack frames.

Reviewed By: john.brawn

Differential Revision: https://reviews.llvm.org/D152795
2023-06-23 10:50:47 +01:00
Simon Tatham
10e4228114 [ARM,AArch64] Add a full set of -mtp= options.
AArch64 has five system registers intended to be useful as thread
pointers: one for each exception level which is RW at that level and
inaccessible to lower ones, and the special TPIDRRO_EL0 which is
readable but not writable at EL0. AArch32 has three, corresponding to
the AArch64 ones that aren't specific to EL2 or EL3.

Currently clang supports only a subset of these registers, and not
even a consistent subset between AArch64 and AArch32:

 - For AArch64, clang permits you to choose between the four TPIDR_ELn
   thread registers, but not the fifth one, TPIDRRO_EL0.

 - In AArch32, on the other hand, the //only// thread register you can
   choose (apart from 'none, use a function call') is TPIDRURO, which
   corresponds to (the bottom 32 bits of) AArch64's TPIDRRO_EL0.

So there is no thread register that you can currently use in both
targets!

For custom and bare-metal purposes, users might very reasonably want
to use any of these thread registers. There's no reason they shouldn't
all be supported as options, even if the default choices follow
existing practice on typical operating systems.

This commit extends the range of values acceptable to the `-mtp=`
clang option, so that you can specify any of these registers by (the
lower-case version of) their official names in the ArmARM:

 - For AArch64: tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3
 - For AArch32: tpidrurw, tpidruro, tpidrprw

All existing values of the option are still supported and behave the
same as before. Defaults are also unchanged. No command line that
worked already should change behaviour as a result of this.

The new values for the `-mtp=` option have been agreed with Arm's gcc
developers (although I don't know whether they plan to implement them
in the near future).

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D152433
2023-06-15 09:27:41 +01:00
David Green
8998ff53c9 Revert "[ARM] Allow D-reg copies to use VMOVD with fpregs64"
This reverts commit 0a762ec1b09d96734a3462f8792a5574d089b24d.

Some CPUs enable fp64 by default (such as cortex-m7). When specifying a
single-precision fpu with them like -mfpu=fpv5-sp-d16, the fp64 feature will
be disabled, but fpreg64 will not. We need to disable them both correctly under
clang in order for the backend to be able to use the reliably. In the meantime
this reverts 0a762ec1b09d96734 until that issue is fixed.
2023-06-01 17:49:25 +01:00
David Green
0a762ec1b0 [ARM] Allow D-reg copies to use VMOVD with fpregs64
This instruction should be available with MVE, where we have D regs, not
requiring the full FP64 target feature.
2023-05-28 19:12:45 +01:00
Kazu Hirata
4241d890ae [Target] Use range-based for loops (NFC) 2023-04-15 14:14:56 -07:00
Amara Emerson
41e9c4b88c [NFC][Outliner] Delete default ctors for Candidate & OutlinedFunction.
I think it's good practice to avoid having default ctors unless they're really
valid/useful. For OutlinedFunction the default ctor was used to represent a
bail-out value for getOutliningCandidateInfo(), so I changed the API to return
an optional<getOutliningCandidateInfo> instead which seems a tad cleaner.

Differential Revision: https://reviews.llvm.org/D146375
2023-03-20 11:17:10 -07:00
Fangrui Song
432caca39a Simplify with hasFeature. NFC 2023-02-17 18:22:24 -08:00
Kazu Hirata
5e78b749ec [ARM] Use llvm::rotl and llvm::rotr (NFC) 2023-02-13 21:28:50 -08:00
duk
d61d591411
[MachineOutliner] Make getOutliningType partially target-independent
The motivation behind this patch is to unify some of the outliner logic across architectures. This looks nicer in general and makes fixing [issues like this](https://reviews.llvm.org/D124707#3483805) easier.
There are some notable changes here:
    1. `isMetaInstruction()` is used directly instead of checking for specific meta-instructions like `IMPLICIT_DEF` or `KILL`. This was already done in the RISC-V implementation, but other architectures still did hardcoded checks.
        - As an exception to this, CFI instructions are explicitly delegated to the target because RISC-V has different handling for those.

    2. `isTargetIndex()` checks are replaced with an assert; none of the architectures supported actually use `MO_TargetIndex` at this point in time.

    3. `isCFIIndex()` and `isFI()` checks are also replaced with asserts, since these operands should not exist in [any context](https://reviews.llvm.org/D122635#3447214) at this stage in the pipeline.

Reviewed by: paquette

Differential Revision: https://reviews.llvm.org/D125072
2023-02-09 14:35:00 -05:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Craig Topper
79858d1908 [CodeGen][Target] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.
2023-01-13 23:12:48 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Christudasan Devadasan
b5efec4b27 [CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot
With D134950, targets get notified when a virtual register is created and/or
cloned. Targets can do the needful with the delegate callback. AMDGPU propagates
the virtual register flags maintained in the target file itself. They are useful
to identify a certain type of machine operands while inserting spill stores and
reloads. Since RegAllocFast spills the physical register itself, there is no way
its virtual register can be mapped back to retrieve the flags. It can be solved
by passing the virtual register as an additional argument. This argument has no
use when the spill interfaces are called during the greedy allocator or even the
PrologEpilogInserter and can pass a null register in such cases.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138656
2022-12-17 11:55:34 +05:30
Gregory Alfonso
cb38be9ed3 [NFC] Use Register instead of unsigned for variables that receive a Register object
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D139451
2022-12-07 00:23:34 +00:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Filipp Zhinkin
945a1468c9 [ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr
Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions.

Related issue: https://github.com/llvm/llvm-project/issues/57122

Reviewed By: efriedma, samtebbs

Differential Revision: https://reviews.llvm.org/D131786
2022-10-01 12:41:37 +03:00
Joe Loser
5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Kazu Hirata
2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
David Penry
ced705c440 [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reapplication of D128941/(reversion:D132037) with small fix.

Differential Revision: https://reviews.llvm.org/D132170
2022-08-22 12:10:13 -07:00
Kazu Hirata
ec5eab7e87 Use range-based for loops (NFC) 2022-08-20 21:18:32 -07:00
David Penry
1c9f0408bc Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules"
This reverts commit 8c4aea438c310816bb4e4f9a32d783381ef3182e.

Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132037
2022-08-17 09:32:43 -07:00
David Penry
8c4aea438c [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128941
2022-08-17 08:13:26 -07:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Matt Arsenault
8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Kazu Hirata
621f58e716 [Target, CodeGen] Use isImm(), isReg(), etc (NFC) 2022-06-18 07:41:04 -07:00
Kazu Hirata
3b9707dbc0 [llvm] Convert for_each to range-based for loops (NFC) 2022-06-05 12:07:14 -07:00
Martin Storsjö
d8e67c1ccc [ARM] Add SEH opcodes in frame lowering
Skip inserting regular CFI instructions if using WinCFI.

This is based a fair amount on the corresponding ARM64 implementation,
but instead of trying to insert the SEH opcodes one by one where
we generate other prolog/epilog instructions, we try to walk over the
whole prolog/epilog range and insert them. This is done because in
many cases, the exact number of instructions inserted is abstracted
away deeper.

For some cases, we manually insert specific SEH opcodes directly where
instructions are generated, where the automatic mapping of instructions
to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).

Skip Thumb2SizeReduction for SEH prologs/epilogs, and force
tail calls to wide instructions (just like on MachO), to make sure
that the unwind info actually matches the width of the final
instructions, without heuristics about what later passes will do.

Mark SEH instructions as scheduling boundaries, to make sure that they
aren't reordered away from the instruction they describe by
PostRAScheduler.

Mark the SEH instructions with the NoMerge flag, to avoid doing
tail merging of functions that have multiple epilogs that all end
with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue".

Differential Revision: https://reviews.llvm.org/D125648
2022-06-02 12:28:46 +03:00
David Penry
917dc0749b [ARM] Recognize t2LoopEnd for software pipelining
- Add t2LoopEnd to TargetInstrInfo::analyzeBranch and
  related functions.  As there are many side effects of
  analyzing a branch, only do so if software pipelining
  is enabled to maintain previous behavior when pipelining
  is not desired.
- Make sure that t2LoopEndDec is immediately followed by
  a t2B when it is synthesized from a t2LoopEnd. This is
  done because the t2LoopEnd might have acquired a
  fall-through path, but IfConversion assumes that
  fall-through are only possible on analyzable branches.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D126322
2022-05-26 09:55:42 -07:00
David Penry
dcb77643e3 Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
reverted in: fa49021c68ef7a7adcdf7b8a44b9006506523191

------

This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-29 10:54:39 -07:00
David Penry
fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
David Penry
28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Jessica Paquette
6d58f4ab07 [MachineOutliner] NFC: Hide LRU-related stuff behind helper functions
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`

This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.

Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.

Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`

This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
2022-02-16 11:39:07 -08:00
tyb0807
762f0b5463 [ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td
Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded
instruction size for some pseudo-instructions, while this
information should ideally be found in ARMInstrInfo.td,
ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence,
the .td files should be updated and no hard-coded instruction sizes
should be used by getInstSizeInBytes() anymore.

Differential Revision: https://reviews.llvm.org/D118009
2022-02-01 10:39:14 +00:00