Currently, the AIX linker and loader do not provide a mechanism to
implement ifuncs similar to GNU_ifunc on ELF Linux.
On AIX, we will lower `__attribute__((ifunc("resolver"))` to the llvm
`ifunc` as other platforms do. The llvm `ifunc` in turn will get lowered
at late stages of the optimization pipeline to an AIX-specific
implementation. No special linkage or relocations are needed when
generating assembly/object output.
On AIX, a function `foo` has two symbols associated with it: a function
descriptor (`foo`) residing in the `.data` section, and an entry point
(`.foo`) residing in the `.text` section. The first field of the
descriptor is the address of the entry point. Typically, the address
field in the descriptor is initialized once: statically, at load time
(?), or at runtime if runtime linking is enabled.
Here we would like to use the address field in the descriptor to
implement the `ifunc` semantics. Specifically, the ifunc function will
become a stub that jumps to the entry point in the address field. A
constructor function is linked into every linkage module. The
constructor walks an array of `{descriptor, resolver}` pairs, calling
the resolver and saving the result in the address field in the
descriptor (thus setting `foo`'s descriptor to point to the resolved
version early during program runtime).
Known limitations:
- Due to bug #161576, which affects object generation path, you will
need either `-ffunction-sections` or `-fno-integrated-as` to generate a
correct/linkable object file.
- aliases to ifuncs are not supported, a testcase has been added and
marked XFAIL. I'm planning to address in a follow-up PR because it's not
important enough, IMHO, for this PR
- dead ifuncs in a CU that contains at least one live ifunc, will result
in all ifuncs being kept by the linker. The fix for this is common with
a similar problem we have with PGO. PR #159435 is trying to provide a
mechanism that will allow the ifunc and PGO implementations to avoid the
dead code retention at the link step.
- the resolver must return a function that is in the same DSO as the
ifunc; the compiler will try to detect if this condition is violated and
report it, but it cannot detect it in general. To be safe, all candidate
functions (returned by a particular resolver) must either be static or
have hidden/protected visibility. This is so that the ifunc stub doesn't
have to save and restore the TOC register r2. In future work, this case
will be supported and the requirement will be lifted.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Prior to this change, the data layout calculation would not account for
explicitly set `-mabi=elfv2` on `powerpc64-unknown-linux-gnu`, a target
that defaults to `elfv1`.
This is loosely inspired by the equivalent ARM / RISC-V code.
`make check-llvm` passes fine for me, though AFAICT all the tests
specify the data layout manually so there isn't really a test for this
and I am not really sure what the best way to go about adding one would
be.
Signed-off-by: Jens Reidel <adrian@travitia.xyz>
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32" "n64" etc is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
This patch updates PPCInstrInfo::copyPhysReg to support DMR and WACC
register classes and extends the PPCVSXCopy pass to handle specific WACC
copy patterns.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A sub-set of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for this functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.
In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Summary:
The patch fixes the issue [[PowerPC] missing VSX FMA Mutation optimize
in some case for option -schedule-ppc-vsx-fma-mutation-early
#111906](https://github.com/llvm/llvm-project/issues/111906)
In certain cases, the Register Coalescer pass—which eliminates COPY
instructions—can interfere with the PowerPC VSX FMA Mutation pass.
Specifically, it can prevent the mutation of a COPY adjacent to an
XSMADDADP into a single XSMADDMDP instruction. As a result, the xxspltiw
instruction is not hoisted out of the loop as expected, leading to
missed optimization opportunities.
To address this, the patch ensures that the `VSX FMA Mutation` pass runs
before the `Register Coalescer` pass when the
-schedule-ppc-vsx-fma-mutation-early option is enabled.
We rename `createGenericSchedLive` and `createGenericSchedPostRA`
to `createSchedLive` and `createSchedPostRA`, and add a template
parameter `Strategy` which is the generic implementation by default.
This can simplify some code for targets that have custom scheduler
strategy.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Register assembly printer passes in the pass registry.
This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.
Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.
Replace "concept based polymorphism" with simpler PImpl idiom.
This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it is two less :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.
The first commit makes `TargetTransformImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
second commit to make the first one smaller. It appeared infeasible to
extract this into a separate PR because the first commit landed
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
The createSIMachineScheduler & createPostMachineScheduler
target hooks are currently placed in the PassConfig interface.
Moving it out to TargetMachine so that both legacy and
the new pass manager can effectively use them.
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
We can decide whether to expand isel or not in instruction selection
pass and early-if-conversion pass. The transformation implemented in
PPCExpandISel can be retired considering PPC backend doesn't generate
`isel` instructions post-RA.
Also if we are seeking performant branch-or-isel decision, we can turn
to selectoptimize pass.
---------
Co-authored-by: Kai Luo <lkail@cn.ibm.com>
This patch aims to reduce TOC usage by merging internal and private
global data.
Moreover, we also add the GlobalMerge pass within the PPCTargetMachine
pipeline, which is disabled by default. This transformation can be
enabled by -ppc-global-merge.
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1
It still breaks EXPENSIVE_CHECKS build. Sorry.
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
On PowerPC the number of TOC entries must be kept low for large
applications. In order to reduce the number of constant global arrays
we can pool them into one structure and then access them as the base
address of that structure plus some offset. The constant global arrays
may be arrays of `i8` which are constant strings but they may also be
arrays of `i32, i64, etc...`.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D155730
The alignment of function pointers was added to the Datalayout by
D57335 but currently is unset for the Power target. This will cause us
to compute a conservative minimum alignment of one if places like
Value::getPointerAlignment.
This patch implements the function pointer alignment in the Datalayout
for the Power backend and Power targets in clang, so we can query the
value for a particular Power target.
We come up with the correct value one of two ways:
- If the target uses function descriptor objects (i.e. ELFv1 & AIX ABIs),
then a function pointer points to the descriptor, so use the alignment
we would emit the descriptor with.
- If the target doesn't use function descriptor objects (i.e. ELFv2), a
function pointer points to the global entry point, so use the minimum
alignment for code on Power (i.e. 4-bytes).
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D147016
Add a member function isPPC64ELFv2ABI() to determine what ABI is used on the
64-bit PowerPC big endian operating environment.
Reviewed By: nemanjai, dim, pkubaj
Differential Revision: https://reviews.llvm.org/D144321
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.
Differential Revision: https://reviews.llvm.org/D140982
Currently default simd alignment is specified by Clang specific TargetInfo
class. This class cannot be reused for LLVM Flang. If we move the default
alignment field into TargetMachine class then we can create TargetMachine
objects and query them to find SIMD alignment.
Scope of changes:
1) Added information about maximal allowed SIMD alignment to TargetMachine
classes.
2) Removed getSimdDefaultAlign function from Clang TargetInfo class.
3) Refactored createTargetMachine function.
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D138496
clang (like gcc) has the -mtune= command line option. This option
adds the "tune-cpu" attribute to a function. The intended functionality
is that the scheduling model of that cpu is used. E.g. -mtune=pwr9 -march=pwr8
generates only instructions supported on pwr8 but uses the scheduling model
of pwr9 for it.
This PR adds the infrastructure to support this in LLVM.
clang support was added in https://reviews.llvm.org/D130526.
Reviewed By: amyk, qiucf
Differential Revision: https://reviews.llvm.org/D138317
Follow up to the series:
1. https://reviews.llvm.org/D140161
2. https://reviews.llvm.org/D140349
3. https://reviews.llvm.org/D140331
4. https://reviews.llvm.org/D140323
Completes the work from the previous two for remaining targets.
This creates the following named passes that can be run via
`llc -{start|stop}-{before|after}`:
- arc-isel
- arm-isel
- avr-isel
- bpf-isel
- csky-isel
- hexagon-isel
- lanai-isel
- loongarch-isel
- m68k-isel
- msp430-isel
- mips-isel
- nvptx-isel
- ppc-codegen
- riscv-isel
- sparc-isel
- systemz-isel
- ve-isel
- wasm-isel
- xcore-isel
A nice way to write tests for SelectionDAGISel might be to use a RUN:
line like:
llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o -
Fixes: https://github.com/llvm/llvm-project/issues/59538
Reviewed By: asb, zixuan-wu
Differential Revision: https://reviews.llvm.org/D140364
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.
I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.
A missing feature I still could use is a way to access an custom
analysis pass on the IR here.
Tail duplication may modify the loop to a "non-canonical" form
that CTR Loop pass can not recognize. We fixed one issue in D135846.
And we found in some other case, the loop is changed to irreducible form.
It is hard to fix this case in CTR loop pass, instead we reorder the
CTR loop pass before tail duplication pass and just after finalize-isel
pass to avoid any unexpected change to the loop form.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D138265
This patch implements a new way to generate the CTR loops. Now the
intrinsics inserted in hardware loop pass will be mapped to pseudo
instructions and these pseudo instructions will be expanded to CTR
loop or normal compare+branch loop in this post ISEL pass.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122125