This was being used for 2 different purposes.
The TargetMachine constructor prepends +64bit based on isPPC64
triples as a mode switch. The same feature name was also explicitly
added to different processors, making it impossible to perform a pure
feature check for whether 64-bit mode is enabled ir not. i.e.,
checkFeatures("+64bit") would be true even for ppc32 triples.
The comment in tablegen suggests it's relevant to track which processors
support 64-bit mode independently of whether that's the active compile
target, so replace that with a new feature.
This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy`
to take a
`SchedRegion` parameter instead of just `NumRegionInstrs`. This provides
access to both the
instruction range and the parent `MachineBasicBlock`, which enables
looking up function-level
attributes.
With this change, targets can select post-RA scheduling direction per
function using a function
attribute. For example:
```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
const SchedRegion &Region) const {
const Function &F = Region.RegionBegin->getMF()->getFunction();
Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
...
}
Utilize common API in PPCTargetParser
(https://github.com/llvm/llvm-project/pull/97541) to set default CPU
with same interfaces for LLC.
This will update AIX default CPU to pwr7 and LoP powerppc64 default CPU
to ppc64.
Some targets only pass a TargetMachine & to their subtarget constructor
and require a static_cast to their target-specific TargetMachine subclass
to create *InstructionSelector.
These 3 targets already have the correct TargetMachine subclass
reference so no cast is needed.
This removes the uses of target flags to disable subreg liveness,
relying on the `-enable-subreg-liveness` flag instead. The
`-enable-subreg-liveness` flag has been changed to take precedence over
the subtarget if set, and one use of `Subtarget->enableSubRegLiveness()`
has been changed to `MRI->subRegLivenessEnabled()` to make sure the
option properly applies.
Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(es). We
could use some simple heuristic to decide the update at function level:
* If there is equal or less than a number of TLS local-dynamic access(es)
in the function, use TLS initial-exec model. (the threshold which default to
1 is controlled by hidden option)
Following the aix-small-local-exec-tls target attribute, this patch adds
a target attribute for an AIX-specific option in llc that informs the
compiler that it can use a faster access sequence for the local-dynamic
TLS model (formally named aix-small-local-dynamic-tls) when TLS
variables are less than ~32KB in size.
The patch either produces an addi/la with a displacement off of module
handle (return value from .__tls_get_mod) when the address is
calculated, or it produces an addi/la followed by a load/store when the
address is calculated and used for further accesses.
---------
Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
Exploit the per global code model attribute on AIX. On AIX we need to
update both the code sequence used to access the global (either 1 or 2
instructions for small and large code model respectively) and the
storage mapping class that we emit the toc entry.
---------
Co-authored-by: Amy Kwan <akwan0907@gmail.com>
The following tests fail when built with Address
and Undefined sanitizers:
CodeGen/PowerPC/basic-toc-data-def.ll
CodeGen/PowerPC/toc-data-large-array2.ll
Subtarget may be null in emitGlobalVariable, for example in the testcase
where we have no functions in the IR. The fix moves this function from
PPCSubtarget to a static helper function. This only fails with
sanitizers because the Subtarget is not used in the member function.
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang, and also disallows the use of
the aix-small-local-exec-tls attribute with the -data-sections=false
option in llc.
This is because having data sections off when using the
aix-small-local-exec-tls feature is not ideal for performance. As the
small-local-exec-tls region is a limited resource, this space should not
used for variables that may be replaced.
Note, that on AIX, data sections is turned on by default, so this patch
makes it so that a diagnostic is emitted when users explicitly turn off
data sections while using the aix-small-local-exec-tls feature.
This patch adds a target attribute for an AIX-specific option that
informs the compiler that it can use a faster access sequence for the
local-exec TLS model (formally named aix-small-local-exec-tls).
The Clang portion of this option is in D155544.
The initial implementation to generate the faster access sequence is in
D155600.
Differential Revision: https://reviews.llvm.org/D156203
Add a member function isPPC32SecurePlt() to determine whether Secure
PLT is used by the target 32-bit PowerPC operating environment.
Reviewed By: dim, maskray
Differential Revision: https://reviews.llvm.org/D144444
clang (like gcc) has the -mtune= command line option. This option
adds the "tune-cpu" attribute to a function. The intended functionality
is that the scheduling model of that cpu is used. E.g. -mtune=pwr9 -march=pwr8
generates only instructions supported on pwr8 but uses the scheduling model
of pwr9 for it.
This PR adds the infrastructure to support this in LLVM.
clang support was added in https://reviews.llvm.org/D130526.
Reviewed By: amyk, qiucf
Differential Revision: https://reviews.llvm.org/D138317
The flags, initialization of the flags, and the getter methods for
features defined in PPC.td can be generated by TableGen.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D140028
Some PowerPC CPU may have slow MFLR instruction, so we need to
schedule the MFLR and its store in function prologue away to
hidden the long latency for slow MFLR instruction.
This patch adds a new feature fastMFLR and the new feature will
be used in https://reviews.llvm.org/D137423.
Reviewed By: RolandF
Differential Revision: https://reviews.llvm.org/D137612
On Power PC we have ISA3.0 for Power 9, ISA3.1 for Power 10.
This patchs adds an ISA for mcpu=future. The idea is to have a placeholder ISA
for work that is experimental and may not be supported by existing ISAs.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D126075
Implement 'back-to-back' FX fusion according to Power10 User Manual
'19.1.5.4 Fusion', not enabled by default.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114345
This implements the rest of Power10 instruction fusion pairs, according
to user manual, including 'wide immediate', 'load compare', 'zero move'
and 'SHA3 assist'.
Only 'SHA3 assist' is enabled by default.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D112912
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
This patch makes sure that the builtins __builtin_ppc_load8r and
__ builtin_ppc_store8r are only available for Power 7 and up.
Currently the builtins seem to produce incorrect code if used for
Power 6 or before.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110653
[NFC] This patch adds features for pwr7, pwr8, and pwr9 that can be
used for semachecking builtin functions that are only valid for certain
versions of ppc.
Reviewed By: nemanjai, #powerpc
Authored By: Quinn Pham <Quinn.Pham@ibm.com>
Differential revision: https://reviews.llvm.org/D105501
[NFC] This patch adds features for pwr7, pwr8, and pwr9 that can be
used for semachecking builtin functions that are only valid for certain
versions of ppc.
Reviewed By: nemanjai, #powerpc
Authored By: Quinn Pham <Quinn.Pham@ibm.com>
Differential revision: https://reviews.llvm.org/D105501
Add an option to tell the compiler that it can use privileged instructions.
This patch only adds the option. Backend implementation will be added in a
future patch.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D99193
In order to have the same option on power PC LLVM and power PC gcc
the option will be changed from -mrop-protection to -mrop-protect.
The feature will be off by default and turned on when the option is used.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D99185
The TargetMachine uses the triple to determine endianness. Just
use that logic rather than replicating it in PPCSubtarget.
Differential revision: https://reviews.llvm.org/D98674
Added -mrop-protection for Power PC to turn on codegen that provides some
protection from ROP attacks.
The option is off by default and can be turned on for Power 8, Power 9 and
Power 10.
This patch is for the option only. The feature will be implemented by a later
patch.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D96512
Legacy AIX assembly might not support all extended mnes,
add one feature bit to control the generation in MC,
and avoid generating them by default on AIX.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D94458
PowerPC cores like e200z759n3 [1] using an efpu2 only support single precision
hardware floating point instructions. The single precision instructions efs*
and evfs* are identical to the spe float instructions while efd* and evfd*
instructions trigger a not implemented exception.
This patch introduces a new command line option -mefpu2 which leads to
single-hardware / double-software code generation.
[1] Core reference:
https://www.nxp.com/files-static/32bit/doc/ref_manual/e200z759CRM.pdf
Differential revision: https://reviews.llvm.org/D92935
Add a triple for powerpcle-*-*.
This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations:
1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs.
Such a loader is implemented as a freestanding ELF32 LSB binary.
2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls.
3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93918
This adds the initial GlobalISel skeleton for PowerPC. It can only run
ir-translator and legalizer for `ret void`.
This is largely based on the initial GlobalISel patch for RISCV
(https://reviews.llvm.org/D65219).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D83100
On Power10, it's profitable to schedule some stores with adjacent target
address together. This patch implements this feature.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D86754
This patch adds frontend and backend options to enable and disable
the PowerPC MMA operations added in ISA 3.1. Instructions using these
options will be added in subsequent patches.
Differential Revision: https://reviews.llvm.org/D81442
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
Adds frontend and backend options to enable and disable the
PowerPC paired vector memory operations added in ISA 3.1.
Instructions using these options will be added in subsequent patches.
Differential Revision: https://reviews.llvm.org/D83722