Sometimes we want to create a SchedPredicate out of different types of
SchedPredicate: composing a `FeatureSchedPredicate` with a
`MCSchedPredicate`, for example. For those cases, this patch creates
several new TableGen constructions so that users can create a
SchedPredicate like
```
AllOfSchedPreds<[FeatureFooPred,
MCSchedPredicate<CheckZeroOperand<1>>]>
```
which evaluates to true only when both `FeatureFooPred` and
`MCSchedPredicate<CheckZeroOperand<1>>` hold.
- Adopt ifdef and namespace emitters in SubtargeEmitter.
- To aid that, factor out emission of different sections of the code
into individual helper functions.
Introduce a new SchedPredicate, `FeatureSchedPredicate`, that holds true
when a certain SubtargetFeature is enabled. This could be useful when we
want to configure a scheduling model with subtarget features.
I add this as a separate SchedPredicate rather than piggy-back on the
existing `SchedPredicate<[{....}]>` because first and foremost,
`SchedPredicate` is expected to only operate on MachineInstr, so it does
_not_ appear in `MCGenSubtargetInfo::resolveVariantSchedClass` but only
show up in `TargetGenSubtargetInfo::resolveSchedClass`. Yet I think
`FeatureSchedPredicate` will be useful for both MCInst and MachineInstr.
There is another subtle difference between `resolveVariantSchedClass`
and `resolveSchedClass` regarding how we access the MCSubtargetInfo
instance, if we really want to express `FeatureSchedPredicate` using
`SchedPredicate<[{.....}]>`.
So I thought it'll be easier to add another new SchedPredicate for
SubtargetFeature.
`Target_MC::resolveVariantSchedClassImpl` is the implementation function
for `TargetGenMCSubtargetInfo::resolveVariantSchedClass`. Despite being
only called by `resolveVariantSchedClass`,
`resolveVariantSchedClassImpl` is still a standalone function that
cannot access a MCSubtargetInfo through `this` (i.e.
`TargetGenMCSubtargetInfo`). And having access to a `MCSubtargetInfo`
could be useful for some (future) SchedPredicate.
This patch modifies TableGen to generate `resolveVariantSchedClassImpl`
with an additional `MCSubtargetInfo` argument passing in. Note that this
does not change any public interface in either `TargetGenMCSubtargetInfo
` or `MCSubtargetInfo`, as `resolveVariantSchedClassImpl` is basically
an internal function.
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.
The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.
This just introduces the base infrastructure. I have ports of all
the target specific usage of PointerLikeRegClass ready.
`Predicates` and `Features` fields serve the same purpose. They should
be kept in sync, but not all predicates are based on features. This
resulted in introducing dummy features for that only reason.
This patch removes `Features` field and changes TableGen emitters to use
`Predicates` instead.
Historically, predicates were written with the assumption that the
checking code will be used in `SelectionDAGISel` subclasses, meaning
they will have access to the subclass variables, such as `Subtarget`.
There are no such variables in the generated
`GenSubtargetInfo::getHwModeSet()`, so we need to provide them. This can
be achieved by subclassing `HwModePredicateProlog`, see an example in
`Hexagon.td`.
Dynamic relocations are expensive on ELF/Linux platforms because they
are applied in userspace on process startup. Therefore, it is worth
optimizing them to make PIE and PIC dylib builds faster. In +asserts
builds (non-NDEBUG), nikic identified these schedule class name string
pointers as the leading source of dynamic relocations. [1]
This change uses llvm::StringTable and the StringToOffsetTable TableGen
helper to turn the string pointers into 32-bit offsets into a separate
character array.
The number of dynamic relocations is reduced by ~60%:
❯ llvm-readelf --dyn-relocations lib/libLLVM.so | wc -l
381376 # before
155156 # after
The test suite time is modestly affected, but I'm running on a shared
noisy workstation VM with a ton of cores:
https://gist.github.com/rnk/f38882c2fe2e63d0eb58b8fffeab69de
Testing Time: 100.88s # before
Testing Time: 78.50s. # after
Testing Time: 96.25s. # before again
I haven't used any fancy hyperfine/denoising tools, but I think the
result is clearly visible and we should ship it.
[1] https://gist.github.com/nikic/554f0a544ca15d5219788f1030f78c5a
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2.
TargetParser shouldn't re-run the PPC subtarget tablegen target, it
should define its own `-gen-ppc-target-def` rule like all the other
targets do in llvm/include/llvm/TargetParser/CMakeLists.txt .
One user reported that there are incorrect CMake dependencies after this
change, so I will roll this back in the meantime.
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
I don't think a processor should normally contains duplicate features. This patch prints a warning whenever that happens in either the features or tune features.
I was looking at the value range of AcquireAtCycle / ReleaseAtCycle, and
I noticed that while the TableGen error messages said AcquireAtCycle has
to be less than ReleaseAtCycle, in reality they are actually allowed to
be the same. This patch fixes it and add more test cases.
This patch adds support for describing per-write resource cycle counts
for ReadAdvance records via a new optional field called `tunables`.
This makes it possible to declare ReadAdvance records such as:
def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>;
The above will effectively declare two entries in the ReadAdvance
table for Read_C, one for Write_A with a cycle count of 1+2, and one for
Write_B with a cycle count of 1+0 (omitted values are assumed 0).
The field `tunables` provides a list of deltas relative to the base
`cycle` count of the ReadAdvance. Since the field is optional and
defaults to a list of 0's, this change doesn't affect current targets.
This patch adds a new option `-aarch64-enable-zpr-predicate-spills`
(which is disabled by default), this option replaces predicate spills
with vector spills in streaming[-compatible] functions.
For example:
```
str p8, [sp, #7, mul vl] // 2-byte Folded Spill
// ...
ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
```
Becomes:
```
mov z0.b, p8/z, #1
str z0, [sp] // 16-byte Folded Spill
// ...
ldr z0, [sp] // 16-byte Folded Reload
ptrue p4.b
cmpne p8.b, p4/z, z0.b, #0
```
This is done to avoid streaming memory hazards between FPR/vector and
predicate spills, which currently occupy the same stack area even when
the `-aarch64-stack-hazard-size` flag is set.
This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO
and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos
handles scavenging the required registers (z0 in the above example) and,
in the worst case spilling a register to an emergency stack slot in the
expansion. The condition flags are also preserved around the `cmpne` in
case they are live at the expansion point.
Use this to improve performance of SubtargetEmitter::findWriteResources
and SubtargetEmitter::findReadAdvance. Now we can do a map lookup
instead of a linear search through all WriteRes/ReadAdvance records.
This reduces the build time of RISCVGenSubtargetInfo.inc on my
machine from 43 seconds to 10 seconds.
Code cleanups for TableGen files, changes includes function names,
variable names and unused imports.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Adopt scaled indent in PredicateExpander.
Added pre/post inc/dec operators to `indent` and related unit tests.
Verified by comparing *.inc files generated by LLVM build with/without
the change.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Split RecordKeeper `getAllDerivedDefinitions` family of functions into
two variants:
(a) non-const ones that return vectors of `Record *` and
(b) const ones, that return vector/ArrayRef of `const Record *`.
This will help gradual migration of TableGen backends to use
`const RecordKeeper` and by implication change code to work
with const pointers and better const correctness.
Existing backends are not yet compatible with the const family of
functions, so change them to use a non-constant `RecordKeeper`
reference, till they are migrated.
- Extract code for sorting and checking duplicate Records into a helper
function and update `collectProcModels` to use the helper.
- Update `FeatureKeyValues` to:
(a) Remove code for duplicate checks and use the helper.
(b) Trim features with empty name explicitly to be able to use
the helper.
- Make the sorting deterministic by using record name as a secondary
key for sorting, and re-enable SubtargetFeatureUniqueNames.td test
that was disabled due to the non-determinism of the error messages.
- Change wording of error message when duplicate records are found to
be source code position agnostic, since `First` may not be before
`Second` lexically.
- Keep track of last definition of a feature in a `DenseMap` and use
it to report a better error message when a duplicate feature is found.
- Use StringMap instead of a std::map in `EmitStageAndOperandCycleData`
- Add a unit test to check if duplicate names are flagged.
SubtargetEmitter::GenSchedClassTables takes a CodeGenProcModel, but
calls hasReadOfWrite which loops over all ProcModels. We move
hasReadOfWrite to CodeGenProcModel and remove the loop over all
ProcModels. This leads to a 144% speedup on the RISC-V backend of our
downstream.
1. Bitwise operations are used to access HwMode, allowing for the
coexistence of HwMode IDs for different features (such as RegInfo and
EncodingInfo). This will provide better scalability for HwMode.
Currently, most users utilize HwMode primarily for configuring
Register-related information, and few use it for configuring Encoding.
The limited scalability of HwMode has been a significant factor in this
usage pattern.
2. Sink the HwMode Encodings selection logic down to per instruction
level, this makes the logic for choosing encodings clearer and provides
better error messages.
3. Add some HwMode ID conflict detection to the getHwMode() interface.
Refactor of the llvm-tblgen source into:
- a "Basic" library, which contains the bare minimum utilities to build
`llvm-min-tablegen`
- a "Common" library which contains all of the helpers for TableGen
backends. Such helpers can be shared by more than one backend, and even
unit tested (e.g. CodeExpander is, maybe we can add more over time)
Fixes#80647
Forming a `ReadAdvance` with an entry in the `ValidWrites` list that is
not used by any instruction results in the entire `ReadAdvance` to be
ignored by the scheduler due to an invalid entry.
The `SchedRW` collection code only picks up `SchedWrites` that are
reachable from `Instructions`, `InstRW`, `ItinRW` and `SchedAlias`,
leaving the unreachable ones with an invalid entry (0) in
`SubtargetEmitter::GenSchedClassTables` when going through the list of
`ReadAdvances`
Historically TableGen has used `A.swap(B)` to move containers without
the expense of copying them. Perhaps this predated rvalue references. In
any case `A = std::move(B)` seems like a more direct way to implement
this when only A is required after the operation.
`Fusion` is inherited from `SubtargetFeature` now. Each definition
of `Fusion` will define a `SubtargetFeature` accordingly.
Method `getMacroFusions` is added to `TargetSubtargetInfo`, which
returns a list of `MacroFusionPredTy` that will be evaluated by
MacroFusionMution.
`getMacroFusions` will be auto-generated if the target has `Fusion`
definitions.
Use of llvm::Optional was migrated to std::optional. This included a
change in the constructor of ArrayRef.
However, there are still 2 places in the SubtargetEmitter which uses
llvm::None, causing a compile error when emitted.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568