`Fusion` now inherits from `SubtargetFeature`. Each definition of
`Fusion` defines a corresponding `SubtargetFeature`.
A `getMacroFusions` method is added to `TargetSubtargetInfo`; it
returns a list of `MacroFusionPredTy` that will be evaluated by
MacroFusionMutation.
`getMacroFusions` is auto-generated if the target has `Fusion`
definitions.
This tries to fix a bug by resolving a few FIXMEs. The bug is that
`EraseInstAction` is emitted after emitting the _first_ `BuildMIAction`,
which is too early because the erased instruction may still be used by
subsequent `BuildMIAction`s (in particular, by `CopyRenderer`).
An example of the bug (from `match-table-operand-types.td`):
```
def InstTest0 : GICombineRule<
  (defs root:$a),
  (match (G_MUL i32:$x, i32:$b, i32:$c),
         (G_MUL $a, i32:$b, i32:$x)),
  (apply (G_ADD i64:$tmp, $b, i32:$c),
         (G_ADD i8:$a, $b, i64:$tmp))>;

GIR_EraseFromParent, /*InsnID*/0,
GIR_BuildMI, /*InsnID*/1, /*Opcode*/GIMT_Encode2(TargetOpcode::G_ADD),
GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/0, // a
GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/1, // b
GIR_AddSimpleTempRegister, /*InsnID*/1, /*TempRegID*/0,
```
Here, the root instruction is destroyed before copying its operands ('a'
and 'b') to the new instruction.
The solution is to emit `EraseInstAction` for the root instruction as
the last action in the emission pipeline.
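The ordering problem can be modelled outside of TableGen. The following is a minimal sketch, not the emitter's code: `ActionKind`, `firstUseAfterErase` and `eraseRootLast` are hypothetical names introduced here, and the action list is a simplified stand-in for the emitted match-table actions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the emitted actions.
enum class ActionKind { BuildMI, CopyFromRoot, AddTempReg, EraseRoot };

// Returns the index of the first action that reads the root instruction
// after it has been erased, or -1 if the sequence is safe.
int firstUseAfterErase(const std::vector<ActionKind> &Actions) {
  bool Erased = false;
  for (std::size_t I = 0; I < Actions.size(); ++I) {
    if (Actions[I] == ActionKind::EraseRoot)
      Erased = true;
    else if (Erased && Actions[I] == ActionKind::CopyFromRoot)
      return static_cast<int>(I);
  }
  return -1;
}

// Moves the erase of the root to the end, keeping all other actions in
// their original order.
std::vector<ActionKind> eraseRootLast(const std::vector<ActionKind> &Actions) {
  std::vector<ActionKind> Out;
  bool SawErase = false;
  for (ActionKind A : Actions) {
    if (A == ActionKind::EraseRoot)
      SawErase = true;
    else
      Out.push_back(A);
  }
  if (SawErase)
    Out.push_back(ActionKind::EraseRoot);
  return Out;
}
```

Applied to the sequence from the example (erase, build, copy, copy, add temp register), `firstUseAfterErase` reports the first `GIR_Copy`, and the reordered sequence reports no use after erase.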
We record the usage of each `Predicate` and sort them by usage.
For the top 8 `Predicate`s, we will emit an `OPC_CheckPredicateN` to
save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 61K.
This is a recommit of 1a57927, which was reverted in bc98c31.
The CI failures occurred when doing expensive checks (with option
`LLVM_ENABLE_EXPENSIVE_CHECKS` being ON).
The key point is that the test needs a stable sorting result, but the
expensive checks exposed the non-determinism of `llvm::sort`. So
`llvm::sort` is changed to `llvm::stable_sort` in this revised patch,
and `llvm::MapVector` is used to keep insertion order.
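A rough sketch of the ranking step, using only the standard library: a vector of pairs stands in for `llvm::MapVector`, and `std::stable_sort` for `llvm::stable_sort`. The names `rankByUsage` and `encodedSize` are hypothetical, and the two-byte cost for predicates outside the top 8 assumes a one-byte index after the opcode.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts uses in first-seen order, then ranks names by descending use
// count. The stable sort keeps insertion order among equal counts, so
// the result is deterministic.
std::vector<std::string> rankByUsage(const std::vector<std::string> &Uses) {
  std::vector<std::pair<std::string, unsigned>> Counts;
  for (const std::string &Name : Uses) {
    auto It = std::find_if(Counts.begin(), Counts.end(),
                           [&](const std::pair<std::string, unsigned> &P) {
                             return P.first == Name;
                           });
    if (It == Counts.end())
      Counts.emplace_back(Name, 1u);
    else
      ++It->second;
  }
  std::stable_sort(Counts.begin(), Counts.end(),
                   [](const std::pair<std::string, unsigned> &A,
                      const std::pair<std::string, unsigned> &B) {
                     return A.second > B.second;
                   });
  std::vector<std::string> Ranked;
  for (const auto &P : Counts)
    Ranked.push_back(P.first);
  return Ranked;
}

// Assumed cost model: the top 8 ranks fold the index into the opcode
// (one byte); everything else needs an opcode plus an index byte.
unsigned encodedSize(std::size_t Rank) { return Rank < 8 ? 1u : 2u; }
```

With an unstable sort, two predicates with the same use count could swap ranks between runs, which is the kind of non-determinism the expensive checks exposed.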
We record the usage of each `PatternPredicate` and sort them by
usage.
For the top 8 `PatternPredicate`s, we will emit an
`OPC_CheckPatternPredicateN` to save one byte.
The old `OPC_CheckPatternPredicate2` is renamed to
`OPC_CheckPatternPredicateTwoByte`.
Overall this reduces the llc binary size with all in-tree targets by
about 93K.
We record the usage of each `ComplexPat` and sort the `ComplexPat`s
by usage.
For the top 8 `ComplexPat`s, we will emit an `OPC_CheckComplexPatN`
to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 89K.
This patch adds support in the llvm-exegesis tablegen emitter for
validation counters. Full support for validation counters in
llvm-exegesis will be added in a future patch.
This is the logical equivalent of #76710 for APInt and uses the same
naming scheme.
Converted existing users through:
`git grep -l "cast<ConstantSDNode>\(.*\).*getAPIntValue" | xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`
When there is just one element in the type equivalence class (TEC),
`inferNamedOperandType` fails because it does not consider the passed
operand as a suitable one. This is incorrect when inferring the type of
an (unnamed) immediate operand.
This patch is a straightforward change based on the design in #77202.
It has no effect yet because compressing ND to non-ND is not supported
in X86CompressEVEX.cpp.
In addition, we relax the condition for EVEX compression from
ST.hasAVX512() to ST.hasEGPR() || ST.hasAVX512(). This has no effect
for now because no APX instruction is in the EVEX compression table so
far.
This patch is to extract NFC in #77065 into a separate commit.
1. Simplify the casts and return type of `getValueFromBitsInit`.
2. Remove out-of-date comments and allow memory ops in the function
object `IsMatch` so that it can be reused for EVEX2Legacy compression.
This patch is to extract NFC in #77065 into a separate commit.
Remove these two classes and put all the entries in X86 EVEX compression tables
that need special handling in .def file.
PR #77065 tries to add entries that need special handling for APX in
.def file. Compared to setting fields in td files, that method looks
cleaner. This patch is to unify the addition of manual entries.
RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031
APX introduces EGPR, NDD and NF instructions. In addition to compressing
EVEX encoded AVX512 instructions into VEX encoding, we also have several
more possible optimizations.
a. Promoted instruction (EVEX space) -> pre-promotion instruction (legacy space)
b. NDD (EVEX space) -> non-NDD (legacy space)
c. NF_ND (EVEX space) -> NF (EVEX space)
The first two types of compression can usually reduce code size, while
the third type of compression can help hardware decode although the
instruction length remains unchanged.
So we do the renaming for the upcoming APX optimizations.
In addition, I clang-formatted the code in X86CompressEVEX.cpp and
X86CompressEVEXTablesEmitter.cpp.
This patch also extracts the NFC in #77065 into a separate commit.
`FusionPredicate` is used to check whether the target instruction
meets a requirement. The target can be firstMI, secondMI, or both.
A `Fusion` contains a list of `FusionPredicate`s. The generated code
will look like:
```
bool isNAME(const TargetInstrInfo &TII,
            const TargetSubtargetInfo &STI,
            const MachineInstr *FirstMI,
            const MachineInstr &SecondMI) {
  auto &MRI = SecondMI.getMF()->getRegInfo();
  /* Predicates */
  return true;
}
```
A boilerplate class called `SimpleFusion` is added. `SimpleFusion` has
a predefined structure of predicates and accepts a predicate for
`firstMI`, a predicate for `secondMI`, and an epilog/prolog as arguments.
The generated code for `SimpleFusion` will look like:
```
bool isNAME(const TargetInstrInfo &TII,
            const TargetSubtargetInfo &STI,
            const MachineInstr *FirstMI,
            const MachineInstr &SecondMI) {
  auto &MRI = SecondMI.getMF()->getRegInfo();
  /* Prolog */
  /* Predicate for `SecondMI` */
  /* Wildcard */
  /* Predicate for `FirstMI` */
  /* Check One Use */
  /* Tie registers */
  /* Epilog */
  return true;
}
```
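To make the order of the `SimpleFusion` steps concrete, here is a self-contained mock that does not use any LLVM types. `MockInstr`, `InstrPred` and `isSimpleFusion` are hypothetical, and the meanings given to "check one use" and "tie registers" are guesses for illustration, since the description above does not spell them out. The prolog and epilog slots are omitted.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for MachineInstr, reduced to what the sketch needs.
struct MockInstr {
  unsigned Opcode;
  unsigned DefReg;    // register written by the instruction
  unsigned SrcReg;    // register read by the instruction
  bool DefHasOneUse;  // assumed: the written register has a single user
};

using InstrPred = std::function<bool(const MockInstr &)>;

bool isSimpleFusion(const InstrPred &FirstPred, const InstrPred &SecondPred,
                    const MockInstr *FirstMI, const MockInstr &SecondMI) {
  // Predicate for SecondMI.
  if (!SecondPred(SecondMI))
    return false;
  // Wildcard: a null FirstMI asks whether SecondMI could fuse at all.
  if (!FirstMI)
    return true;
  // Predicate for FirstMI.
  if (!FirstPred(*FirstMI))
    return false;
  // Check one use (assumed meaning: FirstMI's result has a single user).
  if (!FirstMI->DefHasOneUse)
    return false;
  // Tie registers (assumed meaning: SecondMI reads what FirstMI writes).
  if (SecondMI.SrcReg != FirstMI->DefReg)
    return false;
  return true;
}
```

Checking `SecondMI` before the wildcard test mirrors the generated skeleton, where `FirstMI` is a pointer and `SecondMI` is a reference.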
For instructions that don't map to a mnemonic string, the implementation
of MCInstPrinter::getMnemonic would return an invalid pointer due to the
result of the calculation of the instruction's position in the `AsmStrs`
table. This patch fixes the issue by ensuring those cases return a
`nullptr` value instead.
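A small sketch of the failure mode, under the assumption that the per-opcode table stores the mnemonic's offset plus one and uses 0 for "no mnemonic". The table contents and the free function `getMnemonic` below are made up for illustration and are not the generated code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical string table: mnemonics stored back to back, each
// NUL-terminated ("add" at offset 0, "sub" at 4, "mul" at 8).
static const char AsmStrs[] = "add\0sub\0mul";

// Hypothetical per-opcode info: offset + 1, with 0 meaning "no mnemonic".
static const std::uint16_t OpInfo[] = {1, 5, 0, 9};

const char *getMnemonic(unsigned Opcode) {
  assert(Opcode < sizeof(OpInfo) / sizeof(OpInfo[0]) && "opcode out of range");
  std::uint16_t Bits = OpInfo[Opcode];
  // Without this check, AsmStrs + 0 - 1 would point before the table.
  if (Bits == 0)
    return nullptr;
  return AsmStrs + Bits - 1;
}
```

Callers then need to test the result for null before treating it as a string.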
Fixes #74177.
The byte following `OPC_EmitRegister` is an MVT type, which is
usually i32 or i64.
We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that we
can save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 10K.
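The saving can be sketched as follows. The numeric values of the opcodes and types are placeholders, the one-byte register number is an assumption, and `encodeEmitRegister` is a hypothetical helper rather than the emitter's function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder numbering, for illustration only.
enum Opcode : std::uint8_t {
  OPC_EmitRegister = 0,
  OPC_EmitRegisterI32 = 1,
  OPC_EmitRegisterI64 = 2
};
enum SimpleVT : std::uint8_t { VT_i16 = 0, VT_i32 = 1, VT_i64 = 2 };

// Encodes an "emit register" operation. For i32 and i64 the type is
// implied by the opcode, so the type byte is dropped.
std::vector<std::uint8_t> encodeEmitRegister(SimpleVT VT, std::uint8_t RegNo) {
  switch (VT) {
  case VT_i32:
    return {OPC_EmitRegisterI32, RegNo};
  case VT_i64:
    return {OPC_EmitRegisterI64, RegNo};
  default:
    return {OPC_EmitRegister, static_cast<std::uint8_t>(VT), RegNo};
  }
}
```

The same "fold the common type into the opcode" idea applies to any opcode whose first operand is a type byte that is almost always i32 or i64.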
- Instead of checking the default ops directly, this change queries the
DAG default operands collected while reading patterns. This not only
simplifies the code but also handles a few cases where integer values
are converted from convertible types, such as 'bits'.
- A test case is added to GlobalISelEmitter.td as a regression test for
default 'bits' values.
This adds a link from the main docs page back to the README where
I have previously added a list of useful resources.
To that list, I've added a link to my recent llvm blog post.
Most users of AddImm and CheckConstantInt only use 1-byte immediates, so
I added opcode variants for those. Each instruction that uses a variant
saves 7 bytes.
I also added an opcode for AddTempRegister for the cases where there are
no register flags.
Space savings:
- AMDGPUGenGlobalISel: 470180 bytes to 422564 bytes (-10%)
- AArch64GenGlobalISel.inc: 383893 bytes to 374046 bytes (-2.6%)
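The 7-byte figure follows from replacing an 8-byte immediate with a 1-byte one. A minimal sketch, assuming little-endian 8-byte encoding for the wide form; the opcode values and the helper `encodeAddImm` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder opcode values, for illustration only.
enum : std::uint8_t { AddImm = 0, AddImm8 = 1 };

// Encodes "add immediate": values that fit in a signed byte use the
// short form (opcode + 1 byte), everything else the wide form
// (opcode + 8 bytes, little-endian).
std::vector<std::uint8_t> encodeAddImm(std::int64_t Imm) {
  std::vector<std::uint8_t> Out;
  if (Imm >= INT8_MIN && Imm <= INT8_MAX) {
    Out.push_back(AddImm8);
    Out.push_back(static_cast<std::uint8_t>(Imm));
    return Out;
  }
  Out.push_back(AddImm);
  std::uint64_t Bits = static_cast<std::uint64_t>(Imm);
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<std::uint8_t>(Bits >> (8 * I)));
  return Out;
}
```

The short form is 2 bytes and the wide form 9, which gives the 7 bytes saved per instruction.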
Many operations move from the current node to its parent and then
to another child.
So `OPC_MoveSibling` and its space-optimized forms are added to perform
this "move to sibling" operation.
These new operations are generated when optimizing the matcher in
`ContractNodes`. Currently `MoveParent+MoveChild` is optimized
to `MoveSibling`, and `MoveParent+RecordChild+MoveChild` sequences
are transformed into `MoveSibling+RecordNode`.
Overall this reduces the llc binary size with all in-tree targets by
about 30K.
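The two rewrites can be sketched as a peephole pass over a flat list of operations. This is a simplification: the real matcher is a linked structure, `Op`, `Kind` and `contractMoves` are hypothetical names, and the sketch assumes the three-operation rewrite applies only when `RecordChild` and `MoveChild` name the same child.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { MoveParent, MoveChild, MoveSibling, RecordChild, RecordNode };

struct Op {
  Kind K;
  unsigned ChildNo; // unused for MoveParent and RecordNode
};

std::vector<Op> contractMoves(const std::vector<Op> &In) {
  std::vector<Op> Out;
  std::size_t I = 0;
  while (I < In.size()) {
    bool IsParent = In[I].K == Kind::MoveParent;
    // MoveParent + MoveChild N  ->  MoveSibling N
    if (IsParent && I + 1 < In.size() && In[I + 1].K == Kind::MoveChild) {
      Out.push_back({Kind::MoveSibling, In[I + 1].ChildNo});
      I += 2;
      continue;
    }
    // MoveParent + RecordChild N + MoveChild N  ->  MoveSibling N + RecordNode
    if (IsParent && I + 2 < In.size() && In[I + 1].K == Kind::RecordChild &&
        In[I + 2].K == Kind::MoveChild &&
        In[I + 1].ChildNo == In[I + 2].ChildNo) {
      Out.push_back({Kind::MoveSibling, In[I + 2].ChildNo});
      Out.push_back({Kind::RecordNode, 0});
      I += 3;
      continue;
    }
    Out.push_back(In[I]);
    ++I;
  }
  return Out;
}
```

Each rewrite removes one operation from the table, which is where the size reduction comes from.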
If there is only one bit set in EmitNodeInfo, then we can encode it
implicitly to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 168K.
The most common type is i32 or i64 so we add `OPC_CheckChildTypeI32`
and `OPC_CheckChildTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 70K.
These new opcodes implicitly indicate the RecNo.
The old `OPC_EmitCopyToReg2` is renamed to `OPC_EmitCopyToRegTwoByte`.
Overall this reduces the llc binary size with all in-tree targets by
about 33K (most are from RISCV target).
The most common type is i32 or i64 so we add `OPC_CheckTypeI32` and
`OPC_CheckTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 29K.
When importing instruction selection patterns into GlobalISel, the
operands matched in the "source" DAG are copied into corresponding
operands of the "destination" DAG according to their names (such as Rd).
If multiple operands in the source DAG share the same name, a
GIM_CheckIsSameOperand predicate makes the instruction selector check
the corresponding operands for equality (at compiler run-time) as part
of matching the source pattern.
The Def operands of the root node of the destination DAG are handled
specially. The operands of the instruction corresponding to the root
node are taken and GIM_CheckRegBankForClass predicates are
tablegen-erated accordingly. If by coincidence the Def operand in
question has the same name as one of the named operands in the pattern,
a GIM_CheckIsSameOperand predicate is automatically added, which is
likely to prevent an otherwise applicable selection pattern from
matching at compiler run-time.
This patch mangles the Def operand names taken from the instruction
corresponding to the root of the destination DAG (for example, "Rd"
becomes "DstI[Rd]"), preventing unexpected name clashes with the
pattern's named operands.
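The effect of the mangling can be shown with a toy model of the name lookup. `mangleDstDefName` and `clashesWithPattern` are hypothetical helpers, and the set of names stands in for whatever structure the emitter uses to track named operands.

```cpp
#include <cassert>
#include <set>
#include <string>

// Wraps a destination-instruction Def operand name as in the example
// above: "Rd" becomes "DstI[Rd]".
std::string mangleDstDefName(const std::string &Name) {
  return "DstI[" + Name + "]";
}

// Returns true if DefName is already used by a named operand of the
// pattern, which is the situation that leads to an unwanted
// same-operand check.
bool clashesWithPattern(const std::set<std::string> &PatternNames,
                        const std::string &DefName) {
  return PatternNames.count(DefName) != 0;
}
```

With pattern operands named `Rd` and `Rn`, the plain Def name `Rd` clashes while `DstI[Rd]` does not.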
The patch consists of three sets of changes:
* changes to the GlobalISelEmitter.cpp file are the actual fix
* a test case is added to GlobalISelEmitter.td file as a regression test
* everything else is the biggest and least interesting part - updates to
the existing test cases: renames of the form Rd -> DstI[Rd] inside the
inline comments in tablegen-erated code