CodeGenIntrinsic changes:
- Use `const` Record pointers, and `StringRef` when possible.
- Default initialize several fields with their definition instead of in
the constructor.
- Simplify various string checks in the constructor using the `StringRef`
`starts_with()`/`ends_with()` functions.
- Eliminate the first argument to `setDefaultProperties` and use the
`TheDef` class member instead.
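A rough C++17 sketch of the first three points follows. It assumes a simplified, hypothetical `Record` that carries only a name, and `IntrinsicSketch` with its fields is illustrative rather than the real `CodeGenIntrinsic` layout. Because C++17 `std::string_view` has no `starts_with`, the prefix is compared directly where `StringRef` would call `starts_with("int_")`.

```cpp
#include <string_view>

// Hypothetical stand-in for TableGen's llvm::Record; only the name is modeled.
struct Record {
  std::string_view Name;
};

// Illustrative sketch of the cleanups, not the real CodeGenIntrinsic class.
struct IntrinsicSketch {
  const Record *TheDef;      // const: the record is only ever read
  std::string_view EnumName; // a view into the record's name, no copy
  // Defaults live next to the declarations instead of in the constructor.
  bool isOverloaded = false;
  bool isCommutative = false;
  bool canThrow = false;

  explicit IntrinsicSketch(const Record *R) : TheDef(R) {
    // Read through the TheDef member rather than the constructor argument.
    std::string_view Name = TheDef->Name;
    // With llvm::StringRef this would be Name.starts_with("int_").
    if (Name.substr(0, 4) == "int_")
      EnumName = Name.substr(4);
  }
};
```

The constructor body only handles what actually depends on the record; every flag already has its default from the member initializers.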
IntrinsicEmitter changes:
- Emit `namespace llvm::Intrinsic` instead of nested namespaces.
- End generated comments with a period.
- Use range-based for loops, and `continue` early within loops.
- Emit `static constexpr` instead of `static const` for arrays.
- Change `compareFnAttributes` to use `std::tie()` to compare intrinsic
attributes, and return a default value when all attributes are equal.
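The `std::tie()` comparison can be sketched as below. The `FnAttrs` struct and its four flags are a hypothetical subset chosen for illustration, not the real intrinsic attribute set. `std::tie` builds tuples of references, which compare lexicographically, so one expression replaces a chain of per-field `if`s.

```cpp
#include <tuple>

// Hypothetical subset of intrinsic function attributes (plain bools, since
// std::tie cannot bind to bit-fields).
struct FnAttrs {
  bool isNoReturn = false;
  bool isNoSync = false;
  bool isNoFree = false;
  bool isConvergent = false;
};

// Strict weak ordering over the attribute set, usable as a map comparator.
static bool compareFnAttributes(const FnAttrs &L, const FnAttrs &R) {
  auto TieL = std::tie(L.isNoReturn, L.isNoSync, L.isNoFree, L.isConvergent);
  auto TieR = std::tie(R.isNoReturn, R.isNoSync, R.isNoFree, R.isConvergent);
  if (TieL != TieR)
    return TieL < TieR;
  // All attributes are equal: return the default (neither is "less").
  return false;
}
```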
STLExtras:
- Add a `std::replace` wrapper that takes a range.
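A range-taking wrapper of this kind could look like the sketch below. It lives in a hypothetical `sketch` namespace and uses `std::begin`/`std::end`; it is written in the spirit of the other range helpers in `STLExtras.h`, not copied from the LLVM source.

```cpp
#include <algorithm>
#include <iterator>

namespace sketch {
// Replace every element equal to OldValue with NewValue across a whole range.
template <typename R, typename T>
void replace(R &&Range, const T &OldValue, const T &NewValue) {
  using std::begin;
  using std::end;
  std::replace(begin(Range), end(Range), OldValue, NewValue);
}
} // namespace sketch
```

Call sites then shrink from `std::replace(V.begin(), V.end(), 1, 9)` to `sketch::replace(V, 1, 9)`.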
Combiners that use C++ code in their "apply" pattern only use C++ code
there. They never mix it with MIR patterns, as that has little added
value.
This patch restricts C++ apply code so that if C++ is used, MIR patterns
and builtins cannot be used alongside it. This restriction allows us to
merge the calls to the match and apply C++ code, which in turn means
MatchData variables can simply live on the stack.
Before, we would have:
```
GIM_CheckCxxInsnPredicate // match
GIM_CheckCxxInsnPredicate // apply
GIR_Done
```
alongside a massive C++ struct holding the MatchData of all possible
rules (which was a big space/perf issue).
Now we just have:
```
GIR_DoneWithCustomAction
```
And the function being run just does:
```
unsigned SomeMatchData;
if (match(SomeMatchData))
  apply(SomeMatchData);
```
This approach solves multiple issues at once:
- MatchData handling is greatly simplified and more efficient: "don't
pay for what you don't use".
- We reduce the size of the match table.
- Calling C++ code has a certain overhead (we need a switch), and this
overhead is now only paid once.
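The merged dispatch can be sketched as follows. `Instr`, the action IDs, and the two rules are hypothetical stand-ins (not generated GlobalISel code); the point is that each rule's MatchData has its own type and is a local in its `case`, and a single `switch` covers both match and apply.

```cpp
// Hypothetical stand-in for a MachineInstr.
struct Instr {
  unsigned Opcode;
  int Imm;
};

// Hypothetical IDs for combiner rules whose apply is C++ code.
enum CustomActionID : unsigned { ActionFoldDouble, ActionFoldAdd };

// Rule 1: MatchData is a plain int.
static bool matchFoldDouble(const Instr &MI, int &NewImm) {
  if (MI.Opcode != 1)
    return false;
  NewImm = MI.Imm * 2;
  return true;
}
static void applyFoldDouble(Instr &MI, int NewImm) { MI.Imm = NewImm; }

// Rule 2: MatchData is a small struct.
struct AddInfo {
  int LHS;
  int RHS;
};
static bool matchFoldAdd(const Instr &MI, AddInfo &Info) {
  if (MI.Opcode != 2)
    return false;
  Info = {MI.Imm, 1};
  return true;
}
static void applyFoldAdd(Instr &MI, const AddInfo &Info) {
  MI.Imm = Info.LHS + Info.RHS;
}

// One switch per custom action: match and apply run back to back, and the
// MatchData only exists on the stack while that rule runs.
static bool runCustomAction(CustomActionID ID, Instr &MI) {
  switch (ID) {
  case ActionFoldDouble: {
    int NewImm;
    if (!matchFoldDouble(MI, NewImm))
      return false;
    applyFoldDouble(MI, NewImm);
    return true;
  }
  case ActionFoldAdd: {
    AddInfo Info;
    if (!matchFoldAdd(MI, Info))
      return false;
    applyFoldAdd(MI, Info);
    return true;
  }
  }
  return false;
}
```

No struct aggregating every rule's MatchData is needed: a rule that is never reached never allocates its MatchData.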
Handling of C++ code inside PatFrags is unchanged, though; it still
emits a `GIM_CheckCxxInsnPredicate`. This is completely fine, as
PatFrags can't use MatchData.
- Remove some cases where ULEB128 isn't needed.
- Add a `fastDecodeULEB128` tailored for GlobalISel, which does
unchecked decoding optimized for the common case of 1-byte values. We
rarely have >1 byte Inst IDs, OpIdx, etc., and those are by far the
most common ULEB128 users.
This specialized LEB128 decode function generates almost half as many
instructions as the generic one.
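An unchecked decoder with a 1-byte fast path might look like this. The function name matches the one above, but the body is our own sketch: it assumes the table is well-formed and in bounds, so, unlike the generic `decodeULEB128` in `llvm/Support/LEB128.h`, it does no end-of-buffer or overflow checking.

```cpp
#include <cstdint>

// Decode one ULEB128 value at Table[Idx] and advance Idx past it.
// Assumes well-formed, in-bounds input (no error or overflow checks).
static uint64_t fastDecodeULEB128(const uint8_t *Table, uint64_t &Idx) {
  uint64_t Value = Table[Idx++];
  // Fast path: values below 128 fit in a single byte.
  if (Value < 128)
    return Value;
  // Slow path: keep the low 7 bits, then append 7 bits per extra byte
  // until a byte without the continuation bit (0x80) is read.
  Value &= 0x7f;
  unsigned Shift = 7;
  uint8_t Byte;
  do {
    Byte = Table[Idx++];
    Value |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Value;
}
```

For example, the single byte `0x05` decodes to 5, while `0xE5 0x8E 0x26` decodes to 624485 and advances the index by three.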
The vast majority of the following (very common) opcodes were always
called with identical arguments:
- `GIM_CheckType` for the root
- `GIM_CheckRegBankForClass` for the root
- `GIR_Copy` between the old and new root
- `GIR_ConstrainSelectedInstOperands` on the new root
- `GIR_BuildMI` to create the new root
I added an overloaded version of each opcode, specialized for the root
instruction. It always saves between 1 and 2 bytes per instance,
depending on the number of arguments specialized into the opcode. Some
of these opcodes had between 5 and 15k occurrences in the AArch64
GlobalISel Match Table.
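The emitter-side idea can be sketched as follows, using hypothetical opcode names and values rather than the real MatchTable enum. When the check targets the root (InsnID 0), the specialized opcode implies that operand and drops one byte. Opcodes that name two instructions, such as a copy between old and new root, can drop two.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical opcode numbering; the real GlobalISel enum differs.
enum Opcode : uint8_t {
  CheckType = 1,     // CheckType InsnID, OpIdx, TypeID
  RootCheckType = 2, // RootCheckType OpIdx, TypeID (InsnID 0 implied)
};

// Emit a type check, choosing the root-specialized form when possible.
void emitCheckType(std::vector<uint8_t> &Table, uint8_t InsnID, uint8_t OpIdx,
                   uint8_t TypeID) {
  if (InsnID == 0) {
    Table.push_back(RootCheckType); // InsnID is not encoded
  } else {
    Table.push_back(CheckType);
    Table.push_back(InsnID);
  }
  Table.push_back(OpIdx);
  Table.push_back(TypeID);
}
```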
Additionally, the following opcodes are almost always used in the same
sequence:
- `GIR_EraseFromParent 0` + `GIR_Done`
  - `GIR_EraseRootFromParent_Done` has been created to do both. Saves 2
    bytes per occurrence.
- `GIR_IsSafeToFold` was *always* called for each InsnID except 0.
  - Changed the opcode to take the number of instructions to check after
    `MI[0]`.
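The first fusion can be modeled as a small peephole over table entries. The opcodes and the `Entry` representation here are hypothetical, with one byte per opcode and per operand. Folding `EraseFromParent 0; Done` (3 bytes) into one fused opcode (1 byte) gives the 2-byte saving.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; names and values are illustrative only.
enum Op : uint8_t { EraseFromParent, Done, EraseRootFromParent_Done, Other };

// One MatchTable entry: an opcode plus its 1-byte operands.
struct Entry {
  Op Opc;
  std::vector<uint8_t> Operands;
};

// Encoded size: one byte for the opcode plus one per operand.
std::size_t encodedSize(const std::vector<Entry> &Entries) {
  std::size_t Size = 0;
  for (const Entry &E : Entries)
    Size += 1 + E.Operands.size();
  return Size;
}

// Replace each "EraseFromParent 0; Done" pair with the fused opcode.
std::vector<Entry> fuseEraseRootDone(const std::vector<Entry> &In) {
  std::vector<Entry> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    bool ErasesRoot = In[I].Opc == EraseFromParent &&
                      In[I].Operands == std::vector<uint8_t>{0};
    if (ErasesRoot && I + 1 < In.size() && In[I + 1].Opc == Done) {
      Out.push_back({EraseRootFromParent_Done, {}});
      ++I; // skip the Done that was folded in
      continue;
    }
    Out.push_back(In[I]);
  }
  return Out;
}
```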
The savings from these are pretty neat. For `AArch64GenGlobalISel.inc`:
- `AArch64InstructionSelector.cpp.o` goes down from 772 KB to 704 KB
(~ -9% code size)
- Self-reported MatchTable size goes from 420380 bytes to 352426 bytes
(~ -16%)
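Computed directly from the sizes above, the relative reductions are:

$$
\frac{772 - 704}{772} \approx 8.8\%, \qquad \frac{420380 - 352426}{420380} \approx 16.2\%
$$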
A smaller match table means a faster match table, because we spend less
time iterating and decoding.
I don't have a solid measurement methodology for GlobalISel performance,
so I don't have precise numbers, but I saw an improvement of a few
percent on a simple test case.
Refactor of the llvm-tblgen source into:
- a "Basic" library, which contains the bare minimum utilities to build
`llvm-min-tablegen`
- a "Common" library which contains all of the helpers for TableGen
backends. Such helpers can be shared by more than one backend, and even
unit tested (e.g., CodeExpander already is; maybe we can add more over time).
Fixes #80647