Refactor IIT encoding generation. The core change here is that when
generating IIT encodings, we pre-generate all the bits of the IIT
encoding except in cases where a type needs to encode its own overload
index, which is patched in later in `TypeInfoGen`. In addition, this
change introduces a class hierarchy for dependent types, so that the
checks in `TypeInfoGen` are more meaningful, and renames/simplifies
several other pieces of code, as listed below.
1. Change the encoding for IIT_ARG's ArgInfo byte to encode the overload
slot index in lower 5 bits and the argument kind in upper 3 bits. This
enabled generating the same packed format for all other dependent types
that need to encode an overload slot index in the IIT encoding. Adjusted
the corresponding C++ code in `IITDescriptor::getArgumentNumber` and
`IITDescriptor::getArgumentKind`.
2. Introduce more descriptive classes to handle packing of the overload
index + arg kind into the IIT encoding. `OverloadIndexPlaceholder` is
used to generate a transient value in the type-signature that is patched
in `TypeInfoGen` with that type's overload index. `PackOverloadIndex` is
used to encapsulate the final packing of an overload index and argument
kind in a single byte, and `PatchOverloadIndex` is the class that does
the required patching of an `OverloadIndexPlaceholder` given the type's
overload index.
3. Delete `isAny`, `ArgCode`, and `Number` from the base `LLVMType` class.
Uses of `isAny` are replaced with `isa<LLVMAnyType>`, `ArgCode` is no
longer used, and `Number`, which represented the overload index for a
dependent type, moves to the `LLVMDependentType` class and is renamed
to `OverloadIndex`.
4. Introduce `LLVMDependentType` as a base class of all dependent types.
It holds the overload index of the type it depends on in its
`OverloadIndex` field. Also introduce 2 subclasses,
`LLVMFullyDependentType` to represent all fully dependent types (which
encode just the appropriate IIT code and the dependent type's overload
index) and `LLVMPartiallyDependentType` to represent partially dependent
types, that encode the appropriate IIT code and both this type's
overload index and the dependent type's overload index.
5. Change existing dependent type classes to derive from one of these
classes and rename the `num` class argument to `oidx` to better reflect
its meaning.
6. Rename various fields and classes used in `TypeInfoGen` to be more
meaningful: use `AssignOverloadIndex` to do overload index assignment,
rename `ACIdxs` to `OverloadIdxs` and `ACTys` to `OverloadTypes`, and use
`DoPatchOverloadIndex` to patch in assigned overload slot indexes.
Eliminate `MappingRIdx` by making it an identity function. Currently,
`MappingRIdx` is used to map the index of an `llvm_any*` type in an
intrinsic type signature to its overload index. Eliminating this mapping
means that dependent types in LLVM intrinsic definitions (like
`LLVMMatchType` and its subclasses) should use the overload index to
reference the overload type that they depend on (and not the index within
the `llvm_any*` subset of overloaded types).
See
https://discourse.llvm.org/t/rfc-simplifying-intrinsics-type-signature-iit-info-generation-encoding-in-intrinsicemitter-cpp/90383
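The packed ArgInfo layout from item 1 can be sketched as below. This is only an illustration: the helper names are hypothetical and are not the functions in the LLVM tree, and the placeholder scheme (a byte that carries the kind with a zero index until patching) is an assumption made to show the idea behind `OverloadIndexPlaceholder` and `PatchOverloadIndex`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names; only the bit layout comes from the text above:
// overload slot index in the lower 5 bits, argument kind in the upper 3 bits.
constexpr unsigned OverloadIndexBits = 5;
constexpr unsigned OverloadIndexMask = (1u << OverloadIndexBits) - 1; // 0x1F

// Pack an overload slot index and an argument kind into one ArgInfo byte.
inline std::uint8_t packArgInfo(unsigned OverloadIndex, unsigned ArgKind) {
  assert(OverloadIndex <= OverloadIndexMask && "index must fit in 5 bits");
  assert(ArgKind < 8 && "kind must fit in 3 bits");
  return static_cast<std::uint8_t>((ArgKind << OverloadIndexBits) |
                                   OverloadIndex);
}

// Extract the overload slot index (lower 5 bits).
inline unsigned unpackOverloadIndex(std::uint8_t ArgInfo) {
  return ArgInfo & OverloadIndexMask;
}

// Extract the argument kind (upper 3 bits).
inline unsigned unpackArgKind(std::uint8_t ArgInfo) {
  return ArgInfo >> OverloadIndexBits;
}

// Assumed placeholder scheme: the pre-generated byte carries the kind with a
// zero index, and patching fills in the index once it has been assigned.
inline std::uint8_t patchOverloadIndex(std::uint8_t Placeholder,
                                       unsigned OverloadIndex) {
  return packArgInfo(OverloadIndex, unpackArgKind(Placeholder));
}
```

With this layout at most 32 overload slots and 8 argument kinds are representable in one byte, which is why both inputs are range-checked before packing.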
NF (No-Flags) instructions should not compress to non-NF instructions,
as this would incorrectly modify flags behavior. The compression table
is only intended for encoding optimizations that preserve semantics.
This removes the incorrect NF entries that could have led to
miscompilation if the compression logic were applied.
This reverts commit b1aa6a45060bb9f89efded9e694503d6b4626a4a and commit
ce44d63e0d14039f1e8f68e6b7c4672457cabd4e.
This fails the build with some older gcc:
llvm/include/llvm/CodeGenTypes/LowLevelType.h:501:35: error: call to
non-constexpr function ‘static llvm::LLT llvm::LLT::integer(unsigned
int)’
return integer(getSizeInBits());
^
Added extra information in LLT to support ambiguous fp types during
GlobalISel. Original idea by @tgymnich.
Main differences from https://github.com/llvm/llvm-project/pull/122503
are:
* Do not deprecate LLT::scalar
* Allow targets to enable/disable IR translation with extended LLT via
`TargetOption::EnableGlobalISelExtendedLLT` (disabled by default)
* `IRTranslator` uses `TargetLoweringInfo` for appropriate `LLT`
generation.
* For this reason, added a flag in `GlobalISelMatchTable` to allow switching
between legacy and new extended LLT names
* Revert using stubs like `LLT::float32` for float types as they are
real now. Added `TODO` for such cases.
Also MIRParser now may parse new type identifiers.
---------
Co-authored-by: Tim Gymnich <tim@gymni.ch>
Co-authored-by: Ryan Cowan <ryan.cowan@arm.com>
Adds support for `atomicrmw` `fminimumnum`/`fmaximumnum` operations.
These were added to C++ in P3008, and are exposed in libc++ in #186716.
Adding LLVM IR support for these unblocks work in both backends with HW
support, and frontends.
This PR blocklists instructions that are unsafe for masked-load folding.
Folding with the same mask is only safe if every active destination
element reads only from source elements that are also active under the
same mask. These instructions perform element rearrangement or
broadcasting, which may cause active destination elements to read from
masked-off source elements.
VPERMILPD and VPERMILPS are safe only in the rrk form; the rik form
needs to be blocklisted. In the rrk form, the masked source operand is a
control mask, while in the rik form the masked source operand is the
data/value. This is also why VPSHUFB is safe to fold, while other
shuffles such as VSHUFPS are not.
Examples:
```
EVEX.128.66.0F.WIG 67 /r VPACKUSWB xmm1{k1}{z}, xmm2, xmm3/m128
A: 00010203 7F000001 80000002 DEADBEEF
E : 00000000 00000001 00000002 00000003
D: 11111111 22222222 33333333 44444444
k = 0x0400
Masked_e = 00000000 00000000 00000000 00000000 (vmovdqu8{k}{z} Masked_e E)
res1 = 00000000 00000000 00010000 00000000 (VPACKUSWB D{k}{z}, A, E)
res2 = 00000000 00000000 00000000 00000000 (VPACKUSWB D{k}{z}, A, Masked_e)
EVEX.128.66.0F38.W0 C4 /r VPCONFLICTD xmm1 {k1}{z}, xmm2/m128/m32bcst
A: DAA66D2B FFFFFFFC FFFFFFFC D9A0643C
E : 7DDF743F 00000000 5FD99E73 4ED634C9
D: 2629AB38 9E37782F 67BB800F AD66764A
k = 0x0002
Masked_e = (vmovdqu32 {k}{z} Masked_e E)
res1 = 00000000 00000000 00000000 00000000 (VPCONFLICTD D{k}{z}, E)
res2 = 00000000 00000001 00000000 00000000 (VPCONFLICTD D{k}{z}, Masked_e)
EVEX.128.66.0F38.W1 8D /r VPERMW xmm1 {k1}{z}, xmm2, xmm3/m128
A: 00010203 7F000001 80000002 DEADBEEF
E : 00000000 00000001 00000002 00000003
D: 11111111 22222222 33333333 44444444
k = 0x0010
Masked_e = 00000000 00000000 00000002 00000000 (vmovdqu16 {k}{z} Masked_e E)
res1 = 00000000 00000000 00000001 00000000 (vpermw D{k}{z}, A, E)
res2 = 00000000 00000000 00000000 00000000 (vpermw D{k}{z}, A, Masked_e)
EVEX.128.66.0F38.W0 78 /r VPBROADCASTB xmm1{k1}{z}, xmm2/m8
E : 7F4A7C15 6E490933 5D4C9659 4C433CE3
D: F63F9D36 97F6E2B2 9432E8E6 FAEE7A3E
k = 0x0002
Masked_e = 00007C00 00000000 00000000 00000000 (vmovdqu8{k}{z} Masked_e E)
res1 = 00001500 00000000 00000000 00000000 (vpbroadcastb D{k}{z}, E)
res2 = 00000000 00000000 00000000 00000000 (vpbroadcastb D{k}{z}, Masked_e)
```
Baseline: https://github.com/llvm/llvm-project/pull/178411
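The safety condition can be modelled in a few lines of C++. This is a simplified sketch with 4 lanes and zero-masking only; the function names are hypothetical and nothing here is taken from the X86 backend. It compares an operation applied to the unmasked source against the same operation applied to a zero-masked load of that source, which is the comparison the res1/res2 pairs above make.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

// Zero-masked load: lane I is kept only when bit I of K is set.
inline Vec4 maskedLoadZ(const Vec4 &Src, unsigned K) {
  Vec4 R{};
  for (unsigned I = 0; I < 4; ++I)
    if (K & (1u << I))
      R[I] = Src[I];
  return R;
}

// Zero-masked lane-wise op: every active lane reads only its own source lane.
inline Vec4 maskedAddOneZ(const Vec4 &Src, unsigned K) {
  Vec4 R{};
  for (unsigned I = 0; I < 4; ++I)
    if (K & (1u << I))
      R[I] = Src[I] + 1;
  return R;
}

// Zero-masked broadcast: every active lane reads source lane 0.
inline Vec4 maskedBroadcastZ(const Vec4 &Src, unsigned K) {
  Vec4 R{};
  for (unsigned I = 0; I < 4; ++I)
    if (K & (1u << I))
      R[I] = Src[0];
  return R;
}

// Folding is safe for this input when masking the source first does not
// change the result of the masked operation.
template <typename Op>
bool foldIsSafe(Op F, const Vec4 &Src, unsigned K) {
  return F(Src, K) == F(maskedLoadZ(Src, K), K);
}
```

For `Src = {1, 2, 3, 4}` and `K = 0x2`, the lane-wise op gives the same result either way, while the broadcast differs because the active lane 1 reads lane 0, which the masked load has zeroed.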
The checks have been unused forever. This was an oversight in the patch
that introduced this test: https://reviews.llvm.org/D63814
Also fix the checks to match the actual output. This looks like another
oversight in the original patch, presumably because the checks were
never actually tested.
## Motivation
LLVM TableGen currently lacks a way to **accumulate** field values
across class hierarchies. When a derived class sets a field via `let`,
it completely replaces the parent's value. This forces users into
verbose workarounds like:
```tablegen
class Op { // This is generic MLIR Base
code extraClassDeclaration = ?;
}
// Some Generic shared base
class MyShared1OpClass : Op {
code shared1ExtraClassDeclaration = [{ some generic code 1 }];
}
class MyShared2OpClass : MyShared1OpClass {
code shared2ExtraClassDeclaration = [{ some generic code 2 }];
}
def MyOp : MyShared2OpClass {
// need to manually concatenate shared code
let extraClassDeclaration =
shared1ExtraClassDeclaration
# shared2ExtraClassDeclaration
# [{ additional specialized code }];
}
```
Instead, I propose a more natural incremental solution without
unnecessary intermediate definitions:
```tablegen
class Op {
code extraClassDeclaration = ?;
}
class MyShared1OpClass : Op {
let append extraClassDeclaration = [{ some generic code 1 }];
}
class MyShared2OpClass : MyShared1OpClass {
let append extraClassDeclaration = [{ some generic code 2 }];
}
def MyOp : MyShared2OpClass {
let append extraClassDeclaration = [{ additional specialized code }];
}
```
This is especially painful in MLIR, where dialect authors want base
op/type/attribute classes to inject shared C++ declarations into all
derived definitions. I attempted to solve this in PR
https://github.com/llvm/llvm-project/pull/182265 with MLIR-specific
`inheritableExtraClassDeclaration`/`Definition` fields, but as
@joker-eph [pointed
out](https://github.com/llvm/llvm-project/pull/182265#discussion_r2098718600),
this is ad-hoc -- the same inheritance problem exists for `traits`,
`arguments`, `results`, and any other list/string/dag field. Rather than
adding `inheritable*` variants per field, we should solve this at the
language level.
## Design
This PR adds two new modifiers to the `let` statement: **`append`** and
**`prepend`**.
```tablegen
class Base {
list<int> items = [1, 2];
string text = "hello";
dag d = (op);
}
def Example : Base {
let append items = [3, 4]; // items = [1, 2, 3, 4]
let prepend items = [0]; // items = [0, 1, 2]
let append text = " world"; // text = "hello world"
let prepend text = "say "; // text = "say hello"
let append d = (op 3:$a); // d = (op 3:$a)
}
```
### Supported types
| Field type | Operation | Concat operator |
|---|---|---|
| `list<T>` | append/prepend | `!listconcat` |
| `string` / `code` | append/prepend | `!strconcat` |
| `dag` | append/prepend | `!con` |
| Other (`bit`, `int`, `bits`) | -- | Error |
### Semantics
- **`let append`** concatenates the new value **after** the current
value
- **`let prepend`** concatenates the new value **before** the current
value
- If the current value is **unset** (`?`), the new value is used
directly
- A plain **`let`** (without modifier) still replaces, allowing opt-out
from accumulated values
- Works in both **body-level** (`def Foo { let append ... }`) and
**top-level** (`let append ... in { }`) contexts
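The rules above can be modelled for `list<int>` fields in a short C++ sketch. It is only a model of the described semantics, not the TableGen implementation: `applyLet`, `LetMode`, and the use of `std::optional` to represent an unset (`?`) field are all illustrative assumptions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using IntList = std::vector<int>;
using Field = std::optional<IntList>; // std::nullopt models an unset `?`

enum class LetMode { Replace, Append, Prepend };

// Compute the new field value for `let [append|prepend] field = New`.
inline Field applyLet(const Field &Current, const IntList &New, LetMode Mode) {
  // A plain `let` replaces; an unset field takes the new value directly.
  if (Mode == LetMode::Replace || !Current)
    return New;
  // Append puts the current value first, prepend puts the new value first.
  const IntList &First = (Mode == LetMode::Append) ? *Current : New;
  const IntList &Second = (Mode == LetMode::Append) ? New : *Current;
  IntList Result;
  Result.insert(Result.end(), First.begin(), First.end());
  Result.insert(Result.end(), Second.begin(), Second.end());
  return Result;
}
```

Applying `Append` with `[3, 4]` to a field holding `[1, 2]` yields `[1, 2, 3, 4]`, and applying either modifier to an unset field yields just the new value, matching the rules listed above.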
### Multi-level inheritance
Accumulation works naturally across inheritance chains:
```tablegen
class Base {
list<int> items = [1, 2];
}
class Middle : Base {
let append items = [3]; // items = [1, 2, 3]
}
def Leaf : Middle {
let append items = [4]; // items = [1, 2, 3, 4]
}
```
### Multiple inheritance
TableGen supports multiple inheritance (`def D : A, B { ... }`), where
parent classes are processed left to right and the **last parent class's
value wins** for any shared field. `let append`/`let prepend` operates
on whatever value the field has *after* inheritance resolution — it does
not accumulate across sibling parents:
```tablegen
class A { list<int> items = [1, 2]; }
class B { list<int> items = [3, 4]; }
def D : A, B {
let append items = [5]; // items = [3, 4, 5] (A's value is lost)
}
```
This also applies to diamond inheritance:
```tablegen
class Base { list<int> items = [1]; }
class Left : Base { let append items = [2]; } // [1, 2]
class Right : Base { let append items = [3]; } // [1, 3]
def D : Left, Right {
let append items = [4]; // items = [1, 3, 4] (Left's [2] is lost)
}
```
This is consistent with how plain `let` works with multiple inheritance
— it is the standard last-writer-wins rule. Users who need accumulation
from multiple parents should use a single-inheritance chain instead.
## Backward compatibility
This proposal is **fully backward compatible**. The keywords `append`
and `prepend` are implemented as **context-sensitive keywords** — they
are only recognized as modifiers when they appear immediately after
`let` (in both body-level and top-level contexts). In all other
positions, `append` and `prepend` remain valid identifiers and can be
used as field names, class names, def names, etc. This means:
- No existing `.td` files (in-tree or out-of-tree) will break
- Fields named `append` or `prepend` continue to work: `let append
append = [5];` is valid (the first `append` is the modifier, the second
is the field name)
- The parser checks for the identifier string value after `let`, not for
a reserved token
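A context-sensitive keyword check of this kind might look like the sketch below. It works on a pre-split token list and is not the TableGen parser: the names are hypothetical, and the one-token lookahead used to tell `let append items = ...` apart from `let append = ...` is an assumption about how such a check could disambiguate, not a description of the actual implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class LetModifier { None, Append, Prepend };

struct LetHeader {
  LetModifier Modifier;
  std::string FieldName;
};

// True if the token starts like an identifier (letter or underscore).
inline bool isIdentifier(const std::string &Tok) {
  if (Tok.empty())
    return false;
  unsigned char C = static_cast<unsigned char>(Tok[0]);
  return std::isalpha(C) || C == '_';
}

// Parse the start of `let [append|prepend] <field> ...`.
// Tokens[0] must be "let"; the modifier is matched by string value only.
inline LetHeader parseLetHeader(const std::vector<std::string> &Tokens) {
  assert(Tokens.size() >= 2 && Tokens[0] == "let" && "expected a let statement");
  std::size_t Pos = 1;
  LetModifier Mod = LetModifier::None;
  // Assumed lookahead: treat the word as a modifier only when a field name
  // follows it, so that `let append = ...` still names a field.
  bool FollowedByName =
      Pos + 1 < Tokens.size() && isIdentifier(Tokens[Pos + 1]);
  if (FollowedByName && Tokens[Pos] == "append") {
    Mod = LetModifier::Append;
    ++Pos;
  } else if (FollowedByName && Tokens[Pos] == "prepend") {
    Mod = LetModifier::Prepend;
    ++Pos;
  }
  return {Mod, Tokens[Pos]};
}
```

Because the check compares identifier text rather than a reserved token, `append` and `prepend` stay usable as ordinary names everywhere else.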
RFC:
https://discourse.llvm.org/t/rfc-tablegen-add-let-append-prepend-syntax-for-field-concatenation/89924/
This PR:
Fixes a slight off-by-one error in the check for how many bits are
allocated for subreg lane masks. If 65 subreg lanes are used, it fails
later, but the error message is not clear as to what has occurred.
These instructions don't ignore the W bit as we had previously marked.
Also support (v)pcmpestr(m/i)l as an alias for the W0 form to match
binutils.
Fixes part of #183635
Artificial registers were added in
eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than that
of their subregisters, even when they only contain a single
physical subregister.
Artificial registers thus do exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.
This patch completes the support for artificial registers to:
- Ignore artificial registers when joining register unit uber
sets. Artificial registers may be members of classes that
together include registers and their sub-registers, making it
impossible to compute normalised weights for uber sets they
belong to.
We have a use case downstream relying on this being supported,
which avoids introducing a large number of additional
register classes.
- Not generate purely artificial register class intersections.
It is critical not to have such classes, as the common LLVM
codegen infrastructure will try to use them to constrain
classes of virtual registers instead of producing COPYs
whenever both the source and target register classes contain
the same artificial registers.
- Not generate sub-classes where classes with the same
non-artificial members already exist. This is mostly for
convenience. For example, the HI16-capable subset of AMDGPU's
AV_32 is VGPR_32, except VGPR_32 also contains the artificial
staging registers. If the staging registers are not ignored,
we'll end up having an additional generated register class,
AV_32_with_hi16_in_VGPR_16, -- harmless, but also useless.
Eliminates a few inferred AMDGPU register classes:
- VS_32_with_hi16
- VS_32_Lo256_with_hi16
- VS_32_Lo128_with_hi16
- VRegOrLds_32_and_VS_32_Lo256
- VRegOrLds_32_and_VS_32_Lo128
- SRegOrLds_32_and_VRegOrLds_32
Causes no register class changes for other targets.
Allow specification of the underlying C++ data type for `GenericEnum`.
I ran into this because I was trying to use a TableGen-generated enum in
`DenseSet`, which requires the underlying type to be specified.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
Add target name prefix for a few static global variables in the
generated code. Also rework the TargetRegisterInfo constructor a bit to
use an ArrayRef for the array of register classes and rename a few
constructor arguments to match the member names they initialize.
Sort the unique ValueTypeByHwMode combinations by usage and add a
compressed opcode for the most common.
Reduces the RISCVGenDAGISel.inc table by about 12K. The most common
combination is XLenVT.
I plan to add EmitIntegerByHwMode0 and EmitRegisterByHwMode0 in
subsequent patches.
Assisted-by: claude
This is useful for `InstAlias` where a fixed register may depend on the
HwMode. The motivating use case for this is the RISC-V RVY ISA, where
certain instruction mnemonics are remapped to take a different
register class depending on the HwMode. It can be used as follows:
```
def NullReg : RegisterByHwMode<PtrRC, [RV32I, RV64I, RV64Y, RV64Y],
[X0, X0, X0_Y, X0_Y]>;
```
Pull Request: https://github.com/llvm/llvm-project/pull/175227
In comments in generated files and in -register-info-debug output, use
the standard name "DefaultMode" for consistency, instead of hard coding
an alternative name "Default".
I have a downstream target which has 128-bit instructions where some
instructions can have large sections of encoding to be determined ahead
of time. This results in the island calculations for decoder tables
emitting checks wider than 64 bits.
This change will emit multiple separate checks when the island exceeds
64 bits.
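The splitting can be pictured with the sketch below. It is a stand-alone illustration with hypothetical names (`BitRange`, `splitIsland`), not the decoder emitter code: a contiguous run of known bits is cut into consecutive pieces of at most 64 bits, each of which could then be checked on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A contiguous run of encoding bits: [Start, Start + Width).
struct BitRange {
  unsigned Start;
  unsigned Width;
};

// Split one island into consecutive chunks of at most MaxWidth bits.
inline std::vector<BitRange> splitIsland(unsigned Start, unsigned Width,
                                         unsigned MaxWidth = 64) {
  assert(MaxWidth > 0 && "chunk width must be positive");
  std::vector<BitRange> Chunks;
  while (Width > 0) {
    unsigned ChunkWidth = std::min(Width, MaxWidth);
    Chunks.push_back({Start, ChunkWidth});
    Start += ChunkWidth;
    Width -= ChunkWidth;
  }
  return Chunks;
}
```

A 100-bit island starting at bit 8 becomes two checks, one over bits 8..71 and one over bits 72..107, while an island of 64 bits or fewer stays a single check.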
This patch proposes a new tuning feature string format that helps users
to build a performance model by "configuring" an existing tune CPU,
along with its scheduling model. For example, this string
```
"sifive-x280:single-element-vec-fp64"
```
takes ``sifive-x280`` as the "base" tune CPU and configures it with
``single-element-vec-fp64``. This gives us a performance model that
looks exactly like that of ``sifive-x280``, except some of the 64-bit
vector floating point instructions now produce only a single element per
cycle due to ``single-element-vec-fp64``.
This string could eventually be used in places like ``-mtune`` at the
frontend. Right now, this patch only implements the parser part, which
is put under the TargetParser library.
The grammar for this string is:
```
tune-cpu ::= 'tuning CPU name in lower case'
directive ::= "[a-zA-Z0-9_-]+"
tune-features ::= directive ["," directive]*
```
A *directive* can only _enable_ or _disable_ a certain tuning
feature from the tuning CPU. A **positive directive**, like the
``single-element-vec-fp64`` we just saw, enables an additional tuning
feature in the associated tuning model.
A **negative directive**, on the other hand, removes a certain tuning
feature. For example, ``sifive-x390`` already has the
``single-element-vec-fp64`` feature, and we can use
"sifive-x390:no-single-element-vec-fp64" to create a new performance
model that looks nearly the same as ``sifive-x390`` except with
``single-element-vec-fp64`` removed. In this case,
``no-single-element-vec-fp64`` is a negative directive.
There are additional restrictions on what we can put in the list of
directives; please refer to the documentation for more details.
Right now, this string only accepts directives that are explicitly
supported by the tune CPU. For example, "sifive-x280:prefer-w-inst" is
not a valid string, as ``prefer-w-inst`` is not supported by
``sifive-x280`` at this moment. Vendors of these processors are expected
to maintain the compatibility of their supported directives across
different versions.
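A minimal parser for this string shape might look like the following. It is a sketch, not the TargetParser implementation: the type and function names are hypothetical, the `:` separator and the `no-` prefix for negative directives are inferred from the examples above, and it does not check whether a directive is supported by the given tune CPU or enforce the additional restrictions mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct TuneDirective {
  std::string Feature; // feature name without any "no-" prefix
  bool Enable;         // false for a negative ("no-") directive
};

struct TuneString {
  std::string CPU;
  std::vector<TuneDirective> Directives;
};

// Characters allowed by the grammar's directive rule: [a-zA-Z0-9_-].
inline bool isDirectiveChar(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_' || C == '-';
}

// Parse "<tune-cpu>:<directive>[,<directive>]*"; nullopt on malformed input.
inline std::optional<TuneString> parseTuneString(const std::string &S) {
  std::size_t Colon = S.find(':');
  if (Colon == std::string::npos || Colon == 0)
    return std::nullopt;
  TuneString Result;
  Result.CPU = S.substr(0, Colon);
  std::size_t Pos = Colon + 1;
  while (true) {
    std::size_t Comma = S.find(',', Pos);
    std::string Tok = S.substr(
        Pos, Comma == std::string::npos ? std::string::npos : Comma - Pos);
    if (Tok.empty())
      return std::nullopt;
    for (char C : Tok)
      if (!isDirectiveChar(C))
        return std::nullopt;
    // Assumed convention: a leading "no-" marks a negative directive.
    bool Enable = Tok.rfind("no-", 0) != 0;
    if (!Enable)
      Tok.erase(0, 3);
    if (Tok.empty())
      return std::nullopt;
    Result.Directives.push_back({Tok, Enable});
    if (Comma == std::string::npos)
      break;
    Pos = Comma + 1;
  }
  return Result;
}
```

For "sifive-x390:no-single-element-vec-fp64" this yields the CPU ``sifive-x390`` and one negative directive for ``single-element-vec-fp64``; a string with no directive list or with an empty directive is rejected.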
---------
Co-authored-by: Sam Elliott <aelliott@qti.qualcomm.com>
The operand lists for these opcodes require 1 byte per operand and are
usually small values that fit in 3-4 bits. This makes their storage
inefficient. In addition, many EmitNode/MorphNodeTo in the isel table
will use the same list of operand numbers.
This patch proposes to separate the operand lists into their own table
where they can be de-duplicated. The OPC_EmitNode/MorphNodeTo in the
main table will only store an index into this smaller table.
This is a reduced version of a suggestion from this very old FIXME.
d8d4096c0b/llvm/utils/TableGen/DAGISelMatcherGen.cpp (L1070)
For RISC-V this reduces the main table from 1437353 bytes to 1276015
bytes plus a 929 byte operand list table. A savings of about 11%.
For X86 this reduces the main table from 719237 bytes to 623612 bytes
plus a 1042 byte operand list table. A savings of about 13%.
I expect further savings could be had by moving more bytes over.
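The de-duplication idea can be sketched as a small interning table. This is an illustration with hypothetical names (`OperandListTable`, `intern`, `savingsPercent`), not the DAGISel matcher emitter: each distinct operand list is stored once in a side table, and callers keep only the returned offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Stores each distinct operand list once and hands out its offset.
class OperandListTable {
  std::vector<std::uint8_t> Table;                     // flattened lists
  std::map<std::vector<std::uint8_t>, unsigned> Index; // list -> offset

public:
  // Return the offset of Ops in the table, adding it on first use.
  unsigned intern(const std::vector<std::uint8_t> &Ops) {
    auto It = Index.find(Ops);
    if (It != Index.end())
      return It->second;
    unsigned Offset = static_cast<unsigned>(Table.size());
    Table.insert(Table.end(), Ops.begin(), Ops.end());
    Index.emplace(Ops, Offset);
    return Offset;
  }

  std::size_t size() const { return Table.size(); }
};

// Percentage saved when Before bytes shrink to After bytes.
inline double savingsPercent(double Before, double After) {
  assert(Before > 0 && "original size must be positive");
  return 100.0 * (Before - After) / Before;
}
```

Interning `{0, 1, 2}` twice returns the same offset and grows the table only once. With the RISC-V figures quoted above, `savingsPercent(1437353, 1276015 + 929)` comes out a little above 11.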
- Adopt IfDefEmitter in IntrinsicEmitter.
- Remove #undef for various flags in Intrinsics.cpp/Intrinsics.h as the
TableGen generated code does that now.
Support GlobalISel and switch to checking `nnan` flag on instruction
instead of TargetOptions.
Instructions are renamed to v_cvt_floor and v_cvt_nearest on gfx11+
so add gfx11 tests as well.
This change makes `RegState` an enum class with bitwise operators.
It also:
- Updates declarations of flag variables/arguments/returns from
`unsigned` to `RegState`.
- Updates empty RegState initializers from 0 to `{}`.
If this is causing problems in downstream code:
- Adopt the `RegState getXXXRegState(bool)` functions instead of using a
ternary operator such as `bool ? RegState::XXX : 0`.
- Adopt the `bool hasRegState(RegState, RegState)` function instead of
using a bitwise check of the flags.
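The migration pattern can be illustrated with a stand-alone stand-in for `RegState`. The flag names and values below are illustrative, `getKillRegState` is used as one instance of the `getXXXRegState(bool)` shape, and treating `hasRegState` as an "any of these bits set" test is an assumption; consult the real headers for the actual definitions.

```cpp
#include <cassert>

// Stand-in enum; the flag values are illustrative only.
enum class RegState : unsigned {
  Define = 1u << 0,
  Kill = 1u << 1,
  Dead = 1u << 2,
};

constexpr RegState operator|(RegState A, RegState B) {
  return static_cast<RegState>(static_cast<unsigned>(A) |
                               static_cast<unsigned>(B));
}

constexpr RegState operator&(RegState A, RegState B) {
  return static_cast<RegState>(static_cast<unsigned>(A) &
                               static_cast<unsigned>(B));
}

inline RegState &operator|=(RegState &A, RegState B) {
  A = A | B;
  return A;
}

// Replaces `Flag ? RegState::Kill : 0`, which no longer compiles because
// 0 does not convert to an enum class.
constexpr RegState getKillRegState(bool IsKill) {
  return IsKill ? RegState::Kill : RegState{};
}

// Replaces a raw `Flags & RegState::X` truth test (assumed: any bit set).
constexpr bool hasRegState(RegState Value, RegState Test) {
  return (Value & Test) != RegState{};
}
```

Code that previously wrote `unsigned Flags = 0; Flags |= IsKill ? RegState::Kill : 0;` would become `RegState Flags = {}; Flags |= getKillRegState(IsKill);`, and a check such as `if (Flags & RegState::Kill)` would become `if (hasRegState(Flags, RegState::Kill))`.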
Also handle the case when Pat->Child(i) is null in
CodeGenDAGPatterns::FindPatternInputsAndOutputs().
Fixes issue #157619: TableGen asserts on invalid cast
…unctions (#176253)"
This reverts commit cf68af690ba7f98943e5f0f5cb39a91868d62098.
It increased the compilation time for a number of clang source files.
See comments in https://github.com/llvm/llvm-project/pull/176253 for
more information.