- Change InstrInfoEmitter to emit OpName as an enum class
instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are
OpNames vs just operand indices and should help avoid
bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
to conform to the new definition of OpName (mostly
mechanical changes).
- Use range for loops and `enumerate` in a few places.
- Use `StringRef` for `TargetName` in `InstrInfoEmitter::run`.
- Use `\n` character for new line instead of string.
- Use StringRef in `InstrNames` (instead of std::string) and
avoid string copies.
- Detect whether logical operand mapping/named operand mappings have
been enabled in a previous pass over instructions and execute the
relevant emission code only if those mappings are enabled.
- For these mappings, skip the fixed set of predefined instructions as
they won't have these mappings enabled.
- Emit operand type mappings only for X86 target, as they are only used
by X86 and look for X86 specific `X86MemOperand`.
- Cleanup `emitOperandTypeMappings` code: remove code to handle empty
instruction list and use range for loops.
After f9250401ef120a4605ad67bb43d3b25500900498, this function is
tail recursive so it was straightforward to convert this to iteratively
walk the linkd list.
ContractNodes recursively walks forward through a linked list. During
this recursion, Matchers are combined into other Matchers.
Previously the formation of MoveSiblingMatcher was after the
recursive call so it occurred as we were unwinding. If a
MoveSiblingMatcher was formed, we would recursively walk forward
to the end of the linked list again which isn't efficient.
To make this more efficient, move the formation of MoveSiblingMatcher
to the forward pass. Add additional rules to unfold MoveSiblingMatcher
if it would be more efficient to use CheckChildType, CheckChildInteger,
CheckChildSame, etc.
As an added benefit, this makes the function tail recursive which
the compiler can better optimize.
- Emit C++17 nested namespaces.
- Shorten the binary search table name to just `Table` since its
declared in the scope of each search function.
- Use `using namespace XXX` in the search function to avoid emitting the
Target Inst Namespace prefix in the table entries.
- Add short-cut handling of `TableSize` == 0 case (verified in Hexagon
target).
- Use `SetVector` in `ColFieldValueMap` to get automatic deduplication
and eliminate manual deduplication code.
- Use range for loops.
I tried to keep it readable by making the width of the column with the
index always enough to contain the largest number.
That way things don't shift to the right every time a new digit appears,
it remains consistent.
Tests don't break because this only affects the beginning of the line
and FileCheck doesn't care about what comes before for the most part.
Example of the new output:
```
/* 758359 */ // Label 9988: @758359
/* 758359 */ GIM_Try, /*On fail goto*//*Label 9989*/ GIMT_Encode4(758435), // Rule ID 6715 //
/* 758364 */ GIM_CheckConstantInt8, /*MI*/0, /*Op*/2, 0,
/* 758368 */ // MIs[0] offset
```
Fixes#119177
Add declarations of SDTypeConstraint's operator== and operator< to the
llvm namespace. These are declared as friends inside the class which
makes them part of the enclosing namespace, but gcc wants it to be more
explicit.
Fixes#125537.
- Use StringRef::str() instead of std::string(StringRef).
- Use const pointers for `Candidates` in getSuperRegForSubReg().
- Make `AsmParserCat` and `AsmWriterCat` static.
- Use enumerate() in `ComputeInstrsByEnum` to assign inst enums.
- Use range-based for loops.
This patch adds a new option `-aarch64-enable-zpr-predicate-spills`
(which is disabled by default), this option replaces predicate spills
with vector spills in streaming[-compatible] functions.
For example:
```
str p8, [sp, #7, mul vl] // 2-byte Folded Spill
// ...
ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
```
Becomes:
```
mov z0.b, p8/z, #1
str z0, [sp] // 16-byte Folded Spill
// ...
ldr z0, [sp] // 16-byte Folded Reload
ptrue p4.b
cmpne p8.b, p4/z, z0.b, #0
```
This is done to avoid streaming memory hazards between FPR/vector and
predicate spills, which currently occupy the same stack area even when
the `-aarch64-stack-hazard-size` flag is set.
This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO
and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos
handles scavenging the required registers (z0 in the above example) and,
in the worst case spilling a register to an emergency stack slot in the
expansion. The condition flags are also preserved around the `cmpne` in
case they are live at the expansion point.
The loop at the top of FactorNodes creates additional variables to deal
with needing to use a pointer to a unique_ptr instead of a reference.
Encapsulate this to its own function for better scoping.
This also allows us to directly skip this loop when we already know we
have a ScopeMatcher.
The code that moves CheckOpcode before CheckType/CheckChildType/RecordDwith
was running after ContractNodes started unwinding its recursion. If a
move occurs we would start a new recursion going forward
through the list again. I don't believe this can lead to any new
combines so it was just wasted work.
This patch moves the code earlier so it doesn't start a new recursion.
- Looks like this sentinel value is used in some downstream backends, so
restore emitting it.
- It now also has the correct value (earlier code may have emitted an
incorrect value for OPERAND_LAST and hence it was removed in
https://github.com/llvm/llvm-project/pull/124960)
MatchTableRecord stores a 64-bit RawValue. However, this field is only
needed by a small part of the code (jump table generation).
Create a separate RecordAndValue structure that is used in just the
necessary places.
Based on massif, this reduces memory usage on RISCVGenGlobalISel.inc by
about 100MB (to 2.15GB).
- Assign `OpName` enum values in the same alphabetical order in which
they are emitted.
- Get rid of OPERAND_LAST which is not used anywhere.
- Inline `initOperandMapData` into `emitOperandNameMappings` to help see
clearly how various maps are computed.
- Emit the static `OperandMap` table as int8_t when possible. This
should help reduce the static size by 50% in the common case.
- Change maps/vectors to use StringRef instead of std::string to avoid
unnecessary copies.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
Historically, the main example of *very* large string tables used the
`EmitCharArray` to work around MSVC limitations with string literals,
but that was switched (without removing the API) in order to consolidate
on a nicer emission primitive.
While this large string table in `IntrinsicsImpl.inc` seems to compile
correctly on MSVC without the work around in `EmitCharArray` (and that
this PR adds back to the nicer emission path), other users have
repeatedly hit this MSVC limitation as you can see in the discussion on
PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the
string offset table emission to look at
the size of the table and switch to the char array emission strategy
when the table becomes too large.
This work around does have the downside of making compile times worse
for large string tables, but that appears unavoidable until we can
identify known good MSVC versions and switch to requiring them for all
LLVM users. It also reduces searchability of the generated string table
-- I looked at emitting a comment with each string but it is tricky
because the escaping rules for an inline comment are different from
those of of a string literal, and there's no real way to turn the string
literal into a comment.
While improving the output in this way, also clean up the output to not
emit an extraneous empty string at the end of the string table, and
update the `StringTable` class to not look for that. It isn't actually
used by anything and is wasteful.
This PR also switches the `IntrinsicsImpl.inc` string tables over to the
new `StringTable` runtime abstraction. I didn't want to do this until
landing the MSVC workaround in case it caused even this example to start
hitting the MSVC bug, but I wanted to switch here so that I could
simplify the API for emitting the string table with the workaround
present. With the two different emission strategies, its important to
use a very exact syntax and that seems better encapsulated in the API.
Last but not least, the `SDNodeInfoEmitter` is updated, including its
tests to match the new output.
This PR should unblock landing
https://github.com/llvm/llvm-project/pull/120534 and letting us switch
all of
Clang's builtins to use string tables. That PR has all the details
motivating the overall effort.
Follow-up patches will try to consolidate the remaining users onto the
single interface, but those at least were easy to separate into
follow-ups and keep this PR somewhat smaller.
We currently have an issue where bf16 patters can be used to match fp16
types, as GISel does not know about the difference between the two. This
patch explicitly disables them to make sure that they are never used.
The opposite can also happen too, where fp16 patterns are used for
operators that should be bf16. So this also changes any operations with
bf16 types to now cause a fallback to SDAG.
The pass setup for GISel has been slightly adjusted to make sure that a
verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass,
which otherwise can cause verifier issues when falling back.
- Bail out of TableGen if any asserts fail before running the backend.
- Add asserts to validate that the `Objects` and `Modes` lists for
various `HwModeSelect` subclasses are of same length.
- Eliminate equivalent check in CodeGenHWModes.cpp
Use this to remove a linear scan from CodeGenProcModel::hasReadOfWrite.
This reduces build time of RISCVGenSubtargetInfo.inc on by machine from
~6 seconds to ~3 seconds.
2 of the 3 callers of each of these already had a reference they
converted to index. Use that reference and make the one caller
that only has an index responsible for looking up the reference from it.
Now that we have a dedicated abstraction for string tables, switch the
option parser library's string table over to it rather than using a raw
`const char*`. Also try to use the `StringTable::Offset` type rather
than a raw `unsigned` where we can to avoid accidental increments or
other issues.
This is based on review feedback for the initial switch of options to a
string table. Happy to tweak or adjust if desired here.
Use this to improve performance of SubtargetEmitter::findWriteResources
and SubtargetEmitter::findReadAdvance. Now we can do a map lookup
instead of a linear search through all WriteRes/ReadAdvance records.
This reduces the build time of RISCVGenSubtargetInfo.inc on my
machine from 43 seconds to 10 seconds.
This patch adds a simplistic backend that gathers all target-specific
SelectionDAG nodes and emits descriptions for most of them.
This includes generating node enumeration, node names, and information
about node "prototype" that can be used to verify that a node is valid.
The patch also extends SDNode by adding target-specific flags, which are
also included in the generated tables.
Part of #119709,
[RFC](https://discourse.llvm.org/t/rfc-tablegen-erating-sdnode-descriptions/83627).
Pull Request: https://github.com/llvm/llvm-project/pull/123002
- Redefines `DXILAttribute` to denote a function attribute, compatible
to how it was define in DXC/LLVM 3.7
- Fix how `DXILAttribute` is emitted to be a struct of set attributes
instead of an "or" of the enums
- Implement the lowering of `DXILAttribute` to LLVM function attributes
in `DXILOpBuilder.cpp`. A custom mapping is defined.
- Audit all current ops to specify the correct attributes consistent
with DXC. This is done here to allow for testing.
- Update testcases in `llvm/test/CodeGen/DirectX` of all ops with
attributes to match that attributes are set
- Update testcases of ops that had previously incorrectly set attributes
to check there is no attributes set
- Defines `DXILProperty` to denote the other type of attributes from DXC
used to query properties.
- Emit `DXILProperty` as a struct of set attributes.
- Updates `DXIL.td` to specify applicable `DXILProperty`s on ops
Note: `DXILProperty` was referred to as 'queryable attributes' in design
discussion. Changed to property to allow for better expression in
`DXIL.td`
Resolves#114461Resolves#115912
Combine the EarlyOut and IsContiguous range check.
Also avoid "comparison is always false" warnings in emitted code when
the lower-bound check is against 0.