This is an implementation of the saturating FP-to-int conversions for
GlobalISel. On AArch64 the conversion instructions already work this
way, producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
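For reference, the semantics being lowered look roughly like this for
the f64 -> i32 case (a minimal behavioral sketch, not the actual
LegalizerHelper code):
```
#include <cmath>
#include <cstdint>

// fptosi.sat behavior: NaN maps to 0, out-of-range values clamp to the
// integer type's limits, in-range values truncate as usual.
int32_t fptosi_sat_i32(double X) {
  if (std::isnan(X))
    return 0;
  if (X <= static_cast<double>(INT32_MIN))
    return INT32_MIN;
  if (X >= static_cast<double>(INT32_MAX))
    return INT32_MAX;
  return static_cast<int32_t>(X);
}
```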
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
Add overloads of PrintError and friends that accept a print function.
This avoids eagerly constructing potentially long strings just to pass
them into these printing routines.
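A minimal sketch of the shape of such an overload (the exact signature
in the patch may differ, and `Rec`/`LongDetails` below are
illustrative):
```
// The message is produced by a callback that writes straight to the
// output stream, so nothing is materialized unless it is printed.
void PrintError(function_ref<void(raw_ostream &OS)> PrintMsg);

// Usage: the potentially long message is streamed, not built as a string.
PrintError([&](raw_ostream &OS) {
  OS << "invalid definition '" << Rec->getName() << "': " << LongDetails;
});
```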
Fail if we see an intrinsic that returns more than the supported number
of return values.
Intrinsics can return only up to a certain number of values, as defined
by the `IIT_RetNumbers` list in `Intrinsics.td`. Currently, if we define
an intrinsic that exceeds this limit, llvm-tblgen crashes. Instead, read
the limit and emit a proper error message when it is exceeded.
The InitUndef pass currently uses target-specific pseudo instructions,
with one pseudo per register class.
Instead, add a generic pseudo instruction, which can be used by all
targets and register classes.
Eliminate unused `isTarget` field in Intrinsic record.
Eliminate `isOverloaded`, `Types` and `TypeSig` fields from the record,
as they are already available through the `TypeInfo` field. Change
intrinsic emitter code to look for this info using fields of the
`TypeInfo` record attached to the `Intrinsic` record.
Fix several intrinsic related unit tests to source the `Intrinsic` class
def from Intrinsics.td as opposed to defining a skeleton in the test.
This eliminates some duplication of information in the Intrinsic class
and reduces the memory allocated for record fields, yielding a ~2%
reduction (though that is not the main goal).
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend-lifetimes flag is intended to
eventually be enabled by `-Og`, as discussed in the RFC here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
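As a rough illustration of the plumbing (not the frontend code from the
follow-up patch; the intrinsic ID and the overload handling are
assumptions here), emitting a fake use through IRBuilder could look
like:
```
// Sketch: emit a call to llvm.fake.use on V so V stays live up to this
// program point; the call otherwise has no effect.
void emitFakeUse(llvm::IRBuilder<> &Builder, llvm::Module &M,
                 llvm::Value *V) {
  llvm::Function *FakeUse = llvm::Intrinsic::getDeclaration(
      &M, llvm::Intrinsic::fake_use, {V->getType()});
  Builder.CreateCall(FakeUse, {V});
}
```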
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
- Use formatv() and raw string literals to simplify emission code.
- Use range-based for loops and structured bindings to simplify loops.
- Use `const` pointers to Records.
- Rename `ComputeFixedEncoding` to `ComputeTypeSignature` to reflect
  what the function actually does, and change it to return a vector.
- Use reverse() and a range-based for loop to pack 8 nibbles into 32 bits.
- Rename some variables to follow LLVM coding standards.
- For function memory effects, print the human-readable effects in a comment.
- Detect invalid macro names specified on the command line and fail if
  one is found.
- Specifically, `-DXYZ=1` for example will now fail instead of being
  silently accepted.
Add a dummy resolver to resolve references outside records. This invokes
Fold() with isFinal to force resolution.
Fixes #102447
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
- Extract code for sorting and checking duplicate Records into a helper
function and update `collectProcModels` to use the helper.
- Update `FeatureKeyValues` to:
(a) Remove code for duplicate checks and use the helper.
(b) Trim features with empty names explicitly, to be able to use
the helper.
- Make the sorting deterministic by using record name as a secondary
key for sorting, and re-enable SubtargetFeatureUniqueNames.td test
that was disabled due to the non-determinism of the error messages.
- Change wording of error message when duplicate records are found to
be source code position agnostic, since `First` may not be before
`Second` lexically.
- This test was reported as failing in ASAN builds, with the expected
  error message and note appearing in a non-deterministic order.
- This is due to llvm::sort() being unstable.
- Temporarily disable the test until this is fixed.
- Keep track of the last definition of each feature in a `DenseMap` and
  use it to report a better error message when a duplicate feature is
  found (see the sketch after this list).
- Use a StringMap instead of a std::map in `EmitStageAndOperandCycleData`.
- Add a unit test to check that duplicate names are flagged.
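A minimal sketch of the duplicate-tracking idea from the first bullet
(illustrative, not the actual emitter code):
```
// Remember which record last defined each feature name; on a repeat we
// can point at both the duplicate and the previous definition.
llvm::DenseMap<llvm::StringRef, const llvm::Record *> PreviousDef;
for (const llvm::Record *Feature : Features) {
  llvm::StringRef Name = Feature->getValueAsString("Name");
  auto [It, Inserted] = PreviousDef.try_emplace(Name, Feature);
  if (!Inserted) {
    PrintError(Feature, "duplicate feature name '" + Name.str() + "'");
    PrintNote(It->second->getLoc(), "previous definition is here");
  }
}
```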
We don't really care about the precise label values for specific tests
like this, so just match any number.
The comments are enough to say where we jump in every case, and we have
plenty of tests that check basic match table features (on top of their
being exercised by the fact that GlobalISel CodeGen isn't broken).
RFC:
https://discourse.llvm.org/t/rfc-extend-machine-value-type-from-uint8-t-to-uint16-t/80274
compile-time-tracker:
https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time
Currently 208 out of 256 MVTs are used and we will run out soon, so
ultimately we need to extend the original `MVT::SimpleValueType` from
`uint8_t` to `uint16_t` to accommodate more types.
The `MatcherTable` uses `unsigned char` to encode the matcher code, so
the extended MVTs no longer fit into a single table entry; we therefore
encode them with VBR, as we do for other values wider than 8 bits.
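For reference, VBR stores 7 payload bits per byte and uses the high bit
to mark continuation, so values below 128 still occupy a single byte; a
sketch modeled on (but not copied from) the matcher emitter:
```
#include <cstdint>
#include <vector>

// Emit Val in variable-width form: 7 bits per byte, high bit set on
// every byte except the last.
void EmitVBRValue(uint64_t Val, std::vector<uint8_t> &Out) {
  while (Val >= 128) {
    Out.push_back(static_cast<uint8_t>(Val & 127) | 128); // more follows
    Val >>= 7;
  }
  Out.push_back(static_cast<uint8_t>(Val)); // final byte, high bit clear
}
```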
The statistics below show the change in the "Total Array size" of the
matcher table reported in each file:
```
Table                      Before   After    Change(%)
WebAssemblyGenDAGISel.inc  23576    23775    0.844
NVPTXGenDAGISel.inc        173498   173498   0
RISCVGenDAGISel.inc        2179121  2369929  8.756
AVRGenDAGISel.inc          2754     2754     0
PPCGenDAGISel.inc          163315   163617   0.185
MipsGenDAGISel.inc         47280    47447    0.353
SystemZGenDAGISel.inc      56243    56461    0.388
AArch64GenDAGISel.inc      467893   487830   4.261
MSP430GenDAGISel.inc       8069     8069     0
LoongArchGenDAGISel.inc    78928    79131    0.257
XCoreGenDAGISel.inc        3432     3432     0
BPFGenDAGISel.inc          3733     3733     0
VEGenDAGISel.inc           65174    66456    1.967
LanaiGenDAGISel.inc        2067     2067     0
X86GenDAGISel.inc          628787   636987   1.304
ARMGenDAGISel.inc          170968   171036   0.040
HexagonGenDAGISel.inc      155764   155764   0
SparcGenDAGISel.inc        5762     5798     0.625
AMDGPUGenDAGISel.inc       504356   504463   0.021
R600GenDAGISel.inc         29785    29785    0
```
The statistics below show the peak runtime memory usage when compiling a
simple C program:
`/bin/time -v clang -target $TARGET -O3 -c test.c`
```
int test(int a) {
return a * 3;
}
```
```
Target       Before(kbytes)  After(kbytes)  Change(%)
wasm64       110172          110088         -0.076
nvptx64      109784          109980         0.179
riscv64      114020          113656         -0.319
avr          110352          110068         -0.257
ppc64        112612          112476         -0.120
mips64       113588          113668         0.070
systemz      110860          110760         -0.090
aarch64      113704          113432         -0.239
msp430       110284          110200         -0.076
loongarch64  111052          110756         -0.267
xcore        108340          108020         -0.295
bpf          110620          110708         0.080
ve           110960          110920         -0.036
lanai        110180          109960         -0.200
x86_64       113640          113304         -0.296
arm64        113540          113172         -0.324
hexagon      114620          114684         0.056
sparc        110412          110136         -0.250
amdgcn       118164          117144         -0.863
r600         111200          110508         -0.622
```
Generating the mapping from a register class to a register bank is
complex:
- there can be lots of register classes
- the mapping may be ambiguous
- a register class can span several register banks (e.g. a register
  class containing all registers)
- the type information is not enough to decide which register bank to
  map to (e.g. a register class containing floating-point and vector
  registers, where every register can hold an f64 value)
The approach taken here is to encode the register banks in an array
indexed by the ID of the register class. To save space, the entries are
packed into chunks of size 2^n.
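A minimal sketch of the packed lookup under these assumptions (2 bits
per entry and 32-bit chunks are illustrative; the generated table picks
sizes to fit the target):
```
#include <cstdint>

// Each register class ID indexes a small bank code; several codes are
// packed per 32-bit chunk of the table.
constexpr unsigned BitsPerEntry = 2; // assumes at most 4 distinct codes
constexpr unsigned EntriesPerChunk = 32 / BitsPerEntry;

unsigned getRegBankCode(const uint32_t *Table, unsigned RCId) {
  uint32_t Chunk = Table[RCId / EntriesPerChunk];
  unsigned Shift = (RCId % EntriesPerChunk) * BitsPerEntry;
  return (Chunk >> Shift) & ((1u << BitsPerEntry) - 1);
}
```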
Based on https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74.
This patch defines the groupid/bitmask in RISCVFeatures.td and generates
the corresponding table in RISCVTargetParserDef.inc.
The groupid/bitmask of extensions provides an abstraction layer between
the compiler and runtime functions.
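A sketch of what a table entry and its runtime test could look like
(struct and field names are assumptions, not the actual generated
RISCVTargetParserDef.inc contents):
```
#include <cstdint>

// An extension is identified by a group index plus a bit position
// within that group's mask word.
struct ExtensionBitmask {
  const char *Name;
  unsigned GroupID; // which mask word
  unsigned BitPos;  // bit within that word
};

// Runtime side: test the extension's bit in the matching group word.
bool hasExtension(const uint64_t *GroupWords, const ExtensionBitmask &E) {
  return (GroupWords[E.GroupID] >> E.BitPos) & 1;
}
```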
In the RISC-V architecture, multiple vendor-specific Control and Status
Registers (CSRs) share the same encoding. However, the existing lookup
function, which currently returns only a single result, falls short.
During disassembly, it consistently returns the first CSR encountered,
which may not be the correct CSR for the subtarget.
To address this issue, we modify the function definition to return a
range of results. These results can then be iterated upon to identify
the CSR that best fits the subtarget’s feature requirements. The
behavior of this new definition is controlled by a variable named
`ReturnRange`, which defaults to `false`.
Specifically, this patch introduces support for emitting a new lookup
function for the primary key. This function returns a pair of iterators
pointing to the first and last values, providing a comprehensive range
of values that satisfy the query.
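Callers can then filter that range by subtarget features, along these
lines (a sketch; the generated names may differ from
`lookupSysRegByEncoding` and `haveRequiredFeatures`):
```
// Pick the first CSR in the returned range whose feature requirements
// are satisfied by this subtarget.
auto Range = lookupSysRegByEncoding(Encoding); // pair of iterators
for (auto It = Range.first; It != Range.second; ++It)
  if (It->haveRequiredFeatures(STI.getFeatureBits()))
    return &*It;
return nullptr; // no CSR fits this subtarget
```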
Some organizations have added operators downstream, and the 3 tests in
this PR tend to fail with off-by-n errors (with n being the number of
added operators) periodically.
To alleviate this, the proposed change takes advantage of the fact that
the 2nd and 3rd elements in a switch table represent the lower and
upper bounds of the operator id, and that the offset of the first byte
of the encoded operations is a function of these two bounds.
Additionally, the default label offset is now captured in a FileCheck
variable.
This re-applies #94241 after fixing a buildbot failure; see
https://lab.llvm.org/buildbot/#/builders/51/builds/570
According to the standard, `constexpr` variables and `const` variables
initialized with constant expressions can be used in lambdas without
capturing them - see https://en.cppreference.com/w/cpp/language/lambda.
However, MSVC used on buildkite seems to ignore that rule and does not
allow using such uncaptured variables in lambdas: we have "error C3493:
'Mask16' cannot be implicitly captured because no default capture mode
has been specified" - see
https://buildkite.com/llvm-project/github-pull-requests/builds/73238
Explicitly capturing such a variable, however, makes buildbot fail with
"error: lambda capture 'Mask16' is not required to be captured for this
use [-Werror,-Wunused-lambda-capture]" - see
https://lab.llvm.org/buildbot/#/builders/51/builds/570.
Fix both cases by using the `0xffff` value directly instead of giving
it a name.
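To illustrate the conflict (a reconstruction, not the code from the
patch):
```
#include <cstdint>

constexpr uint16_t Mask16 = 0xffff;
// MSVC on buildkite: error C3493, insists on an explicit capture.
auto A = [](unsigned V) { return V & Mask16; };
// Clang on the buildbot: -Wunused-lambda-capture if we do capture it.
auto B = [Mask16](unsigned V) { return V & Mask16; };
// The fix: use the literal, so there is nothing to capture.
auto C = [](unsigned V) { return V & 0xffff; };
```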
Original PR description below.
Depends on #94240.
Define the following pseudos for lowering ptrauth constants in code:
- non-`extern_weak`:
- no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added
PAC;
- GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but use a
special stub slot named `sym$auth_ptr$key$disc` filled by dynamic linker
during relocation resolving instead of a GOT slot.
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Depends on #94240.
Define the following pseudos for lowering ptrauth constants in code:
- non-`extern_weak`:
- no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added
PAC;
- GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but use a
special stub slot named `sym$auth_ptr$key$disc` filled by dynamic linker
during relocation resolving instead of a GOT slot.
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Currently TableGen does not directly detect duplicate synthesized
registers as can happen in this example:
```
def GPR128 : RegisterTuples<[sub0, sub1, sub2, sub3],
                            [(decimate (shl GPR32, 0), 1),
                             (decimate (shl GPR32, 1), 1),
                             (decimate (shl GPR32, 2), 1),
                             (decimate (shl GPR32, 3), 1)]>;

def GPR128_Aligned : RegisterTuples<[sub0, sub1, sub2, sub3],
                                    [(decimate (shl GPR32, 0), 4),
                                     (decimate (shl GPR32, 1), 4),
                                     (decimate (shl GPR32, 2), 4),
                                     (decimate (shl GPR32, 3), 4)]>;
```
TableGen does fail, but with an unrelated and difficult-to-understand
error that happens downstream of tuple expansion:
"error: No SubRegIndex for R0_R1_R2_R3 in R0_R1_R2_R3".
This patch detects the problem directly during expansion and emits an
error pointing the user to the actual issue:
"error: Register tuple redefines register 'R0_R1_R2_R3'".
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, the behavior of llvm.minnum differs across platforms when
one operand is sNaN. When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE754-2008's minNum and return qNaN.
- RISC-V/Hexagon: follow IEEE754-2019's minimumNumber and return NUM.
- X86: returns NUM, but does not fully match IEEE754-2019's
  minimumNumber, as +0.0 is not always treated as greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
The categories are primarily meant for OpenMP, where the spec assigns a
category to each directive. It's one of declarative, executable,
informational, meta, subsidiary, and utility.
These will be used in clang to avoid listing directives belonging to
certain categories by hand.
---------
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
This predicate tells GlobalISelEmitter and DAGISelEmitter to check that
the instruction to emit has only one use of its result. This can be used
on a PatFrag instead of defining custom predicates for both emitters
for each record that requires it.
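In GlobalISel terms, the emitted check boils down to something like
this (illustrative, not the actual generated matcher code):
```
// Reject the match if the pattern's result has more than one
// (non-debug) user.
if (!MRI.hasOneNonDBGUse(MI.getOperand(0).getReg()))
  return false;
```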
Combiners that use C++ code in their "apply" pattern use only that
code; they never mix it with MIR patterns, as that has little added
value.
This patch restricts C++ apply code so that if C++ is used, we cannot
use MIR patterns or builtins with it. Adding this restriction allows us
to merge the calls to the match and apply C++ code, which in turn means
MatchData variables can simply live on the stack.
So before, we would have
```
GIM_CheckCxxInsnPredicate // match
GIM_CheckCxxInsnPredicate // apply
GIR_Done
```
This came alongside a massive C++ struct holding the MatchData of all
possible rules (which was a big space/perf issue).
Now we just have
```
GIR_DoneWithCustomAction
```
And the function being run just does
```
unsigned SomeMatchData;
if (match(SomeMatchData))
  apply(SomeMatchData);
```
This approach solves multiple issues in one:
- MatchData handling is greatly simplified and more efficient, "don't
pay for what you don't use"
- We reduce the size of the match table
- Calling C++ code has a certain overhead (we need a switch), and this
overhead is only paid once now.
Handling of C++ code inside PatFrags is unchanged, though; that still
emits a `GIM_CheckCxxInsnPredicate`. This is completely fine, as
PatFrags can't use MatchDatas.