From OpenMP 6.0 features list
- OpenMP directives in concurrent loop regions
- atomic constructs on concurrent loop regions
- Lift nesting restriction on concurrent loop
Testing
- Updated test/OpenMP/for_order_messages.cpp
- check-all
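As a rough illustration of what the lifted restriction permits, here is a minimal sketch of an atomic construct nested in an `order(concurrent)` loop. The function name is made up for this example, and it assumes a build with `-fopenmp -fopenmp-version=60`; without OpenMP enabled the pragmas are ignored and the loop runs serially.

```cpp
#include <vector>

// Sums `values` in a worksharing loop marked order(concurrent).
// The atomic construct in the loop body is the kind of nested directive
// that OpenMP 6.0 newly allows in such a region.
long sumConcurrent(const std::vector<int> &values) {
  long total = 0;
  const long n = static_cast<long>(values.size());
#pragma omp parallel for order(concurrent)
  for (long i = 0; i < n; ++i) {
#pragma omp atomic update
    total += values[i];
  }
  return total;
}
```

With an earlier `-fopenmp-version`, the same source is expected to be diagnosed, which is what the updated test exercises.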
This patch does two things.
1. Previously, when checking driver arguments, we emitted an error for
unsupported values of `-mbranch-protection` when using pauthtest ABI.
The reason for that was ptrauth-returns being enabled as part of
pauthtest. This patch changes the check against pauthtest to a check
against ptrauth-returns.
2. Similarly, check against values of the following function attribute
which are unsupported with ptrauth-returns:
`__attribute__((target("branch-protection=XXX")))`. Note that the existing
`validateBranchProtection` function is used, and the current behavior is to
ignore the unsupported attribute value, so no error is emitted.
This requires adding support to the general builtins emission for
producing prefixed builtin infos separately from un-prefixed which is
a bit crufty. But we don't currently have any good way of having a more
refined model than a single hard-coded prefix string per TableGen
emission. Something more powerful and/or elegant is possible, but this
is a fairly minimal first step that at least allows factoring out the
builtin prefix for something like X86.
This moves the main builtins and several targets to use nice generated
string tables and info structures rather than X-macros. Even without
obvious prefixes factored out, the resulting tables are significantly
smaller and much cheaper to compile without all the X-macro overhead.
This leaves the X-macros in place for atomic builtins which have a wide
range of uses that don't seem reasonable to fold into TableGen.
As future work, these should move to their own file (whether as X-macros
or just generated patterns) so the AST headers don't have to include all
the data for other builtins.
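To make the contrast concrete, here is a small sketch of the two styles. All names (`DEMO_BUILTINS`, `DemoInfo`, `demoName`, and the three entries) are hypothetical and are not the generated Clang tables; the offsets are written by hand here, whereas a generator would emit them.

```cpp
#include <string_view>

// X-macro style: every consumer re-expands the whole list.
// These names are hypothetical, not real Clang builtins.
#define DEMO_BUILTINS(X)                                                       \
  X(demo_add, "iii")                                                           \
  X(demo_neg, "ii")                                                            \
  X(demo_nop, "v")

enum DemoID : unsigned {
#define DEMO_ENUM(NAME, TYPE) ID_##NAME,
  DEMO_BUILTINS(DEMO_ENUM)
#undef DEMO_ENUM
  NumDemoBuiltins
};

// Table style: one character blob plus a small record of offsets per entry.
constexpr char DemoStrings[] = "demo_add\0iii\0demo_neg\0ii\0demo_nop\0v";

struct DemoInfo {
  unsigned NameOffset; // offset of the NUL-terminated name in DemoStrings
  unsigned TypeOffset; // offset of the NUL-terminated type string
};

constexpr DemoInfo DemoInfos[NumDemoBuiltins] = {{0, 9}, {13, 22}, {25, 34}};

std::string_view demoName(DemoID id) {
  return DemoStrings + DemoInfos[id].NameOffset;
}

std::string_view demoType(DemoID id) {
  return DemoStrings + DemoInfos[id].TypeOffset;
}
```

The table form keeps a single string blob and two integers per builtin, so nothing has to be re-expanded by the preprocessor at each use site.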
This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.
This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
This both reapplies #118734, the initial attempt at this, and updates it
significantly.
First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.
It also restructures the builtins to support sharding across several
independent tables. This accomplishes three improvements over the
original PR:
1) It improves the APIs used significantly.
2) When builtins are defined from different sources (like SVE vs MVE in
AArch64), this allows each of them to build their own string table
independently rather than having to merge the string tables and info
structures.
3) It allows each shard to factor out a common prefix, often cutting the
size of the strings needed for the builtins by a factor of two.
The second point is important both because it allows different mechanisms
of construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc` files) and because it reduces the sizes
of these tables, which is valuable given how large they are in some
cases. The third builds on that size reduction.
Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities, which cannot be easily factored
out here and require deep changes to the targets.
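The sharding and prefix idea can be sketched roughly as follows. This is an illustration only: `Shard`, `builtinName`, `demoShards`, and the `__builtin_demo_*` names are hypothetical and do not reflect the actual `StringTable` API or Clang's info structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One shard: its own string blob, offsets into that blob, and a common
// prefix stored once instead of being repeated in every name.
struct Shard {
  const char *Prefix;
  const char *Strings;           // NUL-separated name suffixes
  std::vector<unsigned> Offsets; // offset of each suffix within Strings
};

// Resolves a global builtin index by walking the shards in order.
// Returns an empty string when the index is out of range.
std::string builtinName(const std::vector<Shard> &shards, std::size_t id) {
  for (const Shard &s : shards) {
    if (id < s.Offsets.size())
      return std::string(s.Prefix) + (s.Strings + s.Offsets[id]);
    id -= s.Offsets.size();
  }
  return {};
}

// Two independent shards, as if built from two different sources.
std::vector<Shard> demoShards() {
  return {
      {"__builtin_demo_a_", "load\0store", {0, 5}},
      {"__builtin_demo_b_", "add\0sub\0mul", {0, 4, 8}},
  };
}
```

Each shard can be produced by a different mechanism without merging blobs, and the prefix costs one copy per shard rather than one per builtin.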
This patch adds intrinsics for the tcgen05 alloc/dealloc
family of PTX instructions. This patch also adds an
addrspace 6 for tensor memory which is used by
these intrinsics.
lit tests are added and verified with a ptxas-12.8 executable.
Documentation for these additions is also added in NVPTXUsage.rst.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
Before this change, we would set this to Clang's default of {64, 64}.
Now, we explicitly set it to {256, 64} which matches our ARM behavior
for ARMv8 targets and GCC's behavior for AArch64 targets.
This reverts commit 928cad49beec0120686478f502899222e836b545 i.e.,
relands dccd27112722109d2e2f03e8da9ce8690f06e11b, with a fix to avoid
use-after-scope by changing the lambda to capture by value.
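For readers unfamiliar with this class of bug, here is a generic sketch of the capture-by-value fix. It is not the code from the relanded commit; `makeLabeler` is a made-up example.

```cpp
#include <functional>
#include <string>

// Returns a callback that outlives the local variable `prefix`.
// Capturing with [&] would leave the lambda holding a dangling reference
// once this function returns; capturing by value copies the string into
// the closure so it stays valid.
std::function<std::string(int)> makeLabeler(const char *text) {
  std::string prefix = std::string(text) + ": ";
  return [prefix](int n) { return prefix + std::to_string(n); };
}
```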
This adds the plumbing between -fsanitize-skip-hot-cutoff (introduced in
https://github.com/llvm/llvm-project/pull/121619) and
LowerAllowCheckPass<cutoffs> (introduced in
https://github.com/llvm/llvm-project/pull/124211).
The net effect is that -fsanitize-skip-hot-cutoff now combines the
functionality of -ubsan-guard-checks and
-lower-allow-check-percentile-cutoff (though this patch does not remove
those yet), and generalizes the latter to allow per-sanitizer cutoffs.
Note: this patch replaces Intrinsic::allow_ubsan_check's
SanitizerHandler parameter with SanitizerOrdinal; this is necessary
because the hot cutoffs are specified in terms of SanitizerOrdinal
(e.g., null, alignment), not SanitizerHandler (e.g., TypeMismatch).
Likewise, CodeGenFunction::EmitCheck is changed to emit
allow_ubsan_check() for each individual check.
---------
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Thread-local code generation requires constant pools because most of the
relocations needed for it operate on data, so it cannot be used with
-mexecute-only (or -mpure-code, which is aliased in the driver).
Without this we hit an assertion in the backend when trying to generate
a constant pool.
This switches them to use the common builtin TableGen emission.
The fancy feature string preprocessor tricks are replaced with a fairly
direct translation into TableGen.
All of the actual definitions were created using a quite hack-y Python
script that was never intended to be productionized. It preserves the
order, spacing, and even comments from the original files. For
posterity, the script used is here:
https://gist.github.com/chandlerc/f53c7d735e33eecf388529bd9a6010df
The original `.def` file appears to be generated by some out-of-tree
`iset.py` script, which I couldn't update because it is out of tree. It
should be very straightforward, though, to update it to generate a
structure similar to the one used to produce the `.td` file.
In addition to helping move towards TableGen for all of the builtins,
these builtins in particular can be *much* more efficiently handled
using TableGen when we start emitting string tables for them because it
allows de-duplicating all of the feature strings.
The commit sha parent at the time the PR was made is
7253c6fde498c4c9470b681df47d46e6930d6a02 and at that commit, the
resulting TableGen file produces a `.inc` file that only differs in
whitespace and the order of the builtins defined.
This is take two of #70976. This iteration of the patch makes sure that
custom diagnostics without any warning group don't get promoted by
`-Werror` or `-Wfatal-errors`.
This implements parts of the extension proposed in
https://discourse.llvm.org/t/exposing-the-diagnostic-engine-to-c/73092/7.
Specifically, this makes it possible to specify a diagnostic group in an
optional third argument.
This switches them to use the common TableGen layer, extending it to
support the missing features needed by the NVPTX backend.
The biggest thing was to build a TableGen system that computes the
cumulative SM and PTX feature sets the same way the macros did. That's
done with some string concatenation tricks in TableGen, but they worked
out pretty neatly and are very comparable in complexity to the macro
version.
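The cumulative feature-set idea can be sketched in C++ as below. This is only an analogy for the string concatenation described above, not the TableGen code; the function name and the version list passed to it are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// For each version, builds "sm_N|sm_M|..." covering N and every newer
// version, so a builtin introduced at N is enabled on all later ones.
// `ascending` must list versions from oldest to newest.
std::map<int, std::string>
cumulativeFeatures(const std::vector<int> &ascending) {
  std::map<int, std::string> result;
  std::string newer; // feature string of the next-newer version
  for (auto it = ascending.rbegin(); it != ascending.rend(); ++it) {
    std::string entry = "sm_" + std::to_string(*it);
    if (!newer.empty())
      entry += "|" + newer;
    result[*it] = entry;
    newer = entry;
  }
  return result;
}
```

Walking from newest to oldest means each entry is built by prepending one name to the previous result, which mirrors how a chain of concatenated definitions would accumulate.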
Then the actual defines were mapped over using a very hacky Python
script. It was never productionized or intended to work in the future,
but for posterity:
https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66
Last but not least, there was a very odd "bug" in one of the converted
builtins' prototype in the TableGen model: it didn't handle uses of `Z`
and `U` both as *qualifiers* of a single type, treating `Z` as its own
`int32_t` type. So my hacky Python script converted `ZUi` into two
types, an `int32_t` and an `unsigned int`. This produced a very wrong
prototype. But the tests caught this nicely and I fixed it manually
rather than trying to improve the Python script as it occurred in
exactly one place I could find.
This should provide direct benefits of allowing future refactorings to
more directly leverage TableGen to express builtins more structurally
rather than textually. It will also make my efforts to move builtins to
string tables significantly more effective for the NVPTX backend where
the X-macro approach resulted in *significantly* less efficient string
tables than other targets due to the long repeated feature strings.
Since `__STDC_NO_THREADS__` is a reserved identifier,
- If `MSVC version < 17.9`
- C version < C11(201112L)
- When `<threads.h>` is unavailable, i.e. `!__has_include(<threads.h>)` if
`__has_include` is defined.
Closes #115529
Introduces a new address space `hlsl_constant(2)` for constant buffer
declarations.
This address space is applied to declarations inside `cbuffer` block.
Later on, it will also be applied to `ConstantBuffer<T>` syntax and the
default `$Globals` constant buffer.
Clang codegen translates constant buffer declarations to global
variables and loads from `hlsl_constant(2)` address space. More work
coming soon will include addition of metadata that will map these
globals to individual constant buffers and enable their transformation
to appropriate constant buffer load intrinsics later on in an LLVM pass.
Fixes #123406
Previously, they used a hand-rolled Pascal-string encoding different
from all the other string tables produced from TableGen. This moves them
to use the newly introduced runtime abstraction, and enhances that
abstraction to support iterating over the string table as used in this
case.
From what I can tell the Pascal-string encoding isn't critical here to
avoid expensive `strlen` calls, so I think this is a simpler and more
consistent model. But if folks would prefer a Pascal-string style
encoding, I can instead work to switch the `StringTable` abstraction
towards that. It would require some tricky tradeoffs though to make it
reasonably general: either using 4 bytes instead of 1 byte to encode the
size, or having a fallback to `strlen` for long strings.
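To illustrate the two encodings being compared, here is a minimal sketch of iterating each kind of table. The function names are hypothetical and this is not the `StringTable` implementation; the Pascal-style reader assumes a 1-byte length, which is exactly what limits entries to 255 characters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// NUL-separated table: each entry runs up to the next '\0'.
std::vector<std::string> readNulTable(std::string_view table) {
  std::vector<std::string> out;
  std::size_t pos = 0;
  while (pos < table.size()) {
    std::size_t end = table.find('\0', pos);
    if (end == std::string_view::npos)
      end = table.size();
    out.emplace_back(table.substr(pos, end - pos));
    pos = end + 1;
  }
  return out;
}

// Pascal-style table: each entry is a 1-byte length followed by that many
// characters, so no scan for a terminator is needed. Assumes the table is
// well formed.
std::vector<std::string> readPascalTable(std::string_view table) {
  std::vector<std::string> out;
  std::size_t pos = 0;
  while (pos < table.size()) {
    std::size_t len = static_cast<unsigned char>(table[pos]);
    out.emplace_back(table.substr(pos + 1, len));
    pos += 1 + len;
  }
  return out;
}
```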
Clang has long had special handling for 3-element vectors: loads and
stores are performed as 4-element operations, and a shufflevector then
extracts the used elements. Odd-sized vector codegen should now work
reasonably well.
This patch removes the compiler argument `-fpreserve-vec3-type` and adds
a target hook to determine whether the special handling of the vector
type is needed.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This patch adds support for the next-generation arch15
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch15 as host processor.
- Assembler/disassembler support for new instructions.
- Exploitation of new instructions for code generation.
- New vector (signed|unsigned|bool) __int128 data types.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10305.
Note: No currently available Z system supports the arch15
architecture. Once new systems become available, the
official system name will be added as supported -march name.
When building with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON with a recent
libstdc++ (e.g. from gcc 13.3.0) the testcase
clang/test/Misc/warning-flags-tree.c fails with the message:
```
+ diagtool tree --internal
.../include/c++/13.3.0/bits/stl_algo.h:2013:
In function:
_ForwardIterator std::lower_bound(_ForwardIterator, _ForwardIterator,
const _Tp &, _Compare) [_ForwardIterator = const
diagtool::DiagnosticRecord *, _Tp = diagtool::DiagnosticRecord, _Compare
= bool (*)(const diagtool::DiagnosticRecord &, const
diagtool::DiagnosticRecord &)]
Error: elements in iterator range [first, last) are not partitioned by the predicate __comp and value __val.
Objects involved in the operation:
iterator "first" @ 0x7ffea8ef2fd8 {
}
iterator "last" @ 0x7ffea8ef2fd0 {
}
```
The reason for this error is that std::lower_bound is called on
BuiltinDiagnosticsByID without it being entirely sorted. If the range
passed to std::lower_bound is not sorted, the behavior of the function
is undefined. This is detected when building with expensive checks.
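The precondition can be shown with a small stand-alone sketch. `Record`, `lessByID`, and `findIndexByID` are hypothetical stand-ins rather than the diagtool types, and the `std::is_sorted` guard is added here only to make the requirement visible.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a diagnostic record keyed by ID.
struct Record {
  int ID;
  const char *Name;
};

bool lessByID(const Record &a, const Record &b) { return a.ID < b.ID; }

// Returns the index of the record with the given ID, or -1 if it is absent.
// std::lower_bound is only well defined on a range sorted with respect to
// the comparator, so an unsorted table is rejected instead of searched.
int findIndexByID(const std::vector<Record> &table, int id) {
  if (!std::is_sorted(table.begin(), table.end(), lessByID))
    return -1;
  const Record key{id, ""};
  auto it = std::lower_bound(table.begin(), table.end(), key, lessByID);
  if (it == table.end() || it->ID != id)
    return -1;
  return static_cast<int>(it - table.begin());
}
```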
To make BuiltinDiagnosticsByID sorted we need to slightly change the
order in which the inc-files are included. DiagnosticCrossTUKinds.inc is
included too early in DiagnosticNames.cpp and should be moved down to
directly after DiagnosticCommentKinds.inc.
As part of this pull request, the includes that build up the
BuiltinDiagnosticsByID table are extracted into a common wrapper header
file, AllDiagnosticKinds.inc, that is used by both clang and diagtool.
The 2024-12 ISA update release adds a new feature: FEAT_SSVE_BitPerm,
which allows the sve-bitperm instructions to run in streaming mode.
It also removes the requirement of FEAT_SVE2 for FEAT_SVE_BitPerm. The
sve2-bitperm feature is now an alias for sve-bitperm and sve2.
A new feature flag sve-bitperm is added to reflect the change that the
instructions under FEAT_SVE_BitPerm are supported if:
- in non-streaming mode with FEAT_SVE2 and FEAT_SVE_BitPerm, or
- in streaming mode with FEAT_SME and FEAT_SSVE_BitPerm
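The availability condition above can be written out as a small predicate. This is a sketch of the stated rule only; the struct and function names are invented for illustration and do not correspond to LLVM's subtarget code.

```cpp
// Hypothetical flags for the architectural features named above.
struct CpuFeatures {
  bool SVE2;
  bool SVEBitPerm;
  bool SME;
  bool SSVEBitPerm;
};

// True when the bit-permute instructions may be used in the given mode:
// non-streaming needs FEAT_SVE2 and FEAT_SVE_BitPerm, streaming needs
// FEAT_SME and FEAT_SSVE_BitPerm.
bool bitPermAvailable(const CpuFeatures &f, bool streamingMode) {
  return streamingMode ? (f.SME && f.SSVEBitPerm) : (f.SVE2 && f.SVEBitPerm);
}
```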
arm-apple-none-macho uses DarwinTargetInfo which provides several Apple
specific macros. arm64-apple-none-macho however just uses the generic
AArch64leTargetInfo and doesn't get any of those macros. It's not clear
if everything from DarwinTargetInfo is desirable for
arm64-apple-none-macho, so make an AppleMachOTargetInfo to hold the
generic Apple macros and a few other basic things.
This adds a function to parse weighted sanitizer flags (e.g.,
`-fsanitize-blah=undefined=0.5,null=0.3`) and adds the plumbing to apply
that to a new flag, `-fsanitize-skip-hot-cutoff`.
`-fsanitize-skip-hot-cutoff` currently has no effect; future work will
use it to generalize ubsan-guard-checks (originally introduced in
5f9ed2ff8364ff3e4fac410472f421299dafa793).
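A minimal sketch of parsing such a weighted list follows. It is not the function added by this patch: `parseWeighted` is a hypothetical name, and the rule that weights must lie in [0, 1] is an assumption made for the example.

```cpp
#include <cstddef>
#include <exception>
#include <map>
#include <optional>
#include <string>

// Parses "name=weight,name=weight" into a map from name to weight.
// Returns std::nullopt for malformed items or (by assumption) weights
// outside [0, 1].
std::optional<std::map<std::string, double>>
parseWeighted(const std::string &value) {
  std::map<std::string, double> result;
  std::size_t pos = 0;
  while (pos <= value.size()) {
    std::size_t comma = value.find(',', pos);
    if (comma == std::string::npos)
      comma = value.size();
    const std::string item = value.substr(pos, comma - pos);
    const std::size_t eq = item.find('=');
    if (eq == std::string::npos || eq == 0 || eq + 1 == item.size())
      return std::nullopt;
    try {
      std::size_t used = 0;
      const std::string number = item.substr(eq + 1);
      const double weight = std::stod(number, &used);
      // Reject trailing junk, NaN, and out-of-range weights.
      if (used != number.size() || !(weight >= 0.0 && weight <= 1.0))
        return std::nullopt;
      result[item.substr(0, eq)] = weight;
    } catch (const std::exception &) {
      return std::nullopt;
    }
    pos = comma + 1;
  }
  return result;
}
```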
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:
"Among any two versions, the higher priority version is determined by
identifying the highest priority feature that is specified in exactly
one of the versions, and selecting that version."
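The quoted rule can be sketched with bitmasks. The encoding is an assumption made for illustration (bit i set means the feature with priority i is specified, and a higher bit means higher priority); it is not how the ACLE or Clang represents versions.

```cpp
#include <cstdint>

// Returns 0 if version A has higher priority, 1 if version B does, and -1
// if both specify exactly the same features.
int higherPriorityVersion(std::uint64_t a, std::uint64_t b) {
  std::uint64_t onlyInOne = a ^ b; // features specified in exactly one version
  if (onlyInOne == 0)
    return -1;
  // Isolate the highest set bit, i.e. the highest priority such feature.
  std::uint64_t top = 1;
  while (onlyInOne >>= 1)
    top <<= 1;
  return (a & top) ? 0 : 1;
}
```

For example, a version with a single high-priority feature (`0b100000`) now beats one with four lower-priority features (`0b001111`), whereas counting features would have picked the latter.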
Embedded development often needs to use a different C standard library,
replacing the existing one normally passed as -internal-externc-isystem.
This works fine for an apple-macos target, but apple-none-macho doesn't
work because the MachO driver doesn't implement
AddClangSystemIncludeArgs to add the resource directory as
-internal-isystem like most other drivers do. Move most of the search
path logic from Darwin and DarwinClang down into an AppleMachO toolchain
between the MachO and Darwin toolchains.
Also define __MACH__ for apple-none-macho, as Swift expects all MachO
targets to have that defined.
- Update pr labeler so new SPIRV files get properly labeled.
- Add distance target builtin to BuiltinsSPIRV.td.
- Update TargetBuiltins.h to account for spirv builtins.
- Update clang basic CMakeLists.txt to build spirv builtin tablegen.
- Hook up sema for SPIRV in Sema.h|cpp, SemaSPIRV.h|cpp, and
SemaChecking.cpp.
- Hook up spirv target builtins to SPIR.h|SPIR.cpp target.
- Update CGBuiltin.cpp to emit spirv intrinsics when we get the expected
spirv target builtin.
Consensus was reached in this RFC to add both target builtins and pattern
matching:
https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329.
Pattern matching will come in a separate PR; this one just sets up the
groundwork for target builtins for spirv.
Partially resolves
[#99107](https://github.com/llvm/llvm-project/issues/99107)
This PR follows https://github.com/llvm/llvm-project/pull/120831 for
x86-64.
Similar to that PR, this does a very mechanical port of X86 builtins to
TableGen. There is a *lot* of improvement available here to use TableGen
more effectively and collapse repeated structures. But those can now be
follow-up PRs that restructure *within* the `.td` file.
The current structure produces a file that exactly matches the original
X-macros except for the differences outlined in
https://github.com/llvm/llvm-project/pull/120831:
- Horizontal whitespace
- `long long` types now use `long long` outside of OpenCL, but switch to
`long` in OpenCL where relevant.
Otherwise, only the order of builtins changes, and no tests regress.
The goal is to make incremental (if small) progress towards fully
TableGen'ed builtins, and to unblock #120534 by gaining access to more
powerful TableGen-based representations.
The bulk `.td` file addition was generated with the help of a very rough
Python script. That script made no attempt to be robust or reusable; it
specifically handled only the cases in the X86 `.def` file.
Four entries from the `.def` file were not handled automatically as they
used `BUILTIN` rather than `TARGET_BUILTIN`. These were ported by hand
to an empty-feature `TargetBuiltin` entry, which seems like a better
match.
For all the automatically ported entries, the results were compared by
sorting and diffing the `.def` file and the generated `.inc` file. The
only differences were:
- Different horizontal whitespace
- Additional entries that had already been ported to the `.td` file.
- More systematically using `Oi` instead of `LLi` for the type `long
long int` in the fully general `__builtin_ia32_...` builtins for OpenCL
support. The `.def` file was only partially moved to this it seems, and
the systematic migration has updated a few missed builtins.