Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D132079
I went over the output of the following mess of a command:
(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z |
parallel --xargs -0 cat | aspell list --mode=none --ignore-case |
grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n |
grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Differential Revision: https://reviews.llvm.org/D130827
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
Clang has traditionally allowed C programs to implicitly convert
integers to pointers and pointers to integers, despite it not being
valid to do so except under special circumstances (like converting the
integer 0, which is the null pointer constant, to a pointer). In C89,
this would result in undefined behavior per 3.3.4, and in C99 this rule
was strengthened to be a constraint violation instead. Constraint
violations are most often handled as an error.
This patch changes the warning to default to an error in all C modes
(it is already an error in C++). This gives us better security posture
by calling out potential programmer mistakes in code but still allows
users who need this behavior to use -Wno-error=int-conversion to retain
the warning behavior, or -Wno-int-conversion to silence the diagnostic
entirely.
Differential Revision: https://reviews.llvm.org/D129881
When overload resolution fails, clang emits a note diagnostic for each
candidate. For OpenCL builtins this often leads to many repeated note
diagnostics with no new information. Stop emitting such notes.
Update a test that was relying on counting those notes to check how
many builtins are available for certain extension configurations.
Differential Revision: https://reviews.llvm.org/D127961
For backwards compatiblity, we emit only a warning instead of an error if the
attribute is one of the existing type attributes that we have historically
allowed to "slide" to the `DeclSpec` just as if it had been specified in GNU
syntax. (We will call these "legacy type attributes" below.)
The high-level changes that achieve this are:
- We introduce a new field `Declarator::DeclarationAttrs` (with appropriate
accessors) to store C++11 attributes occurring in the attribute-specifier-seq
at the beginning of a simple-declaration (and other similar declarations).
Previously, these attributes were placed on the `DeclSpec`, which made it
impossible to reconstruct later on whether the attributes had in fact been
placed on the decl-specifier-seq or ahead of the declaration.
- In the parser, we propgate declaration attributes and decl-specifier-seq
attributes separately until we can place them in
`Declarator::DeclarationAttrs` or `DeclSpec::Attrs`, respectively.
- In `ProcessDeclAttributes()`, in addition to processing declarator attributes,
we now also process the attributes from `Declarator::DeclarationAttrs` (except
if they are legacy type attributes).
- In `ConvertDeclSpecToType()`, in addition to processing `DeclSpec` attributes,
we also process any legacy type attributes that occur in
`Declarator::DeclarationAttrs` (and emit a warning).
- We make `ProcessDeclAttribute` emit an error if it sees any non-declaration
attributes in C++11 syntax, except in the following cases:
- If it is being called for attributes on a `DeclSpec` or `DeclaratorChunk`
- If the attribute is a legacy type attribute (in which case we only emit
a warning)
The standard justifies treating attributes at the beginning of a
simple-declaration and attributes after a declarator-id the same. Here are some
relevant parts of the standard:
- The attribute-specifier-seq at the beginning of a simple-declaration
"appertains to each of the entities declared by the declarators of the
init-declarator-list" (https://eel.is/c++draft/dcl.dcl#dcl.pre-3)
- "In the declaration for an entity, attributes appertaining to that entity can
appear at the start of the declaration and after the declarator-id for that
declaration." (https://eel.is/c++draft/dcl.dcl#dcl.pre-note-2)
- "The optional attribute-specifier-seq following a declarator-id appertains to
the entity that is declared."
(https://eel.is/c++draft/dcl.dcl#dcl.meaning.general-1)
The standard contains similar wording to that for a simple-declaration in other
similar types of declarations, for example:
- "The optional attribute-specifier-seq in a parameter-declaration appertains to
the parameter." (https://eel.is/c++draft/dcl.fct#3)
- "The optional attribute-specifier-seq in an exception-declaration appertains
to the parameter of the catch clause" (https://eel.is/c++draft/except.pre#1)
The new behavior is tested both on the newly added type attribute
`annotate_type`, for which we emit errors, and for the legacy type attribute
`address_space` (chosen somewhat randomly from the various legacy type
attributes), for which we emit warnings.
Depends On D111548
Reviewed By: aaron.ballman, rsmith
Differential Revision: https://reviews.llvm.org/D126061
For newer OpenCL extensions that do not require a pragma, such as
`cl_khr_subgroup_shuffle`, a user could still accidentally attempt to
use a pragma. This would result in a warning
"unknown OpenCL extension 'cl_khr_subgroup_shuffle' - ignoring"
which could be mistakenly interpreted as "clang does not support this
extension at all" instead of "clang does not require any pragma for
this extension".
Differential Revision: https://reviews.llvm.org/D126660
The vload*_half* and vstore*_half* builtins do not require the
cl_khr_fp16 extension: pointers to `half` can be declared without the
extension and the _half variants of vload and vstore should be
available without the extension.
This aligns the guards for these builtins for
`-fdeclare-opencl-builtins` with `opencl-c.h`.
Fixes https://github.com/llvm/llvm-project/issues/55275
Differential Revision: https://reviews.llvm.org/D125401
Before issuing the warning about use of a strict prototype, check if
the declarator is required to have a prototype through some other means
determined at parse time.
This silences false positives in OpenCL code (where the functions are
forced to have a prototype) and block literal expressions.
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
C89 allowed a type specifier to be elided with the resulting type being
int, aka implicit int behavior. This feature was subsequently removed
in C99 without a deprecation period, so implementations continued to
support the feature. Now, as with implicit function declarations, is a
good time to reevaluate the need for this support.
This patch allows -Wimplicit-int to issue warnings in C89 mode (off by
default), defaults the warning to an error in C99 through C17, and
disables support for the feature entirely in C2x. It also removes a
warning about missing declaration specifiers that really was just an
implicit int warning in disguise and other minor related cleanups.
In C++ and C2x, we would avoid calling ImplicitlyDefineFunction at all,
but in OpenCL mode we would still call the function and have it produce
an error diagnostic. Instead, we now have a helper function to
determine when implicit function definitions are allowed and we use
that to determine whether to call ImplicitlyDefineFunction so that the
behavior is more consistent across language modes.
This changes the diagnostic behavior from telling the users that an
implicit function declaration is not allowed in OpenCL to reporting use
of an unknown identifier and going through typo correction, as done in
C++ and C2x.
C89 had a questionable feature where the compiler would implicitly
declare a function that the user called but was never previously
declared. The resulting function would be globally declared as
extern int func(); -- a function without a prototype which accepts zero
or more arguments.
C99 removed support for this questionable feature due to severe
security concerns. However, there was no deprecation period; C89 had
the feature, C99 didn't. So Clang (and GCC) both supported the
functionality as an extension in C99 and later modes.
C2x no longer supports that function signature as it now requires all
functions to have a prototype, and given the known security issues with
the feature, continuing to support it as an extension is not tenable.
This patch changes the diagnostic behavior for the
-Wimplicit-function-declaration warning group depending on the language
mode in effect. We continue to warn by default in C89 mode (due to the
feature being dangerous to use). However, because this feature will not
be supported in C2x mode, we've diagnosed it as being invalid for so
long, the security concerns with the feature, and the trivial
workaround for users (declare the function), we now default the
extension warning to an error in C99-C17 mode. This still gives users
an easy workaround if they are extensively using the extension in those
modes (they can disable the warning or use -Wno-error to downgrade the
error), but the new diagnostic makes it more clear that this feature is
not supported and should be avoided. In C2x mode, we no longer allow an
implicit function to be defined and treat the situation the same as any
other lookup failure.
Differential Revision: https://reviews.llvm.org/D122983
Functions without prototypes in C (also known as K&R C functions) were
introduced into C89 as a deprecated feature and C2x is now reclaiming
that syntax space with different semantics. However, Clang's
-Wstrict-prototypes diagnostic is off-by-default (even in pedantic
mode) and does not suffice to warn users about issues in their code.
This patch changes the behavior of -Wstrict-prototypes to only diagnose
declarations and definitions which are not going to change behavior in
C2x mode, and enables the diagnostic in -pedantic mode. The diagnostic
is now specifically about the fact that the feature is deprecated.
It also adds -Wdeprecated-non-prototype, which is grouped under
-Wstrict-prototypes and diagnoses declarations or definitions which
will change behavior in C2x mode. This diagnostic is enabled by default
because the risk is higher for the user to continue to use the
deprecated feature.
Differential Revision: https://reviews.llvm.org/D122895
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.
Differential Revision: https://reviews.llvm.org/D123115
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
Move the SourceRange from the old ParsedAttributesWithRange into
ParsedAttributesView, so we have source range information available
everywhere we use attributes.
This also removes ParsedAttributesWithRange (replaced by simply using
ParsedAttributes) and ParsedAttributesVieWithRange (replaced by using
ParsedAttributesView).
Differential Revision: https://reviews.llvm.org/D121201
This is the `ext_vector_type` alternative to D81083.
This patch extends Clang to allow 'bool' as a valid vector element type
(attribute ext_vector_type) in C/C++.
This is intended as the canonical type for SIMD masks and facilitates
clean vector intrinsic declarations. Vectors of i1 are supported on IR
level and below down to many SIMD ISAs, such as AVX512, ARM SVE (fixed
vector length) and the VE target (NEC SX-Aurora TSUBASA).
The RFC on cfe-dev: https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D88905
This patch adds -Wno-strict-prototypes to all of the test cases that
use functions without prototypes, but not as the primary concern of the
test. e.g., attributes testing whether they can/cannot be applied to a
function without a prototype, etc.
This is done in preparation for enabling -Wstrict-prototypes by
default.
Until now, subgroup builtins are available with `opencl-c.h` when at
least one of `cl_intel_subgroups`, `cl_khr_subgroups`, or
`__opencl_c_subgroups` is defined. With `-fdeclare-opencl-builtins`,
subgroup builtins are conditionalized on `cl_khr_subgroups` only.
Align `-fdeclare-opencl-builtins` to `opencl-c.h` by introducing the
internal `__opencl_subgroup_builtins` macro.
Differential Revision: https://reviews.llvm.org/D120254
Until now, overloads with a 64-bit atomic type argument were always
made available with `-fdeclare-opencl-builtins`. Ensure these
overloads are only available when both the `cl_khr_int64_base_atomics`
and `cl_khr_int64_extended_atomics` extensions have been enabled, as
required by the OpenCL specification.
Differential Revision: https://reviews.llvm.org/D119858
OpenCL C 3.0 __opencl_c_subgroups feature is slightly different
then other equivalent features and extensions (fp64 and 3d image writes):
OpenCL C 3.0 device can support the extension but not the feature.
cl_khr_subgroups requires subgroup independent forward progress.
This patch adjusts the check which is used when translating language
builtins to check either the extension or feature is supported.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D118999
OpenCL C 3.0 introduces optionality to some builtins, in particularly
to those which are conditionally supported with pipe, device enqueue
and generic address space features.
The idea is to conditionally support such builtins depending on the language options
being set for a certain feature. This allows users to define functions with names
of those optional builtins in OpenCL (as such names are not reserved).
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D118605
Add the atomic overloads for the `global` and `local` address spaces,
which are new in OpenCL 3.0. Ensure the preexisting `generic`
overloads are guarded by the generic address space feature macro.
Ensure a subset of the atomic builtins are guarded by the
`__opencl_c_atomic_order_seq_cst` and `__opencl_c_atomic_scope_device`
feature macros, and enable those macros for SPIR/SPIR-V targets in
`opencl-c-base.h`.
Also guard the `cl_ext_float_atomics` builtins with the atomic order
and scope feature macros.
Differential Revision: https://reviews.llvm.org/D119420
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the third batch of tests being updated (there are a significant
number of other tests left to be updated).
Currently, -fdeclare-opencl-builtins always adds the generic address
space overloads of e.g. the vload builtin functions in OpenCL 3.0
mode, even when the generic address space feature is disabled.
Guard the generic address space overloads by the
`__opencl_c_generic_address_space` feature instead of by OpenCL
version.
Guard the private, global, and local overloads using the internal
`__opencl_c_named_address_space_builtins` feature.
Differential Revision: https://reviews.llvm.org/D107769
This feature requires support of __opencl_c_generic_address_space and
__opencl_c_program_scope_global_variables so diagnostics for that is provided as well.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D115640
Ensure any use of a `read_write` image is guarded behind the
`__opencl_c_read_write_images` feature macro.
Differential Revision: https://reviews.llvm.org/D117899
Use the "pure" attribute (or "readonly") for the vload, vload_half and
vloada_half builtins.
Includes test changes to SemaOpenCL/fdeclare-opencl-builtins.cl to avoid
triggering unused-result warnings.
Reviewed By: svenvh
Differential Revision: https://reviews.llvm.org/D110742
Based on post-commit review discussion on
2bd84938470bf2e337801faafb8a67710f46429d with Richard Smith.
Other uses of forcing HasEmptyPlaceHolder to false seem OK to me -
they're all around pointer/reference types where the pointer/reference
token will appear at the rightmost side of the left side of the type
name, so they make nested types (eg: the "int" in "int *") behave as
though there is a non-empty placeholder (because the "*" is essentially
the placeholder as far as the "int" is concerned).
This was originally committed in 277623f4d5a672d707390e2c3eaf30a9eb4b075c
Reverted in f9ad1d1c775a8e264bebc15d75e0c6e5c20eefc7 due to breakages
outside of clang - lldb seems to have some strange/strong dependence on
"char [N]" versus "char[N]" when printing strings (not due to that name
appearing in DWARF, but probably due to using clang to stringify type
names) that'll need to be addressed, plus a few other odds and ends in
other subprojects (clang-tools-extra, compiler-rt, etc).
Looks like lldb has some issues with this - somehow it causes lldb to
treat a "char[N]" type as an array of chars (prints them out
individually) but a "char [N]" is printed as a string. (even though the
DWARF doesn't have this string in it - it's something to do with the
string lldb generates for itself using clang)
This reverts commit 277623f4d5a672d707390e2c3eaf30a9eb4b075c.
Based on post-commit review discussion on
2bd84938470bf2e337801faafb8a67710f46429d with Richard Smith.
Other uses of forcing HasEmptyPlaceHolder to false seem OK to me -
they're all around pointer/reference types where the pointer/reference
token will appear at the rightmost side of the left side of the type
name, so they make nested types (eg: the "int" in "int *") behave as
though there is a non-empty placeholder (because the "*" is essentially
the placeholder as far as the "int" is concerned).
Add atomic_half types and builtins operating on the types from the
cl_ext_float_atomics extension.
Patch by Haonan Yang.
Differential Revision: https://reviews.llvm.org/D109740
Adds support for a feature macro __opencl_c_3d_image_writes in
C++ for OpenCL 2021 enabling a respective optional core feature
from OpenCL 3.0.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D109328
Adds support for a feature macro `__opencl_c_read_write_images` in
C++ for OpenCL 2021 enabling a respective optional core feature
from OpenCL 3.0.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D109307
Adds support for a feature macro `__opencl_c_pipes` in C++ for
OpenCL 2021 enabling a respective optional core feature from
OpenCL 3.0.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D109306
Adds support for macro `__opencl_c_program_scope_global_variables`
in C++ for OpenCL 2021 enabling a respective optional core feature
from OpenCL 3.0.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D109305
Adds support for a feature macro `__opencl_c_images` in C++ for
OpenCL 2021 enabling a respective optional core feature from
OpenCL 3.0.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D109002
`.rgba` vector extension setting in C++ for OpenCL 2021 is now
performed analogously to OpenCL C 3.0. Test case added.
Differential Revision: https://reviews.llvm.org/D109370