This patch puts the individual target region information attributes into a
struct so that the nested mappings are not needed and passing the information
around is simplified.
Reviewed By: jdoerfert, mikerice
Differential Revision: https://reviews.llvm.org/D136601
This alters the 8.3 complex intrinsics to be target-gated, as opposed to
hidden behind preprocessor macros. This is the last of arm_neon.h, and
follows the same formula as before.
Differential Revision: https://reviews.llvm.org/D135647
As a continuation of D132034, this switches the QRDMX v8.1a neon
intrinsics over from preprocessor defines to be target-gated. As there
is no "rdma" or "qrdmx" target feature, they use the "v8.1a"
architecture feature directly.
This works well for AArch64, but something needs to be done for Arm at
the same time, as they both use the same header and tablegen emitter.
This patch opts for adding "v8.1a" and all dependant target features to
the Arm TargetParser, similar to what was recently done for AArch64 but
through initFeatureMap when the Architecture is parsed. I attempted to
make the code similar to the AArch64 backend.
Otherwise this is similar to the changes made in D132034.
Differential Revision: https://reviews.llvm.org/D135615
A common post condition of the various visitor functions in CodeGen is that instructions, that do not return any values, simply return a nullptr Value as a sentinel. This has not been the case however for calls to some builtins returning void, as well as for an initializer expression of the form `void()`. This would then lead to ICEs in CodeGen on code relying on nullptr being returned for void values, which is eg. the case for conditional expressions [0].
This patch fixes that by returning nullptr Values for intrinsics known not to return any values as well as for a scalar initializer returning void.
Fixes https://github.com/llvm/llvm-project/issues/53127
[0] 266ec801fb/clang/lib/CodeGen/CGExprScalar.cpp (L4849-L4892)
Differential Revision: https://reviews.llvm.org/D136548
This reverts commit cecc9a92cfca71c1b6c2a35c5e302ab649496d11.
The problem ended up being how we were handling the lambda-context in
code generation: we were assuming any decl context here would be a
named-decl, but that isn't the case. Instead, we just replace it with
the concept's owning context.
Differential Revision: https://reviews.llvm.org/D136451
This reverts commit b876f6e2f28779211a829d7d4e841fe68885ae20.
Still getting build failures on PPC AIX that aren't obvious what is causing
them, so reverting while I try to figure this out.
This reverts commit b7c922607c5ba93db8b893d4ba461052af8317b5.
This seems to cause some problems with some modules related things,
which makes me think I should have updated the version-major in
ast-bit-codes? Going to revert to confirm this was a problem, then
change that and re-try a commit.
As that bug reports, the problem here is that the lambda's
'context-decl' was not set to the concept, and the lambda picked up
template arguments from the concept. SO, we failed to get the correct
template arguments in SemaTemplateInstantiate.
However, a Concept Specialization is NOT a decl, its an expression, so
we weren't able to put the concept in the decl tree like we needed.
This patch introduces a ConceptSpecializationDecl, which is the smallest
type possible to use for this purpose, containing only the template
arguments.
The net memory impliciation of this is turning a
trailing-objects into a pointer to a type with trailing-objects, so it
should be minor.
As future work, we may consider giving this type more responsibility, or
figuring out how to better merge duplicates, but as this is just a
template-argument collection at the moment, there isn't much value to
it.
Differential Revision: https://reviews.llvm.org/D136451
This switches the v8.5-a FRINT intrinsics over to be target-gated,
behind preprocessor defines. This one is pretty simple, being AArch64
only.
Differential Revision: https://reviews.llvm.org/D135646
As @python3kgae pointed out we're going to want to assign these IDs
after optimization so that we can remove unused resrouces. This patch
just removes the unused ID value from the frontend metadata, clang code
generation, and updates associated test cases.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D136271
short will be promoted to int in UsualUnaryConversions.
Disable it for HLSL to keep int16_t as 16bit.
Reviewed By: aaron.ballman, rjmccall
Differential Revision: https://reviews.llvm.org/D133668
Move ResourceClass into llvm/Frontend/HLSL/HLSLResource.h so it could be shared between clang and DirectX backend.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D136134
This patch adds support for the `depend` clause for the `task`
construct.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135695
This is an alternative of D120395 and D120411.
Previously we use `__bfloat16` as a typedef of `unsigned short`. The
name may give user an impression it is a brand new type to represent
BF16. So that they may use it in arithmetic operations and we don't have
a good way to block it.
To solve the problem, we introduced `__bf16` to X86 psABI and landed the
support in Clang by D130964. Now we can solve the problem by switching
intrinsics to the new type.
Reviewed By: LuoYuanke, RKSimon
Differential Revision: https://reviews.llvm.org/D132329
Support SV_DispatchThreadID attribute.
Translate it into dx.thread.id in clang codeGen.
Reviewed By: beanz, aaron.ballman
Differential Revision: https://reviews.llvm.org/D133983
This patch changes the kernels generated by OpenMP to have protected
visibility. This is unlikely to change anything functionally. However,
protected visibility better matches the behaviour of these GPU kernels.
We do not expect any pending shared library load to preempt these
kernels so we can specify a more restrictive visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136198
Currently generation of align assumptions for OpenMP simd construct is done
outside OMPIRBuilder for C code and it is not supported for Fortran.
According to OpenMP 5.0 standard (2.9.3) only pointers and arrays can be
aligned for C code.
If given aligned variable is pointer, then Clang generates the following set
of the LLVM IR isntructions to support simd align clause:
; memory allocation for pointer address:
%A.addr = alloca ptr, align 8
; some LLVM IR code
; Alignment instructions (alignment is equal to 32):
%0 = load ptr, ptr %A.addr, align 8
call void @llvm.assume(i1 true) [ "align"(ptr %0, i64 32) ]
If given aligned variable is array, then Clang generates the following set
of the LLVM IR isntructions to support simd align clause:
; memory allocation for array:
%B = alloca [10 x i32], align 16
; some LLVM IR code
; Alignment instructions (alignment is equal to 32):
%arraydecay = getelementptr inbounds [10 x i32], ptr %B, i64 0, i64 0
call void @llvm.assume(i1 true) [ "align"(ptr %arraydecay, i64 32) ]
OMPIRBuilder was modified to generate aligned assumptions. It generates only
llvm.assume calls. Frontend is responsible for generation of aligned pointer
and getting the default alignment value if user does not specify it in aligned
clause.
Unit and regression tests were added to check if aligned clause was handled correctly.
Differential Revision: https://reviews.llvm.org/D133578
Reviewed By: jdoerfert
Extra NUL does not impact functionality of the generated code, but it confuses
various NVIDIA tools used to examine embedded GPU binaries.
Differential Revision: https://reviews.llvm.org/D135832
''register(ID, space)'' like register(t3, space1) will be translated into
i32 3, i32 1 as the last 2 operands for resource annotation metadata.
NamedMetadata for CBuffers and SRVs are added as "hlsl.srvs" and "hlsl.cbufs".
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D130951
Calling `getFunctionLinkage(CalleeInfo.getCalleeDecl())` will crash when the declaration does not have a body, e.g., `extern void foo();`. Instead, we can use `isExternallyVisible()` to see if the delcaration has internal linkage.
I believe using `!isExternallyVisible()` is correct because the clang linkage must be `InternalLinkage` or `UniqueExternalLinkage`, both of which are "internal linkage" in llvm.
9c26f51f5e/clang/include/clang/Basic/Linkage.h (L28-L40)
Fixes https://github.com/llvm/llvm-project/issues/54139
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D135926
PPC64 ABI pass aggregates smaller than a register into the least
significant bits of the register. In the case of variadic functions,
they will end up right-aligned in their argument slots in the argument
area on big-endian targets. Apply right-alignment for these aggregates.
Fixes#55900.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D133338
This patch moves the implementation of the OffloadEntriesInfoManager
to the OMPIRbuilder. This class will later be used by flang as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135786
This change pulls some code from the DirectX backend into a new
LLVMFrontendHLSL library to share utility data structures between the
HLSL code generation in Clang and the backend in LLVM.
This is a small refactoring as a first start to get code into the
right structure and get the library built and dependencies correct.
Fixes#58000 (https://github.com/llvm/llvm-project/issues/58000)
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135110
functions in getter/setter functions of non-trivial C struct properties
This fixes a bug where the getter/setter functions were doing a trivial
copy instead of calling the synthesized functions that copy non-trivial
C struct types.
This fixes https://github.com/llvm/llvm-project/issues/56680.
Differential Revision: https://reviews.llvm.org/D131701
There are currently two options that are used to tell the compiler to perform
unsafe floating-point optimizations:
'-ffast-math' and '-funsafe-math-optimizations'.
'-ffast-math' is enabled by default. It automatically enables the driver option
'-menable-unsafe-fp-math'.
Below is a table illustrating the special operations enabled automatically by
'-ffast-math', '-funsafe-math-optimizations' and '-menable-unsafe-fp-math'
respectively.
Special Operations -ffast-math -funsafe-math-optimizations -menable-unsafe-fp-math
MathErrno 0 1 1
FiniteMathOnly 1 0 0
AllowFPReassoc 1 1 1
NoSignedZero 1 1 1
AllowRecip 1 1 1
ApproxFunc 1 1 1
RoundingMath 0 0 0
UnsafeFPMath 1 0 1
FPContract fast on on
'-ffast-math' enables '-fno-math-errno', '-ffinite-math-only',
'-funsafe-math-optimzations' and sets 'FpContract' to 'fast'. The driver option
'-menable-unsafe-fp-math' enables the same special options than
'-funsafe-math-optimizations'. This is redundant.
We propose to remove the driver option '-menable-unsafe-fp-math' and use
instead, the setting of the special operations to set the function attribute
'unsafe-fp-math'. This attribute will be enabled only if those special
operations are enabled and if 'FPContract' is either 'fast' or set to the
default value.
Differential Revision: https://reviews.llvm.org/D135097
The diagnostics engine is very smart about being passed a NamedDecl to
print as part of a diagnostic; it gets the "right" form of the name,
quotes it properly, etc. However, the result of using an unnamed tag
declaration was to print '' instead of anything useful.
This patch causes us to print the same information we'd have gotten if
we had printed the type of the declaration rather than the name of it,
as that's the most relevant information we can display.
Differential Revision: https://reviews.llvm.org/D134813
cbuffer A {
float a;
float b;
}
will be translated to a global variable.
Something like
struct CB_Ty {
float a;
float b;
};
CB_Ty A;
And all use of a and b will be replaced with A.a and A.b.
Only support none-legacy cbuffer layout now.
CodeGen for Resource binding will be in separate patch.
In the separate patch, resource binding will map the resource information to the global variable.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D130131
This fixes a bug from https://reviews.llvm.org/D131424 that removed the implicit `_cmd` parameter as an argument to `objc_direct` method implementations. In many cases the generated getter/setter will call `objc_getProperty` or `objc_setProperty`, both of which require the selector of the getter/setter; since `_cmd` didn't automatically have backing storage, attempting to load the address asserted.
For direct property generated getters/setters, this now passes an undefined/uninitialized/poison value as the `_cmd` argument to `objc_getProperty`/`objc_setProperty`. Prior to removing the `_cmd` argument from the ABI of direct methods, it was left uninitialized/undefined; although references within hand-implemented methods would load the selector in the method prologue, generated getters/setters never did and just forwarded the undefined value that was passed as the argument.
This change keeps the generated code mostly similar to before, passing an uninitialized/undefined/poison value; for setters, the value argument may be moved to another register.
Added a test that triggers the assert prior to the implementation code.
Differential Revision: https://reviews.llvm.org/D135091
Performing a load before calling __cxa_guard_acquire is supposed to be
an optimization, but it isn't much of one if we're just going to emit a
call to __atomic_load_1 instead. Instead, just skip the load, and
let __cxa_guard_acquire do whatever it wants.
(In practice, on such targets, the C++ library is just built with
threading turned off, so the result isn't actually threadsafe, but
there's not really anything clang can do about that.)
The alternative here is that we try to define some ABI for threadsafe
init that allows the speculative load without full atomics. Almost any
target without full atomics has a load that's s "atomic enough" for this
purpose. But it's not clear how we emit an "atomic enough" load in LLVM
IR, and there isn't any ABI document we can refer to.
Or I guess we could turn off -fthreadsafe-statics by default on
Cortex-M0, but that seems like it would be surprising.
Fixes https://github.com/llvm/llvm-project/issues/58184
Differential Revision: https://reviews.llvm.org/D135628
Similar to D131064, this alters most of the intrinsics in arm_neon.h to
be target based, not preprocessor based. The intrinsics that are changed
are the ones with obvious target features (fp16, fp16fml, cryptos, i8mm
and bf16). The ones that are not yet altered are the ones without target
features like rdma (8.1) and complex (8.3). Those will be switched in a
followup patch that allows targeting architecture versions.
The existing ArchGuard in arm_neon.td is split into ArchGuard that still
adds ifdef defines (for example for intrinsics that require __aarch64__),
and TargetGuards for intrinsics dependant on target features. From there
the TargetGuards are used in two ways:
- For intrinsics emitted as functions, __attribute__((target(TargetGuard)))
is added to the definition of the function. Along with the existing
always_inline intrinsic, this will give a compile time error if the
function is used in a context where the target feature is not available.
- For intrinsics emitted as macros, the __builtins are emitted into
arm_neon.inc using TARGET_BUILTIN as opposed to BUILTIN, which includes
the target feature and gives an error if the builtin is found in a
function without the required features, similar to arm_sve.h.
The second method requires that the intrinsics be separable from the
existing _v intrinsics used in other types. For example
__builtin_neon_splat_lane_bf16 is used as opposed to
__builtin_neon_splat_lane_v. There are some adjustments to the CGBuiltin
to account for intrinsics that can be treated similarly, except for
their target features.
Differential Revision: https://reviews.llvm.org/D132034
There is no 6.9 in C++11, the quote actually lives in
[intro.multithread] for that revision. However, the words moved in
C++17 to [intro.progress] so I added that information as well.
Generally, with PGO enabled the C++20 likelyhood attributes shall be
dropped assuming the profile has a good coverage. However, currently
this is not the case for the following code:
if (always_false()) [[likely]] {
...
}
The patch fixes this and drops the attribute, if the parent context was
executed in the profile. The patch still preserves the attribute, if the
parent context was not executed, e.g. to support the cases when the
profile has insufficient coverage.
Differential Revision: https://reviews.llvm.org/D134456
Currently there is a middle-end or backend issue
https://github.com/llvm/llvm-project/issues/58176
which causes values loaded from bool pointer incorrect when
bool range metadata is emitted. Temporarily
disable bool range metadata until the backend issue
is fixed.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D135269
Fixes: SWDEV-344137
This patch moves the emitOffloadingArraysArgument function and
supporting data structures to OpenMPIRBuilder. This will later be used
in flang as well. The TargetDataInfo class was split up into generic
information and clang-specific data, which remain in clang. Further
migration will be done in in the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134662
The `cl-uniform-work-group` attribute asserts that the global work-size
be a multiple of the work-group specified work group size. This should
allow optimizations. It is already present by default in the AMD
compiler and for HIP kernels so it should be safe to allow this for
OpenMP kernels by default.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135374
After generated call for ctor/dtor for entry, global variable for ctor/dtor are useless.
Remove them for non-lib profiles.
Lib profile still need these in case export function used the global variable which require ctor/dtor.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D133993
We use protected visibility for almost everything with offloading. This
is because it provides us with the ability to read things from the host
without the expectation that it will be preempted by a shared library
load, bugs related to this have happened when offloading to the host.
This patch just makes the `exec_mode` global generated for each plugin
have protected visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135285
Details posted here: https://reviews.llvm.org/D119051#3747201
3 cases that were inconsistent with the MSABI without this patch applied:
https://godbolt.org/z/GY48qxh3G - field with protected member
https://godbolt.org/z/Mb1PYhjrP - non-static data member initializer
https://godbolt.org/z/sGvxcEPjo - defaulted copy constructor
I'm not sure what's suitable/sufficient testing for this - I did verify
the three cases above. Though if it helps to add them as explicit tests,
I can do that too.
Also, I was wondering if the other use of isTrivialForAArch64MSVC in
isPermittedToBeHomogenousAggregate could be another source of bugs - I
tried changing the function to unconditionally call
isTrivialFor(AArch64)MSVC without testing AArch64 first, but no tests
fail, so it looks like this is undertested in any case. But I had
trouble figuring out how to exercise this functionality properly to add
test coverage and then compare that to MSVC itself... - I got very
confused/turned around trying to test this, so I've given up enough to
send what I have out for review, but happy to look further into this
with help.
Differential Revision: https://reviews.llvm.org/D133817