Debug info correlation is an option in InstrProfiling pass, which is used by
both IR instrumentation and front-end instrumentation. So, Clang coverage can
also benefits the binary size saving from it.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D157913
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.
However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).
The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Currently, clang emits LLVM IR that fails verifier for the following
code:
```
template<typename T>
__global__ void foo(T x);
void bar() {
foo<<<1, 1>>>(0);
}
```
This is due to clang putting the kernel handle for foo into comdat,
which is not allowed, since the kernel handle is a declaration.
The siutation is similar to calling a declaration-only template
function. The callee will be a declaration in LLVM IR and won't be put
into comdat. This is in contrast to calling a template function with
body, which will be put into comdat.
Fixes: SWDEV-419769
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.
Depends on D147217, D147218 and D158278.
Differential Revision: https://reviews.llvm.org/D147219
This patch updates the `OpenMPIRBuilderConfig` structure to hold all
available 'requires' clauses, and it replicates part of the code
generation for the 'requires' registration function from clang in the
`OMPIRBuilder`, to be used with flang.
Porting the rest of features of the clang implementation to the IRBuilder
and sharing it between clang and flang remains for a future patch, due to the
complexity of the logic selecting the attributes of the generated
registration function.
Differential Revision: https://reviews.llvm.org/D147217
MSVC allows users to pass structures with required alignments greater
than 4 to variadic functions. It does not pass them indirectly to
correctly align them. Instead, it passes them directly with the usual 4
byte stack alignment.
This change implements the same logic in clang on the passing side. The
receiving side (va_arg) never implemented any of this indirect logic, so
it doesn't need to be updated.
This issue pre-existed, but @aaron.ballman noticed it when we started
passing structs containing aligned fields indirectly in D152752.
The new ACLE PR#225[1] now combines the slice parameters for some
builtins.
Slice specifies the ZA slice number directly and needs to be explicity
implemented by the "user" with the base register plus the immediate
offset
[1]https://github.com/ARM-software/acle/pull/225/files
Summary:
Currently, there is an assertion that prevents us from emitting an
AMDGPU global with a non-target specific address space (i.e. numerical
attribute). I'm unsure what the original intentions of this assertion
were, but we should be able to use OpenCL address spaces when compiling
directly to AMDGPU from C++. This is permitted on NVPTX so I'm unsure
what this assertion is guarding. The patch simply removes the assertion
and adds a test to ensure that these emit the expected address spaces.
Fixes https://github.com/llvm/llvm-project/issues/65069
Summary:
We use the `llvm.amgcn.abi.version` varaible to control code generation.
This is emitted in every module now to indicate what should be used when
compiling. Previously, the logic caused us to emit an external reference
to this variable when creating the code for the `none` type. This would
then cause us not to emit the actual definition. This patch refines the
logic to create the external reference, and then update it if it is
found unset by the time we emit the global. I had to remove the
reference to `GetOrCreateLLVmGlobal` because it did not accept the
proper address space.
This reverts commit c6a33ff49dfb3498dae15c718820ea3d9c19f3cb. Makes
clang segfault.
// clang t.cc
class a;
class c {
public:
[[clang::annotate("")]] c(const c *) {}
};
class d {
d(const c *, a *, a *);
c e;
};
d::d(const c *f, a *, a *) : e(f) {}
Previously, annotations were only emitted for function definitions. With
this change annotations are also emitted for declarations. Also, emitting
function annotations is now deferred until the end so that the most
up to date declaration is used which will have any inherited annotations.
Differential Revision: https://reviews.llvm.org/D156172/new/
Fix for an issue where clang was not adding the address space according
to the data layout, instead was using the default which resulted in a
crash at times. The fix includes changes to the cases of
LargeCapMemAlloc and CGroupMemAlloc where we are setting the AddrSpace
according to the DataLayout.
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. This patch is the #2 of 3 patches to update the interface.
Slice specifies the ZA slice number directly and needs to be explicity
implemented by the "user" with the base register plus the immediate
offset
[1]https://github.com/ARM-software/acle/pull/225/files
Continuing the discussion in
https://discourse.llvm.org/t/codegen-layout-of-si-class-type-info-doesnt-match-the-actual-size/73274
Before we had this code:
@_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
now we'll produce:
@_ZTVN10__cxxabiv117__class_type_infoE = external global [0 x ptr]
This is because we may not know the exact size of this data, and clang
issues gep inbounds with idx=2. Before, that gep would always result in
poison.
This reapplies ddbcc10b9e26b18f6a70e23d0611b9da75ffa52f, except for a tiny part that was reverted separately: 65331da0032ab4253a4bc0ddcb2da67664bd86a9. That will be reapplied later on, since it turned out to be more involved.
This commit is enabled by 5523fefb01c282c4cbcaf6314a9aaf658c6c145f and f0f548a65a215c450d956dbcedb03656449705b9, specifically the part that makes 'clang-tidy/checkers/misc/header-include-cycle.cpp' separator agnostic.
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
Buitlins from AMD's device-libs are compiled without specifying a
target-cpu, which results in builtins without the target-features
attribute set.
Before this patch, when linking this builtins with -mlink-builtin-bitcode
the target-features were not propagated in the incoming builtins.
With this patch, the default target features are propagated
if they are compatible with the target-features in the incoming builtin.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159206
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
This commit replaces some calls to the deprecated `FileEntry::getName()` with `FileEntryRef::getName()` by swapping current usages of `SourceManager::getFileEntryForID()` with `SourceManager::getFileEntryRefForID()`. This lowers the number of usages of the deprecated `FileEntry::getName()` from 95 to 50.
The goal of this change is to clean up some of the code surrounding
HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up
a bunch of assumptions and inconsistencies around how the type of
`this` is handled through the AST and code generation.
This change is be mostly NFC for HLSL, and completely NFC for other
language modes.
This change introduces a new member to query for the this object's type
and seeks to clarify the normal usages of the this type.
With the introudction of HLSL to clang, CXXThisExpr may now be an
l-value and behave like a reference type rather than C++'s normal
method of it being an r-value of pointer type.
With this change there are now three ways in which a caller might need
to query the type of `this`:
* The type of the `CXXThisExpr`
* The type of the object `this` referrs to
* The type of the implicit (or explicit) `this` argument
This change codifies those three ways you may need to query
respectively as:
* CXXMethodDecl::getThisType()
* CXXMethodDecl::getThisObjectType()
* CXXMethodDecl::getThisArgType()
This change then revisits all uses of `getThisType()`, and in cases
where the only use was to resolve the pointee type, it replaces the
call with `getThisObjectType()`. In other cases it evaluates whether
the desired returned type is the type of the `this` expr, or the type
of the `this` function argument. The `this` expr type is used for
creating additional expr AST nodes and for member lookup, while the
argument type is used mostly for code generation.
Additionally some cases that used `getThisType` in simple queries could
be substituted for `getThisObjectType`. Since `getThisType` is
implemented in terms of `getThisObjectType` calling the later should be
more efficient if the former isn't needed.
Reviewed By: aaron.ballman, bogner
Differential Revision: https://reviews.llvm.org/D159247
Some flags named "IsLTO" and "IsThinLTO" implied they described
compilation modes, but with Unified LTO this is no longer true. Rename
these to "PrepForXXX" to be less confusing to readers. Also, deleted
"IsThinOrUnifiedLTO" because Unified implies PrepareForThinLTO.
In GCC, the .refptr stubs are only generated for x86_64, and only
for code models medium and larger (and medium is the default for
x86_64 since this was introduced). They can be omitted for
projects that are conscious about performance and size, and don't
need automatically importing dll data members, by passing -mcmodel=small.
In Clang/LLVM, such .refptr stubs are generated for any potentially
symbol reference that might end up autoimported. The .refptr stubs
are emitted for three separate reasons:
- Without .refptr stubs, undefined symbols are mostly referenced
with 32 bit wide relocations. If the symbol ends up autoimported
from a different DLL, a 32 bit relative offset might not be
enough to reference data in a different DLL, depending on runtime
loader layout.
- Without .refptr stubs, the runtime pseudo relocation mechanism
will need to temporarily make sections read-write-executable
if there are such relocations in the text section
- On ARM and AArch64, the immediate addressing encoded into
instructions isn't in the form of a plain 32 bit relative offset,
but is expressed with various bits scattered throughout two
instructions - the mingw runtime pseudo relocation mechanism
doesn't support updating offsets in that form.
If autoimporting is known not to be needed, the user can now
compile with -fno-auto-import, avoiding the extra overhead of
the .refptr stubs.
However, omitting them is potentially fragile as the code
might still rely on automatically importing some symbol without
the developer knowing. If this happens, linking still usually
will succeed, but users may encounter issues at runtime.
Therefore, if the new option -fno-auto-import is passed to the compiler
when driving linking, it passes the flag --disable-auto-import to
the linker, making sure that no symbols actually are autoimported
when the generated code doesn't expect it.
Differential Revision: https://reviews.llvm.org/D61670
For vector * scalar + vector, we emit `fmuladd` directly from clang.
This enables it also for matrix * scalar + matrix.
rdar://113967122
Differential Revision: https://reviews.llvm.org/D158883
Summary:
There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.
Reviewed by: hubert.reinterpretcast, DiggerLin
Differential Revision: https://reviews.llvm.org/D158739
There were 3 definitions of the mergeDefaultFunctionDefinitionAttributes
function: A private implementation, a version exposed in CodeGen, a
version exposed in CodeGenModule.
This patch removes the private and the CodeGenModule versions and keeps
a single definition in CodeGen.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D159256
For function multi-versioning using the target or target_clones
function attributes, currently we incorrectly set comdat for internal
linkage resolvers. This is problematic for ELF linkers
as GRP_COMDAT deduplication will kick in even with STB_LOCAL signature
(https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc
"GRP_COMDAT group with STB_LOCAL signature").
In short, two `__attribute((target_clones(...))) static void foo()`
in two translation units will be deduplicated. Fix this.
Fix#65114
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D158963
The motivation for this patch is that many code bases use exception handling. As GPUs are not expected to support exception handling in the near future, we can experiment with compiling the code for GPU targets anyway. This will
allow us to run the code, as long as no exception is thrown.
The overall idea is very simple:
- If a throw expression is compiled to AMDGCN or NVPTX, it is replaced with a trap during code generation.
- If a try/catch statement is compiled to AMDGCN or NVPTX, we generate code for the try statement as if it were a basic block.
With this patch, the compilation of the following example
```
int gaussian_sum(int a,int b){
if ((a + b) % 2 == 0) {throw -1;};
return (a+b) * ((a+b)/2);
}
int main(void) {
int gauss = 0;
#pragma omp target map(from:gauss)
{
try {
gauss = gaussian_sum(1,100);
}
catch (int e){
gauss = e;
}
}
std::cout << "GaussianSum(1,100)="<<gauss<<std::endl;
#pragma omp target map(from:gauss)
{
try {
gauss = gaussian_sum(1,101);
}
catch (int e){
gauss = e;
}
}
std::cout << "GaussianSum(1,101)="<<gauss<<std::endl;
return (gauss > 1) ? 0 : 1;
}
```
with offloading to `gfx906` results in
```
./bin/target_try_minimal_fail
GaussianSum(1,100)=5050
AMDGPU fatal error 1: Received error in queue 0x155555506000: HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception.
zsh: abort (core dumped)
```
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D153924
This patch deletes the unused `addDefaultFunctionDefinitionAttributes(llvm::Function);` function,
while it still keeps `void addDefaultFunctionDefinitionAttributes(llvm::AttrBuilder &attrs);` which is being used.
Differential Revision: https://reviews.llvm.org/D158990
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.
Differential Revision: https://reviews.llvm.org/D158016