The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.
This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.
In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.
- The next change is https://github.com/llvm/llvm-project/pull/87600
[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)
[3] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)
[4] 3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)
This reverts commit ccdebbae4d77d3efc236af92c22941de5d437e01.
Causes test failures in the presence of Android runtime libraries in resource-dir.
See comments on https://github.com/llvm/llvm-project/pull/87866.
IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.
If a loop-nest:
- does not contain a function call, or
- the -fopenmp-assume-no-nested-parallelism has been specified,
- or the call is to an OpenMP API AND
- does not contain nested loop bind(parallel) directives
then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.
Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen
Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.
Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.
Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.
In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.
With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
Avoids the need to linearly re-scan all seen parent nodes to check for
duplicates, which previously caused a slowdown for ancestry checks in
Clang AST matchers.
Fixes: #86881
When writing out a PCM, we compute the set of module maps that did
affect the compilation and we strip the rest to make the output
independent of them. The most common way to read a module map that is
not affecting is with implicit module map search. The other option is to
pass a bunch of unnecessary `-fmodule-map-file=<path>` arguments on the
command-line, in which case the client should probably not give those to
Clang anyway.
This makes serialization of explicit modules faster, mostly due to
reduced file system traffic.
This test just checks for the stdout/stderr of clang, but it
incidentally tries to write to `a.out` in the current directory, which
may be write protected. Typically one would write `clang -o %t.o` for a
writeable dir, but since we only care about stdout/stderr, throw away
the object file and just write to /dev/null instead.
As a followup to my previous commits, this is an implementation of a
single clause, in this case the 'default' clause. This implements all
semantic analysis for it on compute clauses, and continues to leave it
rejected for all others (some as 'doesnt appertain', others as 'not
implemented' as appropriate).
This also implements and tests the TreeTransform as requested in the
previous patch.
This patch moves SYCL-related `Sema` functions into new `SemaSYCL`
class, following the recent example of OpenACC and HLSL. This is a part
of the effort to split `Sema`. Additional context can be found in
#82217, #84184, #87634.
Summary:
The linker wrapper attempts to maintain consistent semantics with
existing host invocations. Static libraries by default only extract if
there are non-weak symbols that remain undefined. However, we have
situations between linkers that put different meanings on ordering. The
ld.bfd linker requires static libraries to be defined after the symbols,
while `ld.lld` relaxes this rule. The linker wrapper went with the
former as it's the easier solution, however this has caused a lot of
issues as I've had to explain this rule to several people, it also make
it difficult to include things like `libc` in the OpenMP runtime because
it would sometimes be linked before or after.
This patch reworks the logic to more or less perform the following logic
for static libraries.
1. Split library / object inputs.
2. Include every object input and record its undefined symbols
3. Repeatedly try to extract static libraries to resolve these
symbols. If a file is extracted we need to check every library
again to resolve any new undefined symbols.
This allows the following to work and will cause fewer issues when
replacing HIP, which does `--whole-archive` so it's very likely the old
logic will regress.
```console
$ clang -lfoo main.c -fopenmp --offload-arch=native
```
There has been an optimization for `SizeOfPackExprs` since c5452ed9, in
which
we overlooked a case where the template arguments were not yet
formed into a `PackExpansionType` at the token annotation stage. This
led to a problem in that a template involving such expressions may
lose its nature of being dependent, causing some false-positive
diagnostics.
Fixes https://github.com/llvm/llvm-project/issues/84220
Fixes https://github.com/llvm/llvm-project/issues/63818 for control flow
out of an expressions.
#### Background
A control flow could happen in the middle of an expression due to
stmt-expr and coroutine suspensions.
Due to branch-in-expr, we missed running cleanups for the temporaries
constructed in the expression before the branch.
Previously, these cleanups were only added as `EHCleanup` during the
expression and as normal expression after the full expression.
Examples of such deferred cleanups include:
`ParenList/InitList`: Cleanups for fields are performed by the
destructor of the object being constructed.
`Array init`: Cleanup for elements of an array is included in the array
cleanup.
`Lifetime-extended temporaries`: reference-binding temporaries in
braced-init are lifetime extended to the parent scope.
`Lambda capture init`: init in the lambda capture list is destroyed by
the lambda object.
---
#### In this PR
In this PR, we change some of the `EHCleanups` cleanups to
`NormalAndEHCleanups` to make sure these are emitted when we see a
branch inside an expression (through statement expressions or coroutine
suspensions).
These are supposed to be deactivated after full expression and destroyed
later as part of the destructor of the aggregate or array being
constructed. To simplify deactivating cleanups, we add two utilities as
well:
* `DeferredDeactivationCleanupStack`: A stack to remember cleanups with
deferred deactivation.
* `CleanupDeactivationScope`: RAII for deactivating cleanups added to
the above stack.
---
#### Deactivating normal cleanups
These were previously `EHCleanups` and not `Normal` and **deactivation**
of **required** `Normal` cleanups had some bugs. These specifically
include deactivating `Normal` cleanups which are not the top of
`EHStack`
[source1](92b56011e6/clang/lib/CodeGen/CGCleanup.cpp (L1319)),
[2](92b56011e6/clang/lib/CodeGen/CGCleanup.cpp (L722-L746)).
This has not been part of our test suite (maybe it was never required
before statement expressions). In this PR, we also fix the emission of
required-deactivated-normal cleanups.
This turns the current `Pointer` class into a discriminated union of
`BlockPointer` and `IntPointer`. The former is what `Pointer` currently
is while the latter is just an integer value and an optional
`Descriptor*`.
The `Pointer` then has type check functions like
`isBlockPointer()`/`isIntegralPointer()`/`asBlockPointer()`/`asIntPointer()`,
which can be used to access its data.
Right now, the `IntPointer` and `BlockPointer` structs do not have any
methods of their own and everything is instead implemented in Pointer
(like it was before) and the functions now just either assert for the
right type or decide what to do based on it.
This also implements bitcasts by decaying the pointer to an integral
pointer.
`test/AST/Interp/const-eval.c` is a new test testing all kinds of stuff
related to this. It still has a few tests `#ifdef`-ed out but that
mostly depends on other unimplemented things like
`__builtin_constant_p`.
The compiler doesn't know in advance if the streaming and non-streaming
vector-lengths are different, so it should be safe to give a warning
diagnostic to warn the user about possible undefined behaviour. If the
user knows the vector lengths are equal, they can disable the warning
separately.
This relands #87149.
The previous commit exposed failures on some targets. The reason is only
a few targets support COFF ObjectFormatType on Windows:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/TargetParser/Triple.cpp#L835-L842
With #87149, the targets don't support COFF will report "warning:
argument unused during compilation: '-gcodeview-command-line'
[-Wunused-command-line-argument]" in the test gcodeview-command-line.c
This patch limits gcodeview-command-line.c only run on targets support
COFF.
This patch takes advantage of a recent NFC change that refactored
`EvaluateBinaryTypeTrait()` to accept `TypeSourceInfo` instead of
`QualType` c7db450e5c1a83ea768765dcdedfd50f3358d418.
Before:
```
test2.cpp:105:55: error: variable length arrays are not supported in '__is_layout_compatible'
105 | static_assert(!__is_layout_compatible(int[n], int[n]));
| ^
test2.cpp:125:76: error: incomplete type 'CStructIncomplete' where a complete type is required
125 | static_assert(__is_layout_compatible(CStructIncomplete, CStructIncomplete));
| ^
```
After:
```
test2.cpp:105:41: error: variable length arrays are not supported in '__is_layout_compatible'
105 | static_assert(!__is_layout_compatible(int[n], int[n]));
| ^
test2.cpp:125:40: error: incomplete type 'CStructIncomplete' where a complete type is required
125 | static_assert(__is_layout_compatible(CStructIncomplete, CStructIncomplete));
| ^
```
```
typedef long long t67 __attribute__((aligned (4)));
struct s67 {
int a;
t67 b;
};
void f67(struct s67 x) {
}
```
When classify:
a: Lo = Integer, Hi = NoClass
b: Lo = Integer, Hi = NoClass
struct S: Lo = Integer, Hi = NoClass
```
define dso_local void @f67(i64 %x.coerce) {
```
In this case, only one i64 register is used when the structure parameter
is transferred, which is obviously incorrect.So we need to treat the
split case specially. fix
https://github.com/llvm/llvm-project/issues/85387.
MSVC doesn't support generating __vectorcall calls in Arm64EC mode, but
it does treat it as a distinct type. The Microsoft STL depends on this
functionality. (Not sure if this is intentional.) Add support for
parsing the same way as MSVC, and add some checks to ensure we don't try
to actually generate code.
The error handling in CodeGen is ugly, but I can't think of a better way
to do it.
A few verification checks need to happen until all AST's have been
traversed, specifically for zippered framework checking. To keep source
location until that time valid, hold onto to references of
FrontendRecords + SourceManager.
Summary:
These entires are generic for offloading with the new driver now. Having
the `omp` prefix was a historical artifact and is confusing when used
for CUDA. This patch just renames them for now, future patches will
rework the binary format to make it more common.
This reverts commit 407a2f23 which stopped propagating the callback to module compiles, effectively disabling dependency directive scanning for all modular dependencies. Also added a regression test.
Compiler options recognizable to ccc-analyzer are stored in maps. An
option missing in the map will be dropped by ccc-analyzer. This causes a
build error in one of our projects that only happens in scan-build but
not regular build, because ccc-analyzer do not recognize `-nostdlibinc`.
This commit adds the option to the map.
rdar://126082053
Follow-up to discussion: https://github.com/llvm/llvm-project/pull/87761
As discussed after landing the original PR:
Since fails could happen w.r.t. checking `!6`, these checks should be
removed.
- Add '--' argument to prevent interpreting intput files as options
starting with '/'. Fix test failure after
2921a0928c71f4ee652a2478283e47ab5ffebf58.
This diagnostic is one of the ones that GCC also does not have a
warning group for, but a user requested adding a group to control
selectively turning off this diagnostic. So this adds the diagnostic
to a new group, -Wtentative-definition-array
Fixes#87766