`GetOmpObjectList` takes a clause, and returns the pointer to the
contained OmpObjectList, or nullptr if the clause does not contain one.
Some clauses with object lists were not recognized: handle all such clauses,
and move the implementation to flang/Parser/openmp-utils.cpp.
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.
- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
the mathematical identity `z^w = exp(w * log(z))`
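The identity in the last item can be checked numerically outside MLIR. The sketch below uses `std::complex` from the C++ standard library rather than the `complex` dialect; `powViaExpLog` and `approxEqual` are hypothetical helper names, and the principal branch of the logarithm is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Computes z^w through the identity z^w = exp(w * log(z)), using the
// principal branch of the complex logarithm.
// Assumption of this sketch: z == 0 is rejected instead of special-cased.
std::complex<double> powViaExpLog(std::complex<double> z,
                                  std::complex<double> w) {
  assert(z != std::complex<double>(0.0, 0.0) && "log(0) is not finite");
  return std::exp(w * std::log(z));
}

// Returns true when a and b differ by at most tol in magnitude.
bool approxEqual(std::complex<double> a, std::complex<double> b,
                 double tol = 1e-9) {
  return std::abs(a - b) <= tol;
}
```

For example, `powViaExpLog({1.0, 2.0}, {2.0, 0.0})` should agree with `(1 + 2i)^2 = -3 + 4i` up to rounding, which is why small integer powers are worth special-casing separately.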
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657
However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:
`if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")`
which becomes:
`if (AIX MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`
You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
This patch is part of the upstreaming effort for supporting SYCL
language front end.
It makes the following changes:
1. Adds sycl_external attribute for functions with external linkage,
which is intended for use to implement the SYCL_EXTERNAL macro as
specified by the SYCL 2020 specification
2. Adds checks to avoid emitting device code when sycl_external and
sycl_kernel_entry_point attributes are not enabled
3. Fixes test failures caused by the above changes
This patch does not yet emit diagnostics for the following restrictions
listed in section 5.10.1 of the SYCL 2020 specification; they will be
addressed in a subsequent PR:
Functions that are declared using SYCL_EXTERNAL have the following
additional restrictions beyond those imposed on other device functions:
1. If the SYCL backend does not support the generic address space then
the function cannot use raw pointers as parameter or return types.
Explicit pointer classes must be used instead;
2. The function cannot call group::parallel_for_work_item;
3. The function cannot be called from a parallel_for_work_group scope.
In addition to that, the subsequent PR will also implement diagnostics
for inline functions including virtual functions defined as inline.
---------
Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
The new builtin `__builtin_dedup_pack` removes duplicates from a list of
types.
The added builtin is special in that it produces an unexpanded pack,
in the spirit of the P3115R0 proposal.
Produced packs can be used directly in template argument lists and are
expanded as soon as the results of the computation are
available.
This makes such builtins easy to combine, e.g.:
```cpp
template <class ...T>
struct Normalize {
  // Note: sort is not included in this PR, it illustrates the idea.
  using result = std::tuple<
    __builtin_sort_pack<
      __builtin_dedup_pack<int, double, T...>...
    >...>;
};
```
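For comparison, the same deduplication can be written today as ordinary library code, at the cost of recursive template instantiations. The sketch below is not how the builtin is implemented; `DedupImpl` and `DedupTuple` are hypothetical names, and the result is wrapped in `std::tuple` because library code cannot produce a bare pack.

```cpp
#include <tuple>
#include <type_traits>

// Accumulates the types seen so far in a std::tuple and skips any type
// that is already present.
template <class Seen, class... Rest>
struct DedupImpl;

// No types left: the accumulated tuple is the result.
template <class... Seen>
struct DedupImpl<std::tuple<Seen...>> {
  using type = std::tuple<Seen...>;
};

// Keep Head only if it does not match any type already in Seen.
template <class... Seen, class Head, class... Tail>
struct DedupImpl<std::tuple<Seen...>, Head, Tail...> {
  using type = typename std::conditional_t<
      (std::is_same_v<Head, Seen> || ...),
      DedupImpl<std::tuple<Seen...>, Tail...>,
      DedupImpl<std::tuple<Seen..., Head>, Tail...>>::type;
};

template <class... Ts>
using DedupTuple = typename DedupImpl<std::tuple<>, Ts...>::type;

// First occurrences are kept, in their original order.
static_assert(std::is_same_v<DedupTuple<int, double, int, char, double>,
                             std::tuple<int, double, char>>);
```

A builtin avoids both the instantiation depth and the `std::tuple` wrapper, which is what lets the produced pack be expanded directly in a template argument list.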
Limitations:
- only supported in template arguments and bases,
- can only be used inside templates, even if non-dependent,
- the builtins cannot be assigned to template template parameters.
The actual implementation proceeds as follows:
- When the compiler encounters a `__builtin_dedup_pack` or other
type-producing
builtin with dependent arguments, it creates a dependent
`TemplateSpecializationType`.
- During substitution, if the template arguments are non-dependent, we
produce a new type, `SubstBuiltinTemplatePackType`, which stores
an argument pack that needs to be substituted. This type is similar to
the existing `SubstTemplateParmPack` in that it carries the argument
pack that needs to be expanded further. The relevant code is shared.
- On top of that, Clang also wraps the resulting type into
`TemplateSpecializationType`, but this time only as a sugar.
- To actually expand those packs, we collect the produced
`SubstBuiltinTemplatePackType` inside `CollectUnexpandedPacks`.
Because we know the size of the produced packs only after the initial
substitution, places that do the actual expansion need a
second run over the substituted type to finalize the expansions (in
this patch we only support this for template arguments, see
`ExpandTemplateArgument`).
If an expansion is requested in a place we do not currently
support, we produce an error.
More follow-up work will be needed to fully shape this:
- adding the builtin that sorts types,
- removing the restrictions on expansions,
- implementing P3115R0 (scheduled for C++29, see
https://github.com/cplusplus/papers/issues/2300).
This function tries to look for (seteq (and (reduce_or), mask), 0). If
the mask is a sign bit, InstCombine will have turned it into (setgt
(reduce_or), -1). We should handle that case too.
I'm looking into adding the same canonicalization to SimplifySetCC and
this change is needed to prevent test regressions.
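The equivalence between the two forms can be checked on plain integers. In the sketch below, `x` stands in for the result of the `reduce_or`, a 32-bit width is assumed, and both function names are hypothetical.

```cpp
#include <cstdint>

// Form matched before: (seteq (and x, signmask), 0).
constexpr bool signBitClearViaMask(std::int32_t x) {
  return (static_cast<std::uint32_t>(x) & 0x80000000u) == 0;
}

// Form produced once the sign-bit test is canonicalized: (setgt x, -1).
constexpr bool signBitClearViaCompare(std::int32_t x) { return x > -1; }

// Both forms ask the same question: is the sign bit of x clear?
static_assert(signBitClearViaMask(7) == signBitClearViaCompare(7));
static_assert(signBitClearViaMask(-7) == signBitClearViaCompare(-7));
```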
Blocks without a terminator are not handled correctly by
`Block::without_terminator`: the last operation is excluded, even when
it is not a terminator. With this commit, only terminators are excluded.
If the last operation is unregistered, it is included for safety.
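The intended behaviour can be modelled with a toy block. This is a sketch under stated assumptions, not MLIR code: `ToyOperation`, `OpKind`, and `withoutTerminatorCount` are hypothetical, and a block is just a vector of operations.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy classification of an operation (hypothetical, not an MLIR type).
enum class OpKind { Regular, Terminator, Unregistered };

struct ToyOperation {
  std::string name;
  OpKind kind;
};

// Number of leading operations that a "without terminator" range covers.
// Only a known terminator in last position is excluded; a regular or
// unregistered last operation stays in the range.
std::size_t withoutTerminatorCount(const std::vector<ToyOperation>& block) {
  if (block.empty()) return 0;
  const bool endsWithTerminator = block.back().kind == OpKind::Terminator;
  return endsWithTerminator ? block.size() - 1 : block.size();
}
```

With the previous behaviour described above, a block ending in a regular operation would have lost that operation from the range; here it is kept.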
Transfer all casts by kind as we currently do implicit casts. This
obviates the need for specific handling of static casts.
Also transfer CK_BaseToDerived and CK_DerivedToBase and add tests for
these and missing tests for already-handled cast types.
Ensure that CK_BaseToDerived casts result in modeling of the fields of the derived class.
When either one of the operands is all ones in high or low parts,
splitting these opens up other opportunities for combines. One of two
new instructions will either be removed or become a simple copy.
Add a parameter to the file lock API to allow exclusive file locks. Both Unix
and Windows support locking a file exclusively for writing by one process,
and LLVM OnDiskCAS uses an exclusive file lock to coordinate CAS creation.
I have seen misuse of the `hasEffect` API in downstream projects: users
sometimes think that `hasEffect == false` indicates that the operation
does not have a certain memory effect. That's not necessarily the case.
When the op does not implement the `MemoryEffectsOpInterface`, it is
unknown whether it has the specified effect. "false" can also mean
"maybe".
This commit clarifies the semantics in the documentation. It also adds
`hasUnknownEffects` and `mightHaveEffect` convenience functions and
simplifies a few call sites.
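The three-valued nature of the question can be illustrated with a toy model. The sketch below does not use the MLIR API: `ToyOp`, `Effect`, and the `toy*` functions are hypothetical, and an empty `std::optional` stands in for an op that does not implement `MemoryEffectsOpInterface`.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

enum class Effect { Read, Write, Allocate, Free };

// std::nullopt models an op whose effects are unknown; an engaged but empty
// vector models an op that is known to have no effects.
struct ToyOp {
  std::optional<std::vector<Effect>> effects;
};

// "Known to have the effect". Returns false for unknown ops as well, so a
// false result can mean "maybe".
bool toyHasEffect(const ToyOp& op, Effect e) {
  if (!op.effects) return false;
  return std::find(op.effects->begin(), op.effects->end(), e) !=
         op.effects->end();
}

// True when nothing is known about the op's effects.
bool toyHasUnknownEffects(const ToyOp& op) { return !op.effects.has_value(); }

// Conservative query: true unless the effect is known to be absent.
bool toyMightHaveEffect(const ToyOp& op, Effect e) {
  return toyHasUnknownEffects(op) || toyHasEffect(op, e);
}
```

A transformation that is only valid when an op does not write memory would need the negation of the conservative query, not the negation of `toyHasEffect`.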
These invariants are always expected to hold; however, it's not always
clear that they do. Adding explicit checks for these invariants inside
non-trivial functions of basic_streambuf makes that clear.
A number of recipes compute costs for the same opcodes for scalars or
vectors, depending on the recipe.
Move the common logic out to a helper in VPRecipeWithIRFlags, that is
then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction.
This makes it easier to cover all relevant opcodes, without duplication.
PR: https://github.com/llvm/llvm-project/pull/153361
The standard is ambiguous, but we can only support
arrays/array-sections/etc of the composite type, so make sure we enforce
the rule that way. This will better support how we need to do lowering.
Enable gfx942 for tests that are affected by an AMDGPU bitcast
constant combine (#154115).
More tests are expected to be affected in the aforementioned PR once it
is rebased on top of this PR.
Recreate this patch from
https://github.com/llvm/llvm-project/pull/153894
It seems ISel silently ignores the `i64 = zext i16` with a chained
`reg_sequence` pattern, which causes a selection failure in a
HIP test. This recreates the patch with an alternative pattern and adds an
.ll test, global-extload-gfx11plus.ll.
A call through a function pointer has no associated FunctionDecl, but it
still might have a nodiscard return type. Ensure there is a codepath to
emit the nodiscard warning in this case.
Fixes #142453
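A minimal sketch of the situation described above, with hypothetical names (`Status`, `compute`, `ComputeFn`, `runThroughPointer`): the callee is reached only through a pointer, so the only place the `nodiscard` information lives is the return type.

```cpp
// A return type marked [[nodiscard]]: ignoring a value of this type is
// meant to be diagnosed.
struct [[nodiscard]] Status {
  int code;
};

Status compute() { return Status{0}; }

using ComputeFn = Status (*)();

int runThroughPointer(ComputeFn fn) {
  // Writing `fn();` here would discard a Status through a function pointer.
  // There is no FunctionDecl for the callee at this call site, only the
  // nodiscard return type.
  Status s = fn();  // using the result keeps this sketch warning-free
  return s.code;
}
```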
Building with the assertions flag (`-sASSERTIONS=2`) produces these errors:
```
[ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_3.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_3.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
[ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_3.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_3.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
[ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_23.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_23.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
[ RUN ] InterpreterTest.Value Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_36.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_36.wasm
[ RUN ] InterpreterTest.Value Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_36.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_Z9setGlobali'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_36.wasm
```
**So we have some symbols missing here that are needed by the side
modules being created here.**
The first 2 are needed by both tests.
The last 3 are needed for the following lines, respectively, in the Value test:
- dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L355)
- dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L364)
- dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L365)
Everything should work as expected after this change:
```
[----------] 9 tests from InterpreterTest
[ RUN ] InterpreterTest.Sanity
[ OK ] InterpreterTest.Sanity (18 ms)
[ RUN ] InterpreterTest.IncrementalInputTopLevelDecls
[ OK ] InterpreterTest.IncrementalInputTopLevelDecls (45 ms)
[ RUN ] InterpreterTest.Errors
[ OK ] InterpreterTest.Errors (29 ms)
[ RUN ] InterpreterTest.DeclsAndStatements
[ OK ] InterpreterTest.DeclsAndStatements (34 ms)
[ RUN ] InterpreterTest.UndoCommand
/Users/anutosh491/work/llvm-project/clang/unittests/Interpreter/InterpreterTest.cpp:156: Skipped
Test fails for Emscipten builds
[ SKIPPED ] InterpreterTest.UndoCommand (0 ms)
[ RUN ] InterpreterTest.FindMangledNameSymbol
[ OK ] InterpreterTest.FindMangledNameSymbol (85 ms)
[ RUN ] InterpreterTest.InstantiateTemplate
[ OK ] InterpreterTest.InstantiateTemplate (127 ms)
[ RUN ] InterpreterTest.Value
[ OK ] InterpreterTest.Value (608 ms)
[ RUN ] InterpreterTest.TranslationUnit_CanonicalDecl
[ OK ] InterpreterTest.TranslationUnit_CanonicalDecl (64 ms)
[----------] 9 tests from InterpreterTest (1014 ms total)
```
This is similar to how we need to take care of some symbols when
building side modules while running cppinterop's test suite.
This helps some Windows users. Here is the documentation:
**LLVM_LIT_TOOLS_DIR**:PATH
The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to
the empty string, in which case lit will look for tools needed for tests
(e.g. ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not
in your ``%PATH%``, then you can set this variable to the GnuWin32
directory so that lit can find tools needed for tests in that directory.
In setVectorizedCallDecision we attempt to calculate the scalar costs
for vectorisation calls, even for scalable VFs where we already know the
answer is Invalid. We can avoid doing unnecessary work by skipping this
completely for scalable vectors.
64-bit version of 7425af4b7aaa31da10bd1bc7996d3bb212c79d88. We
still need to lower to 32-bit `v_accvgpr_write_b32` instructions, so this
has a unique value restriction that requires both halves of the constant
to be 32-bit inline immediates. This only introduces the new
pseudo definitions, but doesn't try to use them yet.
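The "both halves" restriction can be sketched as a predicate on a 64-bit constant. This is a simplified model, not the backend's check: only the integer inline range `[-16, 64]` is considered, floating-point inline constants are ignored, and both function names are hypothetical.

```cpp
#include <cstdint>

// Simplifying assumption: only the integer inline-immediate range is
// modelled; floating-point inline constants are not handled here.
constexpr bool isIntInlineImm32(std::int32_t v) { return v >= -16 && v <= 64; }

// True when the low and high 32-bit halves of imm could each be written
// with a separate 32-bit inline immediate.
constexpr bool bothHalvesInline(std::uint64_t imm) {
  const auto lo = static_cast<std::int32_t>(static_cast<std::uint32_t>(imm));
  const auto hi =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(imm >> 32));
  return isIntInlineImm32(lo) && isIntInlineImm32(hi);
}
```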
This patch adds support for atomic loads and stores. Specifically, it
adds support for the following intrinsic calls:
- `__atomic_load` and `__atomic_store`;
- `__c11_atomic_load` and `__c11_atomic_store`.
LoopPeel currently considers PHI nodes that become loop invariants
through peeling. However, in some cases, peeling transforms PHI nodes
into induction variables (IVs), potentially enabling further
optimizations such as loop vectorization. For example:
```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
a[i] = b[i] + b[im];
im = i;
}
```
In this case, peeling one iteration converts `im` into an IV, allowing
it to be handled by the loop vectorizer.
This patch adds a new feature that peels loops when doing so converts PHIs
into IVs. At the moment this feature is disabled by default.
Enabling it allows the above example to be vectorized. I measured a
speedup of more than 60% on neoverse-v2 (options: `-O3
-ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`).
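The effect of peeling on the s292 loop can be written out by hand. The sketch below uses `std::vector<float>` and hypothetical function names, and assumes `a` and `b` have the same length; it shows the shape of the transformed loop, not the pass's actual output.

```cpp
#include <cstddef>
#include <vector>

// Original shape: `im` is a PHI that is n-1 on the first iteration and
// i-1 on every later one.
void s292Original(std::vector<float>& a, const std::vector<float>& b) {
  const std::size_t n = a.size();
  if (n == 0) return;
  std::size_t im = n - 1;
  for (std::size_t i = 0; i < n; ++i) {
    a[i] = b[i] + b[im];
    im = i;
  }
}

// After peeling one iteration by hand: inside the remaining loop, `im` is
// always i-1, i.e. an induction variable.
void s292Peeled(std::vector<float>& a, const std::vector<float>& b) {
  const std::size_t n = a.size();
  if (n == 0) return;
  a[0] = b[0] + b[n - 1];  // peeled first iteration
  for (std::size_t i = 1; i < n; ++i) {
    a[i] = b[i] + b[i - 1];
  }
}
```

The second loop has no loop-carried scalar other than `i`, which is the form a loop vectorizer can handle directly.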
This PR is taken over from #94900
Related #81851
SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write
* a single component (`X`), or
* two components (`XY`),
and fold them into the wider native variants:
```
X + X --> XY
X + X + X + X --> XYZW
XY + XY --> XYZW
X + X + X --> XYZ
XY + X --> XYZ
```
The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
This is an attempt to reland #151660 by including a missing STL header
found by a buildbot failure.
The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces opt-in
lazy loading support for the stable function map. The detailed changes
are:
- `StableFunctionMap`
  - The map now stores entries in an `EntryStorage` struct, which includes
    offsets for serialized entries and a `std::once_flag` for thread-safe
    lazy loading.
  - The underlying map type is changed from `DenseMap` to
    `std::unordered_map` for compatibility with `std::once_flag`.
  - `contains()`, `size()` and `at()` are implemented to only load
    requested entries on demand.
- Lazy Loading Mechanism
  - When reading indexed codegen data, if the newly-introduced
    `-indexed-codegen-data-lazy-loading` flag is set, the stable function
    map is not fully deserialized up front. The binary format for the stable
    function map now includes offsets and sizes to support lazy loading.
  - The safety of lazy loading is guarded by the once flag per function
    hash. This guarantees that even in a multi-threaded environment, the
    deserialization for a given function hash will happen exactly once. The
    first thread to request it performs the load, and subsequent threads
    will wait for it to complete before using the data. For single-threaded
    builds, the overhead is negligible (a single check on the once flag).
    For multi-threaded scenarios, users can omit the flag to retain the
    previous eager-loading behavior.
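The per-entry once-flag pattern described above can be sketched in isolation. Everything here is hypothetical (`LazyEntry`, `LazyFunctionMap`, the fake "deserialization" that builds a string from an offset); it only illustrates why a node-based map pairs well with `std::once_flag`, which can be neither copied nor moved.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// One map entry: where the serialized data lives, a flag guarding the
// one-time load, and the loaded payload.
struct LazyEntry {
  std::size_t offset = 0;
  std::once_flag loaded;  // not copyable, not movable
  std::string payload;
};

class LazyFunctionMap {
 public:
  // Registers an entry without loading it. operator[] constructs the entry
  // in place inside a map node, so the once_flag never has to move.
  void addSerialized(std::uint64_t hash, std::size_t offset) {
    entries_[hash].offset = offset;
  }

  bool contains(std::uint64_t hash) const { return entries_.count(hash) != 0; }
  std::size_t size() const { return entries_.size(); }

  // Loads the entry on first access. Concurrent callers for the same hash
  // block in call_once until the first load has finished. The sketch assumes
  // no entries are added while readers are active.
  const std::string& at(std::uint64_t hash) {
    LazyEntry& entry = entries_.at(hash);
    std::call_once(entry.loaded, [this, &entry] {
      entry.payload = "entry@" + std::to_string(entry.offset);  // stand-in load
      ++loads_;
    });
    return entry.payload;
  }

  int loads() const { return loads_.load(); }

 private:
  std::unordered_map<std::uint64_t, LazyEntry> entries_;
  std::atomic<int> loads_{0};
};
```

Calling `at()` twice for the same hash performs the stand-in load once; `contains()` and `size()` never trigger a load.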
When testing LLDB, we want to make sure to use the same Python as the
one we used to build it.
We already did this in https://github.com/llvm/llvm-project/pull/143183
for the Unit and Shell tests. This patch does the same thing for the API
tests as well.
This patch reworks how VG is handled around streaming mode changes.
Previously, for functions with streaming mode changes, we would:
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
mode changes
Additionally, for locally streaming functions, we would:
- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
around streaming mode changes
In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.
So the new scheme in this patch is to:
In functions with streaming mode changes (including locally streaming functions):
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated
  - This will be in the function epilogue, where VG is always the same as
    the entry VG
- Explicitly reference the incoming VG expressions for SVE callee-saves
in functions with streaming mode changes
- Ensure the CFA is not described in terms of VG in functions with
streaming mode changes
A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b
But the TLDR is that following this scheme, SME unwinding can be
implemented with minimal changes to existing unwinders. All unwinders
need to do is initialize VG to `CNTD` at the start of unwinding, then
everything else is handled by standard opcodes (which don't need changes
to handle VG).
Fixes #150163
MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes during deserialization.
The patch adds a `StringMap` cache that records attributes and fetches
them when encountered again.
This patch implements pages 15-17 from
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
Documentation:
jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
Summary:
It's extremely common to conditionally blend two vectors. Previously
this was done with mask registers, which is what the normal ternary code
generation does when used on a vector. However, since Clang 15 we have
supported boolean vector types in the compiler. These are useful in
general for checking the mask registers, but are currently limited
because they do not map to an LLVM-IR select instruction.
This patch simply relaxes these checks, which are technically forbidden
by the OpenCL standard. However, general vector support should be able to
handle these. We already support this for Arm SVE types, so this should
make the behavior more consistent with the clang vector type.
Fix the `sys.path` logic in the GDB plugin to insert the intended
self-path in the first position rather than appending it to the end. The
latter implied that if `sys.path` (naturally) contained the GDB's
`gdb-plugin` directory, `import ompd` would return the top-level
`ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as
intended by adding the `ompd/` directory to `sys.path`.
This is intended to be a minimal change necessary to fix the issue.
Alternatively, the code could be modified to import `ompd.ompd` and stop
modifying `sys.path` entirely. However, I do not know why this option
was chosen in the first place, so I can't tell whether that would break
something.
Fixes #153954
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Both these functions update an `AllocInfoMap` structure in the context,
however they did not use any locks, causing random failures in threaded
code. Now they use a mutex.
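The fix follows the usual pattern of guarding every access to a shared map with one mutex. A minimal sketch, with hypothetical names (`AllocTracker`, `recordAlloc`, `recordFree`) standing in for the two functions and the `AllocInfoMap`:

```cpp
#include <cstddef>
#include <map>
#include <mutex>

class AllocTracker {
 public:
  // Records an allocation; the lock is held for the whole map update.
  void recordAlloc(const void* ptr, std::size_t size) {
    std::lock_guard<std::mutex> guard(mutex_);
    allocs_[ptr] = size;
  }

  // Removes an allocation; returns false if ptr was not tracked.
  bool recordFree(const void* ptr) {
    std::lock_guard<std::mutex> guard(mutex_);
    return allocs_.erase(ptr) != 0;
  }

  std::size_t liveCount() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return allocs_.size();
  }

 private:
  mutable std::mutex mutex_;  // guards allocs_
  std::map<const void*, std::size_t> allocs_;
};
```

Both writers take the same mutex, so concurrent calls from different threads can no longer corrupt the map.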