In OpenMP 6.0 a subset of the dependence types is also used in the
`depinfo-modifier` on the INIT clause. Make the enum a common type to
avoid defining separate enum types with mostly identical members.
Use the name `OmpDependenceKind` because the other obvious candidate,
`OmpDependenceType`, used to be a modifier name in older OpenMP specs.
…ions
When a merge-like instruction has all readanylane sources and the result
is copied to VGPRs, eliminate the readanylanes by either using the
original unmerge source directly or building a new merge with the VGPR
sources.
The code to recognize the level_zero plugin as a liboffload backend was
split from #158900. This PR adds the support back.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
When looking up the device address of a symbol, we also need to check
whether it is a function symbol if it is not found as a global symbol on
the device.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Support for getDebugLevel was removed as part of the new debug macros
(#165416). This PR updates such usages to use the new ODBG_* macros.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Today, if SPIR-V Tools is not found, you get the error below:
```
clang: error: unable to execute command: posix_spawn failed: No such file or directory
clang: error: spirv-as command failed with exit code 1 (use -v to see invocation)
```
which is not exactly user friendly.
Explain what software package is missing and give suggestions on getting
it.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This patch adds the `IN6_IS_ADDR_MC*` macros, which check whether an
address is a multicast node-local, link-local, site-local,
organization-local, or global address.
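To illustrate what these checks test, here is a minimal sketch written from the RFC 4291 multicast layout, not from the patch itself: the first byte of a multicast address is `0xff`, and the scope is the low four bits of the second byte. The names `Ipv6Bytes`, `McScope`, and `isMulticastWithScope` are hypothetical stand-ins for `struct in6_addr` and the macros.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for struct in6_addr: 16 bytes in network order.
using Ipv6Bytes = std::array<std::uint8_t, 16>;

// Multicast scope values as defined in RFC 4291, section 2.7.
enum class McScope : std::uint8_t {
  NodeLocal = 0x1,
  LinkLocal = 0x2,
  SiteLocal = 0x5,
  OrgLocal = 0x8,
  Global = 0xe,
};

// True if the address is multicast (first byte 0xff) and its scope field
// (low four bits of the second byte) matches the requested scope.
constexpr bool isMulticastWithScope(const Ipv6Bytes &Addr, McScope Scope) {
  return Addr[0] == 0xff &&
         (Addr[1] & 0x0f) == static_cast<std::uint8_t>(Scope);
}
```

For example, `ff02::1` (bytes `ff 02 00 ... 01`) matches `McScope::LinkLocal`, while `fe80::1` is not multicast and matches no scope.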
Currently, we assign the same scheduling info to COPY regardless of
whether it's a scalar or vector one. But this might cause a vector COPY
from physical registers to be scheduled too close to its consumer,
prolonging the physical register live range and running out of registers
during RA, as seen in #167008.
This patch addresses this issue by creating schedule variants for COPY
instructions of vector register classes so that they can have the same
latency as simple vector arithmetic (WriteVIALUV). It is worth noting
that we _only_ need latency in this case -- keeping processor resources
in (vector) COPYs still causes the aforementioned register shortage
issue, because these COPYs might then be blocked by structural hazards
and, again, get sunk further down than we want.
This patch introduces VPInstruction::Reverse and extracts the reverse
operations of loaded/stored values from reverse memory accesses. This
extraction facilitates future support for permutation elimination within
VPlan.
With recent refactoring, LDS promotion worklists for all allocas are
populated upfront. In some cases, this results in a User appearing in
multiple lists. Then, as each list is processed, a User might get deleted
via removeFromParent, potentially leaving a dangling pointer in a
subsequent worklist.
Currently this only occurs for memcpy and memmove. Prior to refactoring,
these were handled by DeferredInstr, and were processed after the last
use of the then singular worklist.
This change moves processing of DeferredInstr to after all worklists
have been processed.
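The hazard and the fix can be modeled with a small sketch. This is a toy model under stated assumptions, not the pass's code: `Entry`, `Worklist`, and `countStaleVisits` are hypothetical names, integer ids stand in for `User` pointers, and "erasing" is tracked in a set so that a stale visit can be counted rather than crashing.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a User; IsMemTransfer models memcpy/memmove,
// which are erased once handled.
struct Entry {
  int Id;
  bool IsMemTransfer;
};
using Worklist = std::vector<Entry>;

// Returns how many entries were visited after they had already been erased.
// With DeferErase set, erasure waits until every worklist has been walked.
inline int countStaleVisits(const std::vector<Worklist> &Lists,
                            bool DeferErase) {
  std::set<int> Erased;
  std::set<int> Deferred;
  int Stale = 0;
  for (const Worklist &List : Lists) {
    for (const Entry &E : List) {
      if (Erased.count(E.Id)) { // in real code: a dangling pointer
        ++Stale;
        continue;
      }
      if (!E.IsMemTransfer)
        continue; // handled in place, never erased
      if (DeferErase)
        Deferred.insert(E.Id);
      else
        Erased.insert(E.Id);
    }
  }
  // Deferred entries are erased only after the last worklist.
  Erased.insert(Deferred.begin(), Deferred.end());
  return Stale;
}
```

With one memcpy-like entry shared by two worklists, eager erasure yields one stale visit, while deferring erasure yields none.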
I recently observed that LLVM generates the following code:
```
addi a1, a0, -1
sltu a0, a0, a1
addi a0, a0, -1
and a0, a0, a1
ret
```
This could be optimized using the snez instruction instead.
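Reading the sequence instruction by instruction, it appears to compute a saturating decrement: 0 stays 0, and any other value becomes `x - 1`. The sketch below mirrors the assembly in C++ and compares it with an assumed two-instruction form (`snez a1, a0` followed by `sub a0, a0, a1`). That shorter form is an illustration, not necessarily what the patch emits, and 64-bit registers are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the four-instruction sequence, one statement per instruction.
constexpr std::uint64_t currentSequence(std::uint64_t A0) {
  std::uint64_t A1 = A0 - 1; // addi a1, a0, -1
  A0 = (A0 < A1) ? 1 : 0;    // sltu a0, a0, a1 (1 only if a0 was 0)
  A0 = A0 - 1;               // addi a0, a0, -1 (0 or an all-ones mask)
  return A0 & A1;            // and  a0, a0, a1
}

// Assumed shorter form: snez a1, a0 ; sub a0, a0, a1.
constexpr std::uint64_t snezSequence(std::uint64_t A0) {
  std::uint64_t A1 = (A0 != 0) ? 1 : 0; // snez a1, a0
  return A0 - A1;                       // sub  a0, a0, a1
}

// Compile-time spot checks that both forms agree.
static_assert(currentSequence(0) == snezSequence(0), "zero stays zero");
static_assert(currentSequence(5) == snezSequence(5), "nonzero decrements");
static_assert(currentSequence(UINT64_MAX) == snezSequence(UINT64_MAX),
              "maximum value decrements");
```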
Tests have to perform an additional FADD to prevent
combineConcatVectorOfCasts from performing the fold. We're trying to
show when this fails to occur during a combineConcatVectorOps recursion.
Interestingly, due to uitofp expansion, AVX1/2 often manages to
concat where AVX512 can't.
This occurs after type legalization, so the index type can be i32 or
i64. This patch simplifies the matching and checks for the optional zero
extend.
Also, a few tests from when this fold was added were broken because
`nuw` was incorrectly added to the `add <eltCount>, #-1`, which this
patch corrects.
This patch implements the first OMP clause, 'proc_bind'. This clause
takes one of a handful of values and simply passes it on to the OMP
dialect.
The 'default' value for this isn't present in the OMP dialect; however,
the classic-codegen doesn't generate the library call when this value is
passed, so this is effectively a 'no-op'.
This adds parsing and lowering of the COMBINER clause. It utilizes the
existing lowering code for combiner-expression to lower the COMBINER
clause as well.
Resolves #169312
Enables the usage of the following X86 intrinsics in `constexpr`:
```c
_mm256_permute2f128_pd _mm256_permute2f128_ps
_mm256_permute2f128_si256 _mm256_permute2x128_si256
```
This PR partially upstreams support for the `co_return` keyword. It
still needs to address the case where a `co_return` returns a value from
a `co_await`.
Additionally, this change focuses on `emitBodyAndFallthrough`, which
emits the user-written `co_await` depending on whether the function
falls through. Note the difference from classic CodeGen, which checked
whether the body could fall through by using `GetInsertBlock()` to verify
that the block existed. In our case, when a `co_return` is emitted, we
call `setCoreturn()` to mark that the coroutine contains a `co_return`.
So far, the setting enforced only jitlink but not rtdyld. We get better
test coverage now that we honor both cases. We drop EPC-based execution
on the way, because with ORC lli always executes in-process.
This patch is a follow-up to 96c733e to fix a missing space in the
frame.pc format entity. This space was intended to be prepended to the
module format entity scope, but if the module is not valid, which is
often the case for Python pc-less scripted frames, the space between the
pc and the function name is missing.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This test used to work on non-Linux platforms that could run simple ELF
objects in a JIT session. However, there is a risk that this will become
too unstable for CI, so let's limit it to what we actually need.
A few tests are broken on Ubuntu; the cause still needs to be found.
This reverts commit 999c9382571d6aadf9b786263862bf4085dd2dba.
Co-authored-by: yavtuk <yavtuk@ya.ru>
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.
Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
llvm::count_if calls std::count_if which returns a difference_type.
difference_type is always signed but is never going to be a negative
value when used as the result of count_if.
This resulted in warnings in our 32-bit Arm builds like:
```
AMDGPUIGroupLP.cpp:1050:20: warning: comparison of integers of different signs:
'typename iterator_traits<const SDep *>::difference_type' (aka 'int') and 'unsigned int' [-Wsign-compare]
1050 | if (SuccSize >= Size)
| ~~~~~~~~ ^ ~~~~
```
I presume these warnings are not generated in 64-bit builds because
unsigned is 32-bit even on 64-bit platforms and there is no risk in
extending a 32-bit unsigned value to a 64-bit signed one.
To fix the warning I've changed the type of SuccSize to unsigned, and
the assignment acts like a static_cast into that type.
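The pattern can be sketched in isolation as follows. This uses `std::count_if` directly rather than `llvm::count_if`, and `hasAtLeastEven` is a hypothetical helper, not code from AMDGPUIGroupLP.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if at least `Size` elements of Values are even.
inline bool hasAtLeastEven(const std::vector<int> &Values, unsigned Size) {
  // std::count_if returns a signed difference_type. Storing the result in an
  // unsigned variable converts it, so both sides of the comparison below are
  // unsigned and -Wsign-compare has nothing to report. The count is never
  // negative, so the conversion does not change its value.
  unsigned EvenCount = std::count_if(Values.begin(), Values.end(),
                                     [](int V) { return V % 2 == 0; });
  return EvenCount >= Size;
}
```

Comparing the `count_if` result directly against `Size` would reproduce the mixed-sign comparison on targets where `difference_type` is a 32-bit `int`.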