AMDGCN flavoured SPIR-V allows AMDGCN specific builtins, including those
for scoped fences and some specific RMWs. However, at present we don't
map syncscopes to their SPIR-V equivalents, but rather use the AMDGCN
ones. This ends up pessimising the resulting code as system scope is
used instead of device (agent) or subgroup (wavefront), so we correct
the behaviour, to ensure that we do the right thing during reverse
translation.
This PR starts using the correct VFS in `GCOVProfilerPass` instead of
using the real FS directly. This matches compiler's behavior for other
input files.
The builtins `__fixunstfdi` and `__multc3` may be removed by the
preprocessor depending on configuration flags. When this happens, the
corresponding tests fail at link time due to missing definitions.
Disable these tests when the builtins are not available.
Also remove the XFAILs for aarch64 windows. As this test now became a
no-op on platforms that lack CRT_HAS_128BIT or CRT_HAS_F128 (aarch64
windows lacks the latter), it no longer fails.
This reapplies e9e166e54354330c474457711a8e7a7ca2efd731 and
656707086e5f6fccd2eb57f5aaf987c328c0f4f1 after fixing declarations of
the builtins in the tests in b54250940c2cd70f911386b02239b50c165e5354.
rdar://159705803
rdar://159705705
---------
Co-authored-by: Martin Storsjö <martin@martin.st>
Utilize new extensions to LLVM Offloading API to
handle offloading fatbin Bundles.
The tool will output a list of available offload bundles
using URI syntax.
---------
Co-authored-by: dsalinas_amdeng <david.salinas@amd.com>
Summary:
Several of these tests have been failing for literal years. Ideally we
make efforts to fix this, but keeping these broken has had serious
consequences on our testing infrastructure where failures are the norm
so almost all test failures are disregarded. I made a tracking issue for
the ones that have been disabled.
https://github.com/llvm/llvm-project/issues/161265
First time check was introduced in
`fa3ab4599d717feedbb83e08e7f654913942520b` to work around a debug-info
generation bug in Clang. This bug was fixed in Clang-4. The check has
since been adjusted (first in
`808ff186f6a6ba1fd38cc7e00697cd82f4afe540`, and then most recently in
`370db9c62910195e664e82dde6f0adb3e255a4fd`).
This check is getting quite convoluted, and all it does is turn an
`array[1]` into an `array[0]` type when it is deemed correct. At this
point the workaround probably never fires, apart from actually valid
codegen. This patch removes the special conditions and emits the error
specifically in those cases where we know the DWARF is malformed.
Added some shell tests for the error case.
Add OnDiskTrieRawHashMap. This is a on-disk persistent hash map that
uses a Trie data structure that is similar to TrieRawHashMap.
OnDiskTrieRawHashMap is thread safe and process safe. It is mostly lock
free, except it internally coordinates cross process creation and
closing using file lock.
Replace `long double` and `long double _Complex` with `fp_t` and
`Qcomplex` in the test files.
This prepares for reapplying 656707086e5f6fccd2eb57f5aaf987c328c0f4f1
and running tests on targets where `fp_t` is not `long double`.
`PRIVATE | ATTACH` maps can be used to represent firstprivate pointers
that should be initialized by doing doing the pointee's device address,
if its lookup succeeds, or retain the original host pointee's address
otherwise.
With this, for a test like the following:
```f90
integer, pointer :: p(:)
!$omp target map(p(1))
... print*, p(1)
!$omp end target
```
The codegen can look like:
```llvm
; maps for p:
; &p(1), &p(1), sizeof(p(1)), TO|FROM //(1)
; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH //(2)
; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE|ATTACH|PARAM //(3)
call... @__omp_outlined...(ptr %ref_ptr_of_p)
```
* `(1)` maps the pointee `p(1)`.
* `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present.
It can be controlled via OpenMP 6.1's `attach(auto/always/never)`
map-type modifiers.
* `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets
passed
in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is not
referenced directly within the region.
While similar mapping can be used for C/C++, it's more important/useful
for Fortran as we can avoid creating another argument for passing the
descriptor, and use that to initialize the private copy in the body of
the kernel.
Following up 12cb540. Also, that commit left behind a few cases where a
temporary `StringRef` was being constructed from those variables just to
use its `.split()` function, so this PR cleans those up too.
Adds a new unit test for the Mustache Set Delimiter feature.
This test is written with inverted logic (`EXPECT_NE`) so that it
passes with the current implementation, which does not support
the feature. Once the feature is implemented, this test will fail,
signaling that the test logic should be flipped to `EXPECT_EQ`.
The current implementaion did not correctly handle indentation for
standalone partial tags. It was only applied to lines following a
newline, instead of the first line of a partial's content. This was
fixed by updating the AddIndentation implementaion to prepend the
indentation to the first line of the partial.
This patch fixes a documentation typo in the MLIR Toy tutorial (Ch6).
The tutorial currently refers to `examples/toy/Ch6/toy.cpp`, but the
correct
file name is `examples/toy/Ch6/toyc.cpp`.
We can rewrite this to (srai(w)/srli X, C1) == C2 so the AND immediate
is free. This transform is done by performSETCCCombine in
RISCVISelLowering.cpp.
This fixes the opaque constant case mentioned in #157416.
This adds basic operator delete handling in CIR. This does not yet
handle destroying delete or array delete, which will be added later. It
also does not insert non-null checks when not optimizing for size.
This sets the MLIR module name to the main filename (according to the
SourceManager), if one is available. The module name gets used when
creating global init functions, so we will need it to be set.
Changes to support for array elements in reduction clause e.g.
"reduction (+:a[1])"
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
When the root node of the `RedirectingFileSystem` is to be resolved to
the current working directory, we previously consulted the real FS
instead of the provided underlying VFS. This PR fixes that issue.
Summary:
Forgot to port this option's old handling from offload. It's not way
easier since they're built in the same CMake project. Also delete the
leftover directory that's not used anymore, don't know how that was
still there.
This patch fixes a bug in EquivalenceClasses::erase, where we lose a
leader bit in a certain scenario.
Here is some background. In EquivalenceClasses, each equivalence
class is maintained as a singly linked list over its members. When we
join two classes, we concatenate the two singly linked lists. To
support path compression, each member points to the leader (through
lazy updates). This is implemented with the two members:
class ECValue {
mutable const ECValue *Leader, *Next;
:
};
Each member stores its leader in Leader and its sibling in Next. Now,
the leader uses the Leader field to to point the last element of the
singly linked list to accommodate the list concatenation. We use the
LSB of the Next field to indicate whether a given member is a leader
or not.
Now, imagine we have an equivalence class:
Elem 1 -> Elem 2 -> nullptr
Leader
and wish to remove Elem 2. We would like to end up with:
Elem 1 -> nullptr
Leader
but we mistakenly drop the leader bit when we update the Next field of
Elem 1 with:
Pre->Next = nullptr;
This makes Elem 1 the end of the singly linked list, as intended, but
mistakenly clears its leader bit stored in the LSB of Next, so we end
up with an equivalence class with no leader.
This patch fixes the problem by preserving the leader bit:
Pre->Next = reinterpret_cast<const ECValue *>(
static_cast<intptr_t>(Pre->isLeader()));
The unit test closely follows the scenario above.
This PR starts using the correct VFS in `ModuleDependencyCollector`
instead of using the real FS directly. This matches compiler's behavior
for other input files.
Do not include the ``__PAGEZERO`` segment when calculating size information
for Mach-O files when `--exclude-pagezero` is used. The ``__PAGEZERO``
segment is a virtual memory region used for memory protection that does not
contribute to actual size, and excluding can provide a better representation of
actual size.
Fixes#86644
Store relocations directly as `SmallVector<Relocation, 0>` within
EhInputSection to avoid processing different relocation formats
(REL/RELA/CREL) throughout the codebase.
Next: Refactor RelocationScanner to utilize EhInputSection::rels
Pull Request: https://github.com/llvm/llvm-project/pull/161041
This PR starts using the correct VFS for getting file's real path in
`FileCollector` instead of using the real FS directly. This matches
compiler's behavior for other input files.
Currently LVI does the union of value ranges from block predecessors.
When storing the ranges per predecessor, the resulting ranges may be
more restricted and enable additional optimizations.
However this is costly (memory + compile time), so place this under a
flag disabled by default.
See: https://github.com/llvm/llvm-project/issues/158139.
Loop headers frequently consume the loop-carried value in the header
block via non-lookthrough ops (e.g. byte-wise vector binops).
LiveRegOptimizer’s same-BB filter currently prunes these users, so the
loop-carried PHI is not coerced to i32 and the intended packed form is
lost.
Relax the filter: when the def is a PHI, allow same-BB non-lookthrough
users. Also fix the check to look at the user (CII) rather than the def
(II) so the walk does not terminate prematurely.