This patch adds initial support for Integrated Distributed ThinLTO
(DTLTO) in LLVM, which manages distribution internally during the
traditional link step. This enables compatibility with any build
system that supports in-process ThinLTO. In contrast, existing
approaches to distributed ThinLTO, which split the thin-link
(--thinlto-index-only), backend compilation, and final link into
separate steps, require support from the build system, e.g. Bazel.
This patch implements the core DTLTO mechanism, which enables
delegation of ThinLTO backend jobs to an external process (the
distributor). The distributor can then manage job distribution through
systems like Incredibuild. A generic JSON interface is used to
communicate with the distributor, allowing for the creation of new
distributors (and thus integration with different distribution
systems) without modifying LLVM.
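Illustratively, a distributor might be handed a JSON job description
along these lines. This is only a sketch: the field and file names
here are assumptions, and llvm/docs/dtlto.rst describes the actual
format.
```
{
  "common": {
    "linker_output": "a.out",
    "args": ["<compiler>", "-O2", "-c"]
  },
  "jobs": [
    {
      "inputs": ["m1.bc", "m1.summary.index.bc"],
      "outputs": ["m1.native.o"]
    }
  ]
}
```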
Please see llvm/docs/dtlto.rst for more details.
RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
Design Review: https://github.com/llvm/llvm-project/pull/126654
Utilize the new extensions to the LLVM Offloading API to extend
llvm-objdump to handle dumping fatbin offload bundles generated by HIP.
This extension to llvm-objdump adds the option --offload-fatbin.
When this option is specified, llvm-objdump takes the input
object/executable and extracts all offload fatbin bundle entries into
distinct code object files, with names reflecting the source file name
combined with the Bundle Entry ID. Users can also use the --arch-name
option to filter offload fatbin bundle entries by their target triple.
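For example (hypothetical invocations; the input name and target are
illustrative, only the options themselves come from this patch):
```
# Extract all offload fatbin bundle entries into separate code object files:
$ llvm-objdump --offload-fatbin app.out
# Restrict extraction to entries matching a given target:
$ llvm-objdump --offload-fatbin --arch-name=gfx908 app.out
```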
---------
Co-authored-by: dsalinas <dsalinas@MKM-L1-DSALINAS.amd.com>
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction
and intrinsic names that do not clearly convey their semantics.
Further, we lack support for some variants. This change unifies the IR
representation into a single, consistently named set of intrinsics:
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
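For example, a module using the legacy intrinsic (a minimal sketch) is
rewritten per the first rule above:
```
declare void @llvm.nvvm.barrier0()

define void @kernel() {
  ; Legacy call; the auto-upgrader replaces this with
  ; call void @llvm.nvvm.barrier.cta.sync.aligned.all(i32 0)
  call void @llvm.nvvm.barrier0()
  ret void
}
```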
This PR was split from https://github.com/llvm/llvm-project/pull/137228
(which introduced support for `DW_TAG_module` and `DW_AT_byte_size`).
This PR improves `LVDWARFReader` by introducing handling of
`DW_AT_byte_size`. Most DWARF emitters include this attribute for types
to specify the size of an entity of the given type.
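For instance, a base type DIE typically carries the attribute like
this (illustrative dump output, not taken from the tests in this PR):
```
<1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
   <2e>   DW_AT_name      : int
   <32>   DW_AT_byte_size : 4
```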
Some hardware (for example, certain AVR chips) has peripheral registers
mapped at data-space address 0. Although a volatile load/store on
`ptr null` already generates the expected code, the wording in the
LangRef makes operations on null seem like undefined behavior in all
cases. This commit adds a note that, for volatile operations, it may be
defined behavior to access address null, if the architecture permits
it. The intended use case is MMIO registers with hard-coded addresses
that include bit-value 0. A simple CodeGen test is included for AVR, as
an architecture known to have this quirk, that does a `load volatile`
and a `store volatile` to `ptr null`, expecting to generate
`lds <reg>, 0` and `sts 0, <reg>`.
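A sketch of the shape of that test (assuming an i8 access width; the
actual test may differ):
```
define i8 @read_mmio() {
  ; Volatile load from address 0; on AVR this should select: lds <reg>, 0
  %v = load volatile i8, ptr null, align 1
  ret i8 %v
}

define void @write_mmio(i8 %v) {
  ; Volatile store to address 0; on AVR this should select: sts 0, <reg>
  store volatile i8 %v, ptr null, align 1
  ret void
}
```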
See [this
thread](https://rust-lang.zulipchat.com/#narrow/channel/213817-t-lang/topic/Adding.20the.20possibility.20of.20volatile.20access.20to.20address.200)
and [the
RFC](https://discourse.llvm.org/t/rfc-volatile-access-to-non-dereferenceable-memory-may-be-well-defined/86303)
for discussion and context.
This adds assembler/disassembler support for XSfmmbase 0.6 and related
SiFive matrix multiplication extensions, based on the spec here:
https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification
Functionality-wise, this is the same as the Zvma extension proposal that
SiFive shared with the Attached Matrix Extension Task Group. The
extension names and instruction mnemonics have been changed to use
vendor prefixes.
Note this is a non-conforming extension as the opcodes used here are in
the standard opcode space in OP-V or OP-VE.
---------
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
Our current support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction
and intrinsic names that do not clearly convey their semantics.
Further, we lack support for some variants. This change unifies the IR
representation into a single, consistently named set of intrinsics:
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
- Adds support for `DW_TAG_module` DIEs and recurses over their
children. Prior to this patch, entities hanging below `DW_TAG_module`
were simply not visible. This DIE kind is commonly generated by
Objective-C modules. This patch represents such entities, which print
as
```
[001] {CompileUnit} '/llvm/tools/clang/test/modules/<stdin>'
[002]   {Producer} 'LLVM version 3.7.0'
        {Directory} '/llvm/tools/clang/test/modules'
        {File} '<stdin>'
[002]   {Module} 'DebugModule'
```
The minimal test case included is just the result of
```
$ llc llvm/test/DebugInfo/X86/DIModule.ll \
    -accel-tables=Dwarf \
    -o llvm/unittests/DebugInfo/LogicalView/Inputs/test-dwarf-clang-module.o \
    -filetype=obj
```
The conditional variants of SMSTART/STOP currently take the current
PStateSM as a variadic value. This is not supported by the verification
added in #140472 (which requires variadic values to be of type Register
or RegisterMask), so this patch splits the conditional variants into
new `COND_` nodes, where these extra parameters are fixed arguments.
Suggested in
https://github.com/llvm/llvm-project/pull/140472#discussion_r2094635066
Part of #140472.
The Chromium bug tracker is in an archived state. The Security Response
Group has preemptively created llvm-project GitHub issues with PDF
copies of the Chromium issues should the repository become inaccessible.
* Add URLs for redirects from
https://bugs.chromium.org/p/llvm/issues/detail?id=X to
https://issuetracker.google.com/issues/Y
* Add URLs to llvm-project archive issues.
* Add an explanation of archive use.
Adds additional subgroup block prefetch, load,
load transposed, load transformed and store
instructions to read two-dimensional blocks of
data from a two-dimensional region of memory, or
to write two-dimensional blocks of data to a
two-dimensional region of memory.
Spec:
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_2d_block_io.asciidoc
---------
Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads
to LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.
This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).
It also plumbs the intrinsic through to clang.
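A sketch of what a call might look like in IR for the global-pointer
case (the exact intrinsic mangling and the meaning/order of the
integer operands are assumptions here):
```
declare void @llvm.amdgcn.load.to.lds.p1(ptr addrspace(1), ptr addrspace(3), i32 immarg, i32 immarg, i32 immarg)

define amdgpu_kernel void @gather(ptr addrspace(1) %src, ptr addrspace(3) %lds) {
  ; Load 4 bytes from the global pointer into LDS
  ; (assumed operands: size, offset, auxiliary flags).
  call void @llvm.amdgcn.load.to.lds.p1(ptr addrspace(1) %src, ptr addrspace(3) %lds, i32 4, i32 0, i32 0)
  ret void
}
```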
This function retrieves the number of bits that can be used for
arithmetic in a given address space (i.e. the range of the address
space). For most in-tree targets this should not make any difference,
but differentiating between the size of a pointer in bits and the
address range is extremely important, e.g. for CHERI-enabled targets,
where pointers carry additional metadata such as bounds and permissions
and only a subset of the pointer bits is used as the address.
The address size is defined to be the same as the index size.
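A minimal sketch of the distinction using existing DataLayout
accessors (the new function's name is not given in this message, so
none is assumed; per the definition above, the address size equals the
index size):
```
#include "llvm/IR/DataLayout.h"

// Returns the number of bits valid for address arithmetic in AS.
// On most targets this equals the pointer size; on CHERI targets
// the full pointer representation is wider than the address.
unsigned addressBits(const llvm::DataLayout &DL, unsigned AS) {
  unsigned PtrBits  = DL.getPointerSizeInBits(AS); // full representation
  unsigned AddrBits = DL.getIndexSizeInBits(AS);   // address/index width
  (void)PtrBits; // PtrBits > AddrBits on CHERI targets
  return AddrBits;
}
```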
We considered adding a separate property, since targets exist where
indexing and address range actually use different sizes (AMDGPU fat
pointers with a 160-bit representation, 48-bit address, and 32-bit
index), but for the purposes of LLVM semantics differentiating them
does not add much value and it introduces a lot of complexity in
ensuring the correct bits are used. See the reasoning by @nikic on
https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/38 and
https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/49
Originally uploaded as https://reviews.llvm.org/D135158
Reviewed By: davidchisnall, krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139347
Currently coroutine promises are modeled as allocas. This is
problematic because other middle-end passes will assume the promise is
dead after coroutine suspend, leading to misoptimizations.
I propose that front ends remain free to emit and use allocas to model
the coro promise. At CoroEarly, we will replace all uses of the promise
alloca with `coro.promise`. Non-coroutine passes should only access the
promise through `coro.promise`. Then at CoroSplit, we will lower
`coro.promise` back to the promise alloca again, so that it is
correctly collected into the coro frame. Note that this requires no
changes from maintainers of other middle-end passes.
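For illustration, after CoroEarly a promise access goes through the
intrinsic rather than the alloca (the alignment value and promise type
here are illustrative):
```
declare ptr @llvm.coro.promise(ptr, i32, i1)

define i32 @read_promise(ptr %hdl) {
  ; Recover the promise pointer from the coroutine handle instead of
  ; referencing the promise alloca directly.
  %p = call ptr @llvm.coro.promise(ptr %hdl, i32 8, i1 false)
  %v = load i32, ptr %p, align 4
  ret i32 %v
}
```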
Fixes #105595
As observed by LLVM Discord user "buck", the Radeon Pro 5600M is
gfx1011, and the entry in the gfx1010 category was probably meant to be
Radeon RX 5600M, which indeed is gfx1010.
This patch adds a new variant of TMA Bulk Copy
intrinsics introduced in sm100+. This variant
has an additional byte_mask to select the bytes
for the copy operation.
* Selection is now done entirely through TableGen, so this patch
removes the corresponding SelectCpAsyncBulkS2G() function.
* lit tests are verified with a cuda-12.8 ptxas executable.
PTX Spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-bulk-copy
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Make this consistent with other IR operations with respect to
signaling NaN quieting. This was specifying that quieting is required,
which is true for IEEE; as with other IR operations, signaling NaN
quieting is now possible but optional in the case where there are two
NaN inputs.
This permits directly selecting the intrinsic to the hardware
instruction in the default floating-point environment for shaders.
Thread-local globals live, by default, in the default globals address
space, which may not be 0, so we need to overload @llvm.thread.pointer
to support other address spaces, and use the default globals address
space in Clang.
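Illustratively, with the overload the intrinsic can be declared per
address space (the address space number and the `.pN` mangling shown
here are assumptions following the usual conventions):
```
; Default address space, as before:
declare ptr @llvm.thread.pointer.p0()
; Overloaded for a target whose globals live in address space 1:
declare ptr addrspace(1) @llvm.thread.pointer.p1()
```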
This adds a GISelValueTrackingPrinterPass that can print the known bits
and sign bit of each def in a function. It is built on the new pass
manager and so adds an NPM GISelValueTrackingAnalysis, renaming the
older class to GISelValueTrackingAnalysisLegacy.
The first two functions from the AArch64GISelMITest are ported over to
an MIR test to show it working. It also runs successfully on all files
in llvm/test/CodeGen/AArch64/GlobalISel/*.mir that are not invalid. It
can hopefully be used to test GlobalISel known-bits analysis more
directly in common cases, without jumping through the hoops that the
C++ tests require.
This updates all the extensions to their version in the v0.11.0 spec.
All changes from this version are already implemented or are not
relevant to LLVM.
This change also alphabetises the lists of Xqci extensions, to make
future checks easier, and removes irrelevant info from the usage docs.
Step 2 tells you to check out "llvm-test-suite" to "test-suite", but I
don't see a particular reason to use a non-default path.
If you're following the instructions exactly, it all works, but if you
autopilot that step it is surprising later when things do not work.
It's not hard for an individual to fix later, but we should suggest the
least surprising thing where we can.
Currently, each variant in the variant part of a structure type can only
contain a single member. This was sufficient for Rust, where each
variant is represented as its own type.
However, this isn't really enough for Ada, where a variant can have
multiple members.
This patch adds support for this scenario. This is done by allowing the
use of DW_TAG_variant by DICompositeType, and then changing the DWARF
generator to recognize when a DIDerivedType representing a variant holds
one of these. In this case, the fields from the DW_TAG_variant are
inlined into the variant, like so:
```
<4><7d>: Abbrev Number: 9 (DW_TAG_variant)
   <7e>   DW_AT_discr_value : 74
<5><7f>: Abbrev Number: 7 (DW_TAG_member)
   <80>   DW_AT_name : (indirect string, offset: 0x43): field0
   <84>   DW_AT_type : <0xa7>
   <88>   DW_AT_alignment : 8
   <89>   DW_AT_data_member_location: 0
<5><8a>: Abbrev Number: 7 (DW_TAG_member)
   <8b>   DW_AT_name : (indirect string, offset: 0x4a): field1
   <8f>   DW_AT_type : <0xa7>
   <93>   DW_AT_alignment : 8
   <94>   DW_AT_data_member_location: 8
```
Note that the intermediate DIDerivedType is still needed in this
situation, because that is where the discriminants are stored.