11851 Commits

Author SHA1 Message Date
Jim Lin
8ddada41df
[RISCV] Add Andes XAndesVBFHCvt (Andes Vector BFLOAT16 Conversion) extension (#144320)
The spec can be found at:
https://github.com/andestech/andes-v5-isa/releases/tag/ast-v5_4_0-release.

This patch only supports assembler. The instructions are similar to
`Zvfbfmin` and the only difference with `Zvfbfmin` is that
`XAndesVBFHCvt` doesn't have mask variant.
2025-06-18 09:17:46 +08:00
Sam Elliott
98c6c371d6
[RISCV] Xqccmp v0.3 (#137854)
All the changes for v0.2 and v0.3 are either already implemented, or
irrelevant to the compiler implementation.
2025-06-16 22:13:45 -07:00
Sam Elliott
c0ac95181e
[RISCV] Update Xqci to v0.13.0 (#144398) 2025-06-16 22:12:12 -07:00
Scott Linder
e8362234f6
[Object][AMDGPU] Support REL relocations (#143966)
Shaders compiled with DXC/LLPC generate these relocations, and even if
that changes in the future we want to handle existing binaries. The
friction to support this and the maintenance cost long term both seem
incredibly low, considering other targets like ARM support both REL/RELA
static relocations behind the same interface.
2025-06-16 15:03:02 -04:00
Yingwei Zheng
299a55a88f
[InstCombine][Docs] Update InstCombine contributor guide (#144228)
Update the guideline to reduce the chance of miscompilation/performance
regression.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>
2025-06-16 18:07:27 +08:00
Javier Lopez-Gomez
383b326879
[llvm-debuginfo-analyzer] Fix ODR violation in llvm::logicalview::LVObject (#140265)
Some data members are only part of a class definition in a Debug build,
e.g. `LVObject::ID`. If `debuginfologicalview` is used as a library,
`NDEBUG` cannot be used for this purpose, as this PP macro may have a
different definition in a downstream project, which in turn triggers an
ODR violation. Fix it by
- Making `LVObject::ID` an unconditional data member.
- Making `LVObject::dump()` non-virtual. Rationale: `virtual` is not
needed (and it calls `print()`, which is virtual anyway).

Fixes #139098.
2025-06-16 10:47:00 +02:00
Sudharsan Veeravalli
7d9a451d87
[RISCV] Change input register type for QC_SWM and QC_SWMI (#144294)
Version 0.13 of the `Xqci` spec changes the register type of input
operand `rs3` from `GPR` to `GPRNoX0` for these two instructions.

The spec can be found at
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.13.0
2025-06-16 12:28:12 +05:30
Fabian Meumertzheim
83f215b035
Reland "[llvm-cov] Add support for baseline coverage" (#144130)
When no profile is provided, but the new --empty-profile option is
specified, the export/report/show commands now emit coverage data
equivalent to that obtained from a profile with all zero counters
("baseline coverage").

This is useful for build systems (e.g. Bazel) that can track coverage
information for each build target, even those that are never linked into
tests and thus don't have runtime coverage data recorded. By merging in
baseline coverage, lines in files that aren't linked into tests are
correctly reported as uncovered.

Reland with fixes to `CoverageMappingTest.cpp`.

Reverts llvm/llvm-project#144121
2025-06-13 12:09:58 -07:00
Keith Smiley
65d88d31ea
Revert "[llvm-cov] Add support for baseline coverage" (#144121)
Reverts llvm/llvm-project#117910

```
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp:281:28: error: 'std::reference_wrapper' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
  281 |         std::make_optional(std::reference_wrapper(*ProfileReader));
      |                            ^
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../include/c++/8/bits/refwrap.h:289:11: note: add a deduction guide to suppress this warning
  289 |     class reference_wrapper
      |           ^
```
2025-06-13 10:04:45 -07:00
Daniel Hernandez-Juarez
68b6f392ed
[MLIR][AMDGPU] Fix bug in GatherToLDSOpLowering, get the correct MemRefType for destination (#142915)
This PR fixes a bug in GatherToLDSOpLowering, we were getting the
MemRefType of source for the destination. Additionally, some related
typos are corrected.

CC: @krzysz00 @umangyadav @lialan
2025-06-13 11:33:51 -05:00
Fabian Meumertzheim
dc9e300f12
[llvm-cov] Add support for baseline coverage (#117910)
When no profile is provided, but the new --empty-profile option is
specifed, the export/report/show commands now emit coverage data
equivalent to that obtained from a profile with all zero counters
("baseline coverage").

This is useful for build systems (e.g. Bazel) that can track coverage
information for each build target, even those that are never linked into
tests and thus don't have runtime coverage data recorded. By merging in
baseline coverage, lines in files that aren't linked into tests are
correctly reported as uncovered.
2025-06-13 08:49:30 -07:00
Stephen Tozer
cc365331af
[DLCov] Origin-Tracking: Add config options (#143590)
This patch is part of a series that adds origin-tracking to the debugify
source location coverage checks, allowing us to report symbolized stack
traces of the point where missing source locations appear.

This patch adds the configuration options needed to enable this feature,
in the form of a new CMake option that enables a flag in
`llvm-config.h`; this is not an entirely new CMake flag, but a new
option, `COVERAGE_AND_ORIGIN`, for the existing flag
`LLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING`. This patch contains
documentation, but no actual implementation for the flag itself.
2025-06-13 12:54:30 +01:00
Diana Picus
a5cbd2ab0b
Revert "[AMDGPU] Skip register uses in AMDGPUResourceUsageAnalysis (#… (#144039)
…133242)"

This reverts commit 130080fab11cde5efcb338b77f5c3b31097df6e6 because it
causes issues in testcases similar to coalescer_remat.ll [1], i.e. when
we use a VGPR tuple but only write to its lower parts. The high VGPRs
would then not be included in the vgpr_count, and accessing them would
be an out of bounds violation.

[1]
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/coalescer_remat.ll
2025-06-13 12:48:24 +02:00
Saiyedul Islam
432d06ab91
[NFC][AMDGPU] Fix stale links to ROCm repositories (#143949)
Following GitHub organizations were merged into the ROCm org:
  * ROCm-Developer-Tools
  * RadeonOpenCompute
  * ROCmSoftwarePlatform

Ensure that all hyperlinks to the old organizations now point to the new
organization at https://github.com/ROCm.
2025-06-13 11:33:52 +05:30
Durgadoss R
3e5d50f9c6
[NVPTX] Add cta_group support to TMA G2S intrinsics (#143178)
This patch extends the TMA G2S intrinsics with the
support for cta_group::1/2 available from Blackwell onwards.
The existing intrinsics are auto-upgraded with a default
value of '0' for the `cta_group` flag operand.

* lit tests are added for all combinations of the newer variants.
* Negative tests are added to validate the error-handling 
   when the value of the cta_group flag falls out-of-range.
* The generated PTX is verified with a 12.8 ptxas executable.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-12 15:20:39 +05:30
Paul Kirth
ad2a2b8eed
[llvm] Add a tool to check mustache compliance against the public spec (#142813)
This is a cli tool to that tests the conformance of LLVM's mustache
implementation against the public Mustache spec, hosted at
https://github.com/mustache/spec. This is a revised version of the
patches in #111487.

Co-authored-by: Peter Chou <peter.chou@mail.utoronto.ca>
2025-06-11 13:05:21 -07:00
RonDahan101
fa9e1a1515
[AArch64] Expand llvm.histogram intrinsic to support umax, umin, and uadd.sat operations (#138447)
This patch extends the llvm.histogram intrinsic to support additional
update operations beyond the existing add. Specifically, the new
supported operations are:

* umax: unsigned maximum

* umin: unsigned minimum

* uadd.sat: unsigned saturated addition

Based on the discussion from:


https://discourse.llvm.org/t/rfc-expanding-the-experimental-histogram-intrinsic/84673
2025-06-11 15:15:24 +01:00
arysef
78af498035
[LLVM][IR] Support target extension types in vectors (#140630)
This change is to support target extension types in vectors. The change
allows sized target extension types to opt-in to being a valid vector
element.
Allowing target extension types as vector elements will allow backends
to use vector operations such as `insertelement` and `extractelement` on
their target types with minimal changes.

RFC:
https://discourse.llvm.org/t/rfc-supporting-sized-target-extension-types-in-vector/86431
2025-06-09 12:03:15 -07:00
Durgadoss R
c4012bb5de
[NVPTX] Add pm_event intrinsics (#141278)
This patch adds the pm_event.mask intrinsic and its
clang-builtin.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-06 19:39:33 +05:30
Javier Lopez-Gomez
0f38c54c6f
[llvm-debuginfo-analyzer] Add support for parsing DWARF / CodeView SourceLanguage (#137223)
This pull request adds support for parsing the source language in both
DWARF and CodeView. Specifically,

- The `LVSourceLanguage` class is introduced to represent any supported
language by any of the debug info representations.

- Update `LVDWARFReader.cpp` and `LVCodeViewVisitor.cpp` to parse the
source language where it applies. Added a new `=Language` attribute;
`getAttributeLanguage()` is internally used to control whether this
information is being printed.
2025-06-06 15:03:07 +01:00
Ebuka Ezike
89d2d62e46
[symbolizer] Update Release notes. (#142951)
Also add post-commit changes from commit #71ba852

in PR #135857

---------

Co-authored-by: James Henderson <James.Henderson@sony.com>
2025-06-06 13:25:07 +01:00
Konrad Kleine
6918314918
[lit] show retry attempts (#142413)
If a test took more than one attempt to complete, show the number of attempts and the maximum allowed attempts as `2 of 4 attempts` inside the `<progress info>` (see [TEST RUN OUTPUT FORMAT](https://llvm.org/docs/CommandGuide/lit.html#test-run-output-format)).

NOTE: Additionally this is a fixup for #141851 where the tests were not quite right. `max-retries-per-test/allow-retries-test_retry_attempts/test.py` was added but never used there. Now we're calling it. To correlate better between the test output and the test script I've used higher numbers of max allowed retries.
2025-06-06 10:24:26 +02:00
Jim Lin
2a8c7d3c69
[RISCV] Add support for -mtune=andes-45-series (#142900)
Enables the use of `-mtune=andes-45-series` to generate code optimized
with the Andes 45 series scheduling model and tuning features.
2025-06-06 11:34:19 +08:00
Min-Yih Hsu
feb21e26fa
[RISCV] Add SiFive X390 processor definition (#142517)
X390 is an in-order core designed for AI/ML workload, with VLEN=1024.
https://www.sifive.com/cores/intelligence-x300-series

Scheduling model will be added in a follow-up patch.
2025-06-04 09:25:59 -07:00
Konrad Kleine
107d8c792f
[docs] don't use "=" in lit options with arguments (#142340)
This is a fixup for #141851 and removes `=` from all
options with additional arguments.

Before 14 out of 22 options with arguments used "=" and 7 didn't.
2025-06-04 12:36:12 +02:00
clubby789
c7c79d2590
[IR][DSE] Support non-malloc functions in malloc+memset->calloc fold (#138299)
Add a `alloc-variant-zeroed` function attribute which can be used to
inform folding allocation+memset. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.

Co-authored-by: Jamie <jamie@osec.io>
2025-06-04 09:35:20 +02:00
Nikita Popov
7547ff5cad
[X86] Consistently use f128 libcalls (#142386)
On x86, the `*l` libcalls are for 80-bit extended precision. `fp128`
needs to use the `*f128` libcalls instead.

Add a few missing ones, esp. for FP min/max.

Also use the `f128` libcalls on x86-32. I believe the situation there is
the same as on x86-64.
2025-06-03 12:28:31 +02:00
Diana Picus
130080fab1
[AMDGPU] Skip register uses in AMDGPUResourceUsageAnalysis (#133242)
Don't count register uses when determining the maximum number of
registers used by a function. Count only the defs. This is really an
underestimate of the true register usage, but in practice that's not
a problem because if a function uses a register, then it has either
defined it earlier, or some other function that executed before has
defined it.

In particular, the register counts are used:
1. When launching an entry function - in which case we're safe because
   the register counts of the entry function will include the register
   counts of all callees.
2. At function boundaries in dynamic VGPR mode. In this case it's safe
   because whenever we set the new VGPR allocation we take into account
   the outgoing_vgpr_count set by the middle-end.

The main advantage of doing this is that the artificial VGPR arguments
used only for preserving the inactive lanes when using the
llvm.amdgcn.init.whole.wave intrinsic are no longer counted. This
enables us to allocate only the registers we need in dynamic VGPR mode.

---------

Co-authored-by: Thomas Symalla <5754458+tsymalla@users.noreply.github.com>
2025-06-03 11:20:48 +02:00
Simon Tatham
56acb06bc6
[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562)
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
2025-06-03 08:44:13 +01:00
S. VenkataKeerthy
741136a8ac
[NFC][IR2Vec] Removing Dimension from Embedder::Create (#142486)
This PR removes the necessity to know the dimension of the embeddings while invoking `Embedder::Create`. Having the `Dimension` parameter introduces complexities in downstream consumers. 

(Tracking issue - #141817)
2025-06-02 15:05:11 -07:00
Cullen Rhodes
e3a0cb8d3f
[docs] Update ir-normalizer to normalize (#141764)
While the class name is IRNormalizer the registered name is "normalize"
and the docs should match this.

Fixes #136347.
2025-06-02 09:54:12 +01:00
Konrad Kleine
84fd907aa7
[lit] add --max-retries-per-test execution option (#141851)
When packaging LLVM we've seen arbitrary tests fail.
It happened sporadically and most of the times the test
do work if they are run a second time on the next day.

The tests themselves were always different and we didn't
know ahead of time which ones we wanted to re-run.
That's we filter-out a lot of `libomp` and `libarcher` tests [1].

This change allows us to set
`LIT_OPTS="--max-retries-per-test=12"`
when running any "check-XXX" build target. Then any lit test
will at most be re-run 12 times, unless there's an `ALLOW_RETRIES:`
in one of the test scripts that's specifying a different value
than `12`. `12` is just an example here, any positive integer
will work.

Please note, that this only adds the possibility to re-run
lit tests. It does not actually do it until the caller specifies
`--max-retries-per-test=<POSITIVE_INT>` either on a call to `lit` or
in `LIT_OPTS`.

Also note, that one can still use `ALLOW_RETRIES:` in test scripts
and it will always rule over `--max-retries-per-test`. When
`--max-retries-per-test` is set too low, but the
`config.test_retry_attempts`
is high enough, it works as well.

Any option in the list below overrules its predecessor:

* `--max-retries-per-test`
* `config.test_retry_attempts`
* `ALLOW_RETRIES` keyword

From the above options to re-run tests, `--max-retries-per-test` is the
only one that doesn't require a change in the test scripts or the test
config.

[1]:
https://src.fedoraproject.org/rpms/llvm/blob/rawhide/f/llvm.spec#_2326

Downstream PR to make use of the `--max-retries-per-test` option:
https://src.fedoraproject.org/rpms/llvm/pull-request/434
Downstream ticket: https://issues.redhat.com/browse/LLVM-145
2025-05-31 03:46:12 +02:00
Jacob Lalonde
1a1927abd3
[lldb] Add release note about ELF thread siginfo and negative SI Codes. (#141631)
Adding a release note about adding `thread siginfo` support to
`ThreadELFCore` and expanding Linux signals to understand user space
signals.
2025-05-30 09:15:16 -07:00
Ying Chen
5483190216
[RISCV] Add shlcofideleg extension (#141572)
This is for `shlcofideleg` extension, that supports delegating LCOFI
interrupts to VS-mode.

Spec:
https://github.com/riscv/riscv-isa-manual/blob/main/src/hypervisor.adoc
2025-05-30 16:52:08 +08:00
Cyndy Ishida
17ee4aecda
[docs] Update expected cadence of when Transparency reports are published (#142026) 2025-05-29 13:48:35 -07:00
S. VenkataKeerthy
ed5eb1c6c6
[NFC] Fixing typos in MLGO.rst (#141878) 2025-05-28 16:39:38 -07:00
S. VenkataKeerthy
3581e9bb4c
[NFC][IR2Vec] Refactoring for Stateless Embedding Computation (#141811)
Currently, users have to invoke two APIs: `computeEmbeddings()` followed
by getters to access the embeddings. This PR refactors the code to
reduce this *stateful* access of APIs. Users can now directly invoke
getters; Internally, getters would compute the embeddings.
2025-05-28 12:19:02 -07:00
Jonas Devlieghere
4000113b88
[lldb] Add release note for riscv32 elf core file support in LLDB (#141629)
Add a release note for riscv32 elf core file support in LLDB.
2025-05-28 08:39:58 -07:00
Luke Lau
3033f202f6
[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)
This adds [de]interleave intrinsics for factors of 4,6,8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.

For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.

But after trying to integrate these for higher factors on RISC-V I think
we should revisit this design choice:

- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intriniscs
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a
[de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8
we're not really saving much

By representing these higher factors are interleaved-interleaves, we can
in theory support arbitrarily high interleave factors. However I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2,3,4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.

For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.
2025-05-26 18:45:12 +01:00
Viktoria Maximova
435d8b12ef
Reland [SPIR-V] Support SPV_INTEL_int4 extension (#141279)
This relands #141031 

This change ensures generated SPIR-V is valid and passes machine
verification:
```
*** Bad machine code: inconsistent constant size ***
- function:    foo
- basic block: %bb.1 entry (0x9ec9298)
- instruction: %12:iid(s8) = G_CONSTANT i4 1
```
That is done by promoting `G_CONSTANT` instructions with small integer
types (e.g., `i4`) to `i8` if no extensions for "special" integer types
are enabled.
2025-05-26 14:15:27 +02:00
Weining Lu
581d175a86 [LoongArch] Document the inline asm q constraint
See #141037.
2025-05-26 09:14:27 +08:00
Jim Lin
7e09a00d79
[llvm-exegesis][Docs] --dump-object-to-disk option is specified by filename rather than bool (#141178) 2025-05-24 10:33:29 +08:00
Mircea Trofin
c6be4566bc
[docs][mlgo] Document custom builds (#141243) 2025-05-23 12:23:58 -07:00
bd1976bris
6520b21ce0
[DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) (#127749)
This patch adds initial support for Integrated Distributed ThinLTO
(DTLTO) in LLVM, which manages distribution internally during the
traditional link step. This enables compatibility with any build
system that supports in-process ThinLTO. In contrast, existing
approaches to distributed ThinLTO, which split the thin-link
(--thinlto-index-only), backend compilation, and final link into
separate steps, require build system support, e.g. Bazel.

This patch implements the core DTLTO mechanism, which enables
delegation of ThinLTO backend jobs to an external process (the
distributor). The distributor can then manage job distribution through
systems like Incredibuild. A generic JSON interface is used to
communicate with the distributor, allowing for the creation of new
distributors (and thus integration with different distribution
systems) without modifying LLVM.

Please see llvm/docs/dtlto.rst for more details.

RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
Design Review: https://github.com/llvm/llvm-project/pull/126654
2025-05-23 20:07:53 +01:00
Kazu Hirata
961613bd24
[llvm] Fix a typo in documentation (#141204) 2025-05-23 10:33:02 -07:00
David Salinas
51a03ed272
Extend llvm objdump fatbin (#140286)
Utilize the new extensions to the LLVM Offloading API to extend to
llvm-objdump to handle dumping fatbin offload bundles generated by HIP.
This extension to llvm-objdump adds the option --offload-fatbin.
Specifying this option will take the input object/executable and extract
all offload fatbin bundle entries into distinct code object files with
names reflecting the source file name combined with the Bundle Entry ID.
Users can also use the --arch-name option to filter offload fatbin
bundle entries by their target triple.

---------

Co-authored-by: dsalinas <dsalinas@MKM-L1-DSALINAS.amd.com>
2025-05-23 11:55:16 -04:00
Dmitry Sidorov
69d6c1ff66
Revert "[SPIR-V] Support SPV_INTEL_int4 extension" (#141219)
Reverts llvm/llvm-project#141031
2025-05-23 11:45:55 +02:00
Alex MacLean
3a84a4e55d
Reland "[NVPTX] Unify and extend barrier{.cta} intrinsic support" (#141143)
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.

Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
2025-05-22 19:38:10 -07:00
S. VenkataKeerthy
d6596482ef
Minor typo fix in IR2Vec section of MLGO doc (#141162)
Co-authored-by: svkeerthy <venkatakeerthy@google.com>
2025-05-22 16:49:21 -07:00
Kazu Hirata
19a4e5202d
[llvm] Fix typos in documentation (#141078) 2025-05-22 13:58:42 -07:00