Shaders compiled with DXC/LLPC generate these relocations, and even if
that changes in the future we still want to handle existing binaries. The
friction of supporting this and the long-term maintenance cost both seem
very low, considering that other targets like ARM already support both
REL and RELA static relocations behind the same interface.
Update the guideline to reduce the chance of miscompilation/performance
regression.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>
Some data members are only part of a class definition in a Debug build,
e.g. `LVObject::ID`. If `debuginfologicalview` is used as a library,
`NDEBUG` cannot be used for this purpose, as this preprocessor macro may
have a different definition in a downstream project, which in turn
triggers an ODR violation. Fix it by:
- Making `LVObject::ID` an unconditional data member.
- Making `LVObject::dump()` non-virtual. Rationale: `virtual` is not
needed (and it calls `print()`, which is virtual anyway).
Fixes #139098.
When no profile is provided, but the new --empty-profile option is
specified, the export/report/show commands now emit coverage data
equivalent to that obtained from a profile with all zero counters
("baseline coverage").
This is useful for build systems (e.g. Bazel) that can track coverage
information for each build target, even those that are never linked into
tests and thus don't have runtime coverage data recorded. By merging in
baseline coverage, lines in files that aren't linked into tests are
correctly reported as uncovered.
Reland with fixes to `CoverageMappingTest.cpp`.
Reverts llvm/llvm-project#144121
Reverts llvm/llvm-project#117910
```
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp:281:28: error: 'std::reference_wrapper' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
281 | std::make_optional(std::reference_wrapper(*ProfileReader));
| ^
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../include/c++/8/bits/refwrap.h:289:11: note: add a deduction guide to suppress this warning
289 | class reference_wrapper
| ^
```
This PR fixes a bug in GatherToLDSOpLowering: we were using the
MemRefType of the source for the destination. Additionally, some related
typos are corrected.
CC: @krzysz00 @umangyadav @lialan
When no profile is provided, but the new --empty-profile option is
specified, the export/report/show commands now emit coverage data
equivalent to that obtained from a profile with all zero counters
("baseline coverage").
This is useful for build systems (e.g. Bazel) that can track coverage
information for each build target, even those that are never linked into
tests and thus don't have runtime coverage data recorded. By merging in
baseline coverage, lines in files that aren't linked into tests are
correctly reported as uncovered.
This patch is part of a series that adds origin-tracking to the debugify
source location coverage checks, allowing us to report symbolized stack
traces of the points where missing source locations appear.
This patch adds the configuration needed to enable this feature,
in the form of a new CMake setting that enables a flag in
`llvm-config.h`; this is not an entirely new CMake flag, but a new
value, `COVERAGE_AND_ORIGIN`, for the existing flag
`LLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING`. This patch contains
documentation, but no actual implementation for the flag itself.
…133242)"
This reverts commit 130080fab11cde5efcb338b77f5c3b31097df6e6 because it
causes issues in testcases similar to coalescer_remat.ll [1], i.e. when
we use a VGPR tuple but only write to its lower parts. The high VGPRs
would then not be included in the vgpr_count, and accessing them would
be an out-of-bounds violation.
[1]
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/coalescer_remat.ll
The following GitHub organizations were merged into the ROCm org:
* ROCm-Developer-Tools
* RadeonOpenCompute
* ROCmSoftwarePlatform
Ensure that all hyperlinks to the old organizations now point to the new
organization at https://github.com/ROCm.
This patch extends the TMA G2S intrinsics with support for
cta_group::1/2, available from Blackwell onwards.
The existing intrinsics are auto-upgraded with a default
value of '0' for the `cta_group` flag operand.
* lit tests are added for all combinations of the newer variants.
* Negative tests are added to validate the error handling
when the value of the `cta_group` flag falls out of range.
* The generated PTX is verified with a 12.8 ptxas executable.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This is a CLI tool that tests the conformance of LLVM's Mustache
implementation against the public Mustache spec, hosted at
https://github.com/mustache/spec. This is a revised version of the
patches in #111487.
Co-authored-by: Peter Chou <peter.chou@mail.utoronto.ca>
This patch extends the llvm.histogram intrinsic to support additional
update operations beyond the existing add. Specifically, the new
supported operations are:
* umax: unsigned maximum
* umin: unsigned minimum
* uadd.sat: unsigned saturated addition
Based on the discussion from:
https://discourse.llvm.org/t/rfc-expanding-the-experimental-histogram-intrinsic/84673
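For illustration, a minimal LLVM IR sketch of the histogram intrinsic and one of the new variants; the spelling of the new variant is an assumption that follows the existing `llvm.experimental.vector.histogram.add` naming scheme:
```
; Existing form: masked histogram update that adds the increment to each bucket.
declare void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr>, i32, <8 x i1>)
; Assumed spelling of one of the new variants (unsigned maximum).
declare void @llvm.experimental.vector.histogram.umax.v8p0.i32(<8 x ptr>, i32, <8 x i1>)

define void @update(<8 x ptr> %buckets, <8 x i1> %mask) {
  call void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr> %buckets, i32 1, <8 x i1> %mask)
  call void @llvm.experimental.vector.histogram.umax.v8p0.i32(<8 x ptr> %buckets, i32 42, <8 x i1> %mask)
  ret void
}
```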
This change supports target extension types in vectors. It
allows sized target extension types to opt in to being a valid vector
element.
Allowing target extension types as vector elements will allow backends
to use vector operations such as `insertelement` and `extractelement` on
their target types with minimal changes.
RFC:
https://discourse.llvm.org/t/rfc-supporting-sized-target-extension-types-in-vector/86431
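A hedged sketch of what this enables for vectors of target extension types; the type name `target("example.ty")` is purely illustrative, and whether a given sized target type opts in is decided by the target:
```
; Vector whose elements are a sized target extension type.
define target("example.ty") @get_lane(<4 x target("example.ty")> %v, target("example.ty") %x) {
  %v2 = insertelement <4 x target("example.ty")> %v, target("example.ty") %x, i32 0
  %e = extractelement <4 x target("example.ty")> %v2, i32 2
  ret target("example.ty") %e
}
```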
This pull request adds support for parsing the source language in both
DWARF and CodeView. Specifically:
- The `LVSourceLanguage` class is introduced to represent any language
supported by any of the debug info representations.
- Update `LVDWARFReader.cpp` and `LVCodeViewVisitor.cpp` to parse the
source language where applicable. Added a new `=Language` attribute;
`getAttributeLanguage()` is used internally to control whether this
information is printed.
If a test took more than one attempt to complete, show the number of attempts and the maximum allowed attempts, e.g. `2 of 4 attempts`, inside the `<progress info>` (see [TEST RUN OUTPUT FORMAT](https://llvm.org/docs/CommandGuide/lit.html#test-run-output-format)).
NOTE: Additionally, this is a fixup for #141851, where the tests were not quite right. `max-retries-per-test/allow-retries-test_retry_attempts/test.py` was added but never used there. Now we're calling it. To correlate the test output with the test script more easily, I've used higher maximum retry counts.
This is a fixup for #141851 and removes `=` from all
options with additional arguments.
Before, 14 out of 22 options with arguments used `=` and 7 didn't.
Add an `alloc-variant-zeroed` function attribute which can be used to
inform folding of allocation+memset. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.
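A minimal sketch of the intended use, assuming the attribute value names the allocator's zero-initializing variant; the allocator names here are hypothetical:
```
declare ptr @my_alloc(i64) allockind("alloc,uninitialized") "alloc-family"="my_alloc" "alloc-variant-zeroed"="my_alloc_zeroed"
declare ptr @my_alloc_zeroed(i64) allockind("alloc,zeroed") "alloc-family"="my_alloc"
declare void @llvm.memset.p0.i64(ptr, i8, i64, i1)

define ptr @alloc_zeroed_buffer(i64 %n) {
  %p = call ptr @my_alloc(i64 %n)
  call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 %n, i1 false)
  ; With the attribute, the allocation+memset pair above may be folded into:
  ;   %p = call ptr @my_alloc_zeroed(i64 %n)
  ret ptr %p
}
```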
Co-authored-by: Jamie <jamie@osec.io>
On x86, the `*l` libcalls are for 80-bit extended precision. `fp128`
needs to use the `*f128` libcalls instead.
Add a few missing ones, especially for FP min/max.
Also use the `f128` libcalls on x86-32. I believe the situation there is
the same as on x86-64.
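As a hedged illustration of the distinction (exact libcall names depend on the target runtime), fp128 min/max should now lower along these lines:
```
; On x86, this should lower to the f128 libcall (e.g. fminf128)
; rather than the 80-bit long double one (fminl).
declare fp128 @llvm.minnum.f128(fp128, fp128)

define fp128 @min_f128(fp128 %a, fp128 %b) {
  %r = call fp128 @llvm.minnum.f128(fp128 %a, fp128 %b)
  ret fp128 %r
}
```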
Don't count register uses when determining the maximum number of
registers used by a function. Count only the defs. This is really an
underestimate of the true register usage, but in practice that's not
a problem because if a function uses a register, then it has either
defined it earlier, or some other function that executed before has
defined it.
In particular, the register counts are used:
1. When launching an entry function - in which case we're safe because
the register counts of the entry function will include the register
counts of all callees.
2. At function boundaries in dynamic VGPR mode. In this case it's safe
because whenever we set the new VGPR allocation we take into account
the outgoing_vgpr_count set by the middle-end.
The main advantage of doing this is that the artificial VGPR arguments
used only for preserving the inactive lanes when using the
llvm.amdgcn.init.whole.wave intrinsic are no longer counted. This
enables us to allocate only the registers we need in dynamic VGPR mode.
---------
Co-authored-by: Thomas Symalla <5754458+tsymalla@users.noreply.github.com>
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.
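For reference, a minimal IR sketch of the situation (not taken from the patch): the callbr lists %target as a possible destination, and after this change no BTI is required at %target because branches to it from the asm are assumed to be direct.
```
define void @f() {
entry:
  callbr void asm sideeffect "nop", "!i"()
          to label %fallthrough [label %target]

fallthrough:
  ret void

target:
  ret void
}
```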
This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:
1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.
2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens, it's direct) and, since the code
paths are hot, also a noticeable performance hit.
Implementation:
`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.
Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.
Testing:
The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.
Further work:
`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
This PR removes the need to know the dimension of the embeddings when invoking `Embedder::Create`. Having the `Dimension` parameter introduces complexity in downstream consumers.
(Tracking issue - #141817)
When packaging LLVM we've seen arbitrary tests fail.
It happened sporadically, and most of the time the tests
do work if they are run a second time on the next day.
The tests themselves were always different and we didn't
know ahead of time which ones we wanted to re-run.
That's why we filter out a lot of `libomp` and `libarcher` tests [1].
This change allows us to set
`LIT_OPTS="--max-retries-per-test=12"`
when running any "check-XXX" build target. Then any lit test
will be re-run at most 12 times, unless there's an `ALLOW_RETRIES:`
in one of the test scripts that specifies a different value
than `12`. `12` is just an example here; any positive integer
will work.
Please note that this only adds the possibility to re-run
lit tests. It does not actually do so until the caller specifies
`--max-retries-per-test=<POSITIVE_INT>`, either on a call to `lit` or
in `LIT_OPTS`.
Also note that one can still use `ALLOW_RETRIES:` in test scripts,
and it will always rule over `--max-retries-per-test`. When
`--max-retries-per-test` is set too low but
`config.test_retry_attempts` is high enough, that works as well.
Any option in the list below overrules its predecessor:
* `--max-retries-per-test`
* `config.test_retry_attempts`
* `ALLOW_RETRIES` keyword
Of the above options to re-run tests, `--max-retries-per-test` is the
only one that doesn't require a change in the test scripts or the test
config.
[1]:
https://src.fedoraproject.org/rpms/llvm/blob/rawhide/f/llvm.spec#_2326
Downstream PR to make use of the `--max-retries-per-test` option:
https://src.fedoraproject.org/rpms/llvm/pull-request/434
Downstream ticket: https://issues.redhat.com/browse/LLVM-145
Currently, users have to invoke two APIs: `computeEmbeddings()` followed
by getters to access the embeddings. This PR refactors the code to
reduce this *stateful* access pattern. Users can now directly invoke the
getters; internally, the getters compute the embeddings.
This adds [de]interleave intrinsics for factors 4, 6, and 8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.
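For example, factor 4 would be exposed roughly as follows; this is a sketch, and the mangled suffixes are assumed to follow the existing interleave2/3/5/7 scheme:
```
; Interleave four scalable vectors into one wide vector.
declare <vscale x 16 x i32> @llvm.vector.interleave4.nxv16i32(
    <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)

; Split one wide vector back into its four de-interleaved parts.
declare { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> }
    @llvm.vector.deinterleave4.nxv16i32(<vscale x 16 x i32>)
```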
For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.
The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.
But after trying to integrate these for higher factors on RISC-V I think
we should revisit this design choice:
- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intrinsics
- For non-power-of-2 factors, e.g. 6, there are multiple possible ways a
[de]interleave could be represented; see the discussion in #139373
- We already have intrinsics for factors 2, 3, 5, and 7, so by avoiding
4, 6, and 8 we're not really saving much
By representing these higher factors as interleaves of interleaves, we can
in theory support arbitrarily high interleave factors. However, I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2, 3, and 4, whilst RVV only supports up to factor 8.
This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.
For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.
This relands #141031
This change ensures generated SPIR-V is valid and passes machine
verification:
```
*** Bad machine code: inconsistent constant size ***
- function: foo
- basic block: %bb.1 entry (0x9ec9298)
- instruction: %12:iid(s8) = G_CONSTANT i4 1
```
That is done by promoting `G_CONSTANT` instructions with small integer
types (e.g., `i4`) to `i8` if no extensions for "special" integer types
are enabled.
This patch adds initial support for Integrated Distributed ThinLTO
(DTLTO) in LLVM, which manages distribution internally during the
traditional link step. This enables compatibility with any build
system that supports in-process ThinLTO. In contrast, existing
approaches to distributed ThinLTO, which split the thin-link
(--thinlto-index-only), backend compilation, and final link into
separate steps, require build system support, e.g. Bazel.
This patch implements the core DTLTO mechanism, which enables
delegation of ThinLTO backend jobs to an external process (the
distributor). The distributor can then manage job distribution through
systems like Incredibuild. A generic JSON interface is used to
communicate with the distributor, allowing for the creation of new
distributors (and thus integration with different distribution
systems) without modifying LLVM.
Please see llvm/docs/dtlto.rst for more details.
RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
Design Review: https://github.com/llvm/llvm-project/pull/126654
Utilize the new extensions to the LLVM Offloading API to extend
llvm-objdump to handle dumping fatbin offload bundles generated by HIP.
This extension to llvm-objdump adds the option --offload-fatbin.
Specifying this option will take the input object/executable and extract
all offload fatbin bundle entries into distinct code object files with
names reflecting the source file name combined with the Bundle Entry ID.
Users can also use the --arch-name option to filter offload fatbin
bundle entries by their target triple.
---------
Co-authored-by: dsalinas <dsalinas@MKM-L1-DSALINAS.amd.com>
Note: This relands #140615, adding a ".count" suffix to the non-".all"
variants.
Our current support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying their semantics. Further, we
lack support for some variants. This change unifies the IR
representation into a single, consistently named set of intrinsics:
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
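A short usage sketch of the unified intrinsics; the barrier ID and thread count values are illustrative only:
```
define void @kernel_sync() {
  ; Equivalent of the legacy llvm.nvvm.barrier0: aligned full-CTA sync on barrier 0.
  call void @llvm.nvvm.barrier.cta.sync.aligned.all(i32 0)
  ; Arrive on barrier 1 with an expected count of 64 threads, without waiting.
  call void @llvm.nvvm.barrier.cta.arrive.count(i32 1, i32 64)
  ret void
}

declare void @llvm.nvvm.barrier.cta.sync.aligned.all(i32)
declare void @llvm.nvvm.barrier.cta.arrive.count(i32, i32)
```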