llvm-project

Author	SHA1	Message	Date
Dmitry Sidorov	d057b53a7d	[SPIR-V] Add SPV_INTEL_joint_matrix extension (#118578 ) The spec is available here: https://github.com/intel/llvm/pull/12497 The PR doesn't add the OpCooperativeMatrixApplyFunctionINTEL instruction as it's still experimental and not properly tested E2E. The PR also fixes a few bugs in the related code: 1. The CooperativeMatrixMulAddKHR optional operand must be a literal, not a constant; 2. Fixed creation of the available capabilities table for the case when a single extension adds several capabilities that occupy non-contiguous opcodes. --------- Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>	2024-12-04 19:00:19 +01:00
Rahul Joshi	e2c3d16282	[NFC] Eliminate need of Emacs tag and file name in file header (#118553 ) - Simplify file header to not require file name and C++ Emacs tag. See https://discourse.llvm.org/t/is-c-in-header-files-still-relevant/83124/1	2024-12-04 08:57:27 -08:00
Thorsten Schütt	148fdc519c	[GlobalISel] Add G_ABDS and G_ABDU instructions (#118122 ) The DAG has the same instructions: the signed and unsigned absolute difference of its inputs. For AArch64, they map to uabd and sabd for Neon and SVE. The Neon and SVE instructions will require custom patterns. They are pseudo opcodes and are not imported by the IRTranslator. We need combines to create them. PowerPC, ARM, and AArch64 have native instructions. /// i.e. trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1) /// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1) For GlobalISel, we are going to write the combines in MIR patterns. See: llvm/test/CodeGen/AArch64/abd-combine.ll - [ ] combine into abd - [ ] legalize and add td patterns	2024-12-04 12:53:15 +01:00
John Brawn	ecbe4d1e36	[IR] Allow fast math flags on fptrunc and fpext (#115894 ) This consists of: * Make these instructions part of FPMathOperator. * Adjust bitcode/ir readers/writers to expect fast math flags on these instructions. * Make IRBuilder set the fast math flags on these instructions. * Update langref and release notes. * Update a bunch of tests. Some of these are due to InstCombineCasts incorrectly adding fast math flags to fptrunc, which will be fixed in a later patch.	2024-12-04 10:53:04 +00:00
Shilei Tian	68bcba6d7a	Revert "[AMDGPU] Use COV6 by default (#118515 )" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.	2024-12-03 20:17:06 -05:00
Shilei Tian	410cbe3cf2	[AMDGPU] Use COV6 by default (#118515 )	2024-12-03 19:38:35 -05:00
Dan Gohman	35cce408ee	[WebAssembly] Support the new "Lime1" CPU (#112035 ) This adds WebAssembly support for the new [Lime1 CPU]. First, this defines some new target features. These are subsets of existing features that reflect implementation concerns: - "call-indirect-overlong" - implied by "reference-types"; just the overlong encoding for the `call_indirect` immediate, and not the actual reference types. - "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and `memory.fill`, and not the other instructions in the bulk-memory proposal. Next, this defines a new target CPU, "lime1", which enables mutable-globals, bulk-memory-opt, multivalue, sign-ext, nontrapping-fptoint, extended-const, and call-indirect-overlong. Unlike the default "generic" CPU, "lime1" is meant to be frozen, and followed up by "lime2" and so on when new features are desired. [Lime1 CPU]: https://github.com/WebAssembly/tool-conventions/blob/main/Lime.md#lime1 --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-12-03 16:35:23 -08:00
Shilei Tian	17cfd016b4	[AMDGPU][Doc] Add `gfx950` to `gfx9-4-generic` in the document	2024-12-03 11:17:22 -05:00
Vyacheslav Levytskyy	874b4fb6ad	[SPIR-V] Fix emission of debug and annotation instructions and add SPV_EXT_optnone SPIR-V extension (#118402 ) This PR fixes: * emission of OpNames (added newly inserted internal intrinsics and basic blocks) * emission of function attributes (SRet is added) * implementation of SPV_INTEL_optnone so that it emits OptNoneINTEL Function Control flag, and add implementation of the SPV_EXT_optnone SPIR-V extension.	2024-12-03 16:18:06 +01:00
Viktoria Maximova	4a6ecd3821	Add support for SPIR-V extension: SPV_INTEL_media_block_io (#118024 ) This change implements the SPV_INTEL_media_block_io extension in the SPIR-V backend.	2024-12-03 13:47:18 +01:00
Sudharsan Veeravalli	6881c6d2a6	[RISCV] Add Qualcomm uC Xqcia (Arithmetic) extension (#118113 ) This extension adds 11 instructions that perform integer arithmetic. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-12-01 17:06:22 +05:30
Nuno Lopes	ed7f36e1ec	[LangRef] update a couple of struct/vector creation examples to use poison	2024-11-29 09:42:25 +00:00
Min-Yih Hsu	96dd39c575	[XRay] Add `__xray_default_options` to specify build-time defined options (#117921 ) Similar to `__asan_default_options`, users can specify default options upon building the instrumented binaries by providing their own definition of `__xray_default_options` which returns the option strings. This is useful in cases where setting the `XRAY_OPTIONS` environment variable might be difficult. Plus, it's a convenient way to populate XRay options when you always want the instrumentation to be enabled.	2024-11-28 22:48:57 -08:00
Sudharsan Veeravalli	8fcbba82d6	[RISCV] Add Qualcomm uC Xqcisls (Scaled Load Store) extension (#117987 ) This extension adds 8 load/store instructions with a scaled index addressing mode. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-11-29 10:26:00 +05:30
Sudharsan Veeravalli	c4645ffeda	[RISCV] Add Qualcomm uC Xqcicsr (CSR) extension (#117169 ) The Qualcomm uC Xqcicsr extension adds 2 instructions that can read and write CSRs. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-11-28 12:46:15 +05:30
Durgadoss R	40d0058e6a	[NVPTX] Add TMA bulk tensor reduction intrinsics (#116854 ) This patch adds NVVM intrinsics and NVPTX codegen for: * cp.async.bulk.tensor.reduce.1D -> 5D variants, supporting both Tile and Im2Col modes. * These intrinsics optionally support cache_hints as indicated by the boolean flag argument. * Lit tests are added for all combinations of these intrinsics in cp-async-bulk-tensor-reduce.ll. * The generated PTX is verified with a 12.3 ptxas executable. * Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-11-27 10:57:51 +05:30
Brandon Wu	4a7dbede6b	[RISCV] Support `svukte` extension (#115657 ) This is the extension for "Address-Independent Latency of User-Mode Faults to Supervisor Addresses". Spec: https://github.com/riscv/riscv-isa-manual/pull/1564, https://lf-riscv.atlassian.net/browse/RVS-2977 The spec states that the `svukte` depends on `sv39`, but we don't have `sv39` yet, so I didn't add it to the implied list.	2024-11-27 10:54:57 +08:00
Louis Dionne	5bdcaf1a08	[github] Document the process for requesting the CI/CD role (#115321 ) See https://discourse.llvm.org/t/rfc-proposing-a-new-ci-cd-admin-for-the-project	2024-11-26 14:18:49 -05:00
LiqinWeng	bf07a569b7	[LangRef] Remove extra commas of llvm.vp.ctlz (#117542 )	2024-11-26 10:22:26 +08:00
Justin Bogner	bb88fd171a	[DirectX] Calculate resource binding offsets using the lower bound (#117303 ) In the DXIL CreateHandle and CreateHandleFromBinding ops, resource bindings are indexed from the beginning of the binding space, not from the binding itself. Translate from an index into the binding to one from the beginning of the space when lowering to these operations.	2024-11-25 10:44:01 -08:00
LiqinWeng	73bebf96bc	[LangRef] Update the position of some parameters in the vp intrinsic of abs/cttz/ctlz (#117519 )	2024-11-25 14:47:50 +08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Jonas Devlieghere	8bfa87cadf	Release note lldb completion improvements (#117058 )	2024-11-21 07:02:45 -08:00
Jonas Devlieghere	4acf935b95	Add release note for parallel module creation in LLDB (#116857 ) Release note #110646 and #114507.	2024-11-20 13:25:36 -08:00
Petr Penzin	41c86ca714	[RISCV] Add TT-Ascalon-d8 processor (#115100 ) Ascalon is an out-of-order CPU core from Tenstorrent. Overview: https://tenstorrent.com/ip/tt-ascalon Adding 8-wide version, -mcpu=tt-ascalon-d8. Scheduling model will be added in a separate PR. --------- Co-authored-by: Anton Blanchard <antonb@tenstorrent.com>	2024-11-19 14:20:55 -08:00
Adrian Prantl	3e552ed589	Add release notes for LLDB inline diagnostics (#116841 )	2024-11-19 09:00:54 -08:00
David Spickett	ee4fb3a876	[llvm][docs] Correct sentence in How To Add A Builder Looks like a few different phrasings got mashed up together.	2024-11-19 13:36:13 +00:00
Tom Stellard	6fe94c3bae	[Workflows] Enable commit access requests via GitHub issues (#100458 ) This updates the auto-labeler to match a specific issue title that is going to be used for requesting commit access and then add the infrastructure:commit-access-request label. This will notify the admin team who will be able to handle the request. See https://discourse.llvm.org/t/rfc-change-the-process-for-requesting-commit-access/80184 --------- Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>	2024-11-18 17:53:59 -08:00
Sam Elliott	486e1d91e3	[RISCV][docs] Release Notes These cover recent additions and changes to assembly and inline assembly support.	2024-11-18 11:02:48 -08:00
Matt Arsenault	5a556d55fb	AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309 )	2024-11-18 10:48:56 -08:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Steven Perron	756fe54dc7	[SPIRV] Add write to image buffer for shaders. (#115927 ) This commit adds an intrinsic that will write to an image buffer. We chose to match the name of the DXIL intrinsic for simplicity in clang. We cannot reuse the existing openCL write_image function because that is not a reserved name in HLSL. There is not much common code to factor out.	2024-11-18 09:06:05 -05:00
Antonio Frighetto	9fc4654462	[LangRef] Fix mislabeling in calling convention name (NFC) We have explained how musttail can be guaranteed when the calling convention is not `swifttailcc` or `tailcc`, ensure what needs to adhere when it is the opposite case.	2024-11-18 11:02:10 +01:00
Freddy Ye	97836bed63	Reland "[X86] Support -march=diamondrapids (#113881 )" (#116564 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-18 10:40:32 +08:00
Freddy Ye	90e92239bd	Revert "[X86] Support -march=diamondrapids (#113881 )" (#116563 ) This reverts commit 826b845c9e97448395431be3e4e5da585bd98c5e.	2024-11-18 08:45:28 +08:00
Freddy Ye	826b845c9e	[X86] Support -march=diamondrapids (#113881 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-18 08:31:17 +08:00
Florian Hahn	c95daac4c1	[LangRef] Spell out alias attribute/metadata violations are UB. (#116220 ) Update the documentation for the noalias attribute, !alias.scope and !loop.parallel_accesses metadata to clarify that violating the noalias property is UB. PR: https://github.com/llvm/llvm-project/pull/116220 --------- Co-authored-by: Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt>	2024-11-16 13:38:58 +00:00
Alex Bradbury	298127dcbe	Reapply [IR] Initial introduction of llvm.experimental.memset_pattern (#97583 ) Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the test case. Supersedes the draft PR #94992, taking a different approach following feedback: * Lower in PreISelIntrinsicLowering * Don't require that the number of bytes to set is a compile-time constant * Define llvm.memset_pattern rather than llvm.memset_pattern.inline As discussed in the [RFC thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496), the intent is that the intrinsic will be lowered to loops, a sequence of stores, or libcalls depending on the expected cost and availability of libcalls on the target. Right now, there's just a single lowering path that aims to handle all cases. My intent would be to follow up with additional PRs that add additional optimisations when possible (e.g. when libcalls are available, when arguments are known to be constant etc).	2024-11-15 15:21:39 +00:00
Alex Bradbury	0fb8fac5d6	Revert "[IR] Initial introduction of llvm.experimental.memset_pattern (#97583 )" This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3. Recent scheduling changes mean tests need to be re-generated. Reverting to green while I do that.	2024-11-15 14:48:32 +00:00
Alex Bradbury	7ff3a9acd8	[IR] Initial introduction of llvm.experimental.memset_pattern (#97583 ) Supersedes the draft PR #94992, taking a different approach following feedback: * Lower in PreISelIntrinsicLowering * Don't require that the number of bytes to set is a compile-time constant * Define llvm.memset_pattern rather than llvm.memset_pattern.inline As discussed in the [RFC thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496), the intent is that the intrinsic will be lowered to loops, a sequence of stores, or libcalls depending on the expected cost and availability of libcalls on the target. Right now, there's just a single lowering path that aims to handle all cases. My intent would be to follow up with additional PRs that add additional optimisations when possible (e.g. when libcalls are available, when arguments are known to be constant etc).	2024-11-15 14:07:46 +00:00
joaosaffran	bc6c068127	[HLSL] Adding HLSL `clip` function. (#114588 ) Adding HLSL `clip` function. - adding llvm intrinsic - adding sema checks - adding dxil lowering - adding spirv lowering - adding sema tests - adding codegen tests - adding lowering tests Closes #99093 --------- Co-authored-by: Joao Saffran <jderezende@microsoft.com>	2024-11-14 23:34:07 -08:00
Matin Raayai	bb3f5e1fed	Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234 ) Following discussions in #110443, and the earlier discussions in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html, https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine` interface classes. More specifically: 1. Makes `TargetMachine` the only class implemented under `TargetMachine.h` in the `Target` library. 2. `TargetMachine` contains target-specific interface functions that relate to IR/CodeGen/MC constructs, whereas before (at least on paper) it was supposed to have only IR/MC constructs. Any Target that doesn't want to use the independent code generator simply does not implement them, and returns either `false` or `nullptr`. 3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming aims to make the purpose of `LLVMTargetMachine` clearer. Its interface was moved under the CodeGen library, to further emphasize its usage in Targets that use CodeGen directly. 4. Makes `TargetMachine` the only interface used across LLVM and its projects. With these changes, `CodeGenCommonTMImpl` is simply a set of shared function implementations of `TargetMachine`, and CodeGen users don't need to static cast to `LLVMTargetMachine` every time they need a CodeGen-specific feature of the `TargetMachine`. 5. More importantly, does not change any requirements regarding library linking. cc @arsenm @aeubanks	2024-11-14 13:30:05 -08:00
Justin Fargnoli	2e9f8696e9	Reland "[LLVM] Add IRNormalizer Pass" (#113780 ) `IRNormalizer` will reorder instructions. Thus, we need to invalidate analyses. Done in cd500d28cba3177c213f2f2faf50f14ea56e230b. This should resolve the [BuildBot failure](https://github.com/llvm/llvm-project/pull/68176#issuecomment-2428243474). --- Original PR: #68176 Original commit: 1295d2e6da2fe90f3b770ab1d35bf5caecd38bed Reverted with: 8a12e0131f3d84b470fac63af042aa96a1b19f56 --- Add the llvm-canon tool. Description from the [original PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu): > Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes. The current version of this tool can: - Reorder instructions within a function. - Rename instructions based on the operands. - Sort commutative operands. This code was originally written by @michalpaszkowski and [submitted to mainline LLVM](`14d358537f`). However, it was quickly [reverted](`335de55fa3`) due to BuildBot errors. Michal presented his version of the tool in [LLVM-Canon: Shooting for Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg). @AidanGoldfarb and I ported the code to the new pass manager, added more tests, and fixed some bugs related to PHI nodes that may have been the root cause of the BuildBot errors that caused the patch to be reverted. Additionally, we rewrote the implementation of instruction reordering to fix cases where the original algorithm would break use-def chains. Note that this is the first time @AidanGoldfarb and I have submitted to LLVM. Please liberally critique the PR! CC @plotfi for initial review. --------- Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>	2024-11-14 09:56:22 -08:00
Graham Hunter	ed5aaddd7b	[IR] Vector extract last active element intrinsic (#113587 ) As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false. The target-agnostic SelectionDAG lowering is similar to the IR in #106560.	2024-11-14 17:48:43 +00:00
Diana Picus	2aa6cedfa8	[AMDGPU] Clarify amdgpu.cs.chain + init whole wave. NFC (#115452 ) Add some docs clarifying how inactive lanes are handled in the amdgpu_cs_chain calling convention when the llvm.amdgcn.init.whole.wave intrinsic is used.	2024-11-14 10:10:33 +01:00
Ricardo Jesus	e52238b59f	[AArch64] Add @llvm.experimental.vector.match (#101974 ) This patch introduces an experimental intrinsic for matching the elements of one vector against the elements of another. For AArch64 targets that support SVE2, the intrinsic lowers to a MATCH instruction for supported fixed and scalar vector types.	2024-11-14 09:00:19 +00:00
Rakshit Patel	c63e83f495	[lit] Add --report-failures-only option for lit test reports (#115439 ) - Add option (--report-failures-only) to generate a reduced report for lit tests that only includes failing tests - This is a continuation of proposed patches by @gregbedwell here: - https://reviews.llvm.org/D143516 - https://reviews.llvm.org/D143519 --------- Co-authored-by: Greg Bedwell <greg.bedwell@sony.com> Co-authored-by: James Henderson <James.Henderson@sony.com>	2024-11-13 08:30:33 +00:00
Alex Bradbury	2baead09b2	[docs] Add blank line before bulletpoint list to fix HowToAddABuilder The bulletpoint list wasn't rendering properly due to a missing blank line.	2024-11-13 05:26:02 +00:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00

11263 Commits