11247 Commits

Author SHA1 Message Date
Brandon Wu
4a7dbede6b
[RISCV] Support svukte extension (#115657)
This is the extension for "Address-Independent Latency of User-Mode
Faults to Supervisor Addresses".
Spec: https://github.com/riscv/riscv-isa-manual/pull/1564,
https://lf-riscv.atlassian.net/browse/RVS-2977
The spec states that the `svukte` depends on `sv39`, but we don't have
`sv39` yet, so I didn't add it to the implied list.
2024-11-27 10:54:57 +08:00
Louis Dionne
5bdcaf1a08
[github] Document the process for requesting the CI/CD role (#115321)
See https://discourse.llvm.org/t/rfc-proposing-a-new-ci-cd-admin-for-the-project
2024-11-26 14:18:49 -05:00
LiqinWeng
bf07a569b7
[LangRef] Remove extra commas of llvm.vp.ctlz (#117542) 2024-11-26 10:22:26 +08:00
Justin Bogner
bb88fd171a
[DirectX] Calculate resource binding offsets using the lower bound (#117303)
In the DXIL CreateHandle and CreateHandleFromBinding ops, resource
bindings are
indexed from the beginning of the binding space, not from the binding
itself.
Translate from an index into the binding to one from the beginning of
the space
when lowering to these operations.
2024-11-25 10:44:01 -08:00
LiqinWeng
73bebf96bc
[LangRef] Update the position of some parameters in the vp intrinsic of abs/cttz/ctlz (#117519) 2024-11-25 14:47:50 +08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
Jonas Devlieghere
8bfa87cadf
Release note lldb completion improvements (#117058) 2024-11-21 07:02:45 -08:00
Jonas Devlieghere
4acf935b95
Add release note for parallel module creation in LLDB (#116857)
Release note #110646 and #114507.
2024-11-20 13:25:36 -08:00
Petr Penzin
41c86ca714
[RISCV] Add TT-Ascalon-d8 processor (#115100)
Ascalon is an out-of-order CPU core from Tenstorrent. Overview:
https://tenstorrent.com/ip/tt-ascalon

Adding 8-wide version, -mcpu=tt-ascalon-d8. Scheduling model will be
added in a separate PR.

---------

Co-authored-by: Anton Blanchard <antonb@tenstorrent.com>
2024-11-19 14:20:55 -08:00
Adrian Prantl
3e552ed589
Add release notes for LLDB inline diagnostics (#116841) 2024-11-19 09:00:54 -08:00
David Spickett
ee4fb3a876 [llvm][docs] Correct setence in How To Add A Builder
Looks like a few different phrasings got mashed up together.
2024-11-19 13:36:13 +00:00
Tom Stellard
6fe94c3bae
[Workflows] Enable commit access requests via GitHub issues (#100458)
This updates the auto-labeler to match a specific issue title that is
going to be used for requesting commit access and then add the
infrastructure:commit-access-request label.

This will notify the admin team who will be able to handle the request.

See
https://discourse.llvm.org/t/rfc-change-the-process-for-requesting-commit-access/80184

---------

Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>
2024-11-18 17:53:59 -08:00
Sam Elliott
486e1d91e3 [RISCV][docs] Release Notes
These cover recent additions and changes to assembly and inline assembly
support.
2024-11-18 11:02:48 -08:00
Matt Arsenault
5a556d55fb
AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309) 2024-11-18 10:48:56 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Steven Perron
756fe54dc7
[SPIRV] Add write to image buffer for shaders. (#115927)
This commit adds an intrinsic that will write to an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.

We cannot reuse the existing openCL write_image function because that is
not a reserved name in HLSL. There is not much common code to factor
out.
2024-11-18 09:06:05 -05:00
Antonio Frighetto
9fc4654462 [LangRef] Fix mislabeling in calling convention name (NFC)
We have explained how musttail can be guaranteed when the calling
convention is not `swifttailcc` or `tailcc`, ensure what needs to
adhere when it is the opposite case.
2024-11-18 11:02:10 +01:00
Freddy Ye
97836bed63
Reland "[X86] Support -march=diamondrapids (#113881)" (#116564)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-18 10:40:32 +08:00
Freddy Ye
90e92239bd
Revert "[X86] Support -march=diamondrapids (#113881)" (#116563)
This reverts commit 826b845c9e97448395431be3e4e5da585bd98c5e.
2024-11-18 08:45:28 +08:00
Freddy Ye
826b845c9e
[X86] Support -march=diamondrapids (#113881)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-18 08:31:17 +08:00
Florian Hahn
c95daac4c1
[LangRef] Spell out alias attribute/metadata violations are UB. (#116220)
Update the documentation for the noalias attribute, !alias.scope and
!loop.parallel_accesses metadata to clarify they are UB on voilation the
noalias property.

PR: https://github.com/llvm/llvm-project/pull/116220
---------

Co-authored-by: Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt>
2024-11-16 13:38:58 +00:00
Alex Bradbury
298127dcbe Reapply [IR] Initial introduction of llvm.experimental.memset_pattern (#97583)
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.

Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
2024-11-15 15:21:39 +00:00
Alex Bradbury
0fb8fac5d6 Revert "[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)"
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.

Recent scheduling changes means tests need to be re-generated. Reverting
to green while I do that.
2024-11-15 14:48:32 +00:00
Alex Bradbury
7ff3a9acd8
[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
2024-11-15 14:07:46 +00:00
joaosaffran
bc6c068127
[HLSL] Adding HLSL clip function. (#114588)
Adding HLSL `clip` function.
 - adding llvm intrinsic
 - adding sema checks
 - adding dxil lowering
 - ading spirv lowering
 - adding sema tests
 - adding codegen tests
 - adding lowering tests

Closes #99093

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2024-11-14 23:34:07 -08:00
Matin Raayai
bb3f5e1fed
Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234)
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.

cc @arsenm @aeubanks
2024-11-14 13:30:05 -08:00
Justin Fargnoli
2e9f8696e9
Reland "[LLVM] Add IRNormalizer Pass" (#113780)
`IRNormalizer` will reorder instructions. Thus, we need to invalidate
analyses. Done in cd500d28cba3177c213f2f2faf50f14ea56e230b. This should
resolve the [BuildBot
failure](https://github.com/llvm/llvm-project/pull/68176#issuecomment-2428243474).

---

Original PR: #68176
Original commit: 1295d2e6da2fe90f3b770ab1d35bf5caecd38bed
Reverted with: 8a12e0131f3d84b470fac63af042aa96a1b19f56

---

Add the llvm-canon tool. Description from the [original
PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu):

> Added a new llvm-canon tool which aims to transform LLVM Modules into
a canonical form by reordering and renaming instructions while
preserving the same semantics. This tool makes it easier to spot
semantic differences while diffing two modules which have undergone
different transformation passes.

The current version of this tool can:

- Reorder instructions within a function.
- Rename instructions based on the operands.
- Sort commutative operands.

This code was originally written by @michalpaszkowski and [submitted to
mainline
LLVM](14d358537f).
However, it was quickly
[reverted](335de55fa3)
to do BuildBot errors.

Michal presented his version of the tool in [LLVM-Canon: Shooting for
Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg).

@AidanGoldfarb and I ported the code to the new pass manager, added more
tests, and fixed some bugs related to PHI nodes that may have been the
root cause of the BuildBot errors that caused the patch to be reverted.
Additionally, we rewrote the implementation of instruction reordering to
fix cases where the original algorithm would break use-def chains.

Note that this is @AidanGoldfarb and I's first time submitting to LLVM.
Please liberally critique the PR!

CC @plotfi for initial review.

---------

Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>
2024-11-14 09:56:22 -08:00
Graham Hunter
ed5aaddd7b
[IR] Vector extract last active element intrinsic (#113587)
As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false.

The target-agnostic SelectionDAG lowering is similar to the IR in #106560.
2024-11-14 17:48:43 +00:00
Diana Picus
2aa6cedfa8
[AMDGPU] Clarify amdgpu.cs.chain + init whole wave. NFC (#115452)
Add some docs clarifying how inactive lanes are handled in the
amdgpu_cs_chain calling convention when the llvm.amdgcn.init.whole.wave
intrinsic is used.
2024-11-14 10:10:33 +01:00
Ricardo Jesus
e52238b59f
[AArch64] Add @llvm.experimental.vector.match (#101974)
This patch introduces an experimental intrinsic for matching the
elements of one vector against the elements of another.

For AArch64 targets that support SVE2, the intrinsic lowers to a MATCH
instruction for supported fixed and scalar vector types.
2024-11-14 09:00:19 +00:00
Rakshit Patel
c63e83f495
[lit] Add --report-failures-only option for lit test reports (#115439)
- Add option (--report-failures-only) to generate a reduced report for
lit tests that only includes failing tests
- This is a continuation of proposed patches by @gregbedwell here:
    - https://reviews.llvm.org/D143516
    - https://reviews.llvm.org/D143519

---------

Co-authored-by: Greg Bedwell <greg.bedwell@sony.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
2024-11-13 08:30:33 +00:00
Alex Bradbury
2baead09b2 [docs] Add blank line before bulletpoint list to fix HowToAddABuilder
The bulletpoint list wasn't rendering properly due to a missing blank
line.
2024-11-13 05:26:02 +00:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Alex Bradbury
8da61a3434
[llvm][docs] Expand HowToAddABuilder with guidance on testing locally (#115024)
With <https://github.com/llvm/llvm-zorg/pull/289> and <https://github.com/llvm/llvm-zorg/pull/293> landed, it's now reasonable to ask people to test their builder configurations locally. This patch adds documentation on how to do so.
2024-11-12 22:02:20 +00:00
Tex Riddell
5c2a133b13
Emit constrained atan2 intrinsic for clang builtin (#113636)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constraint atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.

Last part of Implement the atan2 HLSL Function. Fixes #70096.
2024-11-12 13:34:29 -08:00
Steven Perron
ba572abeb4
[SPIRV] Add reads from image buffer for shaders. (#115178)
This commit adds an intrinsic that will read from an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.

We cannot reuse the existing openCL readimage function because that is
not a reserved name in HLSL.

I considered trying to refactor generateReadImageInst, so that we could
share code between the two implementations. However, most of the code in
generateReadImageInst is concerned with trying to figure out which type
of image read is being done. Once we factor out the code that will be
common, then we end up with just a single call to the MIRBuilder being
common.
2024-11-12 14:04:45 -05:00
Fangrui Song
5a094241de
[LangRef] Clarify RISC-V v? constraints
Pull Request: https://github.com/llvm/llvm-project/pull/115820
2024-11-12 09:20:54 -08:00
David Spickett
7c04da12f0
[llvm][docs] Add terminology note to Buildbot docs (#115856)
Choosing another term for this one document would only create confusion,
and vendoring Buildbot to change it is a lot of work (as explained in
the linked Buildbot issue).
2024-11-12 12:45:43 +00:00
Stephen Tozer
6d23ac1aa2
[DebugInfo] Update policy for when to merge locations (#115349)
Following discussions on PR #114231 this patch changes the policy on
merging locations, making the rule that new instructions should use a
merge of the locations of all the instructions whose output is produced
by the new instructions; in the case where only one instruction's output
is produced, as in most InstCombine optimizations, we use only that
instruction's location.
2024-11-12 09:16:59 +00:00
Carlos Alberto Enciso
5e7662efec
[llvm-debuginfo-analyzer] Incorrect DW_AT_call_line/DW_AT_call_file. (#115701)
The code dealing with DW_AT_call_line/DW_AT_call_file is in the wrong
place. The correct functions were call, but with incorrect values:
  DW_AT_call_line <-- Filename Index
  DW_AT_call_file <-- Line number
2024-11-11 13:00:24 +00:00
Thorsten Schütt
a5d09f4ad9
[GlobalISel] Add G_STEP_VECTOR instruction (#115598)
aka llvm.stepvector Intrinsic
2024-11-11 10:45:02 +01:00
Luke Lau
5ca082cdfe
[LangRef] Fix evl type on float VP reduction intrinsics (#115421)
Looks like a search-and-replace typo
2024-11-11 13:13:08 +08:00
Will
ff0698b258
[LangRef] Fix examples for float to int saturating intrinsics (#115629)
As per the [LangRef:Simple
Constants](https://llvm.org/docs/LangRef.html#simple-constants), exact
decimal values of floating-point constants are required. For instance,
23.9 is a repeating decimal in binary and results in the reported error.

https://godbolt.org/z/1h7ETPnf6

Fixes #113529.
2024-11-10 16:51:29 +01:00
Durgadoss R
4edd711b4d
[NVPTX] Add TMA bulk tensor prefetch intrinsics (#115527)
This patch adds NVVM intrinsics and NVPTX codegen for:
* cp.async.bulk.tensor.prefetch.1D -> 5D variants, supporting both Tile
  and Im2Col modes. These intrinsics optionally support cache_hints as
  indicated by the boolean flag argument.
* Lit tests are added for all combinations of these intrinsics in cp-async-bulk-tensor-prefetch.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
* PTX Spec reference: 
  https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch-tensor

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-11-10 13:44:42 +05:30
T-Tie
c17a914675
[RISCV] Add Smdbltrp and Ssdbltrp extension (#111837)
Smdbltrp and Ssdbltrp supports are added in this PR.
Specification link(Smdbltrp) :
[https://github.com/riscv/riscv-isa-manual/blob/main/src/smdbltrp.adoc](url)
Specification link(Ssdbltrp) :
[https://github.com/riscv/riscv-isa-manual/blob/main/src/ssdbltrp.adoc](url)
2024-11-08 15:01:51 +08:00
Min-Yih Hsu
e8b70e9744
[TableGen] Make !and and !or short-circuit (#113963)
The idea is that by preemptively simplifying the result of `!and` and `!or`, we can fold
some of the conditional operators, like `!if` or `!cond`, as early as
possible.
2024-11-07 10:22:03 -08:00
Sjoerd Meijer
6720ce75f6
[Docs][llvm-exegesis] Clarify AArch64 support (#114989)
Claiming AArch64 support for llvm-exegesis is a bit of a stretch in my
opinion as only a couple of opcodes with GPR64 operands will work for
snippet benchmarking, so I propose to clarify that AArch64 support is
very experimental. Also added some clarifications about its libpfm4
dependency.
2024-11-07 10:48:52 +00:00
Durgadoss R
1b01064faa
[NVPTX] Add TMA bulk tensor copy intrinsics (#96083)
This patch adds NVVM intrinsics and NVPTX codegen for:
* cp.async.bulk.tensor.S2G.1D -> 5D variants, supporting both Tile and
   Im2Col modes. These intrinsics optionally support cache_hints as
   indicated by the boolean flag argument.
* cp.async.bulk.tensor.G2S.1D -> 5D variants, with support for both Tile
  and Im2Col modes. The Im2Col variants have an extra set of offsets as
  parameters. These intrinsics optionally support multicast and cache_hints,
  as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
   appropriate PTX instruction.
* Lit tests are added for all combinations of these intrinsics in
  cp-async-bulk-tensor-g2s/s2g.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
* PTX Spec reference:
  https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-tensor

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-11-07 15:21:53 +05:30
abhishek-kaushik22
d2aff182d3
Revert "TLS loads opimization (hoist)" (#114740)
This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4.

Based on the discussions in #112772, this pass is not needed after the
introduction of `llvm.threadlocal.address` intrinsic.

Fixes https://github.com/llvm/llvm-project/issues/112771.
2024-11-07 10:10:28 +01:00