The spec is available here:
https://github.com/intel/llvm/pull/12497
The PR doesn't add OpCooperativeMatrixApplyFunctionINTEL instruction as
it's still experimental and not properly tested E2E.
The PR also fixes few bugs in the related code:
1. CooperativeMatrixMulAddKHR optional operand must be literal, not a
constant;
2. Fixed available capabilities table creation for a case, when a single
extension adds few capabilities, that occupy not contiguous op codes.
---------
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
The DAG has the same instructions: the signed and unsigned absolute
difference of it's input. For AArch64, they map to uabd and sabd for
Neon and SVE. The Neon and SVE instructions will require custom
patterns.
They are pseudo opcodes and are not imported by the IRTranslator. We
need combines to create them.
PowerPC, ARM, and AArch64 have native instructions.
/// i.e trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
/// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
For GlobalISel, we are going to write the combines in MIR patterns.
see:
llvm/test/CodeGen/AArch64/abd-combine.ll
- [ ] combine into abd
- [ ] legalize and add td patterns
This consists of:
* Make these instructions part of FPMathOperator.
* Adjust bitcode/ir readers/writers to expect fast math flags on these
instructions.
* Make IRBuilder set the fast math flags on these instructions.
* Update langref and release notes.
* Update a bunch of tests. Some of these are due to InstCombineCasts
incorrectly adding fast math flags to fptrunc, which will be fixed in a
later patch.
This adds WebAssembly support for the new [Lime1 CPU].
First, this defines some new target features. These are subsets of
existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong
encoding for the `call_indirect` immediate, and not the actual reference
types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
Next, this defines a new target CPU, "lime1", which enables
mutable-globals,
bulk-memory-opt, multivalue, sign-ext, nontrapping-fptoint,
extended-const,
and call-indirect-overlong. Unlike the default "generic" CPU, "lime1" is
meant
to be frozen, and followed up by "lime2" and so on when new features are
desired.
[Lime1 CPU]:
https://github.com/WebAssembly/tool-conventions/blob/main/Lime.md#lime1
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
This PR fixes:
* emission of OpNames (added newly inserted internal intrinsics and
basic blocks)
* emission of function attributes (SRet is added)
* implementation of SPV_INTEL_optnone so that it emits OptNoneINTEL
Function Control flag, and add implementation of the SPV_EXT_optnone
SPIR-V extension.
Similar to `__asan_default_options`, users can specify default options
upon building the instrumented binaries by providing their own
definition of `__xray_default_options` which returns the option strings.
This is useful in cases where setting the `XRAY_OPTIONS` environment
variable might be difficult. Plus, it's a convenient way to populate
XRay options when you always want the instrumentation to be enabled.
This patch adds NVVM intrinsics and NVPTX codegen for:
* cp.async.bulk.tensor.reduce.1D -> 5D variants, supporting both Tile
and Im2Col modes.
* These intrinsics optionally support cache_hints as indicated by the
boolean flag argument.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk-tensor-reduce.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
In the DXIL CreateHandle and CreateHandleFromBinding ops, resource
bindings are
indexed from the beginning of the binding space, not from the binding
itself.
Translate from an index into the binding to one from the beginning of
the space
when lowering to these operations.
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.
The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
Ascalon is an out-of-order CPU core from Tenstorrent. Overview:
https://tenstorrent.com/ip/tt-ascalon
Adding 8-wide version, -mcpu=tt-ascalon-d8. Scheduling model will be
added in a separate PR.
---------
Co-authored-by: Anton Blanchard <antonb@tenstorrent.com>
This updates the auto-labeler to match a specific issue title that is
going to be used for requesting commit access and then add the
infrastructure:commit-access-request label.
This will notify the admin team who will be able to handle the request.
See
https://discourse.llvm.org/t/rfc-change-the-process-for-requesting-commit-access/80184
---------
Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>
This commit adds an intrinsic that will write to an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.
We cannot reuse the existing openCL write_image function because that is
not a reserved name in HLSL. There is not much common code to factor
out.
We have explained how musttail can be guaranteed when the calling
convention is not `swifttailcc` or `tailcc`, ensure what needs to
adhere when it is the opposite case.
Update the documentation for the noalias attribute, !alias.scope and
!loop.parallel_accesses metadata to clarify they are UB on voilation the
noalias property.
PR: https://github.com/llvm/llvm-project/pull/116220
---------
Co-authored-by: Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt>
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.
Recent scheduling changes means tests need to be re-generated. Reverting
to green while I do that.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
`IRNormalizer` will reorder instructions. Thus, we need to invalidate
analyses. Done in cd500d28cba3177c213f2f2faf50f14ea56e230b. This should
resolve the [BuildBot
failure](https://github.com/llvm/llvm-project/pull/68176#issuecomment-2428243474).
---
Original PR: #68176
Original commit: 1295d2e6da2fe90f3b770ab1d35bf5caecd38bed
Reverted with: 8a12e0131f3d84b470fac63af042aa96a1b19f56
---
Add the llvm-canon tool. Description from the [original
PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu):
> Added a new llvm-canon tool which aims to transform LLVM Modules into
a canonical form by reordering and renaming instructions while
preserving the same semantics. This tool makes it easier to spot
semantic differences while diffing two modules which have undergone
different transformation passes.
The current version of this tool can:
- Reorder instructions within a function.
- Rename instructions based on the operands.
- Sort commutative operands.
This code was originally written by @michalpaszkowski and [submitted to
mainline
LLVM](14d358537f).
However, it was quickly
[reverted](335de55fa3)
to do BuildBot errors.
Michal presented his version of the tool in [LLVM-Canon: Shooting for
Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg).
@AidanGoldfarb and I ported the code to the new pass manager, added more
tests, and fixed some bugs related to PHI nodes that may have been the
root cause of the BuildBot errors that caused the patch to be reverted.
Additionally, we rewrote the implementation of instruction reordering to
fix cases where the original algorithm would break use-def chains.
Note that this is @AidanGoldfarb and I's first time submitting to LLVM.
Please liberally critique the PR!
CC @plotfi for initial review.
---------
Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>
As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false.
The target-agnostic SelectionDAG lowering is similar to the IR in #106560.
Add some docs clarifying how inactive lanes are handled in the
amdgpu_cs_chain calling convention when the llvm.amdgcn.init.whole.wave
intrinsic is used.
This patch introduces an experimental intrinsic for matching the
elements of one vector against the elements of another.
For AArch64 targets that support SVE2, the intrinsic lowers to a MATCH
instruction for supported fixed and scalar vector types.
- Add option (--report-failures-only) to generate a reduced report for
lit tests that only includes failing tests
- This is a continuation of proposed patches by @gregbedwell here:
- https://reviews.llvm.org/D143516
- https://reviews.llvm.org/D143519
---------
Co-authored-by: Greg Bedwell <greg.bedwell@sony.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.