There are cases (like in an upcoming patch to MLIR's `Property` class)
where the ? value is a useful null value. However, existing predicates
make it difficult to test whether the value in a record one is operating
on is ? or not.
This commit adds the !initialized predicate, which is 1 on concrete,
non-? values and 0 on ?.
---------
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
commit a9aff440d9dd ("[libc][docs] reorganize documentation (#118836)")
moved https://libc.llvm.org/math/index.html to
https://libc.llvm.org/headers/math/index.html which makes links from
various slide decks stale.
There's an extension for sphinx that can generate redirects. Add a dependency
on that, then use it to create a redirect so that those older links still work.
I was able to install this sphinx extension via:
$ sudo apt install python3-sphinx-reredirects
We may need to install this on whatever server generates the llvm
documentation.
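As a rough illustration of the redirect described above, a `conf.py` fragment could look like the sketch below. It assumes the extension's Python module is named `sphinx_reredirects` and that its `redirects` setting maps an old document name to a target resolved relative to the old page; the exact entry used in the libc docs is not shown in this message, so treat the mapping as an assumption.
```python
# Sketch of a Sphinx conf.py fragment (assumption: the sphinx-reredirects
# extension is installed and importable as "sphinx_reredirects").
# A real conf.py would add this entry to its existing extensions list.
extensions = ["sphinx_reredirects"]

# Key: the old document name, relative to the docs root, without ".html".
# Value: the redirect target, resolved relative to the old page's location,
# so "../headers/math/index.html" points at headers/math/index.html.
redirects = {
    "math/index": "../headers/math/index.html",
}
```
With this in place, a request for the old `math/index.html` page would be sent on to the new location under `headers/`.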
`--disassemble`/`--cdis` parses input bytes as decimal, 0bbin, 0ooct, or
0xhex. While the hexadecimal digit form is most commonly used, requiring
a 0x prefix for each byte (`0x48 0x29 0xc3`) is cumbersome.
Tools like xxd -p and rz-asm use a plain hex dump form without the 0x
prefix or space separator. This patch adds --hex to disassemble such hex
bytes with optional whitespace.
```
% rz-asm -a x86 -b 64 -d 4829c34829c4
sub rbx, rax
sub rsp, rax
% llvm-mc -triple=x86_64 --cdis --hex --output-asm-variant=1 <<< 4829c34829c4
.text
sub rbx, rax
sub rsp, rax
```
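The difference between the two input forms can be sketched in a few lines of Python. This only models the parsing idea and is not llvm-mc's implementation; the function names `parse_hex_dump` and `parse_prefixed_bytes` are hypothetical.
```python
def parse_hex_dump(text: str) -> bytes:
    """Decode a plain hex dump: two hex digits per byte, no 0x prefix.

    bytes.fromhex skips ASCII whitespace between bytes, which models the
    "optional whitespace" part of the --hex form.
    """
    return bytes.fromhex(text)


def parse_prefixed_bytes(text: str) -> bytes:
    """Decode whitespace-separated byte literals (decimal, 0b, 0o or 0x)."""
    # int(token, 0) infers the base from the literal's prefix.
    return bytes(int(token, 0) for token in text.split())


# Both spellings describe the same six bytes.
assert parse_hex_dump("4829c34829c4") == parse_prefixed_bytes(
    "0x48 0x29 0xc3 0x48 0x29 0xc4"
)
```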
Pull Request: https://github.com/llvm/llvm-project/pull/119992
This PR adds the following features:
* saturation and float rounding mode decorations,
* arithmetic constrained floating-point intrinsics (strict_fadd,
strict_fsub, strict_fmul, strict_fdiv, strict_frem, strict_fma and
strict_fldexp),
* the SPV_INTEL_float_controls2 extension.
Using recent improvements to the emit-intrinsics step, this PR also
simplifies the pre- and post-legalizer steps and improves instruction
selection.
The P8700 is a high-performance processor from MIPS designed to meet the
demands of modern workloads, offering exceptional scalability and
efficiency. It builds on MIPS's established architectural strengths
while introducing enhancements that set it apart. For more details, you
can check out the official product page here:
https://mips.com/products/hardware/p8700/.
The scheduling model will be added in a separate commit/PR.
Kaleidoscope was switched to the new pass manager earlier (#72324), but
both the code and the tutorial document are missing some parts.
This pull request fixes the following problems:
1. Adds `PromotePass` to the function pass manager. This pass was
removed during the switch from the legacy pass manager to the new pass
manager.
2. Syncs the tutorial with the code.
The Qualcomm uC Xqcics extension adds 8 conditional select instructions.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest
This patch adds assembler-only support.
---------
Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.
The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling automatic early-exit vectorization by plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.
PR: https://github.com/llvm/llvm-project/pull/112138
I'd like to nominate myself as an additional Apple representative
(vendor contact) on the LLVM security group.
I met many of you at the llvm-dev meeting roundtable(s) in Santa Clara.
I work closely with @ahmedbougacha and @jroelofs at Apple.
- Abhay
This pull request modifies the behavior of the
`@llvm.experimental.stackmap` intrinsic to require that its first two
operands (`id` and `numShadowBytes`) be **immediate values**. This
change ensures that variables cannot be passed as the first two
arguments to this intrinsic.
Related Issue: https://github.com/llvm/llvm-project/issues/115733
### Testing
- Added new test cases to ensure errors are emitted for non-immediate
operands.
- Ran the full LLVM test suite to verify no regressions were introduced.
The reason for this change is to clarify an existing technical
restriction of LLVM: there needs to be a way to implicitly define a type
if there is any way to legally define that type by another means.
The spec is available here:
https://github.com/intel/llvm/pull/12497
The PR doesn't add the OpCooperativeMatrixApplyFunctionINTEL instruction
as it's still experimental and not properly tested E2E.
The PR also fixes a few bugs in the related code:
1. The optional operand of CooperativeMatrixMulAddKHR must be a literal,
not a constant;
2. Fixed creation of the available-capabilities table for the case where
a single extension adds a few capabilities that occupy non-contiguous
opcodes.
---------
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
The DAG has the same instructions: the signed and unsigned absolute
difference of its inputs. For AArch64, they map to uabd and sabd for
Neon and SVE. The Neon and SVE instructions will require custom
patterns.
They are pseudo opcodes and are not imported by the IRTranslator. We
need combines to create them.
PowerPC, ARM, and AArch64 have native instructions.
/// i.e. trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
/// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
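The two identities above can be checked with a small reference model. The sketch below is only a model of the arithmetic, not LLVM code: it assumes an 8-bit element width by default, relies on Python's unbounded integers to stand in for the widened type, and the helper names `sext`, `zext`, `abds` and `abdu` simply mirror the names in the comment.
```python
def zext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


def sext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed integer."""
    value = zext(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def abds(op0: int, op1: int, bits: int = 8) -> int:
    """Signed absolute difference: trunc(abs(sext(op0) - sext(op1)))."""
    return abs(sext(op0, bits) - sext(op1, bits)) & ((1 << bits) - 1)


def abdu(op0: int, op1: int, bits: int = 8) -> int:
    """Unsigned absolute difference: trunc(abs(zext(op0) - zext(op1)))."""
    return abs(zext(op0, bits) - zext(op1, bits)) & ((1 << bits) - 1)


# 0x80 is -128 when signed and 128 when unsigned, so the results differ.
assert abds(0x80, 0x7F) == 0xFF
assert abdu(0x80, 0x7F) == 0x01
```
The subtraction is done in the wide type so that it cannot overflow; only the final result is truncated back to the element width.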
For GlobalISel, we are going to write the combines in MIR patterns.
See:
llvm/test/CodeGen/AArch64/abd-combine.ll
- [ ] combine into abd
- [ ] legalize and add td patterns
This consists of:
* Make these instructions part of FPMathOperator.
* Adjust bitcode/ir readers/writers to expect fast math flags on these
instructions.
* Make IRBuilder set the fast math flags on these instructions.
* Update langref and release notes.
* Update a bunch of tests. Some of these are due to InstCombineCasts
incorrectly adding fast math flags to fptrunc, which will be fixed in a
later patch.
This adds WebAssembly support for the new [Lime1 CPU].
First, this defines some new target features. These are subsets of
existing features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
Next, this defines a new target CPU, "lime1", which enables
mutable-globals, bulk-memory-opt, multivalue, sign-ext,
nontrapping-fptoint, extended-const, and call-indirect-overlong. Unlike
the default "generic" CPU, "lime1" is meant to be frozen, and followed
up by "lime2" and so on when new features are desired.
[Lime1 CPU]:
https://github.com/WebAssembly/tool-conventions/blob/main/Lime.md#lime1
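The relationship between the new subset features and the "lime1" feature list can be sketched as a small data model. This is an illustration only, not how the backend stores features: the names `IMPLIES`, `LIME1` and `expand` are hypothetical, and only the two implications stated above are modelled.
```python
# Implications stated above: enabling the full feature enables its subset.
IMPLIES = {
    "reference-types": {"call-indirect-overlong"},
    "bulk-memory": {"bulk-memory-opt"},
}

# Features enabled by the "lime1" CPU, as listed above.
LIME1 = frozenset({
    "mutable-globals", "bulk-memory-opt", "multivalue", "sign-ext",
    "nontrapping-fptoint", "extended-const", "call-indirect-overlong",
})


def expand(features):
    """Return the given features plus everything they imply."""
    result = set(features)
    worklist = list(features)
    while worklist:
        for implied in IMPLIES.get(worklist.pop(), ()):
            if implied not in result:
                result.add(implied)
                worklist.append(implied)
    return result


# The subsets do not pull in the full proposals they were split from.
assert "bulk-memory" not in expand(LIME1)
assert "reference-types" not in expand(LIME1)
```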
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
This PR fixes:
* emission of OpNames (added newly inserted internal intrinsics and
basic blocks)
* emission of function attributes (SRet is added)
* implementation of SPV_INTEL_optnone so that it emits the OptNoneINTEL
Function Control flag, and adds an implementation of the SPV_EXT_optnone
SPIR-V extension.
Similar to `__asan_default_options`, users can specify default options
upon building the instrumented binaries by providing their own
definition of `__xray_default_options` which returns the option strings.
This is useful in cases where setting the `XRAY_OPTIONS` environment
variable might be difficult. Plus, it's a convenient way to populate
XRay options when you always want the instrumentation to be enabled.
This patch adds NVVM intrinsics and NVPTX codegen for:
* cp.async.bulk.tensor.reduce.1D -> 5D variants, supporting both Tile
and Im2Col modes.
* These intrinsics optionally support cache_hints as indicated by the
boolean flag argument.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk-tensor-reduce.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
In the DXIL CreateHandle and CreateHandleFromBinding ops, resource
bindings are indexed from the beginning of the binding space, not from
the binding itself. Translate from an index into the binding to one from
the beginning of the space when lowering to these operations.
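The index translation can be pictured with the minimal sketch below. It assumes that the offset between the two index spaces is the binding's lower bound (its first register within the space); the `Binding` type and `index_from_space_start` helper are hypothetical names for illustration, not the names used in the DirectX backend.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    space: int        # register space the binding lives in
    lower_bound: int  # assumed: first register of the binding in that space


def index_from_space_start(binding: Binding, index_in_binding: int) -> int:
    """Convert an index relative to the binding into one relative to the space."""
    return binding.lower_bound + index_in_binding


# Element 2 of a binding that starts at register 3 is register 5 of the space.
assert index_from_space_start(Binding(space=0, lower_bound=3), 2) == 5
```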
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.
The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.