We can simplify the code with *Map::try_emplace where we need
default-constructed values, while avoiding constructor calls when the
key is already present.
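For illustration, a minimal sketch using std::map::try_emplace; LLVM's
DenseMap and StringMap provide the same try_emplace interface.
```
#include <map>
#include <string>
#include <vector>

int main() {
  std::map<std::string, std::vector<int>> Cache;

  // insert() builds the value_type pair up front, so the mapped value gets
  // constructed even when the key is already present and the pair is then
  // discarded.
  Cache.insert({"key", std::vector<int>{}});

  // try_emplace default-constructs the mapped value only if the key is
  // missing; an existing entry is left untouched.
  auto [It, Inserted] = Cache.try_emplace("key");
  if (Inserted)
    It->second.push_back(42); // freshly default-constructed vector
  return 0;
}
```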
Add canonicalization patterns that simplify `truncf(sitofp(x))` to
`sitofp(x)` and `truncf(uitofp(x))` to `uitofp(x)` if `truncf` uses the default rounding mode.
This assumes that the destination type of `truncf` is representable in the
intermediate type.
Note that the `truncf` semantics require the destination type to be
narrower than the source type, so this holds for all types I can
possibly think of, but one could probably construct an artificial
counterexample.
Somewhat related: https://github.com/llvm/llvm-project/pull/128096
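A rough sketch of what one such pattern could look like as an MLIR
OpRewritePattern; the "roundingmode" attribute name and the overall shape
are assumptions, not the code from this change.
```
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Sketch only: fold truncf(sitofp(x) : iN -> fWide) : fWide -> fNarrow into
// sitofp(x) : iN -> fNarrow when truncf uses the default rounding mode.
struct TruncFOfSIToFP : OpRewritePattern<arith::TruncFOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(arith::TruncFOp op,
                                PatternRewriter &rewriter) const override {
    // Assumption: a non-default rounding mode shows up as an attribute named
    // "roundingmode"; bail out if one is present.
    if (op->getAttr("roundingmode"))
      return failure();
    auto sitofp = op.getIn().getDefiningOp<arith::SIToFPOp>();
    if (!sitofp)
      return failure();
    rewriter.replaceOpWithNewOp<arith::SIToFPOp>(op, op.getType(),
                                                 sitofp.getIn());
    return success();
  }
};
```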
This PR moves the register usage checking to after the plans are
created, so that any recipes that optimise register usage (such as
partial reductions) can be properly costed and not have their VF pruned
unnecessarily.
Depends on https://github.com/llvm/llvm-project/pull/137746
Added `APInt::clearBits(unsigned loBit, unsigned hiBit)`, which clears the bits within the given range.
Fixes #136550
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
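A small usage sketch, assuming the new clearBits mirrors the half-open
[loBit, hiBit) convention of the existing APInt::setBits:
```
#include "llvm/ADT/APInt.h"

using namespace llvm;

int main() {
  APInt V = APInt::getAllOnes(16); // 0xFFFF
  // Assumed setBits-style semantics: clear bits loBit..hiBit-1.
  V.clearBits(4, 8);               // 0xFF0F
  APInt W(16, 0x00F0);
  W.setBits(0, 4);                 // existing API for comparison: 0x00FF
  return 0;
}
```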
Introduce m_scev_AffineAddRec to match affine AddRecs and a class_match
for SCEVConstant, and demonstrate their utility in LSR and SCEV. While
at it, rename m_Specific to m_scev_Specific for clarity.
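A hedged usage sketch: m_scev_AffineAddRec and m_scev_Specific are named
above, while the header path and the m_SCEV binder spelling are assumptions
about the existing SCEV pattern-match infrastructure.
```
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionPatternMatch.h"

using namespace llvm;
using namespace llvm::SCEVPatternMatch;

// Returns true if Expr is an affine AddRec {Start,+,Step}, binding both parts.
static bool matchAffineAddRec(const SCEV *Expr, const SCEV *&Start,
                              const SCEV *&Step) {
  return match(Expr, m_scev_AffineAddRec(m_SCEV(Start), m_SCEV(Step)));
}
```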
This patch adds preprocessor macros when Zicfilp CFI is enabled. To be
specific:
+ `#define __riscv_landing_pad 1` when `-fcf-protection=[full|branch]`
+ `#define __riscv_landing_pad_unlabeled 1` when
`-fcf-protection=[full|branch] -mcf-branch-label-scheme=unlabeled`
The macros are proposed in riscv-non-isa/riscv-c-api-doc#76, and the
CLI flags are from riscv-non-isa/riscv-toolchain-conventions#54.
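A small feature-detection sketch showing how source code could consume the
new macros:
```
// cfi_probe.cpp: report which Zicfilp CFI macros the compiler defined.
#include <cstdio>

int main() {
#if defined(__riscv_landing_pad)
  std::puts("__riscv_landing_pad (-fcf-protection=full|branch)");
#endif
#if defined(__riscv_landing_pad_unlabeled)
  std::puts("__riscv_landing_pad_unlabeled "
            "(-mcf-branch-label-scheme=unlabeled)");
#endif
  return 0;
}
```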
This completes the set of maths builtins.
No attempt has been made to vectorize or optimize this code. The
implementation is under the SunPro license, so it will probably need to be
replaced at some point anyway. Calls to other builtins have been replaced with the
CLC equivalents, and some bit-hacking was replaced with the fabs
builtin.
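For context, a plain C++ illustration (not the libclc source) of the kind
of sign-bit masking that a fabs builtin replaces:
```
#include <cstdint>
#include <cstring>

// Bit-hack absolute value: clear the IEEE-754 sign bit by hand.
static float fabs_bithack(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0x7fffffffu; // drop the sign bit
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

// The builtin expresses the same thing directly and lets the compiler pick
// the best lowering.
static float fabs_builtin(float x) { return __builtin_fabsf(x); }
```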
This sorts ConstantData to the low value IDs, so we can perform a
hasUseList check with a single compare instead of two compares plus an
and for the range check.
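A toy illustration of the reasoning; the value IDs and names here are
hypothetical, not the real Value::getValueID layout:
```
#include <cassert>

// Hypothetical subclass-ID layout: ConstantData kinds occupy the low IDs
// [0, LastConstantDataID], everything else follows.
enum : unsigned { LastConstantDataID = 7, FirstOtherID = 8 };

// With ConstantData somewhere in the middle, excluding it needs a range
// check: two compares plus an and.
static bool hasUseListRangeCheck(unsigned ID, unsigned First, unsigned Last) {
  return ID < First || ID > Last;
}

// With ConstantData sorted to the low IDs, one compare suffices.
static bool hasUseList(unsigned ID) { return ID > LastConstantDataID; }

int main() {
  assert(hasUseList(FirstOtherID));
  assert(!hasUseList(0));
  (void)hasUseListRangeCheck;
  return 0;
}
```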
This patch extends the non-tensor TMA Bulk Copy Op
(from shared_cta to global) with an optional
byte-mask operand. The mask selects which bytes
are copied to the destination.
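Purely to illustrate the byte-mask semantics (this is a toy model, not the
NVVM op or intrinsic), assuming bit i of the mask enables byte i:
```
#include <cstdint>

// Toy byte-masked copy: byte i is written only if mask bit i is set.
static void maskedByteCopy(std::uint8_t *dst, const std::uint8_t *src,
                           unsigned numBytes, std::uint64_t byteMask) {
  for (unsigned i = 0; i < numBytes; ++i)
    if (byteMask & (std::uint64_t{1} << i))
      dst[i] = src[i];
}
```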
* lit tests are added to verify the lowering to the intrinsics.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This patch contains two minor changes, which I believe were the original
author's intent.
* when folding `transpose(broadcast(x))`, emit `broadcast(x)` instead of
`broadcast(broadcast(x))`. The latter causes transient verifier
failures with `mlir-opt --debug`, e.g.:
```
mlir-asm-printer: 'func.func' failed to verify and will be printed in generic form
"func.func"() <{function_type = (vector<4x1x1x7xi8>) -> vector<3x2x4x5x6x7xi8>, sym_name = "broadcast_transpose_mixed_example"}> ({
^bb0(%arg0: vector<4x1x1x7xi8>):
%0 = "vector.broadcast"(%arg0) : (vector<4x1x1x7xi8>) -> vector<2x3x4x5x6x7xi8>
%1 = "vector.broadcast"(%0) : (vector<2x3x4x5x6x7xi8>) -> vector<3x2x4x5x6x7xi8>
"func.return"(%1) : (vector<3x2x4x5x6x7xi8>) -> ()
}) : () -> ()
```
* when checking permutation groups, the variable `low` was set just once
to zero, making the check quadratic. It looks like the intent was for
`low` to track the beginning of each dimension group. (Nevertheless, the
check was correct.)
I think this was denied by accident in
68180d8d16.
On my reading, flush of a common block is allowed by the standard. It is
not allowed by classic-flang but is supported by gfortran and ifx.
This doesn't need any lowering changes. The LLVM translation ignores the
flush argument list because the openmp runtime library doesn't support
flushing specific data.
Depends upon https://github.com/llvm/llvm-project/pull/139522. Ignore
the first commit in this PR.
- Use a bool in `generateGetQueryInst` and rename the variable to better
convey its purpose.
- Replace mentions of 0 by `DefaultValue` in comments.
- Fix typos.
Adds an LLVMIR op interface that can be used by external operations to
model LLVM intrinsics. The related 'op to llvm.call_intrinsic' rewriter
helper is moved into common LLVM conversion patterns. The x86vector
dialect is refactored to use the new common abstraction.
The one-to-one intrinsic op is tied to LLVM intrinsic call semantics.
Thus, the op interface, previously defined as part of the x86vector
dialect, is moved into the LLVMIR interfaces to allow other low-level
dialects to define operations abstracting specific intrinsic semantics
while minimizing infrastructure duplication.
Related RFC:
https://discourse.llvm.org/t/rfc-simplify-x86-intrinsic-generation/85581/6
Add a linker test for -Bsymbolic-functions to AddLLVM and remove
the hardcoded illumos bits that handled it. OpenBSD also has a
local patch for linking with the old BFD linker on mips64 and
sparc64.
Introduce MCAsmBackend::addReloc to manage relocation appending.
The default implementation uses shouldForceRelocation to check
unresolved fixups and calls recordRelocation to append a relocation when
needed.
RISCV and LoongArch override addReloc to handle ADD/SUB relocations,
with potential support for RELAX relocations in the future.
AVR overrides addReloc to customize shouldForceRelocation behavior
(#121498).
applyFixup is moved into evaluateFixup.
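A toy model of the described default flow (deliberately not the real MC
signatures): addReloc appends a relocation when the fixup stays unresolved
or the backend forces one, and targets override it for their special cases.
```
#include <vector>

// Toy stand-ins for the MC types; the real interfaces take assembler, fixup,
// target-value, and subtarget parameters.
struct ToyReloc {};

struct ToyBackend {
  std::vector<ToyReloc> Relocs;

  virtual ~ToyBackend() = default;

  // Targets such as AVR customize this to force relocations in extra cases.
  virtual bool shouldForceRelocation() const { return false; }

  void recordRelocation() { Relocs.push_back(ToyReloc{}); }

  // Default addReloc: append a relocation for unresolved or forced fixups.
  // RISCV/LoongArch override this to additionally handle ADD/SUB pairs.
  virtual void addReloc(bool Resolved) {
    if (!Resolved || shouldForceRelocation())
      recordRelocation();
  }
};
```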
Move the command plugins out of the DAP header and into their own file. This
PR also renames the classes from "RequestHandler" to "Command". Although
they are implemented in terms of sending requests, they are not
"handlers".