This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 in the clang-builtin-to-libcall calling convention check, now that
atan2 maps to an intrinsic.
- Add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.
Last part of Implement the atan2 HLSL Function. Fixes #70096.
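For reference, a sketch of a call to the new constrained form (value names are illustrative; as usual for constrained intrinsics, the call would sit in a strictfp function):
```
%r = call double @llvm.experimental.constrained.atan2.f64(double %y, double %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
```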
This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4.
Based on the discussions in #112772, this pass is not needed after the
introduction of the `llvm.threadlocal.address` intrinsic.
Fixes https://github.com/llvm/llvm-project/issues/112771.
StructType::setBody is the only mechanism that can potentially create
recursion in the type system. Add a runtime check that it is not
actually used to create recursion.
If the check fails, report an error from LLParser, BitcodeReader and
IRLinker. In all other cases assert that the check succeeds.
In the future, StructType::setBody will be removed in favor of specifying
the body when the type is created, so any performance hit from this
runtime check will be temporary.
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
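For example, the two results can be unpacked with `extractvalue` (value names are illustrative):
```
%res = call { float, float } @llvm.sincos.f32(float %x)
%sin = extractvalue { float, float } %res, 0
%cos = extractvalue { float, float } %res, 1
```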
# What
This PR renames the newly-introduced LLVM attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking`. There are no other functional changes.
# Why?
- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.
We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.
# Concerns
- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
A call to a function that has this attribute is not a source of
divergence, as used by UniformityAnalysis. That allows a front-end to
use known-name calls as an instruction extension mechanism (e.g.
https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call
being a source of divergence.
Fixes https://github.com/llvm/llvm-project/issues/60942: IEEE semantics
is likely what many frontends want (it definitely is what Rust wants),
and it is what LLVM passes already assume when they use APFloat to
propagate float operations.
This does not reflect what happens on x87, but what happens there is
just plain unsound (https://github.com/llvm/llvm-project/issues/89885,
https://github.com/llvm/llvm-project/issues/44218); there is no coherent
specification that will describe this behavior correctly -- the backend
in combination with standard LLVM passes is just fundamentally buggy in
a hard-to-fix way.
There are also questions around flushing subnormals to zero, but [this
discussion](https://discourse.llvm.org/t/questions-about-llvm-canonicalize/79378)
seems to indicate a general stance of: this is specific non-standard
hardware behavior, and it generally needs LLVM to be told that basic float
ops do not return the standard result. Just naively running
LLVM-compiled code on hardware configured to flush subnormals will lead
to #89885-like issues.
AFAIK this is also what Alive2 implements (@nunoplopes please correct me
if I am wrong).
This change implements support of metadata strings in operand bundle
values. It makes possible calls like:
call void @some_func(i32 %x) [ "foo"(i32 42, metadata !"abc") ]
It requires some extension of the bitcode serialization. As SSA values
and metadata are stored in different tables, there must be a way to
distinguish them during deserialization. It is implemented by putting a
special marker before the metadata index. The marker cannot be treated
as a reference to any SSA value, so it unambiguously identifies
metadata. It allows extending the bitcode serialization without breaking
compatibility.
Metadata as operand bundle values are intended to be used in
floating-point function calls. They would represent the same information
as is currently passed via the constrained intrinsics' arguments.
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.
In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread-private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address-space check,
performing the private case without atomics, and falling back to issuing
the real atomic otherwise.
Handle this by introducing metadata intended to be applied to atomicrmw,
indicating that they cannot access the forbidden address space. This
metadata allows us to avoid the extra check and branch.
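A minimal sketch of how this could look on AMDGPU, assuming the metadata is the `!noalias.addrspace` range form and that address space 5 (private) is the forbidden one; the name and operand encoding here are assumptions, not quoted from the patch:
```
; the atomic is annotated as never operating on addrspace(5) (private) memory
%old = atomicrmw add ptr %p, i64 1 seq_cst, !noalias.addrspace !0

!0 = !{i32 5, i32 6}  ; excluded address-space range [5, 6)
```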
This extends FPMathOperator to allow calls that return literal structs
of homogeneous floating-point or vector-of-floating-point types.
The intended use case for this is to support FP intrinsics that return
multiple values (such as `llvm.sincos`).
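A small sketch of what this permits, using `llvm.sincos` as the example:
```
; fast-math flags are now accepted on a call returning a homogeneous FP struct
%res = call fast { float, float } @llvm.sincos.f32(float %x)
```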
We already disallow accessing the caller's allocas from a tail-called
function, because their stack memory will have been de-allocated before
the tail call. I think this should apply to byval arguments too, as they
also occupy space in the caller's stack frame.
This was originally part of #109943, split out for separate review.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This preliminary work adds the intrinsic to LLVM and expands it using the
atan intrinsic for the DXIL backend, since DXIL has no atan2 op.
Part 1 of Implement the atan2 HLSL Function #70096.
(reland #108865 reverted in #109842 due to doc build break)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This preliminary work adds the intrinsic to LLVM and expands it using the
atan intrinsic for the DXIL backend, since DXIL has no atan2 op.
Part 1 of Implement the atan2 HLSL Function #70096.
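For reference, the scalar form of the intrinsic added here (one overload shown):
```
declare float @llvm.atan2.f32(float, float)
```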
For the purpose of verifying proper argument extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign
nor zero extension is required (e.g. with a struct in a register). The purpose
of doing so is to be able to verify that one of these attributes is always
present, thereby detecting cases where a sign/zero extension is actually
missing.
As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.
Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
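A sketch of how the attribute appears on a parameter (IR spelling shown as `noext`; the callee is illustrative):
```
; the i32 argument is passed with neither sign nor zero extension,
; e.g. because it is a small struct passed in a register
declare void @callee(i32 noext)
```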
This reverts commit 178fc4779ece31392a2cd01472b0279e50b3a199.
This attribute was not needed now that we are using the LSan-style
ScopedDisabler for disabling this sanitizer. See #106736 and #106125
for more discussion.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend-lifetimes flag is intended to
eventually be turned on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
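A minimal sketch of what such a use looks like in IR (value names are illustrative):
```
%v = add i32 %a, %b
; keeps %v live until after this point, with no other effect
call void (...) @llvm.fake.use(i32 %v)
```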
This patch moves the stepvector intrinsic out of the experimental
namespace.
This intrinsic has existed in LLVM for several years now, and is widely used.
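For reference, the intrinsic under its non-experimental name (one overload shown):
```
declare <4 x i32> @llvm.stepvector.v4i32()
```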
The previous semantics of `llvm.experimental.get.vector.length` were too
permissive, such that they gave optimizers a hard time on anything related
to the number of iterations of VP-vectorized loops.
This patch tries to address this by assigning it a set of stricter
semantics similar to that of RVV's VSETVLI instructions, while not being
too RISC-V-specific and leaving room for other (future) targets.
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
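For reference, a sketch of the intrinsic's signature (the element-count type is the overloaded type):
```
; args: element count, vectorization factor (immarg), scalable flag (immarg)
declare i32 @llvm.experimental.get.vector.length.i64(i64, i32 immarg, i1 immarg)
```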
Verify that the arguments of a naked function are not used. They can
only be referenced via registers/stack in inline asm, not as IR values.
Doing so will result in assertion failures in the backend.
There's probably more that we should verify, though I'm not completely
sure what the constraints are (would it be correct to require that naked
functions are exactly an inline asm call + unreachable, or is more
allowed?)
Fixes https://github.com/llvm/llvm-project/issues/104718.
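A sketch of the kind of IR the verifier now rejects (function names are illustrative):
```
declare void @use(i32)

define void @bad(i32 %x) naked {
  ; invalid after this change: the argument is used as an IR value; in a naked
  ; function it may only be referenced via registers/stack in inline asm
  call void @use(i32 %x)
  unreachable
}
```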
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
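For reference, the scalar forms of the new intrinsics (one overload each):
```
declare float @llvm.minimumnum.f32(float, float)
declare float @llvm.maximumnum.f32(float, float)
```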
This patch introduces support only for scalar values. Support for the
vector forms (vp, vp.reduce, vector.reduce) and the
experimental.constrained variants will be added in future patches.
With this patch, MIPSr6 and LoongArch can work out of the box with
fcanonical and fmax/fmin.
AArch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, while
they have no fcanonical support yet. I will add it in future patches.
The FMIN/FMAX instructions of RISC-V follow the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in a
future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platforms for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
The fix for fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will be submitted as separate patches.
Basically, these operations are equivalent to a loop that iterates over
all elements and then does a `getelementptr` (without `inbounds`!) plus
`load`/`store` only for the masked-on elements.
The first change is the clarification of rewrite-based semantics, and
the fact that when doing the rewrite, all of the instructions involved
need to have the flag. This is not a change in semantics: there is
wide agreement that this behavior is true for most flags. But it is
necessary to clarify this, and also to clarify that there is a fundamental
difference between a flag like `nnan` and a flag like `contract`. Note
that several InstCombine transforms do not correctly check this behavior
at the moment.
The second change is a specific clarification of the rewrites performed
by arcp. These rewrites capture what is necessary to enable the
transformations that currently require just arcp, none of which are
using the flag incorrectly right now.
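A sketch of the kind of arcp rewrite being described, where both the original and the rewritten instructions carry the flag:
```
; original
%q = fdiv arcp float %x, %y
; may be rewritten as multiplication by a (possibly approximate) reciprocal
%r  = fdiv arcp float 1.0, %y
%q2 = fmul arcp float %x, %r
```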
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
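For illustration, with a hypothetical function `@f` (not from the patch):
```
; Before this change:  declare x86_mmx @f(x86_mmx)
; After, the same signature is expressed with a vector type, and the
; argument is passed in integer registers rather than mmx registers:
declare <1 x i64> @f(<1 x i64>)
```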
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
Works towards issue #98272.
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
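A sketch of the intrinsic's signature and a call (types and value names chosen for illustration):
```
declare <4 x i32> @llvm.experimental.vector.compress.v4i32(<4 x i32>, <4 x i1>, <4 x i32>)

; selected lanes of %vec are packed into consecutive lanes of the result;
; the remaining lanes are taken from %passthru
%r = call <4 x i32> @llvm.experimental.vector.compress.v4i32(<4 x i32> %vec, <4 x i1> %mask, <4 x i32> %passthru)
```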
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many targets as it
avoids branches.
In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.
See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
This reverts commit ac4b6b662630cd4d3bf6929f2b39ea203c0054a1.
A test change was missing for
mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir in the initial commit.
Following on from the discussion in
https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496
and the equivalent change for llvm.memset.inline (#95397), this removes
the requirement that the length of llvm.memcpy.inline is constant.
PreISelIntrinsicLowering will expand llvm.memcpy.inline with
non-constant lengths, while the codegen path for constant lengths is
left unaltered.
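For example, a call like this with a non-constant length is now accepted and expanded by PreISelIntrinsicLowering (value names are illustrative):
```
call void @llvm.memcpy.inline.p0.p0.i64(ptr %dst, ptr %src, i64 %len, i1 false)
```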