Targets without a `modf` libcall lower the intrinsic directly, matching
the existing `llvm.frexp` expansion. Targets with an existing libcall
are unchanged.
Fixes#173021
Libcall lowering decisions should come from the LibcallLoweringInfo
analysis. Query this through the DAG, so eventually the source
can be the analysis. For the moment this is just a wrapper around
the TargetLowering information.
Add support for `__builtin_stack_address` builtin. The semantics match
those of GCC's builtin with the same name.
`__builtin_stack_address` returns the starting address of the stack
region that may be used by called functions. It may or may not include
the space used for on-stack arguments passed to a callee (See [GCC
Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)).
Fixes#82632.
On platforms that soft promote `half`, using `lrint` intrinsics crashes
with the following:
SoftPromoteHalfOperand Op #0: t5: i32 = lrint t4
LLVM ERROR: Do not know how to soft promote this operator's operand!
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace.
Stack dump:
0. Program arguments:
/Users/tmgross/Documents/projects/llvm/llvm-build/bin/llc
-mtriple=riscv32
1. Running pass 'Function Pass Manager' on module '<stdin>'.
2. Running pass 'RISC-V DAG->DAG Pattern Instruction Selection' on
function '@test_lrint_ixx_f16'
Resolve this by adding a soft promotion. GISel is included since tests
cover both.
Fixes crash tests added in
https://github.com/llvm/llvm-project/pull/152662 for targets that use
`softPromoteHalfType`.
Co-authored-by: Folkert de Vries <folkert@folkertdev.nl>
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel
In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.
The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
inline with the definition of `llvm.fshr.*`/`llvm.fshl.*`.
This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.
Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
In line with a std proposal to introduce the llvm.clmul family of
intrinsics corresponding to carry-less multiply operations. This work
builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives),
and follow-up patches will introduce custom-lowering on supported
targets, replacing target-specific clmul intrinsics.
Testing is done on the RISC-V target, which should be sufficient to
prove that the intrinsics work, since no RISC-V specific lowering has
been added.
Ref: https://isocpp.org/files/papers/P3642R3.html
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Treat these like other shift operations by allowing the shift amount to
be a different type than the result.
The PromoteIntOp_Shift and LegalizeDAG code are not tested due to lack
of target support.
I'm looking at adding SSHLSAT for the RISC-V P extension. I don't need
this support for that since RISC-V only has one legal type. I just thought it
was odd that they weren't like other shifts.
sincospi/sincospif/sincospil does not appear to exist on common
targets. Darwin targets have __sincospi and __sincospif, so define
and use those implementations. I have no idea what version added
those calls, so I'm just guessing it's the same conditions as
__sincos_stret.
Most of this patch is working to preserve codegen when a vector
library is explicitly enabled. This only covers sleef and armpl,
as those are the only cases tested.
The multiple result libcalls have an aberrant process where the
legalizer looks for the scalar type's libcall in RuntimeLibcalls,
and then cross references TargetLibraryInfo to find a matching
vector call. This was unworkable in the sincospi case, since the
common case is there is no scalar call available. To preserve
codegen if the call is available, first try to match a libcall
with the vector type before falling back on the old scalar search.
Eventually all of this logic should be contained in RuntimeLibcalls,
without the link to TargetLibraryInfo. In principle we should perform
the same legalization logic as for an ordinary operation, trying
to find a matching subvector type with a libcall.
This avoids AArch64 legality rules depending on libcall
availability.
ARM, AArch64, and X86 all had custom lowering of fsincos which
all were just to emit calls to sincos_stret / sincosf_stret. This
messes with the cost heuristics around legality, because really
it's an expand/libcall cost and not a favorable custom.
This is a bit ugly, because we're emitting code trying to match the
C ABI lowered IR type for the aggregate return type. This now also
gives an easy way to lift the unhandled x86_32 darwin case, since
ARM already handled the return as sret case.
getNode updates flags correctly for CSE. Calling setFlags after getNode
may set the flags where they don't apply.
I've added a Flags argument to getSelectCC and the signature of getNode that takes
an ArrayRef of EVTs.
Hexagon currently has an untested global flag to control fast
math variants of libcalls. Add fast variants as explicit libcall
options so this can be a flag based lowering decision, and implement
it. I have no idea what fast math flags the hexagon case requires,
so I picked the maximally potentially relevant set of flags although
this probably is refinable per call. Looking in compiler-rt, I'm not
sure if the fast variants are anything more than aliases.
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.
Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5.
I also made the incorrect assumption that we know both values are
+/-0.0 here as well. Revert for now.
After https://github.com/llvm/llvm-project/pull/139893, we now have
[de]interleave intrinsics for factors 2-8 inclusive, with the plan to
eventually get the loop vectorizer to emit a single intrinsic for these
factors instead of recursively deinterleaving (to support scalable
non-power-of-2 factors and to remove the complexity in the interleaved
access pass).
AArch64 currently supports scalable interleaved groups of factors 2 and
4 from the loop vectorizer. For factor 4 this is currently emitted as a
series of recursive [de]interleaves, and normally converted to a target
intrinsic in the interleaved access pass.
However if for some reason the interleaved access pass doesn't catch it,
the [de]interleave4 intrinsic will need to be lowered by the backend.
This patch legalizes the node and any other power-of-2 factor to smaller
factors, so if a target can lower [de]interleave2 it should be able to
handle this without crashing.
Factor 3 will probably be more complicated to lower so I've left it out
for now. We can disable it in the AArch64 cost model when implementing
the loop vectorizer changes.
FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero
handling. This is problematic, because it assumes the equivalent integer
type is legal. Many targets have legal fp128, but illegal i128, so this
results in legalization failures.
Fix this by replacing IS_FPCLASS with checking the bitcast to integer
instead. In that case it is sufficient to use any legal integer type, as
we're just interested in the sign bit. This can be obtained via a stack
temporary cast. There is existing FloatSignAsInt functionality used for
legalization of FABS and similar we can use for this purpose.
Fixes https://github.com/llvm/llvm-project/issues/139380.
Fixes https://github.com/llvm/llvm-project/issues/139381.
Fixes https://github.com/llvm/llvm-project/issues/140445.
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make DAG
legalizer pass the correct return type to `makeLibCall` and sign-extend
the result afterwards.
Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
Pull Request: https://github.com/llvm/llvm-project/pull/101786
- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around
each operation.
- Compiler-rt conversion functions are now built for s390x including the missing
extendhfdf2 which was added.
Fixes#50374
By pulling the truncates and extensions out of operations during
operation legalization we enable more optimization via DAGCombiner.
While the test cases show only cosmetic improvements (unlikely to impact
the final SASS) in real programs the exposure of these truncates can
allow for more optimization.
PR #130124 added a use of FlagInserter to the start of
SelectionDAGLegalize::PromoteNode, making some of the places where we
set flags be redundant, so remove them. The places where the setting of
flags remains are in non-floating-point operations.
When we have a floating-point operation that a target doesn't support
for a given type, but does support for a wider type, then there are two
ways this can be handled:
* If the target doesn't have any registers at all of this type then
LegalizeTypes will convert the operation.
* If we do have registers but no operation for this type, then the
operation action will be Promote and it's handled in PromoteNode.
In both cases the operation at the wider type, and the conversion
operations to and from that type, should have the same fast math flags
as the original operation.
This is being done in preparation for a DAGCombine patch which makes use
of these fast math flags.
This adds the `llvm.sincospi` intrinsic, legalization, and lowering
(mostly reusing the lowering for sincos and frexp).
The `llvm.sincospi` intrinsic takes a floating-point value and returns
both the sine and cosine of the value multiplied by pi. It computes the
result more accurately than the naive approach of doing the
multiplication ahead of time, especially for large input values.
```
declare { float, float } @llvm.sincospi.f32(float %Val)
declare { double, double } @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)
```
Currently, the default lowering of this intrinsic relies on the
`sincospi[f|l]` functions being available in the target's runtime (e.g.
libc).
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly
reusing the lowering for sincos and frexp).
The `llvm.modf` intrinsic takes a floating-point value and returns both
the integral and fractional parts (as a struct).
```
declare { float, float } @llvm.modf.f32(float %Val)
declare { double, double } @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val)
```
This corresponds to the libm `modf` function but returns multiple values
in a struct (rather than take output pointers), which makes it easier to
vectorize.