364 Commits

Author SHA1 Message Date
Gergo Stomfai
15d48c5bbe
[X86][DAG] remove LowerFCanonicalize (#188127)
Remove LowerFCanonicalize. Add a fallback for cases where the scalar type also has Custom lowering, to avoid regressions on AMDGPU and SystemZ.

Fixes #143862
2026-04-01 13:34:05 +00:00
Luke Lau
598f3535fa
[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle legalization (#188691)
This is a second attempt at "[SelectionDAG] Expand
CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220)

That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2
because we had crashes on AMDGPU since we didn't have scalarization
support, and other crashes on PowerPC because we didn't handle the case
when a vector needed to be widened. Tests for these are added in
AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and
PowerPC/cttz-elts.ll.

The former crash has been fixed by adding
DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS.

The second crash has been fixed by reworking
TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly
identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step
vector and subtracts the result from VF. The easiest way to fix these
crashes without introducing regressions is to reuse the
VECTOR_FIND_LAST_ACTIVE expansion, which already handles the case where
the vector needs to be widened.

This means that the node now needs to take in a boolean vector argument
and uses VSELECT instead of an AND to zero out inactive lanes, so the op
promotion code has also been shared.
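
As a rough illustration of the expansion described above, here is a scalar
Python model (not the SelectionDAG code; `cttz_elts` is a name made up for
this sketch, and a fixed-length mask is assumed). It builds the reverse step
vector, zeroes the inactive lanes as a VSELECT would, takes the maximum and
subtracts it from VF:

```python
def cttz_elts(mask):
    """Scalar model of CTTZ_ELTS: count of leading-index inactive lanes."""
    vf = len(mask)
    # Reverse step vector: VF, VF-1, ..., 1.
    rev_step = [vf - i for i in range(vf)]
    # Model of the VSELECT: keep the step value on active lanes, 0 elsewhere.
    selected = [s if m else 0 for s, m in zip(rev_step, mask)]
    # The max reduction picks the first active lane; no active lane gives 0,
    # so the result is VF in that case.
    return vf - max(selected)
```

For example, `cttz_elts([False, False, True, False])` evaluates to 2, and an
all-false mask of four lanes gives 4.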
2026-03-31 07:25:57 +00:00
Demetrius Kanios
96bd7b6e15
[CodeGen] Add additional params to TargetLoweringBase::getTruncStoreAction (#187422)
The truncating store analogue of #181104.

Adds `Alignment` and `AddrSpace` parameters to
`TargetLoweringBase::getTruncStoreAction` and dependents, and introduces
a `getCustomTruncStoreAction` hook for targets to customize legalization
behavior using this new information.

This change is fully backwards compatible from the target's point of
view, with `setTruncStoreAction` having identical functionality. The
change is purely additive.
2026-03-30 16:52:45 -07:00
Luke Lau
7d39664a6a
Revert "[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220)
Reverts llvm/llvm-project#185605

Buildbot failures caused by ISel crashes in
https://lab.llvm.org/buildbot/#/builders/157/builds/45416 and
https://lab.llvm.org/buildbot/#/builders/10/builds/25156
2026-03-24 11:35:14 +00:00
Luke Lau
fe105347e2
[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting (#185605)
Currently a cttz.elts of e.g. nxv32i1 will get expanded to a reduction
of nxv32i64 or equivalent, but we can split it into two legal nxv16i1
cttz.elts once we have dedicated SelectionDAG nodes.

This implements the splitting for them the same way we implement type
splitting for vp.cttz.elts, i.e. check if the low result is VF, and if
so add the high result to it. It also implements operand
type promotion for NEON, which needs to promote i1 vectors to something
larger first.

We also need to move expansion into LegalizeVectorOps so it doesn't get
expanded before type legalization can do splitting. This uses
LegalizeVectorOps in case the scalar reduction type, which depends on
the minimum bitwidth needed to store the result, still needs type
promotion.

The TTI costs should be updated after this to reflect the more efficient
codegen, but that is deferred to another PR.
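
The split rule can be modelled in a few lines of Python. This is only a
sketch of the idea under the assumption of a fixed-length mask with an even
number of lanes; the function names are invented for the example and are not
LLVM APIs:

```python
def cttz_elts(mask):
    """Scalar model: index of the first set lane, or len(mask) if none is set."""
    for i, m in enumerate(mask):
        if m:
            return i
    return len(mask)


def cttz_elts_split(mask):
    """Model of the split: combine the results of the low and high halves."""
    half = len(mask) // 2
    lo = cttz_elts(mask[:half])
    hi = cttz_elts(mask[half:])
    # If the low half has no active lane, its result equals its own VF and
    # the count continues into the high half; otherwise the low result stands.
    return lo + hi if lo == half else lo
```

In this model the split version agrees with the unsplit one for every
possible 8-lane mask.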
2026-03-24 10:11:46 +00:00
Demetrius Kanios
351501799a
[CodeGen] Improve getLoadExtAction and friends (#181104)
Alternative approach to the same goals as #162407

This takes `TargetLoweringBase::getLoadExtAction`, renames it to
`TargetLoweringBase::getLoadAction`, merges `getAtomicLoadExtAction`
into it, and adds more inputs for relevant information (alignment,
address space).

The `isLoadExtLegal[OrCustom]` helpers are also modified in a matching
manner.

This is fully backwards compatible, with the existing `setLoadExtAction`
working as before. But this allows targets to override a new hook to
allow the query to make more use of the information. The hook
`getCustomLoadAction` is called with all the parameters whenever the
table lookup yields `LegalizeAction::Custom`, and can return any other
action it wants.
2026-03-17 23:40:19 -07:00
Dmitry Sidorov
a636928bb4
[SelectionDAG] Add expansion for llvm.convert.from.arbitrary.fp (#179318)
The expansion converts arbitrary-precision FP represented as an integer
using the following algorithm:
1. Extract sign, exponent, and mantissa bit fields via masks and shifts.
2. Classify the input (zero, denormal, normal, Inf, NaN) using the
exponent and mantissa fields.
3. Normal path: adjust the exponent bias and left-shift the
mantissa to fit the wider destination format.
4. Denormal path: normalize by finding the MSB position of the
mantissa (via count-leading-zeros), compute the correct exponent from
that position, strip the implicit leading 1, and shift the
fraction into the destination mantissa field.
5. Assemble the destination IEEE bit pattern (sign | exponent |
mantissa) and select among the normal, denormal, and special-value
results.
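
The steps above can be sketched in Python for one concrete case, Float8E5M2
to IEEE binary32. This is an illustrative model only: it uses branches where
the DAG expansion selects among results, the function names are invented for
the sketch, and the NaN payload handling shown here is an assumption rather
than a statement of what the expansion emits.

```python
import struct


def e5m2_to_f32_bits(x):
    """Convert a Float8E5M2 bit pattern (0..255) to a binary32 bit pattern."""
    src_man_bits, src_bias = 2, 15
    dst_man_bits, dst_bias = 23, 127

    # 1. Extract the sign, exponent and mantissa fields.
    sign = (x >> 7) & 0x1
    exp = (x >> src_man_bits) & 0x1F
    man = x & 0x3

    # 2. Classify the input and build the destination fields.
    if exp == 0x1F:
        # Inf (mantissa == 0) or NaN (mantissa != 0): all-ones exponent.
        dst_exp = 0xFF
        dst_man = man << (dst_man_bits - src_man_bits)
    elif exp != 0:
        # 3. Normal path: rebias the exponent, widen the mantissa.
        dst_exp = exp - src_bias + dst_bias
        dst_man = man << (dst_man_bits - src_man_bits)
    elif man == 0:
        dst_exp, dst_man = 0, 0
    else:
        # 4. Denormal path: find the MSB (what count-leading-zeros provides),
        # derive the exponent, strip the leading 1, shift the fraction up.
        msb = man.bit_length() - 1
        dst_exp = msb + 1 - src_bias - src_man_bits + dst_bias
        frac = man ^ (1 << msb)
        dst_man = frac << (dst_man_bits - msb)

    # 5. Assemble sign | exponent | mantissa.
    return (sign << 31) | (dst_exp << dst_man_bits) | dst_man


def e5m2_to_float(x):
    """Reinterpret the binary32 bit pattern as a Python float."""
    return struct.unpack("<f", struct.pack("<I", e5m2_to_f32_bits(x)))[0]
```

For instance, `e5m2_to_float(0x3C)` is 1.0 and the smallest denormal,
`e5m2_to_float(0x01)`, is `2.0 ** -16`.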

Currently only conversions from OCP floats are covered; in LLVM terms
these are: Float8E5M2, Float8E4M3FN, Float6E3M2FN, Float6E2M3FN,
Float4E2M1FN.

OCP spec:

https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

AI has assisted in X86 E2E testing.
2026-03-04 10:40:47 +01:00
David Sherwood
0b36d4265e
[AArch64] Add vector expansion support for ISD::FCBRT when using ArmPL (#183750)
This patch teaches the backend how to lower the FCBRT DAG node to the
vector math library function when using ArmPL. This is similar to what
we already do for llvm.pow/FPOW, however the only way to expose this is
via a DAG combine that converts

  FPOW(<2 x double> %x, <2 x double> <double 1.0/3.0, double 1.0/3.0>)

into

  FCBRT(<2 x double> %x)

when the appropriate fast math flags are present on the node. I've
updated the DAG combine to handle vector types and only perform the
transformation if there exists a vector library variant of cbrt.
2026-03-03 10:39:21 +00:00
Craig Topper
d20395cfa3
[LegalizeVectorOps][RISCV][PowerPC][AArch64][X86] Enable the clmul/clmulr/clmulh expansion code. (#184257)
These opcodes weren't added to the master switch statement that
determines if they should be considered vector ops.
2026-03-02 21:50:40 -08:00
David Sherwood
9e95cff515
[AArch64] Add vector expansion support for ISD::FPOW when using ArmPL (#183526)
This patch is split off from PR #183319 and teaches the backend how to
lower the FPOW DAG node to the vector math library function when using
ArmPL. This is similar to what we already do for llvm.sincos/FSINCOS
today.
2026-02-27 09:43:05 +00:00
Matt Arsenault
a521774217
DAG: Use poison for unused shuffle operands in legalizer (#177578)
2026-01-23 18:20:56 +01:00
Matt Arsenault
01e6245af4
DAG: Avoid querying libcall info from TargetLowering (#176268)
Libcall lowering decisions should come from the LibcallLoweringInfo
analysis. Query this through the DAG, so eventually the source
can be the analysis. For the moment this is just a wrapper around
the TargetLowering information.
2026-01-16 09:02:49 +00:00
Ramkumar Ramachandra
9e5e267a03
[ISel] Introduce llvm.clmul intrinsic (#168731)
In line with a std proposal, introduce the llvm.clmul family of
intrinsics corresponding to carry-less multiply operations. This work
builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives),
and follow-up patches will introduce custom-lowering on supported
targets, replacing target-specific clmul intrinsics.

Testing is done on the RISC-V target, which should be sufficient to
prove that the intrinsics work, since no RISC-V specific lowering has
been added.

Ref: https://isocpp.org/files/papers/P3642R3.html

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2026-01-05 20:24:06 +00:00
Islam Imad
7ceecfad40
[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413)
Fixes #171608
2025-12-28 18:51:18 +00:00
Benjamin Maxwell
baa49835da
[AArch64] Support lowering v4i16/f16 VECTOR_COMPRESS nodes to SVE (#173256)
This is a follow-up to #171162, which broke the (untested) lowering of
v4i16/f16 to SVE.

See: https://github.com/llvm/llvm-project/pull/171162#discussion_r2601901376
2025-12-24 14:26:13 +00:00
Sam Tebbs
19e1011df5
[SelectionDAG] Fix unsafe cases for loop.dependence.{war/raw}.mask (#168565)
Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are
currently hard to split correctly, and there are a number of incorrect
cases.

The difficulty comes from how the intrinsics are defined. For example,
take `LOOP_DEPENDENCE_WAR_MASK`.

It is defined as the OR of:

* `(ptrB - ptrA) <= 0`
* `elementSize * lane < (ptrB - ptrA)`

Now, if we want to split a loop dependence mask for the high half of the
mask we want to compute:

* `(ptrB - ptrA) <= 0`
* `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)`

However, with the current opcode definitions, we can only modify ptrA or
ptrB, which may change the result of the first case, which should be
invariant to the lane.

This patch resolves these cases by adding a "lane offset" to the ISD
opcodes. The lane offset is always a constant. For scalable masks, it is
implicitly multiplied by vscale.

This makes splitting trivial as we increment the lane offset by
`LoVT.getElementCount()` now.
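
A small Python model of the WAR mask definition above shows why the lane
offset makes splitting straightforward. It is a sketch for fixed-length
masks only (the implicit vscale multiplication is not modelled), and the
function names are invented for the example:

```python
def war_mask(ptr_a, ptr_b, elt_size, num_lanes, lane_offset=0):
    """Model of LOOP_DEPENDENCE_WAR_MASK with a constant lane offset."""
    diff = ptr_b - ptr_a
    # A lane is active if the pointers do not conflict at all (diff <= 0)
    # or if the lane lies strictly below the conflicting distance.
    return [diff <= 0 or elt_size * (lane + lane_offset) < diff
            for lane in range(num_lanes)]


def war_mask_split(ptr_a, ptr_b, elt_size, num_lanes):
    """Split into halves by bumping only the lane offset of the high half."""
    half = num_lanes // 2
    lo = war_mask(ptr_a, ptr_b, elt_size, half, 0)
    hi = war_mask(ptr_a, ptr_b, elt_size, num_lanes - half, half)
    return lo + hi


def war_mask_split_by_pointer(ptr_a, ptr_b, elt_size, num_lanes):
    """Unsafe variant: advance ptr_a for the high half instead."""
    half = num_lanes // 2
    lo = war_mask(ptr_a, ptr_b, elt_size, half)
    # Moving ptr_a also changes the lane-invariant `diff <= 0` test.
    hi = war_mask(ptr_a + half * elt_size, ptr_b, elt_size, num_lanes - half)
    return lo + hi
```

With `ptr_a = 0`, `ptr_b = 16`, 4-byte elements and 8 lanes, the lane-offset
split reproduces the unsplit mask (first four lanes active), while the
pointer-adjusting variant wrongly marks every lane of the high half active.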

Note: In the AArch64 backend, we only support zero lane offsets (as
other cases are tricky to lower to whilewr/rw).

---------

Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
2025-12-12 08:44:33 +00:00
Matt Arsenault
a3aaa1a391
DAG: Use RuntimeLibcalls to legalize vector frem calls (#170719)
This continues the replacement of TargetLibraryInfo uses in codegen
with RuntimeLibcallsInfo started in
821d2825a4f782da3da3c03b8a002802bff4b95c.
The series there handled all of the multiple result calls. This
extends for the other handled case, which happened to be frem.

For some reason the Libcall names for these are prefixed with "REM_", for
the instruction "frem", which maps to the libcall "fmod".
2025-12-11 13:33:27 +00:00
AZero13
d831f8df52
[SelectionDAG] Fix AArch64 machine verifier bug when expanding LOOP_DEPENDENCE_MASK (#168221)
TargetConstant nodes don't match TableGen ImmLeaf patterns during
instruction selection. When this zero constant flows into the AArch64
CCMP formation code, the machine verifier hits an assertion in expensive
checks.

Fixes: #168227
2025-11-15 21:12:11 +00:00
Matt Arsenault
c5aace4236
DAG: Move expandMultipleResultFPLibCall to TargetLowering (NFC) (#166988)
This kind of helper is higher level and not general enough to go
directly in SelectionDAG. Most similar utilities are in TargetLowering.
2025-11-12 03:50:33 +00:00
Matt Arsenault
95f2728b5c
DAG: Stop using TargetLibraryInfo for multi-result FP intrinsic codegen (#166987)
Only use RuntimeLibcallsInfo. Remove the helper functions used to
transition.
2025-11-12 02:47:28 +00:00
Matt Arsenault
4b9771e41a
DAG: Use modf vector libcalls through RuntimeLibcalls (#166986)
Copy new process from sincos/sincospi
2025-11-11 18:05:35 -08:00
Matt Arsenault
de68181d7f
DAG: Use sincos vector libcalls through RuntimeLibcalls (#166984)
Copy new process from sincospi.
2025-11-11 10:51:23 -08:00
Matt Arsenault
821d2825a4
RuntimeLibcalls: Remove incorrect sincospi from most targets (#166982)
sincospi/sincospif/sincospil do not appear to exist on common
targets. Darwin targets have __sincospi and __sincospif, so define
and use those implementations. I have no idea what version added
those calls, so I'm just guessing it's the same conditions as
__sincos_stret.

Most of this patch is working to preserve codegen when a vector
library is explicitly enabled. This only covers sleef and armpl,
as those are the only cases tested.

The multiple result libcalls have an aberrant process where the
legalizer looks for the scalar type's libcall in RuntimeLibcalls,
and then cross references TargetLibraryInfo to find a matching
vector call. This was unworkable in the sincospi case, since the
common case is there is no scalar call available. To preserve
codegen if the call is available, first try to match a libcall
with the vector type before falling back on the old scalar search.

Eventually all of this logic should be contained in RuntimeLibcalls,
without the link to TargetLibraryInfo. In principle we should perform
the same legalization logic as for an ordinary operation, trying
to find a matching subvector type with a libcall.
2025-11-10 11:05:08 -08:00
Damian Heaton
70f4b596cf
Add llvm.vector.partial.reduce.fadd intrinsic (#159776)
With this intrinsic, and supporting SelectionDAG nodes, we can better
make use of instructions such as AArch64's `FDOT`.
2025-11-07 15:36:54 +00:00
Sam Tebbs
569d738d4e
[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007)
It can be unsafe to load a vector from an address and write a vector to
an address if those two addresses have overlapping lanes within a
vectorised loop iteration.

This PR adds intrinsics designed to create a mask with lanes disabled if
they overlap between the two pointer arguments, so that only safe lanes
are loaded, operated on and stored. The `loop.dependence.war.mask`
intrinsic represents cases where the store occurs after the load, and
the opposite for `loop.dependence.raw.mask`. The distinction between
write-after-read and read-after-write is important, since the ordering
of the read and write operations affects if the chain of those
instructions can be done safely.

Along with the two pointer parameters, the intrinsics also take an
immediate that represents the size in bytes of the vector element types.

This will be used by #100579.
2025-09-02 15:35:15 +01:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Craig Topper
8d549cf036
[SelectionDAG] Pass SDNodeFlags through getNode instead of setFlags. (#149852)
getNode updates flags correctly for CSE. Calling setFlags after getNode
may set the flags where they don't apply.

I've added a Flags argument to getSelectCC and the signature of getNode that takes
an ArrayRef of EVTs.
2025-07-22 08:06:30 -07:00
Paul Walker
68732ce8e0
[LLVM][CodeGen][SVE] Add isel for bfloat unordered reductions. (#143540)
The omissions are VECREDUCE_SEQ_* and MUL. The former goes down a
different code path and the latter is unsupported across all element types.
2025-06-20 11:46:25 +01:00
Philip Reames
939666380f
[SDAG] Add partial_reduce_sumla node (#141267)
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(a) * ext(b) where the two extends have to
have the same source type, and have the same extend kind.

For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu which
represents the case where the two extends are sign and zero respectively
(i.e. not the same type of extend).

This patch adds a partial_reduce_sumla node which has sign extension for
A, and zero extension for B. The addition is somewhat mechanical.
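
A scalar Python model of the mixed-extension accumulate may help make the
semantics concrete. This is a sketch under assumptions: the helper names are
invented, accumulator wrap-around is not modelled, and the choice of which
accumulator lane receives each product is made purely for illustration.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zext(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


def partial_reduce_sumla(acc, a, b, src_bits=8):
    """Acc += sext(a) * zext(b), folding the wide input into len(acc) lanes."""
    out = list(acc)
    for i, (x, y) in enumerate(zip(a, b)):
        # Lane assignment (i modulo the accumulator width) is an assumption
        # of this sketch.
        out[i % len(out)] += sext(x, src_bits) * zext(y, src_bits)
    return out
```

For example, with 8-bit sources, `partial_reduce_sumla([0, 0], [0xFF, 1, 2,
3], [0xFF, 1, 1, 1])` gives `[-253, 4]`, since 0xFF is read as -1 on the A
side and as 255 on the B side.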
2025-06-09 07:17:45 -07:00
Philip Reames
1651aa2943
[SDAG] Split the partial reduce legalize table by opcode [nfc] (#141970)
On its own, this change should be non-functional. This is a preparatory
change for https://github.com/llvm/llvm-project/pull/141267 which adds a
new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that
review, AArch64 needs a different set of legal and custom types for the
PARTIAL_REDUCE_SUMLA variant than the currently existing
PARTIAL_REDUCE_UMLA/SMLA.
2025-05-29 14:05:31 -07:00
Philip Reames
cf2f558501
[DAG/RISCV] Continue migrating to getInsertSubvector and getExtractSubvector
Follow up to 6e654caab, use the new routines in more places.  Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index just to minimize room for
error.  I'll get those in a separate follow up.
2025-05-08 09:40:45 -07:00
Nicholas Guy
a1f369e630
[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933)
Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. This only
happens when the combine has been performed on the ISD node. Also adds
a check to only do the DAG combine when the node can eventually be
lowered, which changes NEON tests too.

---------

Co-authored-by: James Chesterman <james.chesterman@arm.com>
2025-04-23 13:19:41 +01:00
Jim Lin
94f6b6d538
[SelectionDAG][RISCV] Promote VECREDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM} (#128800)
This patch also adds the tests for VP_REDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM}, which have been supported for a while.
2025-02-28 23:13:30 +08:00
James Chesterman
d4a0848dc6
[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207)
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line
argument (aarch64-enable-partial-reduce-nodes) that indicates whether the
intrinsic experimental_vector_partial_reduce_add will be transformed
into the new ISD node. Lowering with the new ISD nodes will, for now,
always be done as an expand.
2025-02-18 09:08:47 +00:00
Benjamin Maxwell
19556eccf6
[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705)
This makes the name more consistent with the other helpers.
2025-02-11 11:51:35 +00:00
Benjamin Maxwell
701223ac20
[IR] Add llvm.sincospi intrinsic (#125873)
This adds the `llvm.sincospi` intrinsic, legalization, and lowering
(mostly reusing the lowering for sincos and frexp).

The `llvm.sincospi` intrinsic takes a floating-point value and returns
both the sine and cosine of the value multiplied by pi. It computes the
result more accurately than the naive approach of doing the
multiplication ahead of time, especially for large input values.

```
declare { float, float }          @llvm.sincospi.f32(float  %Val)
declare { double, double }        @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincospi.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincospi.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float>  %Val)
```

Currently, the default lowering of this intrinsic relies on the
`sincospi[f|l]` functions being available in the target's runtime (e.g.
libc).
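
To illustrate the accuracy point above, the following Python sketch compares
multiplying by pi up front with reducing the argument first. It is not how
any libm implements `sincospi`; both function names are invented for the
example, and the reduction shown is just one simple way to keep the
multiplication by pi from amplifying rounding error.

```python
import math


def sincospi_naive(x):
    """Multiply by pi first; the rounding error grows with |x|."""
    y = math.pi * x
    return math.sin(y), math.cos(y)


def sincospi_reduced(x):
    """Reduce x modulo 2 (exact in floating point) before multiplying by pi."""
    r = math.fmod(x, 2.0)
    return math.sin(math.pi * r), math.cos(math.pi * r)
```

For inputs such as `1000000.5`, where the cosine of pi times the value is
mathematically zero, the reduced version stays within about 1e-16 of zero,
while the naive version is typically off by many orders of magnitude more.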
2025-02-11 09:01:30 +00:00
Benjamin Maxwell
4bf97aa818
[IR] Add llvm.modf intrinsic (#121948)
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly
reusing the lowering for sincos and frexp).

The `llvm.modf` intrinsic takes a floating-point value and returns both
the integral and fractional parts (as a struct).

```
declare { float, float }             @llvm.modf.f32(float  %Val)
declare { double, double }           @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 }       @llvm.modf.f80(x86_fp80  %Val)
declare { fp128, fp128 }             @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }     @llvm.modf.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float>  %Val)
```

This corresponds to the libm `modf` function but returns multiple values
in a struct (rather than take output pointers), which makes it easier to
vectorize.
2025-02-07 09:25:13 +00:00
Graham Hunter
d9f165ddea
[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810)
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder.

The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
2025-01-20 12:57:05 +00:00
Craig Topper
8ce81f17a1
[LegalizeVectorOps][RISCV] Use VP_FP_EXTEND/ROUND when promoting VP_FP* operations. (#122784)
This preserves the original VL leading to more reuse of VL for vsetvli.
The VLOptimizer can also clean up a lot of this, but I'm not sure if it
gets all of it.

There are some regressions in here from propagating the mask too, but
I'm not sure if that's a concern.
2025-01-13 15:18:41 -08:00
abhishek-kaushik22
366e62a0cb
[X86] Combine uitofp <v x i32> to <v x half> (#121809)
Closes #121793
2025-01-08 16:49:29 +08:00
Simon Pilgrim
923675193b
[DAG] VectorLegalizer::ExpandUINT_TO_FLOAT - pull out repeated getValueType calls. NFC.
2025-01-06 18:49:51 +00:00
Phoebe Wang
1547382033
[X86] Support lowering of FMINIMUMNUM/FMAXIMUMNUM (#121464)
2025-01-06 21:28:58 +08:00
Craig Topper
e32afded92
[LegalizeVectorOps] Use getBoolConstant instead of getAllOnesConstant in VectorLegalizer::UnrollVSETCC. (#121526)
This code should follow the target preference for boolean contents of a
vector type. We shouldn't assume that true is negative one.
2025-01-03 10:46:37 -08:00
Benjamin Maxwell
ea6b8fa4b9
[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() (#114792)
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
2024-11-06 11:06:06 +00:00
Benjamin Maxwell
89a8c71db6
[SDAG] Support expanding FSINCOS to vector library calls (#114039)
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
2024-10-31 12:41:43 +00:00
Yingwei Zheng
cf9d1c1486
[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-31 08:10:07 +08:00
Benjamin Maxwell
c3260c65e8
[IR] Add llvm.sincos intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.

The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).

```
declare { float, float }          @llvm.sincos.f32(float  %Val)
declare { double, double }        @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincos.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincos.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float>  %Val)
```

The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
2024-10-29 10:52:20 +00:00
Tex Riddell
875afa939d
[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

Based on example PR #96222 and fix PR #101268, with some differences due
to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).

- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC

Part 4 for Implement the atan2 HLSL Function #70096.
2024-10-16 11:43:17 -07:00
Paul Walker
02dd6b1014
[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803)
Specifically:
  fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm,
  fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt &
  ftrunc
2024-10-07 13:01:59 +01:00
Craig Topper
92a8b81bdf
[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed vectors in some cases. (#109232)
Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and
ARM.
2024-09-30 11:44:49 -07:00