259 Commits

Author SHA1 Message Date
Graham Hunter
d9f165ddea
[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810)
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder.

The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
2025-01-20 12:57:05 +00:00
Antonio Frighetto
8a2113c5c5 Reapply "[SelectionDAG] Add preliminary plumbing for samesign flag"
Original commit: 19c8475871faee5ebb06281034872a85a2552675

Multiple 2-stage sanitizer buildbots were reporting failures, the issue
has been addressed separately in 29246a92aee87e86cbc2bb64ee520d7385644f34.
2024-11-02 14:19:58 +01:00
Vitaly Buka
c5eb591257
Revert "[SelectionDAG] Add preliminary plumbing for samesign flag" (#114647)
Crashes on ARM builds
https://lab.llvm.org/buildbot/#/builders/85/builds/2548
```
DAGCombiner.cpp:16157: SDValue (anonymous
namespace)::DAGCombiner::visitFREEZE(SDNode *):
Assertion `DAG.isGuaranteedNotToBeUndefOrPoison(R,
false) && "Can't create node that may be
undef/poison!"' failed.
```

Issue #114648

This reverts commit 19c8475871faee5ebb06281034872a85a2552675.
2024-11-02 01:35:13 -07:00
Antonio Frighetto
19c8475871 [SelectionDAG] Add preliminary plumbing for samesign flag
Extend recently-added poison-generating IR flag to codegen as well.
2024-10-31 19:47:50 +01:00
Tex Riddell
875afa939d
[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

Based on example PR #96222 and fix PR #101268, with some differences due
to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).

- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC

Part 4 for Implement the atan2 HLSL Function #70096.
2024-10-16 11:43:17 -07:00
Matt Arsenault
9578db9c11 DAG: Handle atomic fsub in node dumper 2024-09-13 10:22:27 +04:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only support for scalar values. The
support of
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.

Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.

The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
2024-08-15 14:09:36 +08:00
hanbeom
0d074ba197
[DAG] Support saturated truncate (#99418)
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it.

Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs:

`ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and  the range of values matches the range of signed values of the  destination type.

`ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type.

`ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type.

These ISDs indicate a saturated truncate.

Fixes https://github.com/llvm/llvm-project/issues/85903
2024-08-14 09:52:36 +01:00
Lawrence Benson
177ce1900f
[LLVM] Add llvm.experimental.vector.compress intrinsic (#92289)
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.

The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many target as it
avoids branches.

In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.

See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
2024-07-17 14:24:24 +02:00
Farzon Lotfi
0b58f34c98
[X86][CodeGen] Add base trig intrinsic lowerings (#96222)
This change is an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds constraint intrinsics and some lowering cases for
`acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`.
The only x86 specific change was for f80.

https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966
    
The x86 lowering is going to be done in three pr changes with this being
the first.
A second PR will be put up for Loop Vectorizing and then SLPVectorizer.

The constraint intrinsics is also going to be in multiple parts, but
just 2.
This part covers just the llvm specific changes, part2 will cover clang
specifc changes and legalization for backends than have special
legalization
 requirements like aarch64 and wasm.
2024-07-11 15:58:43 -04:00
Daniil Kovalev
1488fb4153
[PAC][AArch64] Lower ptrauth constants in code (#96879)
This re-applies #94241 after fixing buildbot failure, see
https://lab.llvm.org/buildbot/#/builders/51/builds/570

According to standard, `constexpr` variables and `const` variables
initialized with constant expressions can be used in lambdas w/o
capturing - see https://en.cppreference.com/w/cpp/language/lambda.
However, MSVC used on buildkite seems to ignore that rule and does not
allow using such uncaptured variables in lambdas: we have "error C3493:
'Mask16' cannot be implicitly captured because no default capture mode
has been specified" - see
https://buildkite.com/llvm-project/github-pull-requests/builds/73238

Explicitly capturing such a variable, however, makes buildbot fail with
"error: lambda capture 'Mask16' is not required to be captured for this
use [-Werror,-Wunused-lambda-capture]" - see
https://lab.llvm.org/buildbot/#/builders/51/builds/570.

Fix both cases by using `0xffff` value directly instead of giving a name
to it.

Original PR description below.

Depends on #94240.

Define the following pseudos for lowering ptrauth constants in code:

- non-`extern_weak`:
  - no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added
PAC;
  - GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but use a
special stub slot named `sym$auth_ptr$key$disc` filled by dynamic linker
during relocation resolving instead of a GOT slot.

---------

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
2024-06-28 07:29:38 +03:00
Daniil Kovalev
99251f5a11
Revert "[PAC][AArch64] Lower ptrauth constants in code (#94241)" (#96865)
This reverts #94241.

See buildbot failure
https://lab.llvm.org/buildbot/#/builders/51/builds/570
2024-06-27 11:10:38 +03:00
Daniil Kovalev
b5cc19e572
[PAC][AArch64] Lower ptrauth constants in code (#94241)
Depends on #94240.

Define the following pseudos for lowering ptrauth constants in code:

- non-`extern_weak`:
  - no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added
    PAC;
  - GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but use a
  special stub slot named `sym$auth_ptr$key$disc` filled by dynamic linker
  during relocation resolving instead of a GOT slot.

---------

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
2024-06-27 10:02:17 +03:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Poseydon42
995835fe6d
[SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (#91871)
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
2024-06-17 11:16:52 +02:00
Matt Arsenault
457bedfe39 DAG: Fix not handling atomic fmin/fmax in dumper 2024-06-12 10:39:54 +02:00
Farzon Lotfi
1d87433593
[x86] Add tan intrinsic part 4 (#90503)
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294


Much of this change was following how G_FSIN and G_FCOS were used.

Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
-  `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
-  `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating point math operations also associate `G_FTAN` with
the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math
operation common behaviors.
- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and
ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for
tan intrinsic

resolves https://github.com/llvm/llvm-project/issues/70082
2024-06-05 15:01:33 -04:00
Roger Ferrer Ibáñez
05e6bb40eb
[SelectionDAG] Add an ISD::CLEAR_CACHE node to lower llvm.clear_cache (#93795)
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.

This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.

This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
2024-05-30 14:55:32 +02:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
2024-05-13 11:35:28 +01:00
Matt Arsenault
3cb660d772 DAG: Fix missing fptrunc_round name in debug dumps 2024-04-24 11:44:03 +02:00
Pierre van Houtryve
cf328ff96d
[IR] Memory Model Relaxation Annotations (#78569)
Implements the core/target-agnostic components of Memory Model
Relaxation Annotations.

RFC:
https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
2024-04-24 08:52:25 +02:00
Jay Foad
a6170d5b7e
[SelectionDAG] Dump convergencectrl_glue DAG node (#87487) 2024-04-03 16:21:57 +01:00
Jonas Paulsson
8b8e1adbde
[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879)
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still casted).
2024-03-18 17:21:50 -04:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
Shilei Tian
8300f30a92 [SelectionDAG] Add STRICT_BF16_TO_FP and STRICT_FP_TO_BF16 (#80056)
This patch adds the support for `STRICT_BF16_TO_FP` and
`STRICT_FP_TO_BF16`.
2024-03-04 01:08:49 -05:00
Shilei Tian
2c5d01c2cf Revert "[SelectionDAG] Add STRICT_BF16_TO_FP and STRICT_FP_TO_BF16 (#80056)"
This reverts commit b0c158bd947c360a4652eb0de3a4794f46deb88b.

The changes in `compiler-rt` broke tests.
2024-03-04 00:33:31 -05:00
Shilei Tian
b0c158bd94
[SelectionDAG] Add STRICT_BF16_TO_FP and STRICT_FP_TO_BF16 (#80056)
This patch adds the support for `STRICT_BF16_TO_FP` and
`STRICT_FP_TO_BF16`.
2024-03-04 00:01:50 -05:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
Joseph Huber
11fcae69db
[LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks (#81331)
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed frequency clock that can be used to determine elapsed
time, this is different compared to the cycle counter which often has
variable frequency.

This patch only adds support for the NVPTX and AMDGPU targets.

This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
2024-02-13 10:06:25 -06:00
Nico Weber
184ca39529
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.
2024-01-25 12:01:31 -05:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
    
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Craig Topper
bbd57e1832
[SelectionDAG] Add initial plumbing for the disjoint flag. (#76751)
This copies the flag from IR to the SDNode in SelectionDAGBuilder, clears
the flag in SimplifyDemandedBits, and adds it to canCreateUndefOrPoison.

Uses of the flag will come in later patches.
2024-01-02 21:58:00 -08:00
Craig Topper
70b35ec0a8
[SelectionDAG] Add initial support for nneg flag on ISD::ZERO_EXTEND. (#70872)
This adds the nneg flag to SDNodeFlags and the node printing code.
SelectionDAGBuilder will add this flag to the node if the target doesn't
prefer sign extend.

A future RISC-V patch can remove the sign extend preference from
SelectionDAGBuilder.

I've also added the flag to the DAG combine that converts
ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.
2023-11-03 11:15:08 -07:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Daniel Paoliello
0c5c7b52f0 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-31 12:06:50 -07:00
Arthur Eubanks
0a4fc4ac1c Revert "Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables"
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.

Causes crashes, see comments in https://reviews.llvm.org/D149367.

Some follow-up fixes are also reverted:

This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
2023-08-25 18:34:15 -07:00
Daniel Paoliello
8d0c3db388 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-25 10:19:17 -07:00
Serge Pavlov
6862f0fab1 [FPEnv] Intrinsics for access to FP control modes
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all target dynamic floating-point control
modes, which include, for instance, rounding direction, precision,
treatment of denormals and so on. The intrinsics do the same
operations as the C library functions 'fegetmode' and 'fesetmode'. By
default they are lowered to calls to these functions.

Two main use cases are supported by this implementation.

1. Local modification of the control modes. In this case the code
usually has a pattern (in pseudocode):

    saved_modes = get_fpmode()
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    set_fpmode(saved_modes)

In the case when it is known that the current FP environment is default,
the code may be shorter:

    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    reset_fpmode()

Such patterns appear not only in user code but also in implementations
of various FP controlling pragmas. In particular, the implementation of
`#pragma STDC FENV_ROUND` requires similar code if the target does not
support static rounding mode.

2. Portable control of FP modes. Usually FP control modes are set by
writing to some control register. Different targets have different
layout of this register, the way the register is accessed also may be
different. Using set of target-specific definitions for the control
register bits together with these intrinsic functions provides enough
portable way to handle control modes across wide range of hardware.

This change defines only llvm intrinsic function, which implement the
access required for the aforementioned use cases.

Differential Revision: https://reviews.llvm.org/D82525
2023-08-24 15:52:19 +07:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
Anna Thomas
26bfbec5d2 [Intrinsic] Introduce reduction intrinsics for minimum/maximum
This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics (for NaN and signed zero) as
llvm.minimum and llvm.maximum.

Reviewed-By: nikic

Differential Revision: https://reviews.llvm.org/D152370
2023-06-13 12:29:58 -04:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Serge Pavlov
eecaeb6f10 [FPEnv] Intrinsics for access to FP environment
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read floating-point environment, set it or reset to
some default state. They do the same actions as C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.

The new intrinsics specify FP environment as a value of integer type, it
is convenient of most targets where the FP state is a content of some
register. Some targets however use long representations. On X86 the size
of FP environment is 256 bits, and even half of this size is not a legal
ibteger type. To facilitate legalization in such cases, two sets of DAG
nodes is used. Nodes GET_FPENV and SET_FPENV are used when FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when
target has long representation for floationg-point state.

Differential Revision: https://reviews.llvm.org/D71742
2023-06-05 13:10:01 +07:00
NAKAMURA Takumi
c1221251fb Restore CodeGen/MachineValueType.h from Support
This is rework of;

  - rG13e77db2df94 (r328395; MVT)

Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.

Depends on D148767

Differential Revision: https://reviews.llvm.org/D149024
2023-05-03 00:13:20 +09:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00