1656 Commits

Author SHA1 Message Date
Craig Topper
acab142751 [LegalizeDAG] Freeze index when converting insert_elt/insert_subvector to load/store on stack.
We try clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.

We have other instances of this issue in type legalization and extract_elt/subvector,
but posting this patch first for direction check.

Fixes #86717
2024-03-27 13:01:23 -07:00
Craig Topper
1c965801c4
[LegalizeDAG] Merge PerformInsertVectorEltInMemory into ExpandInsertToVectorThroughStack. NFC (#86755)
These functions are very similar. We can share them like we do for
EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR.
2024-03-27 09:39:35 -07:00
Craig Topper
09155ac290 [LegalizeDAG] Remove unneeded temporary SDValues from PerformInsertVectorEltInMemory. NFC
There were 3 temporaries that just renamed the 3 well name arguments to the
function to Tmp1-3. Looks like this was done when the code was extracted from
elsewhere into a separate function 15 years ago.
2024-03-26 16:25:24 -07:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Shilei Tian
8300f30a92 [SelectionDAG] Add STRICT_BF16_TO_FP and STRICT_FP_TO_BF16 (#80056)
This patch adds the support for `STRICT_BF16_TO_FP` and
`STRICT_FP_TO_BF16`.
2024-03-04 01:08:49 -05:00
Shilei Tian
2c5d01c2cf Revert "[SelectionDAG] Add STRICT_BF16_TO_FP and STRICT_FP_TO_BF16 (#80056)"
This reverts commit b0c158bd947c360a4652eb0de3a4794f46deb88b.

The changes in `compiler-rt` broke tests.
2024-03-04 00:33:31 -05:00
Shilei Tian
b0c158bd94
[SelectionDAG] Add STRICT_BF16_TO_FP and STRICT_FP_TO_BF16 (#80056)
This patch adds the support for `STRICT_BF16_TO_FP` and
`STRICT_FP_TO_BF16`.
2024-03-04 00:01:50 -05:00
David Majnemer
cc13f3ba45
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- taking the top 16 bits of a FP32 which will turn some NaNs into
infinities

Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd which will negate
double rounding issues.
2024-02-21 12:37:02 -05:00
Manish Kausik H
652081ca9e
[NFC][SelectionDAG] Move function getStackAlignedMMO to the beginning of LegalizeDAG.cpp (#82171) 2024-02-19 19:11:42 +04:00
Joseph Huber
11fcae69db
[LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks (#81331)
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed frequency clock that can be used to determine elapsed
time, this is different compared to the cycle counter which often has
variable frequency.

This patch only adds support for the NVPTX and AMDGPU targets.

This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
2024-02-13 10:06:25 -06:00
Simon Pilgrim
114a33be47 [DAG] getStackAlignedMMO - return the getMachineMemOperand result directly (style). NFC. 2024-02-04 14:01:55 +00:00
Manish Kausik H
a768bc6ef6
[SelectionDAG] Use unaligned store to move AVX registers onto stack for extractelement (#78422)
Prior to this patch, SelectionDAG generated aligned move onto stacks for
AVX registers when the function was marked as a no-realign-stack
function. This lead to misalignment between the stack and the
instruction generated. This patch fixes the issue.

Fixes #77730
2024-02-02 22:49:31 +05:30
Nico Weber
184ca39529
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.
2024-01-25 12:01:31 -05:00
Kazu Hirata
e4a6be0fc0 [CodeGen] Use getConstantOperandVal (NFC) 2024-01-13 18:18:51 -08:00
Matt Arsenault
460ffcddd9
AMDGPU: Make bf16/v2bf16 legal types (#76215)
There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.

Depends #76213 #76214
2024-01-04 22:31:18 +07:00
Matt Arsenault
ed6dc62862
DAG: Handle equal size element build_vector promotion (#76213) 2023-12-23 20:43:14 +07:00
Matt Arsenault
4d1cd38c95 DAG: Handle promotion of fcanonicalize
This avoids a regression in a future commit
2023-12-22 12:50:18 +07:00
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
Simon Pilgrim
22df0886a1
[DAG] Don't split f64 constant stores if the fp imm is legal (#74622)
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
2023-12-07 10:33:03 +00:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Nikita Popov
1a1a5ec756 [SDAG] Avoid use of ConstantExpr::getFPTrunc() (NFC)
Use the constant folding API instead. As we're working on
ConstantFP, it is guaranteed to succeed.
2023-11-06 15:55:23 +01:00
Craig Topper
8912200966
[RISCV] Add experimental support for making i32 a legal type on RV64 in SelectionDAG. (#70357)
This will select i32 operations directly to W instructions without
custom nodes. Hopefully this can allow us to be less dependent on
hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp.

This support is enabled with a command line option that is off by
default.

Generated code is still not optimal.

I've duplicated many test cases for this, but its not complete. Enabling this runs all existing lit tests without crashing.
2023-11-01 09:36:41 -07:00
Youngsuk Kim
e5026f0179 [llvm] Remove uses of Type::getPointerTo() (NFC)
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:

* Drop the call entirely if the sole purpose of it is to support a no-op
  bitcast (remove the no-op bitcast as well).

* Replace with `PointerType::get()`/`PointerType::getUnqual()`

This is a NFC cleanup effort.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D155232
2023-09-22 19:44:38 -04:00
Yingwei Zheng
b423e1f05d
[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs
This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs.
Comparison with GCC: https://godbolt.org/z/c5zPdP7j4

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158673
2023-09-16 17:09:41 +08:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Philip Reames
685e1909e9 [LegalizeDAG] Use scalable aware idiom for checking for single element vector
NFC for fixed vectors (all that reaches here currently), and future proofing for scalable vectors.
2023-09-01 11:56:03 -07:00
Matt Arsenault
ad9d13d535 SelectionDAG: Swap operands of atomic_store
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.

There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.

I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.

https://reviews.llvm.org/D123143
2023-08-31 17:30:10 -04:00
Daniel Paoliello
0c5c7b52f0 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-31 12:06:50 -07:00
Arthur Eubanks
0a4fc4ac1c Revert "Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables"
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.

Causes crashes, see comments in https://reviews.llvm.org/D149367.

Some follow-up fixes are also reverted:

This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
2023-08-25 18:34:15 -07:00
Daniel Paoliello
8d0c3db388 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-25 10:19:17 -07:00
Serge Pavlov
6862f0fab1 [FPEnv] Intrinsics for access to FP control modes
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all target dynamic floating-point control
modes, which include, for instance, rounding direction, precision,
treatment of denormals and so on. The intrinsics do the same
operations as the C library functions 'fegetmode' and 'fesetmode'. By
default they are lowered to calls to these functions.

Two main use cases are supported by this implementation.

1. Local modification of the control modes. In this case the code
usually has a pattern (in pseudocode):

    saved_modes = get_fpmode()
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    set_fpmode(saved_modes)

In the case when it is known that the current FP environment is default,
the code may be shorter:

    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    reset_fpmode()

Such patterns appear not only in user code but also in implementations
of various FP controlling pragmas. In particular, the implementation of
`#pragma STDC FENV_ROUND` requires similar code if the target does not
support static rounding mode.

2. Portable control of FP modes. Usually FP control modes are set by
writing to some control register. Different targets have different
layout of this register, the way the register is accessed also may be
different. Using set of target-specific definitions for the control
register bits together with these intrinsic functions provides enough
portable way to handle control modes across wide range of hardware.

This change defines only llvm intrinsic function, which implement the
access required for the aforementioned use cases.

Differential Revision: https://reviews.llvm.org/D82525
2023-08-24 15:52:19 +07:00
Jianjian GUAN
879e801a91 [RISCV] Apply promotion for f16 vector ops when only have zvfhmin
For most fp16 vector ops, we could promote it to fp32 vector when zvfhmin is enable but zvfh is not.
But for nxv32f16, we need to split it first since nxv32f32 is not a valid MVT.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D153848
2023-08-23 16:49:20 +08:00
Craig Topper
45b172c838 [LegalizeDAG] Prevent LegalizeLoadOps from creating extloads that mix int and fp types.
For RISC-V, getRegisterType for fp16 returns i16. i16->fp64 extload
is considered legal because the LoadExtActions defaults to Legal
for all entries. Only fp/fp and int/int entries are changed to
Expand fore RISC-V.

This patch detects the FP-ness has changed and won't try to call
isLoadExtLegal.

Alternatively, we could add Expand for int/fp and fp/int, but that
seemed a little silly.

Fixes #63816

Reviewed By: asb, wangpc

Differential Revision: https://reviews.llvm.org/D155040
2023-07-12 08:03:35 -07:00
Matt Arsenault
1d92b68ead DAG: Correct chain management for frexp libcalls
We need to replace the other uses of the call chain with the new load
chain.

Fixes not preserving the return def with unused x86_fp80
results. Regression reported here:
https://reviews.llvm.org/rGb15bf305ca3e9ce63aaef7247d32fb3a75174531#1224999
2023-07-10 21:39:15 -04:00
Matt Arsenault
7d644dc598 DAG: Really fix patch split 2023-06-30 09:14:02 -04:00
Matt Arsenault
2b988801c9 DAG: Fix broken patch split 2023-06-30 09:07:23 -04:00
Matt Arsenault
160d7227e0 DAG: Fix libcall expansion for frexp on ARM
The ExpandLibcallResult result was a bitcast and not the direct call
result, so we couldn't find the chain. Use the new separate chain
return value instead.
2023-06-30 09:03:45 -04:00
Matt Arsenault
b69b6b8399 DAG: Return the chain from ExpandLibCall
If the libcall expansion requires use of the inserted call's result
chain, it's unreliable to query it from the main result. The call
lowering may have added additional casts or other obscuring operations
we don't want to parse through.
2023-06-30 09:03:40 -04:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
Anna Thomas
26bfbec5d2 [Intrinsic] Introduce reduction intrinsics for minimum/maximum
This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics (for NaN and signed zero) as
llvm.minimum and llvm.maximum.

Reviewed-By: nikic

Differential Revision: https://reviews.llvm.org/D152370
2023-06-13 12:29:58 -04:00
Anna Thomas
b2195bc771 [SelectionDAG][AArch64] Legalize FMAXIMUM/FMINIMUM
The missing legalization in SelectionDAG was identified when adding the
intrinsic support for vector reduction for maximum/minimum (D152370).

Fixes part of PR: https://github.com/llvm/llvm-project/issues/63267

Differential Revision: https://reviews.llvm.org/D152718
2023-06-12 12:22:21 -04:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Serge Pavlov
eecaeb6f10 [FPEnv] Intrinsics for access to FP environment
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read floating-point environment, set it or reset to
some default state. They do the same actions as C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.

The new intrinsics specify FP environment as a value of integer type, it
is convenient of most targets where the FP state is a content of some
register. Some targets however use long representations. On X86 the size
of FP environment is 256 bits, and even half of this size is not a legal
ibteger type. To facilitate legalization in such cases, two sets of DAG
nodes is used. Nodes GET_FPENV and SET_FPENV are used when FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when
target has long representation for floationg-point state.

Differential Revision: https://reviews.llvm.org/D71742
2023-06-05 13:10:01 +07:00
Simon Pilgrim
04e809ab90 [DAG] Add TargetLowering::expandABD and convert X86 lowering to use it
Scalar widening cases are still custom lowered in the X86 backend - we still need to add promotion/legalization support to handle these
2023-05-05 15:13:23 +01:00
NAKAMURA Takumi
c1221251fb Restore CodeGen/MachineValueType.h from Support
This is rework of;

  - rG13e77db2df94 (r328395; MVT)

Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.

Depends on D148767

Differential Revision: https://reviews.llvm.org/D149024
2023-05-03 00:13:20 +09:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Kazu Hirata
5f48b861f8 [SelectionDAG] Use isOneConstant (NFC) 2023-03-23 19:26:42 -07:00
Jay Foad
c5085c91cc [CodeGen] Trivial simplification of some getRegisterType calls. NFC. 2023-02-14 16:31:46 +00:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00