35 Commits

Author SHA1 Message Date
Benjamin Kramer
55c466da2f [X86][AVX512BF16] Add a few missing insert/extract patterns
These are really the same as the f16 (and i16) instructions, but we need
them for any type that can occur.
2024-03-06 00:52:29 +01:00
Phoebe Wang
ff72c83b01
[X86] Add missing subvector_subreg_lowering for BF16 (#83720)
Fixes: #83358
2024-03-04 10:15:43 +08:00
Simon Pilgrim
a2a0089ac3
[X86] movsd/movss/movd/movq - add support for constant comments (#78601)
If we're loading a constant value, print the constant (and the zero upper elements) instead of just the shuffle mask.

This did require me to move the shuffle mask handling into addConstantComments as we can't handle this in the MC layer.
2024-01-19 14:21:26 +00:00
Phoebe Wang
9745c13ca8
[X86][BF16] Improve float -> bfloat lowering under AVX512BF16 and AVXNECONVERT (#78042) 2024-01-17 10:09:26 +08:00
Phoebe Wang
59af659ee3
[X86][BF16] Try to use f16 for lowering (#76901)
This patch fixes BF16 32-bit ABI problem:
https://godbolt.org/z/6dMnh8jGG
2024-01-05 15:25:18 +08:00
Phoebe Wang
176c341198 [X86][BF16] Add 32-bit tests to show ABI problem, NFC 2024-01-04 15:43:34 +08:00
Phoebe Wang
a384cd5012
[X86][BF16] Add subvec_zero_lowering patterns (#76507) 2023-12-31 11:14:41 +08:00
Phoebe Wang
e499ae53b3
[X86][BF16] Support INSERT_SUBVECTOR and CONCAT_VECTORS (#76485) 2023-12-28 13:29:01 +08:00
Phoebe Wang
3081bacb60
[X86][BF16] Add X86SubVBroadcastld patterns (#76479) 2023-12-28 10:08:27 +08:00
Simon Pilgrim
11276563c8 [X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)
If we are loading the same ptr at different vector widths, then reuse the largest load and just extract the low subvector.

Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.

This is mainly useful for better constant sharing.
2023-11-27 10:26:26 +00:00
Simon Pilgrim
381efa4960 Revert rG67275263b3b781a "[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)"
Missed an issue that we were calling continue from within the for loop - fixed version incoming shortly.
2023-11-23 16:50:58 +00:00
Simon Pilgrim
67275263b3
[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)
If we are loading the same ptr at different vector widths, then reuse the larger load and just extract the low subvector.

Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.

This is mainly useful for better constant sharing.
2023-11-23 14:10:23 +00:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Phoebe Wang
b667e9c23d [X86][BF16] Lower FP_ROUND for vector types under AVX512BF16
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D158952
2023-08-29 13:48:13 +08:00
Phoebe Wang
30ec9473c6 [X86][BF16] Add test coverage for AVX-NE-CONVERT
Split from D158952.
2023-08-29 09:08:01 +08:00
Phoebe Wang
6688701497 [X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16
Fixes #64460

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D158950
2023-08-28 21:27:10 +08:00
Phoebe Wang
23e2a82446 Revert "[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16"
This reverts commit 4ae7ed6e19bab0d62c0f936bd6f555103cc3b197.

Sorry, missing the test update.
2023-08-28 21:06:22 +08:00
Phoebe Wang
4ae7ed6e19 [X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16
Fixes #64460

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D158950
2023-08-28 20:52:06 +08:00
Phoebe Wang
3753ea8311 Revert "[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16"
This reverts commit 915139fc73fd34223caaec0c3b3525ad79540ec0.

The constant value is 16 rather than 8. Revert it and then reland.
2023-08-28 20:48:16 +08:00
Phoebe Wang
915139fc73 [X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16
Fixes #64460

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D158950
2023-08-28 09:47:12 +08:00
Phoebe Wang
04527f1d32 [X86][BF16] Customize INSERT_VECTOR_ELT for bf16 when feature BF16 is on
Fixes root cause of #63017.
The reason is similar to BUILD_VECTOR. We have a legal vector type but
still soft-promote the scalar type, so we need to customize these
scalar-to-vector nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155961
2023-07-22 20:26:34 +08:00
Phoebe Wang
f11526b091 [X86][BF16] Do not scalarize masked load for BF16 when we have AVX512BF16
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155952
2023-07-22 18:16:49 +08:00
Phoebe Wang
fbae3d1d3c Revert "[X86][BF16] Do not scalarize masked load for BF16 when we have BWI"
This reverts commit ca1c05208ed35ba72869c65ad773b2cca4bbd360.

It caused a Buildbot failure: https://lab.llvm.org/buildbot#builders/220/builds/24870
2023-07-21 23:29:11 +08:00
Phoebe Wang
ca1c05208e [X86][BF16] Do not scalarize masked load for BF16 when we have BWI
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155952
2023-07-21 23:18:54 +08:00
Phoebe Wang
c778ca201e [X86][BF16] Split vNbf16 vectors according to vNf16
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D151778
2023-06-09 09:04:56 +08:00
Phoebe Wang
7634905a73 [X86][BF16] Share FP16 vector ABI with BF16
The ABI of BF16 is identical to FP16 rather than i16.

Fixes #62997

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D151710
2023-06-09 09:04:56 +08:00
Phoebe Wang
bc1819389f [X86][RFC] Using __bf16 for AVX512_BF16 intrinsics
This is an alternative to D120395 and D120411.

Previously we used `__bfloat16` as a typedef of `unsigned short`. The
name may give users the impression that it is a brand new type to
represent BF16, so they may use it in arithmetic operations, and we
don't have a good way to block that.

To solve the problem, we introduced `__bf16` to the X86 psABI and landed
the support in Clang by D130964. Now we can solve the problem by
switching intrinsics to the new type.

Reviewed By: LuoYuanke, RKSimon

Differential Revision: https://reviews.llvm.org/D132329
2022-10-19 23:47:04 +08:00
Matthias Braun
189900eb14 X86: Stop assigning register costs for longer encodings.
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encodings require a REX
prefix when using them, resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occur when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.

I did extensive measurements with the llvm-test-suite with SPEC2006 +
SPEC2017 included; internal services showed similar patterns. Generally
there are a lot of improvements but also a lot of regressions. But on
average the allocation quality seems to improve at a small code size
regression.

Results for measuring static and dynamic instruction counts:

Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
    Spills+FoldedSpills   -5.6%
    Reloads+FoldedReloads -4.2%
    Copies                -0.1%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -1.6%, geomean -2.8%
    regalloc.NumReloads   mean -1.7%, geomean -3.1%
    size..text            mean +0.4%, geomean +0.4%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -2.2%, geomean -3.1%
    regalloc.NumReloads   mean -2.6%, geomean -3.9%
    size..text            mean +0.6%, geomean +0.6%

Static / LLVM Statistics:
    regalloc.NumSpills   mean -3.0%
    regalloc.NumReloads  mean -3.3%
    size..text           mean +0.3%, geomean +0.3%

Differential Revision: https://reviews.llvm.org/D133902
2022-09-30 16:01:33 -07:00
Benjamin Kramer
c349d7f4ff [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings; it would be nice if we
only had one of them.

Based on @pengfei's D131502

Differential Revision: https://reviews.llvm.org/D133207
2022-09-06 11:54:34 +02:00
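To illustrate the "intermediate rounding steps" point in c349d7f4ff, here is a minimal Python sketch that emulates bf16 and compares rounding to bf16 after every addition with keeping f32 intermediates and rounding only once at the end. The helper names (`f32_to_bf16_bits`, `bf16_bits_to_f32`, `round_bf16`, `to_f32`) are hypothetical, round-to-nearest-even is an assumption, and this only shows the numerical effect; it is not the SelectionDAG code.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Round an f32-representable value to bf16 bits (round-to-nearest-even assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:      # NaN: keep it a quiet NaN
        return (bits >> 16) | 0x0040
    lsb = (bits >> 16) & 1                     # lowest bit that survives
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen bf16 bits to f32: the bf16 pattern is the top 16 bits of the f32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def round_bf16(x: float) -> float:
    """Round a value to the nearest bf16 and return it as a float."""
    return bf16_bits_to_f32(f32_to_bf16_bits(x))

def to_f32(x: float) -> float:
    """Emulate f32 arithmetic by rounding a Python float (double) to f32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a, b, c = 256.0, 1.0, 1.0

# Round to bf16 after every operation (intermediate rounding preserved).
stepwise = round_bf16(round_bf16(a + b) + c)

# Keep f32 intermediates and round to bf16 only once at the end.
fused = round_bf16(to_f32(to_f32(a + b) + c))

print(stepwise, fused)  # 256.0 258.0
```

The two results differ because bf16 has a spacing of 2 near 256: 257 is a tie that rounds to the even neighbour 256 at each step, whereas the unrounded f32 sum 258 is exactly representable.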
Phoebe Wang
c7ec6e19d5 [X86][BF16] Make backend type bf16 to follow the psABI
The X86 psABI has been updated to support the __bf16 type, the ABI of
which is the same as FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149

This is an alternative to D129858, which has less code modification and
supports the vector type as well.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D130832
2022-08-10 08:58:56 +08:00
Benjamin Kramer
8c4a07c61f [DAGCombiner] Fold (fp_to_bf16 (bf16_to_fp op)) -> op 2022-06-15 19:54:39 +02:00
Benjamin Kramer
ca50cb120b [SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP. 2022-06-15 18:51:32 +02:00
Benjamin Kramer
8bc0bb9564 Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
2022-06-15 12:56:31 +02:00
Benjamin Kramer
fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
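The two casts described in fb34d531af can be sketched in Python as follows. Widening bf16 to f32 is pure integer arithmetic (a 16-bit left shift), while narrowing needs a rounding decision. The function names are hypothetical and the round-to-nearest-even narrowing is an assumption for illustration; it is not a copy of the `__truncsfbf2` implementation in compiler-rt.

```python
import struct

def bf16_bits_to_f32(b: int) -> float:
    """bf16 -> f32: place the 16 bf16 bits in the upper half of an f32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def f32_to_bf16_bits(x: float) -> int:
    """f32 -> bf16 with round-to-nearest-even (assumed rounding mode)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:      # NaN: truncate and force quiet bit
        return (bits >> 16) | 0x0040
    lsb = (bits >> 16) & 1                     # ties go to the even neighbour
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF

def is_bf16_nan(b: int) -> bool:
    """True if the bf16 bit pattern encodes a NaN."""
    return (b & 0x7FFF) > 0x7F80

# Every non-NaN bf16 value is exactly representable in f32, so widening and
# then narrowing returns the original bits.
round_trip_ok = all(
    f32_to_bf16_bits(bf16_bits_to_f32(b)) == b
    for b in range(0x10000)
    if not is_bf16_nan(b)
)
print(hex(f32_to_bf16_bits(1.0)), bf16_bits_to_f32(0x3F80), round_trip_ok)
```

The round-trip check also shows why a fold such as `(fp_to_bf16 (bf16_to_fp op)) -> op` is value-preserving for non-NaN inputs under these assumptions.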