llvm-project

Author	SHA1	Message	Date
Benjamin Kramer	55c466da2f	[X86][AVX512BF16] Add a few missing insert/extract patterns These are really the same as the f16 (and i16) instructions, but we need them for any type that can occur.	2024-03-06 00:52:29 +01:00
Phoebe Wang	ff72c83b01	[X86] Add missing subvector_subreg_lowering for BF16 (#83720) Fixes: #83358	2024-03-04 10:15:43 +08:00
Simon Pilgrim	a2a0089ac3	[X86] movsd/movss/movd/movq - add support for constant comments (#78601) If we're loading a constant value, print the constant (and the zero upper elements) instead of just the shuffle mask. This did require me to move the shuffle mask handling into addConstantComments as we can't handle this in the MC layer.	2024-01-19 14:21:26 +00:00
Phoebe Wang	9745c13ca8	[X86][BF16] Improve float -> bfloat lowering under AVX512BF16 and AVXNECONVERT (#78042)	2024-01-17 10:09:26 +08:00
Phoebe Wang	59af659ee3	[X86][BF16] Try to use `f16` for lowering (#76901) This patch fixes BF16 32-bit ABI problem: https://godbolt.org/z/6dMnh8jGG	2024-01-05 15:25:18 +08:00
Phoebe Wang	176c341198	[X86][BF16] Add 32-bit tests to show ABI problem, NFC	2024-01-04 15:43:34 +08:00
Phoebe Wang	a384cd5012	[X86][BF16] Add subvec_zero_lowering patterns (#76507)	2023-12-31 11:14:41 +08:00
Phoebe Wang	e499ae53b3	[X86][BF16] Support INSERT_SUBVECTOR and CONCAT_VECTORS (#76485)	2023-12-28 13:29:01 +08:00
Phoebe Wang	3081bacb60	[X86][BF16] Add X86SubVBroadcastld patterns (#76479)	2023-12-28 10:08:27 +08:00
Simon Pilgrim	11276563c8	[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126) If we are loading the same ptr at different vector widths, then reuse the largest load and just extract the low subvector. Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value. This is mainly useful for better constant sharing.	2023-11-27 10:26:26 +00:00
Simon Pilgrim	381efa4960	Revert rG67275263b3b781a "[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)" Missed an issue that we were calling continue from within the for loop - fixed version incoming shortly.	2023-11-23 16:50:58 +00:00
Simon Pilgrim	67275263b3	[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126) If we are loading the same ptr at different vector widths, then reuse the larger load and just extract the low subvector. Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value. This is mainly useful for better constant sharing.	2023-11-23 14:10:23 +00:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Phoebe Wang	b667e9c23d	[X86][BF16] Lower FP_ROUND for vector types under AVX512BF16 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D158952	2023-08-29 13:48:13 +08:00
Phoebe Wang	30ec9473c6	[X86][BF16] Add test coverage for AVX-NE-CONVERT Split from D158952.	2023-08-29 09:08:01 +08:00
Phoebe Wang	6688701497	[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16 Fixes #64460 Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D158950	2023-08-28 21:27:10 +08:00
Phoebe Wang	23e2a82446	Revert "[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16" This reverts commit 4ae7ed6e19bab0d62c0f936bd6f555103cc3b197. Sorry, missing the test update.	2023-08-28 21:06:22 +08:00
Phoebe Wang	4ae7ed6e19	[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16 Fixes #64460 Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D158950	2023-08-28 20:52:06 +08:00
Phoebe Wang	3753ea8311	Revert "[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16" This reverts commit 915139fc73fd34223caaec0c3b3525ad79540ec0. The constant value is 16 rather than 8. Revert it and then reland.	2023-08-28 20:48:16 +08:00
Phoebe Wang	915139fc73	[X86][BF16] Lower FP_EXTEND for vector types under AVX512BF16 Fixes #64460 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D158950	2023-08-28 09:47:12 +08:00
Phoebe Wang	04527f1d32	[X86][BF16] Customize INSERT_VECTOR_ELT for bf16 when feature BF16 is on Fixes root cause of #63017. The reason is similar to BUILD_VECTOR. We have legal vector type but still soft promote for scalar type. So we need to customize these scalar to vector nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155961	2023-07-22 20:26:34 +08:00
Phoebe Wang	f11526b091	[X86][BF16] Do not scalarize masked load for BF16 when we have AVX512BF16 Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155952	2023-07-22 18:16:49 +08:00
Phoebe Wang	fbae3d1d3c	Revert "[X86][BF16] Do not scalarize masked load for BF16 when we have BWI" This reverts commit ca1c05208ed35ba72869c65ad773b2cca4bbd360. It caused Buildbot fail: https://lab.llvm.org/buildbot#builders/220/builds/24870	2023-07-21 23:29:11 +08:00
Phoebe Wang	ca1c05208e	[X86][BF16] Do not scalarize masked load for BF16 when we have BWI Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155952	2023-07-21 23:18:54 +08:00
Phoebe Wang	c778ca201e	[X86][BF16] Split vNbf16 vectors according to vNf16 Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D151778	2023-06-09 09:04:56 +08:00
Phoebe Wang	7634905a73	[X86][BF16] Share FP16 vector ABI with BF16 The ABI of BF16 is identical to FP16 rather than i16. Fixes #62997 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D151710	2023-06-09 09:04:56 +08:00
Phoebe Wang	bc1819389f	[X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics This is an alternative of D120395 and D120411. Previously we use `__bfloat16` as a typedef of `unsigned short`. The name may give user an impression it is a brand new type to represent BF16. So that they may use it in arithmetic operations and we don't have a good way to block it. To solve the problem, we introduced `__bf16` to X86 psABI and landed the support in Clang by D130964. Now we can solve the problem by switching intrinsics to the new type. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D132329	2022-10-19 23:47:04 +08:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encodings require a REX prefix when using them, resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occur when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite with SPEC2006 + SPEC2017 included; internal services showed similar patterns. Generally there are a lot of improvements but also a lot of regressions. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Benjamin Kramer	c349d7f4ff	[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets. I didn't understand what the difference was between these softening approaches when I first added bfloat lowerings, would be nice if we only had one of them. Based on @pengfei's D131502 Differential Revision: https://reviews.llvm.org/D133207	2022-09-06 11:54:34 +02:00
Phoebe Wang	c7ec6e19d5	[X86][BF16] Make backend type bf16 to follow the psABI X86 psABI has updated to support __bf16 type, the ABI of which is the same as FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149 This is an alternative of D129858, which has less code modification and supports the vector type as well. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D130832	2022-08-10 08:58:56 +08:00
Benjamin Kramer	8c4a07c61f	[DAGCombiner] Fold (fp_to_bf16 (bf16_to_fp op)) -> op	2022-06-15 19:54:39 +02:00
Benjamin Kramer	ca50cb120b	[SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP.	2022-06-15 18:51:32 +02:00
Benjamin Kramer	8bc0bb9564	Add a conversion from double to bf16 This introduces a new compiler-rt function `__truncdfbf2`.	2022-06-15 12:56:31 +02:00
Benjamin Kramer	fb34d531af	Promote bf16 to f32 when the target doesn't support it This is modeled after the half-precision fp support. Two new nodes are introduced for casting from and to bf16. Since casting from bf16 is a simple operation I opted to always directly lower it to integer arithmetic. The other way round is more complicated if you want to preserve IEEE semantics, so it's handled by a new __truncsfbf2 compiler-rt builtin. This is of course very bare bones, but sufficient to get a semi-softened fadd on x86. Possible future improvements: - Targets with bf16 conversion instructions can now make fp_to_bf16 legal - The software conversion to bf16 can be replaced by a trivial implementation under fast math. Differential Revision: https://reviews.llvm.org/D126953	2022-06-15 12:56:31 +02:00

35 Commits
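
Commit fb34d531af describes casting from bf16 as a simple operation that can be lowered to integer arithmetic, while the cast to bf16 needs more care to preserve IEEE semantics. The sketch below illustrates that asymmetry in plain C++. It is not the LLVM lowering and not the compiler-rt `__truncsfbf2` implementation: the function names `bf16_to_float` and `float_to_bf16` are hypothetical, and round-to-nearest-even with NaN quieting is an assumed rounding policy chosen for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen bf16 to f32.
// bf16 is the upper 16 bits of an IEEE-754 binary32 value,
// so widening is a shift into the high half of a 32-bit word.
float bf16_to_float(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);  // bit-cast without aliasing issues
    return f;
}

// Hypothetical helper: narrow f32 to bf16.
// Assumes round-to-nearest-even; NaNs are kept as quiet NaNs.
std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);

    // NaN: exponent all ones and a non-zero mantissa.
    // Set a mantissa bit that survives truncation so the result stays a NaN.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Add 0x7FFF plus the lowest kept bit, then truncate:
    // ties round toward the value whose lowest kept bit is zero.
    std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + bias) >> 16);
}
```

Every bf16 value is exactly representable as a float, so under these assumptions `float_to_bf16(bf16_to_float(x))` returns `x` for any non-NaN `x`. That identity is what makes a fold of the shape named in commit 8c4a07c61f possible.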