llvm-project

Author	SHA1	Message	Date
David Majnemer	9eff001d3d	[TargetLowering] Correctly yield NaN from FP_TO_BF16 We didn't set the exponent field, resulting in tiny numbers instead of NaNs.	2024-02-21 22:17:02 +00:00
David Majnemer	ddc0f1d8fe	[TargetLowering] Actually add the adjustment to the significand The logic was supposed to be choosing between {0, 1, -1} as an adjustment to the FP bit pattern. However, the adjustment itself was used as the bit pattern instead, which resulted in garbage results.	2024-02-21 19:34:11 +00:00
David Majnemer	cc13f3ba45	Correctly round FP -> BF16 when SDAG expands such nodes (#82399 ) We did something pretty naive: - round FP64 -> BF16 by first rounding to FP32 - skip FP32 -> BF16 rounding entirely - taking the top 16 bits of a FP32 which will turn some NaNs into infinities Let's do this in a more principled way by rounding types with more precision than FP32 to FP32 using round-inexact-to-odd which will negate double rounding issues.	2024-02-21 12:37:02 -05:00
Luke Lau	2cd59bdc89	[RISCV] Add test case for miscompile in gather -> strided load combine. NFC This shows the issue in #82430, but triggers it via the widening SEW combine rather than a GEP that RISCVGatherScatterLowering doesn't detect.	2024-02-22 00:30:38 +08:00
Jonas Paulsson	9c0e45d7f0	[SystemZ] Use VT (not ArgVT) for SlotVT in LowerCall(). (#82475 ) When an integer argument is promoted and not split (like i72 -> i128 on a new machine with vector support), the SlotVT should be i128, which is stored in VT - not ArgVT. Fixes #81417	2024-02-21 16:26:16 +01:00
Dinar Temirbulatov	5a023f564f	[AArch64][SVE2] Enable dynamic shuffle for fixed length types. (#72490 ) When SVE register size is unknown or the minimal size is not equal to the maximum size then we could determine the actual SVE register size in the runtime and adjust shuffle mask in the runtime.	2024-02-21 14:59:47 +00:00
Momchil Velikov	1a7166833d	[AArch64] Fix stack probing clobbering flags (#81879 ) Certain stack probing sequences might clobber flags, then we can't use a block as a prologue if the flags register is a live-in on entry to that block.	2024-02-21 13:58:04 +00:00
Simon Pilgrim	b8c9b06134	[X86] LowerCTPOP - add i3 and i4 LUT 'shift+mask' expansions Use the 3 or 4 active bits as a shift amount into a i32/i64 constant representing the number of set bits. In future, it might be worthwhile to move this into a generic location in case other targets want to make use of them. Another expansion pulled from #79823	2024-02-21 13:53:47 +00:00
Simon Pilgrim	98a07f72ee	[X86] LowerCTPOP - "ctpop(i2 x) --> sub(x, (x >> 1))" If we only have 2 active bits then we can avoid the i8 CTPOP multiply expansion entirely Another expansion pulled from #79823	2024-02-21 13:53:47 +00:00
chuongg3	0fb3d4296f	[AArch64][GlobalISel] Refactor BITCAST Legalization (#80505 ) Ensure BITCAST is only legal for types with the same amount of bits. Enable BITCAST to work with non-legal vector types as well.	2024-02-21 13:24:45 +00:00
hev	dd3e0a4643	[LoongArch] Assume no-op addrspacecasts by default (#82332 ) This PR indicates that `addrspacecasts` are always no-ops on LoongArch. Fixes #82330	2024-02-21 21:15:17 +08:00
Chia	c50ca3daa4	[RISCV][ISel] Combine vector fadd/fsub/fmul with fp extend. (#81248 ) Extend D133739 and #76785 to support vector widening floating-point add/sub/mul instructions. Specifically, this patch works for the below optimization case: ### Source code ``` define void @vfwmul_v2f32_multiple_users(ptr %x, ptr %y, ptr %z, <2 x float> %a, <2 x float> %b, <2 x float> %b2) { %c = fpext <2 x float> %a to <2 x double> %d = fpext <2 x float> %b to <2 x double> %d2 = fpext <2 x float> %b2 to <2 x double> %e = fmul <2 x double> %c, %d %f = fadd <2 x double> %c, %d2 %g = fsub <2 x double> %d, %d2 store <2 x double> %e, ptr %x store <2 x double> %f, ptr %y store <2 x double> %g, ptr %z ret void } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/aaEMs5s9h) ``` vfwmul_v2f32_multiple_users: vsetivli zero, 2, e32, mf2, ta, ma vfwcvt.f.f.v v11, v8 vfwcvt.f.f.v v8, v9 vfwcvt.f.f.v v9, v10 vsetvli zero, zero, e64, m1, ta, ma vfmul.vv v10, v11, v8 vfadd.vv v11, v11, v9 vfsub.vv v8, v8, v9 vse64.v v10, (a0) vse64.v v11, (a1) vse64.v v8, (a2) ret ``` ### After this patch ``` vfwmul_v2f32_multiple_users: vsetivli zero, 2, e32, mf2, ta, ma vfwmul.vv v11, v8, v9 vfwadd.vv v12, v8, v10 vfwsub.vv v8, v9, v10 vse64.v v11, (a0) vse64.v v12, (a1) vse64.v v8, (a2) ```	2024-02-21 22:06:40 +09:00
Simon Pilgrim	3cb4f62de0	[X86] Regenerate vector tests to add missing avx512 constant broadcast comments	2024-02-21 10:46:12 +00:00
Simon Pilgrim	bdeb3d47d1	[X86] Regenerate saddsat/ssubsat vector tests Adds missing avx512 constant broadcast comments	2024-02-21 10:46:12 +00:00
Nick Anderson	5db49f7266	[GlobalISel] replace right identity X * -1.0 with fneg(x) (#80526 ) follow up patch to #78673 @Pierre-vh @jayfoad @arsenm Could you review when you have a chance.	2024-02-21 09:41:59 +00:00
Tuan Chuong Goh	1ff1e82383	[AArch64][GlobalISel] Pre-Commit Tests for Refactor BITCAST	2024-02-21 09:17:05 +00:00
Yingwei Zheng	02fad0565f	[RISCV][SDAG] Fold `select c, ~x, x` into `xor -c, x` (#82462 ) This patch lowers select of constants if `TrueV == ~FalseV`. Address the comment in https://github.com/llvm/llvm-project/pull/82456#discussion_r1496881603.	2024-02-21 16:27:43 +08:00
Owen Anderson	44b717df4d	[GlobalISel] Clamp out-of-range G_EXTRACT_VECTOR_ELT constant indices when converting them into loads. (#82460 ) This avoids turning a poison value into a segfault, and fixes https://github.com/llvm/llvm-project/issues/78383	2024-02-21 00:42:22 -05:00
Sameer Sahasrabuddhe	a2afcd5721	Revert "Implement convergence control in MIR using SelectionDAG (#71785 )" This reverts commit 79889734b940356ab3381423c93ae06f22e772c9. Encountered multiple buildbot failures.	2024-02-21 11:07:02 +05:30
Wang Pengcheng	b8ed69ecc0	[RISCV] Support llvm.readsteadycounter intrinsic This intrinsic was introduced by #81331, which is a lot like `llvm.readcyclecounter`. For the RISCV implementation, we rename `ReadCycleWide` pseudo to `ReadCounterWide` and make it accept two operands (the low and high parts of the counter). As for legalization and lowering parts, we reuse the code of `ISD::READCYCLECOUNTER` (make it able to handle both intrinsics), and we use `time` CSR for `ISD::READSTEADYCOUNTER`. Tests using Clang builtins were run on real hardware and work as expected. Reviewers: asb, MaskRay, dtcxzyw, preames, topperc, jhuber6 Reviewed By: jhuber6, asb, MaskRay, dtcxzyw Pull Request: https://github.com/llvm/llvm-project/pull/82322	2024-02-21 13:12:14 +08:00
Sameer Sahasrabuddhe	79889734b9	Implement convergence control in MIR using SelectionDAG (#71785 ) LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-02-21 10:06:37 +05:30
Michal Paszkowski	03203b79c6	[SPIR-V] Fix vloadn OpenCL builtin lowering (#81148 ) This pull request fixes an issue with missing vector element count immediate in OpExtInst calls and adds a case for generating bitcasts before GEPs for kernel arguments of non-matching pointer type. The new LITs are based on basic/vload_local and basic/vload_global OpenCL CTS tests. The tests after this change pass SPIR-V validation.	2024-02-20 20:04:04 -08:00
Owen Anderson	c02b0d008c	[GlobalISel] Make sure to check for load barriers when merging G_EXTRACT_VECTOR_ELT into G_LOAD. (#82306 ) Fixes https://github.com/llvm/llvm-project/issues/78477	2024-02-20 22:42:14 -05:00
Sumanth Gundapaneni	1219214a3b	[Hexagon] Update InstrInfo to include LD/ST offsets of vector instructions (#82386 ) The hook HexagonInstrInfo::isValidOffset() is updated to evaluate offsets of missed LD/ST vector instructions.	2024-02-20 15:29:05 -06:00
Valery Pykhtin	807ed697be	[AMDGPU] Use autogenerated test checks for sdwa-preserve.mir test. NFC. (#82380 )	2024-02-20 20:05:44 +01:00
Yuta Saito	ba3c1f9ce3	[WebAssembly] Add segment RETAIN flag to support private retained data (#81539 ) In WebAssembly, we have `WASM_SYMBOL_NO_STRIP` symbol flag to mark the referenced content as retained. However, the flag is not enough to express retained data that is not referenced by any symbol. This patch adds a new segment flag`WASM_SEG_FLAG_RETAIN` to support "private" linkage data that is retained by llvm.used. This kind of data that is not referenced but must be retained is usually used with encapsulation symbols (__start/__stop). Swift runtime uses this technique and depends on the fact "all metadata sections in live objects are retained", which was not guaranteed with `--gc-sections` before this patch. This is a revised version of https://reviews.llvm.org/D126950 (has been reverted) based on @MaskRay's comments	2024-02-21 03:35:36 +09:00
Caroline Concatto	48af281f7a	Revert "[AArch64] Restore Z-registers before P-registers (#79623 )" This reverts commit 3f0404aae7ed2f7138526e1bcd100a60dfe08227. std::reverse is breaking some builds	2024-02-20 18:13:33 +00:00
Caroline Concatto	7af70643ca	Revert "[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326 )" Patch 3f0404aae7ed2 is breaking some debug builds so we cannot use the reverse here. This reverts commit 493f10106f7f1799eb67be95058b251e6a3bf0af.	2024-02-20 18:13:33 +00:00
Simon Pilgrim	066773c411	[X86] computeKnownBitsForTargetNode - add generic handling of PSHUFB When PSHUFB is used as a LUT (for CTPOP, BITREVERSE etc.), it's the source operand that is constant and the index operand the variable. As long as the indices don't set the MSB (which zeros the output element), then the common known bits from the source operand can be used directly, even though the shuffle mask isn't constant. Further helps to improve CTPOP reduction codegen	2024-02-20 17:14:49 +00:00
Simon Pilgrim	2f1e33df32	[X86] Fold add(psadbw(X,0),psadbw(Y,0)) -> psadbw(add(X,Y),0) If the vXi8 add(X,Y) is guaranteed not to overflow then we can push the addition through the psadbw nodes (being used for reduction) and only need a single psadbw node. Noticed while working on CTPOP reduction codegen	2024-02-20 15:58:29 +00:00
Sander de Smalen	493f10106f	[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326 ) This patch removes the `-reverse-csr-restore-seq` option from AArch64FrameLowering, since this is no longer used.	2024-02-20 15:08:06 +00:00
stephenpeckham	26db845536	[XCOFF] Support the subtype flag in DWARF section headers (#81667 ) The section headers for XCOFF files have a subtype flag for Dwarf sections. This PR updates obj2yaml, yaml2obj, and llvm-readobj so that they recognize the subtype.	2024-02-20 08:42:12 -06:00
Shilei Tian	2ad43fa467	[AMDGPU] Fix operand types for `V_DOT2_F32_BF16` (#82044 )	2024-02-20 08:25:01 -05:00
Krasimir Georgiev	49a8fc0da4	Revert "[Hexagon] Optimize post-increment load and stores in loops. (#82011 )" This reverts commit 0e6a48c3e8cc53f9eb5945ec04f8e03f6d2bae37. Temporary revert as it causes bad codegen: https://github.com/llvm/llvm-project/pull/82011#issuecomment-1951426107	2024-02-20 12:15:23 +00:00
David Green	1b12974ccb	[AArch64][AMDGPU][GlobalISel] Remove vector handling from unmerge_dead_to_trunc (#82224 ) This combine transforms an unmerge where only the first element is used into a truncate. That works OK for scalar but for vector needs to insert a bitcast to integers, perform the truncate then bitcast back to vectors. This generates more awkward code than using an Unmerge.	2024-02-20 10:54:44 +00:00
Thorsten Schütt	63a4b4f610	[GlobalIsel] Combine logic of floating point compares (#81886 ) It is purely based on symmetry. Registers can be scalars, vectors, and non-constants. X < 5.0 || X > 5.0 -> X != 5.0 X < Y && X > Y -> FCMP_FALSE X < Y && X < Y -> FCMP_TRUE see InstCombinerImpl::foldLogicOfFCmps	2024-02-20 09:56:33 +01:00
Yeting Kuo	61ae7e4982	[RISCV] Select pattern (shl (sext_vl/zext_vl), 1) to VWADD/VWADDU. (#82225 ) Previously, we already had similar selection pattern for (shl (ext)) and (shl_vl (ext_vl)).	2024-02-20 09:23:31 +08:00
Michael Maitland	44a46a0b68	[RISCV][GISEL] Add IRTranslation for insertelement with scalable vector type (#80377 ) This patch is stacked on #80372, #80307, and #80306.	2024-02-19 15:30:48 -05:00
Vyacheslav Levytskyy	66ebda46fc	Add support for the SPIR-V extension SPV_KHR_uniform_group_instructions (#82064 ) This PR is to add support for the SPIR-V extension SPV_KHR_uniform_group_instructions that adds new instructions to SPIR-V to support additional group operations within uniform control flow.	2024-02-19 21:30:31 +01:00
Craig Topper	f8cbb67b10	[DAGCombiner] Preserve nneg flag from inner zext when we combine (z/s/aext (zext X)) (#82199 )	2024-02-19 12:21:17 -08:00
Vyacheslav Levytskyy	8e8f9c0bc0	fix generation of unnecessary OpExecutionMode records (#81839 ) SPIR-V Backend generates unnecessary OpExecutionMode records, putting into the id's which are not the Entry Point operands of an OpEntryPoint (ref: https://github.com/llvm/llvm-project/issues/81753). This PR is to fix the issue.	2024-02-19 20:51:32 +01:00
Craig Topper	2426055a64	[RISCV] Add more zext nneg tests. NFC This adds additional tests for #82199. These tests need us to propagate the nneg flag when we zero/sign extend an existing zext nneg node. For these tests on RV64, call lowering will need to sign extend or zero extend the existing zext nneg to i64. getNode will fold this into a single zext. We should propagate the nneg flag from the original zext nneg. This will allow us to remove the zext nneg based on known sign bits during DAG combine.	2024-02-19 11:09:43 -08:00
Craig Topper	f668a08e00	[DAGCombiner][RISCV] Optimize (zext nneg (truncate X)) if X has known sign bits. (#82227 ) This treats the zext nneg as sext if X is known to have sufficient sign bits to allow the zext or truncate or both to be removed. This code is taken from the same optimization for sext.	2024-02-19 10:45:11 -08:00
Craig Topper	b1849a2c6b	[RISCV] Add test cases for missed opportunities to treat a zext nneg as sext. NFC These tests have a dominating icmp that requires an i16 value to be sign extended to do the compare. Because of this, the i16 will be exported from the first basic block sign extended to XLen. We can use this fact to remove the zext nneg in the second block.	2024-02-19 10:27:32 -08:00
Simon Pilgrim	5bd374df3e	[X86] psadbw.ll - add AVX2 target test coverage	2024-02-19 17:04:07 +00:00
Simon Pilgrim	8d8bb35ac3	[X86] Add some basic test coverage for #81765 Test cases demonstrating poor value tracking of PSADBW results	2024-02-19 15:20:09 +00:00
CarolineConcatto	3f0404aae7	[AArch64] Restore Z-registers before P-registers (#79623 ) This is needed by PR#77665[1] that uses a P-register while restoring Z-registers. The reverse for SVE register restore in the epilogue was added to guarantee performance, but further work was done to improve sve frame restore and besides that the schedule also may change the order of the restore, undoing the reverse restore. [1]https://github.com/llvm/llvm-project/pull/77665	2024-02-19 13:39:24 +00:00
Vyacheslav Levytskyy	925768eeab	Add support for atomic instruction on floating-point numbers (#81683 ) This PR adds support for atomic instruction on floating-point numbers: * SPV_EXT_shader_atomic_float_add * SPV_EXT_shader_atomic_float_min_max * SPV_EXT_shader_atomic_float16_add and fixes asm printer output for half floating-type.	2024-02-19 12:12:09 +01:00
Momchil Velikov	658e4763a2	[AArch64] Fix wrong condition in `canUseAsPrologue` (#81878 ) Inline stack probing code may need a scratch register, hence basic blocks where such register is not available cannot be used as prologues. Checking for an available scratch register was incorrectly skipped when the function uses stack probing.	2024-02-19 10:40:21 +00:00
Tim Northover	0215d2c58b	arm64_32: extend @llvm.stackguard call to in-DAG 64-bits before handing off Pointers are 64-bits in the DAG, so we need to extend the result of loading the cookie when building the DAG.	2024-02-19 10:32:29 +00:00
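
Note on commits 98a07f72ee and b8c9b06134 (X86 LowerCTPOP): the two expansions described there can be checked with a small model. The sketch below is plain Python, not LLVM code. The helper names (`popcount`, `build_lut`, `ctpop_i2`, `ctpop_i3`, `ctpop_i4`) and the entry widths (2 bits per entry for the 3-bit case, 4 bits per entry for the 4-bit case) are assumptions chosen for illustration; the commit messages only say that the active bits are used as a shift amount into an i32/i64 constant.

```python
def popcount(x: int) -> int:
    """Reference population count for a non-negative integer."""
    return bin(x).count("1")


def ctpop_i2(x: int) -> int:
    """ctpop for a value with only 2 active bits: sub(x, x >> 1)."""
    return x - (x >> 1)


def build_lut(num_bits: int, entry_width: int) -> int:
    """Pack popcount(v) for every num_bits-wide v into one integer constant.

    Entry v occupies bits [v * entry_width, (v + 1) * entry_width).
    """
    lut = 0
    for v in range(1 << num_bits):
        lut |= popcount(v) << (v * entry_width)
    return lut


# Hypothetical table layouts: 8 entries x 2 bits and 16 entries x 4 bits.
LUT3 = build_lut(3, 2)
LUT4 = build_lut(4, 4)


def ctpop_i3(x: int) -> int:
    """Shift the table right by the entry offset, then mask one entry."""
    return (LUT3 >> (x * 2)) & 0x3


def ctpop_i4(x: int) -> int:
    """Same idea with 4-bit entries; counts 0..4 fit in the low 3 bits."""
    return (LUT4 >> (x * 4)) & 0x7
```

With these layouts the 3-bit table fits in 16 bits and the 4-bit table needs all 64 bits, which matches the i32/i64 constants the commit message mentions.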

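Note on commits cc13f3ba45 and 9eff001d3d (FP -> BF16): the failure mode "taking the top 16 bits of a FP32 will turn some NaNs into infinities" is easy to reproduce on raw bit patterns. The sketch below is plain Python and shows a commonly used round-to-nearest-even conversion with explicit NaN handling. It is an illustration under stated assumptions, not the SelectionDAG expansion from the patch, and it does not model the round-inexact-to-odd step used for wider types. All function names are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x after conversion to IEEE-754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def truncate_to_bf16(bits: int) -> int:
    """Naive conversion: keep only the top 16 bits of the FP32 pattern."""
    return (bits >> 16) & 0xFFFF


def round_to_bf16(bits: int) -> int:
    """Round an FP32 bit pattern to BF16 using round-to-nearest-even."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        # NaN: keep sign and exponent, and set the top mantissa bit so the
        # result stays a NaN even if the payload lived in the low 16 bits.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF
```

For example, the FP32 NaN `0x7F800001` truncates to `0x7F80`, which is the BF16 encoding of +infinity, while `round_to_bf16` returns `0x7FC0`, a quiet NaN.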

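Note on commit 02fad0565f (RISCV fold of `select c, ~x, x` into `xor -c, x`): the identity holds because `-c` is all zeros when `c` is 0 and all ones when `c` is 1, and XOR with all ones is bitwise NOT. The sketch below is plain Python modelling 64-bit registers. The width, the mask, and the function names are assumptions for illustration only.

```python
MASK64 = (1 << 64) - 1  # assumed register width for this model


def select_not(c: int, x: int) -> int:
    """Reference form: select c, ~x, x with c restricted to 0 or 1."""
    return (~x & MASK64) if c else x


def xor_neg(c: int, x: int) -> int:
    """Folded form: xor -c, x, with -c wrapped to 64 bits."""
    return ((-c) & MASK64) ^ x
```

Both functions agree for every 64-bit `x` when `c` is 0 or 1; the folded form needs no branch or conditional move.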
52796 Commits