llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	3b0f506c87	[AMDGPU] Support `nuw` and `nusw` in buffer fat pointer lowering (#115039 ) This commit uses the `nuw` flag on `getelementptr` to set the `nuw` flag on buffer offset additions, and also moves from `inbounds` to the looser `nusw` for the existing case.	2024-11-06 11:42:47 -06:00
Matt Arsenault	aa7941289e	AMDGPU: Fold copy of scalar add of frame index (#115058 ) This is a pre-optimization to avoid a regression in a future commit. Currently we almost always emit frame index with a v_mov_b32 and use vector adds for the pointer operations. We need to consider the users of the frame index (or rather, the transitive users of derived pointer operations) to know whether the value will be used in a vector or scalar context. This saves an sgpr->vgpr copy. This optimization could be more general for any opcode that's trivially convertible from a scalar to vector form (although this is a workaround for a proper regbankselect).	2024-11-06 09:10:58 -08:00
Craig Topper	5dc8d61177	[RISCV][GISel] Implement zexti32/zexti16 ComplexPatterns. (#115097 )	2024-11-06 08:48:43 -08:00
Matt Arsenault	efe87fbc9d	AMDGPU: Improve vector of pointer handling in amdgpu-promote-alloca (#114144 )	2024-11-06 08:47:15 -08:00
dlav-sc	83f92c33a4	[RISCV] fix SP recovery in varargs functions (#114316 ) This patch fixes sp recovery in the epilogue of varargs functions when the fp register is present and a second sp adjustment is applied. Source of the issue: https://github.com/llvm/llvm-project/pull/110809	2024-11-06 19:30:32 +03:00
Yingwei Zheng	f74aed7938	[DAGCombiner] Add basic support for `trunc nsw/nuw` (#113808 ) This patch adds basic support for `trunc nsw/nuw` in SDAG. It will allow DAGCombiner to further eliminate in-reg `zext/sext` instructions.	2024-11-07 00:23:53 +08:00
Sarah Spall	fb90733e19	[HLSL] implement elementwise firstbithigh hlsl builtin (#111082 ) Implements elementwise firstbithigh hlsl builtin. Implements firstbituhigh intrinsic for spirv and directx, which handles unsigned integers. Implements firstbitshigh intrinsic for spirv and directx, which handles signed integers. Fixes #113486 Closes #99115	2024-11-06 07:31:39 -08:00
yingopq	86e4beb702	[MIPS] LLVM data layout give i128 an alignment of 16 for mips64 (#112084 ) Fix parts of #102783.	2024-11-06 16:14:30 +01:00
Oliver Stannard	9b016e3cb2	[ARM] Add early-clobber to MVE VCMLA.f32 (#114995 ) This instruction (but not the f16 variant) cannot use the same register for the output as either of the inputs, so it needs to be marked as early-clobber.	2024-11-06 14:46:08 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Vyacheslav Levytskyy	5a062191f7	[SPIR-V] Ensure correct pointee types of some OpenCL Extended Instructions' pointer arguments (#114846 ) OpenCL Extended Instruction Set Specification defines relations between return/operand types and pointee type of pointer arguments in case of remquo, fract, frexp, lgamma_r, modf, sincos and prefetch instructions (https://registry.khronos.org/SPIR-V/specs/unified1/OpenCL.ExtendedInstructionSet.100.html). This PR ensures correct pointee types of those OpenCL Extended Instructions' pointer arguments.	2024-11-06 12:44:53 +01:00
hev	cab606c306	[LoongArch] Enable alias analysis by default (#114980 ) Enable use of alias analysis during code generation.	2024-11-06 19:30:57 +08:00
Benjamin Maxwell	ea6b8fa4b9	[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() (#114792 ) This merges the logic for expanding both FFREXP and FSINCOS into one method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).	2024-11-06 11:06:06 +00:00
Simon Pilgrim	c75353313e	[X86] combineConcatVectorOps - add 256-bit concat(shuffle(),shuffle()) handling Improve IsConcatFree detection to handle splat vector-loads (which can be folded as X86ISD::SUBV_BROADCAST_LOAD). Fixes #114959	2024-11-06 10:47:01 +00:00
Oliver Stannard	2d56de9e7e	Revert "[ARM] Add extra tests for CVE-2024-7883 with undef/poison" Reverting because this causes a test failure in the expensive-checks buildbot. This reverts commit ed9dab67e2932baf11bfa514b07b159c3bffd518.	2024-11-06 10:35:44 +00:00
Vyacheslav Levytskyy	ebfafa2511	[SPIR-V] Fix OpFunctionParameter vs. OpTypeFunction types for pointer arguments when there are functions with aggregate arguments (#115044 ) The goal of the PR is to ensure that if module contains functions with mutated signature (due to preprocessing of aggregate types), functions still are going through re-creating of function type to preserve pointee type information for arguments. This fixes a bug when a module with (1) a function having aggregate arguments and/or return, and (2) at least two functions with signatures different only wrt. pointee types is translated so that one of two similar functions gets an incorrect OpFunctionParameter type that is different from the corresponding OpTypeFunction definition. A reproducer is attached as a new test case.	2024-11-06 11:17:45 +01:00
David Green	3d4d033cea	[AArch64][Arm] Add nested double reduction tests. NFC	2024-11-06 10:08:14 +00:00
Simon Pilgrim	270bfb2f2a	[X86] Add test coverage for #114959	2024-11-06 09:44:10 +00:00
Simon Pilgrim	e29d092af8	[X86] getFauxShuffleMask - add ISD::SHL/SRL handling This is currently mostly the same as the VSHLI/VSRLI handling below, although I've kept them separate as I'm investigating adding non-uniform shift amount handling as a followup	2024-11-06 09:44:10 +00:00
Zhaoxin Yang	8c565de5ec	[LoongArch] Support llvm.lround intrinsics with i32 return type. (#114733 ) This is needed by flang, similar to RISCV-64 in https://reviews.llvm.org/D147195.	2024-11-06 17:34:13 +08:00
Oliver Stannard	ed9dab67e2	[ARM] Add extra tests for CVE-2024-7883 with undef/poison	2024-11-06 09:28:14 +00:00
Wang Pengcheng	37ce18951f	[RISCV] Add requirement of asserts We forgot to add `REQUIRES: asserts` here.	2024-11-06 17:01:24 +08:00
BoyaoWang430	69d0bab826	[RISCV] Add load/store clustering in post machine schedule (#111504 ) #73789 added load clustering and #73796 tried to add store clustering. If post machine schedule is used, the load/store clusters previously formed in machine schedule may be broken. To solve this, add load/store clustering to post machine schedule.	2024-11-06 16:21:30 +08:00
Gergely Futo	08411c855f	[RISCV] Correct fcopysign pattern for zdinx (#114954 ) Correcting the pattern fixes the following error: fatal error: error in backend: Cannot select: t17: f64 = fcopysign t5, t8	2024-11-06 09:10:37 +01:00
Pengcheng Wang	7a5b040e20	[RISCV] Add initial support of memcmp expansion There are two passes that have dependency on the implementation of `TargetTransformInfo::enableMemCmpExpansion` : `MergeICmps` and `ExpandMemCmp`. This PR adds the initial implementation of `enableMemCmpExpansion` so that we can have some basic benefits from these two passes. We don't enable expansion when there is no unaligned access support currently because there are some issues about unaligned loads and stores in `ExpandMemcmp` pass. We should fix these issues and enable the expansion later. Vector case hasn't been tested as we don't generate inlined vector instructions for memcmp currently. Reviewers: preames, arcbbb, topperc, asb, dtcxzyw Reviewed By: topperc, preames Pull Request: https://github.com/llvm/llvm-project/pull/107548	2024-11-06 15:44:12 +08:00
Pengcheng Wang	5adb5c05a2	[RISCV] Add tests for memcmp expansion We add tests for the following cases: * Length = 0, 1, 2, 3, 4, 5, 6, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, runtime. * Comparisons against zero. * RUN lines for scalar/vector w/ or w/o strict align. * Optimize for size. Reviewers: topperc, preames Reviewed By: topperc, preames Pull Request: https://github.com/llvm/llvm-project/pull/107824	2024-11-06 15:12:35 +08:00
Craig Topper	f4270045f4	[RISCV] Add Zfinx/Zdinx RUN lines to rv64d-double-convert-strict.ll and rv64f-float-convert-strict.ll. NFC	2024-11-05 21:48:38 -08:00
Heejin Ahn	492812f613	[WebAssembly] Fix rethrow's index calculation (#114693 ) So far we have assumed that we only rethrow the exception caught in the innermost EH pad. This is true in code we directly generate, but after inlining this may not be the case. For example, consider this code: ```ll ehcleanup: %0 = cleanuppad ... call @destructor cleanupret from %0 unwind label %catch.dispatch ``` If `destructor` gets inlined into this function, the code can be like ```ll ehcleanup: %0 = cleanuppad ... invoke @throwing_func to label %unreachale unwind label %catch.dispatch.i catch.dispatch.i: catchswitch ... [ label %catch.start.i ] catch.start.i: %1 = catchpad ... invoke @some_function to label %invoke.cont.i unwind label %terminate.i invoke.cont.i: catchret from %1 to label %destructor.exit destructor.exit: cleanupret from %0 unwind label %catch.dispatch ``` We lower a `cleanupret` into `rethrow`, which assumes it rethrows the exception caught by the nearest dominating EH pad. But after the inlining, the nearest dominating EH pad is not `ehcleanup` but `catch.start.i`. The problem exists in the same manner in the new (exnref) EH, because it assumes the exception comes from the nearest EH pad and saves an exnref from that EH pad and rethrows it (using `throw_ref`). This problem can be fixed easily if `cleanupret` has the basic block where its matching `cleanuppad` is. The bitcode instruction `cleanupret` kind of has that info (it has a token from the `cleanuppad`), but that info is lost when we enter ISel, because `TargetSelectionDAG.td`'s `cleanupret` node does not have any arguments: `5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)` Note that `catchret` already has two basic block arguments, even though neither of them means `catchpad`'s BB. This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node in ISel and uses it in the Wasm backend. 
Because this node is also used in X86 backend we need to note its argument there too but nothing more needs to change there as long as X86 doesn't need it. --- - Details about changes in the Wasm backend: After this PR, our pseudo `RETHROW` instruction takes a BB, which means the EH pad whose exception it needs to rethrow. There are currently two ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic and the other is from `CLEANUPRET` we discussed above. In case of `llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As written in the comments, this PR doesn't change how this BB is computed. The BB argument will be converted to an immediate argument as with other control flow instructions in CFGStackify. In case of `CLEANUPRET`, it already has a BB argument pointing to an EH pad, so it is just converted to a `RETHROW` with the same BB argument in LateEHPrepare. This will also be lowered to an immediate in CFGStackify with other control flow instructions. --- Fixes #114600.	2024-11-05 21:45:13 -08:00
Craig Topper	cbc7812565	[RISCV] Add Zdinx RUN line to rv64d-double-convert.ll. NFC We already have a Zfinx RUN line for rv64f-float-convert.ll.	2024-11-05 21:12:09 -08:00
WANG Rui	a165bbddf9	[LoongArch][NFC] Reland "Pre-commit tests for codegen with alias analysis"	2024-11-06 11:54:34 +08:00
WANG Rui	9ba0e5c27d	Revert "[LoongArch][NFC] Pre-commit tests for codegen with alias analysis" This reverts commit 445db93844cb50eeb6f587bef0749c2950b46e70.	2024-11-06 11:45:18 +08:00
Madhur Amilkanthwar	895a8e66c6	[AArch64][GISel] Support neon.abs intrinsic for vector types (#107226 ) This patch lowers the intrinsic to G_ABS and thus supports the intrinsic in GISel.	2024-11-06 08:31:46 +05:30
Luke Lau	3a26feb607	[RISCV] Lower fixed-length mgather/mscatter for zvfhmin/zvfbfmin (#114945 ) In preparation for allowing zvfhmin and zvfbfmin in isLegalElementTypeForRVV, this lowers fixed-length masked gathers and scatters. We need to mark f16 and bf16 as legal in isLegalMaskedGatherScatter, otherwise ScalarizeMaskedMemIntrin will just scalarize them, but we can move this back into isLegalElementTypeForRVV afterwards. The scalarized codegen required #114938, #114927 and #114915 to not crash.	2024-11-06 10:33:06 +08:00
Craig Topper	db21dbd12a	[RISCV][GISel] Add constant_fold_cast_op to RISCVPostLegalizerCombiner.	2024-11-05 17:48:54 -08:00
Jon Roelofs	4c3e1e3c4a	[llvm][AsmPrinter] Add an option to print instruction latencies (#113243 ) ... matching what we have in the disassembler. This isn't turned on by default since several of the scheduling models are not completely accurate, and we don't want to be misleading.	2024-11-05 17:28:52 -08:00
ZhaoQi	92be2cb086	[LoongArch] Use LSX for scalar FP rounding with explicit rounding mode (#114766 ) The LoongArch FP base ISA only has the frint.{s/d} instruction, which reads the global rounding mode. Use LSX to implement explicit rounding modes for scalar ceil/floor/trunc/roundeven calls when -mlsx is enabled. This is faster than calling the libm library functions, and is the same as what gcc did: https://gcc.gnu.org/pipermail/gcc-cvs/2023-November/394218.html	2024-11-06 09:26:28 +08:00
Philip Reames	a905203b9e	[RISCV] Prefer strided load for interleave load with only one lane active (#115069 ) If only one of the elements is actually used, then we can legally use a strided load in place of the segment load. Doing so reduces vector register pressure, so if both segment and strided are believed to be element/segment at a time, then prefer the strided load variant. Note that I've seen the vectorizer emitting wide interleave loads to represent a strided load, so this does happen in practice. It doesn't matter much for small LMUL*NF, but at large NF can start causing problems in register allocation. Note that this patch only covers the fixed vector formation cases. In theory, we should do the same patch for scalable, but we can currently only represent NF2 in scalable IR, and NF2 is assumed to be optimized to better than segment-at-a-time by default, so there's currently nothing to do.	2024-11-05 16:15:20 -08:00
Craig Topper	339f395ece	[RISCV][GISel] Enable commute_constant_to_rhs in RISCVPostLegalizerCombiner.	2024-11-05 15:08:43 -08:00
Craig Topper	a20b902b35	[RISCV][GISel] Copy some Zbb and Zbkb IR tests. NFC These are copies of SDAG tests with some of the more specialized cases removed. We can add them later when we're ready to improve them.	2024-11-05 15:08:43 -08:00
Craig Topper	13b5899c29	[SelectionDAGBuilder][X86] Don't form FMAXNUM for f16 vectors if FMAXNUM needs to be promoted. (#114943 ) In #70357, I changed an isLegalOrCustom to isLegalOrCustomOrPromote in visitSelect to enable integer min/max to be formed when the operation was promoted. Unfortunately, this also affected floating point. For floating point, fmaxnum may require a libcall so we also need to check if the operation on the promoted type is legal or custom. Other changes to RISC-V have since made the original change untested so this patch restores the original isLegalOrCustom. Fixes #114520.	2024-11-05 15:06:37 -08:00
Stanislav Mekhanoshin	6d7e51de5e	[AMDGPU] Extend type support for update_dpp intrinsic (#114597 ) We can split 64-bit DPP as a post-RA pseudo if control values are supported, but cannot handle other types.	2024-11-05 13:59:14 -08:00
dlav-sc	97982a8c60	[RISCV][CFI] add function epilogue cfi information (#110810 ) This patch adds CFI instructions in the function epilogue. Before patch: addi sp, s0, -32 ld ra, 24(sp) # 8-byte Folded Reload ld s0, 16(sp) # 8-byte Folded Reload ld s1, 8(sp) # 8-byte Folded Reload addi sp, sp, 32 ret After patch: addi sp, s0, -32 .cfi_def_cfa sp, 32 ld ra, 24(sp) # 8-byte Folded Reload ld s0, 16(sp) # 8-byte Folded Reload ld s1, 8(sp) # 8-byte Folded Reload .cfi_restore ra .cfi_restore s0 .cfi_restore s1 addi sp, sp, 32 .cfi_def_cfa_offset 0 ret This functionality is already present in `riscv-gcc`, but it’s not in `clang` and this slightly impairs the `lldb` debugging experience, e.g. backtrace.	2024-11-06 00:20:21 +03:00
Brox Chen	e8644e3b47	[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436 ) Some old "t16" VOP2 instructions are actually in fake16 format. Correct and update test file	2024-11-05 16:12:49 -05:00
Matt Arsenault	0b40f97929	AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751 ) 0 does not make sense as a value for this to be, much less the default. Also stop emitting each individual field if it is the default, rather than if any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.	2024-11-05 12:50:44 -08:00
Kai Nacke	4a37799a48	[SystemZ][XRay] Implement XRay instrumentation for SystemZ (#113253 ) Expands pseudo instructions PATCHABLE_FUNCTION_ENTER and PATCHABLE_RET into a small instruction sequence which calls into the XRay library.	2024-11-05 15:42:55 -05:00
Kai Nacke	8b659736f7	[SystemZ] Make lit test more specific (#115050 ) The lit test fmuladd-soft-float.ll only specifies s390x as platform, but the test is Linux specific, causing problems when run on z/OS. This change updates the triple to fix this.	2024-11-05 15:29:32 -05:00
Craig Topper	e566ae8812	[RISCV][GISel] Remove s32 support for G_ABS on RV64. I plan to remove s32 as a legal type to match SelectionDAG and to remove i32 from the GPR regclass on RV64.	2024-11-05 12:05:30 -08:00
Matt Arsenault	ce067c5a3b	AMDGPU: Rename test file	2024-11-05 10:42:12 -08:00
Finn Plummer	3cdac06708	[HLSL][SPIRV][DXIL] Implement `dot4add_i8packed` intrinsic (#113623 ) - create a clang built-in in Builtins.td - link dot4add_i8packed in hlsl_intrinsics.h - add lowering to spirv backend through expansion of operation as OPSDot is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp - add lowering to spirv backend using OpSDot in applicable SPIRV version or if SPV_KHR_integer_dot_product is enabled - add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to DXIL.td op Dot4AddI8Packed - add tests for HLSL intrinsic lowering to dx/spv intrinsic in dot4add_i8packed.hlsl - add tests for sema checks in dot4add_i8packed-errors.hlsl - add test of spir-v lowering in SPIRV/dot4add_i8packed.ll - add test to dxil lowering in DirectX/dot4add_i8packed.ll Resolves #99220	2024-11-05 10:29:08 -08:00
Simon Pilgrim	61d5addd94	[X86] SimplifyDemandedBitsForTargetNode - call SimplifyMultipleUseDemandedBits on SSE shift-by-immediate nodes. Attempt to peek through multiple-use SHLI/SRLI/SRAI source vectors.	2024-11-05 18:24:13 +00:00

55866 Commits