llvm-project

Author	SHA1	Message	Date
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979 ) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
Sander de Smalen	00a831421f	[AArch64][SME] Extend Inliner cost-model with custom penalty for calls. (#68416 ) This is a stacked PR following on from #68415. This patch has two purposes: (1) It tries to make inlining more likely when it can avoid a streaming-mode change. (2) It avoids inlining when inlining causes more streaming-mode changes. An example of (1) is: ``` void streaming_compatible_bar(void); void foo(void) __arm_streaming { /* other code */ streaming_compatible_bar(); /* other code */ } void f(void) { foo(); // expensive streaming mode change } -> void f(void) { /* other code */ streaming_compatible_bar(); /* other code */ } ``` where it wouldn't have inlined the function when foo would be a non-streaming function. An example of (2) is: ``` void streaming_bar(void) __arm_streaming; void foo(void) __arm_streaming { streaming_bar(); streaming_bar(); } void f(void) { foo(); // expensive streaming mode change } -> (do not inline into) void f(void) { streaming_bar(); // these are now two expensive streaming mode changes streaming_bar(); } ```	2023-10-31 10:28:40 +00:00
Igor Kirillov	849f963e31	[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#70469 ) * Enhanced the logic of ExpandMemCmp pass to merge contiguous subsequences in LoadSequence, based on sizes allowed in `AllowedTailExpansions`. * This enhancement seeks to minimize the number of basic blocks and produce optimized code when using memcmp with non-register aligned sizes. * Enable this feature for AArch64 with memcmp sizes modulo 8 equal to 3, 5, and 6. Reapplication of #69942 after fixing a bug	2023-10-30 18:40:48 +00:00
Sander de Smalen	6d30bc0085	[AArch64][SME] Allow inlining when streaming-mode attributes don't match up. (#68415 ) The use-case here is to support things like: int foo(int x, int y) __arm_streaming { return std::max<int>(x, y); } where the call to non-streaming `std::max<int>(x, y)` can be safely inlined into the streaming function. This is a first step and will need further work to allow more cases (e.g. more fine-grained analysis of the function calls to ensure they don't result in any incompatible instructions for the requested mode).	2023-10-30 10:47:07 +00:00
Antonio Frighetto	138e6c1c86	[AArch64][TTI] Improve `LegalVF` when gather loads are scalarized After determining the cost of loads that could not be coalesced into `VectorizedLoads` in SLP, computing the cost of a gather-vectorized load is carried out. Favour a potentially high valid cost when the type of a group of loads, whose type is a vector of size dependent upon `VF`, may be legalized into a scalar value. Fixes: https://github.com/llvm/llvm-project/issues/68953.	2023-10-27 20:22:54 +02:00
Igor Kirillov	deb429e5b0	Revert "[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#69942 )" This reverts commit 9bcb30d31813bbdea6b65789f64aed3f0e58d507.	2023-10-27 14:12:45 +00:00
Igor Kirillov	9bcb30d318	[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#69942 ) * Enhanced the logic of ExpandMemCmp pass to merge contiguous subsequences in LoadSequence, based on sizes allowed in `AllowedTailExpansions`. * This enhancement seeks to minimize the number of basic blocks and produce optimized code when using memcmp with non-register aligned sizes. * Enable this feature for AArch64 with memcmp sizes modulo 8 equal to 3, 5, and 6.	2023-10-27 12:41:08 +01:00
Fangrui Song	8e247b8f47	Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC	2023-10-27 00:30:41 -07:00
KAWASHIMA Takahiro	926173c614	[AArch64] Prevent argument promotion of vector with size > 128 bits (#70034 ) This patch prevents argument promotion from promoting pointers to fixed-length vector types larger than 128 bits like `<8 x float>` into the values of the pointees. Such vector types are used for SVE VLS but there is no ABI for SVE VLS arguments and the backend cannot lower such value arguments. Fixes #69147	2023-10-26 14:51:20 +09:00
zhongyunde 00443407	bf90ffb9b4	[SVE][InstCombine] Delete redundant sel instructions with ptrue: svsel(ptrue, x, y) => x. Depends on D121792. Reviewed By: paulwalker-arm, david-arm	2023-10-13 09:20:36 +08:00
Alexey Bataev	263a00fa91	[COST][AARCH64]Fix crash in cost calculation for shuffles. Need to take the mask size as number of elements, not the number of elements of the original fixed vector. Otherwise, the compiler may crash.	2023-10-02 07:49:03 -07:00
David Sherwood	fad69a5009	[Analysis][SVE] Improve cost model for some extending masked loads (#65957 ) When performing a masked load of an unpacked SVE vector type, i.e. nxv8i8, followed by a zero- or sign-extend to an illegal wide type such as nxv8i32 we typically end up with a combination of an extending masked load and pair(s) of uunpklo/hi or sunpklo/hi instructions. For example, see test @masked_sload_8i8_8i32 in file CodeGen/AArch64/sve-masked-ldst-sext.ll where %aval = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(... %aext = sext <vscale x 8 x i8> %aval to <vscale x 8 x i32> gets lowered to ld1sb { z1.h }, ... sunpklo z0.s, z1.h sunpkhi z1.s, z1.h Currently the cost for the 'sext' operation in the example above is 1, whereas this patch changes it to 2 to reflect the pair of instructions required. Similarly, when doing a masked load of a nxv8i8 and extending to nxv8i64 the cost is changed to 6 to reflect the 6 unpacks required.	2023-10-02 10:50:56 +01:00
zhongyunde	f41223eeca	[AArch64][SVE2] Delete an unused parameter for isExtPartOfAvgExpr, NFC Depends on D157628, which sets the cost of extends to 0 because they will fold into the s/urhadd. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D159273	2023-09-01 23:40:52 +08:00
Sander de Smalen	7e815dd76d	[AArch64][SME] Create new interface for isSVEAvailable. When a function is compiled to be in Streaming(-compatible) mode, the full set of SVE instructions may not be available. This patch adds an interface to query that and changes the codegen for FADDA (not legal in Streaming-SVE mode) to instead be expanded for fixed-length vectors, or otherwise not to code-generate for scalable vectors. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D156109	2023-09-01 12:00:36 +00:00
Sander de Smalen	0a32a999ae	[AArch64][SME] NFC: Rename hasNewZAInterface to hasNewZABody. __arm_new_za is a declaration attribute, not a type attribute, and is therefore not part of the interface of a function.	2023-08-30 13:14:42 +00:00
Kerry McLaughlin	9a98ab589a	[AArch64][SVE2] Change the cost of extends with S/URHADD to 0 When SVE2 is enabled, we can combine an add of 1, add & shift right by 1 to a single s/urhadd instruction. If the operands to the adds are extended, these extends will fold into the s/urhadd and their costs should be 0. Reviewed By: david-arm, dtemirbulatov Differential Revision: https://reviews.llvm.org/D157628	2023-08-29 12:24:47 +00:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
Kerry McLaughlin	5d814b3848	Revert "[AArch64][SVE2] Change the cost of extends with S/URHADD to 0" This reverts commit dda2cd2505301aa626fcd3e8dea2a447227d00ca.	2023-08-14 10:44:13 +00:00
Kerry McLaughlin	dda2cd2505	[AArch64][SVE2] Change the cost of extends with S/URHADD to 0 When SVE2 is enabled, we can combine an add of 1, add & shift right by 1 to a single s/urhadd instruction. If the operands to the adds are extended, these extends will fold into the s/urhadd and their costs should be 0. Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D157628	2023-08-14 10:32:06 +00:00
Mel Chen	425e9e81a0	[LV] Rename the Select[I\|F]Cmp reduction pattern to [I\|F]AnyOf. (NFC) Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D155786	2023-08-03 00:37:19 -07:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 CPUs. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost on integer, especially for small vectors. The end result will be lower cost for float and long-integer types, some higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
David Green	8da62b865f	[AArch64] Basic vector bswap costs This adds some basic vector bswap costs, providing the type is supported. Differential Revision: https://reviews.llvm.org/D155806	2023-07-21 08:48:53 +01:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
Sander de Smalen	ec6af93d02	[AArch64] NFC: Replace 'forceStreamingCompatibleSVE' with 'isNeonAvailable'. The AArch64Subtarget interface 'isNeonAvailable' is more appropriate going forward, as we may also want to generate 'streaming SVE' code (not just 'streaming-compatible SVE' code), but here we must still make sure not to use NEON instructions which are invalid in streaming SVE mode.	2023-07-17 08:24:10 +00:00
David Green	1712ae6709	[AArch64] Improve cost of umull from known bits As in D140287, we can now generate umull from mul(zext(x), y) in cases where we know that the top bits of y are zero. This teaches that to the cost model, adjusting how isWideningInstruction detects mul operations that can extend both operands. This helps for constants and other cases where the operands of the mul are known to be extended, but not directly extends. Differential Revision: https://reviews.llvm.org/D154936	2023-07-12 13:13:06 +01:00
Tuan Chuong Goh	e36dd3ea8a	[AArch64] Fix cost modelling for SVE Min/Max Intrinsics Add more legal types for SMIN, SMAX, UMIN, UMAX in cost modelling for AArch64 Differential Revision: https://reviews.llvm.org/D154622	2023-07-12 07:46:12 +01:00
Youngsuk Kim	f69b9b7cce	[llvm] Remove uses of Type::getPointerTo() (NFC) Partial progress towards removing in-tree uses of `getPointerTo()`, by employing the following options: * Drop the call entirely if the sole purpose of it is to support a no-op bitcast (remove the no-op bitcast as well). * Replace with `PointerType::get()`/`PointerType::getUnqual()`. Also, remove no-op function `EmitBitCastOfLValueToProperType()`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D154392	2023-07-08 13:05:58 -04:00
David Green	12025cef3e	[CostModel] Use min/max intrinsics for vecreduce.min/max costs This changes the costmodelling of the vecreduce.min/max nodes to use the costs of the relevant min/max intrinsics instead of expanding them to compare and selects. getMinMaxReductionCost has changed to take an Opcode for the relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are no longer needed. A follow up patch will add some basic fminimum/fmaximum costmodelling. Differential Revision: https://reviews.llvm.org/D153547	2023-07-04 15:02:30 +01:00
David Green	5106b221c8	[AArch64] Treat the icmp in icmp(and(..), 0) as free As in https://godbolt.org/z/4dafd9Geq, the icmp from an And may use an Ands to set flags, meaning the icmp is free. This could also be done for add/sub, but those patterns often happen in the induction variable of a loop, making them quite performance sensitive. Differential Revision: https://reviews.llvm.org/D153611	2023-07-01 21:59:54 +01:00
Igor Kirillov	17bde328d6	[LV] Add mask support for vectorizing interleaved groups This patch extends LoopVectorize to handle the vectorization of interleaved memory accesses with scalable vectors when mask is required or/and predicated tail folding is enabled. Differential Revision: https://reviews.llvm.org/D152258	2023-06-29 17:50:56 +00:00
Jolanta Jensen	5cd16e2cb7	[NFC SVE ACLE] Remove IR combines that no longer apply. Remove IR combines that no longer apply after the SVE merging intrinsics taking an all active predicate, have been canonicalised to their equivalent undef (_u) variants. Differential Revision: https://reviews.llvm.org/D153415	2023-06-22 10:26:20 +00:00
Jolanta Jensen	ecb07f481b	[SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++ builtins This patch implements IR combines to convert intrinsics used for _m C/C++ builtins which take an all active predicate to their equivalent _u intrinsic. Differential Revision: https://reviews.llvm.org/D152005	2023-06-21 10:35:13 +00:00
Zhongyunde	cb353dc74e	[LV] Add cost model for simd vector select instructions of type float For simd vector selects, use cmeq + bsl for v2f32/v4f32/v2f64, so their cost is cheap. Fix https://github.com/llvm/llvm-project/issues/63082 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D152523	2023-06-20 13:12:19 +08:00
Paul Walker	b7287a82d3	[SVE][AArch64TTI] Fix invalid mla combine that miscomputes the value of inactive lanes. Consider: add(pg, a, mul_u(pg, b, c)) Although the multiply's inactive lanes are undefined, they don't contribute to the final result. The overall result of the inactive lanes comes from "a" and thus the above is another form of mla rather than mla_u.	2023-06-18 13:07:03 +01:00
Paul Walker	c7c71aa123	[NFC][AArch64TTI] Breakout add/sub combines into discrete functions.	2023-06-18 13:07:03 +01:00
Nikita Popov	f9f8517e03	[InstCombine][AArch64] Fix phi insertion point Fix the issue reported at https://reviews.llvm.org/rG724f4a5bac25#inline-9083, by specifying the correct insertion point for the new phi.	2023-06-16 14:58:33 +02:00
Florian Hahn	baebe719a5	[AArch64] Address post-commit comments from D150482. Address @v01dXYZ's comments, thanks!	2023-06-15 16:38:09 +01:00
Graham Hunter	95bfb1902d	[LV][AArch64] Allow (limited) interleaving for scalable vectors This patch uses the (de)interleaving intrinsics introduced in D141924 to handle vectorization of interleaving groups with a factor of 2 for scalable vectors. Reviewed By: fhahn, reames Differential Revision: https://reviews.llvm.org/D145163	2023-06-09 11:42:10 +01:00
zhongyunde	df19d87227	[LV] Add option to tune the cost model, NFC For Neon, the default nonconst stride cost is conservative, and it is a local variable, which is not convenient for tuning the loop vectorizer. So I try to use an option, which is similar to SVEGatherOverhead brought in D115143. Fix https://github.com/llvm/llvm-project/issues/63082. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D152253	2023-06-07 22:08:29 +08:00
Jolanta Jensen	a963dbb5ac	[SVE ACLE] Extend existing aarch64_sve_mul combines to also act on aarch64_sve_mul_u. Differential Revision: https://reviews.llvm.org/D152004	2023-06-06 15:26:33 +00:00
Jolanta Jensen	dc63b35b02	[SVE ACLE] Extend IR combines for fmul, fsub, fadd to cover _u variants This patch extends existing IR combines for: fmul, fsub and fadd, relying on all active predicate to also apply to their equivalent undef (_u) intrinsics. Differential Revision: https://reviews.llvm.org/D150768	2023-06-02 11:06:57 +00:00
Florian Hahn	e97b8a7e3f	[AArch64] Don't use tbl lowering if ZExt can be folded into user. If the ZExt can be lowered to a single ZExt to the next power-of-2 and the remaining ZExt folded into the user, don't use tbl lowering. Fixes #62620. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D150482	2023-06-02 11:53:04 +01:00
David Green	eb764a7f38	[AArch64] Increase the cost of i1 inserts / extracts i1 inserts will need an extra cset, and i1 extracts need a cmp (or tst) in order to be used. This increases the cost of them a little to account for those extra instructions. https://godbolt.org/z/3c5z4G7Mh Differential Revision: https://reviews.llvm.org/D151189	2023-06-01 10:54:53 +01:00
David Green	e79fac2968	[AArch64] Adjust costs of i1 and/or/xor reductions This expands the reduction cost of i1 and/or/xor, so that larger type sizes get handled by the existing code. For i1 reductions, `and` will use maxv, `or` will use minv and `xor` will use addv, plus the cost of legalizing the type for larger vectors using and/or/xor. The i1 vectors will be legalized to higher width integers (say v16i8), which this overrides the cost of. As with all i1 vectors there is a chance that the types the i1 vector is created with and how it is used will not match, introducing extra extends that are not necessarily costmodelled. https://godbolt.org/z/6Gc9K6b7T Differential Revision: https://reviews.llvm.org/D151184	2023-06-01 09:28:48 +01:00
Craig Topper	6006d43e2d	LLVM_FALLTHROUGH => [[fallthrough]]. NFC Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D150996	2023-05-24 12:40:10 -07:00
Dinar Temirbulatov	7489301c03	[AArch64][LV] Disable maximising bandwidth for streaming compatible sve Fixing last commit by adding actual change to AArch64TargetTransformInfo.cpp Differential Revision: https://reviews.llvm.org/D150336	2023-05-23 13:24:01 +00:00
Sander de Smalen	11926e6149	[SME2/SVE2p1] Extend llvm.aarch64.sve.convert.to/from.svbool to accept target("aarch64.svcount") The convert intrinsics can be used to implement existing operations on svcount_t when the actual bits/content of the predicate register doesn't matter (such as PSEL, which copies the full contents of the first source register to the destination register). Reviewed By: CarolineConcatto, david-arm Differential Revision: https://reviews.llvm.org/D150959	2023-05-22 13:52:18 +00:00
David Sherwood	c7dbe326df	[AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1 This patch enables the tail-folding of simple loops by default when targeting the neoverse-v1 CPU. Simple loops exclude those with recurrences or reductions or loops that are reversed. New tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll In terms of SPEC2017 only one benchmark is really affected when building with "-Ofast -mcpu=neoverse-v1 -flto", which is (+ faster, - slower): 525.x264: +7.0% Differential Revision: https://reviews.llvm.org/D130618	2023-05-18 10:35:57 +00:00
David Sherwood	7beb2ca8fa	[AArch64][NFC] Refactor the tail-folding option This patch does simple refactoring of the tail-folding option in preparation for enabling tail-folding by default for neoverse-v1. It adds a default tail-folding option field to the AArch64Subtarget class that can be set on a per-CPU basis. Differential Revision: https://reviews.llvm.org/D149659	2023-05-17 08:39:40 +00:00
Jolanta Jensen	105d63a250	[SVE ACLE] Change the lowering of SVE integer builtins Change the lowering of SVE integer mla_x/mls_x and mad_x/msb_x builtins to use dedicated undef (_u) intrinsics. Differential Revision: https://reviews.llvm.org/D150553	2023-05-16 17:44:24 +00:00
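
The name clash described in commit 81b7f115fb above can be reproduced outside LLVM. The following is a minimal sketch, not LLVM's actual code: `QuantityBase`, `BrokenSize`, `RenamedSize`, `brokenSameKind` and `sameKind` are hypothetical names invented for illustration. It assumes only what the commit message states, namely that a static factory called `Scalable` hides a field of the same name, so `LHS.Scalable == RHS.Scalable` compares a function with itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the base class that holds the state.
struct QuantityBase {
  uint64_t MinValue = 0;
  bool Scalable = false;
};

// Before the rename: the static factory `Scalable` hides the base field.
struct BrokenSize : QuantityBase {
  static BrokenSize Fixed(uint64_t N) {
    BrokenSize S;
    S.MinValue = N;
    return S;
  }
  static BrokenSize Scalable(uint64_t N) {
    BrokenSize S;
    S.MinValue = N;
    S.QuantityBase::Scalable = true; // the field needs a qualified name here
    return S;
  }
};

// Mirrors the faulty assert condition. Both operands name the same static
// member function, so this compares a function pointer with itself.
inline bool brokenSameKind(const BrokenSize &LHS, const BrokenSize &RHS) {
  return LHS.Scalable == RHS.Scalable;
}

// After the rename: the factories are verb phrases and no longer hide the field.
struct RenamedSize : QuantityBase {
  static RenamedSize getFixed(uint64_t N) {
    RenamedSize S;
    S.MinValue = N;
    return S;
  }
  static RenamedSize getScalable(uint64_t N) {
    RenamedSize S;
    S.MinValue = N;
    S.Scalable = true;
    return S;
  }
};

// The same expression now reads the bool field of each object.
inline bool sameKind(const RenamedSize &LHS, const RenamedSize &RHS) {
  return LHS.Scalable == RHS.Scalable;
}
```

In this sketch, `brokenSameKind(BrokenSize::Fixed(4), BrokenSize::Scalable(4))` evaluates to true even though the two operands differ in kind, while `sameKind(RenamedSize::getFixed(4), RenamedSize::getScalable(4))` evaluates to false, which is the behaviour the original assert intended to check.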

405 Commits