llvm-project

Author	SHA1	Message	Date
Eric Biggers	09058654f6	[RISCV] Remove experimental from Vector Crypto extensions (#74213) The RISC-V vector crypto extensions have been ratified. This patch updates the Clang and LLVM support for these extensions to be non-experimental, while leaving the C intrinsics as experimental since the C intrinsics are not yet standardized. Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2023-12-18 22:04:22 -08:00
David Green	7433120137	[CostModel] Mark ssa_copy as free (#75294) These intrinsics are only used ephemerally and should be given a zero cost.	2023-12-13 11:24:47 +00:00
David Green	b003fed283	[CostModel] Add some ssa.copy costmodel tests. NFC	2023-12-13 07:26:17 +00:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Simon Pilgrim	761a963dfc	[DAG] narrowExtractedVectorBinOp - ensure we limit late node creation to LegalOperations only (#72130) Avoids infinite loop issues in some upcoming patches to help D152928 - x86 sees a number of regressions that are addressed by extending SimplifyDemandedVectorEltsForTargetNode to cover more binop opcodes	2023-11-20 10:56:41 +00:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596) There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Changpeng Fang	8ceb72ffe5	[AMDGPU] make v32i16/v32f16 legal (#70484) Some upcoming intrinsics will be using these new types	2023-10-27 15:28:31 -07:00
Antonio Frighetto	138e6c1c86	[AArch64][TTI] Improve `LegalVF` when gather loads are scalarized After determining the cost of loads that could not be coalesced into `VectorizedLoads` in SLP, computing the cost of a gather-vectorized load is carried out. Favour a potentially high valid cost when the type of a group of loads, whose type is a vector of size dependent upon `VF`, may be legalized into a scalar value. Fixes: https://github.com/llvm/llvm-project/issues/68953.	2023-10-27 20:22:54 +02:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is differentiating a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Michael Maitland	2483763f1a	[RISCV][CostModel] Recommit VPIntrinsics have same cost as their non-vp counterparts (#68752) This was reverted in commit 0abaf3caee88ae74def2c7000aff8e61b24634bb (#67178). This version of the patch includes a fix for an issue caused by vp-reductions having an extra start value argument that the non-vp counterparts did not have.	2023-10-20 13:10:48 -04:00
Ramkumar Ramachandra	98c90a13c6	ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924) The issue #55208 noticed that std::rint is vectorized by the SLPVectorizer, but a very similar function, std::lrint, is not. std::lrint corresponds to ISD::LRINT in the SelectionDAG, and std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now, neither ISD::LRINT nor ISD::LLRINT has a corresponding vector variant, and the LangRef makes this clear in the documentation of llvm.lrint.* and llvm.llrint.*. This patch extends the LangRef to include vector variants of llvm.lrint.* and llvm.llrint.*, and lays the necessary groundwork for scalarizing them for all targets. However, this patch would be devoid of motivation unless we show the utility of these new vector variants. Hence, the RISCV target has been chosen to implement a custom lowering to the vfcvt.x.f.v instruction. The patch also includes a CostModel for RISCV, and a trivial follow-up can potentially enable the SLPVectorizer to vectorize std::lrint and std::llrint, fixing #55208. The patch includes tests, obviously for the RISCV target, but also for the X86, AArch64, and PowerPC targets to justify the addition of the vector variants to the LangRef.	2023-10-19 13:05:04 +01:00
Jay Foad	eca2fcbdeb	[AMDGPU] Fix cost of fast unsafe f32 fdiv (#68988)	2023-10-15 12:25:36 +01:00
Stella Laurenzo	0abaf3caee	Revert "[RISCV][CostModel] VPIntrinsics have same cost as their non-vp counterparts (#67178)" This reverts commit fc865c20345860f394448c228054beafc22a1d4d. Triggering assert on X86: ``` iree-compile: /work/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:662: decltype(auto) llvm::dyn_cast(From *) [To = llvm::PointerType, From = llvm::Type]: Assertion `detail::isPresent(Val) && "dyn_cast on a non-existent value"' failed. ``` See PR for comments and full stack trace.	2023-10-06 11:43:37 -07:00
Michael Maitland	fc865c2034	[RISCV][CostModel] VPIntrinsics have same cost as their non-vp counterparts (#67178) On RISCV, only a few VPIntrinsics have their cost modeled by the VectorIntrinsicCostTable. Even so, none of those entries consider LMUL. All other VPIntrinsics do not have meaningful modeling. This patch models the cost of a VPIntrinsic as the cost of its non-VP counterpart. It is possible that the VP intrinsic is cheaper than the non-VP version depending on VL. On RISCV, this may be due to two reasons (if the instruction is part of a loop): 1. A smaller VL can be used on the last iteration of the loop. 2. The VP instruction may avoid a scalar remainder loop. I have left this as a TODO since I think this change puts us on the right path of modeling the cost of a VPInstruction, and it isn't entirely clear to me how much of a discount we should give to a known VL<VLMAX or what to do when VL is unknown at compile time.	2023-10-05 10:10:02 -04:00
Simon Pilgrim	baecc9e997	[CostModel][X86] getShuffleCost - add fallback (to half vector) for bfloat vector shuffle costs Add initial half/bfloat broadcast shuffles test coverage (more to follow) Fixes #68117 - which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again......	2023-10-05 11:12:40 +01:00
David Sherwood	fad69a5009	[Analysis][SVE] Improve cost model for some extending masked loads (#65957) When performing a masked load of an unpacked SVE vector type, e.g. nxv8i8, followed by a zero- or sign-extend to an illegal wide type such as nxv8i32, we typically end up with a combination of an extending masked load and pair(s) of uunpklo/hi or sunpklo/hi instructions. For example, see test @masked_sload_8i8_8i32 in file CodeGen/AArch64/sve-masked-ldst-sext.ll where %aval = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(... %aext = sext <vscale x 8 x i8> %aval to <vscale x 8 x i32> gets lowered to ld1sb { z1.h }, ... sunpklo z0.s, z1.h sunpkhi z1.s, z1.h Currently the cost for the 'sext' operation in the example above is 1, whereas this patch changes it to 2 to reflect the pair of instructions required. Similarly, when doing a masked load of an nxv8i8 and extending to nxv8i64, the cost is changed to 6 to reflect the 6 unpacks required.	2023-10-02 10:50:56 +01:00
Simon Pilgrim	f8734c71f1	[CostModel][X86] fround.ll - add <2 x float> test coverage	2023-09-27 17:28:50 +01:00
Ramkumar Ramachandra	7c128f6d0e	CostModel/RISCV: tweak cost of vector ctpop under ZVBB (#67020) Under RISCV experimental-zvbb, vector variants of llvm.ctpop lower to a single instruction: vcpop. The cost-model does not check for the ZVBB extension, and always associates a high cost to vector variants of llvm.ctpop. Fix this defect.	2023-09-27 13:00:33 +01:00
Ramkumar Ramachandra	65de98f4de	CostModel/RISCV: tweak test for ctpop, with/without ZVBB (#67013) Vector ctpop only exists under ZVBB, but ZVBB is unaccounted for in the cost-model of ctpop. Document this defect with an additional RUN line in the test for ctpop, showing identical costs with/without ZVBB. A follow-up patch could fix this defect.	2023-09-27 12:16:54 +01:00
Sergey Kachkov	3d7df0a547	[RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334) This patch fixes the compilation time issue of matrix-types-spec test from test-suite. Reproduction of the problem: ``` clang++ -DNDEBUG --target=riscv64-linux-gnu --sysroot=<sysroot path> --gcc-toolchain=<gcc path> -O2 -fenable-matrix <test-suite-path>/SingleSource/UnitTests/matrix-types-spec.cpp ``` On my machine, compilation takes 50.44s. In comparison, the same test with RVV (-march=rv64gcv) compiles in 3.06s, and for x86-64 target it takes 1.71s. It turns out that the main issue is unrolling of loop in multiplySpec function, that has extractelements with non-constant index: ``` for.body9.i: ; preds = %for.body9.i, %for.cond6.preheader.i %indvars.iv.i92 = phi i64 [ 0, %for.cond6.preheader.i ], [ %indvars.iv.next.i93, %for.body9.i ] %Elt.033.i = phi double [ 0.000000e+00, %for.cond6.preheader.i ], [ %80, %for.body9.i ] %77 = mul nuw nsw i64 %indvars.iv.i92, 25 %78 = add nuw nsw i64 %77, %indvars.iv39.i91 %matrixext.i = extractelement <475 x double> %62, i64 %78 %79 = add nuw nsw i64 %indvars.iv.i92, %74 %matrixext13.i = extractelement <209 x double> %73, i64 %79 %80 = tail call double @llvm.fmuladd.f64(double %matrixext.i, double %matrixext13.i, double %Elt.033.i) %indvars.iv.next.i93 = add nuw nsw i64 %indvars.iv.i92, 1 %exitcond.not.i94 = icmp eq i64 %indvars.iv.next.i93, 19 br i1 %exitcond.not.i94, label %for.cond.cleanup8.i, label %for.body9.i, !llvm.loop !21 ``` When RVV is supported, extractelement/insertelement with non-constant index can be lowered quite efficiently with vslidedown/vslideup; otherwise it's lowered via stack memory operations, i.e. for extractelement each vector element is stored on stack and then the needed element is loaded back; for insertelement it stores all vector elements, rewrites the required element value and then loads the vector back.
Currently the cost of such an expensive operation is estimated as zero, so loop unroll processes the loop more aggressively. The proper estimation of cost (in a way like in X86 target) prohibits unrolling of this loop and fixes compilation time (2.77s on my machine).	2023-09-27 13:12:42 +03:00
Sergey Kachkov	0a5d52a757	[RISCV][CostModel] Add getCFInstrCost RISC-V implementation (#65599) This patch implements getCFInstrCost TTI hook that mostly affects LoopVectorizer decisions. It sets zero cost for PHI nodes and zero throughput cost for branches (assuming that branches are likely to be predicted). The implementation is similar to X86/AArch64/PowerPC targets and reduces loop cost by excluding induction PHIs/loop latch branches, which in turn leads to selecting smaller vectorization factor.	2023-09-25 12:26:01 +03:00
Ramkumar Ramachandra	88800f79e0	CostModel/RISCV: fix typos in fround test, vector length (#67025) There are several typos in fround.ll, presumably caused by copy-pasting, where there is a strange nvx5* type. From the surrounding code, it is clear that this was intended to be nvx4*. Fix these typos.	2023-09-21 17:04:28 +01:00
Paul Walker	162bafc8b7	[SVE] Fix crash when costing getelementptr with scalable target type. Fixes #66594	2023-09-18 12:48:30 +00:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
David Sherwood	7d574ffa09	[NFC][Analysis] Run update_analyze_test_checks.py on Analysis/CostModel/AArch64/sve-ldst.ll	2023-09-11 10:39:22 +00:00
David Green	233fb987fc	[ARM] Improve bitwise reduction costs This adds some basic and/or/xor reduction costs for NEON/MVE, handling them like other reductions where vector operations are used to reduce to legal sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from there.	2023-09-04 16:22:52 +01:00
David Green	4cef24a886	[ARM] Improve reduction integer min/max costs This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar to the existing Add reduction costs. They follow the same style as Add reductions, but include a higher cost as the costs tend to be dependent on the element size for vminv/vmaxv. These costs may not be precise, but will be more in line with actual costs than the default, which extracts each element.	2023-09-04 15:47:06 +01:00
David Green	2955cc15ff	[ARM] Improve costs for FMin/Max reductions Similar to the other reductions, this changes the cost of fmin/fmax reductions under MVE/NEON to perform vector operations until the types need to be scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the reduction, and otherwise need lanewise extracts from the top lanes.	2023-09-04 12:49:13 +01:00
David Green	4530f02916	[ARM] Improve reduction fadd/fmul costs This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by halving the vector size until it gets scalarized, with some additional costs for fp16 which may require extracting the top lanes. Differential Revision: https://reviews.llvm.org/D159367	2023-09-04 11:37:14 +01:00
David Green	5afb161ed5	[ARM] Add various vector reduce costmodel tests. NFC See D159367 and the followups.	2023-09-04 10:50:58 +01:00
Philip Reames	aea452841b	[RISCV] Improve cost model test coverage for insert/extract element In particular, high LMULs, constant offsets within high LMUL, and types which require splitting. Note that most of these are way off with current lowering.	2023-08-30 10:34:02 -07:00
Kerry McLaughlin	9a98ab589a	[AArch64][SVE2] Change the cost of extends with S/URHADD to 0 When SVE2 is enabled, we can combine an add of 1, add & shift right by 1 to a single s/urhadd instruction. If the operands to the adds are extended, these extends will fold into the s/urhadd and their costs should be 0. Reviewed By: david-arm, dtemirbulatov Differential Revision: https://reviews.llvm.org/D157628	2023-08-29 12:24:47 +00:00
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
Fraser Cormack	c058eb998a	[AArch64] Fix crash when neither Neon nor SVE are enabled The subtarget was unconditionally reporting that SVE was to be used to lower vectors when Neon was unavailable, even when SVE itself was unavailable. This decision leads other parts of the compiler to crash, e.g., when querying SVE vector sizes. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D158179	2023-08-17 16:07:27 +01:00
Roland Froese	4d425f8663	[PowerPC] vector cost model add cost to extract i1 Try to avoid some unprofitable predication on PPC. Recognize in the cost model that computing on i1 values will require an extra mask or compare operation. Differential Revision: https://reviews.llvm.org/D155876	2023-08-14 17:04:11 -04:00
Josh Stone	85e4ee15d3	[SystemZ] Avoid type legalization on structs In SystemZTTIImpl::getMemoryOpCost, the call to getNumberOfParts will run type legalization, which can't handle structs. So before that, we check for an unknown value type and forward to BaseT, just like many other targets do in this situation. https://bugzilla.redhat.com/show_bug.cgi?id=2224885 Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D156379	2023-08-07 17:52:15 -07:00
Craig Topper	814250191d	[RISCV] Add vector legalization for fmaximum/fminimum. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156937	2023-08-04 08:07:14 -07:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost for integer types, especially for small vectors. The end result will be lower cost for float and long-integer types, and higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Simon Pilgrim	bbfdb8cc2d	[CostModel][X86] Add scalar rotate-by-immediate costs As noted on #63980, rotates by immediate amounts are much cheaper than rotates by variable amounts. This still needs to be expanded to vector rotate cases, and we need to add reasonable funnel-shift costs as well (very tricky as there's a huge range in CPU behaviour for these).	2023-07-27 16:54:30 +01:00
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Craig Topper	49429783b0	[RISCV] Add lowering for scalar fmaximum/fminimum. Unlike fmaxnum and fminnum, these operations propagate nan and consider -0.0 to be less than +0.0. Without Zfa, we don't have a single instruction for this. The lowering I've used forces the other input to nan if one input is a nan. If both inputs are nan, they get swapped. Then use the fmax or fmin instruction. New ISD nodes are needed because fmaxnum/fminnum do not define the order of -0.0 and +0.0. This lowering ensures the snans are quieted (though that is probably not required in the default environment). It also ensures non-canonical nans are canonicalized, though I'm also not sure that's needed. Another option could be to use fmax/fmin and then overwrite the result based on the inputs being nan, but I'm not sure we can do that with any less code. Future work will handle nonans FMF, and the case where we can prove the input isn't nan. This does fix the crash in #64022, but we need to do more work to avoid scalarization. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156069	2023-07-24 13:46:35 -07:00
David Green	8da62b865f	[AArch64] Basic vector bswap costs This adds some basic vector bswap costs, providing the type is supported. Differential Revision: https://reviews.llvm.org/D155806	2023-07-21 08:48:53 +01:00
David Green	4f578e9407	[AArch64] Update bswap cost test. NFC See D155806	2023-07-20 13:53:18 +01:00
Craig Topper	3055c5815a	[RISCV] Upgrade Zvfh version to 1.0 and move out of experimental state. This has been ratified according to https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions Differential Revision: https://reviews.llvm.org/D155668	2023-07-19 10:03:57 -07:00
Philip Reames	7cc6b80d9a	[RISCV][CostModel] Model vrgather.vv as being quadratic in LMUL vrgather.vv across multiple vector registers (i.e. LMUL > 1) requires all-to-all data movement. This includes two conceptual sets of changes: For permutes, we were modeling these as being linear in LMUL. For reverse, we were modeling them as being fixed cost in LMUL. Both were wrong, and have been adjusted to O(LMUL^2). Noticed via code inspection while looking at something else. It's worth asking whether we should be lowering reverse to something other than a vrgather at high LMULs. That shuffle is quite expensive. (Future work) Differential Revision: https://reviews.llvm.org/D152019	2023-07-18 11:52:34 -07:00
David Green	faca9fdc4f	[AArch64] Regenerate CostModel tests with update_analyze_test_checks. NFC	2023-07-17 10:23:27 +01:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
David Green	1712ae6709	[AArch64] Improve cost of umull from known bits As in D140287, we can now generate umull from mul(zext(x), y) in cases where we know that the top bits of y are zero. This teaches that to the cost model, adjusting how isWideningInstruction detects mul operations that can extend both operands. This helps for constants and other cases where the operands of the mul are known to be extended, but not directly extends. Differential Revision: https://reviews.llvm.org/D154936	2023-07-12 13:13:06 +01:00
Tuan Chuong Goh	e36dd3ea8a	[AArch64] Fix cost modelling for SVE Min/Max Intrinsics Add more legal types for SMIN, SMAX, UMIN, UMAX in cost modelling for AArch64 Differential Revision: https://reviews.llvm.org/D154622	2023-07-12 07:46:12 +01:00


1529 Commits