Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).
This will be necessary to correctly recognize fast/nnan fmax/fmin reductions which can avoid NaN handling. That will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).
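As an illustrative sketch (not taken from the patch), this is the kind of reduction whose fast-math flags should let the cost model skip NaN handling:
```
; fmax reduction carrying 'fast' (implies nnan), so no NaN-propagation fixup is needed
%max = call fast float @llvm.vector.reduce.fmax.v4f32(<4 x float> %v)
```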
Differential Revision: https://reviews.llvm.org/D148149
Generally, the cost of a memory op will scale with the number of vector registers accessed. Machines might exist which have a narrower memory access width than vector register width, but machines with a wider memory access width than vector register width seem unlikely.
I noticed this because we were preferring wide loads + deinterleaves in cases where a short gather (actually a strided load) would be cheaper. Touching 8 vector registers instead of doing a 4 element gather is not a good tradeoff.
Differential Revision: https://reviews.llvm.org/D147470
If the legalized type is a legal interleaved access type (i.e. there's a
supported vlseg/vsseg instruction for it), the interleaved access pass
will pick up any interleaved memory op (wide load + shuffles) and lower it
into a vlseg/vsseg intrinsic.
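As a rough sketch (the value names are made up), a factor-2 load of this shape is the kind of pattern the pass converts into a vlseg2-based intrinsic:
```
; wide load feeding two factor-2 deinterleave shuffles
%wide = load <8 x i32>, ptr %p
%even = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%odd  = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
```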
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D146522
The cost model was not accounting for the fact that we can generate a dual vrgather + an index expression sequence instead of scalarizing.
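For illustration only (not a test from the patch), a general two-source permute of this shape is the case being costed:
```
; neither a slide nor a select pattern, so it needs a vrgather per source
%res = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 3, i32 6>
```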
A couple cases to call out:
1) I did not model the difference between vrgather and vrgatherei16. The result is that the constant pool cost can be slightly understated on RV32. I don't think we care, but if someone disagrees, this would be easy to add.
2) Our current codegen for i8 vectors longer than 256 elements (which is the limit of what this patch costs) has some room for improvement.
3) As indicated by the *regression* in reported cost for <2 x iN> vectors, our current vector lowering is missing support for a sub-case where scalarize-and-insert is actually faster than the generic fallback path.
Differential Revision: https://reviews.llvm.org/D147063
The cost model was not accounting for the fact that we can generate vrgather + an index expression.
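For illustration (again, not from the patch), a general single-source permute with an arbitrary constant mask is the case in question:
```
; arbitrary single-source permute: vrgather with a constant-pool index vector
%res = shufflevector <8 x i8> %v, <8 x i8> poison, <8 x i32> <i32 7, i32 3, i32 6, i32 1, i32 0, i32 5, i32 2, i32 4>
```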
Two cases to call out:
1) I did not model the difference between vrgather and vrgatherei16. The result is that the constant pool cost can be slightly understated on RV32. I don't think we care, but if someone disagrees, this would be easy to add.
2) Our current codegen for i8 vectors longer than 256 elements (which is the limit of what this patch costs) has some room for improvement.
Differential Revision: https://reviews.llvm.org/D147000
After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot.
Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that
a) Doesn't affect register pressure
b) Is suitable for the microarchitecture
we would need to teach its heuristics about RISC-V register grouping specifics.
Instead, this just does the easy, pragmatic thing and changes the default to a safe value that doesn't affect register pressure significantly[1], but should increase throughput and unlock more interleaving.
[1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`:
LMUL=1 2573 spills
LMUL=2 2583 spills
LMUL=4 2819 spills
LMUL=8 3256 spills
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143723
The loop vectorizer supports generating interleaved loads and stores via
shuffle patterns for fixed length vectors.
This enables it for RISC-V, since interleaved shuffle patterns can be
lowered to vlseg/vsseg in https://reviews.llvm.org/D145022
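As a sketch of what the vectorizer emits (names made up), a factor-2 interleaved store is an interleave shuffle feeding a single wide store:
```
; interleave two vectors, then store the combined vector contiguously
%ab = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %ab, ptr %p
```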
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145155
Consider a shuffle mask of <0, 2>:
This is one of two deinterleave masks to deinterleave a vector of 4
elements with factor 2.
Unfortunately, this is also technically an interleave mask, where
two subvectors of length 1 at indexes 0 and 2 will be interleaved.
This is because a mask can interleave non-contiguous subvectors:
e.g. <0, 6, 4, 1, 7, 5> on a vector of size 8:
```
<0 1 2 3 4 5 6 7> indices
 ^ ^     ^ ^ ^ ^
 0 0     2 2 1 1  deinterleaved subvector
```
This means that deinterleaving shuffles can accidentally be costed as
interleaves.
This is also incorrect in the context of interleaves, because the
only interleave shuffles we model at the moment are single-source
permutation shuffles, i.e. we are interleaving the first vector below
and ignoring the second:
shufflevector <2 x i32> %v0, <2 x i32> poison, <2 x i32> <i32 0, i32 2>
A mask of <0, 2> interleaves across both vectors.
The fix here is to set NumInputElts correctly: We were setting it to
twice the mask length, i.e. using both input vectors. But in fact we're
only using the first vector here, and isInterleaveMask already has
logic to ensure that the mask indices stay within the bounds of the
input vectors.
This lacks a test case because we're unable to test deinterleave
shuffles directly (they are length changing), but it is covered by the
tests in D145155.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D146176
The loop vectorizer supports generating interleaved loads and stores via
shuffle patterns for fixed length vectors.
This enables it for RISC-V, since interleaved shuffle patterns can be
lowered to vlseg/vsseg in https://reviews.llvm.org/D145022
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145155
This adds two new methods to ShuffleVectorInst, isInterleave and
isInterleaveMask, so that the logic to check if a shuffle mask is an
interleave can be shared across the TTI, codegen and the interleaved
access pass.
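For reference, this is the kind of mask the shared helper is meant to recognize (illustrative example, not from the patch):
```
; factor-2 interleave of two <4 x i16> operands
%il = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
```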
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145971
The existing scalable costing was just bad. No LMUL cost, no i1 specific costing, etc. We had updated the fixed cost model, but none of the code is actually fixed-length specific. Moving it down handles the scalable cases too.
Fixed vector costs may be more precise, but the actual lowering will use scalable vectors if nothing better is available. During review, we noticed a case where fixed vector reverse can be improved cost-model-wise; that will follow separately.
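For reference, the fixed-length vector reverse mentioned above is just a shuffle with a descending mask (illustrative only):
```
; fixed-length reverse; the scalable form goes through a reverse intrinsic instead
%rev = shufflevector <4 x i32> %v, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
```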
Differential Revision: https://reviews.llvm.org/D145953
Interleave and deinterleave shuffles are lowered via a more efficient
sequence if the element size is smaller than ELEN.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145678
Since code-size cost doesn't scale linearly with LMUL,
this change separates it from the throughput cost.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D142068
This matches the behavior of a number of other targets, including e.g. X86. This does have the effect of increasing register pressure slightly, but we have a relative abundance of registers in the ISA compared to other targets which use the same heuristic.
The motivation here is that our current cost heuristic treats the number of registers as the dominant cost. As a result, an extra use outside of a loop can radically change the LSR result. As an example, consider test4 from the recently added test/Transforms/LoopStrengthReduce/RISCV/lsr-cost-compare.ll. Without a use outside the loop (see test3), we convert the IV into a pointer increment. With one, we leave the gep in place.
The pointer increment version both decreases the number of instructions in some loops and creates parallel chains of computation (i.e. decreases critical path depth). Both are generally profitable.
Arguably, we should really be using a more sophisticated model here - such as e.g. using profile information or explicitly modeling parallelism gains. However, as a practical matter starting with the same mild hack that other targets have used seems reasonable.
Differential Revision: https://reviews.llvm.org/D142227
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
The same holds for getShuffleCost(), which cannot pass CostKind on to
getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(),
getExtractSubvectorOverhead(), and getInsertSubvectorOverhead().
To address this, this patch adds a CostKind argument to these
functions.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D142116
1. Refactor the cost handling of sqrt/fabs.
2. Add half type support to the cost model of sqrt/fabs (see the example below).
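The half-type operations now covered look like this (illustrative calls, not taken from the patch's tests):
```
; scalar and vector half-precision intrinsics now costed
%s = call half @llvm.sqrt.f16(half %x)
%a = call <4 x half> @llvm.fabs.v4f16(<4 x half> %v)
```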
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132908
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.
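For context, a broadcast in IR is an insertelement into poison followed by a zero-mask shuffle, so the insertelement's cost belongs in the total (sketch only):
```
; build the splat: the initial insertelement must be counted too
%ins   = insertelement <4 x i32> poison, i32 %x, i32 0
%splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer
```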
Differential Revision: https://reviews.llvm.org/D140498
The patch also adds expandVPCTLZ and expandVPCTTZ to expand vp.ctlz/vp.cttz nodes,
and adds a cost model for vp.ctlz/vp.cttz.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140370
The patch also adds expandVPCTPOP in TargetLowering to expand VP_CTPOP nodes.
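An example of the node being expanded, as it appears in IR (illustrative only):
```
; masked, EVL-predicated population count
%pop = call <4 x i32> @llvm.vp.ctpop.v4i32(<4 x i32> %x, <4 x i1> %m, i32 %evl)
```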
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139920
The patch also adds a function, expandVPBITREVERSE, to expand ISD::VP_BITREVERSE nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139697
This reverts commit 7883e5b061bdbbe8bee5f479ebe911db5045b7e9.
The original commit was reverted because it didn't update test files after D136263
landed. The recommit fixes those.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139509
The patch makes VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the desired codegen.
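An example of the nodes being expanded, as they appear in IR (illustrative only; operand order follows the VP funnel-shift intrinsics):
```
; EVL-predicated funnel shift left
%r = call <4 x i32> @llvm.vp.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i1> %m, i32 %evl)
```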
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D138379