llvm-project

Author	SHA1	Message	Date
Philip Reames	2e7c7d20d5	[RISCV][TTI] Adjust cost for extract/insert element when VLEN is known (#108595 ) If we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization. A bit of context. This change may look a bit strange on its own given we are currently not scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that https://github.com/llvm/llvm-project/pull/108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime.	2024-09-17 08:43:40 -07:00
Luke Lau	41f1b467a2	[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage (#108370 ) A half with only zvfhmin or bfloat will end up getting promoted to a f32 for most instructions. Unless the loop consists only of memory ops and permutation instructions which don't need to be promoted (is this common?), we'll end up using double the LMUL than what's currently being returned by getRegUsageForType. Since this is used by the loop vectorizer, it seems better to be conservative and assume that any usage of a zvfhmin half/bfloat will end up being widened to a f32.	2024-09-17 13:50:19 +08:00
Elvis Wang	1b3e64a9d2	[RISCV][TTI] Add vp.cmp intrinsic cost with functionalOPC. (#107504 ) This patch makes the instruction cost of VP compare intrinsics the same as that of their non-VP counterparts.	2024-09-12 07:06:36 +08:00
Elvis Wang	845d8d909c	[RISCV][TTI] Add cost of typebased cast VPIntrinsics with functionalOPC. (#97797 ) This patch makes the instruction cost of type-based cast VP intrinsics the same as that of their non-VP counterparts. This is a follow-up to [#93435](https://github.com/llvm/llvm-project/pull/93435)	2024-09-05 13:05:01 +08:00
Shih-Po Hung	837ee5b46a	[RISCV][TTI] Scale the cost of FP-Int conversion with LMUL (#87506 ) Widening/narrowing the source data type to match the destination data type may require multiple steps. To model the costs, the patch generated the interim type by following the logic in RISCVTargetLowering::lowerVPFPIntConvOp.	2024-09-02 09:38:42 +08:00
Philip Reames	59f05b683d	[RISCV][TTI] Model cost for insert/extract into illegal types (#106440 ) We'd previously just deferred to the base implementation, but that more or less always returns 1. This underestimates the cost of the insert/extract, biases the SLP vectorizer towards forming illegally typed vectors, and underestimates the cost of scalarized operations (like unaligned scatter/gather).	2024-08-29 09:45:47 -07:00
Maciej Gabka	95d2d1cba0	Move stepvector intrinsic out of experimental namespace (#98043 ) This patch moves the stepvector intrinsic out of the experimental namespace. This intrinsic has existed in LLVM for several years now and is widely used.	2024-08-28 12:48:20 +01:00
Alexey Bataev	2a50dac9fb	[RISCV][TTI]Fix the cost estimation for long select shuffle. The code was broken completely. Need to iterate over the whole mask and process the submasks correctly, check if they form full identity and adjust indices correctly. Fixes https://github.com/llvm/llvm-project/issues/106126	2024-08-26 17:27:52 -07:00
Philip Reames	424b87b8d6	[RISCV][TTI] Use legalized element types when costing casts (#105723 ) This fixes a crash introduced by my ac6e1fd0c089043fe60bd0040ba3cad884f00206. I had failed to consider the case where a vector is truncated to an illegal element type. The resulting intermediate VT wasn't an MVT and we'd fail an assertion. Surprisingly, SLP does query illegal element types in some cases.	2024-08-22 16:19:48 -07:00
LiqinWeng	abaa53199e	[RISCV] Implement RISCVTTIImpl::shouldConsiderAddressTypePromotion for RISCV (#102560 ) This optimization helps reduce repeated calculations of base addresses by extracting type extensions when the same base address is accessed multiple times but its offset is a constant.	2024-08-15 10:37:04 +08:00
Philip Reames	ac6e1fd0c0	[RISCV][TTI] Cost non-power-of-two size changing casts (#101047 ) For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type. Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types. This is analogous to d3fd28a, but with the same reasoning applied to casts instead.	2024-08-13 14:58:16 -07:00
Jeremy Morse	bde243259b	Revert "[Asan] Provide TTI hook to provide memory reference infromation of target intrinsics. (#97070 )" This reverts commit e8ad87c7d06afe8f5dde2e4c7f13c314cb3a99e9. This reverts commit d3c9bb0cf811424dcb8c848cf06773dbdde19965. A few buildbots trip up on asan-rvv-intrinsics.ll. I've also reverted the follow-up commit d3c9bb0cf8. https://lab.llvm.org/buildbot/#/builders/46/builds/2895	2024-08-08 12:26:05 +01:00
Yeting Kuo	e8ad87c7d0	[Asan] Provide TTI hook to provide memory reference infromation of target intrinsics. (#97070 ) Previously, asan treated target intrinsics as black boxes, so it could not instrument accurate checks. This patch provides TTI hooks that let targets describe their intrinsic information to asan. Note, 1. this patch renames InterestingMemoryOperand to MemoryRefInfo. 2. this patch does not support RVV indexed/segment load/store.	2024-08-08 13:40:26 +08:00
Craig Topper	ad80265874	[RISCV] Qualify all XCV predicates with !is64Bit. (#101074 ) The tablegen patterns all have isRV32. I did not check if any of them could naively support RV64. Fixes #101067 and probably other bugs like it we haven't found yet.	2024-07-29 21:52:57 -07:00
Philip Reames	b66310f938	[RISCV][TTI] Split costing of [u/s]int_to_fp from fp_to_[u/s]int [nfc] (#101029 ) The amount of code sharing between them is fairly small, and the split version is much easier to read.	2024-07-29 09:32:36 -07:00
Philip Reames	d3fd28a134	[RISCV][TTI] Properly model odd vector sized LD/ST operations (#100436 ) The motivation for this change is the costing of a LD or ST with nearly power of 2 vectors (e.g. <3 x i32> or <7 x i32>) on V. There's an experimental option in SLP to allow emitting these if the cost model says they're profitable. This really helps with e.g. RGB vectors. Our actual lowering for these depends on whether a wider container type is known available. If so, we use a vle or vse on the wider type with a restricted VL. If not, we split until a legal type is found, and then apply the vle/vse on the sub-pieces. This change is intentionally restricted to only the case where promotion (widening w/VL predication) is involved. We appear to have at least one bug in our splitting lowering (see discussion on review), and to avoid exposing this more widely, I chose to not adjust costs for the splitting case. The current splitting costing assumes scalarization (which is not true of the actual lowering), but that has the effect of biasing vectorization away from such cases strongly. For the widening case, the true cost scales with the next largest legal type. The default implementation assumes that such a type is scalarized. Changing that brings our cost in line with our actual lowering decision. Note that since scalarization is not possible for scalable types, the prior costing falsely returned Invalid for that case.	2024-07-26 12:52:20 -07:00
Luke Lau	58854facb3	[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594 ) I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and rva22u64_v, and noticed that in a few cases rva22u64_v was considerably slower. One of them was 519.lbm_r, which has a large loop that was being unprofitably vectorized. It has an if/else in the loop which requires large amounts of predication when vectorized, but despite the loop vectorizer taking this into account the vector cost came out as cheaper than the scalar. It looks like the reason for this is that we cost scalar floating point ops as 2, but their vector equivalents as 1 (for LMUL 1). This comes from how we use BasicTTIImpl for scalars which treats floats as twice as expensive as integers. This patch doubles the cost of vector floating point arithmetic ops so that they're at least as expensive as their scalar counterparts, which gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60. Fixes #62576 (the last point there about scalar fsub/fmul)	2024-07-22 13:56:10 +08:00
Alex Bradbury	8687f7cd66	[RISCV] Support constant hoisting of immediate store values (#96073 ) Previously getIntImmInstCost only calculated the cost of materialising the argument of a store if it was the address. This means ConstantHoisting's transformation wouldn't kick in for cases like storing two values that require multiple instructions to materialise but where one can be cheaply generated from the other (e.g. by an addition). Two key changes were needed to avoid regressions when enabling this: * Allowing constant materialisation cost to be calculated assuming zeroes are free (as might happen if you had a 2XLEN constant and one half is zero). * Avoiding constant hoisting if we have a misaligned store that's going to be legalised to a sequence of narrower stores. I'm seeing cases where hoisting the constant ends up with worse codegen in that case. Out of caution and so as not to unexpectedly degrade other existing hoisting logic, FreeZeroes is used only for the new cost calculations for the load instruction. It would likely make sense to revisit this later.	2024-07-17 15:19:31 +01:00
Elvis Wang	4762f3bab0	[RISCV][TTI] Add cost of type based binOp VP intrinsics with functionalOPC. (#93435 ) Intrinsics not supported in the backend will fall into BasicTTIImpl, which will check if the VP intrinsic is a type based instruction. All type based instructions will fall into `getTypeBasedIntrinsicInstrCost()`, which doesn't support instructions with scalable vector types. This patch adds the instruction cost for type based binOp VP intrinsic instructions in the backend to get valid instruction costs. The cost of type based binOp VP intrinsics will be the same as their non-VP counterparts.	2024-07-05 08:13:18 +08:00
Philip Reames	25b65be43d	[RISCV][LSR] Account for temporary register for base addition (#92296 ) An LSR formula may require the addition of multiple base or scale registers; this sum reduction requires a temporary register to perform. Since the formulas are independent, we only need one temporary, regardless of the number of unique formulas. Each formula can reuse the same temporary. A later CSE pass may come along and combine sub-expressions - but then the register pressure would be that pass's problem to consider. This change fixes up the costing in the RISCV specific way, but this is really a generic LSR problem. I just didn't feel like fighting with LSR and dealing with all the various targets swinging slightly in hard to reason about ways. This problem is more pronounced on RISCV than any other target due to our lack of addressing modes. This change is not hugely important on its own, but I have an upcoming change to add support for shNadd in LSR which biases us fairly strongly towards adding more "base adds". Without this change, we see net regression due to the increase in register pressure which is not accounted for.	2024-05-22 13:38:39 -07:00
Elvis Wang	b60e62896e	[RISCV][CostModel] Remove cost of icmp inst in icmp+select with SFB. (#91158 ) With ShortForwardBranchOpt (SFB) or ConditionalMoveFusion, scalar ICmp and scalar Select instructions are lowered to SELECT_CC and then to PseudoCCMOVGPR, which generates a conditional branch instruction and a move instruction. The cost of scalar (ICmp + Select) = (0 + Select instruction cost)	2024-05-20 16:03:18 +08:00
Craig Topper	487b43cdc9	[RISCV] Pass subvector type to isLegalInterleavedAccessType in getInterleavedMemoryOpCost. (#91825 ) isLegalInterleavedAccessType expects the subvector type, but getInterleavedMemoryOpCost is called with the full vector type. So we need to divide by Factor.	2024-05-15 21:47:29 -07:00
Min-Yih Hsu	4c68de5a00	[RISCV][CostModel] Add cost model for experimental.cttz.elts (#91778 ) The cost of `experimental.cttz.elts` in RISC-V equals the cost of vfirst when the zero_is_poison argument is true. Otherwise, we add additional costs of cmp + select to convert the -1 result from vfirst to EVL.	2024-05-14 09:18:08 -07:00
Shih-Po Hung	22213d5883	Recommit [RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCost (#89170 ) Insert a break to fix the implicit-fallthrough caught by sanitizer. Original commit message: This patch made following changes: 1. Support ISD FDIV/UDIV/SDIV/UREM/SREM 2. Classify instructions which cost the same	2024-05-12 20:10:51 -07:00
ShihPo Hung	d67c3a4b1f	Revert "[RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCost (#89170 )" This reverts commit ed16e7aac44f2024b45d8c6c9dc2817d77d0ea97.	2024-05-12 19:57:40 -07:00
Shih-Po Hung	ed16e7aac4	[RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCost (#89170 ) This patch made following changes: 1. Support ISD FDIV/UDIV/SDIV/UREM/SREM 2. Classify instructions which cost the same	2024-05-13 09:47:57 +08:00
Mel Chen	3f1fef3699	[RISCV] Support interleaved accesses for scalable vector. (#90583 ) The support for interleaved accesses for scalable vector with a factor of 2 is enabled in vectorizer. Therefore, the patch removed the restriction for scalable vector with a factor of 2.	2024-05-03 21:56:31 +08:00
Shih-Po Hung	097b68ff06	[RISCV][TTI] Refine the cost of FCmp (#88833 ) This patch introduces following changes - Support all fp predicates - Use the Val type to estimate the latency/throughput cost - Assign a cost of 1 for mask operations as LMULCost for mask types cannot be correctly estimated.	2024-04-18 09:44:31 +08:00
Shih-Po Hung	f3a8112d98	[RISCV][TTI] Scale the cost of ICmp with LMUL (#88235 ) Use the Val type to estimate the instruction cost for ICmp.	2024-04-16 09:37:32 +08:00
Shih-Po Hung	3d985a6f1b	[RISCV][TTI] Scale the cost of Select with LMUL (#88098 ) Use the Val type to estimate the instruction cost for SelectInst.	2024-04-10 14:18:15 +08:00
Shih-Po Hung	ee52add6cb	[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931 ) This patch uses the argument type to infer the LMUL cost for the index generation, add, and comparison.	2024-04-10 10:08:33 +08:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934 ) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172 ) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Shih-Po Hung	97523e5321	[RISCV][TTI] Scale the cost of intrinsic stepvector with LMUL (#87301 ) Use the return type to measure the LMUL size for latency/throughput cost	2024-04-04 08:30:15 +08:00
Shih-Po Hung	d7a43a00fe	[RISCV][TTI] Scale the cost of trunc/fptrunc/fpext with LMUL (#87101 ) Use the destination data type to measure the LMUL size for latency/throughput cost	2024-04-02 09:30:51 +08:00
Shih-Po Hung	84f24c2daf	[RISCV][TTI] Scale the cost of intrinsic umin/umax/smin/smax with LMUL (#87245 ) Use the return type to measure the LMUL size for throughput/latency cost	2024-04-02 09:26:27 +08:00
Shih-Po Hung	c7954ca312	Recommit "[RISCV] Refine cost on Min/Max reduction (#79402 )" (#86480 ) This is recommitted as the test and fix for llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697	2024-04-01 14:44:10 +08:00
ShihPo Hung	aa2d5d5413	Recommit "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617 )" Changes in Recommit: Add an additional check on sign/zero extend to the same type. Original message: Use the destination data type to measure the LMUL size for latency/throughput cost	2024-03-26 23:41:16 -07:00
Jianjian Guan	05a7b22a01	[RISCV] Add areInlineCompatible for riscv target (#86639 ) Inline a callee if its target-features are a subset of the caller's target-features.	2024-03-27 14:16:03 +08:00
ShihPo Hung	da3e58e74a	Revert "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617 )" This reverts commit 7545c635729a2055a429c5decd26a619a8d6e74b as it's failing on the Linux bots.	2024-03-26 21:47:32 -07:00
Shih-Po Hung	7545c63572	[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617 ) Use the destination data type to measure the LMUL size for latency/throughput cost	2024-03-27 10:58:17 +08:00
Craig Topper	2fbc40d36d	[RISCV] Split compound if statement to fix a crash. We're not allowed to call getELEN when the vector extension is not enabled. If we're looking at a vector type, isTypeLegal would only return true if the vector extensions are enabled. So early out for non-vector types before we call isTypeLegal and getELEN.	2024-03-26 11:53:17 -07:00
ShihPo Hung	5dc0c75aab	[RISCV][TTI] Fix missing return in the end of function	2024-03-25 23:32:18 -07:00
Shih-Po Hung	817f453aa5	[RISCV][TTI] Refactor getCastInstrCost to exit early (#86619 ) To reduce indentation by using early returns, this patch hoists the returns for illegal types and non-vector types earlier. It should mostly be an NFC.	2024-03-26 14:15:40 +08:00
Shih-Po Hung	3cb024198f	[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum (#80697 ) The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs if any element of the vector is a NaN. Following #79402, the patch adds the cost for NaN check (vmfne + vcpop)	2024-03-25 17:17:36 +08:00
Kolya Panchenko	aa68e2814d	[RISCV] Support `llvm.masked.compressstore` intrinsic (#83457 ) The changeset enables lowering of `llvm.masked.compressstore(%data, %ptr, %mask)` for RVV for fixed vector type into: ``` %0 = vcompress %data, %mask, %vl %new_vl = vcpop %mask, %vl vse %0, %ptr, %1, %new_vl ``` Such lowering is only possible when `%data` fits into available LMULs; otherwise `llvm.masked.compressstore` is scalarized by the `ScalarizeMaskedMemIntrin` pass. Even though the RVV spec in section `15.8` provides an alternative sequence for compressstore, use of `vcompress + vcpop` should be a proper canonical form to lower `llvm.masked.compressstore`. If a RISC-V target finds the sequence from `15.8` better, a peephole optimization can transform `vcompress + vcpop` into that sequence.	2024-03-13 15:18:51 -04:00
Visoiu Mistrih Francis	eceb24c439	[RISCV] Hoist immediate addresses from loads/stores (#83644 ) In case of loads/stores from an immediate address, avoid rematerializing the constant for every block and allow consthoist to hoist it to the entry block.	2024-03-05 22:41:56 -08:00
Shih-Po Hung	fb67dce1cb	[RISCV] Fix crash when unrolling loop containing vector instructions (#83384 ) When MVT is not a vector type, TCK_CodeSize should return an invalid cost. This patch adds a check in the beginning to make sure all cost kinds return invalid costs consistently. Before this patch, TCK_CodeSize returns a valid cost on scalar MVT but other cost kinds don't. This fixes the issue #83294 where a loop contains vector instructions and MVT is scalar after type legalization when the vector extension is not enabled.	2024-03-02 12:33:55 +08:00
Shih-Po Hung	6ee9c8afbc	[RISCV][CostModel] Updates reduction and shuffle cost (#77342 ) - Make `andi` cost 1 in SK_Broadcast - Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL	2024-02-29 15:41:19 +08:00
Philip Reames	f037e709ca	[RISCV][TTI] Cost a subvector extract at a register boundary with exact vlen (#82405 ) If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering uses this knowledge to replace the vslidedown.vi with a sub-register extract. Our costs can reflect that as well. This is another piece split off https://github.com/llvm/llvm-project/pull/80164 --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>	2024-02-21 07:56:08 -08:00
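
The exact-VLEN reasoning in the f037e709ca and 2e7c7d20d5 entries comes down to index arithmetic: with a known VLEN, an element index splits into a register number within the LMUL group and a position inside that register. The following is a minimal sketch of that arithmetic only. The names `elemsPerReg`, `locateElement`, and `startsAtRegBoundary` are hypothetical and are not LLVM APIs, and the sketch does not reproduce the actual cost computation in the RISC-V TTI implementation.

```cpp
// Hypothetical illustration; none of these names exist in LLVM.

// Position of one element inside a register group.
struct RegSlot {
  unsigned RegIndex;  // which vector register of the group holds the element
  unsigned ElemInReg; // element position within that single register
};

// Number of ElemBits-wide elements that fit in one VLenBits-wide register.
// Assumes ElemBits is nonzero and divides VLenBits.
inline unsigned elemsPerReg(unsigned VLenBits, unsigned ElemBits) {
  return VLenBits / ElemBits;
}

// With an exact VLEN, the index is effectively taken modulo the number of
// elements in a single vector register.
inline RegSlot locateElement(unsigned VLenBits, unsigned ElemBits,
                             unsigned Idx) {
  const unsigned N = elemsPerReg(VLenBits, ElemBits);
  return RegSlot{Idx / N, Idx % N};
}

// A subvector starting at Idx begins on a register boundary when Idx is a
// multiple of the per-register element count.
inline bool startsAtRegBoundary(unsigned VLenBits, unsigned ElemBits,
                                unsigned Idx) {
  return Idx % elemsPerReg(VLenBits, ElemBits) == 0;
}
```

For example, assuming VLEN=128 and i32 elements, each register holds four elements, so index 6 maps to register 1, element 2, and a subvector starting at index 4 begins on a register boundary.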


261 Commits