llvm-project

Author	SHA1	Message	Date
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
Craig Topper	bd261ecc5a	[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509 ) Most of these are just places that want the first user and aren't iterating over the whole list. While there I changed some use_size() == 1 to hasOneUse() which is more efficient. This is part of an effort to rename use_iterator to user_iterator and provide a use_iterator that dereferences to SDUse&. This patch helps reduce the diff on later patches.	2024-12-18 22:13:04 -08:00
Craig Topper	104ad9258a	[SelectionDAG] Rename SDNode::uses() to users(). (#120499 ) This function is most often used in range based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than SDUse * so users() is a better name. I've long been annoyed that we can't write a range based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.	2024-12-18 20:09:33 -08:00
Craig Topper	dc72ec808d	[RISCV] Custom legalize vp.merge for mask vectors. (#120479 ) The default legalization uses vmslt with a vector of XLen to compute a mask. This doesn't work if the type isn't legal. For fixed vectors it will scalarize. For scalable vectors it crashes the compiler. This patch uses an alternate strategy that promotes the i1 vector to an i8 vector and does the merge. I don't claim this to be the best lowering. I wrote it quickly almost 3 years ago when a crash was reported in our downstream. Fixes #120405.	2024-12-18 19:19:14 -08:00
Philip Reames	984cb791db	[RISCV] Use vmv.v.x to materialize masks in deinterleave2 lowering (#118500 ) This is a follow up to 2af2634 to use vmv.v.x of i8 constants instead of the prior vid/vand/vmsne sequence. The advantage of the vmv.v.x sequence is that it's always m1 (so cheaper at high LMUL), and can be rematerialized by the register allocator if needed to locally reduce register pressure.	2024-12-17 12:50:09 -08:00
Brendan Sweeney	bfe8a21bad	[RISCV][ISEL] Lowering to load-acquire/store-release for RISCV Zalasr (#82914 ) Lowering to load-acquire/store-release for RISCV Zalasr. Currently uses the psABI lowerings for WMO load-acquire/store-release (which are identical to A.7). These are incompatible with the A.6 lowerings currently used by LLVM. This should be OK for now since Zalasr is behind the enable experimental extensions flag, but needs to be fixed before it is removed from that. For TSO, it uses the standard Ztso mappings except for lowering seq_cst loads/stores to load-acquire/store-release; I had Andrea review that.	2024-12-17 00:19:45 -08:00
Luke Lau	088db868f3	[RISCV] Merge shuffle sources if lanes are disjoint (#119401 ) In x264, there are a few kernels with shuffles like this: %41 = add nsw <16 x i32> %39, %40 %42 = sub nsw <16 x i32> %39, %40 %43 = shufflevector <16 x i32> %41, <16 x i32> %42, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16> Because this is a complex two-source shuffle, this will get lowered as two vrgather.vvs that are blended together. vadd.vv v20, v16, v12 vsub.vv v12, v16, v12 vrgatherei16.vv v24, v20, v10 vrgatherei16.vv v24, v12, v16, v0.t However, the indices coming from each source are disjoint, so we can blend the two together and perform a single source shuffle instead: %41 = add nsw <16 x i32> %39, %40 %42 = sub nsw <16 x i32> %39, %40 %43 = select <0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1> %41, %42 %44 = shufflevector <16 x i32> %43, <16 x i32> poison, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 10, i32 14, i32 6, i32 2, i32 9, i32 13, i32 5, i32 1, i32 8, i32 12, i32 4, i32 0> The select will likely get merged into the preceding instruction, and then we only have to do one vrgather.vv: vadd.vv v20, v16, v12 vsub.vv v20, v16, v12, v0.t vrgatherei16.vv v24, v20, v10 This patch bails if either of the sources is a broadcast/splat/identity shuffle, since that will usually already have some sort of cheaper lowering. This improves performance on 525.x264_r by 4.12% with -O3 -flto -march=rva22u64_v on the spacemit-x60.	2024-12-12 13:41:27 +08:00
Luke Lau	2698fc699b	[RISCV] Refactor helper in isDesirableToCommuteWithShift. NFC (#119526 ) Instead of duplicating the loop twice, add arguments to the lambda. I plan on reusing this in #119527	2024-12-12 13:24:28 +08:00
Raphael Moreira Zinsly	708a478d67	[RISCV] Add stack clash protection (#117612 ) Enable `-fstack-clash-protection` for RISCV and stack probe for function prologues. We probe the stack by creating a loop that allocates and probes the stack in ProbeSize chunks. We emit an unrolled probe loop for small allocations and emit a variable length probe loop for bigger ones.	2024-12-10 16:48:26 +00:00
LiqinWeng	3083acc215	[DAGCombine] Remove oneuse restrictions for RISCV in folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2) in some scenarios (#101294 ) This patch removes the restriction for folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2), and adds a test case from Dhrystone; see these links: riscv32: https://godbolt.org/z/o8GdMKrae riscv64: https://godbolt.org/z/Yh5bPz56z	2024-12-10 11:17:54 +08:00
David Sherwood	8630a7ba7c	Reapply "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 )" (#118823 ) [Reverts d57892a2a153ab71a796f07e39d939eae6910c21] For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant. --------- Co-authored-by: Paul Walker <paul.walker@arm.com>	2024-12-09 10:56:44 +00:00
Philip Reames	02ad623bb5	[RISCV] Prefer strided store for interleave store with one lane active (#119027 ) If we're performing a segment store and all but one of the segments are undefined, that's equivalent to performing a strided store of the one active segment. This is the store side of a905203b. As before, this only covers fixed vectors.	2024-12-06 16:45:01 -08:00
Michael Maitland	84efad0b47	[RISCV][MRI] Account for fixed registers when determining callee saved regs (#115756 ) This fixes https://discourse.llvm.org/t/fixed-register-being-spill-and-restored-in-clang/83058. We need to do it in `MachineRegisterInfo::getCalleeSavedRegs` instead of `RISCVRegisterInfo::getCalleeSavedRegs` since the MF argument of `TargetRegisterInfo::getCalleeSavedRegs` is `const`, so we can't call `MF->getRegInfo().disableCalleeSavedRegister` there. So to put it in `MachineRegisterInfo::getCalleeSavedRegs`, we move `isRegisterReservedByUser` into `TargetSubtargetInfo`.	2024-12-06 14:07:27 -05:00
Pengcheng Wang	35619c791d	[RISCV] Add tune info for mem* expansion (#118439 ) So that CPUs can tune these options.	2024-12-06 14:48:37 +08:00
Philip Reames	e60a939a51	[RISCV] Use zext and shift for spread(4,8) when types allow (#118893 ) For a spread with an element type small enough, we can use a zext and shift to perform the shuffle. For e8, this covers spread(2,4,8), and for e16 covers spread(2,4). Note that spread(2) is already covered by the existing interleave logic, and is simply listed for completeness in the prior description.	2024-12-05 16:34:15 -08:00
Philip Reames	66a0a08133	[RISCV] Extract spread(2,4,8) shuffle lowering from interleave(2) (#118822 ) This is a prep patch for improving spread(4,8) shuffles. I also think it improves the readability of the existing code, but the primary motivation is simply staging work.	2024-12-05 11:32:27 -08:00
Mikhail R. Gadelha	59a9e4d8a4	[RISCV] Update matchSplatAsGather to convert vectors if they have different sizes (#117878 ) This patch updates the matchSplatAsGather function so we can handle vectors of different sizes. The goal is to improve the code gen for @llvm.experimental.vector.match on RISCV. Currently, we use a scalar extract and splat instead of vrgather, and the patch changes that.	2024-12-05 11:47:02 -03:00
Vitaly Buka	d57892a2a1	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693 ) Reverts llvm/llvm-project#117566 Breaks libc++ tests with HWASAN https://lab.llvm.org/buildbot/#/builders/55/builds/3959	2024-12-04 12:36:46 -08:00
Philip Reames	a6e7749ea9	[RISCV] Improve lowering of spread(2) shuffles (#118658 ) A spread(2) shuffle is just an interleave with an undef lane. The existing lowering was reusing the even lane for the undef value. This was entirely legal, but non-optimal.	2024-12-04 12:21:08 -08:00
David Sherwood	4675db5f39	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-12-04 10:26:51 +00:00
Craig Topper	b076fbb844	[TargetLowering] Use Type* instead of EVT in shouldSignExtendTypeInLibCall. (#118587 ) I want to use this function for GISel too so Type * is a better common interface. All of the callers already convert EVT to Type * as needed by calling lowering anyway.	2024-12-03 22:06:55 -08:00
Brandon Wu	109e4a147f	[RISCV] Handle zeroinitializer of vector tuple Type (#113995 ) It doesn't make sense to add a new generic ISD to handle riscv tuple type. Instead we use `SPLAT_VECTOR` for ISD and further lower to `VMV_V_X`. Note: If there's `visitSPLAT_VECTOR` in generic DAG combiner, it needs to skip riscv vector tuple type. Stacked on https://github.com/llvm/llvm-project/pull/114329	2024-12-04 13:40:02 +08:00
Philip Reames	c1afcaf33b	[RISCV] Match deinterleave(4,8) shuffles to SHL/TRUNC when legal (#118509 ) We can extend the existing SHL+TRUNC lowering used for deinterleave2 for deinterleave4, and deinterleave8 when the result types are small enough to allow the shift to be legal. On RV64, this means i8 and i16 results for deinterleave4 and i8 results for deinterleave8.	2024-12-03 19:35:52 -08:00
Philip Reames	b8805c88ce	[RISCV] Clang-format a few lines to remove diff in a nearby patch	2024-12-03 10:42:17 -08:00
Philip Reames	2af2634c64	[RISCV] Use vcompress in deinterleave2 intrinsic lowering (#118325 ) This is analogous to febbf91 which added shuffle lowering using vcompress; we can do the same thing in the deinterleave2 lowering path which is used for scalable vectors. Note that we can further improve this for high lmul usage by adjusting how we materialize the mask (whose result is at most m1 with a known bit pattern). I am deliberately staging the work so that the changes to reduce register pressure are more easily evaluated on their own merit.	2024-12-02 18:37:32 -08:00
Philip Reames	56cb5cbfcd	[RISCV] Remove RISCVISD::VNSRL_VL and adjust deinterleave lowering to match (#118391 ) Instead of directly lowering to vnsrl_vl and having custom pattern matching for that case, we can just lower to a (legal) shift and truncate, and let generic pattern matching produce the vnsrl. The major motivation for this is that I'm going to reuse this logic to handle e.g. deinterleave4 w/ i8 result. The test changes aren't particularly interesting. They're minor code improvements - I think because we do slightly better with the insert_subvector patterns, but that's mostly irrelevant.	2024-12-02 13:39:12 -08:00
Philip Reames	c6f2d35c4d	Fix a build warning introduced by my febbf910	2024-11-27 13:41:29 -08:00
Philip Reames	febbf9105f	[RISCV] Match vcompress during shuffle lowering (#117748 ) This change matches a subset of vcompress patterns during shuffle lowering. The subset implemented requires a contiguous prefix of demanded elements followed by undefs. This subset was chosen for two reasons: 1) which elements to spuriously demand is a non-obvious problem, and 2) my first several attempts at implementing the general case were buggy. I decided to go with the simple case to start with. vcompress scales better with LMUL than a general vrgather and, at least on the SpaceMit X60, has higher throughput even at m1. It also has the advantage of requiring smaller vector constants at one bit per element as opposed to vrgather which is a minimum of 8 bits per element. The downside to using vcompress is that we can't fold a vselect into it, as there is no masked vcompress variant. For reference, here are the relevant throughputs from camel-cdr's data table on BP3 (X60):
vrgather.vv v8,v16,v24      4.0  16.0  64.0  256.0
vcompress.vm v8,v16,v24     3.0  10.0  36.0  136.
vmerge.vvm v8,v16,v24,v0    2.0   4.0   8.0   16.0
The largest concern with the extra vmerge is that we locally increase register pressure. If we do have masking, we also have a passthru; without the ability to fold that into the vcompress, we need to keep it alive a bit longer. This can hurt at e.g. m8 where we have very few architectural registers. As compared with the vrgather.vv sequence, this is only one additional m1 VREG - since we no longer need the index vector. It compares slightly worse against vrgatherei16.vv which can use index vectors smaller than other operands. Note that we could potentially fold the vmerge if only tail elements are being preserved; I haven't investigated this. It is unfortunately hard given our current lowering structure to know if we're emitting a shuffle where masking will follow. Thankfully, it doesn't seem to show up much in practice, so I think we can probably ignore it. 
This patch only handles single source compress idioms at the moment. This is an effort to avoid interacting with other patches under review for changing how we canonicalize length changing shuffles.	2024-11-27 13:23:18 -08:00
Craig Topper	c2bb056482	[SelectionDAG][RISCV][AArch64] Allow f16 STRICT_FLDEXP to be promoted. Fix integer promotion of STRICT_FLDEXP in type legalizer. (#117633 ) A special case in type legalization wasn't accounting for different operand numbering between FLDEXP and STRICT_FLDEXP. AArch64 already asked STRICT_FLDEXP to be promoted, but had no test for it.	2024-11-25 16:12:45 -08:00
Craig Topper	ed6749a405	[RISCV] Promote frexp with Zfh. The default expansion tries to create an illegal integer type after legalization.	2024-11-25 10:27:37 -08:00
Craig Topper	20bd029a40	[RISCV] Promote fldexp with Zfh. (#117396 ) The default expansion tries to create i16 operations after type legalization. Fixes #117349	2024-11-25 09:08:56 -08:00
David Sherwood	9b76e7fc60	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 )" (#117556 ) This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.	2024-11-25 13:49:21 +00:00
David Sherwood	22ec44f509	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-11-25 09:25:01 +00:00
Craig Topper	29afbd5893	[RISCV] Add DAG combine to convert (iX ctpop (bitcast (vXi1 A))) into vcpop.m. (#117062 ) This only handles the simplest case where vXi1 is a legal vector type. If the vector type isn't legal we need to go through type legalization, but the pattern gets much harder to recognize after that. Either because ctpop gets expanded due to Zbb not being enabled, or the bitcast becoming a bitcast+extractelt, or the ctpop being split into multiple ctpops and adds, etc.	2024-11-21 11:12:07 -08:00
Craig Topper	cdd1e27124	[X86][RISCV] Don't emit JumpTableDebugInfo unless triple is OSBinFormatCOFF. (#117083 ) This makes the override in RISCV and X86 consistent with the base class implementation of expandIndirectJTBranch.	2024-11-21 09:38:16 -08:00
Sam Elliott	408659c5b5	[RISCV] Merge GPRPair and GPRF64Pair (#116094 ) As suggested by Craig, this tries to merge the two sets of register classes created in #112983, GPRPair* and GPRF64Pair*. - I added some explicit annotations to `RISCVInstrInfoD.td` which fixed the type inference issues I was seeing from tablegen for select patterns. - I've had to make the behaviour of `splitValueIntoRegisterParts` and `joinRegisterPartsIntoValue` cover more cases, because you cannot bitcast to/from untyped (the bitcast would otherwise have been inserted automatically by TargetLowering code). - I apparently didn't need to change `getNumRegisters` again, which continues to tell me there's a bug in the code for tied inputs. I added some more test coverage of this case but it didn't seem to help find the asserts I was finding before - I think the difference is between the default behaviour for integers which doesn't apply to floats. - There's still a difference between BuildGPRPair and BuildPairF64 (and the same for SplitGPRPair and SplitF64). I'm not happy with this; I think it's quite confusing, as they're very similar, just differing in whether they give an `untyped` or an `f64`. I haven't really worked out how the DAGCombiner copes if one meets the other; I know we have some of this for the f64 variants already, but they're a lot more complex than the GPRPair variants anyway.	2024-11-20 10:08:55 +00:00
Sam Elliott	c4030c896d	[RISCV] Fix FP64 DinX R Regclass (#116688 ) This was a typo in llvm/llvm-project#112983 that didn't cause build failures but is still wrong.	2024-11-19 12:42:27 +00:00
Craig Topper	ac17b50f50	[RISCV] Use getSignedTargetConstant. NFC	2024-11-18 12:20:44 -08:00
Craig Topper	900c056531	[RISCV] Add an implementation of findRepresentativeClass to assign i32 to GPRRegClass for RV64. (#116165 ) This is an alternative fix for #81192. This allows the SelectionDAG scheduler to be able to find a representative register class for i32 on RV64. The representative register class is the super register class with the largest spill size that is also legal. The default implementation of findRepresentativeClass only works for legal types, which i32 is not on RV64. I did some investigation of why tablegen uses i32 in output patterns on RV64. It appears it comes down to a function called ForceArbitraryInstResultType that picks a type for the output pattern when the isel pattern isn't specific enough. I believe it picks the smallest type (lowest numbered) to resolve the conflict. A similar issue occurs for f16 and bf16 which both use the FPR16 register class. If the isel pattern doesn't specify, tablegen may find both f16 and bf16 and may pick bf16 from Zfh pattern when Zfbfmin isn't present. Since bf16 isn't legal in that case, findRepresentativeClass will fail. For i8, i16, i32, this patch calls the base class with XLenVT to get the representative class since XLenVT is always legal. For bf16/f16, we call the base class with f32 since all of the f16/bf16 extensions depend on either F or Zfinx which will make f32 a legal type. The final representative register class further depends on whether D or Zdinx is also enabled, but that should be handled by the default implementation.	2024-11-18 10:07:20 -08:00
Sam Elliott	4615cc38f3	[RISCV] Inline Assembly Support for GPR Pairs ('R') (#112983 ) This patch adds support for getting even-odd general purpose register pairs into and out of inline assembly using the `R` constraint as proposed in riscv-non-isa/riscv-c-api-doc#92. There are a few different pieces to this patch, each of which needs its own explanation. - Renames the Register Class used for f64 values on rv32i_zdinx from `GPRPair` to `GPRF64Pair`. These register classes are kept broadly unmodified, as their primary value type is used for type inference over selection patterns. This rename affects quite a lot of files. - Adds new `GPRPair` register classes which will be used for `R` constraints and for instructions that need an even-odd GPR pair. This new type is used for `amocas.d.`(rv32) and `amocas.q.`(rv64) in Zacas, instead of the `GPRF64Pair` class being used before. - Marks the new `GPRPair` class legal for holding a `MVT::Untyped`. Two new RISCVISD node types are added for creating and destructing a pair - `BuildGPRPair` and `SplitGPRPair`, and are introduced when bitcasting to/from the pair type and `untyped`. - Adds functionality to `splitValueIntoRegisterParts` and `joinRegisterPartsIntoValue` to handle changing `i<2xlen>` MVTs into `untyped` pairs. - Adds an override for `getNumRegisters` to ensure that `i<2*xlen>` values, when going to/from inline assembly, only allocate one (pair) register (they would otherwise allocate two). This is due to a bug in SelectionDAGBuilder.cpp which other backends also work around. - Ensures that Clang understands that `R` is a valid inline assembly constraint. - This also allows `R` to be used for `f64` types on `rv32_zdinx` architectures, where doubles are stored in a GPR pair.	2024-11-18 17:45:58 +00:00
Brandon Wu	206ee71918	[RISCV] Change vector tuple type's TypeSize to scalable (#114329 ) A vector tuple is basically multiple grouped vectors, so its size is also determined by vscale. We need not model it as a vector type, but its size needs to be scalable.	2024-11-17 18:52:49 +08:00
Brandon Wu	b4adce0056	[RISCV] Tuple intrinsics are creating overly aligned memory operands (#115804 ) The alignment should be the same as that of its element type.	2024-11-15 14:12:07 +08:00
Craig Topper	0019565e93	[RISCV] Don't create BuildPairF64 or SplitF64 nodes without D or Zdinx. (#116159 ) The fix in ReplaceNodeResults is the only one really required for the known crash. I couldn't hit the case in LowerOperation because that requires (f64 (bitcast i64)), but the result type is softened before the input so we don't get a chance to legalize the input. The change to the setOperationAction call was an observation that an i64<->vector cast should not be custom legalized on RV32. The custom code already calls isTypeLegal on the scalar type.	2024-11-14 09:54:33 -08:00
Kazu Hirata	4048c64306	[llvm] Remove redundant control flow statements (NFC) (#115831 ) Identified with readability-redundant-control-flow.	2024-11-12 10:09:42 -08:00
Kazu Hirata	82d5dd28b4	[RISCV] Remove unused includes (NFC) (#115814 ) Identified with misc-include-cleaner.	2024-11-11 22:54:54 -08:00
Luke Lau	b2e2d8b3f6	[RISCV] Enable scalable loop vectorization for zvfhmin/zvfbfmin (#115272 ) This PR enables scalable loop vectorization for f16 with zvfhmin and bf16 with zvfbfmin. Enabling this was dependent on filling out the gaps for scalable zvfhmin/zvfbfmin codegen, but everything that the loop vectorizer might emit should now be handled. It does this by marking f16 and bf16 as legal in `isLegalElementTypeForRVV`. There are a few users of `isLegalElementTypeForRVV` that have already been enabled in other PRs: - `isLegalStridedLoadStore` #115264 - `isLegalInterleavedAccessType` #115257 - `isLegalMaskedLoadStore` #115145 - `isLegalMaskedGatherScatter` #114945 The remaining user is `isLegalToVectorizeReduction`. We can't promote f16/bf16 reductions to f32 so we need to disable them for scalable vectors. The cost model actually marks these as invalid, but for out-of-tree reductions `ComputeReductionResult` doesn't get costed and it will end up emitting a reduction intrinsic regardless, so we still need to mark them as illegal. We might be able to remove this restriction later for fmax and fmin reductions.	2024-11-11 13:29:48 +08:00
Gergely Futo	a25d91a164	[RISCV] Skip DAG combine for bitcast fabs/fneg (#115325 ) Disable the DAG combine for bitcast fabs/fneg in case of the zdinx extension. The combine folds the fabs/fneg nodes in some cases. This might result in suboptimal code if compiled with the zdinx extension. In case of the zdinx extension, there is no need to load the double value from an x register to an f register, so the combine can be skipped.	2024-11-08 08:37:46 +01:00
Philip Reames	02668f60a9	[RISCV] Match single source deinterleave shuffles for vnsrl (#114878 ) We had previously only been matching the two source case where both sources came from a wider source type. We can also match the single source case - provided the result is m4 or smaller because we will need a wider type to represent the source. The main goal of this is to ensure that vnsrl matching is robust to a possible change in canonicalization for length changing shuffles that I'm considering, but it has the nice effect of picking up a few cases we missed along the way.	2024-11-07 14:34:05 -08:00
Luke Lau	343a810725	[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal strided access (#115264 ) This is also split off from the zvfhmin/zvfbfmin isLegalElementTypeForRVV work. Enabling this will cause SLP and RISCVGatherScatterLowering to emit @llvm.experimental.vp.strided.{load,store} intrinsics, and codegen support for this was added in #109387 and #114750.	2024-11-07 14:40:15 +08:00
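
Commit 088db868f3 turns a two-source shuffle into a blend followed by a single-source shuffle. The check behind it, that the two sources are read at disjoint lanes, can be modelled roughly as below. This is a hypothetical Python sketch, not the LLVM implementation. It makes these assumptions:
- masks are plain lists;
- -1 stands for an undef lane;
- merge_disjoint_sources, shuffle1, shuffle2 and blend are illustrative names only.

It also does not model the patch's bail-out for broadcast, splat or identity sources.

```python
from typing import List, Optional, Tuple

UNDEF = -1  # hypothetical encoding of an undef/poison mask element


def merge_disjoint_sources(
    mask: List[int], n: int
) -> Optional[Tuple[List[bool], List[int]]]:
    """Try to rewrite a two-source shuffle as blend + single-source shuffle.

    mask indexes the concatenation of two n-lane sources (0..n-1 is the
    first source, n..2n-1 the second). Returns (select, new_mask), where
    select[k] is True if lane k of the blended vector is taken from the
    second source, or None if some lane is read from both sources.
    """
    from_second: List[Optional[bool]] = [None] * n
    for idx in mask:
        if idx == UNDEF:
            continue
        lane, second = idx % n, idx >= n
        if from_second[lane] is None:
            from_second[lane] = second
        elif from_second[lane] != second:
            return None  # lane wanted from both sources: not disjoint
    # Lanes nobody reads may come from either source; default to the first.
    select = [bool(s) for s in from_second]
    new_mask = [UNDEF if idx == UNDEF else idx % n for idx in mask]
    return select, new_mask


def shuffle1(v: List[int], mask: List[int]) -> List[Optional[int]]:
    """Single-source shuffle; None models an undef result lane."""
    return [None if idx == UNDEF else v[idx] for idx in mask]


def shuffle2(a: List[int], b: List[int], mask: List[int]) -> List[Optional[int]]:
    """Two-source shuffle over the concatenation a ++ b."""
    return shuffle1(a + b, mask)


def blend(a: List[int], b: List[int], select: List[bool]) -> List[int]:
    """Per-lane select: b[k] where select[k] is True, else a[k]."""
    return [bk if s else ak for ak, bk, s in zip(a, b, select)]


# The two-source mask quoted in the 088db868f3 message.
MASK = [11, 15, 7, 3, 26, 30, 22, 18, 9, 13, 5, 1, 24, 28, 20, 16]
a, b = list(range(100, 116)), list(range(200, 216))
select, new_mask = merge_disjoint_sources(MASK, 16)
print(new_mask)  # [11, 15, 7, 3, 10, 14, 6, 2, 9, 13, 5, 1, 8, 12, 4, 0]
# One shuffle of the blended vector gives the same lanes as the original.
assert shuffle1(blend(a, b, select), new_mask) == shuffle2(a, b, MASK)
```

The new mask is simply each index reduced modulo the source length. This matches the single-source shufflevector mask in that commit message. Here select is expressed per lane of the blended vector.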

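Several of the shuffle entries use the same widening trick, on a little-endian lane layout:
- deinterleave(N) is lowered as a bitcast to a wider element, a right shift and a truncate (56cb5cbfcd, c1afcaf33b).
- spread(N) is lowered as a zero-extend and a left shift (e60a939a51).

The sketch below models only the lane arithmetic, with vectors as lists of unsigned integers. It is hedged in three ways:
- the function names are hypothetical;
- MAX_WIDE_BITS = 64 is an assumed stand-in for the legality check the commits describe for RV64;
- the undef lanes of a spread are modelled as zero.

```python
from typing import List

MAX_WIDE_BITS = 64  # assumption: widest legal element, per the RV64 notes above


def deinterleave_via_shift(lanes: List[int], width: int, factor: int, index: int) -> List[int]:
    """Compute lanes[index::factor] as bitcast + shift right + truncate."""
    if factor * width > MAX_WIDE_BITS or not 0 <= index < factor:
        raise ValueError("unsupported deinterleave")
    if len(lanes) % factor:
        raise ValueError("lane count must be a multiple of factor")
    mask = (1 << width) - 1
    out = []
    for g in range(0, len(lanes), factor):
        # Bitcast: pack `factor` narrow lanes into one wide lane, lane 0 lowest.
        packed = 0
        for k in range(factor):
            packed |= (lanes[g + k] & mask) << (k * width)
        # Shift the wanted narrow lane to the bottom, then truncate.
        out.append((packed >> (index * width)) & mask)
    return out


def spread_via_zext_shift(lanes: List[int], width: int, factor: int, offset: int) -> List[int]:
    """Place lanes[i] at lane i*factor + offset; other lanes become 0."""
    if factor * width > MAX_WIDE_BITS or not 0 <= offset < factor:
        raise ValueError("unsupported spread")
    mask = (1 << width) - 1
    out = []
    for x in lanes:
        # Zero-extend to factor*width bits, then shift left by whole lanes.
        wide = (x & mask) << (offset * width)
        # Bitcast back: split the wide lane into `factor` narrow lanes.
        out.extend((wide >> (k * width)) & mask for k in range(factor))
    return out


v = list(range(16))
print(deinterleave_via_shift(v, 8, 4, 1))      # [1, 5, 9, 13]
print(spread_via_zext_shift([1, 2], 8, 4, 1))  # [0, 1, 0, 0, 0, 2, 0, 0]
```

With a 64-bit cap, an 8-bit lane admits factors up to 8 and a 16-bit lane up to 4. This agrees with the i8/i16 limits stated for deinterleave4 and deinterleave8 on RV64.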
1816 Commits
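
Commit febbf9105f matches a restricted mask shape for vcompress: from a single source, a contiguous prefix of demanded elements followed by undefs. That shape can be modelled in Python as below. This is a hedged sketch, not LLVM's matcher. It makes these assumptions:
- -1 marks an undef element;
- the prefix indices must be strictly increasing, since vcompress keeps selected elements in source order;
- match_compress and vcompress are hypothetical helpers. The sketch returns the one-bit-per-element selection mask the commit mentions.

```python
from typing import List, Optional

UNDEF = -1  # hypothetical encoding of an undef mask element


def match_compress(mask: List[int], n: int) -> Optional[List[bool]]:
    """Return vcompress selection bits for a single-source mask, or None."""
    # Demanded elements must form a contiguous prefix; the rest are undef.
    k = 0
    while k < len(mask) and mask[k] != UNDEF:
        k += 1
    if any(m != UNDEF for m in mask[k:]):
        return None
    prefix = mask[:k]
    # Single source only: every index must address the first operand.
    if any(not 0 <= i < n for i in prefix):
        return None
    # vcompress packs selected elements in source order, so indices must increase.
    if any(later <= earlier for earlier, later in zip(prefix, prefix[1:])):
        return None
    bits = [False] * n
    for i in prefix:
        bits[i] = True
    return bits


def vcompress(src: List[int], bits: List[bool]) -> List[int]:
    """Model of vcompress.vm: pack the selected elements to the front."""
    return [x for x, keep in zip(src, bits) if keep]


src = [10, 11, 12, 13, 14, 15, 16, 17]
bits = match_compress([1, 3, 4, UNDEF, UNDEF, UNDEF, UNDEF, UNDEF], 8)
print(vcompress(src, bits))  # [11, 13, 14]
```

The matcher rejects out-of-order or gapped masks outright rather than choosing extra elements to demand. This mirrors the commit's decision to start with the simple case.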