A lot of fixed-length custom lowerings just involve inserting the
operands into a scalable container and extracting the result out, and
lowerToScalableOp already does this.
We just need to teach it to handle operands with different element types
(but same vector element count), and we can reuse it for
vselect/zext/sext/setcc/fcopysign.
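For example (an illustration, not from the patch), both ops below mix element types across operands while keeping the element count, so a single insert-into-container / op / extract-from-container round trip handles them:

```llvm
; setcc consumes i32 operands and produces an i1 result; the select then mixes
; the i1 condition with i32 data. Every vector involved has 4 elements.
define <4 x i32> @f(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
  %cmp = icmp slt <4 x i32> %a, %b
  %sel = select <4 x i1> %cmp, <4 x i32> %a, <4 x i32> %c
  ret <4 x i32> %sel
}
```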
This reverts commit fe4f6c1a58ab4f00a88a97af01000b6783b573ee, but leaves
the tests that were added.
The original commit mistakenly assumed that if regular bf16/f16 loads
and stores could be lowered without zvfbfmin/zvfhmin, then so too could
masked loads/stores and gathers/scatters.
However, SelectionDAG can't actually type-legalize masked loads/stores;
that has to be done earlier, in ScalarizeMaskedMemIntrinPass.
This was causing crashes on IREE because we now returned true for
isLegalMaskedLoadStore.
The original intent of this was to remove a discrepancy in the loop
vectorizer tests whenever predication was enabled, but this has gone
away after 92d09245d61dce80d3e68a27cc34d5fc6f062c93. So I don't think we
need to reapply this patch.
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized, because
the masked load/store or gather/scatter cost comes back as invalid.
This is due to a discrepancy: for these costs we check
isLegalElementTypeForRVV, but for regular memory accesses we don't.
But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.
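For example (hedged illustration), the plain copy below only moves 16-bit lanes and should be costable without zvfbfmin, while the masked form still requires the extension:

```llvm
; Fine without zvfbfmin: the load and store only move bits.
define void @copy(ptr %dst, ptr %src) {
  %v = load <vscale x 4 x bfloat>, ptr %src, align 2
  store <vscale x 4 x bfloat> %v, ptr %dst, align 2
  ret void
}

; Still needs the extension: the masked form can't be scalarized late for
; scalable vectors.
declare <vscale x 4 x bfloat> @llvm.masked.load.nxv4bf16.p0(ptr, i32, <vscale x 4 x i1>, <vscale x 4 x bfloat>)
```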
For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.
We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
We use `lh` to load 2 bytes from memory into a GPR, then OR this GPR with
-65536 (0xffff0000) to set the upper bits and emulate the NaN-boxing
behavior, and then move the value from the GPR to an FPR using `fmv.w.x`.
To move the value back from the FPR to a GPR, we use `fmv.x.w`, and
finally `sh` is used to store the lower 2 bytes back to memory.
If zfh is enabled at the same time, we can just use flh/fsh to load/store
bf16 directly.
Follow-up to 28417e64; the whole line of work started with 4b81dc7.
This change merges the handling for VPStore - currently in
lowerInterleavedVPStore - into the existing dedicated routine used in
the shuffle lowering path. This removes the last use of
lowerInterleavedVPStore, so we can delete it.
This contains two functional changes.
First, like in 28417e64, merging support for vp.store exposes the
strided store optimization for code using vp.store.
Second, it seems the strided store case had a significant missed
optimization: we were performing the strided store at the full width of
the unit-strided store type (i.e. at full LMUL) rather than reducing it
to match the input width. This became obvious when I tried to use the
mask created by the helper routine, as it caused a type incompatibility.
Normally, I'd try not to include an optimization in an API rework, but
structuring the code to both be correct for vp.store and not optimize
the existing case turned out to be more involved than seemed worthwhile.
I could pull this part out as a pre-change, but it's a bit awkward on its
own, as it turns out to be somewhat of a half step toward the possible
optimization; the full optimization is complex with the old code
structure.
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicated lowerInterleavedVPLoad, so we can
delete it.
This isn't quite NFC, as the main callback supports the strided load
optimization whereas the VPLoad-specific version didn't. So this adds
the ability to form a strided load for a vp.load deinterleave where only
one of the shuffles is used.
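For example (hedged), when only one field of the deinterleave is used, the whole access can become a strided load (here, every other i32 with a stride of 8 bytes) rather than a segment load:

```llvm
define <4 x i32> @evens(ptr %p, i32 %evl) {
  %wide = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %p, <8 x i1> splat (i1 true), i32 %evl)
  ; Only the even field is used; the odd lanes are dead.
  %even = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i32> %even
}
declare <8 x i32> @llvm.vp.load.v8i32.p0(ptr, <8 x i1>, i32)
```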
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPStore - currently in
lowerInterleavedVPStore, which is shared between shuffle- and
intrinsic-based interleaves - into the existing dedicated routine.
There are cases where InstCombine / InstSimplify might sink extractvalue
instructions that use a deinterleave intrinsic into successor blocks,
which prevents InterleavedAccess from kicking in because the current
pattern requires the deinterleave intrinsic to be used by extractvalue
instructions. However, this requirement is a bit too strict, since we
could simply replace the users of the deinterleave intrinsic with
whatever the target TLI hooks generate.
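A sketch of the shape in question (hedged, not lifted from a test):

```llvm
; The extractvalue has been sunk into a successor block, so requiring the
; deinterleave to be used by extractvalues in place fails, even though
; replacing the users with the TLI-generated values would still be fine.
define <vscale x 4 x i32> @f(ptr %p, i1 %c) {
entry:
  %wide = load <vscale x 8 x i32>, ptr %p, align 4
  %di = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %wide)
  br i1 %c, label %use, label %exit

use:
  %even = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } %di, 0
  ret <vscale x 4 x i32> %even

exit:
  ret <vscale x 4 x i32> zeroinitializer
}
declare { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32>)
```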
The change implements custom lowering of `get_fpmode`, `set_fpmode` and
`reset_fpmode` for the RISC-V target. The implementation is aligned with
the functions `fegetmode` and `fesetmode` in GLIBC.
This essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
My plan, if we like this factoring, is to do the same for the intrinsic
store paths, and then remove the excess generality from the shuffle
paths, since we don't need to support both modes in the shared
VPLoad/Store callbacks. We can probably even fold the VP versions into
the non-VP shuffle variants in the analogous way.
There have been discussions on splitting RISCVISelLowering.cpp. I think
the InterleavedAccess-related TLI hooks would be some of the low-hanging
fruit, as they're relatively isolated and X86 already does this.
NFC.
Add an LLVMContext parameter to getOptimalMemOpType and
findOptimalMemOpLowering, so that we can use EVT::getVectorVT to create
vector EVTs in getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
As noted in post commit review, the API change here was not required.
I'd apparently confused myself when teasing apart patches from my
development branch.
For the fixed vector cases, we already support this, but the
deinterleave intrinsic cases (primarily used by scalable vectors) didn't.
Supporting it requires plumbing through the Factor separately from the
extracts, as there can now be fewer extracts than the Factor. Note that
the fixed vector path handles this slightly differently - it uses the
shuffle and indices scheme to achieve the same thing.
Put one copy on RISCVTargetLowering as a static function so that both
locations can use it, and rename the method to getM1VT for slightly
improved readability.
This change adds new parameters to the method
`shouldFoldSelectWithIdentityConstant()`. The method now takes the
opcode of the select node and the non-identity operand of the select
node. To gain access to the appropriate arguments, the call of
`shouldFoldSelectWithIdentityConstant()` is moved after all other checks
have been performed. Moreover, this change adjusts the precondition of
the fold so that it would work for `SELECT` nodes in addition to
`VSELECT` nodes.
No functional change is intended because all implementations of
`shouldFoldSelectWithIdentityConstant()` are adjusted such that they
restrict the fold to a `VSELECT` node; the same restriction as before.
The rationale for this change is to enable more fine-grained decisions
about when to revert the InstCombine canonicalization of
`(select c (binop x y) y)` to `(binop (select c x idc) y)` in the
backends.
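Concretely, for add the identity constant is 0, so the pair of forms involved looks like this (a hedged IR-level sketch; the hook itself operates on DAG nodes):

```llvm
; The canonical InstCombine output: (binop (select c, x, idc), y) ...
define i32 @canonical(i1 %c, i32 %x, i32 %y) {
  %s = select i1 %c, i32 %x, i32 0
  %r = add i32 %s, %y
  ret i32 %r
}

; ... which a backend may want to rewrite back to (select c, (binop x, y), y).
define i32 @reverted(i1 %c, i32 %x, i32 %y) {
  %bo = add i32 %x, %y
  %r = select i1 %c, i32 %bo, i32 %y
  ret i32 %r
}
```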
The semantics of PARTIAL_REDUCE_SMLA with an i32 result element and i8
sources correspond to vqdot. Analogously, PARTIAL_REDUCE_UMLA
corresponds to vqdotu. There is currently no vqdotsu equivalent.
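For reference, a hedged example of the IR-level partial reduction that maps to vqdot once both inputs are sign-extended from i8:

```llvm
; Each i32 lane of %acc accumulates the dot product of four i8 pairs,
; matching vqdot's semantics.
define <vscale x 4 x i32> @qdot(<vscale x 4 x i32> %acc, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
  %ae = sext <vscale x 16 x i8> %a to <vscale x 16 x i32>
  %be = sext <vscale x 16 x i8> %b to <vscale x 16 x i32>
  %mul = mul <vscale x 16 x i32> %ae, %be
  %r = call <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32> %acc, <vscale x 16 x i32> %mul)
  ret <vscale x 4 x i32> %r
}
declare <vscale x 4 x i32> @llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv16i32(<vscale x 4 x i32>, <vscale x 16 x i32>)
```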
This patch is a starting place. We can extend this quite a bit more; I
plan to take a look at the fixed vector lowering, the TTI hook to drive
the loop vectorizer, and integrating the reduction-based lowering I'd
added for zvqdotq into this flow.
This commit moves RISC-V to auto-generate its target-specific SDNode
types. The biggest change is that SDNodes can now be validated against
their expected type profiles, and that we don't need to edit several
different files when declaring a new one.
This takes Sergei's work in #119709 and "finishes" it - by moving the
final five RISCVISD opcodes into tablegen (including defining their
types), and by ensuring the tablegen has expected closing scope
comments.
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
Teach InterleavedAccessPass to recognize vp.load + shufflevector and
shufflevector + vp.store, though this patch only adds the RISC-V support
to actually lower this pattern. The vp.load/vp.store in this pattern
require a constant mask.
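The recognized shape looks roughly like this (hedged example; note the all-ones constant mask):

```llvm
define <4 x i32> @sum_fields(ptr %p, i32 %evl) {
  %wide = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %p, <8 x i1> splat (i1 true), i32 %evl)
  ; Factor-2 deinterleave expressed as two strided shuffles.
  %even = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %odd  = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %sum = add <4 x i32> %even, %odd
  ret <4 x i32> %sum
}
declare <8 x i32> @llvm.vp.load.v8i32.p0(ptr, <8 x i1>, i32)
```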
This patch adds pattern matching for the basic usages of the dot product
instructions introduced by the experimental zvqdotq extension. It
specifically only handles the case where the pattern feeds an i32 sum
reduction, as we need to reassociate the reduction tree to use these
instructions.
The vecreduce_add (sext) and vecreduce_add (zext) cases are included
mostly to exercise the VX matchers. For the generic matching, we fail to
match due to a combine-ordering issue which results in the bitcast being
separated from the splat.
I chose to do this lowering as an early combine so as to avoid having to
integrate the entire logic into the reduction lowering flow. In
particular, that would get a lot more complicated as we extend this to
handle add-trees feeding the reductions.
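The basic shape being matched is (hedged example):

```llvm
; An i32 sum reduction fed by a widening multiply of sign-extended i8
; operands; with zvqdotq the multiply+reduce can use vqdot.vv.
define i32 @dot(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
  %ae = sext <vscale x 16 x i8> %a to <vscale x 16 x i32>
  %be = sext <vscale x 16 x i8> %b to <vscale x 16 x i32>
  %mul = mul <vscale x 16 x i32> %ae, %be
  %red = call i32 @llvm.vector.reduce.add.nxv16i32(<vscale x 16 x i32> %mul)
  ret i32 %red
}
declare i32 @llvm.vector.reduce.add.nxv16i32(<vscale x 16 x i32>)
```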
These instructions are included in XRivosVisni. They perform a scalar
insert into a vector and a scalar extract from a vector, each with a
potentially non-zero index. They're very analogous to vmv.s.x and
vmv.x.s, respectively.
The instructions do have a couple restrictions:
1) Only constant indices are supported, with a uimm5 format.
2) There are no FP variants.
One important property of these instructions is that their throughput
and latency are expected to be LMUL independent.
If XRivosVizip is available, the ri.vzip2a and ri.vzip2b instructions
can be used to perform an interleave shuffle. This patch only affects
the intrinsic lowering (and thus scalable vectors). Fixed vectors go
through shuffle lowering, and the zip2a (but not zip2b) case is already
handled there.
If XRivosVizip is available, ri.vunzip2a and ri.vunzip2b can be used to
lower the concatenation-plus-register-deinterleave shuffle. This patch
only affects the intrinsic lowering (and thus scalable vectors, because
fixed vectors go through shuffle lowering).
Note that this patch is restricted to e64 for staging purposes only. e64
is obviously profitable (i.e. we remove a vcompress). At e32 and below,
our alternative is a vnsrl instead, and we need a bit more complexity
around lowering with fractional LMUL before the ri.vunzip2a/b versions
become always profitable. I'll post the follow-up change once this
lands.
If the vrgather.vi is preceded by a vmv.v.x which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.
This is the start of a mini-series of patches around rewriting
vrgather.vi/vx preceded by vmv.v.x, vfmv.v.f, vmv.s.x, etc., starting
with the simplest, but also lowest-impact, case.
One point I'd like a second opinion on is the out-of-bounds semantics
change. As far as I can tell, all the indices are in bounds by
construction. The doc change is there as much because I couldn't figure
out how to test the alternative as for any other reason.
This implements initial code generation support for a subset of the
XRivosVizip extension. Specifically, this adds support for vzipeven,
vzipodd, and vzip2a, but not vzip2b, vunzip2a, or vunzip2b. The others
will follow in separate patches.
One review note: The zipeven/zipodd matchers were recently rewritten to
better match upstream style, so careful review there would be
appreciated. The matchers don't yet support type coercion to wider
types. This will be done in a future patch.
This patch improves DAGCombiner's handling of potential store merges by
detecting function calls between loads and stores. When a function call
exists in the chain between a load and its corresponding store, we avoid
merging these stores if the resulting spill would be unprofitable.
We had to implement a hook on TLI, since TTI is unavailable in
DAGCombine. Currently, it's only enabled for RISC-V.
This is the DAG equivalent of PR #129258
This change adds support for `qci-nest` and `qci-nonest` interrupt
attribute values. Both of these are machine-mode interrupts, which use
instructions in Xqciint to push and pop A- and T-registers (and a few
others) from the stack.
In particular:
- `qci-nonest` uses `qc.c.mienter` to save registers at the start of the
function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qci-nest` uses `qc.c.mienter.nest` to save registers at the start of
the function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qc.c.mienter` and `qc.c.mienter.nest` both push registers ra, s0
(fp), t0-t6, and a0-a7 onto the stack (as well as some CSRs for the
interrupt context). The difference between them is that
`qc.c.mienter.nest` re-enables M-mode interrupts.
- `qc.c.mileaveret` will restore the registers that were saved by
`qc.c.mienter(.nest)`, and return from the interrupt.
These work for both standard M-mode interrupts and the non-maskable
interrupt CSRs added by Xqciint.
The `qc.c.mienter`, `qc.c.mienter.nest` and `qc.c.mileaveret`
instructions are compatible with push and pop instructions, inasmuch
as they (mostly) only spill the A- and T-registers, so we can use the
`Zcmp` or `Xqccmp` instructions to spill the S-registers. This
combination (`qci-(no)nest` and `Xqccmp`/`Zcmp`) is not implemented in
this change.
The `qc.c.mienter(.nest)` instructions use a specific register storage
order so that, when frame pointers are enabled, the frame pointer
convention's linked list is preserved past the current interrupt handler
and into the interrupted code and frames.
Co-authored-by: Pankaj Gode <quic_pgode@quicinc.com>
The VLMUL and policy enums originally lived in RISCVBaseInfo.h in the
backend which is where everything else in the RISCVII namespace is
defined.
RISCVTargetParser.h is used by much more of the compiler and it
doesn't really make sense to have 2 different namespaces exposed.
These enums are both associated with VTYPE so using the RISCVVType
namespace seems like a good home for them.
Teach InterleavedAccessPass to recognize the following patterns:
- vp.store an interleaved scalable vector
- Deinterleaving a scalable vector loaded from vp.load
Upon recognizing these patterns, IA will collect the interleaved /
deinterleaved operands and delegate them over to their respective
newly-added TLI hooks.
For RISC-V, these patterns are lowered into segmented loads/stores.
Right now we only recognize power-of-two (de)interleave cases, in which
(de)interleave4/8 are synthesized from a tree of (de)interleave2.
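A factor-2 example of the load side (hedged; the segment-load mnemonic depends on the element width):

```llvm
; Recognized and lowered to a segment load (e.g. vlseg2e32) for RISC-V.
define { <vscale x 4 x i32>, <vscale x 4 x i32> } @de(ptr %p, i32 %evl) {
  %wide = call <vscale x 8 x i32> @llvm.vp.load.nxv8i32.p0(ptr %p, <vscale x 8 x i1> splat (i1 true), i32 %evl)
  %di = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %wide)
  ret { <vscale x 4 x i32>, <vscale x 4 x i32> } %di
}
declare <vscale x 8 x i32> @llvm.vp.load.nxv8i32.p0(ptr, <vscale x 8 x i1>, i32)
declare { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32>)
```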
---------
Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of factors 2 and 4, while RISC-V only
supported a factor of 2.
This patch consolidates the logic in these two targets by factoring out
the common factor calculations into the InterleavedAccess pass.
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
The default legalization uses vmslt with a vector of XLen to compute a
mask. This doesn't work if the type isn't legal. For fixed vectors it
will scalarize. For scalable vectors it crashes the compiler.
This patch uses an alternate strategy that promotes the i1 vector to an
i8 vector and does the merge. I don't claim this to be the best
lowering. I wrote it quickly almost 3 years ago when a crash was
reported in our downstream.
Fixes #120405.
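A case of the crashing shape (hedged; the exact types in the issue may differ):

```llvm
; The default legalization would need an XLen-element compare vector
; (<vscale x 16 x i64> on rv64), which is not a legal RVV type.
define <vscale x 16 x i1> @merge(<vscale x 16 x i1> %m, <vscale x 16 x i1> %a, <vscale x 16 x i1> %b, i32 %evl) {
  %r = call <vscale x 16 x i1> @llvm.vp.merge.nxv16i1(<vscale x 16 x i1> %m, <vscale x 16 x i1> %a, <vscale x 16 x i1> %b, i32 %evl)
  ret <vscale x 16 x i1> %r
}
declare <vscale x 16 x i1> @llvm.vp.merge.nxv16i1(<vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16 x i1>, i32)
```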
Lowering to load-acquire/store-release for RISC-V Zalasr.
Currently this uses the psABI lowerings for WMO load-acquire/store-release
(which are identical to A.7). These are incompatible with the A.6
lowerings currently used by LLVM. This should be OK for now, since Zalasr
is behind the experimental-extensions flag, but it needs to be fixed
before Zalasr moves out from behind that flag.
For TSO, it uses the standard Ztso mappings, except for lowering seq_cst
loads/stores to load-acquire/store-release; I had Andrea review that.
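For instance (hedged; the instruction choice is illustrative), an IR acquire load that previously lowered to `lw` plus fences can now use the Zalasr load-acquire:

```llvm
; With Zalasr: selects to a load-acquire (lw.aq). Without it: the
; fence-based WMO mapping from the psABI.
define i32 @acq(ptr %p) {
  %v = load atomic i32, ptr %p acquire, align 4
  ret i32 %v
}
```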
Enable `-fstack-clash-protection` for RISC-V and stack probing for
function prologues.
We probe the stack by creating a loop that allocates and probes the
stack in ProbeSize chunks.
We emit an unrolled probe loop for small allocations and a
variable-length probe loop for bigger ones.
I want to use this function for GISel too, so Type * is a better common
interface. All of the callers already convert EVT to Type * as needed
by calling lowering anyway.
Instead of directly lowering to vnsrl_vl and having custom pattern
matching for that case, we can just lower to a (legal) shift and
truncate, and let generic pattern matching produce the vnsrl.
The major motivation for this is that I'm going to reuse this logic to
handle e.g. deinterleave4 w/ i8 result.
The test changes aren't particularly interesting. They're minor code
improvements - I think because we do slightly better with the
insert_subvector patterns, but that's mostly irrelevant.
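The underlying equivalence, sketched in IR (hedged): taking every other byte lane is just a bitcast to wider elements, a shift, and a truncate, which generic combines already select to vnsrl:

```llvm
; Odd i8 lanes == high bytes of the i16 reinterpretation (little endian).
define <vscale x 8 x i8> @odds(<vscale x 16 x i8> %v) {
  %w = bitcast <vscale x 16 x i8> %v to <vscale x 8 x i16>
  %hi = lshr <vscale x 8 x i16> %w, splat (i16 8)
  %t = trunc <vscale x 8 x i16> %hi to <vscale x 8 x i8>
  ret <vscale x 8 x i8> %t
}
```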
This is an alternative fix for #81192. This allows the SelectionDAG
scheduler to be able to find a representative register class for i32 on
RV64. The representative register class is the super register class with
the largest spill size that is also legal. The default implementation of
findRepresentativeClass only works for legal types which i32 is not for
RV64.
I did some investigation of why tablegen uses i32 in output patterns on
RV64. It appears it comes down to a function called
ForceArbitraryInstResultType that picks a type for the output
pattern when the isel pattern isn't specific enough. I believe it picks
the smallest (lowest-numbered) type to resolve the conflict.
A similar issue occurs for f16 and bf16 which both use the FPR16
register class. If the isel pattern doesn't specify, tablegen may find
both f16 and bf16 and may pick bf16 from Zfh pattern when Zfbfmin isn't
present. Since bf16 isn't legal in that case, findRepresentativeClass
will fail.
For i8, i16, i32, this patch calls the base class with XLenVT to get the
representative class since XLenVT is always legal.
For bf16/f16, we call the base class with f32 since all of the f16/bf16
extensions depend on either F or Zfinx which will make f32 a legal type.
The final representative register class further depends on whether D or
Zdinx is also enabled, but that should be handled by the default
implementation.
This patch adds support for getting even-odd general purpose register
pairs into and out of inline assembly using the `R` constraint as
proposed in riscv-non-isa/riscv-c-api-doc#92
There are a few different pieces to this patch, each of which need their
own explanation.
- Renames the Register Class used for f64 values on rv32i_zdinx from
`GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly
unmodified, as their primary value type is used for type inference
over selection patterns. This rename affects quite a lot of files.
- Adds new `GPRPair*` register classes which will be used for `R`
constraints and for instructions that need an even-odd GPR pair. This
new type is used for `amocas.d.*`(rv32) and `amocas.q.*`(rv64) in
Zacas, instead of the `GPRF64Pair` class being used before.
- Marks the new `GPRPair` class as legal for holding `MVT::Untyped`.
Two new RISCVISD node types are added for creating and destructuring a
pair - `BuildGPRPair` and `SplitGPRPair` - which are introduced when
bitcasting to/from the pair type and `untyped`.
- Adds functionality to `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into
`untyped` pairs.
- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>`
values, when going to/from inline assembly, only allocate one (pair)
register (they would otherwise allocate two). This is due to a bug in
SelectionDAGBuilder.cpp which other backends also work around.
- Ensures that Clang understands that `R` is a valid inline assembly
constraint.
- This also allows `R` to be used for `f64` types on `rv32_zdinx`
architectures, where doubles are stored in a GPR pair.
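An IR-level sketch of the constraint in use on rv32 (hedged; the empty asm string is a placeholder):

```llvm
; %x is i64, so "R" places it in an even/odd GPR pair; the "0" ties the
; input to the same pair as the output.
define i64 @roundtrip(i64 %x) {
  %r = call i64 asm sideeffect "", "=R,0"(i64 %x)
  ret i64 %r
}
```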
This intrinsic was introduced by #92289 and currently we just expand
it for RISC-V.
This patch adds custom lowering for this intrinsic, simply mapping it
to the `vcompress` instruction.
Fixes #113242.
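A minimal example (hedged) that now selects to `vcompress.vm` instead of the generic expansion:

```llvm
define <vscale x 4 x i32> @compress(<vscale x 4 x i32> %v, <vscale x 4 x i1> %m) {
  %r = call <vscale x 4 x i32> @llvm.experimental.vector.compress.nxv4i32(<vscale x 4 x i32> %v, <vscale x 4 x i1> %m, <vscale x 4 x i32> poison)
  ret <vscale x 4 x i32> %r
}
declare <vscale x 4 x i32> @llvm.experimental.vector.compress.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, <vscale x 4 x i32>)
```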