1789 Commits

Author SHA1 Message Date
Philip Reames
c6f2d35c4d Fix a build warning introduced by my febbf910 2024-11-27 13:41:29 -08:00
Philip Reames
febbf9105f
[RISCV] Match vcompress during shuffle lowering (#117748)
This change matches a subset of vcompress patterns during shuffle
lowering. The subset implemented requires a contiguous prefix of
demanded elements followed by undefs. This subset was chosen for two
reasons: 1) which elements to spuriously demand is a non-obvious problem,
and 2) my first several attempts at implementing the general case were
buggy. I decided to go with the simple case to start with.

vcompress scales better with LMUL than a general vrgather, and, at least
on the SpaceMit X60, has higher throughput even at m1. It also has the
advantage of requiring smaller vector constants, at one bit per element
as opposed to vrgather's minimum of 8 bits per element. The
downside to using vcompress is that we can't fold a vselect into it, as
there is no masked vcompress variant.

For reference, here are the relevant throughputs from camel-cdr's data
table on BP3 (X60):
  vrgather.vv v8,v16,v24    4.0  16.0  64.0  256.0
  vcompress.vm v8,v16,v24   3.0  10.0  36.0  136.0
  vmerge.vvm v8,v16,v24,v0  2.0  4.0   8.0   16.0

The largest concern with the extra vmerge is that we locally increase
register pressure. If we do have masking, we also have a passthru;
without the ability to fold that into the vcompress, we need to keep it
alive a bit longer. This can hurt at e.g. m8 where we have very few
architectural registers. As compared with the vrgather.vv sequence, this
is only one additional m1 VREG, since we no longer need the index
vector. It compares slightly worse against vrgatherei16.vv, which can use
index vectors smaller than other operands. Note that we could
potentially fold the vmerge if only tail elements are being preserved; I
haven't investigated this.

It is unfortunately hard given our current lowering structure to know if
we're emitting a shuffle where masking will follow. Thankfully, it
doesn't seem to show up much in practice, so I think we can probably
ignore it.

This patch only handles single source compress idioms at the moment.
This is an effort to avoid interacting with other patches on review for
changing how we canonicalize length changing shuffles.
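
As a rough illustration (hypothetical IR, not taken from the patch's
tests), a shuffle whose mask is an increasing selection of source
elements followed by undefs fits the matched subset:

  define <8 x i8> @compress_prefix(<8 x i8> %v) {
    ; Source elements 0, 2, 3 and 6 are packed into the low lanes and the
    ; rest of the result is undef, so this can lower to a vcompress.vm
    ; whose mask constant has bits 0, 2, 3 and 6 set.
    %s = shufflevector <8 x i8> %v, <8 x i8> poison,
           <8 x i32> <i32 0, i32 2, i32 3, i32 6,
                      i32 undef, i32 undef, i32 undef, i32 undef>
    ret <8 x i8> %s
  }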
2024-11-27 13:23:18 -08:00
Craig Topper
c2bb056482
[SelectionDAG][RISCV][AArch64] Allow f16 STRICT_FLDEXP to be promoted. Fix integer promotion of STRICT_FLDEXP in type legalizer. (#117633)
A special case in type legalization wasn't accounting for different
operand numbering between FLDEXP and STRICT_FLDEXP.

AArch64 already asked for STRICT_FLDEXP to be promoted, but had no test for
it.
2024-11-25 16:12:45 -08:00
Craig Topper
ed6749a405 [RISCV] Promote frexp with Zfh.
The default expansion tries to create an illegal integer type after
legalization.
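
A hypothetical example of the affected IR, assuming the standard
llvm.frexp signature:

  define { half, i32 } @split_f16(half %x) {
    ; Without this change the default expansion created an illegal
    ; integer type after legalization; the f16 operation is now promoted
    ; instead.
    %r = call { half, i32 } @llvm.frexp.f16.i32(half %x)
    ret { half, i32 } %r
  }
  declare { half, i32 } @llvm.frexp.f16.i32(half)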
2024-11-25 10:27:37 -08:00
Craig Topper
20bd029a40
[RISCV] Promote fldexp with Zfh. (#117396)
The default expansion tries to create i16 operations after type
legalization.

Fixes #117349
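
A hypothetical example of the affected IR, assuming the standard
llvm.ldexp signature:

  define half @scale_f16(half %x, i32 %e) {
    ; Without this change the default expansion created i16 operations
    ; after type legalization; the f16 operation is now promoted instead.
    %r = call half @llvm.ldexp.f16.i32(half %x, i32 %e)
    ret half %r
  }
  declare half @llvm.ldexp.f16.i32(half, i32)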
2024-11-25 09:08:56 -08:00
David Sherwood
9b76e7fc60
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
2024-11-25 13:49:21 +00:00
David Sherwood
22ec44f509
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)
For IR like this:

  %icmp = icmp ult <4 x i32> %a, splat (i32 5)
  %res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such as add, sub, etc., and convert
this into

  %ext = extractelement <4 x i32> %a, i32 1
  %res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll, because scalarizeExtractedBinOp
only works if one of the input operands is a constant.
2024-11-25 09:25:01 +00:00
Craig Topper
29afbd5893
[RISCV] Add DAG combine to convert (iX ctpop (bitcast (vXi1 A))) into vcpop.m. (#117062)
This only handles the simplest case where vXi1 is a legal vector type.
If the vector type isn't legal we need to go through type legalization,
but the pattern gets much harder to recognize after that, either because
ctpop gets expanded due to Zbb not being enabled, the bitcast becomes a
bitcast+extractelt, or the ctpop is split into multiple ctpops and adds,
etc.
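
A hypothetical example of the simple case, where <16 x i1> is legal:

  define i32 @active_count(<16 x i1> %m) {
    ; (i16 ctpop (bitcast <16 x i1> %m)) can now be selected as vcpop.m.
    %b = bitcast <16 x i1> %m to i16
    %c = call i16 @llvm.ctpop.i16(i16 %b)
    %z = zext i16 %c to i32
    ret i32 %z
  }
  declare i16 @llvm.ctpop.i16(i16)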
2024-11-21 11:12:07 -08:00
Craig Topper
cdd1e27124
[X86][RISCV] Don't emit JumpTableDebugInfo unless triple is OSBinFormatCOFF. (#117083)
This makes the override in RISCV and X86 consistent with the base class
implementation of expandIndirectJTBranch.
2024-11-21 09:38:16 -08:00
Sam Elliott
408659c5b5
[RISCV] Merge GPRPair and GPRF64Pair (#116094)
As suggested by Craig, this tries to merge the two sets of register
classes created in #112983, GPRPair* and GPRF64Pair*.

- I added some explicit annotations to `RISCVInstrInfoD.td` which fixed
the type inference issues I was seeing from tablegen for select
patterns.
- I've had to make the behaviour of `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` cover more cases, because you cannot
bitcast to/from untyped (the bitcast would otherwise have been inserted
automatically by TargetLowering code).
- I apparently didn't need to change `getNumRegisters` again, which
continues to tell me there's a bug in the code for tied inputs. I added
some more test coverage of this case, but it didn't seem to help find
the asserts I was finding before; I think the difference is that the
default behaviour for integers doesn't apply to floats.
- There's still a difference between BuildGPRPair and BuildPairF64 (and
the same for SplitGPRPair and SplitF64). I'm not happy with this; I
think it's quite confusing, as they're very similar, just differing in
whether they give an `untyped` or an `f64`. I haven't really worked out
how the DAGCombiner copes if one meets the other; I know we have some of
this for the f64 variants already, but they're a lot more complex than
the GPRPair variants anyway.
2024-11-20 10:08:55 +00:00
Sam Elliott
c4030c896d
[RISCV] Fix FP64 DinX R Regclass (#116688)
This was a typo in llvm/llvm-project#112983 that didn't cause build
failures but is still wrong.
2024-11-19 12:42:27 +00:00
Craig Topper
ac17b50f50 [RISCV] Use getSignedTargetConstant. NFC 2024-11-18 12:20:44 -08:00
Craig Topper
900c056531
[RISCV] Add an implementation of findRepresentativeClass to assign i32 to GPRRegClass for RV64. (#116165)
This is an alternative fix for #81192. This allows the SelectionDAG
scheduler to be able to find a representative register class for i32 on
RV64. The representative register class is the super register class with
the largest spill size that is also legal. The default implementation of
findRepresentativeClass only works for legal types which i32 is not for
RV64.

I did some investigation of why tablegen uses i32 in output patterns on
RV64. It appears it comes down to a function called
ForceArbitraryInstResultType that picks a type for the output
pattern when the isel pattern isn't specific enough. I believe it picks
the smallest type (lowest numbered) to resolve the conflict.

A similar issue occurs for f16 and bf16 which both use the FPR16
register class. If the isel pattern doesn't specify, tablegen may find
both f16 and bf16 and may pick bf16 for a Zfh pattern when Zfbfmin isn't
present. Since bf16 isn't legal in that case, findRepresentativeClass
will fail.

For i8, i16, i32, this patch calls the base class with XLenVT to get the
representative class since XLenVT is always legal.

For bf16/f16, we call the base class with f32 since all of the f16/bf16
extensions depend on either F or Zfinx which will make f32 a legal type.
The final representative register class further depends on whether D or
Zdinx is also enabled, but that should be handled by the default
implementation.
2024-11-18 10:07:20 -08:00
Sam Elliott
4615cc38f3
[RISCV] Inline Assembly Support for GPR Pairs ('R') (#112983)
This patch adds support for getting even-odd general purpose register
pairs into and out of inline assembly using the `R` constraint as
proposed in riscv-non-isa/riscv-c-api-doc#92

There are a few different pieces to this patch, each of which needs its
own explanation.

- Renames the Register Class used for f64 values on rv32i_zdinx from
  `GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly
  unmodified, as their primary value type is used for type inference
  over selection patterns. This rename affects quite a lot of files.

- Adds new `GPRPair*` register classes which will be used for `R`
  constraints and for instructions that need an even-odd GPR pair. This
  new type is used for `amocas.d.*`(rv32) and `amocas.q.*`(rv64) in
  Zacas, instead of the `GPRF64Pair` class being used before.

- Marks the new `GPRPair` class as legal for holding `MVT::Untyped`.
  Two new RISCVISD node types, `BuildGPRPair` and `SplitGPRPair`, are
  added for creating and destructing a pair, and are introduced when
  bitcasting to/from the pair type and `untyped`.

- Adds functionality to `splitValueIntoRegisterParts` and
  `joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into
  `untyped` pairs.

- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>`
  values, when going to/from inline assembly, only allocate one (pair)
  register (they would otherwise allocate two). This is due to a bug in
  SelectionDAGBuilder.cpp which other backends also work around.

- Ensures that Clang understands that `R` is a valid inline assembly
  constraint.

- This also allows `R` to be used for `f64` types on `rv32_zdinx`
  architectures, where doubles are stored in a GPR pair.
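
A sketch of the intended use from IR (hypothetical example assuming rv32
with Zacas; the asm body and operand tying are illustrative only):

  define i64 @cas64(ptr %p, i64 %expected, i64 %new) {
    ; On rv32 an i64 occupies an even-odd GPR pair, and "R" asks the
    ; register allocator for such a pair. The output is tied to
    ; %expected because amocas.d uses rd both as the compare value and
    ; as the result.
    %old = call i64 asm sideeffect "amocas.d $0, $2, ($1)",
                   "=R,r,R,0"(ptr %p, i64 %new, i64 %expected)
    ret i64 %old
  }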
2024-11-18 17:45:58 +00:00
Brandon Wu
206ee71918
[RISCV] Change vector tuple type's TypeSize to scalable (#114329)
A vector tuple is basically multiple grouped vectors, so its size is
also determined by vscale. We don't need to model it as a vector type,
but its size does need to be scalable.
2024-11-17 18:52:49 +08:00
Brandon Wu
b4adce0056
[RISCV] Tuple intrinsics are creating overly aligned memory operands (#115804)
The alignment should be the same as that of the element type.
2024-11-15 14:12:07 +08:00
Craig Topper
0019565e93
[RISCV] Don't create BuildPairF64 or SplitF64 nodes without D or Zdinx. (#116159)
The fix in ReplaceNodeResults is the only one really required for the
known crash.

I couldn't hit the case in LowerOperation because that requires (f64
(bitcast i64)), but the result type is softened before the input so we
don't get a chance to legalize the input.

The change to the setOperationAction call was an observation that an
i64<->vector cast should not be custom legalized on RV32. The custom
code already calls isTypeLegal on the scalar type.
2024-11-14 09:54:33 -08:00
Kazu Hirata
4048c64306
[llvm] Remove redundant control flow statements (NFC) (#115831)
Identified with readability-redundant-control-flow.
2024-11-12 10:09:42 -08:00
Kazu Hirata
82d5dd28b4
[RISCV] Remove unused includes (NFC) (#115814)
Identified with misc-include-cleaner.
2024-11-11 22:54:54 -08:00
Luke Lau
b2e2d8b3f6
[RISCV] Enable scalable loop vectorization for zvfhmin/zvfbfmin (#115272)
This PR enables scalable loop vectorization for f16 with zvfhmin and
bf16 with zvfbfmin.

Enabling this was dependent on filling out the gaps for scalable
zvfhmin/zvfbfmin codegen, but everything that the loop vectorizer might
emit should now be handled.

It does this by marking f16 and bf16 as legal in
`isLegalElementTypeForRVV`. There are a few users of
`isLegalElementTypeForRVV` that have already been enabled in other PRs:

- `isLegalStridedLoadStore` #115264
- `isLegalInterleavedAccessType` #115257
- `isLegalMaskedLoadStore` #115145
- `isLegalMaskedGatherScatter` #114945

The remaining user is `isLegalToVectorizeReduction`. We can't promote
f16/bf16 reductions to f32 so we need to disable them for scalable
vectors. The cost model actually marks these as invalid, but for
out-of-tree reductions `ComputeReductionResult` doesn't get costed and
it will end up emitting a reduction intrinsic regardless, so we still
need to mark them as illegal. We might be able to remove this
restriction later for fmax and fmin reductions.
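
As a rough sketch of what this enables (hypothetical IR; real
loop-vectorizer output is more involved), a scalable f16 vector body
like the following can now be both generated and lowered with just
zvfhmin:

  define <vscale x 8 x half> @vadd_f16(<vscale x 8 x half> %x,
                                       <vscale x 8 x half> %y) {
    ; With zvfhmin the arithmetic is promoted to f32 internally, but f16
    ; is treated as a legal element type for vectorization.
    %r = fadd <vscale x 8 x half> %x, %y
    ret <vscale x 8 x half> %r
  }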
2024-11-11 13:29:48 +08:00
Gergely Futo
a25d91a164
[RISCV] Skip DAG combine for bitcast fabs/fneg (#115325)
Disable the DAG combine for bitcast fabs/fneg when the Zdinx extension
is enabled.

The combine folds the fabs/fneg nodes in some cases, which might result
in suboptimal code when compiling with Zdinx: there is no need to load
the double value from an x register to an f register, so the combine can
be skipped.
2024-11-08 08:37:46 +01:00
Philip Reames
02668f60a9
[RISCV] Match single source deinterleave shuffles for vnsrl (#114878)
We had previously only been matching the two source case where both
sources came from a wider source type. We can also match the single
source case - provided the result is m4 or smaller because we will need
a wider type to represent the source.

The main goal of this is to ensure that vnsrl matching is robust to a
possible change in canonicalization for length changing shuffles that
I'm considering, but it has the nice effect of picking up a few cases we
missed along the way.
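
For example (hypothetical IR), a single source even-element deinterleave
like this can now be matched:

  define <4 x i16> @evens(<8 x i16> %v) {
    ; Viewing the source as i32 elements, the even i16 elements are the
    ; low halves, so this is a vnsrl with shift amount 0 (the odd
    ; elements would use shift amount 16).
    %r = shufflevector <8 x i16> %v, <8 x i16> poison,
           <4 x i32> <i32 0, i32 2, i32 4, i32 6>
    ret <4 x i16> %r
  }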
2024-11-07 14:34:05 -08:00
Luke Lau
343a810725
[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal strided access (#115264)
This is also split off from the zvfhmin/zvfbfmin
isLegalElementTypeForRVV work.

Enabling this will cause SLP and RISCVGatherScatterLowering to emit
@llvm.experimental.vp.strided.{load,store} intrinsics, and codegen
support for this was added in #109387 and #114750.
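
A hypothetical example of what this allows SLP to emit for f16 (assuming
the documented strided-load intrinsic signature):

  define <4 x half> @strided_f16(ptr %base, i64 %stride, <4 x i1> %m, i32 %evl) {
    %v = call <4 x half> @llvm.experimental.vp.strided.load.v4f16.p0.i64(
             ptr %base, i64 %stride, <4 x i1> %m, i32 %evl)
    ret <4 x half> %v
  }
  declare <4 x half> @llvm.experimental.vp.strided.load.v4f16.p0.i64(ptr, i64, <4 x i1>, i32)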
2024-11-07 14:40:15 +08:00
Luke Lau
f0e2301b7c
[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal interleaved access (#115257)
This is another piece split off from the work to add zvfhmin/zvfbfmin to
isLegalElementTypeForRVV.
This is needed to get InterleavedAccessPass to lower [de]interleaves to
segment load/stores.
2024-11-07 12:35:59 +08:00
Luke Lau
481ff22b8b
[RISCV] Lower fixed-length vp_{gather,scatter} for zvfhmin/zvfbfmin (#115253)
This uses the same lowering as masked gathers and scatters.
2024-11-07 12:28:13 +08:00
Luke Lau
05f87b2d65
[RISCV] Lower fixed-length mload/mstore for zvfhmin/zvfbfmin (#115145)
This is the same idea as #114945.
2024-11-07 10:41:03 +08:00
Luke Lau
3a26feb607
[RISCV] Lower fixed-length mgather/mscatter for zvfhmin/zvfbfmin (#114945)
In preparation for allowing zvfhmin and zvfbfmin in
isLegalElementTypeForRVV, this lowers fixed-length masked gathers and
scatters.

We need to mark f16 and bf16 as legal in isLegalMaskedGatherScatter
otherwise ScalarizeMaskedMemIntrin will just scalarize them, but we can
move this back into isLegalElementTypeForRVV afterwards.

The scalarized codegen required #114938, #114927 and #114915 to not
crash.
2024-11-06 10:33:06 +08:00
Philip Reames
a905203b9e
[RISCV] Prefer strided load for interleave load with only one lane active (#115069)
If only one of the elements is actually used, then we can legally use a
strided load in place of the segment load. Doing so reduces vector
register pressure, so if both the segment and strided loads are believed
to execute one element/segment at a time, prefer the strided variant.

Note that I've seen the vectorizer emitting wide interleave loads to
represent a strided load, so this does happen in practice. It doesn't
matter much for small LMUL*NF, but at large NF can start causing
problems in register allocation.

Note that this patch only covers the fixed vector formation cases. In
theory, we should do the same patch for scalable, but we can currently
only represent NF2 in scalable IR, and NF2 is assumed to be optimized to
better than segment-at-a-time by default, so there's currently nothing
to do.
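
A hypothetical illustration of the fixed vector case:

  define <4 x i32> @field0_of_4(ptr %p) {
    ; A wide load + deinterleave where only field 0 of each 4-element
    ; group is used. Rather than a vlseg4e32 segment load (which defines
    ; four registers' worth of data), this can be a single strided load
    ; (vlse32 with a 16-byte stride).
    %wide = load <16 x i32>, ptr %p
    %f0 = shufflevector <16 x i32> %wide, <16 x i32> poison,
            <4 x i32> <i32 0, i32 4, i32 8, i32 12>
    ret <4 x i32> %f0
  }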
2024-11-05 16:15:20 -08:00
Philip Reames
c50bb99d87
[RISCV] Allow vslidedown.vx in isExtractSubvectorCheap for half VT case (#114886)
We have a special case where we allow the extract of the high half of a
vector and consider it cheap. However, we had previously required that
the type have no more than 32 elements for this to work. (Because
64/2=32, and the largest immediate for a vslidedown.vi is 31.)

This has the effect of pessimizing shuffle vector lowering for long
vectors - i.e. at SEW=e8, zvl128b, an m2 or m4 deinterleave can't be
matched because it gets scalarized during DAG construction and can't be
"profitably" rebuilt by DAG combine. Note that for RISCV, scalarization
via insert and extract is extremely expensive (i.e. two vslides per
element), so a slide + two half width shuffles is almost always a net
win (i.e., this isn't really specific to vnsrl).

Separately, I want to look at the decision to scalarize at all, but it
seems worthwhile adjusting this while we're at it regardless.
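
For instance (hypothetical IR), extracting the high half of a long
vector:

  define <32 x i8> @high_half(<64 x i8> %v) {
    ; At SEW=e8 this needs a slide amount of 32, which exceeds the
    ; vslidedown.vi immediate but is now still considered cheap because
    ; it lowers to a single vslidedown.vx.
    %hi = call <32 x i8> @llvm.vector.extract.v32i8.v64i8(<64 x i8> %v, i64 32)
    ret <32 x i8> %hi
  }
  declare <32 x i8> @llvm.vector.extract.v32i8.v64i8(<64 x i8>, i64)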
2024-11-05 07:25:35 -08:00
Luke Lau
f50547e8c8
[RISCV] Lower fixed-length bf16 vector bitcasts (#114938)
Currently we crash because we can't select it. Also add tests for f16
with zvfhmin.
2024-11-05 15:12:24 +08:00
Luke Lau
aea6b255f0
[RISCV] Lower fixed-length {insert,extract}_vector_elt on zvfhmin/zvfbfmin (#114927)
RISCVTargetLowering::lower{INSERT,EXTRACT}_VECTOR_ELT already handles
f16 and bf16 scalable vectors after #110221, so we can reuse it for
fixed-length vectors.
2024-11-05 13:57:59 +08:00
Luke Lau
a8f80897ba
[RISCV] Handle zvfhmin/zvfbfmin in lowerVECTOR_SHUFFLEAsVSlide1 (#114925)
Most of lowerVECTOR_SHUFFLE lowers to nodes that work on f16 and bf16
vectors, with the exception of the vslide1 lowering which tries to emit
vfslide1s. Handle this case as an integer vslide1 via fmv.x.h.

Fixes #114893
2024-11-05 13:46:35 +08:00
Luke Lau
efe9ba56e7 [RISCV] Deduplicate VECTOR_SHUFFLE fixed-length FP setOperationAction. NFC
Also reshuffle the nodes so they're in enum order.
2024-11-05 11:24:38 +08:00
Craig Topper
8d023b7d66 [RISCV] Combine some setOperationAction calls and update a FIXME. NFC 2024-11-04 13:52:05 -08:00
Philip Reames
ffe96ad105
[RISCV] Allow undef elements in isDeinterleaveShuffle (#114585)
This allows us to form vnsrl deinterleaves from non-power-of-two
shuffles after they've been legalized to a power of two.
2024-11-04 12:01:53 -08:00
Luke Lau
7bf0d6d032
[RISCV] Lower fixed-length strided VP loads and stores for zvfhmin/zvfbfmin (#114750)
Similarly to #114731, these don't actually require any instructions from
the extensions.

The motivation for this and #114731 is to eventually enable
isLegalElementTypeForRVV for f16 with zvfhmin and bf16 with zvfbfmin in
order to enable scalable vectorization.

Although the scalable codegen support for f16 and bf16 is now complete
enough for anything the loop vectorizer may emit, enabling
isLegalElementTypeForRVV would make certain hooks like
isLegalInterleavedAccessType and isLegalStridedLoadStore return true for
f16 and bf16. This means SLP would start emitting these intrinsics, so
we need to add fixed-length codegen support.
2024-11-04 20:53:51 +08:00
Luke Lau
7d35368405
[RISCV] Lower vector_shuffle for bf16 (#114731)
This is much the same as with f16. Currently we scalarize if there's no
zvfbfmin, and crash if there is zvfbfmin because it will try to create a
bf16 build_vector, which we also can't lower.
2024-11-04 20:53:15 +08:00
Pengcheng Wang
18f0f70934
[RISCV] Support llvm.masked.expandload intrinsic (#101954)
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower an
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower the expanding load to a normal
unmasked load.

Fixes #101914.
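
For reference, a hypothetical call to the intrinsic being lowered:

  define <4 x i32> @expand(ptr %p, <4 x i1> %mask, <4 x i32> %passthru) {
    ; Lowered roughly as: vcpop.m to count the active lanes, a unit-stride
    ; load of that many elements, then viota.m + a masked vrgather to
    ; spread the loaded values out to their destination lanes.
    %r = call <4 x i32> @llvm.masked.expandload.v4i32(ptr %p, <4 x i1> %mask, <4 x i32> %passthru)
    ret <4 x i32> %r
  }
  declare <4 x i32> @llvm.masked.expandload.v4i32(ptr, <4 x i1>, <4 x i32>)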
2024-10-31 20:03:58 +08:00
Luke Lau
6da5968f5e
[RISCV] Lower scalar_to_vector for supported FP types (#114340)
In https://reviews.llvm.org/D147608 we added custom lowering for
integers, but inadvertently also marked it as custom for scalable FP
vectors despite not handling it.

This adds handling for floats and marks it as custom lowered for
fixed-length FP vectors too.

Note that this doesn't handle bf16 or f16 vectors that would need
promotion, but these scalar_to_vector nodes seem to be emitted when
expanding them.
2024-10-31 13:15:17 +08:00
Craig Topper
55dbacbf07
[RISCV] Remove RISCVISD::VFCVT_X(U)_F_VL by using VFCVT_RM_X(U)_F_VL with DYN rounding mode. NFC (#114306) 2024-10-30 19:16:23 -07:00
Yingwei Zheng
cf9d1c1486
[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-31 08:10:07 +08:00
Luke Lau
96f5c68350
[RISCV] Lower @llvm.experimental.vector.compress for zvfhmin/zvfbfmin (#113770)
This is a follow up to #113291 and handles f16/bf16 with zvfhmin and
zvfbfmin.
2024-10-28 09:37:06 +00:00
Pengcheng Wang
b799cc3418
[RISCV] Add lowering for @llvm.experimental.vector.compress (#113291)
This intrinsic was introduced by #92289 and currently we just expand
it for RISC-V.

This patch adds custom lowering for this intrinsic and simply maps
it to the `vcompress` instruction.

Fixes #113242.
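
A hypothetical example of the intrinsic and its mapping:

  define <vscale x 4 x i32> @keep_selected(<vscale x 4 x i32> %v,
                                           <vscale x 4 x i1> %mask,
                                           <vscale x 4 x i32> %passthru) {
    ; Maps onto vcompress.vm; lanes past the packed elements come from
    ; %passthru.
    %r = call <vscale x 4 x i32> @llvm.experimental.vector.compress.nxv4i32(
             <vscale x 4 x i32> %v, <vscale x 4 x i1> %mask,
             <vscale x 4 x i32> %passthru)
    ret <vscale x 4 x i32> %r
  }
  declare <vscale x 4 x i32> @llvm.experimental.vector.compress.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, <vscale x 4 x i32>)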
2024-10-23 14:22:32 +08:00
Sam Elliott
9b9c2a082c [RISCV][NFC] Move RISCVISD::TAIL beside RISCVISD::CALL 2024-10-22 11:12:58 -07:00
Craig Topper
1bc1a79a65
[RISCV] Support inline assembly 'f' constraint for Zfinx. (#112986)
This would allow some inline assembly code to work with either F or Zfinx.
This appears to match gcc behavior.
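
A sketch of the kind of inline assembly this enables (hypothetical
example):

  define float @sqrt_asm(float %x) {
    ; With F the operands are FPRs; with Zfinx they are GPRs holding the
    ; float, so the same asm source works for both.
    %r = call float asm "fsqrt.s $0, $1", "=f,f"(float %x)
    ret float %r
  }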
2024-10-18 18:17:23 -07:00
Sam Elliott
03dcd88c78
[RISCV][ISel] Ensure 'in X' Constraints prevent X0 (#112563)
I'm not sure if this fix is required, but I've written the patch anyway.
This does not cause test changes, but we haven't got tests that try to
use all 32 registers in inline assembly.

Broadly, for GPRs, we made the explicit choice that `r` constraints
would never attempt to use `x0`, because `x0` isn't really usable like
the other GPRs. I believe the same thing applies to `Zhinx`, `Zfinx` and
`Zdinx` because they should not be allocating operands to `x0` either,
so this patch introduces new `NoX0` classes for `GPRF16` and `GPRF32`
registers, and uses them with inline assembly. There is also a
`GPRPairNoX0` for the `Zdinx` case on rv32, avoiding use of the `x0`
pair which has different behaviour to the other GPR pairs.
2024-10-18 22:33:35 +01:00
Sam Elliott
228f88fdc8
[RISCV] Inline Assembly: RVC constraint and N modifier (#112561)
This change implements support for the `cr` and `cf` register
constraints (which allocate a RVC GPR or RVC FPR respectively), and the
`N` modifier (which prints the raw encoding of a register rather than
the name).

The intention behind these additions is to make it easier to use inline
assembly when assembling raw instructions that are not supported by the
compiler, for instance when experimenting with new instructions or when
supporting proprietary extensions outside the toolchain.

These implement part of my proposal in riscv-non-isa/riscv-c-api-doc#92

As part of the implementation, I felt there was not enough coverage of
inline assembly and the "in X" floating-point extensions, so I have
added more regression tests around these configurations.
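
A hypothetical sketch of the intended use (the encoding below is made
up):

  define i32 @custom_op(i32 %x) {
    ; "cr" restricts allocation to an RVC-encodable GPR (x8-x15), and the
    ; N modifier prints the raw register number so it can be spliced into
    ; a hand-built encoding.
    %r = call i32 asm ".word (0x0B | (${0:N} << 7) | (${1:N} << 15))",
                  "=cr,cr"(i32 %x)
    ret i32 %r
  }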
2024-10-18 10:40:38 +01:00
Roger Ferrer Ibáñez
9d469b5988
[RISCV] Implement trampolines for rv64 (#96309)
This implementation is based on what the X86 target does, but emits the
instructions that GCC emits for rv64.

---------

Co-authored-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
2024-10-18 08:06:47 +02:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00