llvm-project

Author	SHA1	Message	Date
Craig Topper	524545317c	[RISCV] Remove RISCVISD::BREV8 and use RISCVISD::GREV instead. We already have an ISD opcode for the more general GREV/GREVI instructon. We can just use it with the encoding that corresponds to the behavior of brev8. This is similar to what we do for orc.b where we use the GORC ISD opcode.	2022-01-29 22:45:43 -08:00
Craig Topper	d8f929a567	[RISCV] Custom legalize BITREVERSE with Zbkb. With Zbkb, a bitreverse can be split into a rev8 and a brev8. Reviewed By: VincentWu Differential Revision: https://reviews.llvm.org/D118430	2022-01-28 23:11:12 -08:00
Chenbing.Zheng	6d6c44a3f3	[RISCV] Add support for matching vwmulsu from fixed vectors According to riscv-v-spec-1.0, widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar It is worth noting that signed op is only for vs2. For vwmulsu.vv, we can swap two ops, and don't care which is sign extension, but for vwmulsu.vx signExt can not be a vector extended from scalar (rs1). I specifically added two functions ending with _swap in the test case. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118215	2022-01-28 02:33:30 +00:00
Fraser Cormack	af773a1818	[RISCV][VP] Lower VP_MERGE to RVV instructions This patch adds lowering of the llvm.vp.merge.* intrinsic (ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a special pseudo form of vmerge which allows a tied merge operand, allowing us to specify the tail elements as being equal to the "on false" operand, using a tied-def constraint and a "tail undisturbed" policy. While this strategy allows us to often lower the intrinsic to just one instruction, it may be less efficient in fixed-vector types as the number of tail elements may extend far beyond the length of the fixed vector. Another strategy could be to use a vmerge/vfmerge instruction with an AVL equal to the length of the vector type, and manipulate the condition operand such that mask elements greater than the operation's EVL are false. I've also observed inefficient codegen in which our 'VF' patterns don't match raw floating-point SPLAT_VECTORs, which occur in scalable-vector code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117561	2022-01-24 11:05:05 +00:00
Craig Topper	fa8bb22466	[RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors. RISCV only has a unary shuffle that requires places indices in a register. For interleaving two vectors this means we need at least two vrgathers and a vmerge to do a shuffle of two vectors. This patch teaches shuffle lowering to use a widening addu followed by a widening vmaccu to implement the interleave. First we extract the low half of both V1 and V2. Then we implement (zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the result back to the original type splitting the wide elements in half. We can only do this if we have a type with wider elements available. Because we're using extends we also have to be careful with fractional lmuls. Floating point types are supported by bitcasting to/from integer. The tests test a varied combination of LMULs split across VLEN>=128 and VLEN>=512 tests. There a few tests with shuffle indices commuted as well as tests for undef indices. There's one test for a vXi64/vXf64 vector which we can't optimize, but verifies we don't crash. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D117743	2022-01-20 14:44:47 -08:00
Craig Topper	aa7fc02feb	Recommit "[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering." This reverts the revert commit e32838573929ac85fc4df3058593798d10ce4cd2. Accidental demanded bits change has been removed. The demanded bits code itself was remove in a pre-commit since it isn't tested. Original commit message: Previous we used the fshl/fshr operand ordering for simplicity. This made things confusing when D117468 proposed adding intrinsics for the instructions. We can't just use the generic funnel shifting intrinsics because fsl/fsr have different functionality that should be exposed to software. Now we use rs1, rs3, rs2/shamt order which matches the instruction printing order and the order used in this intrinsic header https://github.com/riscv/riscv-bitmanip/blob/main-history/cproofs/rvintrin.h	2022-01-18 10:52:43 -08:00
Craig Topper	e328385739	Revert "[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering." This reverts commit b634f8a663d56877663f5224a785d9f0263c4176. I broke the SimplifyDemandedBits code, but we don't have tests.	2022-01-18 10:36:03 -08:00
Craig Topper	b634f8a663	[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering. Previous we used the fshl/fshr operand ordering for simplicity. This made things confusing when D117468 proposed adding intrinsics for the instructions. We can't just use the generic funnel shifting intrinsics because fsl/fsr have different functionality that should be exposed to software. Now we use rs1, rs3, rs2/shamt order which matches the instruction printing order and the order used in this intrinsic header https://github.com/riscv/riscv-bitmanip/blob/main-history/cproofs/rvintrin.h	2022-01-18 09:47:28 -08:00
David Sherwood	f4515ab858	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit 197f3c0deb76951315118ef13937b67ea9cbd5aa. Reverting after miscompilation errors discovered with ffmpeg.	2022-01-18 08:40:20 +00:00
Han-Kuan Chen	ec9cb3a79c	[RISCV] Provide VLOperand in td. Currently, users expected VL is the last operand. However, since some intrinsics has tail policy in the last operand, this rule cannot be used anymore. Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D117452	2022-01-17 20:25:47 -08:00
Han-Kuan Chen	3fc4b5896a	[RISCV] Make SplatOperand start from 0. Current SplatOperand starts from 1 because operand 0 (or 1) is intrinsic id in SelectionDAG. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117453	2022-01-17 20:14:59 -08:00
David Sherwood	197f3c0deb	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-17 11:08:57 +00:00
David Sherwood	ba471ba8d2	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit 31009f0b5afb504fc1f30769c038e1b7be6ea45b. It seems to be causing SVE VLA buildbot failures and has introduced a genuine regression. Reverting for now.	2022-01-13 15:59:43 +00:00
David Sherwood	31009f0b5a	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/dag-numsignbits.ll CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-13 09:43:07 +00:00
Lian Wang	16877c5d2c	[RISCV] Add bfp and bfpw intrinsic in zbf extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116994	2022-01-13 02:53:00 +00:00
wangpc	c6430fade3	[RISCV] Generate 32 bits jumptable entries when code model is small The code can only address the whole RV32 address space or the lower 2 GiB of the RV64 address space in small code model, so 32 bits entry is enough. Cache hit ratio and code size have some improvements. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116435	2022-01-11 18:20:37 +08:00
wangpc	98d51c2542	[RISCV] Override TargetLowering::BuildSDIVPow2 to generate SELECT When `Zbt` is enabled, we can generate SELECT for division by power of 2, so that there is no data dependency. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D114856	2022-01-11 15:54:35 +08:00
Craig Topper	808c662665	[RISCV] Change RISCVISD::FCVT*RTZ opcodes to take rounding mode as an operand. Pre-work for a future change that will use these opcodes with other rounding modes. Differential Revision: https://reviews.llvm.org/D116724	2022-01-06 08:12:12 -08:00
wangpc	41454ab256	[RISCV] Use constant pool for large integers For large integers (for example, magic numbers generated by TargetLowering::BuildSDIV when dividing by constant), we may need about 4~8 instructions to build them. In the same time, it just takes two instructions to load constants (with extra cycles to access memory), so it may be profitable to put these integers into constant pool. Reviewed By: asb, craig.topper Differential Revision: https://reviews.llvm.org/D114950	2021-12-31 14:48:48 +08:00
Victor Perez	10b3675aa9	[RISCV][VP] Lower mask vector VP AND/OR/XOR to RVV instructions For fixed and scalable vectors, each intrinsic x is lowered to vmx.mm, dropping the mask, which is safe to do as masked-off elements are undef anyway. Differential Revision: https://reviews.llvm.org/D115339	2021-12-23 15:02:32 -06:00
Craig Topper	b7b260e19a	[RISCV] Support strict FP conversion operations. This adds support for strict conversions between fp types and between integer and fp. NOTE: RISCV has static rounding mode instructions, but the constrainted intrinsic metadata is not used to select static rounding modes. Dynamic rounding mode is always used. Differential Revision: https://reviews.llvm.org/D115997	2021-12-23 09:40:58 -06:00
jacquesguan	28a3e7dea2	[RISCV] Override hasAndNotCompare to use more andn when have Zbb extension. Enable transform (X & Y) == Y ---> (~X & Y) == 0 and (X & Y) != Y ---> (~X & Y) != 0 when have Zbb extension to use more andn instruction. Differential Revision: https://reviews.llvm.org/D115922	2021-12-23 10:42:20 +08:00
Craig Topper	3f1c403a2b	[RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode. In order to support constrained FP intrinsics we need to model FRM dependency. Whether or not a instruction uses FRM is based on a 3 bit field in the instruction. Because of this we can't add 'Uses = [FRM]' to the tablegen descriptions. This patch examines the immediate after isel and adds an implicit use of FRM. This idea came from Roger Ferrer Ibanez. Other ideas: We could be overly conservative and just pretend all instructions with frm field read the FRM register. Or we could have pseudoinstructions for CodeGen with rounding mode. Reviewed By: asb, frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D115555	2021-12-14 10:17:57 -08:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit 52ff3b009388f1bef4854f1b6470b4ec19d10b0e.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Philipp Tomsich	af57a71d18	[RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk On RISC-V, icmp is not sunk (as the following snippet shows) which generates the following suboptimal branch pattern: ``` core_list_find: lh a2, 2(a1) seqz a3, a0 << bltz a2, .LBB0_5 bnez a3, .LBB0_9 << should sink the seqz [...] j .LBB0_9 .LBB0_5: bnez a3, .LBB0_9 << should sink the seqz lh a1, 0(a1) [...] ``` due to an icmp not being sunk. The blocks after `codegenprepare` look as follows: ``` define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 { entry: %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1 %0 = load i16, i16* %idx, align 2, !tbaa !4 %cmp = icmp sgt i16 %0, -1 %tobool.not37 = icmp eq %struct.list_head_s* %list, null br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader while.cond9.preheader: ; preds = %entry br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph ``` where the `%tobool.not37` is the result of the icmp that is not sunk. Note that it is computed in the basic-block up until what becomes the `bltz` instruction and the `bnez` is a basic-block of its own. Compare this to what happens on AArch64 (where the icmp is correctly sunk): ``` define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 { entry: %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1 %0 = load i16, i16* %idx, align 2, !tbaa !6 %cmp = icmp sgt i16 %0, -1 br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader while.cond9.preheader: ; preds = %entry %1 = icmp eq %struct.list_head_s* %list, null br i1 %1, label %return, label %land.rhs11.lr.ph ``` This is caused by sinkCmpExpression() being skipped, if multiple condition registers are supported. Given that the check for multiple condition registers affect only sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change adjusts the RISC-V target as follows: * we no longer signal multiple condition registers (thus changing the behaviour of sinkCmpExpression() back to sinking the icmp) * we override shouldNormalizeToSelectSequence() to let always select the preferred normalisation strategy for our backend With both changes, the test results remain unchanged. Note that without the target-specific override to shouldNormalizeToSelectSequence(), there is worse code (more branches) generated for select-and.ll and select-or.ll. The original test case changes as expected: ``` core_list_find: lh a2, 2(a1) bltz a2, .LBB0_5 beqz a0, .LBB0_9 << [...] j .LBB0_9 .LBB0_5: beqz a0, .LBB0_9 << lh a1, 0(a1) [...] ``` Differential Revision: https://reviews.llvm.org/D98932	2021-11-19 08:32:59 -08:00
Craig Topper	391b0ba603	[RISCV] Override TargetLowering::hasAndNot for Zbb. Differential Revision: https://reviews.llvm.org/D113937	2021-11-15 18:44:07 -08:00
Zakk Chen	0649dfebba	[RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0. Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic. Reviewed By: frasercrmck, jrtc27, craig.topper Differential Revision: https://reviews.llvm.org/D111062	2021-11-04 10:08:01 -07:00
Fraser Cormack	e7c879a69d	[RISCV][VP] Add support for VP_REDUCE_* operations This patch adds codegen support for lowering the vector-predicated reduction intrinsics to RVV instructions. The process is similar to that of the other reduction intrinsics, save for the fact that every VP reduction has a start value. We reuse the existing custom "VL" nodes, adding extra patterns where required to handle non-true masks. To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been given an explicit "merge" operand. This is to faciliate the VP reductions, where we must be careful to ensure that even if no operation is performed (when VL=0) we still produce the start value. The RVV reductions don't update the destination register under these conditions, so we tie the splatted start value to the output register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107657	2021-09-23 11:11:05 +01:00
Craig Topper	d85e347a28	[RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter. For strided accesses the loop vectorizer seems to prefer creating a vector induction variable with a start value of the form <i32 0, i32 1, i32 2, ...>. This value will be incremented each loop iteration by a splat constant equal to the length of the vector. Within the loop, arithmetic using splat values will be done on this vector induction variable to produce indices for a vector GEP. This pass attempts to dig through the arithmetic back to the phi to create a new scalar induction variable and a stride. We push all of the arithmetic out of the loop by folding it into the start, step, and stride values. Then we create a scalar GEP to use as the base pointer for a strided load or store using the computed stride. Loop strength reduce will run after this pass and can do some cleanups to the scalar GEP and induction variable. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107790	2021-09-20 09:39:44 -07:00
Craig Topper	1b736bda3b	[RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr LICM may have pulled out a splat, but with .vx instructions we can fold it into an operation. This patch enables CGP to reverse the LICM transform and move the splat back into the loop. I've started with the commutable integer operations and shifts, but we can extend this with more operations in future patches. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109394	2021-09-10 09:04:01 -07:00
Fraser Cormack	a823bdf3ab	[RISCV][VP] Custom lower VP_STORE and VP_LOAD This patch adds support for the vector-predicated `VP_STORE` and `VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and `MLOAD`: to regular load/store instructions via intrinsics. One necessary change was made to `SelectionDAGLegalize` so that `VP_STORE` nodes' operation actions are taken from the stored "value" operands, in the same vein as `STORE` or `MSTORE`. Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108999	2021-09-07 10:53:25 +01:00
Fraser Cormack	f4dee8cb82	[RISCV][VP] Custom lower VP_SCATTER and VP_GATHER This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by lowering them to RVV's `vsox`/`vlux` instructions, respectively. This process is almost identical to the existing `MSCATTER`/`MGATHER` support. One extra change was made to `SelectionDAGLegalize` so that `VP_SCATTER`'s operation action is derived from its stored "value" operand rather than its return type (which is always the chain). Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108987	2021-09-07 10:43:07 +01:00
Ben Shi	f69fb7ac72	[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2) Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27 Differential Revision: https://reviews.llvm.org/D107711	2021-08-22 16:53:32 +08:00
Craig Topper	d4ee84ceee	[RISCV] Support FP_TO_S/UINT_SAT for i32 and i64. The fcvt fp to integer instructions saturate if their input is infinity or out of range, but the instructions produce a maximum integer for nan instead of 0 required for the ISD opcodes. This means we can use the instructions to do the saturating conversion, but we'll need to fix up the nan case at the end. We can probably improve the i8 and i16 default codegen as well, but I'll leave that for a follow up. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107230	2021-08-07 16:06:00 -07:00
Craig Topper	593059b328	[RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC fcvt.w(u) supports multiple rounding modes, but the ISD node doesn't encode that. So name it to match the rounding mode it uses.	2021-07-31 11:14:59 -07:00
Fraser Cormack	172487fe4c	[RISCV] Add support for vector saturating add/sub operations This patch adds support for lowering the saturating vector add/sub intrinsics to RVV instructions, for both fixed-length and scalable-vector forms alike. Note that some of the DAG combines are still not triggering for the scalable-vector tests. These require a bit more work in the DAGCombiner itself. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106651	2021-07-27 10:04:14 +01:00
Craig Topper	c63dbd8501	[RISCV] Custom lower (i32 (fptoui/fptosi X)). I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32) isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if the assertzexti32 has an additional user. If we add a one use check it would just cause a fcvt.lu followed by a sext.w when only need a fcvt.wu to satisfy both users. To mitigate this I've added custom isel and new ISD opcodes for fcvt.wu. This allows us to keep know it started life as a conversion to i32 without needing to match multiple nodes. ComputeNumSignBits has been taught that this new nodes produces 33 sign bits. To prevent regressions when we need to zero extend the result of an (i32 (fptoui X)), I've added a DAG combine to convert it to an (i64 (fptoui X)) before type legalization. In most cases this would happen in InstCombine, but a zero_extend can be created for function returns or arguments. To keep everything consistent I've added new nodes for fptosi as well. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106346	2021-07-24 10:50:43 -07:00
Craig Topper	2b5e53111a	[RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors. This adds a DAG combine to detect sext/zext inputs and emit a new ISD opcode. The extends will either be removed or replaced with narrower extends. Isel patterns are used to match add and widening mul to vwmacc similar to the recently added vmacc patterns. There's still some work to be to match vmulsu. We should also rewrite splats that were extended as scalars and then splatted. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D104802	2021-07-06 10:24:31 -07:00
Craig Topper	3b6dfa381e	[RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount. It seems it is possible for DAG combine to create a shl with an i64 result type and an i32 shift amount. This is ok before type legalization since the type don't need to match in SelectionDAG. This results in type legalization calling LowerOperation to legalize just the amount. We weren't expecting this so we asserted for not finding a fixed vector shift. To fix this, I've added a check for the fixed vector case and returned SDValue() to get the default type legalizer. I've factored all shifts together and added a fixed vector specific handler to avoid repeating similar code for each in LowerOperation. The particular case I found was exposed by D104581, but the bad shift is created after that patch triggers.	2021-06-29 09:45:13 -07:00
Fraser Cormack	c75e454cb9	[RISCV] Transform unaligned RVV vector loads/stores to aligned ones This patch adds support for loading and storing unaligned vectors via an equivalently-sized i8 vector type, which has support in the RVV specification for byte-aligned access. This offers a more optimal path for handling of unaligned fixed-length vector accesses, which are currently scalarized. It also prevents crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store operation. Future work could be to investigate loading/storing via the largest vector element type for the given alignment, in case that would be more optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load could loaded as nxv4i32 instead of as nxv16i8. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104032	2021-06-14 18:12:18 +01:00
Nikita Popov	1ffa6499ea	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
Fraser Cormack	4f500c402b	[RISCV] Support vector types in combination with fastcc This patch extends the RISC-V lowering of the 'fastcc' calling convention to vector types, both fixed-length and scalable. Without this patch, any function passing or returning vector types by value would throw a compiler error. Vectors are handled in 'fastcc' much as they are in the default calling convention, the noticeable difference being the extended set of scalar GPR registers that can be used to pass vectors indirectly. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102505	2021-06-01 10:31:18 +01:00
Fraser Cormack	5a80dc4988	[VP][SelectionDAG] Add a target-configurable EVL operand type This patch adds a way for the target to configure the type it uses for the explicit vector length operands of VP SDNodes. The type must be a legal integer type (there is still no target-independent legalization of this operand) and must currently be at least as big as i32, the type used by the IR intrinsics. An implicit zero-extension takes place on targets which choose a larger type. All VP nodes should be created with this type used for the EVL operand. This allows 64-bit RISC-V to avoid custom legalization of all VP nodes, keeping them in their target-independent form for that bit longer. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D103027	2021-05-27 15:27:36 +01:00
Craig Topper	9065118b64	[RISCV] Optimize SEW=64 shifts by splat on RV32. SEW=64 shifts only uses the log2(64) bits of shift amount. If we're splatting a 64 bit value in 2 parts, we can avoid splatting the upper bits and just let the low bits be sign extended. They won't be read anyway. For the purposes of SelectionDAG semantics of the generic ISD opcodes, if hi was non-zero or bit 31 of the low is 1, the shift was already undefined so it should be ok to replace high with sign extend of low. In order do be able to find the split i64 value before it becomes a stack operation, I added a new ISD opcode that will be expanded to the stack spill in PreprocessISelDAG. This new node is conceptually similar to BuildPairF64, but I expanded earlier so that we could go through regular isel to get the right VLSE opcode for the LMUL. BuildPairF64 is expanded in a CustomInserter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102521	2021-05-26 10:23:32 -07:00
Fraser Cormack	7a211ed110	[RISCV] Prevent store combining from infinitely looping RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion it may, on occasion, be expanded to scalar stores through the stack. Unfortunately these stores may then be picked up by the post-legalization DAGCombiner which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (being passed the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exception to this rule are mask vectors, which are always either custom-lowered or are expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913	2021-05-24 10:19:32 +01:00
Evandro Menezes	3a64b7080d	[RISCV] Move instruction information into the RISCVII namespace (NFC) Move instruction attributes into the `RISCVII` namespace and add associated helper functions. Differential Revision: https://reviews.llvm.org/D102268	2021-05-11 16:32:42 -05:00
Craig Topper	6660319cef	[RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC Looks like these got left behind when vleff isel was moved to X86ISelDAGToDAG.cpp	2021-05-06 09:41:29 -07:00
Fraser Cormack	6f17613bfb	[RISCV][VP] Lower VP ISD nodes to RVV instructions This patch supports all of the current set of VP integer binary intrinsics by lowering them to to RVV instructions. It does so by using the existing RISCVISD *_VL custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. One notable change to the existing vector codegen strategy is that scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length BUILD_VECTOR counterparts. This allows them to reuse the existing "all-ones" VL patterns. To reduce the size of the phabricator diff, some tests are intentionally left out and will be added later if the patch is accepted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101826	2021-05-05 12:32:24 +01:00

1 2 3 4 5 ...

303 Commits