llvm-project

Author	SHA1	Message	Date
Zakk Chen	b578330754	[RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof. Masked compare and vmsbf/vmsif/vmsof are always tail agnostic, so we can check the maskedoff value to decide the mask policy rather than have an additional policy operand. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D122456	2022-03-29 18:05:33 -07:00
Zakk Chen	abb5a985e9	[RISCV] Support mask policy for RVV IR intrinsics. Add the UsesMaskPolicy flag to indicate that the operation's result would be affected by the mask policy (e.g. mask operations). It means RISCVInsertVSETVLI should decide the mask policy according to the mask policy operand or the passthru operand. If UsesMaskPolicy is false (e.g. unmasked, store, and reduction operations), the mask policy could be either mask undisturbed or agnostic. Currently, RISCVInsertVSETVLI defaults UsesMaskPolicy operations to MA, and otherwise to MU so that the current mask policy is not changed for unmasked operations. Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases. I didn't add all operations because most of the implementations use the same pseudo multiclass. Some tests may be duplicated in different tests (e.g. masked vmacc with tumu shows up in vmacc-rv32.ll and masked-tumu). I think having different tests only for policy would make the testing clear. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120226	2022-03-22 01:19:16 -07:00
Chenbing.Zheng	2ae92e19eb	[RISCV][NFC] Add helper function isVectorConfigInstr to reduce repeated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119924	2022-02-24 05:59:12 +00:00
Craig Topper	541c9ba842	[RISCV] Insert VSETVLI at the end of a basic block if we didn't produce BlockInfo.Exit. This is an alternative to D118667 that, instead of fixing the store to match phase 1, tries to detect the mismatch with the expected value at the end of the block. This inserts a vsetvli after the vse to satisfy the requirement of the other basic block. We still have serious design issues in the pass that are going to require some rethinking. Differential Revision: https://reviews.llvm.org/D119518	2022-02-11 09:34:16 -08:00
Craig Topper	f35ac872b8	Revert "[RISCV] Fix a vsetvli insertion bug involving loads/stores." and "[RISCC] Add missing words to comment. NFC" This reverts commit f943c58cae2480755cecdac5be832274f238df93. and commit 7eb781072744b31a60e82b5a5903471032d4845f. This introduced a new bug that appears to be easier to hit. Differential Revision: https://reviews.llvm.org/D119517	2022-02-11 09:34:16 -08:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Craig Topper	f943c58cae	[RISCC] Add missing words to comment. NFC	2022-02-01 07:39:51 -08:00
Craig Topper	7eb7810727	[RISCV] Fix a vsetvli insertion bug involving loads/stores. The first phase of the analysis can avoid a vsetvli if an earlier instruction in the block used an SEW and LMUL that when combined with the EEW of the load/store would produce the desired EMUL. If we avoided a vsetvli this will affect the global analysis we do in the second phase. The third phase where we really insert the vsetvlis needs to agree with the first phase. If it doesn't we can insert vsetvlis that invalidate the global analysis. In the test case there is a VSETVLI in the preheader that sets SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1. This VADD is followed by a store that wants SEW=32 LMUL=1/2. Because it has EEW=32 as part of the opcode, the SEW=64 LMUL=1 from the VADD can become EMUL=1/2 for the store. So the first phase determines no vsetvli is needed. The third phase manages CurInfo differently than BBInfo.Change from the first phase. CurInfo is only updated when we see a vsetvli or insert a vsetvli. This was done to allow predecessor block information from the global analysis to be applied to multiple instructions. Since the loop body has no vsetvli we won't update CurInfo for either the VADD or the VSE. This prevented us from checking the store vsetvli elision for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which invalidated the global analysis. To mitigate this, I've added a BBLocalInfo variable that more closely matches the first phase propagation. This gets updated based on the VADD and prevents emitting a vsetvli for the store like we did in the first phase. I wonder if we should do an earlier phase to handle the load/store case by adding more pseudo opcodes and changing the SEW/LMUL for those instructions before the insertion analysis. That might be more robust than trying to guarantee two phases make the same decision. Fixes the test from D118629. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118667	2022-02-01 07:29:01 -08:00
Craig Topper	4602f4169a	[RISCV] Prune unnecessary vector pseudo instructions. NFC For .vf instructions, we don't need MF8 pseudos for f16. We don't need MF8 or MF4 pseudos for f32. Or MF8, MF4, MF2 for f64. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D116437	2022-01-01 19:53:53 -08:00
jacquesguan	05f82dc877	[RISCV] Fix incorrect cases of vmv.s.f in the VSETVLI insert pass. Fix incorrect cases of vmv.s.f and add test cases for it. Differential Revision: https://reviews.llvm.org/D116432	2021-12-31 14:17:03 +08:00
jacquesguan	128c6ed73b	[RISCV] Teach VSETVLInsert to eliminate redundant vsetvli for vmv.s.x and vfmv.s.f. Differential Revision: https://reviews.llvm.org/D116307	2021-12-30 17:16:18 +08:00
Craig Topper	f59307bfdc	[RISCV] Teach needVSETVLIPHI to handle mask register instructions. This handles the case where the mask register instruction input comes from a Phi of vsetvlis. If the VLMAX is the same as the VLMAX required by the mask register instruction, we can avoid a vsetvli. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113204	2021-11-15 09:57:28 -08:00
Craig Topper	aefcd59895	[RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better. If the VL operand of a mask register instruction comes from an explicit vsetvli with a different VTYPE, we can still avoid needing a vsetvli as long as the SEW/LMUL ratio is the same and policy bits match. Differential Revision: https://reviews.llvm.org/D112762	2021-10-29 09:49:36 -07:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() and begin using it to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Fraser Cormack	74c6895b39	[RISCV] Fix missing cross-block VSETVLI insertion This patch fixes a codegen bug, the test for which was introduced in D112223. When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo produced by a block is found to be compatible with the VSETVLIInfo computed as the intersection of the 'exit' VSETVLIInfo produced by the block's predecessors, that block's 'exit' info is discarded and the intersected value is taken in its place. However, we have one authority on what constitutes VSETVLIInfo compatibility and we are using it in two different contexts. Compatibility is used in one context to elide VSETVLIs between straight-line vector instructions. But compatibility when evaluated between two blocks' exit infos ignores any info produced inside each respective block before the exit points. As such it does not guarantee that a block will not produce a VSETVLI which is incompatible with the 'previous' block. As such, we must ensure that any merging of VSETVLIInfo is performed using some notion of "strict" compatibility. I've defined this as a full vtype match, but this is perhaps too pessimistic. Given that test coverage in this regard is lacking -- the only change is in the failing test -- I think this is a good starting point. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112228	2021-10-22 10:45:10 +01:00
Hsiangkai Wang	7d39a8a921	[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics. Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list. For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below. In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them. * Use dest argument to control tail policy vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument) vfmerge.vfm (add _t builtins with additional dest argument) vmv.v.v (add _t builtins with additional dest argument) vmv.v.x (add _t builtins with additional dest argument) vmv.v.i (add _t builtins with additional dest argument) vfmv.v.f (add _t builtins with additional dest argument) vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument) vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument) * Always has tail argument for masked/unmasked intrinsics Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Reduction Operations (add _t and _mt builtins) Vector Slideup Instructions (add _t and _mt builtins) Vector Slidedown Instructions (add _t and _mt builtins) Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101 Differential Revision: https://reviews.llvm.org/D105092	2021-09-24 17:09:50 +08:00
Craig Topper	6c7cadb8c1	[RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype. This can avoid a vsetvl after a tail undisturbed operation. Differential Revision: https://reviews.llvm.org/D109549	2021-09-10 09:03:20 -07:00
Craig Topper	75620fadf5	[RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0. This patch changes the register class to avoid accidentally setting the AVL operand to X0 through MachineIR optimizations. There are cases where we really want to use X0, but we can't get that past the MachineVerifier with the register class as GPRNoX0. So I've used a 64-bit -1 as a sentinel for X0. All other immediate values should be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI insertion pass to avoid touching the rest of the algorithm. In SelectionDAG lowering I'm using a -1 TargetConstant to hide it from instruction selection and treat it differently than if the user used -1. A user -1 should be selected to a register since it doesn't fit in uimm5. This is the rest of the changes started in D109110. As mentioned there, I don't have a failing test from MachineIR optimizations anymore. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109116	2021-09-03 09:19:25 -07:00
Craig Topper	e4e69ba4d1	[RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1. X0 has special meaning for vsetvli, so we need to make sure we never create a vsetvli that uses it by accident. This could happen if the register coalescer coalesces a copy from X0 into this instruction. This patch splits the instruction so that we can have GPRNoX0 register class to use for the cases where we don't want the source to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0 operand so we need a separate pseudo for those cases. I don't currently have a failing example for this. There was a failure in D107957, but the coalescable copy from that example should have been optimized away much earlier so I've fixed that. This is not a complete fix. We still need to prevent the same possible issue on the AVL operand of all of the vector instruction pseudos. I don't want to make two versions of all of those so we need to find a different solution for those. I have an idea I'm going to try. Differential Revision: https://reviews.llvm.org/D109110	2021-09-02 07:45:31 -07:00
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided loads/stores we set the SEW operand of the pseudo instruction equal to the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
Craig Topper	643ce70a64	[RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions. Use a tail policy operand instead. Inspired by the work in D105092, but without the intrinsic interface changes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106512	2021-08-04 10:39:50 -07:00
jacquesguan	7900ee0b61	[RISCV] Teach VSETVLI insertion to merge the unused VSETVLI with the one that needs to be inserted after it. If a vsetvli instruction is not compatible with the next vector instruction, and there is nothing else that may update or use VL/VTYPE, we could merge it with the next vsetvli instruction that should be inserted for the vector instruction. This commit only merges VTYPE with the former vsetvli instruction which has the same VL. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106857	2021-08-03 12:06:59 +08:00
Craig Topper	5edccc4581	[RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant. Since we're changing VTYPE, we may change VLMAX which could invalidate the previous VL. If we can't tell if it is safe we should use an AVL of 1 instead of keeping the old VL. This is a quick fix. We may want to thread VL to the pseudo instruction instead of making up a value. That will require ISD opcode changes and changes to the C intrinsic interface. This fixes the issue raised in D106286. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106403	2021-07-23 09:12:05 -07:00
Craig Topper	a467c08570	[RISCV] Cleanup comment around vector tail policy handling. NFC vmv.x.s and reductions don't ignore tail policy anymore.	2021-07-21 12:45:08 -07:00
Craig Topper	c2e01ee4a5	[RISCV] Remove extra character from a comment. NFC	2021-06-21 12:52:02 -07:00
Craig Topper	ac87133f1d	[RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch. Previously we went directly to unknown state on VTYPE mismatch. If we instead remember the partial match, we can use this to still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL ratio match. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104069	2021-06-18 12:16:07 -07:00
Craig Topper	8dfd0810f2	[RISCV] Remove unused method from RISCVInsertVSETVLI. NFC If this becomes needed it's trivial to add it back.	2021-06-09 15:35:26 -07:00
Craig Topper	c653711fd3	[RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL. All that really matters is that the VLMAX of the preceding instructions is the same as the VLMAX required by the mask operation. Also update the vmsge(u) handling to use the SEW/LMUL we use for other mask register operations. We were matching it to the compare before. Some cases will be improved if we fix masked compares to use tail agnostic policy. I think they ignore the tail policy anyway. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103299	2021-06-04 09:17:46 -07:00
Craig Topper	e9313fa33a	[RISCV] Simplify some code in RISCVInsertVSETVLI by calling an existing function that does the same thing. NFCI	2021-06-03 17:31:54 -07:00
Craig Topper	0fa5aac292	[RISCV] Teach VSETVLI insertion to look through PHIs to prove we don't need to insert a vsetvli. If an instruction's AVL operand is a PHI node in the same block, we may be able to peek through the PHI to find vsetvli instructions that produce the AVL in other basic blocks. If we can prove those vsetvli instructions have the same VTYPE and were the last vsetvli in their respective blocks, then we don't need to insert a vsetvli for this pseudo instruction. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103277	2021-05-27 15:34:08 -07:00
Craig Topper	527cd01314	[RISCV] Teach vsetvli insertion to use vsetvl x0, x0 form when we can tell that VLMAX and AVL haven't changed. This can help avoid needing a virtual register for the vsetvl output when the AVL is X0. For other register AVLs it can shorten the live range of the AVL register if it isn't needed later. There's probably no advantage when AVL is a 5 bit immediate that can use vsetivli. But do it anyway for consistency. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103215	2021-05-27 10:11:38 -07:00
Craig Topper	fdf10e6197	[RISCV] Use X0 as destination of inserted vsetvli when possible. We aren't going to connect the result to anything so we might as well avoid allocating a register. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D102031	2021-05-26 13:08:51 -07:00
Craig Topper	b2c7ac874f	[RISCV] Don't propagate VL/VTYPE across inline assembly in the Insert VSETVLI pass. It's conceivable someone could put a vsetvli in inline assembly so it's safer to consider them as barriers. The alternative would be to trust that the user marks VL and VTYPE registers as clobbers of the inline assembly if they do that, but that seems error prone. I'm assuming inline assembly in vector code is going to be rare. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103126	2021-05-26 09:56:20 -07:00
Craig Topper	1b47a3de48	[RISCV] Enable cross basic block aware vsetvli insertion This patch extends D102737 to allow VL/VTYPE changes to be taken into account before adding an explicit vsetvli. We do this by using a data flow analysis to propagate VL/VTYPE information from predecessors until we've determined a value for every value in the function. We use this information to determine if a vsetvli needs to be inserted before the first vector instruction in the block. Differential Revision: https://reviews.llvm.org/D102739	2021-05-26 09:25:42 -07:00
Craig Topper	b510e4cf1b	[RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks. This is a replacement for D101938 for inserting vsetvli instructions where needed. This new version changes how we track the information in such a way that we can extend it to be aware of VL/VTYPE changes in other blocks. Given how much it changes the previous patch, I've decided to abandon the previous patch and post this from scratch. For now the pass consists of a single phase that assumes the incoming state from other basic blocks is unknown. A follow up patch will extend this with a phase to collect information about how VL/VTYPE change in each block and a second phase to propagate this information to the entire function. This will be used by a third phase to do the vsetvli insertion. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102737	2021-05-24 11:47:27 -07:00

35 Commits
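
Note on the load/store elision described in 79fbddbea0 and 7eb7810727: the rule those messages state, EMUL=(EEW/SEW) * LMUL together with the SEW/LMUL ratio comparison, can be checked by hand with a small sketch. The Python below is an illustration only, not code from the pass (which is written in C++). The function names emul, same_ratio and store_needs_vsetvli are hypothetical, and the sketch ignores the AVL, the policy bits, and the requirement that the previous vsetvli come from the same block.

```python
from fractions import Fraction


def emul(eew, sew, lmul):
    """EMUL = (EEW / SEW) * LMUL, the formula quoted in the 79fbddbea0 message."""
    return Fraction(eew, sew) * Fraction(lmul)


def same_ratio(sew_a, lmul_a, sew_b, lmul_b):
    """True when both settings share the same SEW/LMUL ratio (and so the same VLMAX)."""
    return Fraction(sew_a) / Fraction(lmul_a) == Fraction(sew_b) / Fraction(lmul_b)


def store_needs_vsetvli(cur_sew, cur_lmul, eew, wanted_lmul):
    """Hypothetical simplification: a load/store whose EEW is part of the opcode
    can reuse the current vtype when the ratios match. Only the ratio is
    compared here; AVL, policy bits and block boundaries are not modelled."""
    return not same_ratio(cur_sew, cur_lmul, eew, wanted_lmul)


# Case from the 7eb7810727 message: VADD ran with SEW=64, LMUL=1 and the
# store wants SEW=32, LMUL=1/2 with EEW=32 in its opcode.
print(emul(32, 64, 1))
print(store_needs_vsetvli(64, 1, 32, Fraction(1, 2)))
```

With the numbers from that message the ratios are both 64, so the sketch reports that no vsetvli is needed for the store, which matches what the first phase is said to decide.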