llvm-project

Author	SHA1	Message	Date
Philip Reames	1474880353	[RISCV] Use classic dataflow for VSETVLI insertion Our current implementation of the InsertVSETVLI dataflow allows phase 3 to arrive at a different block end state than the data flow in phase 1/2 computed. This arises because a block which contains instructions (e.g. load or stores) which don't consume all the incoming bits of the VL/VTYPE can be compatible with multiple incoming states. The algorithm effectively changes the SEW on such instructions, and propagates the prior state forward. As phase 3 uses the block input state for this propagation, but phase 1/2 doesn't, this can result in different block end states. If we don't correct for it, this discrepancy can result in miscompiles. This was the source of multiple recent bugs. However, by now we have fixes for all known correctness issues. The basic strategy we use is to insert a compensation vsetvli to bring the block state leaving the block back into consistency with the one computed. This is correct, but results in extra vsetvlis being placed at the end of blocks. This change adjusts the phase 1/2 algorithm to propagate the incoming block state through the block, allowing the compatibility rules to modify the end state. The algorithm may need to run slightly more iterations, but the end result is consistent with what phase 3 does. The benefit of doing this is two fold. First, we reverse some of the code quality introductions introduced in the functional fixes. Second, we simplify the invariants, and allow the strict assertions to be enabled. Several humans, myself included, have found it quite surprising that invariant didn't hold already, and arguably that confusion is the cause of several of our recent miscompiles in this code. The downside to this patch is that the dataflow may require additional iterations to stabilize. In the worse case, we go from O(Edges) to O(E + UniquePaths) as the incoming state (and thus the outgoing one) can now change once for each path from the entry block. Differential Revision: https://reviews.llvm.org/D125232	2022-05-16 17:06:27 -07:00
Philip Reames	3d17c91709	[RISCV] Fix missing vsetvli in transparent block case We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks which only contain load/stores compatible with their incoming state. When I went to rebase and simplify D125232, it turned out that not all of the correctness issues had been fixed yet after all. This is the correctness fix accidentally embedded in the original more complicated version. Note that the test changes here are mostly regressions. It's worth noting that the simplified version of D125232 exactly reverses all the non-functional diffs in the test caused here. D125232 should be the immediate following commit. Differential Revision: https://reviews.llvm.org/D125703	2022-05-16 17:06:27 -07:00
Philip Reames	e2df48bb23	[RISCV] Add further trace output to InsertVSETLVI	2022-05-16 09:15:32 -07:00
Philip Reames	52b5f1f7d4	[RISCV] Extend dataflow workaround from D119518 to fallthrough blocks We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks without terminators (e.g. fallthroughs). I have a deeper rework of the algorithm in flight over in D125232, but this patch is specifically a minimal fix for an active miscompile. That change can be reworked over this once landed. Differential Revision: https://reviews.llvm.org/D125408	2022-05-12 10:45:59 -07:00
Craig Topper	09f48c6b80	[RISCV] Move implementation of getVLOpNum and getSEWOpNum from RISCVInsertVSETVLI to RISCVBaseInfo.h. NFC We should consolidate the operand counting and ordering into RISCVBaseInfo.h and stop spreading it around. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D125344	2022-05-11 11:14:58 -07:00
Philip Reames	72925d98bf	[riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitionsas reviewed This patch is an alternative to a piece of D125270. If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction. This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possible even skip emitting it entirely. This builds on the same infrastructure as D125337, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions. This is where the (relatively minor) code improvements come from. More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries. Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ. Differential Revision: https://reviews.llvm.org/D125392	2022-05-11 10:45:29 -07:00
Philip Reames	cc0283a635	[riscv] Prefer to use previous VL for scalar move instructionsK This patch is an alternative to a piece of D125270. Its direct motivation is to fix a wrong code bug (described below), but somewhat unexpectedly, it also results in a significant code quality improvement for idiomatic fixed length vector patterns. The existing transform is simply wrong in its current location. We are correct about the fact that the scalar move itself can use the previous vsetvli, but we loose track of the fact that later instructions might depend on the state change represented. That is, the actual value of VL in the register is different than the abstract state thinks it is. Not simply due to precision of modeling, but e.g. the VL register could contain 3 when the abstract state says it is 1. This is annoying hard to demonstrate in practice due to differences in policy flags on the intrinsics, but this is at least a latent wrong code bug. The code quality benefit comes from the fact we don't need to tie this to explicit vsetvli instructions at all. We can propagate the abstract state, and reduce a) the number of transitions, or b) the cost of those transitions. It turns out we have a bunch of cases - in tests at least - where fixed length AVLs are known non-zero, and we can leave VL unchanged while changing VTYPE. Differential Revision: https://reviews.llvm.org/D125337	2022-05-11 07:37:50 -07:00
Philip Reames	7731935ffc	[riscv] Consolidate logic for SEW/VL operand offset calculations [nfc]	2022-05-10 15:06:26 -07:00
Philip Reames	413052310a	[riscv] Minor style cleanup so that code more obviously matches comments [nfc]	2022-05-10 14:20:26 -07:00
Philip Reames	70ad96ca5e	[riscv, InsertVSETVLI] Rename InstrInfo to Require to more clearly indicate purpose [nfc]	2022-05-09 06:40:33 -07:00
Philip Reames	7ed16e7c51	[riscv] Fix state tracking bug on vsetvli (phi of vsetvli) peephole This fixes the first of several cases where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries. In this particular case, we recognize that for the first vsetvli in a block, that if the AVL is a phi of GPR results from previous vsetvlis and the VTYPE field matches, we can avoid emitting a vsetvli as the register contents don't change. Unfortunately, the abstract state does change and that update was lost. As noted in the test change, this can actually improve results by preserving information until later state transitions in the block. However, this minor codegen improvement is not the motivation for the patch. The motivation is to avoid cases a case where we break a key internal correctness invariant. Differential Revision: https://reviews.llvm.org/D125133	2022-05-09 06:21:45 -07:00
Philip Reames	c7c3f58544	[riscv] Use early return to reduce nesting for InsertVSETVLI [nfc]	2022-05-06 13:10:05 -07:00
Philip Reames	99a41005fe	[riscv] Add early return to InsertVSETLI fixed point step [nfc] If the income state hasn't changed, and the step function is fixed by assumption, then the output state can't have changed. In the current algorithm, this is a very minor win and mostly allows adding tracing output without being horrible verbose.	2022-05-06 13:08:11 -07:00
Philip Reames	dee9b01d83	[riscv] Add some minimal tracing output to InsertVSETVLI Only available with -debug. Main purpose is simplifying an upcoming change, and providing tools for debugging problems.	2022-05-06 13:08:11 -07:00
Philip Reames	f486119ce9	[riscv] Add strict asserts for VSETVLI insertion algorithm to help catch bugs This assertion should hold for any reasonable data flow algorithm, but is known not to in several cases today. I'd like to go ahead and land this off-by-default, so that we can collaborate on fixes and have a common definition of success. Differential: https://reviews.llvm.org/D125035	2022-05-06 10:28:22 -07:00
Philip Reames	042a7a5f0d	[riscv] Use X0 for destination of VSETVLI instruction if result unused If the GPR destination register of a VSETVLI instruction is unused, we can replace it with X0. This discards the result, and thus reduces register pressure. Since after the core insertion/lowering algorithm has run, many user written VSETVLIs will have their GPR result unused (as VTYPE/VLEN is now explicitly read instead), this kicks in for most tests which involve a vsetvli intrinsic for fixed length vectorization. (vscale vectorization generally uses the GPR result to know how far to e.g. advance pointers in a loop and these uses are not removed.) When inserting VSETVLIs to lower psuedos, we prefer the X0 form anyways. Differential Revision: https://reviews.llvm.org/D124961	2022-05-05 07:39:45 -07:00
Philip Reames	18ed2ee80c	[RISCV] Add a version of insertVSETVLI which uses an iterator [NFC] This is to simplify the final version of D124869.	2022-05-04 14:48:31 -07:00
Weverything	5afd20806d	[riscv] Mark function as used to avoid unused warning.	2022-05-03 18:51:23 -07:00
Philip Reames	2982d0032b	Fix a buildbot warning [nfc]	2022-05-03 14:40:27 -07:00
Philip Reames	be50b8c185	[riscv] Add debug printing support for VSETVLIInfo class [nfc]	2022-05-03 14:00:17 -07:00
Zakk Chen	b578330754	[RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof. masked compare and vmsbf/vmsif/vmsof are always tail agnostic, we could check maskedoff value to decide mask policy rather than have a addtional policy operand. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D122456	2022-03-29 18:05:33 -07:00
Zakk Chen	abb5a985e9	[RISCV] Support mask policy for RVV IR intrinsics. Add the UsesMaskPolicy flag to indicate the operations result would be effected by the mask policy. (ex. mask operations). It means RISCVInsertVSETVLI should decide the mask policy according by mask policy operand or passthru operand. If UsesMaskPolicy is false (ex. unmasked, store, and reduction operations), the mask policy could be either mask undisturbed or agnostic. Currently, RISCVInsertVSETVLI sets UsesMaskPolicy operations default to MA, otherwise to MU to keep the current mask policy would not be changed for unmasked operations. Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases. I didn't add all operations because most of implementations are using the same pseudo multiclass. Some tests maybe be duplicated in different tests. (ex. masked vmacc with tumu shows in vmacc-rv32.ll and masked-tumu) I think having different tests only for policy would make the testing clear. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120226	2022-03-22 01:19:16 -07:00
Chenbing.Zheng	2ae92e19eb	[RISCV][NFC] Add helper function isVectorConfigInstr to reduce Repeated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119924	2022-02-24 05:59:12 +00:00
Craig Topper	541c9ba842	[RISCV] Insert VSETVLI at the end of a basic block if we didn't produce BlockInfo.Exit. This is an alternative to D118667 that instead of fixing the store to match phase 1, it tries to detect the mismatch with the expected value at the end of the block. This inserts a vsetvli after the vse to satisfy the requirement of the other basic block. We still have serious design issues in the pass, that is going to require some rethinking. Differential Revision: https://reviews.llvm.org/D119518	2022-02-11 09:34:16 -08:00
Craig Topper	f35ac872b8	Revert "[RISCV] Fix a vsetvli insertion bug involving loads/stores." and "[RISCC] Add missing words to comment. NFC" This reverts commit f943c58cae2480755cecdac5be832274f238df93. and commit 7eb781072744b31a60e82b5a5903471032d4845f. This introduced a new bug that appears to be easier to hit. Differential Revision: https://reviews.llvm.org/D119517	2022-02-11 09:34:16 -08:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Craig Topper	f943c58cae	[RISCC] Add missing words to comment. NFC	2022-02-01 07:39:51 -08:00
Craig Topper	7eb7810727	[RISCV] Fix a vsetvli insertion bug involving loads/stores. The first phase of the analysis can avoid a vsetvli if an earlier instruction in the block used an SEW and LMUL that when combined with the EEW of the load/store would produce the desired EMUL. If we avoided a vsetvli this will affect the global analysis we do in the second phase. The third phase where we really insert the vsetvlis needs to agree with the first phase. If it doesn't we can insert vsetvlis that invalidate the global analysis. In the test case there is a VSETVLI in the preheader that sets SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1. This VADD is followed by a store that wants wants SEW=32 LMUL=1/2. Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the VADD can be become EMUL=1 for the store. So the first phase determines no vsetvli is needed. The third phase manages CurInfo differently than BBInfo.Change from the first phase. CurInfo is only updated when we see a vsetvli or insert a vsetvli. This was done to allow predecessor block information from the global analysis to be applied to multiple instructions. Since the loop body has no vsetvli we won't update CurInfo for either the VADD or the VSE. This prevented us from checking the store vsetvli elision for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which invalidated the global analysis. To mitigate this, I've added a BBLocalInfo variable that more closely matches the first phase propagation. This gets updated based on the VADD and prevents emitting a vsetvli for the store like we did in the first phase. I wonder if we should do an earlier phase to handle the load/store case by adding more pseudo opcodes and changing the SEW/LMUL for those instructions before the insertion analysis. That might be more robust than trying to guarantee two phases make the same decision. Fixes the test from D118629. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118667	2022-02-01 07:29:01 -08:00
Craig Topper	4602f4169a	[RISCV] Prune unnecessary vector pseudo instructions. NFC For .vf instructions, we don't need MF8 pseudos for f16. We don't need MF8 or MF4 pseudos for f32. Or MF8, MF4, MF2 for f64. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D116437	2022-01-01 19:53:53 -08:00
jacquesguan	05f82dc877	[RISCV] Fix incorrect cases of vmv.s.f in the VSETVLI insert pass. Fix incorrect cases of vmv.s.f and add test cases for it. Differential Revision: https://reviews.llvm.org/D116432	2021-12-31 14:17:03 +08:00
jacquesguan	128c6ed73b	[RISCV] Teach VSETVLInsert to eliminate redundant vsetvli for vmv.s.x and vfmv.s.f. Differential Revision: https://reviews.llvm.org/D116307	2021-12-30 17:16:18 +08:00
Craig Topper	f59307bfdc	[RISCV] Teach needVSETVLIPHI to handle mask register instructions. This handles the case where the mask register instruction input comes from a Phi of vsetvlis. If the VLMAX is the same as the VLMAX required by the mask register instruction, we can avoid a vsetvli. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113204	2021-11-15 09:57:28 -08:00
Craig Topper	aefcd59895	[RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better. If the VL operand of a mask register instruction comes from an explicit vsetvli with a different VTYPE, we can still avoid needing a vsetvli as long as the SEW/LMUL ratio is the same and policy bits match. Differential Revision: https://reviews.llvm.org/D112762	2021-10-29 09:49:36 -07:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() begin using to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Fraser Cormack	74c6895b39	[RISCV] Fix missing cross-block VSETVLI insertion This patch fixes a codegen bug, the test for which was introduced in D112223. When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo produced by a block is found to be compatible with the VSETVLIInfo computed as the intersection of the 'exit' VSETVLIInfo produced by the block's predecessors, that blocks' 'exit' info is discarded and the intersected value is taken in its place. However, we have one authority on what constitutes VSETVLIInfo compatibility and we are using it in two different contexts. Compatibility is used in one context to elide VSETVLIs between straight-line vector instructions. But compatibility when evaluated between two blocks' exit infos ignores any info produced inside each respective block before the exit points. As such it does not guarantee that a block will not produce a VSETVLI which is incompatible with the 'previous' block. As such, we must ensure that any merging of VSETVLIInfo is performed using some notion of "strict" compatibility. I've defined this as a full vtype match, but this is perhaps too pessimistic. Given that test coverage in this regard is lacking -- the only change is in the failing test -- I think this is a good starting point. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112228	2021-10-22 10:45:10 +01:00
Hsiangkai Wang	7d39a8a921	[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics. Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list. For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below. In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them. * Use dest argument to control tail policy vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument) vfmerge.vfm (add _t builtins with additional dest argument) vmv.v.v (add _t builtins with additional dest argument) vmv.v.x (add _t builtins with additional dest argument) vmv.v.i (add _t builtins with additional dest argument) vfmv.v.f (add _t builtins with additional dest argument) vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument) vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument) * Always has tail argument for masked/unmasked intrinsics Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Reduction Operations (add _t and _mt builtins) Vector Slideup Instructions (add _t and _mt builtins) Vector Slidedown Instructions (add _t and _mt builtins) Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101 Differential Revision: https://reviews.llvm.org/D105092	2021-09-24 17:09:50 +08:00
Craig Topper	6c7cadb8c1	[RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype. This can avoid a vsetvl after a tail undisturbed operation. Differential Revision: https://reviews.llvm.org/D109549	2021-09-10 09:03:20 -07:00
Craig Topper	75620fadf5	[RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0. This patch changes the register class to avoid accidentally setting the AVL operand to X0 through MachineIR optimizations. There are cases where we really want to use X0, but we can't get that past the MachineVerifier with the register class as GPRNoX0. So I've use a 64-bit -1 as a sentinel for X0. All other immediate values should be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI insertion pass to avoid touching the rest of the algorithm. In SelectionDAG lowering I'm using a -1 TargetConstant to hide it from instruction selection and treat it differently than if the user used -1. A user -1 should be selected to a register since it doesn't fit in uimm5. This is the rest of the changes started in D109110. As mentioned there, I don't have a failing test from MachineIR optimizations anymore. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109116	2021-09-03 09:19:25 -07:00
Craig Topper	e4e69ba4d1	[RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1. X0 has special meaning for vsetvli, we need to make sure we never create it a vsetvli that uses it by accident. This could happen if the register coalescer coalesces a copy from X0 into this instruction. This patch splits the instruction so that we can have GPRNoX0 register class to use for the cases where we don't want the source to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0 operand so we need a separate pseudo for those cases. I don't currently have a failing example for this. There was a failure in D107957, but the coalescable copy from that example should have been optimized away much earlier so I've fixed that. This is not a complete fix. We still need to prevent the same possible issue on the AVL operand of all of the vector instruction pseudos. I don't want to make two versions of all of those so we need to find a different solution for those. I have an idea I'm going to try. Differential Revision: https://reviews.llvm.org/D109110	2021-09-02 07:45:31 -07:00
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided load/stores we set the SEW operand of the pseudo instruction equal the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
Craig Topper	643ce70a64	[RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions. Use a tail policy operand instead. Inspired by the work in D105092, but without the intrinsic interface changes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106512	2021-08-04 10:39:50 -07:00
jacquesguan	7900ee0b61	[RISCV] Teach VSETVLI insertion to merge the unused VSETVLI with the one need to be insert after it. If a vsetvli instruction is not compatible with the next vector instruction, and there is no other things that may update or use VL/VTYPE, we could merge it with the next vsetvli instruction that should be insert for the vector instruction. This commit only merge VTYPE with the former vsetvli instruction which has the same VL. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106857	2021-08-03 12:06:59 +08:00
Craig Topper	5edccc4581	[RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant. Since we're changing VTYPE, we may change VLMAX which could invalidate the previous VL. If we can't tell if it is safe we should use an AVL of 1 instead of keeping the old VL. This is a quick fix. We may want to thread VL to the pseudo instruction instead of making up a value. That will require ISD opcode changes and changes to the C intrinsic interface. This fixes the issue raised in D106286. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106403	2021-07-23 09:12:05 -07:00
Craig Topper	a467c08570	[RISCV] Cleanup comment around vector tail policy handling. NFC vmv.x.s and reductions don't ignore tail policy anymore.	2021-07-21 12:45:08 -07:00
Craig Topper	c2e01ee4a5	[RISCV] Remove extra character from a comment. NFC	2021-06-21 12:52:02 -07:00
Craig Topper	ac87133f1d	[RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch. Previously we went directly to unknown state on VTYPE mismatch. If we instead remember the partial match, we can use this to still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL ratio match. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104069	2021-06-18 12:16:07 -07:00
Craig Topper	8dfd0810f2	[RISCV] Remove unused method from RISCVInsertVSETVLI. NFC If this becomes needed its trivial to add it back.	2021-06-09 15:35:26 -07:00
Craig Topper	c653711fd3	[RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL. All that really matters is that the VLMAX of the preceding instructions is the same as the VLMAX required by the mask operation. Also update the vmsge(u) handling to use the SEW/LMUL we use for other mask register operations. We were matching it to the compare before. Some cases will be improve if we fix masked compares to use tail agnostic policy. I think they ignore the tail policy anyway. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103299	2021-06-04 09:17:46 -07:00
Craig Topper	e9313fa33a	[RISCV] Simplify some code in RISCVInsertVSETVLI by calling an existing function that does the same thing. NFCI	2021-06-03 17:31:54 -07:00
Craig Topper	0fa5aac292	[RISCV] Teach VSETVLI insertion to look through PHIs to prove we don't need to insert a vsetvli. If an instruction's AVL operand is a PHI node in the same block, we may be able to peek through the PHI to find vsetvli instructions that produce the AVL in other basic blocks. If we can prove those vsetvli instructions have the same VTYPE and were the last vsetvli in their respective blocks, then we don't need to insert a vsetvli for this pseudo instruction. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103277	2021-05-27 15:34:08 -07:00

1 2 3

105 Commits