105 Commits

Author SHA1 Message Date
Philip Reames
1474880353 [RISCV] Use classic dataflow for VSETVLI insertion
Our current implementation of the InsertVSETVLI dataflow allows phase 3 to arrive at a different block end state than the data flow in phase 1/2 computed. This arises because a block which contains instructions (e.g. load or stores) which don't consume all the incoming bits of the VL/VTYPE can be compatible with multiple incoming states. The algorithm effectively changes the SEW on such instructions, and propagates the prior state forward. As phase 3 uses the block input state for this propagation, but phase 1/2 doesn't, this can result in different block end states.

If we don't correct for it, this discrepancy can result in miscompiles. This was the source of multiple recent bugs. However, by now we have fixes for all known correctness issues.

The basic strategy we use is to insert a compensation vsetvli to bring the block state leaving the block back into consistency with the one computed. This is correct, but results in extra vsetvlis being placed at the end of blocks.

This change adjusts the phase 1/2 algorithm to propagate the incoming block state through the block, allowing the compatibility rules to modify the end state. The algorithm may need to run slightly more iterations, but the end result is consistent with what phase 3 does.

The benefit of doing this is two fold.

First, we reverse some of the code quality introductions introduced in the functional fixes.

Second, we simplify the invariants, and allow the strict assertions to be enabled. Several humans, myself included, have found it quite surprising that invariant didn't hold already, and arguably that confusion is the cause of several of our recent miscompiles in this code.

The downside to this patch is that the dataflow may require additional iterations to stabilize. In the worse case, we go from O(Edges) to O(E + UniquePaths) as the incoming state (and thus the outgoing one) can now change once for each path from the entry block.

Differential Revision: https://reviews.llvm.org/D125232
2022-05-16 17:06:27 -07:00
Philip Reames
3d17c91709 [RISCV] Fix missing vsetvli in transparent block case
We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks which only contain load/stores compatible with their incoming state.

When I went to rebase and simplify D125232, it turned out that not all of the correctness issues had been fixed yet after all. This is the correctness fix accidentally embedded in the original more complicated version.

Note that the test changes here are mostly regressions. It's worth noting that the simplified version of D125232 exactly reverses all the non-functional diffs in the test caused here. D125232 should be the immediate following commit.

Differential Revision: https://reviews.llvm.org/D125703
2022-05-16 17:06:27 -07:00
Philip Reames
e2df48bb23 [RISCV] Add further trace output to InsertVSETLVI 2022-05-16 09:15:32 -07:00
Philip Reames
52b5f1f7d4 [RISCV] Extend dataflow workaround from D119518 to fallthrough blocks
We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks without terminators (e.g. fallthroughs).

I have a deeper rework of the algorithm in flight over in D125232, but this patch is specifically a minimal fix for an active miscompile. That change can be reworked over this once landed.

Differential Revision: https://reviews.llvm.org/D125408
2022-05-12 10:45:59 -07:00
Craig Topper
09f48c6b80 [RISCV] Move implementation of getVLOpNum and getSEWOpNum from RISCVInsertVSETVLI to RISCVBaseInfo.h. NFC
We should consolidate the operand counting and ordering into
RISCVBaseInfo.h and stop spreading it around.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D125344
2022-05-11 11:14:58 -07:00
Philip Reames
72925d98bf [riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitionsas reviewed
This patch is an alternative to a piece of D125270. If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction. This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possible even skip emitting it entirely.

This builds on the same infrastructure as D125337, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions. This is where the (relatively minor) code improvements come from.

More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.

Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ.

Differential Revision: https://reviews.llvm.org/D125392
2022-05-11 10:45:29 -07:00
Philip Reames
cc0283a635 [riscv] Prefer to use previous VL for scalar move instructionsK
This patch is an alternative to a piece of D125270. Its direct motivation is to fix a wrong code bug (described below), but somewhat unexpectedly, it also results in a significant code quality improvement for idiomatic fixed length vector patterns.

The existing transform is simply wrong in its current location. We are correct about the fact that the scalar move itself can use the previous vsetvli, but we loose track of the fact that later instructions might depend on the state change represented. That is, the actual value of VL in the register is different than the abstract state thinks it is. Not simply due to precision of modeling, but e.g. the VL register could contain 3 when the abstract state says it is 1. This is annoying hard to demonstrate in practice due to differences in policy flags on the intrinsics, but this is at least a latent wrong code bug.

The code quality benefit comes from the fact we don't need to tie this to explicit vsetvli instructions at all. We can propagate the abstract state, and reduce a) the number of transitions, or b) the cost of those transitions. It turns out we have a bunch of cases - in tests at least - where fixed length AVLs are known non-zero, and we can leave VL unchanged while changing VTYPE.

Differential Revision: https://reviews.llvm.org/D125337
2022-05-11 07:37:50 -07:00
Philip Reames
7731935ffc [riscv] Consolidate logic for SEW/VL operand offset calculations [nfc] 2022-05-10 15:06:26 -07:00
Philip Reames
413052310a [riscv] Minor style cleanup so that code more obviously matches comments [nfc] 2022-05-10 14:20:26 -07:00
Philip Reames
70ad96ca5e [riscv, InsertVSETVLI] Rename InstrInfo to Require to more clearly indicate purpose [nfc] 2022-05-09 06:40:33 -07:00
Philip Reames
7ed16e7c51 [riscv] Fix state tracking bug on vsetvli (phi of vsetvli) peephole
This fixes the first of several cases where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.

In this particular case, we recognize that for the first vsetvli in a block, that if the AVL is a phi of GPR results from previous vsetvlis and the VTYPE field matches, we can avoid emitting a vsetvli as the register contents don't change. Unfortunately, the abstract state does change and that update was lost.

As noted in the test change, this can actually improve results by preserving information until later state transitions in the block. However, this minor codegen improvement is not the motivation for the patch. The motivation is to avoid cases a case where we break a key internal correctness invariant.

Differential Revision: https://reviews.llvm.org/D125133
2022-05-09 06:21:45 -07:00
Philip Reames
c7c3f58544 [riscv] Use early return to reduce nesting for InsertVSETVLI [nfc] 2022-05-06 13:10:05 -07:00
Philip Reames
99a41005fe [riscv] Add early return to InsertVSETLI fixed point step [nfc]
If the income state hasn't changed, and the step function is fixed by assumption, then the output state can't have changed.

In the current algorithm, this is a very minor win and mostly allows adding tracing output without being horrible verbose.
2022-05-06 13:08:11 -07:00
Philip Reames
dee9b01d83 [riscv] Add some minimal tracing output to InsertVSETVLI
Only available with -debug.  Main purpose is simplifying an upcoming change, and providing tools for debugging problems.
2022-05-06 13:08:11 -07:00
Philip Reames
f486119ce9 [riscv] Add strict asserts for VSETVLI insertion algorithm to help catch bugs
This assertion should hold for any reasonable data flow algorithm, but is known not to in several cases today. I'd like to go ahead and land this off-by-default, so that we can collaborate on fixes and have a common definition of success.

Differential: https://reviews.llvm.org/D125035
2022-05-06 10:28:22 -07:00
Philip Reames
042a7a5f0d [riscv] Use X0 for destination of VSETVLI instruction if result unused
If the GPR destination register of a VSETVLI instruction is unused, we can replace it with X0. This discards the result, and thus reduces register pressure.

Since after the core insertion/lowering algorithm has run, many user written VSETVLIs will have their GPR result unused (as VTYPE/VLEN is now explicitly read instead), this kicks in for most tests which involve a vsetvli intrinsic for fixed length vectorization. (vscale vectorization generally uses the GPR result to know how far to e.g. advance pointers in a loop and these uses are not removed.)  When inserting VSETVLIs to lower psuedos, we prefer the X0 form anyways.

Differential Revision: https://reviews.llvm.org/D124961
2022-05-05 07:39:45 -07:00
Philip Reames
18ed2ee80c [RISCV] Add a version of insertVSETVLI which uses an iterator [NFC]
This is to simplify the final version of D124869.
2022-05-04 14:48:31 -07:00
Weverything
5afd20806d [riscv] Mark function as used to avoid unused warning. 2022-05-03 18:51:23 -07:00
Philip Reames
2982d0032b Fix a buildbot warning [nfc] 2022-05-03 14:40:27 -07:00
Philip Reames
be50b8c185 [riscv] Add debug printing support for VSETVLIInfo class [nfc] 2022-05-03 14:00:17 -07:00
Zakk Chen
b578330754 [RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof.
masked compare and vmsbf/vmsif/vmsof are always tail agnostic, we could
check maskedoff value to decide mask policy rather than have a addtional
policy operand.

Reviewed By: craig.topper, arcbbb

Differential Revision: https://reviews.llvm.org/D122456
2022-03-29 18:05:33 -07:00
Zakk Chen
abb5a985e9 [RISCV] Support mask policy for RVV IR intrinsics.
Add the UsesMaskPolicy flag to indicate the operations result
would be effected by the mask policy. (ex. mask operations).

It means RISCVInsertVSETVLI should decide the mask policy according
by mask policy operand or passthru operand.
If UsesMaskPolicy is false (ex. unmasked, store, and reduction operations),
the mask policy could be either mask undisturbed or agnostic.
Currently, RISCVInsertVSETVLI sets UsesMaskPolicy operations default to
MA, otherwise to MU to keep the current mask policy would not be changed
for unmasked operations.

Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases.
I didn't add all operations because most of implementations are using
the same pseudo multiclass. Some tests maybe be duplicated in different
tests. (ex. masked vmacc with tumu shows in vmacc-rv32.ll and masked-tumu)
I think having different tests only for policy would make the testing
clear.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120226
2022-03-22 01:19:16 -07:00
Chenbing.Zheng
2ae92e19eb [RISCV][NFC] Add helper function isVectorConfigInstr to reduce Repeated code.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119924
2022-02-24 05:59:12 +00:00
Craig Topper
541c9ba842 [RISCV] Insert VSETVLI at the end of a basic block if we didn't produce BlockInfo.Exit.
This is an alternative to D118667 that instead of fixing the store
to match phase 1, it tries to detect the mismatch with the expected
value at the end of the block. This inserts a vsetvli after the vse
to satisfy the requirement of the other basic block.

We still have serious design issues in the pass, that is going to
require some rethinking.

Differential Revision: https://reviews.llvm.org/D119518
2022-02-11 09:34:16 -08:00
Craig Topper
f35ac872b8 Revert "[RISCV] Fix a vsetvli insertion bug involving loads/stores." and "[RISCC] Add missing words to comment. NFC"
This reverts commit f943c58cae2480755cecdac5be832274f238df93.
and commit 7eb781072744b31a60e82b5a5903471032d4845f.

This introduced a new bug that appears to be easier to hit.

Differential Revision: https://reviews.llvm.org/D119517
2022-02-11 09:34:16 -08:00
Kazu Hirata
3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Craig Topper
f943c58cae [RISCC] Add missing words to comment. NFC 2022-02-01 07:39:51 -08:00
Craig Topper
7eb7810727 [RISCV] Fix a vsetvli insertion bug involving loads/stores.
The first phase of the analysis can avoid a vsetvli if an earlier
instruction in the block used an SEW and LMUL that when combined with
the EEW of the load/store would produce the desired EMUL. If we
avoided a vsetvli this will affect the global analysis we do in the
second phase.

The third phase where we really insert the vsetvlis needs to agree
with the first phase. If it doesn't we can insert vsetvlis that
invalidate the global analysis.

In the test case there is a VSETVLI in the preheader that sets
SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1.
This VADD is followed by a store that wants wants SEW=32 LMUL=1/2.
Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the
VADD can be become EMUL=1 for the store. So the first phase determines no
vsetvli is needed.

The third phase manages CurInfo differently than BBInfo.Change from the
first phase. CurInfo is only updated when we see a vsetvli or insert
a vsetvli. This was done to allow predecessor block information from
the global analysis to be applied to multiple instructions. Since the
loop body has no vsetvli we won't update CurInfo for either the VADD
or the VSE. This prevented us from checking the store vsetvli elision
for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which
invalidated the global analysis.

To mitigate this, I've added a BBLocalInfo variable that more closely
matches the first phase propagation. This gets updated based on the
VADD and prevents emitting a vsetvli for the store like we did in the
first phase.

I wonder if we should do an earlier phase to handle the load/store case
by adding more pseudo opcodes and changing the SEW/LMUL for those
instructions before the insertion analysis. That might be more robust
than trying to guarantee two phases make the same decision.

Fixes the test from D118629.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118667
2022-02-01 07:29:01 -08:00
Craig Topper
4602f4169a [RISCV] Prune unnecessary vector pseudo instructions. NFC
For .vf instructions, we don't need MF8 pseudos for f16. We don't
need MF8 or MF4 pseudos for f32. Or MF8, MF4, MF2 for f64.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D116437
2022-01-01 19:53:53 -08:00
jacquesguan
05f82dc877 [RISCV] Fix incorrect cases of vmv.s.f in the VSETVLI insert pass.
Fix incorrect cases of vmv.s.f and add test cases for it.

Differential Revision: https://reviews.llvm.org/D116432
2021-12-31 14:17:03 +08:00
jacquesguan
128c6ed73b [RISCV] Teach VSETVLInsert to eliminate redundant vsetvli for vmv.s.x and vfmv.s.f.
Differential Revision: https://reviews.llvm.org/D116307
2021-12-30 17:16:18 +08:00
Craig Topper
f59307bfdc [RISCV] Teach needVSETVLIPHI to handle mask register instructions.
This handles the case where the mask register instruction input
comes from a Phi of vsetvlis. If the VLMAX is the same as the VLMAX
required by the mask register instruction, we can avoid a vsetvli.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113204
2021-11-15 09:57:28 -08:00
Craig Topper
aefcd59895 [RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better.
If the VL operand of a mask register instruction comes from an
explicit vsetvli with a different VTYPE, we can still avoid needing
a vsetvli as long as the SEW/LMUL ratio is the same and policy bits
match.

Differential Revision: https://reviews.llvm.org/D112762
2021-10-29 09:49:36 -07:00
Craig Topper
1387483e72 [RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI
Add new hasVInstructions() which is currently equivalent.

Replace vector uses of hasStdExtZfh/F/D with new vector specific
versions. The vector spec no longer requires that the vectors implement the
same types as scalar. It only requires that the scalar type is
the maximum size the vectors can support. This is currently
implemented using the scalar rule we were using before.

Add new hasVInstructionsI64() begin using to qualify code that
requires i64 vector elements.

This is all NFC for now, but we can start using this to better
implement D112408 which introduces the Zve extensions.

Reviewed By: frasercrmck, eopXD

Differential Revision: https://reviews.llvm.org/D112496
2021-10-27 19:33:48 -07:00
Fraser Cormack
74c6895b39 [RISCV] Fix missing cross-block VSETVLI insertion
This patch fixes a codegen bug, the test for which was introduced in
D112223.

When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo
produced by a block is found to be compatible with the VSETVLIInfo
computed as the intersection of the 'exit' VSETVLIInfo produced by the
block's predecessors, that blocks' 'exit' info is discarded and the
intersected value is taken in its place.

However, we have one authority on what constitutes VSETVLIInfo
compatibility and we are using it in two different contexts.

Compatibility is used in one context to elide VSETVLIs between
straight-line vector instructions. But compatibility when evaluated
between two blocks' exit infos ignores any info produced *inside* each
respective block before the exit points. As such it does not guarantee
that a block will not produce a VSETVLI which is incompatible with the
'previous' block.

As such, we must ensure that any merging of VSETVLIInfo is performed
using some notion of "strict" compatibility. I've defined this as a full
vtype match, but this is perhaps too pessimistic. Given that test
coverage in this regard is lacking -- the only change is in the failing
test -- I think this is a good starting point.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112228
2021-10-22 10:45:10 +01:00
Hsiangkai Wang
7d39a8a921 [RISCV] (1/2) Add the tail policy argument to builtins/intrinsics.
Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.

For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below.

In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.

* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

* Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D105092
2021-09-24 17:09:50 +08:00
Craig Topper
6c7cadb8c1 [RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype.
This can avoid a vsetvl after a tail undisturbed operation.

Differential Revision: https://reviews.llvm.org/D109549
2021-09-10 09:03:20 -07:00
Craig Topper
75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
use a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
Craig Topper
e4e69ba4d1 [RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1.
X0 has special meaning for vsetvli, we need to make sure we never
create it a vsetvli that uses it by accident. This could happen
if the register coalescer coalesces a copy from X0 into this
instruction.

This patch splits the instruction so that we can have GPRNoX0
register class to use for the cases where we don't want the source
to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0
operand so we need a separate pseudo for those cases.

I don't currently have a failing example for this. There was a
failure in D107957, but the coalescable copy from that example
should have been optimized away much earlier so I've fixed that.

This is not a complete fix. We still need to prevent the same
possible issue on the AVL operand of all of the vector instruction
pseudos. I don't want to make two versions of all of those so we
need to find a different solution for those. I have an idea I'm
going to try.

Differential Revision: https://reviews.llvm.org/D109110
2021-09-02 07:45:31 -07:00
Craig Topper
79fbddbea0 [RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases.
For unit-stride and strided load/stores we set the SEW operand of
the pseudo instruction equal the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.

These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.

Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D106601
2021-08-12 10:05:27 -07:00
Craig Topper
643ce70a64 [RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions.
Use a tail policy operand instead. Inspired by the work in D105092,
but without the intrinsic interface changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106512
2021-08-04 10:39:50 -07:00
jacquesguan
7900ee0b61 [RISCV] Teach VSETVLI insertion to merge the unused VSETVLI with the one need to be insert after it.
If a vsetvli instruction is not compatible with the next vector instruction,
and there is no other things that may update or use VL/VTYPE, we could merge
it with the next vsetvli instruction that should be insert for the vector
instruction.

This commit only merge VTYPE with the former vsetvli instruction which has
the same VL.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106857
2021-08-03 12:06:59 +08:00
Craig Topper
5edccc4581 [RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant.
Since we're changing VTYPE, we may change VLMAX which could
invalidate the previous VL. If we can't tell if it is safe we
should use an AVL of 1 instead of keeping the old VL.

This is a quick fix. We may want to thread VL to the pseudo
instruction instead of making up a value. That will require ISD
opcode changes and changes to the C intrinsic interface.

This fixes the issue raised in D106286.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106403
2021-07-23 09:12:05 -07:00
Craig Topper
a467c08570 [RISCV] Cleanup comment around vector tail policy handling. NFC
vmv.x.s and reductions don't ignore tail policy anymore.
2021-07-21 12:45:08 -07:00
Craig Topper
c2e01ee4a5 [RISCV] Remove extra character from a comment. NFC 2021-06-21 12:52:02 -07:00
Craig Topper
ac87133f1d [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch.
Previously we went directly to unknown state on VTYPE mismatch.
If we instead remember the partial match, we can use this to
still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL
ratio match.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104069
2021-06-18 12:16:07 -07:00
Craig Topper
8dfd0810f2 [RISCV] Remove unused method from RISCVInsertVSETVLI. NFC
If this becomes needed its trivial to add it back.
2021-06-09 15:35:26 -07:00
Craig Topper
c653711fd3 [RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL.
All that really matters is that the VLMAX of the preceding
instructions is the same as the VLMAX required by the mask
operation.

Also update the vmsge(u) handling to use the SEW/LMUL we use for
other mask register operations. We were matching it to the compare
before. Some cases will be improve if we fix masked compares to
use tail agnostic policy. I think they ignore the tail policy
anyway.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D103299
2021-06-04 09:17:46 -07:00
Craig Topper
e9313fa33a [RISCV] Simplify some code in RISCVInsertVSETVLI by calling an existing function that does the same thing. NFCI 2021-06-03 17:31:54 -07:00
Craig Topper
0fa5aac292 [RISCV] Teach VSETVLI insertion to look through PHIs to prove we don't need to insert a vsetvli.
If an instruction's AVL operand is a PHI node in the same block,
we may be able to peek through the PHI to find vsetvli instructions
that produce the AVL in other basic blocks. If we can prove those
vsetvli instructions have the same VTYPE and were the last vsetvli
in their respective blocks, then we don't need to insert a vsetvli
for this pseudo instruction.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D103277
2021-05-27 15:34:08 -07:00