133 Commits

Author SHA1 Message Date
Kito Cheng
7504e9a193 [RISCV][NFC] Refine the patch of D141061
Just saw Craig's comment after I commit, he has suggest a good NFC
for that change.
2023-01-06 00:48:24 +08:00
Kito Cheng
05a2ae1b4a [RISCV][InsertVSETVLI] Using right instruction during mutate AVL of vsetvli
Fixing a crash during vsetvli insertion pass.

We have a testcase with 3 vsetvli:

1. vsetivli        zero, 2, e8, m4, ta, ma
2. li      a1, 32;  vsetvli zero, a1, e8, m4, ta, mu
3. vsetivli        zero, 2, e8, m4, ta, ma

and then we trying to optimize 2nd vsetvli since the only user is vmv.x.s, so
it could mutate the AVL operand to the AVL operand of the 3rd vsetvli.
OK, so we propagate 2 to vsetvli, BUT it's vsetvli not vsetivli, so it expect a
register rather than a immediate value, so we have to update the opcode
if needed.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D141061
2023-01-06 00:44:30 +08:00
Philip Reames
46dee4a3a3 [RISCV][InsertVSETVLI] Split out demanded property for zero/non-zero of VL
The scalar move instructions (vmv.s.x, and fvmv.s.f) depend solely on whether the VL is 0 or non-zero. By tracking the fact we only demand the zeroness and not the whole VL value, we can allow changing VL over a scalar move. This helps to eliminate vsetvli toggles.

Differential Revision: https://reviews.llvm.org/D140157
2023-01-03 14:47:13 -08:00
Philip Reames
6df5464a46 [RISCV] Minor type fix [nfc] 2023-01-03 14:22:38 -08:00
Philip Reames
460c1bd344 [RISCV][InsertVSETVLI] Rewrite scalar insert forward rule in terms of demanded fields
This is mostly geared at consolidating logic into one form to reduce code duplication, but also has the effect of being a slight generalization. Since these operations aren't masked, we can ignore the mask policy bit when deciding on compatibility. The previous code was overly strict in checking that both policy bits matched.

Note: There's a slight difference from the reviewed version.  The reviewed version was based on a local revision which included the isCompatible change to only check AVL if VL is used.  I apparently never landed that change, and while functional, the functional change isn't visible without this one.  I chose to role the extra change into this patch.

Differential Revision: https://reviews.llvm.org/D140147
2023-01-03 14:19:52 -08:00
Philip Reames
d36936fdb4 [RISCV][InsertVSETVLI] Add debug output capability to DemandedFields [nfc] 2023-01-03 13:56:57 -08:00
Philip Reames
23f4f66da7 [RISCV][InsertVSETVL] Incorporate demanded fields into compatibility interface [nfc]
This reworks the API to explicitly pass in the demanded fields instead of requering them internally.  At the moment, this is NFC, but it will stop being so in future changes which adjust the demanded bits in the caller.
2022-12-15 11:11:09 -08:00
Philip Reames
695fdef0ef [RISCV] Bugfix for 90f91683 noticed in follow up work
I went to extend this locally, and then promptly tripped across a bug which is possible with the landed patch.  The problematic case is:
vsetvli zero, 4, <some vtype>
vmv.x.s x1, v0
vsetvli a0, zero, <same type>

In this case, the naive rewrite - what I had implemented - would form:
vsetvli zero, zero, <same vtype>
vmv.x.s x1, v0

This is, amusingly, correct for the vmv.x.s, but is incorrect for the instructions which follow the sequence and probably rely on VL=VLMAX.  (The VL before the sequence is unknown, and thus doesn't have to be VLMAX.)

I plan to rework the rewrite code to be more robust here, but I wanted to directly fix the bug first.  Sorry for the lack of test; I didn't manage to reproduce this without an additional optimization change after a few minutes of trying.
2022-12-15 08:32:05 -08:00
Philip Reames
90f9168307 [RISCV][InsertVSETVLI] Mutate prior vsetvli AVL if doing so allows us to remove a toggle
This extends the backwards walk to allow mutating the previous vsetvl's AVL value if it was not used by any instructions in between. In practice, this mostly benefits vmv.x.s and fvmv.f.s patterns since vector instructions which ignore VL are rare.

Differential Revision: https://reviews.llvm.org/D140048
2022-12-15 07:32:28 -08:00
Philip Reames
3a020527c2 [RISCV] Use make_range instead of iterator_range for code from 8e6c3094
Jordan fixed this once in 4f9d069, but using make_range is more idiomatic than my accidental iterator_range usage, even with the template type to fix the warning.
2022-12-13 11:17:59 -08:00
Jordan Rupprecht
4f9d069b3b [NFC] Specify template type to fix -Wctad-qmaybe-unsupported 2022-12-13 10:50:20 -08:00
Philip Reames
8e6c309451 [RISCV][InsertVSETVLI] Reverse traversal order of block in post pass [nfc]
his unblocks a following change to be more sophisticated during post pass rewriting.

Review wise, I basically just want a second set of eyes. This change should be straight forward, but since it took me an embarrassing number of attempts to get make check to pass. Let's make sure I'm not missing yet another cornercase.

Differential Revision: https://reviews.llvm.org/D139877
2022-12-13 07:54:05 -08:00
Craig Topper
96ac1aeaf4 [RISCV] Make DemandedFields::usedVTYPE() const. NFC
Noticed while reviewing D139877.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139879
2022-12-12 14:49:26 -08:00
Philip Reames
b385c01f24 [RISCV][InsertVSETVLI] Reorder code to reduce a future diff [nfc] 2022-12-12 14:46:00 -08:00
Philip Reames
06ebce363a [RISCV][InsertVSETVLI] vmv.s.x and fvmv.s.f do not depend on LMUL
We already have this rule encoded elsewhere in the file - which is why we don't see any test changes.  I'm adding it here for completionism.

This is not technically NFC since there could be a test case which isn't caught by the specific rules, but is handled by the generic logic.  I don't have such an example.
2022-12-08 10:14:39 -08:00
Philip Reames
14ea545a7d [RISCV][InsertVSETVLI] Generalize scalar move rule for when AVL is unchanged
By definition, the AVL of the scalar move is equally zero to the prior AVL if they are the same value.  This generalizes the existing code to the case where the scalar move has a register AVL which is unknown, but unchanged from the preceeding instruction.

This doesn't cause any interesting diffs on its own, but another patch makes this case much more common.  Split off to reduce a future diff.
2022-12-07 10:28:31 -08:00
Kazu Hirata
9f252e5567 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:31:17 -08:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
wangpc
ea1a2aaa9a [RISCV] Map pseudos to their BaseInstr to reduce cases
There are a lot of cases for pseudos of the same instruction, here
we just use existed mapping table to map pseudos to real instructions
to reduce cases.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D128271
2022-10-27 16:50:15 +08:00
Craig Topper
1bdf21d55c [RISCV] Use mask/tail agnostic if tied source is IMPLICIT_DEF regardless of the policy operand.
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.

So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.

Therefore, I think we should use agnostic policies if the source is
undef.

This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.

Differential Revision: https://reviews.llvm.org/D135396
2022-10-11 16:40:16 -07:00
Philip Reames
d89d45ca9a [RISCV][InsertVSETVLI] Default to MA not MU
This changes the default value used for mask policy from mask undisturbed to mask agnostic. In hardware, there may be a minor preference for ta/ma, but since this is only going to apply to instructions which don't use the mask policy bit, this is functionally mostly a nop. The main value is to make future changes to using MA when legal for masked instructions easier to review by reducing test churn.

The prior code was motivated by a desire to minimize state transitions between masked and unmasked code. This patch achieves the same effect using the demanded field logic (landed in afb45ff), and there are no regressions I spotted in the test diffs. (Given the size, I have only been able to skim.) I do want to call out that regressions are possible here; the demanded analysis only works on a block local scope right now, so e.g. a tight loop mixing masked and unmasked computation might see an extra vsetvli or two.

Differential Revision: https://reviews.llvm.org/D133803
2022-10-06 07:59:39 -07:00
Philip Reames
afb45ffce7 [RISCV][InsertVSETVLI] Treat mask policy as undemanded if usesMaskPolicy is false
Differential Revision: https://reviews.llvm.org/D135327
2022-10-06 07:20:16 -07:00
Anton Sidorenko
3e97e94237 [NFC][RISCV] Move getSEWLMULRatio function to header
More uses of getSEWLMULRatio will be added in D130895.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D135086
2022-10-05 15:10:53 +01:00
Philip Reames
900364fccf [RISCV] Minor code motion in InsertVSETVLI [nfc] 2022-09-29 14:01:57 -07:00
Craig Topper
94049db913 [RISCV] Make computeIncomingVLVTYPE more conservative when merging predecessor state.
If we have already calculated the incoming state before, use that
as our starting point to ensure we are conservative.

This fixes an infinite loop found in our downstream where we
we allowed two waves of updates to propagate through a loop and
the merge points allowed us to toggle back and forth between states.
No small reproducer right now.

Differential Revision: https://reviews.llvm.org/D134229
2022-09-19 15:57:55 -07:00
Craig Topper
0cec96ab25 [RISCV] Manage the InQueue flag in insertvli correctly.
We were only setting this flag the first time we added the blocks
not when we mark them for revisiting.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134193
2022-09-19 14:28:22 -07:00
Philip Reames
9a9848f4b9 [RISCVInsertVSETVLI] Remove an unsound optimization
This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem:

vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo
vsetivli a2, a1, e32, m4, ta, mu

With the unsound result being:

vsetivli a1, a0, e32, m1, ta, mu
vsetivli a2, a0, e32, m4, ta, mu

Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively.

Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16

After the unsound optimization: a1 = 16 and a2 = 33

This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound.

This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own.

Differential Revision: https://reviews.llvm.org/D131264
2022-08-05 12:13:08 -07:00
Philip Reames
dd48d3ad0e Revert "[RISCV] Avoid changing etype for splat of 0 or -1"
This reverts commit 755c84c62cda80b0acf51ccc5653fc6d64536f7e.  A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong.  It needs to be checking for VLInBytes, not MaxVL.  These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.
2022-06-29 10:27:02 -07:00
Craig Topper
eb9d21d65c [RISCV] Remove extra semicolon. NFC 2022-06-26 18:19:43 -07:00
Philip Reames
1cc9792281 [RISCV] Fix a crash in InsertVSETVLI where we hadn't properly guarded for a SEWLMULRatioOnly abstract state
A forward abstract state can be in the special SEWLMULRatioOnly state which means we're not allowed to inspect its fields.  The scalar to vector move case was mising a guard, and we'd crash on an assert.  Test cases included.
2022-06-23 10:25:16 -07:00
Philip Reames
14847098f9 [RISCV] Delete unexercised VL=0 vsetvli compatibility logic
The code being removed is technically correct; if we end up with two VL=0 instructions next to each other, we can avoid a state transition if the second is a scalar move.  However, since both ops are also nops, we should simply delete them instead.  As such, this compatibility rule simply complicates the code for no purpose.
2022-06-20 10:15:31 -07:00
Philip Reames
dc562d570d [RISCV] Fold prepass back into InsertVSETVLI data flow [nfc-ish]
When working through correctness issues in this pass, I moved a number of transforms which were phrased as mutating prior vsetvli instructions out of the main data flow because mutating prior instructions can invalidate the running dataflow results in subtle ways. We ended up creating both a prepass and a post-pass.

After consideration, I believe the prepass to be redundant, and this change removes it by folding it back into the data flow via a key conceptual change. Instead of phrasing the mutations on instructions, we can phrase them on abstract states. This avoids the dataflow inconsistency problem mentioned above by simply propagating the potential change forward, and thus reflecting its results in the dataflow.  Critically, we do so without modifying existing VSETVLI instructions; some of the data flow steps include non-local IR analysis.

Compile time wise, this removes a linear pass, but has the potential to increase the number of iterations for the data flow to converge. That's not a algorithmic complexity change, the needVSETVLI mechanism has the same effect. In practice, I don't see this triggering more iterations, so I think it's likely to be a net win overall. (I didn't do any careful analysis here; just an impression from glancing at a couple tests.)

This has the potential to produce better results, so this isn't strictly speaking NFC.

Differential Revision: https://reviews.llvm.org/D127870
2022-06-20 07:56:33 -07:00
Philip Reames
820e84e050 [RISCV] Assert initial load/store SEW is the EEW
In D127983, I had flipped from using the computed EEW to using the SEW value pulled from the VSETVLI when checking compatibility. This wasn't intentional, though thankfully it appears to be a non-functional difference. The new code does make a unchecked assumption that the initial SEW operand on the load/store is the EEW. This patch clarifies the assumption, and adds an assert to make sure this remains true.

Differential Revision: https://reviews.llvm.org/D128085
2022-06-20 07:45:21 -07:00
Philip Reames
fb8ecca06f [RISCV] Remove redundant code checking for exact VTYPE match [nfc]
Should be fully covered by the generic demanded field based logic just below, and this ensures better coverage of that logic.
2022-06-17 12:20:20 -07:00
Philip Reames
4d245f1bc2 [RISCV] Move store policy and mask reg ops into demanded handling in InsertVSETVLI
Doing so let's the post-mutation pass leverage the demanded info to rewrite vsetvlis before a store/mask-op to eliminate later vsetvlis.

Sorry for the lack of store test change; all of my attempts to write something reasonable have been handled through existing logic.
2022-06-17 12:09:50 -07:00
Philip Reames
b595cddea7 [riscv] Extract isMaskRegOp helper [nfc] 2022-06-17 10:40:54 -07:00
Philip Reames
e1f1407beb [RISCV] Delete dead elideCopy code in InsertVSETVLI [nfc]
This code should be dead. A simple whole register copy of an IMPLICIT_DEF, is simply an IMPLICIT_DEF of it's own. (This would not be true for freeze, but is for copy.)  If we find a case which gets here with vector operand copy of an IMPLICIT_DEF, we most likely have an earlier missed optimization anyways.  (The most recent case of this was e6c7a3a, found by Craig during review of this patch.)  There might be others, and if so, we'll revisit them individually as regressions are reported.

Differential Revision: https://reviews.llvm.org/D127996
2022-06-17 09:58:11 -07:00
Philip Reames
755c84c62c [RISCV] Avoid changing etype for splat of 0 or -1
A splat of the values 0 and -1 as sign extended 12 bit immediates are always the same bit pattern regardless of the etype used to perform the operation. As a result, we can sometimes avoid introducing a vsetvli just for the purposes of a splat.

Looking at the diffs, we don't get a huge amount of immediate value out of this. We mostly push the vsetvli one instruction down, usually in front of a vmerge. We also don't get the corresponding fixed length vector cases because VL typically is changed despite the actual bits written being the same. Both of these are areas I plan to explore in future patches.

Interestingly, this makes a great example of why we need the forward and backward implementation to be consistent. Before we merged the demanded field handling, if we implement only the forward direction, we lost the ability to mutate a prior vsetvli and eliminate a later one entirely. This resulted in practical regressions instead of improvements. It's always nice when practice matches theory. :)

Differential Revision: https://reviews.llvm.org/D128006
2022-06-17 08:10:14 -07:00
Philip Reames
2fa2cee6a8 [RISCV] Start merging demanded reasoning - starting with load/stores [nfc]
This change merges the logic for reasoning about demanded portions of the VTYPE register between the main dataflow algorithm and the backwards mutation post pass. In the process, we get to delete a bunch of now redundant code.

This should be entirely NFC. I included a slight hack (see TODO) to avoid changing behavior in the post pass while being able to use the generalized logic in the prepass. I will fix the TODO in a separate change once this lands.

Differential Revision: https://reviews.llvm.org/D127983
2022-06-16 14:34:53 -07:00
Philip Reames
89a11ebd8e [RISCV] Avoid reducing etype just to initialize lane 0 of an undef vector
If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written.

Differential Revision: https://reviews.llvm.org/D127880
2022-06-16 11:14:21 -07:00
Philip Reames
6ed81ec164 [RISCV] Reorder function definitions to reduce upcoming diff [nfc] 2022-06-16 09:25:27 -07:00
Philip Reames
27c61d033f [RISCV] Split DemandedField logic in advance of reuse in dataflow [nfc]
This change just moves some code around, and extracts out a helper function expected to be useful when reusing the demanded field logic in the forward dataflow.
2022-06-16 08:49:41 -07:00
Philip Reames
37fa5850f1 [RISCV] Move getSEWLMULRatio out of VSETVLIInfo [nfc] 2022-06-16 08:40:20 -07:00
Philip Reames
4a3e46115a [RISCV] Extend demanded field transform in InsertVSETVLI to VTYPE subfeilds
The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio.

As an example, consider:

define void @test1(ptr %in, ptr %out) {
entry:
  %0 = load <8 x i16>, ptr %in, align 2
  %1 = sext <8 x i16> %0 to <8 x i32>
  store <8 x i32> %1, ptr %out, align 4
  ret void
}

Without this patch, we get:

	vsetivli	zero, 8, e16, mf4, ta, mu
	vle16.v	v8, (a0)
	vsetvli	zero, zero, e32, mf2, ta, mu
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

Whereas with the patch we get:

	vsetivli	zero, 8, e32, mf2, ta, mu
	vle16.v	v8, (a0)
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

We have rewritten the first vsetvli and thus removed the second one.

As is strongly hinted by the code structure and todos, I am planning on communing this with all (or most all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of following changes - some NFC reworks, and some reviewed optimization extensions.

Differential Revision: https://reviews.llvm.org/D127780
2022-06-16 08:01:27 -07:00
Yeting Kuo
9096a52566 [RISCV] Teach vsetvli insertion to not insert redundant vsetvli right after VLEFF/VLSEGFF.
VSETVLIInfos right after VLEFF/VLSEGFF are currently unknown since they modify
VL. Unknown VSETVLIInfos make next vector operations needed to be inserted
VSET(I)VLI. Actually the next vector operation of VLEFF/VLSEGFF may not need to
be inserted VSET(I)VLI if it uses same VTYPE and the resulted vl of
VLEFF/VLSEGFF.

Take the below C code as an example,

  vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
  vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);
  vsetvli insertion adds a redundant vsetvli for that,

Assembly result:
  vsetvli a2,a2,e8,m4,ta,mu
  vle8ff.v v28,(a0)
  csrr a3,vl ; redundant
  vsetvli zero,a3,e8,m4,ta,mu ; redundant
  vmseq.vi v25,v28,0

After D126794, VLEFF/VLSEGFF has a define having value of VL. The patch consider
there is a ghost vsetvli right after VLEFF/VLSEGFF. The ghost VSET(I)LIs use the
vl output of the VLEFF/VLSEGFF as its AVL and same VTYPE of the VLEFF/VLSEGFF.
The ghost vsetvli must be redundant, and we could use it to get the VSETVLIInfo
right after VLEFF/VLSEGFF.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127576
2022-06-15 13:58:40 +08:00
Philip Reames
facb96584e [RISCV] Minor code/comment improvement in prepass of InsertVSETVLI [nfc] 2022-06-14 16:18:11 -07:00
Philip Reames
c67c4133ac [RISCV] Split out transfer function explicitly in VSETVLI insertion dataflow [nfc]
In an effort to make this code easier to read and extend, this splits out helper functions for the transfer function of the data flow. Due to the other results computed during the phases, we can't completely abstract away everything, but we can abstract the actual state transitions.

The motivation here is the following upcoming changes:
* The fault first load patch - already approved, this will be rebased over - adds another case into the transferAfter path.
* An upcoming patch to fold the local prepass back into the main algorithm greatly complicates the transferBefore logic.

Differential Revision: https://reviews.llvm.org/D127761
2022-06-14 14:07:15 -07:00
Philip Reames
44a0a558dc [RISCV] Split out subfields in InsertVSETVLI's demanded fields analysis [nfc]
At the moment, this just gets the infrastructure in place.  Following changes will start using this in non-trivial ways.
2022-06-14 11:35:24 -07:00
Philip Reames
52b166c0de [RISCV] Split out getEEWForLoadStore [nfc]
Mostly about allowing reuse in an upcoming patch, but also makes the code slightly easier to follow.
2022-06-14 10:10:43 -07:00