llvm-project

Author	SHA1	Message	Date
Philip Reames	403261eafd	[RISCV] Remove legacy TA/TU pseudo distinction for load instructions This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295. This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand. One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the pass thru operand to model tail undefined, but since the instruction is unconditionally agnostic and the instruction has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency sake. Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch. As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up. Differential Revision: https://reviews.llvm.org/D154141	2023-07-05 13:11:58 -07:00
Craig Topper	354530fe19	[RISCV] Prevent vsetvli insertion from deleting some vsetvli instructions If the result register is used, it is not safe to delete. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D153076	2023-06-15 15:18:47 -07:00
Philip Reames	fc9b26440d	[RISCV][InsertVSETVLI] Treat vmv.v.i as-if it were vmv.s.x when VL=1, and inactive lanes are undefined A vmv.v.i/x splats the immediate to all active lanes. For the active lanes, this is the same as vmv.s.x which inserts one scalar into the low lane. If we can ignore all the inactive lanes (because they are known undefined), then the two are semantically equivalent. We already reason about compatible VL/VTYPE combinations for vmv.s.x, apply the same logic to vmv.v.i. Unlike a vmv.s.x, we do need to be careful not to increase LMUL. A splat instruction is probably linear in LMUL, so restrict this to LMUL1. Differential Revision: https://reviews.llvm.org/D152845	2023-06-15 14:10:04 -07:00
Philip Reames	807adcf4b9	[RISCV][InsertVSETVLI] Rework code structure to make reasoning about undefined lanes explicit [NFC] We already have several places in this code which reason about whether the inactive lanes are defined, and are about to add one more in D151653. Let's go ahead and common the code so that we don't have the same concept repeating in multiply places. Differential Revision: https://reviews.llvm.org/D152844	2023-06-14 09:48:31 -07:00
David Green	2802739dfd	[NFC] Replace ;; with ;	2023-06-11 10:25:24 +01:00
Luke Lau	f3b39ceaf5	[RISCV][InsertVSETVLI] Relax tail policy more often for vmv.s.x If a vm.s.x pseudo has an undef passthru operand, then we're free to use whatever tail policy we want for VL > 1. We previously relaxed the tail policy for this but only when we could also expand the SEW. This patch changes it to relax the tail policy even if the SEW can't be expanded and removes a few more toggles, as well as fully moving the vmv.s.x logic into getDemanded.	2023-05-31 18:18:44 +01:00
Luke Lau	badf11de4a	[RISCV][InsertVSETVLI] Avoid vmv.s.x SEW toggle if at start of block vmv.s.x/vfmv.s.f instructions that only write to the first destination element can use any SEW greater than or equal to its original SEW, provided that it's writing to an implicit_def operand where we can clobber the other lanes. We were already handling this in needVSETVLI, which meant that when scanning the instructions from top to bottom we could detect this and avoid the toggle: vsetivli zero, 4, e64, mf2, ta, ma li a0, 11 vsetivli zero, 1, e8, mf8, ta, ma vmv.s.x v0, a0 -> vsetivli zero, 4, e64, mf2, ta, ma li a0, 11 vmv.s.x v0, a0 The issue that this patch aims to solve is arises when the vmv.s.x is the first vector instruction in the block and doesn't have any prior predecessor info: entry_bb: li a0, 11 ; No previous state here: forced to set VL/VTYPE vsetivli zero, 1, e8, mf8, ta, ma vmv.s.x v0, a0 vsetivli zero, 4, e16, mf2, ta, ma vmerge.vvm v8, v9, v8, v0 doLocalPostpass can work backwards from bottom to top and work out if an earlier vsetvli can be mutated to avoid a toggle. It uses DemandedFields and getDemanded for this, which previously didn't take into account the possibility of going to a larger SEW. A previous patch consolidated the vmv.s.x logic from needVSETVLI logic into getDemanded, and this patch removes the gate around it so that doLocalPostpass can now delete vsetvlis like in the scenario below: entry_bb: li a0, 11 ; Previous vsetivli mutated: second one deleted vsetivli zero, 4, e16, mf2, ta, ma vmv.s.x v0, a0 vmerge.vvm v8, v9, v8, v0 Differential Revision: https://reviews.llvm.org/D151561	2023-05-31 18:18:44 +01:00
Luke Lau	257cc049f9	[RISCV][InsertVSETVLI] Move vmv.s.x SEW check into getDemandedBits. NFC This patch restructures the logic that checks if vmv.s.x's SEW can be expanded into getDemandedBits, so that it can be shared by both the top-to-bottom and bottom-to-top passes. It adds a third option for SEW in DemandedFields, that's weaker than demanded but stronger than not demanded, that states that it the new SEW must be greater than or equal to the current SEW. Note that we now need to take care of the order of operands in areCompatibleVTYPEs as the relation is no longer commutative. A later patch will remove the gating on the bottom-to-top pass (dolocalPostpass) and another one will relax the demands on the tail policy further.	2023-05-31 18:18:44 +01:00
Luke Lau	319adf5de7	Revert "[RISCV][InsertVSETVLI] Avoid vmv.s.x SEW toggle if at start of block" This reverts commit 0ba41dd3806e658e67acb63353fd5540f2bf333c.	2023-05-31 18:14:55 +01:00
Luke Lau	0ba41dd380	[RISCV][InsertVSETVLI] Avoid vmv.s.x SEW toggle if at start of block vmv.s.x and friends that only write to the first destination element can use any SEW greater than or equal to its original SEW, provided that it's writing to an implicit_def operand where we can clobber the other lanes. We were already handling this in needVSETVLI, which meant that when scanning the instructions from top to bottom we could detect this and avoid the toggle: ``` vsetivli zero, 4, e64, mf2, ta, ma li a0, 11 vsetivli zero, 1, e8, mf8, ta, ma vmv.s.x v0, a0 -> vsetivli zero, 4, e64, mf2, ta, ma li a0, 11 vmv.s.x v0, a0 ``` The issue that this patch aims to solve is whenever vmv.s.x arises when the first vector instruction in the block and doesn't have any prior predecessor info: ``` entry_bb: li a0, 11 ; No previous state here: forced to set VL/VTYPE vsetivli zero, 1, e8, mf8, ta, ma vmv.s.x v0, a0 vsetivli zero, 4, e16, mf2, ta, ma vmerge.vvm v8, v9, v8, v0 ``` doLocalPostpass can work backwards from bottom to top and work out if an earlier vsetvli can be mutated to avoid a toggle. It uses DemandedFields and getDemanded for this, which previously didn't take into account the possibility of going to a larger SEW. This patch adds a third option for SEW in DemandedFields, that's weaker than demanded but stronger than not demanded, that states that it the new SEW must be greater than or equal to the current SEW. We can then use this option to move that vmv.s.x specific logic from needVSETVLI into getDemanded, making it available for both phase 2 and 3, i.e. we can now mutate the earlier vsetivli going from bottom to top: ``` entry_bb: li a0, 11 ; Previous vsetivli mutated: second one deleted vsetivli zero, 4, e16, mf2, ta, ma vmv.s.x v0, a0 vmerge.vvm v8, v9, v8, v0 ``` Reviewed By: reames Differential Revision: https://reviews.llvm.org/D151561	2023-05-31 18:14:21 +01:00
Philip Reames	7639a39dd2	[RISCV][InsertVSETVLI] Support constant VLs larger than immediate encoding The immediate field on the vsetivli is fairly limited. For larger vectors, we end up having to materialize a constant in a register. We hadn't plumbed the infrastructure to treat such materialized constants as constants for purpose of vsetvli elimination. I only bothered to handle LI. We could extend this to LUI sequences, but well, 2048 elements is probably enough for all practical fixed length vector codegen. :) The test delta does point out a related problem. At LMUL8, we see increased register allocation pressure, and we should probably either a) address register allocation remat, or b) be less aggressive about eliminating vsetvlis at high lmul. Note that high LMUL code is not generated much by default. Differential Revision: https://reviews.llvm.org/D151212	2023-05-24 10:37:59 -07:00
Philip Reames	020812b64f	Reapply "[RISCV][InsertVSETVLI] Avoid VL toggles for extractelement patterns" The original change had a bug where it allowed SEW mutation. This is wrong in multiple ways, but an easy example is that the slide amount is in units of SEW, and thus that changing SEW changes the slide offset. I'd reverted this in 33314693 intending to more majorly rework the patch because in addition to the bug, I'd noticed a potential oppurtunity to increase scope. After implementing that variant, and realizing it triggered nowhere, I decided to go back to the prior patch with the minimal fix. Note there's no separate test case for the fix. This is because we already had multiple, and I just didn't realize the impact of the original test diff. Adding one more test would have been unlikely to catch that human error. Original commit message.. Noticed this while looking at some SLP output. If we have an extractelement, we're probably using a slidedown into an destination with no contents. Given this, we can allow the slideup to use a larger VL and clobber tail elements of the destination vector. Doing this allows us to avoid vsetvli toggles in many fixed length vector examples. Differential Revision: https://reviews.llvm.org/D148834	2023-05-10 11:51:51 -07:00
Philip Reames	33314693f5	Revert "[RISCV][InsertVSETVLI] Avoid VL toggles for extractelement patterns" This reverts commit 657d20dc75252f0c8415ada5214affccc3c98efe. A correctness problem was reported against the review and the fix warrants re-review.	2023-05-10 10:58:46 -07:00
Philip Reames	657d20dc75	[RISCV][InsertVSETVLI] Avoid VL toggles for extractelement patterns Noticed this while looking at some SLP output. If we have an extractelement, we're probably using a slidedown into an destination with no contents. Given this, we can allow the slideup to use a larger VL and clobber tail elements of the destination vector. Doing this allows us to avoid vsetvli toggles in many fixed length vector examples. Differential Revision: https://reviews.llvm.org/D148834	2023-05-01 18:46:54 -07:00
Craig Topper	5894eec874	[RISCV][WIP] Use vsetvli x0, x0 in more cases. If the AVL is a virtual register defined by a vsetvli with the same vlmax we need and the previous vsetvli we saw in the data flow also has that vlmax, we can use the x0, x0 form when we insert a vsetvli. Not only does this avoid an update of the VL physical register, but it may allow doLocalPostpass to completely remove the inserted vsetvli by rewriting the vtype of the previous vsetvli. Differential Revision: https://reviews.llvm.org/D148735	2023-04-20 13:58:28 -07:00
Craig Topper	0f4c9c016c	[RISCV] Replace RISCV->RISC-V in strings. To be consistent with RISC-V branding guidelines https://riscv.org/about/risc-v-branding-guidelines/ Think we should be using RISC-V where possible. D146449 already updated comments. Strings may have more user impact. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D146451	2023-03-27 09:50:17 -07:00
Philip Reames	d199bf9822	[RISCV][InsertVSETVLI] Handle partially transparent instructions in PRE The PRE implementation was being overly strict when checking to see if a vsetvli was removed in the current block. For instructions which don't use all the fields of VTYPE or VL, we can propagate a changed state past the first instruction with an SEW operand and remove a vsetvli later in the block. We do need to be careful now to ensure that the state convergences before the end of the block or we'd invalidate the cached data flow results. Taking a step back, we're modeling the effect of the emitVSETVLIs pass which runs just after PRE. This is unfortunate, and makes me think we should probably reevaluate doing the PRE as a post-pass instead of as surgery in the data flow phases. Doing that requires us to get more aggressive about mutating user written vsetvlis which we've tried not to do up to now, but well, maybe it's time? Anyways, that's a thought for the future, not something I'm proposing doing now. Differential Revision: https://reviews.llvm.org/D142409	2023-01-27 11:52:09 -08:00
Kito Cheng	7504e9a193	[RISCV][NFC] Refine the patch of D141061 Just saw Craig's comment after I commit, he has suggest a good NFC for that change.	2023-01-06 00:48:24 +08:00
Kito Cheng	05a2ae1b4a	[RISCV][InsertVSETVLI] Using right instruction during mutate AVL of vsetvli Fixing a crash during vsetvli insertion pass. We have a testcase with 3 vsetvli: 1. vsetivli zero, 2, e8, m4, ta, ma 2. li a1, 32; vsetvli zero, a1, e8, m4, ta, mu 3. vsetivli zero, 2, e8, m4, ta, ma and then we trying to optimize 2nd vsetvli since the only user is vmv.x.s, so it could mutate the AVL operand to the AVL operand of the 3rd vsetvli. OK, so we propagate 2 to vsetvli, BUT it's vsetvli not vsetivli, so it expect a register rather than a immediate value, so we have to update the opcode if needed. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D141061	2023-01-06 00:44:30 +08:00
Philip Reames	46dee4a3a3	[RISCV][InsertVSETVLI] Split out demanded property for zero/non-zero of VL The scalar move instructions (vmv.s.x, and fvmv.s.f) depend solely on whether the VL is 0 or non-zero. By tracking the fact we only demand the zeroness and not the whole VL value, we can allow changing VL over a scalar move. This helps to eliminate vsetvli toggles. Differential Revision: https://reviews.llvm.org/D140157	2023-01-03 14:47:13 -08:00
Philip Reames	6df5464a46	[RISCV] Minor type fix [nfc]	2023-01-03 14:22:38 -08:00
Philip Reames	460c1bd344	[RISCV][InsertVSETVLI] Rewrite scalar insert forward rule in terms of demanded fields This is mostly geared at consolidating logic into one form to reduce code duplication, but also has the effect of being a slight generalization. Since these operations aren't masked, we can ignore the mask policy bit when deciding on compatibility. The previous code was overly strict in checking that both policy bits matched. Note: There's a slight difference from the reviewed version. The reviewed version was based on a local revision which included the isCompatible change to only check AVL if VL is used. I apparently never landed that change, and while functional, the functional change isn't visible without this one. I chose to role the extra change into this patch. Differential Revision: https://reviews.llvm.org/D140147	2023-01-03 14:19:52 -08:00
Philip Reames	d36936fdb4	[RISCV][InsertVSETVLI] Add debug output capability to DemandedFields [nfc]	2023-01-03 13:56:57 -08:00
Philip Reames	23f4f66da7	[RISCV][InsertVSETVL] Incorporate demanded fields into compatibility interface [nfc] This reworks the API to explicitly pass in the demanded fields instead of requering them internally. At the moment, this is NFC, but it will stop being so in future changes which adjust the demanded bits in the caller.	2022-12-15 11:11:09 -08:00
Philip Reames	695fdef0ef	[RISCV] Bugfix for 90f91683 noticed in follow up work I went to extend this locally, and then promptly tripped across a bug which is possible with the landed patch. The problematic case is: vsetvli zero, 4, <some vtype> vmv.x.s x1, v0 vsetvli a0, zero, <same type> In this case, the naive rewrite - what I had implemented - would form: vsetvli zero, zero, <same vtype> vmv.x.s x1, v0 This is, amusingly, correct for the vmv.x.s, but is incorrect for the instructions which follow the sequence and probably rely on VL=VLMAX. (The VL before the sequence is unknown, and thus doesn't have to be VLMAX.) I plan to rework the rewrite code to be more robust here, but I wanted to directly fix the bug first. Sorry for the lack of test; I didn't manage to reproduce this without an additional optimization change after a few minutes of trying.	2022-12-15 08:32:05 -08:00
Philip Reames	90f9168307	[RISCV][InsertVSETVLI] Mutate prior vsetvli AVL if doing so allows us to remove a toggle This extends the backwards walk to allow mutating the previous vsetvl's AVL value if it was not used by any instructions in between. In practice, this mostly benefits vmv.x.s and fvmv.f.s patterns since vector instructions which ignore VL are rare. Differential Revision: https://reviews.llvm.org/D140048	2022-12-15 07:32:28 -08:00
Philip Reames	3a020527c2	[RISCV] Use make_range instead of iterator_range for code from 8e6c3094 Jordan fixed this once in 4f9d069, but using make_range is more idiomatic than my accidental iterator_range usage, even with the template type to fix the warning.	2022-12-13 11:17:59 -08:00
Jordan Rupprecht	4f9d069b3b	[NFC] Specify template type to fix -Wctad-qmaybe-unsupported	2022-12-13 10:50:20 -08:00
Philip Reames	8e6c309451	[RISCV][InsertVSETVLI] Reverse traversal order of block in post pass [nfc] his unblocks a following change to be more sophisticated during post pass rewriting. Review wise, I basically just want a second set of eyes. This change should be straight forward, but since it took me an embarrassing number of attempts to get make check to pass. Let's make sure I'm not missing yet another cornercase. Differential Revision: https://reviews.llvm.org/D139877	2022-12-13 07:54:05 -08:00
Craig Topper	96ac1aeaf4	[RISCV] Make DemandedFields::usedVTYPE() const. NFC Noticed while reviewing D139877. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139879	2022-12-12 14:49:26 -08:00
Philip Reames	b385c01f24	[RISCV][InsertVSETVLI] Reorder code to reduce a future diff [nfc]	2022-12-12 14:46:00 -08:00
Philip Reames	06ebce363a	[RISCV][InsertVSETVLI] vmv.s.x and fvmv.s.f do not depend on LMUL We already have this rule encoded elsewhere in the file - which is why we don't see any test changes. I'm adding it here for completionism. This is not technically NFC since there could be a test case which isn't caught by the specific rules, but is handled by the generic logic. I don't have such an example.	2022-12-08 10:14:39 -08:00
Philip Reames	14ea545a7d	[RISCV][InsertVSETVLI] Generalize scalar move rule for when AVL is unchanged By definition, the AVL of the scalar move is equally zero to the prior AVL if they are the same value. This generalizes the existing code to the case where the scalar move has a register AVL which is unknown, but unchanged from the preceeding instruction. This doesn't cause any interesting diffs on its own, but another patch makes this case much more common. Split off to reduce a future diff.	2022-12-07 10:28:31 -08:00
Kazu Hirata	9f252e5567	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 17:31:17 -08:00
Fangrui Song	b0df70403d	[Target] llvm::Optional => std::optional The updated functions are mostly internal with a few exceptions (virtual functions in TargetInstrInfo.h, TargetRegisterInfo.h). To minimize changes to LLVMCodeGen, GlobalISel files are skipped. https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 22:43:14 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
wangpc	ea1a2aaa9a	[RISCV] Map pseudos to their BaseInstr to reduce cases There are a lot of cases for pseudos of the same instruction, here we just use existed mapping table to map pseudos to real instructions to reduce cases. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D128271	2022-10-27 16:50:15 +08:00
Craig Topper	1bdf21d55c	[RISCV] Use mask/tail agnostic if tied source is IMPLICIT_DEF regardless of the policy operand. If the source is implicit_def, the register allocator won't have any constraint on what register it picks for the destination. This doesn't give the user much control of what register is being used. So in my mind that means the only reason to honor the policy operand is to control what policy is used in vsetvli to maybe avoid a vtype change. Given the other optimizations we do on the policy field, I don't think allowing the user this control is reliable. Therefore, I think we should use agnostic policies if the source is undef. This should give better performance on some CPUs for VP intrinsics where there is no merge operand and the backend adds IMPLICIT_DEF to the instruction. Differential Revision: https://reviews.llvm.org/D135396	2022-10-11 16:40:16 -07:00
Philip Reames	d89d45ca9a	[RISCV][InsertVSETVLI] Default to MA not MU This changes the default value used for mask policy from mask undisturbed to mask agnostic. In hardware, there may be a minor preference for ta/ma, but since this is only going to apply to instructions which don't use the mask policy bit, this is functionally mostly a nop. The main value is to make future changes to using MA when legal for masked instructions easier to review by reducing test churn. The prior code was motivated by a desire to minimize state transitions between masked and unmasked code. This patch achieves the same effect using the demanded field logic (landed in afb45ff), and there are no regressions I spotted in the test diffs. (Given the size, I have only been able to skim.) I do want to call out that regressions are possible here; the demanded analysis only works on a block local scope right now, so e.g. a tight loop mixing masked and unmasked computation might see an extra vsetvli or two. Differential Revision: https://reviews.llvm.org/D133803	2022-10-06 07:59:39 -07:00
Philip Reames	afb45ffce7	[RISCV][InsertVSETVLI] Treat mask policy as undemanded if usesMaskPolicy is false Differential Revision: https://reviews.llvm.org/D135327	2022-10-06 07:20:16 -07:00
Anton Sidorenko	3e97e94237	[NFC][RISCV] Move getSEWLMULRatio function to header More uses of getSEWLMULRatio will be added in D130895. Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D135086	2022-10-05 15:10:53 +01:00
Philip Reames	900364fccf	[RISCV] Minor code motion in InsertVSETVLI [nfc]	2022-09-29 14:01:57 -07:00
Craig Topper	94049db913	[RISCV] Make computeIncomingVLVTYPE more conservative when merging predecessor state. If we have already calculated the incoming state before, use that as our starting point to ensure we are conservative. This fixes an infinite loop found in our downstream where we we allowed two waves of updates to propagate through a loop and the merge points allowed us to toggle back and forth between states. No small reproducer right now. Differential Revision: https://reviews.llvm.org/D134229	2022-09-19 15:57:55 -07:00
Craig Topper	0cec96ab25	[RISCV] Manage the InQueue flag in insertvli correctly. We were only setting this flag the first time we added the blocks not when we mark them for revisiting. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D134193	2022-09-19 14:28:22 -07:00
Philip Reames	9a9848f4b9	[RISCVInsertVSETVLI] Remove an unsound optimization This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem: vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo vsetivli a2, a1, e32, m4, ta, mu With the unsound result being: vsetivli a1, a0, e32, m1, ta, mu vsetivli a2, a0, e32, m4, ta, mu Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively. Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16 After the unsound optimization: a1 = 16 and a2 = 33 This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound. This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own. Differential Revision: https://reviews.llvm.org/D131264	2022-08-05 12:13:08 -07:00
Philip Reames	dd48d3ad0e	Revert "[RISCV] Avoid changing etype for splat of 0 or -1" This reverts commit 755c84c62cda80b0acf51ccc5653fc6d64536f7e. A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong. It needs to be checking for VLInBytes, not MaxVL. These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.	2022-06-29 10:27:02 -07:00
Craig Topper	eb9d21d65c	[RISCV] Remove extra semicolon. NFC	2022-06-26 18:19:43 -07:00
Philip Reames	1cc9792281	[RISCV] Fix a crash in InsertVSETVLI where we hadn't properly guarded for a SEWLMULRatioOnly abstract state A forward abstract state can be in the special SEWLMULRatioOnly state which means we're not allowed to inspect its fields. The scalar to vector move case was mising a guard, and we'd crash on an assert. Test cases included.	2022-06-23 10:25:16 -07:00
Philip Reames	14847098f9	[RISCV] Delete unexercised VL=0 vsetvli compatibility logic The code being removed is technically correct; if we end up with two VL=0 instructions next to each other, we can avoid a state transition if the second is a scalar move. However, since both ops are also nops, we should simply delete them instead. As such, this compatibility rule simply complicates the code for no purpose.	2022-06-20 10:15:31 -07:00
Philip Reames	dc562d570d	[RISCV] Fold prepass back into InsertVSETVLI data flow [nfc-ish] When working through correctness issues in this pass, I moved a number of transforms which were phrased as mutating prior vsetvli instructions out of the main data flow because mutating prior instructions can invalidate the running dataflow results in subtle ways. We ended up creating both a prepass and a post-pass. After consideration, I believe the prepass to be redundant, and this change removes it by folding it back into the data flow via a key conceptual change. Instead of phrasing the mutations on instructions, we can phrase them on abstract states. This avoids the dataflow inconsistency problem mentioned above by simply propagating the potential change forward, and thus reflecting its results in the dataflow. Critically, we do so without modifying existing VSETVLI instructions; some of the data flow steps include non-local IR analysis. Compile time wise, this removes a linear pass, but has the potential to increase the number of iterations for the data flow to converge. That's not a algorithmic complexity change, the needVSETVLI mechanism has the same effect. In practice, I don't see this triggering more iterations, so I think it's likely to be a net win overall. (I didn't do any careful analysis here; just an impression from glancing at a couple tests.) This has the potential to produce better results, so this isn't strictly speaking NFC. Differential Revision: https://reviews.llvm.org/D127870	2022-06-20 07:56:33 -07:00

1 2 3 4 5

250 Commits