SEWGreaterThanOrEqualAndLessThan64 is a stricter constraint so it should
have a higher value than SEWGreaterThanOrEqual.
Found by our random test generator.
This is an alternative to #117866 that works by demanding a valid vtype
instead of using a separate pass.
The main advantage of this is that it allows coalesceVSETVLIs to just
reuse an existing vsetvli later in the block.
To do this we need to first transfer the vsetvli info to some arbitrary
valid state in transferBefore when we encounter a vector copy. Then we
add a new vill demanded field that will happily accept any other known
vtype, which allows us to coalesce these where possible.
Note we also need to check for vector copies in computeVLVTYPEChanges,
otherwise the pass will completely skip over functions that only have
vector copies and nothing else.
This is one part of a fix for #114518. We still need to check if there's
other cases where vector copies/whole register moves that are inserted
after vsetvli insertion.
The existing analysis was already a pimpl wrapper.
I have extracted legacy pass logic to a LDVImpl wrapper named
`LiveDebugVariables` which is the analysis::Result now. This controls
whether to activate the LDV (depending on `-live-debug-variables` and
DIsubprogram) itself.
The legacy and new analysis only construct the LiveDebugVariables.
VirtRegRewriter will test this.
In order to coalesce a vsetvli with a register AVL into a previous
vsetvli, we need to make sure that the AVL register is reachable at the
previous vsetvli.
Back in pre-RA vsetvli insertion we just checked to see if the two
virtual registers were the same virtual register, and then this was
hacked around in the move to post-RA. We can instead use live intervals
to check that the reaching definition is the same at both instructions.
On its own this doesn't have much of an impact, but helps a lot in
#118283 and enables coalescing in about 60 of the test cases from that
PR.
In #71501 we removed the LMUL variants for vmv.s.x and vmv.x.s because
they ignore register groups, so this patch does the same for their
floating point equivalents.
We don't need to add any extra patterns for extractelt in
RISCVInstrInfoVSDPatterns.td because in lowerEXTRACT_VECTOR_ELT we make
sure that the node is narrowed down to LMUL 1.
The code previously deferred deleting the vsetvli to avoid invalidating
iterators, but eagerly deleted any ADDIs feeding the AVL register
operand. This was safe because the iterator was known to point to a
non-ADDI instruction (the vsetvli which was the previous user.) This
change switches to using an early_inc_range so that we can eagerly
delete the vsetvlis, but have to track ADDIs for later deletion.
This is purely stylistic, but IMO makes the code easier to follow. It
will also simplify a future change to support recursive deletion of
trivially dead instructions (i.e. LUI/ADDI pairs.)
This is a follow up to ff8a03a7. On the review for that, I'd suggested a
stylistic rework, but after discussion we decided to move forward with
the fix as it was. This change is a small part of that suggested rework.
Once I sat down and wrote the code, I think I've convinced myself of an
entirely different approach (tbd), but for the moment, let's use a
lambda to share code so that we can pickup a missed optimization, and
reduce some duplication.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Currently before forwarding an AVL we do a simple non-exhaustive check
to see if its LiveInterval is extendable. But we also need to check for
this when we're extending an AVL's LiveInterval via merging the
VSETVLIInfos in transferBefore with equally zero AVLs.
Rather than trying to conservatively prevent these cases, this inserts a
copy of the AVL instead if we don't know we'll be able to extend it.
This is likely to be more robust, and even if the extra copy is
undesirable these cases should be rare in practice.
Most of the time when we coalesce and delete a vsetvli, we shrink the
LiveInterval of its AVL register now that there is one less use. However
there's one edge case we were missing where if we have two vsetvlis with
no users of vl or vtype in between, we coalesced a vsetvli without
shrinking it's AVL.
This fixes it by shrinking the LiveInterval whenever we delete a
vsetvli, and also makes the LiveIntervals consistent in-situ by not
removing the use before shrinking.
This fixes a -verify-machineinstrs assertion in an MIR test case I found
while investigating
https://github.com/llvm/llvm-project/pull/97264#issuecomment-2218036877.
I couldn't recreate this at the LLVM IR level, seemingly because
RISCVInsertVSETVLI will just avoid inserting extra vsetvlis that don't
need coalesced.
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
If we're inserting a vsetvli that uses a register AVL, then the AVL
register should either:
a) Be already live at the vsetvli with the expected value
b) Not be live at the vsetvli, but have exactly one value that can be
extended safely: see #97264
This flag was added in https://reviews.llvm.org/D103277 out of
precaution, but it's been enabled by default for 3 years now and I
haven't seen any miscompiles upstream stemming from needVSETVLIPHI. This
would remove it just to simplify the number of configurations.
This fixes a crash found when compiling OpenBLAS with -mllvm
-verify-machineinstrs.
When we "forward" the AVL from the output of a vsetvli, we might have to
extend the LiveInterval of the AVL to where insert the new vsetvli.
Most of the time we are able to extend the LiveInterval because there's
only one val num (definition) for the register. But PHI elimination can
assign multiple values to the same register, in which case we end up
clobbering a different val num when extending:
%x = PseudoVSETVLI %avl, ...
%avl = ADDI ...
%v = PseudoVADD ..., avl=%x
; %avl is forwarded to PseudoVADD:
%x = PseudoVSETVLI %avl, ...
%avl = ADDI ...
%v = PseudoVADD ..., avl=%avl
Here there's no way to extend the %avl from the vsetvli since %avl is
redefined, i.e. we have two val nums.
This fixes it by only forwarding it when we have exactly one val num,
where it should be safe to extend it.
In #96200 we handled extending AVL LiveIntervals across basic blocks,
which fixed a crash in a test case in
133ab9a8c82a31549f060da33fd7e14f1d7f39fd.
This was done by manually adding a single segment to the LiveInterval to
extend it from AVL def -> inserted vsetvli, but in hindsight this was
too simple and fails to handle cases where the vsetlvi is located before
the AVL def.
This patch fixes this by using LiveIntervals::extendToIndices instead
which can handle these cases.
(The crash that this fixes is separate from the crash in #97264)
When checking if a VSETVLIInfo is compatible, we call hasEquallyZeroAVL
if only the AVL-zeroness is demanded. This will try to lookup the
defining MachineInstr (to check if it's an ADDI immediate) via
getAVLDefMI, but in it we were asserting that the VSETVLIInfo's AVL
wouldn't come from a phi. It turns out this can happen in normal
circumstances.
This causes a crash when compiling highway, so this fixes it by relaxing
the assertion.
We do the AVL forwarding trick in both getInfoForVSETVLI and
computeInfoForInstr, but there's a bug with this that I plan on fixing
in an upcoming patch. This factors it out to so we only need to fix it
in one place.
When coalescing vsetvlis we might remove a use of a register AVL, which
in turn might leave the AVL def dead. When it's dead (currently limited
to just ADDIs) we delete the def, but we were forgetting to remove it
from LiveInterval's instruction map. This fixes#95865
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same
VLMAX, we treat it as the AVL of said vsetvli.
This allows us to remove a true dependency as well as treating
VSETVLIInfos as equal in more places and avoid toggles.
We do this in two places, needVSETVLI and computeInfoForInstr. However
we don't do this in computeInfoForInstr's vsetvli equivalent,
getInfoForVSETVLI.
We also have a restriction only in computeInfoForInstr that the AVL
can't be a register as we want to avoid extending live ranges.
This patch does two interlinked things:
1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in
computeInfoForInstr, since it removes a use of the output VL which can
actually reduce register pressure. E.g. see the diff in
@vector_init_vsetvli_N and @test6
Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we
can remove the check in needVSETVLI.
We also need to update how we update LiveIntervals in insertVSETVLI, as
we can now end up needing to extend the LiveRange of the AVL across
blocks.
(Reapplying with corrected commit message)
We recently moved RISCVInsertVSETVLI from before vector register allocation
to after vector register allocation. When doing so, we added an unconditional
dependency on LiveIntervals - even at O0 where LiveIntevals hadn't previously
run. As reported in #93587, this was apparently not safe to do.
This change makes LiveIntervals optional, and adjusts all the update code to
only run wen live intervals is present. The only real tricky part of this
change is the abstract state tracking in the dataflow. We need to represent
a "register w/unknown definition" state - but only when we don't have
LiveIntervals.
This adjust the abstract state definition so that the AVLIsReg state can
represent either a register + valno, or a register + unknown definition.
With LiveIntervals, we have an exact definition for each AVL use. Without
LiveIntervals, we treat the definition of a register AVL as being unknown.
The key semantic change is that we now have a state in the lattice for which
something is known about the AVL value, but for which two identical lattice
elements do *not* neccessarily represent the same AVL value at runtime.
Previously, the only case which could result in such an unknown AVL was the
fully unknown state (where VTYPE is also fully unknown). This requires a
small adjustment to hasSameAVL and lattice state equality to draw this
important distinction.
The net effect of this patch is that we remove the LiveIntervals dependency
at O0, and O0 code quality will regress for cases involving register AVL values.
In practice, this means we pessimize code written with intrinsics at O0.
This patch is an alternative to #93796 and #94340. It is very directly
inspired by review conversation around them, and thus should be considered
coauthored by Luke.
Stacked on https://github.com/llvm/llvm-project/pull/94658.
We recently moved RISCVInsertVSETVLI from before vector register
allocation to after vector register allocation. When doing so, we added
an unconditional dependency on LiveIntervals - even at O0 where
LiveIntevals hadn't previously run. As reported in #93587, this was
apparently not safe to do.
This change makes LiveIntervals optional, and adjusts all the update
code to only run wen live intervals is present. The only real tricky
part of this change is the abstract state tracking in the dataflow. We
need to represent a "register w/unknown definition" state - but only
when we don't have LiveIntervals.
This adjust the abstract state definition so that the AVLIsReg state can
represent either a register + valno, or a register + unknown definition.
With LiveIntervals, we have an exact definition for each AVL use.
Without LiveIntervals, we treat the definition of a register AVL as
being unknown.
The key semantic change is that we now have a state in the lattice for
which something is known about the AVL value, but for which two
identical lattice elements do *not* necessarily represent the same AVL
value at runtime. Previously, the only case which could result in such
an unknown AVL was the fully unknown state (where VTYPE is also fully
unknown). This requires a small adjustment to hasSameAVL and lattice
state equality to draw this important distinction.
The net effect of this patch is that we remove the LiveIntervals
dependency at O0, and O0 code quality will regress for cases involving
register AVL values.
This patch is an alternative to
https://github.com/llvm/llvm-project/pull/93796 and
https://github.com/llvm/llvm-project/pull/94340. It is very directly
inspired by review conversation around them, and thus should be
considered coauthored by Luke.
As noted in one of the existing comments, the job AVLIsIgnored was
filing was really more of a demanded field role. Since we recently
realized we can use the values of VL on MI even in the backwards pass,
let's exploit that to improve demanded fields, and delete AVLIsIgnored.
Note that the test change is a real regression, but only incidental to
this patch. The backwards pass doesn't have the information that the VL
following a VL-preserving vtype is non-zero. This is an existing
problem, this patch just adds a few more cases where we prove
vl-preserving is legal.
In VSETVLIInfo we define == as lattice equality, e.g. unknown == unknown
and invalid == invalid. We need this to check if the information at a
block's entry or exit has changed.
However I think we may have been conflating it with the notion that the
state of VL and VTYPE are the same, which isn't the case for two
unknowns, and potentially in an upcoming patch where we don't know the
value of an AVL register (see #93796)
This patch switches over the use in emitVSETVLIs to use isCompatible
with all fields demanded, since we need VL and VTYPE to be known equal
at runtime rather than just the VSETVLIInfos to be the same.
This should be NFC for now we shouldn't reach here with an unknown
state. But it's needed if we're to introduce the notion of an AVLReg
with a nullptr ValNo, which will need this non-reflexive equality.
Specifically, hasSameAVL should return false for this case, but ==
should return true.
getVNInfoFromReg is expected to return a nullptr if-and-only-if the
operand is undef. (This was asserted for.) Reverse the order of the
checks to simplify an upcoming set of patches.
As noted in
https://github.com/llvm/llvm-project/pull/93796#issuecomment-2142752336,
a better way to teach RISCVInsertVSETVLI to work without LiveIntervals
is to set VNInfo to nullptr and teach the various methods to handle it.
We should try that approach first, so we no longer need this pre-commit
patch.
This reverts commit 4b4d36654d8056546b177b3d04c352ba0b16d7ea.
The VNInfo id (called val no elsewhere it seems) and register is enough
to uniquely identify AVL values, so try to store as little state as
possible.
This may also allow us to use dummy val nos in an upcoming patch when we
don't have LiveIntervals.
We no longer need to separate the passes now that #70549 is landed and
this will unblock #89089.
It's not strictly NFC because it will move coalescing before register
allocation when -riscv-vsetvl-after-rvv-regalloc is disabled. But this
makes it closer to the original behaviour.
This also makes the function a bit easier to reason about since we can
remove the assert. Eventually we might be able to replace needVSETVLI
with VSETVLIInfo::isCompatible.
We have two rules in needVSETVLI where we can relax the demanded fields
for slides and splats when VL=1.
However these aren't present in getDemanded which prevents us from
coalescing some vsetvlis around slides and splats in the backwards pass.
The reasoning as to why they weren't in getDemanded is that these
require us to check the value of the AVL operand, which may be stale in
the backwards pass: the actual VL or VTYPE value may differ from what
was precisely requested in the pseudo's operands.
Using the original operands should actually be fine though, as we only
care about what was originally demanded by the instruction. The current
value of VL or VTYPE shouldn't influence this.
This addresses some of the regressions we are seeing in #70549 from
splats and slides getting reordered.
In needVSETVLI used by the forward insertion pass, we have some rules
where we can relax the demanded fields for slides and splats when VL=1.
However this only works if we don't increase LMUL to anything > M1
otherwise we would end up clobbering random registers.
Rather than check the VSETVLIInfo we're transitioning to, store this
information in DemandedFields and have isCompatible check it.
That way an upcoming patch can share these VL=1 rules with
RISCVCoalesceVSETVLI, which uses isCompatible and not needVSETVLI.
As noted in
https://github.com/llvm/llvm-project/pull/91440#discussion_r1601976425,
if the pass pipeline stops early because of -stop-after any allocated
passes added with insertPass will not be freed if they haven't already
been added.
This was showing up as a failure on the address sanitizer buildbots. We
can fix it by instead passing the pass ID instead so that allocation is
deferred.
Before #91440 a VSETVLIInfo would have had an IMPLICIT_DEF defining
instruction, but now we look up a VNInfo which doesn't exist, which
triggers an assertion failure. Mark these undef AVLs as AVLIsIgnored.
Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi
elimination where we exit SSA and need to move to LiveVariables.
The motivation for splitting this off is to avoid the large scheduling
diffs from moving completely to after regalloc, and instead focus on
converting the pass to work on LiveIntervals.
The two main changes required are updating VSETVLIInfo to store VNInfos
instead of MachineInstrs, which allows us to still check for PHI defs in
needVSETVLIPHI, and fixing up the live intervals of any AVL operands
after inserting new instructions.
On O3 the pass is inserted after the register coalescer, otherwise we
end up with a bunch of COPYs around eliminated PHIs that trip up
needVSETVLIPHI.
Co-authored-by: Piyou Chen <piyou.chen@sifive.com>
If a segmented load has an undefined passthru then it will be selected
as a reg_sequence with implicit_def operands, which currently slips
through the implicit_def -> noreg peephole.
This patch fixes this so we're able to infer if the passthru is
undefined without the need for looking through vreg definitions with
MachineRegisterInfo, which will help with moving RISCVInsertVSETVLI to
LiveIntervals in #70549
For vector merge operands, we check if it's a NoRegister beforehand so
any other register type should have a definition.
For VL operands, they don't get replaced with NoRegisters since they're
scalar and should also always have a definition, even if it's an
implicit_def.
All the definitions at this stage should also be unique, this will
change in #70549
This flag has been enabled by default for almost two years now since
1f06398e96d4508d22f42b760f70eb5d4e7b1dc9, and at this stage we probably
shouldn't be falling back to the fixups.
This removes the flag so we always perform the assertion, as well as
making sure that CurInfo is always valid on exit: We shouldn't leave
emitVSETVLIs with an uninitialized VSETVLIInfo.