LiveInterval DefLI can be consisting of multiple segments, and SlotIndex
NewDefSI can be inside any of them. Currenty it is assumed that NewDefSI
belongs to the first segment, and when trying to call removeSegment,
clang crashes since there is no segment containing [DefLI.beginIndex(),
NewDefSI]
FIxes https://github.com/llvm/llvm-project/issues/146518
When `AVL` is the last MI, `std::next(II)` equals `MBB.end()`, and
calling `II->getParent()` at that point will cause an error.
---------
Co-authored-by: yanming <ming.yan@terapines.com>
In #90049 we removed the side effect flag on the vleNff pseudos with the
reasoning that we modelled the effect of setting vl as an output
operand.
This extends this further by removing the implicit def on vl, inserting
it back in RISCVInsertVSETVLI when we also emit the PseudoReadVL.
The motiviation for this is to make it easier to handle vleff in more
places in RISCVVectorPeephole in a follow up patch, which in turn will
make migrating the last vmerge peephole over from RISCVISelDAGToDAG
easier.
Some of these tests claim that the vleff shouldn't be deleted when none
of its values are used, but these are from the initial commit in
3b5430eb0dad5. I'm not sure if these still hold today?
This also moves the fault-only-first predicate to
RISCVInstrPredicates.td since we can't rely on the implicit vl operand
anymore.
With EVL tail folding we can end up with vsetvlis where the output vl
and the input AVL are the same register. When we try to coalesce it we
crashed because we tried to move the def's live interval before the
kill's live interval, e.g. in this example:
(vn0 def)
dead $x0 = PseudoVSETIVLI 1, 192, implicit-def $vl, implicit-def $vtype
renamable $v9 = COPY killed renamable $v8
(vn1 def) %23:gprnox0 = PseudoVSETVLI killed (vn0) %23:gprnox0, 197,
implicit-def $vl, implicit-def $vtype
We would try to move the vn1 def VNInfo up to the previous VSETVLI, in
the middle of vn0's segment.
However separately, we were also assuming that the vl would only have
one definition and thus were just taking the VNInfo from beginIndex(),
so we ended up with a backwards segment and got the error "Cannot create
empty or backwards segment".
This fixes these two issues, the first one by moving the AVL operand +
live interval up first, and the second by taking the VNInfo from
NextMI's slot index.
Fixes#141907
Strengthen the register class on PseudoVSETVLIX0 to disallow X0 as a
destination. This allows removal of an opcode check from
RISCVDeadRegisterDefinitions. Now the register class will prevent the
conversion to X0.
I'm considering removing the explicit X0 operands and adding them during
PseudoExpansion, but it complicates finding the vtype operand.
I ran into a relatively rare case in RISCVInsertVSETVLIPass, where right
after the `emitVSETVLI` phase but before the `coalesceVSETVLIs` phase,
we have two blocks that look like this:
```
bb.first:
%46:gprnox0 = PseudoVSETIVLI %30:gprnox0, 199 /* e8, mf2, ta, ma */, implicit-def $vl, implicit-def $vtype
%76:gpr = PseudoVSETVLIX0 killed $x0, ..., implicit-def $vl, implicit-def $vtype
$v10m2 = PseudoVMV_V_I_M2 undef renamable $v10m2, 0, -1, 5 /* e32 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
...
bb.second:
$x0 = PseudoVSETVLI %46, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
$v10 = PseudoVMV_S_X undef $v10(tied-def 0), undef %53:gpr, $noreg, 5, implicit $vl, implicit $vtype
$x0 = PseudoVSETVLI %30, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
$v8 = PseudoVREDSUM_VS_M2_E32 undef $v8(tied-def 0), killed $v8m2, killed $v10, $noreg, 5, 0, implicit $vl, implicit $vtype
```
After the `coalesceVSETVLIs` phase, it turns into:
``` diff
bb.first:
- %46:gprnox0 = PseudoVSETIVLI %30:gprnox0, 199 /* e8, mf2, ta, ma */, implicit-def $vl, implicit-def $vtype
+ dead %46:gprnox0 = PseudoVSETIVLI %30:gprnox0, 199 /* e8, mf2, ta, ma */, implicit-def $vl, implicit-def $vtype
%76:gpr = PseudoVSETVLIX0 killed $x0, ..., implicit-def $vl, implicit-def $vtype
$v10m2 = PseudoVMV_V_I_M2 undef renamable $v10m2, 0, -1, 5 /* e32 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
...
bb.second:
- $x0 = PseudoVSETVLI %46, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
+ $x0 = PseudoVSETVLI %30, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
$v10 = PseudoVMV_S_X undef $v10(tied-def 0), undef %53:gpr, $noreg, 5, implicit $vl, implicit $vtype
- $x0 = PseudoVSETVLI %30, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
$v8 = PseudoVREDSUM_VS_M2_E32 undef $v8(tied-def 0), killed $v8m2, killed $v10, $noreg, 5, 0, implicit $vl, implicit $vtype
```
We forwarded `%30` to any use of `%46` and further reduced the number of
VSETVLI we need in `bb.second`. But the problem is, if `bb.first` is
processed before `bb.second` -- which is the majority of the cases --
then we're not able to remove the vsetvli which defines the now-dead
`%46` in `bb.first` after coalescing `bb.second`.
This will produce assembly code like this:
```
vsetvli zero, s0, e8, mf2, ta, ma
vsetvli a0, zero, e32, m2, ta, ma
vmv.v.i v10, 0
```
This patch fixes this issue by coalescing the blocks from bottom up such
that we can account for dead VSETVLI in the earlier blocks after its
uses are eliminated in later blocks.
---------
Co-authored-by: Luke Lau <luke@igalia.com>
These instructions are included in XRivosVisni. They perform a scalar
insert into a vector (with a potentially non-zero index) and a scalar
extract from a vector (with a potentially non-zero index) respectively.
They're very analogous to vmv.s.x and vmv.x.s respectively.
The instructions do have a couple restrictions:
1) Only constant indices are supported w/a uimm5 format.
2) There are no FP variants.
One important property of these instructions is that their throughput
and latency are expected to be LMUL independent.
…sing inline assembly.
Fixing [#128636](https://github.com/llvm/llvm-project/pull/128636).
This patch has RISCVInsertVSETVLI to add implicit use operand to inline
assembly, this approach is suggested by @preames and the implementation
I referenced is from @topperc . The purpose of adding vl, vtype implicit
operand is to prevent Post-RA scheduler moving vsetvl across inline
assembly.
The VLMUL and policy enums originally lived in RISCVBaseInfo.h in the
backend which is where everything else in the RISCVII namespace is
defined.
RISCVTargetParser.h is used by much more of the compiler and it
doesn't really make sense to have 2 different namespaces exposed.
These enums are both associated with VTYPE so using the RISCVVType
namespace seems like a good home for them.
Add a policy operand to set the tail agnostic policy instead of using
ForceTailAgnostic. The masked to unmasked transforms had to be updated
to drop the policy operand when converting to unmasked.
Found using https://github.com/codespell-project/codespell
```
codespell RISCV --write-changes \
--ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru
```
SEWGreaterThanOrEqualAndLessThan64 is a stricter constraint so it should
have a higher value than SEWGreaterThanOrEqual.
Found by our random test generator.
This is an alternative to #117866 that works by demanding a valid vtype
instead of using a separate pass.
The main advantage of this is that it allows coalesceVSETVLIs to just
reuse an existing vsetvli later in the block.
To do this we need to first transfer the vsetvli info to some arbitrary
valid state in transferBefore when we encounter a vector copy. Then we
add a new vill demanded field that will happily accept any other known
vtype, which allows us to coalesce these where possible.
Note we also need to check for vector copies in computeVLVTYPEChanges,
otherwise the pass will completely skip over functions that only have
vector copies and nothing else.
This is one part of a fix for #114518. We still need to check if there's
other cases where vector copies/whole register moves that are inserted
after vsetvli insertion.
The existing analysis was already a pimpl wrapper.
I have extracted legacy pass logic to a LDVImpl wrapper named
`LiveDebugVariables` which is the analysis::Result now. This controls
whether to activate the LDV (depending on `-live-debug-variables` and
DIsubprogram) itself.
The legacy and new analysis only construct the LiveDebugVariables.
VirtRegRewriter will test this.
In order to coalesce a vsetvli with a register AVL into a previous
vsetvli, we need to make sure that the AVL register is reachable at the
previous vsetvli.
Back in pre-RA vsetvli insertion we just checked to see if the two
virtual registers were the same virtual register, and then this was
hacked around in the move to post-RA. We can instead use live intervals
to check that the reaching definition is the same at both instructions.
On its own this doesn't have much of an impact, but helps a lot in
#118283 and enables coalescing in about 60 of the test cases from that
PR.
In #71501 we removed the LMUL variants for vmv.s.x and vmv.x.s because
they ignore register groups, so this patch does the same for their
floating point equivalents.
We don't need to add any extra patterns for extractelt in
RISCVInstrInfoVSDPatterns.td because in lowerEXTRACT_VECTOR_ELT we make
sure that the node is narrowed down to LMUL 1.
The code previously deferred deleting the vsetvli to avoid invalidating
iterators, but eagerly deleted any ADDIs feeding the AVL register
operand. This was safe because the iterator was known to point to a
non-ADDI instruction (the vsetvli which was the previous user.) This
change switches to using an early_inc_range so that we can eagerly
delete the vsetvlis, but have to track ADDIs for later deletion.
This is purely stylistic, but IMO makes the code easier to follow. It
will also simplify a future change to support recursive deletion of
trivially dead instructions (i.e. LUI/ADDI pairs.)
This is a follow up to ff8a03a7. On the review for that, I'd suggested a
stylistic rework, but after discussion we decided to move forward with
the fix as it was. This change is a small part of that suggested rework.
Once I sat down and wrote the code, I think I've convinced myself of an
entirely different approach (tbd), but for the moment, let's use a
lambda to share code so that we can pickup a missed optimization, and
reduce some duplication.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Currently before forwarding an AVL we do a simple non-exhaustive check
to see if its LiveInterval is extendable. But we also need to check for
this when we're extending an AVL's LiveInterval via merging the
VSETVLIInfos in transferBefore with equally zero AVLs.
Rather than trying to conservatively prevent these cases, this inserts a
copy of the AVL instead if we don't know we'll be able to extend it.
This is likely to be more robust, and even if the extra copy is
undesirable these cases should be rare in practice.
Most of the time when we coalesce and delete a vsetvli, we shrink the
LiveInterval of its AVL register now that there is one less use. However
there's one edge case we were missing where if we have two vsetvlis with
no users of vl or vtype in between, we coalesced a vsetvli without
shrinking it's AVL.
This fixes it by shrinking the LiveInterval whenever we delete a
vsetvli, and also makes the LiveIntervals consistent in-situ by not
removing the use before shrinking.
This fixes a -verify-machineinstrs assertion in an MIR test case I found
while investigating
https://github.com/llvm/llvm-project/pull/97264#issuecomment-2218036877.
I couldn't recreate this at the LLVM IR level, seemingly because
RISCVInsertVSETVLI will just avoid inserting extra vsetvlis that don't
need coalesced.
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
If we're inserting a vsetvli that uses a register AVL, then the AVL
register should either:
a) Be already live at the vsetvli with the expected value
b) Not be live at the vsetvli, but have exactly one value that can be
extended safely: see #97264
This flag was added in https://reviews.llvm.org/D103277 out of
precaution, but it's been enabled by default for 3 years now and I
haven't seen any miscompiles upstream stemming from needVSETVLIPHI. This
would remove it just to simplify the number of configurations.
This fixes a crash found when compiling OpenBLAS with -mllvm
-verify-machineinstrs.
When we "forward" the AVL from the output of a vsetvli, we might have to
extend the LiveInterval of the AVL to where insert the new vsetvli.
Most of the time we are able to extend the LiveInterval because there's
only one val num (definition) for the register. But PHI elimination can
assign multiple values to the same register, in which case we end up
clobbering a different val num when extending:
%x = PseudoVSETVLI %avl, ...
%avl = ADDI ...
%v = PseudoVADD ..., avl=%x
; %avl is forwarded to PseudoVADD:
%x = PseudoVSETVLI %avl, ...
%avl = ADDI ...
%v = PseudoVADD ..., avl=%avl
Here there's no way to extend the %avl from the vsetvli since %avl is
redefined, i.e. we have two val nums.
This fixes it by only forwarding it when we have exactly one val num,
where it should be safe to extend it.
In #96200 we handled extending AVL LiveIntervals across basic blocks,
which fixed a crash in a test case in
133ab9a8c82a31549f060da33fd7e14f1d7f39fd.
This was done by manually adding a single segment to the LiveInterval to
extend it from AVL def -> inserted vsetvli, but in hindsight this was
too simple and fails to handle cases where the vsetlvi is located before
the AVL def.
This patch fixes this by using LiveIntervals::extendToIndices instead
which can handle these cases.
(The crash that this fixes is separate from the crash in #97264)
When checking if a VSETVLIInfo is compatible, we call hasEquallyZeroAVL
if only the AVL-zeroness is demanded. This will try to lookup the
defining MachineInstr (to check if it's an ADDI immediate) via
getAVLDefMI, but in it we were asserting that the VSETVLIInfo's AVL
wouldn't come from a phi. It turns out this can happen in normal
circumstances.
This causes a crash when compiling highway, so this fixes it by relaxing
the assertion.
We do the AVL forwarding trick in both getInfoForVSETVLI and
computeInfoForInstr, but there's a bug with this that I plan on fixing
in an upcoming patch. This factors it out to so we only need to fix it
in one place.
When coalescing vsetvlis we might remove a use of a register AVL, which
in turn might leave the AVL def dead. When it's dead (currently limited
to just ADDIs) we delete the def, but we were forgetting to remove it
from LiveInterval's instruction map. This fixes#95865
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same
VLMAX, we treat it as the AVL of said vsetvli.
This allows us to remove a true dependency as well as treating
VSETVLIInfos as equal in more places and avoid toggles.
We do this in two places, needVSETVLI and computeInfoForInstr. However
we don't do this in computeInfoForInstr's vsetvli equivalent,
getInfoForVSETVLI.
We also have a restriction only in computeInfoForInstr that the AVL
can't be a register as we want to avoid extending live ranges.
This patch does two interlinked things:
1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in
computeInfoForInstr, since it removes a use of the output VL which can
actually reduce register pressure. E.g. see the diff in
@vector_init_vsetvli_N and @test6
Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we
can remove the check in needVSETVLI.
We also need to update how we update LiveIntervals in insertVSETVLI, as
we can now end up needing to extend the LiveRange of the AVL across
blocks.
(Reapplying with corrected commit message)
We recently moved RISCVInsertVSETVLI from before vector register allocation
to after vector register allocation. When doing so, we added an unconditional
dependency on LiveIntervals - even at O0 where LiveIntevals hadn't previously
run. As reported in #93587, this was apparently not safe to do.
This change makes LiveIntervals optional, and adjusts all the update code to
only run wen live intervals is present. The only real tricky part of this
change is the abstract state tracking in the dataflow. We need to represent
a "register w/unknown definition" state - but only when we don't have
LiveIntervals.
This adjust the abstract state definition so that the AVLIsReg state can
represent either a register + valno, or a register + unknown definition.
With LiveIntervals, we have an exact definition for each AVL use. Without
LiveIntervals, we treat the definition of a register AVL as being unknown.
The key semantic change is that we now have a state in the lattice for which
something is known about the AVL value, but for which two identical lattice
elements do *not* neccessarily represent the same AVL value at runtime.
Previously, the only case which could result in such an unknown AVL was the
fully unknown state (where VTYPE is also fully unknown). This requires a
small adjustment to hasSameAVL and lattice state equality to draw this
important distinction.
The net effect of this patch is that we remove the LiveIntervals dependency
at O0, and O0 code quality will regress for cases involving register AVL values.
In practice, this means we pessimize code written with intrinsics at O0.
This patch is an alternative to #93796 and #94340. It is very directly
inspired by review conversation around them, and thus should be considered
coauthored by Luke.
Stacked on https://github.com/llvm/llvm-project/pull/94658.
We recently moved RISCVInsertVSETVLI from before vector register
allocation to after vector register allocation. When doing so, we added
an unconditional dependency on LiveIntervals - even at O0 where
LiveIntevals hadn't previously run. As reported in #93587, this was
apparently not safe to do.
This change makes LiveIntervals optional, and adjusts all the update
code to only run wen live intervals is present. The only real tricky
part of this change is the abstract state tracking in the dataflow. We
need to represent a "register w/unknown definition" state - but only
when we don't have LiveIntervals.
This adjust the abstract state definition so that the AVLIsReg state can
represent either a register + valno, or a register + unknown definition.
With LiveIntervals, we have an exact definition for each AVL use.
Without LiveIntervals, we treat the definition of a register AVL as
being unknown.
The key semantic change is that we now have a state in the lattice for
which something is known about the AVL value, but for which two
identical lattice elements do *not* necessarily represent the same AVL
value at runtime. Previously, the only case which could result in such
an unknown AVL was the fully unknown state (where VTYPE is also fully
unknown). This requires a small adjustment to hasSameAVL and lattice
state equality to draw this important distinction.
The net effect of this patch is that we remove the LiveIntervals
dependency at O0, and O0 code quality will regress for cases involving
register AVL values.
This patch is an alternative to
https://github.com/llvm/llvm-project/pull/93796 and
https://github.com/llvm/llvm-project/pull/94340. It is very directly
inspired by review conversation around them, and thus should be
considered coauthored by Luke.
As noted in one of the existing comments, the job AVLIsIgnored was
filing was really more of a demanded field role. Since we recently
realized we can use the values of VL on MI even in the backwards pass,
let's exploit that to improve demanded fields, and delete AVLIsIgnored.
Note that the test change is a real regression, but only incidental to
this patch. The backwards pass doesn't have the information that the VL
following a VL-preserving vtype is non-zero. This is an existing
problem, this patch just adds a few more cases where we prove
vl-preserving is legal.
In VSETVLIInfo we define == as lattice equality, e.g. unknown == unknown
and invalid == invalid. We need this to check if the information at a
block's entry or exit has changed.
However I think we may have been conflating it with the notion that the
state of VL and VTYPE are the same, which isn't the case for two
unknowns, and potentially in an upcoming patch where we don't know the
value of an AVL register (see #93796)
This patch switches over the use in emitVSETVLIs to use isCompatible
with all fields demanded, since we need VL and VTYPE to be known equal
at runtime rather than just the VSETVLIInfos to be the same.
This should be NFC for now we shouldn't reach here with an unknown
state. But it's needed if we're to introduce the notion of an AVLReg
with a nullptr ValNo, which will need this non-reflexive equality.
Specifically, hasSameAVL should return false for this case, but ==
should return true.
getVNInfoFromReg is expected to return a nullptr if-and-only-if the
operand is undef. (This was asserted for.) Reverse the order of the
checks to simplify an upcoming set of patches.
As noted in
https://github.com/llvm/llvm-project/pull/93796#issuecomment-2142752336,
a better way to teach RISCVInsertVSETVLI to work without LiveIntervals
is to set VNInfo to nullptr and teach the various methods to handle it.
We should try that approach first, so we no longer need this pre-commit
patch.
This reverts commit 4b4d36654d8056546b177b3d04c352ba0b16d7ea.