307 Commits

Author SHA1 Message Date
Sam Elliott
7184229fea
[NFC][MI] Tidy Up RegState enum use (2/2) (#177090)
This change makes `RegState` into an enum class, with bitwise operators.
It also:
- Updates declarations of flag variables/arguments/returns from
`unsigned` to `RegState`.
- Updates empty RegState initializers from 0 to `{}`.

If this is causing problems in downstream code:
- Adopt the `RegState getXXXRegState(bool)` functions instead of using a
ternary operator such as `bool ? RegState::XXX : 0`.
- Adopt the `bool hasRegState(RegState, RegState)` function instead of
using a bitwise check of the flags.
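The migration advice above can be sketched with a small, self-contained model of a scoped flag enum. This is not LLVM's actual `RegState` definition: the flag values are illustrative, and `hasRegState` is assumed here to mean "every bit of the second argument is set in the first".

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Simplified, hypothetical model of a scoped flag enum. The flag names echo
// the commit message; the numeric values are illustrative only.
enum class RegState : std::uint32_t {
  Define = 1u << 0,
  Implicit = 1u << 1,
  Kill = 1u << 2,
  Dead = 1u << 3,
  Undef = 1u << 4,
};

// An enum class has no implicit conversion to an integer, so the bitwise
// operators must be provided explicitly.
constexpr RegState operator|(RegState A, RegState B) {
  using U = std::underlying_type_t<RegState>;
  return static_cast<RegState>(static_cast<U>(A) | static_cast<U>(B));
}
constexpr RegState operator&(RegState A, RegState B) {
  using U = std::underlying_type_t<RegState>;
  return static_cast<RegState>(static_cast<U>(A) & static_cast<U>(B));
}

// Replaces `Flags & RegState::Kill` style tests, which no longer convert to
// bool. Assumed semantics: true if every bit of Test is set in Flags.
constexpr bool hasRegState(RegState Flags, RegState Test) {
  return (Flags & Test) == Test;
}

// Replaces `B ? RegState::Kill : 0`; the empty state is spelled `{}`.
constexpr RegState getKillRegState(bool B) {
  return B ? RegState::Kill : RegState{};
}
constexpr RegState getDeadRegState(bool B) {
  return B ? RegState::Dead : RegState{};
}
```

With this shape, `RegState::Define | getKillRegState(IsKill)` builds a flag set without any ternary over `0`, and `hasRegState(Flags, RegState::Kill)` replaces the raw bitwise test.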
2026-01-23 00:19:03 -08:00
Christudasan Devadasan
a095db2f0b
[CodeGen] Introduce MI flag for Live Range split instructions (#117543)
For some targets, it is necessary to identify the COPY instructions that
correspond to live range splits inserted by the register allocator. This
adds the new flag `MachineInstr::LRSplit` for that purpose.
2026-01-09 21:23:29 +05:30
theRonShark
3a1079fa25
Revert "[RegAlloc] Relax the split constrain on MBB prolog" (#169990)
Reverts llvm/llvm-project#168259

Breaks the HIP buildbot.
2025-11-29 08:01:23 -05:00
Luo Yuanke
9bae84b017
[RegAlloc] Relax the split constrain on MBB prolog (#168259)
https://reviews.llvm.org/D52052 prevents register splitting in MBBs that
have prolog instructions defining the exec register (or the mask register
that activates the threads of a warp on a GPU). The constraint seems too
strict, because 1) if the split is allowed, the new range may fit into the
free live range of a physical register, so no spill will happen; 2) the
register class of the register being split may differ from that of the
register defined in the prolog, so there is no interference with the
register being defined in the prolog.
The current code has another small issue. MBB->getFirstNonDebugInstr()
skips only debug instructions, but SA->getFirstSplitPoint(Number) also
skips label and PHI instructions. This causes some MBBs with a label
instruction to be treated as having a prolog.
This patch relaxes the split constraint on MBBs with a prolog by checking
whether the register defined in the prolog has a common register class with
the register being split. It allows the split if the register defined in the
prolog is a physical register or if there is no common register class.
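The decision rule stated in this (later reverted) patch can be sketched as a single predicate. This is a hypothetical model, not the RegAlloc code: register classes are represented as bitmasks of physical registers, and "having a common register class" is approximated as sharing at least one register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a register class is a bitmask of the physical
// registers it contains. Two classes are treated as having a common class
// here if they share at least one register; this only approximates the idea.
using RegClassMask = std::uint64_t;

// Splitting in a block whose prolog defines a register is allowed when that
// register is physical, or when its class has nothing in common with the
// class of the register being split.
bool canSplitInPrologBlock(bool PrologDefIsPhysical,
                           RegClassMask PrologDefClass,
                           RegClassMask SplitRegClass) {
  if (PrologDefIsPhysical)
    return true;
  return (PrologDefClass & SplitRegClass) == 0;
}
```

For example, a vector-class register being split in a block whose prolog only defines a disjoint scalar-class virtual register would be allowed to split under this rule.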

---------

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
2025-11-29 07:27:19 +08:00
Sander de Smalen
e1b08731e5 Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit bb78728826ff57f3df859e79bfd857b5a175bb6d.
2025-11-25 11:01:27 +00:00
Sander de Smalen
bb78728826
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
A SUBREG_TO_REG instruction expresses that the top bits of the result
register are set to a certain value (e.g. 0).

The example below expresses that the result of %1 will have the top 32
bits zeroed and the lower 32 bits equal to the result of INSTR.
```
    %0:gpr32 = INSTR
    %1:gpr64 = SUBREG_TO_REG 0, %0, sub32
```
When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by
coalescing %0 into %1, it must keep the same semantics. Currently
however, the RegisterCoalescer would emit:
```
    %1.sub32:gpr64 = INSTR
```
which no longer expresses that the top 32 bits of the register are
defined (zeroed) by INSTR.

This may cause issues with, e.g., machine copy propagation, where the pass
may think it can remove a COPY-like instruction because the MIR says
only the bottom 32 bits are defined/used, even though other uses of the
register rely on the top 32 bits being zeroed by the COPY-like
instruction.

This PR changes the RegisterCoalescer to instead emit:
```
    undef %1.sub32:gpr64 = MOVimm32 42, implicit-def %1
```
to express that the entire contents of %1:gpr64 are defined by the
instruction.

This tries to reland #134408 which had to be reverted due to a few reported
failures.
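The semantic gap described above can be illustrated by modelling register contents as plain integers. This sketch only shows why a bare `%1.sub32 = INSTR` is weaker than `SUBREG_TO_REG`; it says nothing about how the RegisterCoalescer itself is implemented, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// `%1:gpr64 = SUBREG_TO_REG 0, %0, sub32`: the low 32 bits come from %0 and
// the top 32 bits are known to be zero.
std::uint64_t subregToReg(std::uint32_t Lo) {
  return static_cast<std::uint64_t>(Lo);
}

// `%1.sub32:gpr64 = INSTR` with nothing stating that the whole register is
// defined: only the low 32 bits are written, and the top 32 bits keep
// whatever %1 held before.
std::uint64_t writeSub32Only(std::uint64_t Old, std::uint32_t Lo) {
  return (Old & 0xFFFFFFFF00000000ULL) | Lo;
}
```

The two agree only when the old top half happens to be zero, which is exactly the fact a later pass cannot see from the partial-write form; the `implicit-def %1` restores it.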
2025-11-24 15:55:19 +00:00
Matt Arsenault
242d0c770c
SplitKit: Use initializer lists (#167449) 2025-11-11 04:48:30 +00:00
Philip Reames
6ae6583089
[CodeGen] Finish untangling LRE::scanRemattable [nfc] (#161963)
This is an attempt to simplify the rematerialization logic in
InlineSpiller and SplitKit. I'd earlier done the same for
RegisterCoalescer in 57b673.

The basic idea of this change is that we don't need to check whether an
instruction is rematerializable early. Instead, we can defer the check
to the point where we're actually trying to materialize something. We
also don't need to indirect that query through a VNI key, and can
instead just check the instruction directly at the use site.
2025-10-07 07:19:58 -07:00
Kazu Hirata
cfc2b0d094
[llvm] Use llvm::SmallVector::pop_back_val (NFC) (#136533) 2025-04-21 08:13:16 -07:00
Philip Reames
bdb4012fe3 [CodeGen] Remove parameter from LiveRangeEdit::canRematerializeAt [NFC]
Only one caller cares about the true case of this parameter, so move
the check to that single caller.  Note that RegisterCoalescer seems
like it should care, but it already duplicates the check several
lines above.
2025-03-14 09:12:07 -07:00
Philip Reames
e26bcf1627 [CodeGen] Use early return to simplify SplitEditor::defFromParent [NFC] 2025-03-13 16:13:26 -07:00
Matt Arsenault
b21663cb5b
SplitKit: Take register class directly from instruction definition (#129727)
This fixes an expensive check failure after 8476a5d480304. The issue
was essentially that getRegClassConstraintEffectForVReg was sometimes not
doing anything useful. If the register passed to it is not present
in the instruction, it is a no-op and returns the original class. The
Edit->getReg() register may not be the register as it appears in either
the use or def instruction. It may be some split register, so take
the register directly from the instruction being rematerialized.

Also directly query the constraint from the def instruction, with a
hardcoded operand index. This isn't ideal, but all the other
rematerialize code makes the same assumption.

So far I've been unable to reproduce this with a standalone MIR test. In
the original case, using stop-before=greedy and then running only that
pass does not reproduce the failure.
2025-03-06 20:06:35 +07:00
Matt Arsenault
8476a5d480
SplitKit: Fix rematerialization undoing subclass based split (#122110)
This fixes an allocation failure in the new test.

In cases where getLargestLegalSuperClass can inflate the register class,
rematerialization could effectively undo a split that was done to
inflate the register class, if the defining instruction can only write a
subclass and the use can read the superclass.

Some of the x86 test changes look like improvements, but some are
likely regressions.

I'm not entirely sure this is the correct place to fix this. It also
seems more complicated than necessary, but the decision to change
the register class is far removed from the point where the decision
to split the virtual register is made. I'm also not sure if this
should be considering the register classes of all the use indexes
in getUseSlots, rather than just checking if this use index instruction
reads the register.
2025-03-04 10:04:14 +07:00
Craig Topper
a70175ab93 [CodeGen] Use MCRegister and Register. NFC 2025-03-02 22:33:26 -08:00
Jay Foad
0d71b3e403
[CodeGen] Remove unused argument from getCoveringSubRegIndexes. NFC. (#122884) 2025-01-14 12:59:31 +00:00
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Kazu Hirata
7c6d0d26b1
[llvm] Use llvm::unique (NFC) (#95628) 2024-06-14 22:49:36 -07:00
David Green
303a7835ff
[GreedyRA] Improve RA for nested loop induction variables (#72093)
Imagine a loop of the form:
```
  preheader:
    %r = def
  header:
    bcc latch, inner1
  inner1:
    ..
  inner2:
    b latch
  latch:
    %r = subs %r
    bcc header
```

It can be possible for code to spend a decent amount of time in the
header<->latch loop, not going into the inner part of the loop as much.
The greedy register allocator can prefer to spill _around_ %r though,
adding spills around the subs in the loop, which can be very detrimental
for performance. (The case I am looking at is actually a very deeply
nested set of loops that repeat the header<->latch pattern at multiple
different levels).

The greedy RA will prefer to spill the IV, as it is live through the
header block. This patch attempts to add a heuristic to prevent that for
variables that look like IVs, similar to the extra spill weight that is
already added to variables that look like IVs because they are expensive
to spill. That means spills are more likely to be pushed into the inner
blocks, where they are less likely to be executed and are cheaper than
spills around the IV.

This gives an 8% speedup in the exchange benchmark from spec2017 when
compiled with flang-new, while importantly stabilising the scores so they
are less sensitive to other changes. Running ctmark showed no difference in
compile time. I've run a range of performance benchmarks, most of which
were relatively flat, not showing many large differences. One matrix
multiply case improved 21.3% due to removing a cascading chain of spills,
and some other knock-on effects occur that usually cause small differences
in the scores.
2023-11-18 09:55:19 +00:00
Christudasan Devadasan
ce7fd498ed
[AMDGPU] RA inserted scalar instructions can be at the BB top (#72140)
We adjust the insertion point at the BB top for spills/copies during RA
to ensure they are placed after the exec restore instructions required
for the divergent control flow execution. This is, however, required
only for the vector operations. The insertions for scalar registers can
still go to the BB top.
2023-11-16 10:30:03 +05:30
Jon Roelofs
bdd17b853f
Remove a reference to rdar://problem/10664933
The original commit, and the comments in the code already provide sufficient
context. But for posterity, there's a tiny bit more that might be useful if
someone is digging here in the future:

> This is related to <rdar://problem/10318439> Lower invokes into terminating
> machine instructions.
>
> The return value from a function call is live in to that function call's
> landing pad. The landing pad is shared with a later call, and the variable is
> undef on the first exceptional edge.
>
> Our computation of the last legal split point gets confused because the
> return value is live-out from the calling block, and live-in to the landing
> pad, but it is not live on the edge itself.
>
> Fixed in r147911 and r147912.
2023-08-09 15:10:08 -07:00
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code; a few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Yashwant Singh
b7836d8562 [CodeGen]Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now, live range splitting during register allocation uses
the TargetOpcode::COPY instruction for splitting. For the AMDGPU target that creates a
problem, as we have both vector and scalar copies. Vector copies perform a copy over
a vector register, but only on the lanes (threads) that are active. This is mostly sufficient;
however, we do run into cases where we have to copy the entire vector register and
not just the data of the active lanes. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions (if defined) will provide a lot of
flexibility and make it easier to lower these pseudo instructions to correct MIR.

- Introduce the getTargetCopyOpcode() virtual function and use it to generate copies in live range
 splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in the register allocator pipeline.
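The hook pattern described above can be sketched with plain virtual dispatch. The hook name follows the commit message, but its parameter, the classes, and the opcode values are all hypothetical; this is not the TargetInstrInfo interface.

```cpp
#include <cassert>

// Hypothetical opcode numbers for illustration only.
enum Opcode : unsigned { GenericCopy = 1, WholeRegisterCopy = 2 };

// Default hook: targets that need nothing special keep the generic COPY.
struct TargetInstrHooks {
  virtual ~TargetInstrHooks() = default;
  virtual unsigned getTargetCopyOpcode(bool IsVectorReg) const {
    (void)IsVectorReg;
    return GenericCopy;
  }
};

// A GPU-like target that must copy all lanes of a vector register when
// splitting a live range, not just the active ones.
struct GPUTargetHooks : TargetInstrHooks {
  unsigned getTargetCopyOpcode(bool IsVectorReg) const override {
    return IsVectorReg ? WholeRegisterCopy : GenericCopy;
  }
};

// The splitter asks the target instead of hard-coding COPY.
unsigned pickSplitCopyOpcode(const TargetInstrHooks &TII, bool IsVectorReg) {
  return TII.getTargetCopyOpcode(IsVectorReg);
}
```

Because a target-specific opcode is no longer literally COPY, later passes must recognise it through a target query (the `TII.isCopyInstr()` change above) rather than `MI.isCopy()`.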

Reviewed By: arsenm, cdevadas, kparzysz

Differential Revision: https://reviews.llvm.org/D150388
2023-07-07 22:29:50 +05:30
Jay Foad
d170a254a5 [CodeGen] Define and use MachineOperand::getOperandNo
This is a helper function to very slightly simplify many calls to
MachineInstr::getOperandNo.
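The convenience helper can be sketched with simplified stand-in classes. `Instr` and `Operand` are hypothetical and are not LLVM's MachineInstr and MachineOperand; the sketch only shows the idea of an operand asking its parent for its own index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Instr;

// Simplified stand-in for an operand that knows its parent instruction.
struct Operand {
  Instr *Parent = nullptr;
  unsigned getOperandNo() const; // index of this operand in its parent
};

struct Instr {
  std::vector<Operand> Ops;
  explicit Instr(std::size_t N) : Ops(N) {
    for (Operand &Op : Ops)
      Op.Parent = this;
  }
  // Copying would leave the operands pointing at the old parent.
  Instr(const Instr &) = delete;
  Instr &operator=(const Instr &) = delete;

  // Index lookup by pointer difference, like an iterator difference.
  unsigned getOperandNo(const Operand *Op) const {
    return static_cast<unsigned>(Op - Ops.data());
  }
};

// The helper: callers write Op.getOperandNo() instead of
// Op.Parent->getOperandNo(&Op).
unsigned Operand::getOperandNo() const { return Parent->getOperandNo(this); }
```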

Differential Revision: https://reviews.llvm.org/D143250
2023-02-07 11:50:57 +00:00
Kazu Hirata
f7dffc28b3 Don't include None.h (NFC)
I've converted all known uses of None to std::nullopt, so we no longer
need to include None.h.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-10 11:24:26 -08:00
Gregory Alfonso
cb38be9ed3 [NFC] Use Register instead of unsigned for variables that receive a Register object
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D139451
2022-12-07 00:23:34 +00:00
Kazu Hirata
998960ee1f [CodeGen] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:08 -08:00
Matt Arsenault
8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for whether a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query up front. This
should have the same net benefit, but has the possible disadvantage of
making this AA query non-lazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
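The flag-based check described above can be sketched as follows. The flag enumerators and function names are hypothetical stand-ins for the memory-operand flags; the sketch shows the decision (invariant and dereferenceable both required) and the up-front folding of a constant-memory answer into the flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical memory-operand flags mirroring the concepts in the commit
// message, not LLVM's exact enumerators.
enum MemFlags : std::uint32_t {
  MemLoad = 1u << 0,
  MemInvariant = 1u << 1,
  MemDereferenceable = 1u << 2,
};

// A load can be re-executed elsewhere only if the loaded value cannot change
// (invariant) and the address is always safe to read (dereferenceable).
// No alias-analysis query is made at this point.
bool isRematerializableLoad(std::uint32_t Flags) {
  const std::uint32_t Need = MemLoad | MemInvariant | MemDereferenceable;
  return (Flags & Need) == Need;
}

// The alias-analysis answer is folded into the flags up front; as in the
// commit, pointing to constant memory is treated as implying both flags.
std::uint32_t annotateLoad(std::uint32_t Flags, bool PointsToConstantMemory) {
  if (PointsToConstantMemory)
    Flags |= MemInvariant | MemDereferenceable;
  return Flags;
}
```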
2022-07-18 17:23:41 -04:00
Kito Cheng
e9f7263b38 Reland "[SplitKit] Handle early clobber + tied to def correctly"
This reverts commit 7207373e1eb0dd419b4e13a5e2d0ca146ef9544e.

We found another RISC-V bug when landing D126048, and it has been fixed
by D127642 now.

Differential Revision: https://reviews.llvm.org/D126048
2022-06-16 17:13:09 +08:00
Kito Cheng
7207373e1e Revert "[SplitKit] Handle early clobber + tied to def correctly"
Reverted due to failures with LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit e14d04909df4e52e531f6c2e045c3cf9638dd817.
2022-06-08 13:05:35 +08:00
Kito Cheng
e14d04909d [SplitKit] Handle early clobber + tied to def correctly
The splitter tries to extend a live range to the `r` slot for a use operand.
That works in most situations, but not when the operand is tied to a def
and the def operand is early-clobber.

An example of what goes wrong:
  0  %0 = ...
 16  early-clobber %0 = Op %0 (tied-def 0), ...
 32  ... = Op %0

Before extend:
 %0 = [0r, 0d) [16e, 32d)

In this case we want to extend from 0d to 16e, not to 16r: if we use
16r here, nothing is extended, because 16r is already contained in
[16e, 32d).

This patch adds a check to detect such cases and adjusts the extension point.
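The slot arithmetic in the example above can be checked with a small model. The slot order B < e < r < d matches the notation used in the example; the types and helper names are hypothetical, and the rule that an extension to K is a no-op when the slot just before K is already live is an assumption drawn from the commit's explanation.

```cpp
#include <cassert>
#include <tuple>

// Sub-instruction slots in the order used above:
// B (block), e (early-clobber), r (register), d (dead).
enum class Slot { Block, EarlyClobber, Register, Dead };

struct SlotIdx {
  unsigned Instr; // instruction index, e.g. 0, 16, 32
  Slot S;
  bool operator<(const SlotIdx &O) const {
    return std::tie(Instr, S) < std::tie(O.Instr, O.S);
  }
  bool operator<=(const SlotIdx &O) const { return !(O < *this); }
};

// Half-open live segment [Start, End).
struct Segment {
  SlotIdx Start, End;
  bool contains(SlotIdx I) const { return Start <= I && I < End; }
};

// Previous slot of the same instruction (Block slots are not modelled).
SlotIdx prevSlot(SlotIdx I) {
  assert(I.S != Slot::Block && "model does not cross instructions");
  return {I.Instr, static_cast<Slot>(static_cast<int>(I.S) - 1)};
}

// Extending the incoming value up to K does nothing if the slot just before
// K is already covered, e.g. by the early-clobber def's own segment.
bool extensionReachesOldValue(const Segment &DefSeg, SlotIdx K) {
  return !DefSeg.contains(prevSlot(K));
}

// Use the e slot when the use is tied to an early-clobber def, else r.
SlotIdx useExtendPoint(unsigned Instr, bool TiedToEarlyClobberDef) {
  return {Instr, TiedToEarlyClobberDef ? Slot::EarlyClobber : Slot::Register};
}
```

With the def segment [16e, 32d), extending to 16r stops immediately, while extending to 16e reaches back to the value defined at 0.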

Detailed explanation for testcase: https://reviews.llvm.org/D126047

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126048
2022-06-08 11:33:05 +08:00
Benjamin Kramer
a40dc4eaf8 Simplify mask creation with llvm::seq. NFCI. 2022-02-05 23:35:41 +01:00
Mircea Trofin
592f52de33 [nfc][regalloc] const LiveIntervals within the allocator
Once built, LiveIntervals are immutable. This patch captures that.

Differential Revision: https://reviews.llvm.org/D118918
2022-02-03 12:35:36 -08:00
Matt Arsenault
87c00878d3 SplitKit: Remove decade old live interval hack
This was trying to fixup broken live intervals coming out of the
coalescer. The verifier is more complete now and no tests seem to fail
without this.
2021-09-15 17:35:59 -04:00
Ruiling Song
e1beebbac5 SplitKit: Don't further split subrange mask in buildCopy
We may use several COPY instructions to copy the needed sub-registers
during a split. But the way we split the lanes among the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register, because the LaneMasks do not
match exactly between the subranges of the new register and the old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.

I am not sure if there will be further breaking cases. But as the subranges of the
new register are created based on the LaneMasks of the subranges of the old register,
it is very likely that we will always find an exact LaneMask match.
We can think about how to make extendPHIKillRanges() work for the
subrange mask mismatch case if we meet more such cases in the future.

The test case was from D105065 by @arsenm.

Differential Revision: https://reviews.llvm.org/D107829
2021-08-13 07:36:38 +08:00
Serguei Katkov
9f631d14c6 [GreedyRA] Add support for invoke statepoint with tied-defs.
The statepoint instruction uses tied-def registers to represent live GC values,
each of which is both used and defined by the call.
At the same time, an invoke statepoint instruction is the last split point: it can throw and
jump to a landing pad.
As a result, we have an instruction that is the last split point and has tied-def registers, and
we need to teach the Greedy RA to work with it.

The option -use-registers-for-gc-values-in-landing-pad controls whether statepoint lowering
will generate tied-defs for invoke statepoint; it is off by default for now.

To resolve all issues, the following changes have been made.
1) Last Split point for invoke statepoint should be statepoint itself

If the statepoint has a def, it is a relocated GC pointer and it must be available in the landing pad.
So we cannot split the interval after the statepoint at the end of the basic block.

2) Do not split interval on tied-def

If the end of the interval for the overlap utility is a use that has a tied-def, we
should not split the interval on this instruction, because in this case the use
and def may get different registers, which breaks the tied-def property.

3) Take into account Last Split Point for enterIntvAtEnd

If the use after the Last Split Point is a def, it must be a tied-def, so
we can take the def of the tied-use as ParentVNI; thus the
tied-use and tied-def will both be live in the resulting interval.

4) Handle the case when def is after LIP in InlineSpiller

If the def of LI is after the last insertion point of the basic block, we cannot hoist into this BB.

An example of such an instruction is an invoke statepoint, whose def represents the
relocated live GC pointer. The invoke is the last insertion point and its def is located after it.
In this case there is no place to insert a spill, so we bail out.

5) Fix removeBackCopies to account for empty copies

RegAssignMap cannot hold an empty interval, so do not set the stop
to the kill value if it produces an empty interval.

This can happen if we remove a back-copy and right before it there is another
back-copy.

For example, for parent %0 we can get
%1 = COPY %0
%2 = COPY %0
When removing %2, we cannot set the kill for %1 because its interval would be empty.

6) Do not hoist copy to BB if its def is after LSP

If the parent def is at the LastSplitPoint or later, we cannot hoist the copy into this basic block,
because the inserted copy (or rematerialization) would be located before the def.

All parts have been reviewed separately as follows:
https://reviews.llvm.org/D100747
https://reviews.llvm.org/D100748
https://reviews.llvm.org/D100750
https://reviews.llvm.org/D100927
https://reviews.llvm.org/D100945
https://reviews.llvm.org/D101028

Reviewers: reames, rnk, void, MatzeB, wmi, qcolombet
Reviewed By: reames, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D101150
2021-05-05 11:13:35 +07:00
Hongtao Yu
b98807df05 [CSSPGO] Exclude pseudo probes from slot index
Pseudo probes are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation, because pseudo probe instructions enlarge lifetime lengths. As a consequence, a program could get different generated code with and without pseudo probes. I'm closing the gap by excluding pseudo probes from the slot index and from downstream register allocation related passes.
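The effect described above can be sketched by numbering instructions while skipping pseudo probes. This is a hypothetical model of index assignment, not the SlotIndexes implementation; it only shows that distances between real instructions stop depending on how many probes sit between them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: real instructions are numbered with a fixed spacing,
// while pseudo probes receive no index (-1) and do not advance the counter.
std::vector<int> assignSlotIndexes(const std::vector<bool> &IsPseudoProbe,
                                   int Spacing) {
  std::vector<int> Idx;
  int Next = 0;
  for (bool Probe : IsPseudoProbe) {
    if (Probe) {
      Idx.push_back(-1);
      continue;
    }
    Idx.push_back(Next);
    Next += Spacing;
  }
  return Idx;
}
```

With this numbering, a live range spanning two real instructions has the same length whether or not probes were inserted between them, so spill weights computed from those lengths no longer change.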

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100334
2021-04-19 17:55:35 -07:00
Kazu Hirata
ffba9e596d [CodeGen] Use range-based for loops (NFC) 2021-02-21 19:58:07 -08:00
Kazu Hirata
0b417ba20f [CodeGen] Use range-based for loops (NFC) 2021-02-20 21:46:02 -08:00
Mircea Trofin
82492f24ff [NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit
VirtRegAuxInfo is an extensibility point, so the register allocator's
decision on which implementation to use should be communicated to the
other users - namely, LiveRangeEdit.

Differential Revision: https://reviews.llvm.org/D96898
2021-02-19 07:44:28 -08:00
Kazu Hirata
fd04f3a30c [CodeGen] Use range-based for loops (NFC) 2021-02-18 22:46:43 -08:00
Philip Reames
5318d9e516 [splitkit] Add a minor wrapper function for readability [NFC] 2021-02-18 09:00:22 -08:00
Philip Reames
1dfb06d0b4 [regalloc] Add a couple of dump routines for ease of debugging [NFC] 2021-02-18 08:50:00 -08:00
Matt Arsenault
1b3d8ddeb9 CodeGen: Move function to get subregister indexes to cover a LaneMask
Return the best covering index, and the additional indexes needed to complete
the mask. This logically belongs in TargetRegisterInfo, although I ended
up not needing it for the purpose I originally split it out for.
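The idea of a best covering index plus extra indexes can be sketched with a greedy search over lane masks. This is a hypothetical approximation of the concept, not the algorithm used by getCoveringSubRegIndexes; the types and lane masks are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SubRegIndex {
  unsigned Id;
  std::uint64_t LaneMask;
};

static unsigned countLanes(std::uint64_t M) {
  unsigned N = 0;
  for (; M; M &= M - 1)
    ++N;
  return N;
}

// Hypothetical greedy cover: repeatedly take the index with the most lanes
// that lies entirely within the still-uncovered part of Want. The first pick
// is the "best covering index"; later picks complete the mask.
// Returns false if Want cannot be covered exactly.
bool coverLaneMask(const std::vector<SubRegIndex> &Indexes, std::uint64_t Want,
                   std::vector<unsigned> &Out) {
  Out.clear();
  std::uint64_t Remaining = Want;
  while (Remaining) {
    const SubRegIndex *Best = nullptr;
    unsigned BestLanes = 0;
    for (const SubRegIndex &SR : Indexes) {
      if (SR.LaneMask & ~Remaining)
        continue; // would cover unwanted or already-covered lanes
      unsigned N = countLanes(SR.LaneMask);
      if (N > BestLanes) {
        Best = &SR;
        BestLanes = N;
      }
    }
    if (!Best)
      return false;
    Out.push_back(Best->Id);
    Remaining &= ~Best->LaneMask;
  }
  return true;
}
```

A greedy choice is simple but not guaranteed to find the fewest indexes for every lane layout; it is shown only to make "best index plus remainder" concrete.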
2021-02-15 17:05:37 -05:00
Kazu Hirata
c5c4dbd279 [CodeGen] Use llvm::append_range (NFC) 2021-01-21 19:59:46 -08:00
Matt Arsenault
29bd6519d2 SplitKit: Use Register 2020-11-30 15:09:33 -05:00
Jay Foad
cdac4492b4 [SplitKit] Cope with no live subranges in defFromParent
Following on from D87757 "[SplitKit] Only copy live lanes", it is
possible to split a live range at a point when none of its subranges
are live. This patch handles that case by inserting an implicit def
of the superreg.

Patch by Quentin Colombet!

Differential Revision: https://reviews.llvm.org/D88397
2020-09-30 10:16:25 +01:00
Jay Foad
b34ddfcc76 [SplitKit] In addDeadDef tolerate parent range that defines more lanes
Following on from D87757 "[SplitKit] Only copy live lanes", in
SplitEditor::addDeadDef, when we're checking whether the parent live
interval has a subrange defining the same lanes, tolerate the case
where the parent subrange defines a superset of the lanes. This can
happen when the child subrange comes from SplitEditor::buildCopy
decomposing a partial copy into a sequence of subreg copies that cover
the required lanes.

Differential Revision: https://reviews.llvm.org/D88020
2020-09-25 11:31:56 +01:00
Jay Foad
6f6d389da5 [SplitKit] Only copy live lanes
When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.
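The core of the fix can be sketched as computing which lanes are live at the split point before emitting copies. This simplified model uses plain integers in place of slot indexes and hypothetical types; the lane masks and ranges in the usage below come from the %430 dump that follows.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified subrange: a lane mask plus half-open [start, end) segments,
// with plain integers standing in for slot indexes.
struct SubRange {
  std::uint64_t LaneMask;
  std::vector<std::pair<unsigned, unsigned>> Segments;
  bool liveAt(unsigned Idx) const {
    for (const auto &Seg : Segments)
      if (Seg.first <= Idx && Idx < Seg.second)
        return true;
    return false;
  }
};

// Lanes that actually need copying at a split point: only those whose
// subrange is live there. A result of 0 means no lane is live.
std::uint64_t lanesToCopy(const std::vector<SubRange> &SubRanges,
                          unsigned SplitIdx) {
  std::uint64_t Lanes = 0;
  for (const SubRange &SR : SubRanges)
    if (SR.liveAt(SplitIdx))
      Lanes |= SR.LaneMask;
  return Lanes;
}
```

For %430, splitting at 844 yields only lanes 0x30 (.sub2), so no copy of the dead .sub0 lanes would be emitted.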

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1)  0@256r 1@848r L0000000000000003 [848r,2584r:0)  0@848r L0000000000000030 [256r,2584r:0)  0@256r weight:1.480938e-03
...
256B     undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B     %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1)  0@844r 1@848r L0000000000000030 [844r,2584r:0)  0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1)  0@844r 1@848r weight:2.831776e-03
...
256B     undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B     undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {
           internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128
         }
848B     %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately,
and it includes a copy of .sub0 which is not live at this point, and
that causes it to fail verification:

*** Bad machine code: No live subrange at use ***
- function:    zextload_global_v64i16_to_v64i64
- basic block: %bb.0  (0x7faed48) [0B;2848B)
- instruction: 844B    undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
- operand 1:   %432.sub0:vreg_128
- interval:    %432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
- at:          844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.

https://bugs.llvm.org/show_bug.cgi?id=47492

Differential Revision: https://reviews.llvm.org/D87757
2020-09-17 09:26:11 +01:00