3703 Commits

Author SHA1 Message Date
Alexey Bataev
c1bcf5dd0a [SLP]Fix PR61835: Assertion `I->use_empty() && "trying to erase
instruction with users."' failed.

If the externally used scalar is part of the tree and is replaced by
extractelement instruction, need to add generated extractelement
instruction to the list of the ExternallyUsedValues to avoid deletion
during vectorization.
2023-03-31 14:21:19 -07:00
David Green
965a090f02 Revert "[IVDescriptors] Add pointer InductionDescriptors with non-constant strides"
Multiple errors have been reported on
https://reviews.llvm.org/rG498aa534f472d28db893aa9a8627d0b46e17f312

Reverting until the correctness issues can be resolved.

We are also seeing a lot of performance differences from the patch.  Some are
looking good, but some are looking pretty bad.
2023-03-31 11:08:50 +01:00
Philip Reames
498aa534f4 [IVDescriptors] Add pointer InductionDescriptors with non-constant strides
This matches the handling for integer IVs.  I left the non-opaque cases alone, mostly because they're largely irrelevant today.

This doesn't actually make much difference in vectorization right now as we immediately fail on aliasing checks (which also bail on non-constant strides).  Slightly surprisingly, it's the cases which *do* need runtime checks which work after this patch as they don't use the same dependency analysis path.

This will also enable non-constant stride pointer recurrences for other consumers.  I've audited said code, and don't see any obvious issues.
2023-03-30 11:56:00 -07:00
Alexey Bataev
9255124a07 [SLP]Fix a crash when trying to shuffle multiple nodes.
Need to transform the mask after applying the shuffle, using the mask
itself as a base, to correctly mark as identity those indices that were
actually used in the previous shuffle. This fixes a crash when vectors of
different sizes are shuffled.
2023-03-30 09:32:11 -07:00
David Sherwood
0ef8a79b12 [LoopVectorize] Add non-zero check for MaxPowerOf2RuntimeVF in computeMaxVF
This one-line patch just tightens up the code added in
1c4fedfa35aeb8b456e2d8f4f826c0e026b9d863
where we try to avoid tail-folding if we know the trip count
will always be a multiple of the runtime VF.
2023-03-29 10:08:32 +00:00
Krzysztof Drewniak
916425b2d1 [llvm] Use pointer index type for more GEP offsets (pre-codegen)
Many uses of getIntPtrType() were using that type to calculate the
needed type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.

Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPtrSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.

This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.
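
The distinction between pointer size and index size can be illustrated with a small Python model. This is only a sketch: `parse_pointer_spec` and `gep_offset_bits` are hypothetical helpers, not LLVM APIs, and the parser assumes the `p[n]:<size>:<abi>[:<pref>][:<idx>]` datalayout entry form, where the index width defaults to the pointer width when omitted.

```python
def parse_pointer_spec(spec):
    """Parse one 'p[n]:<size>:<abi>[:<pref>][:<idx>]' datalayout entry (sketch)."""
    fields = spec.split(":")
    head = fields[0]
    if not head.startswith("p"):
        raise ValueError("not a pointer specification: " + spec)
    addrspace = int(head[1:]) if len(head) > 1 else 0
    nums = [int(f) for f in fields[1:]]
    size, abi = nums[0], nums[1]
    pref = nums[2] if len(nums) > 2 else abi
    # The index width falls back to the pointer width when it is not given.
    idx = nums[3] if len(nums) > 3 else size
    return {"addrspace": addrspace, "pointer_bits": size,
            "abi": abi, "pref": pref, "index_bits": idx}


def gep_offset_bits(layout, use_index_type=True):
    """Width used for a GEP offset: the index type (the getIndexType() analogue)
    or the full pointer-sized integer (the getIntPtrType() analogue)."""
    return layout["index_bits"] if use_index_type else layout["pointer_bits"]


# Hypothetical target with 64-bit pointers that are indexed with 32-bit values.
wide_ptr = parse_pointer_spec("p:64:64:64:32")
# Conventional target where both widths coincide.
plain_ptr = parse_pointer_spec("p1:32:32")
```

On the first layout the two choices disagree (32 vs. 64 bits), which is the situation where using the pointer-sized integer for GEP offsets produces a mismatched type; on the second they agree, so the old and new code behave the same.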

In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.

This commit does not impact
- C library-related rewriting (memcpy()), which are operating under
  the assumption that intptr_t == size_t. While all the mechanisms for
  breaking this assumption now exist, doing so is outside the scope of
  this commit.
- Code generation and below. Note that the use of getIntPtrType() in
  CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend

Depends on D143435

Reviewed By: arichardson

Differential Revision: https://reviews.llvm.org/D143437
2023-03-28 16:41:02 +00:00
Florian Hahn
417fe52e6f Revert "[SLP] Check with target before vectorizing GEP Indices."
This reverts commit 1387a13e1d0bac94457626ef3e7427c84caf6e65.

This introduced performance regressions on AArch64, when the cost of a
vector GEP + extracts is offset by the benefits of vectorizing the rest
of the tree.

The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll
illustrates the issue. It was extracted from code that regressed a SPEC
benchmark by 15%.
2023-03-28 08:06:53 +01:00
David Sherwood
1c4fedfa35 [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail
Currently in LoopVectorize we avoid tail-folding if we can
prove the trip count is always a multiple of the maximum
fixed-width VF. This works because we know the vectoriser
only ever chooses a VF that is a power of 2. However, if
we are also considering scalable VFs then we conservatively
bail out of the optimisation because we don't know the value
of vscale, which could be an odd or prime number, etc.

This patch tries to enable the same optimisation for scalable
VFs by asking if vscale is known to be a power of 2. If so,
we can then query the maximum value of vscale and use the same
logic as we do for fixed-width VFs. I've also added a new TTI
hook called isVScaleKnownToBeAPowerOfTwo that does the same
thing as the existing TargetLowering hook.
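
The divisibility argument above can be checked with a small Python model. This is an illustrative sketch only: `has_scalar_tail` and `power_of_two_vfs` are hypothetical names, not LoopVectorize functions, and the runtime VF of a scalable vector is modelled simply as `min_vf * vscale`.

```python
def has_scalar_tail(trip_count, vf):
    """A scalar tail remains when the trip count is not a multiple of the VF."""
    return trip_count % vf != 0


def power_of_two_vfs(max_vf):
    """All power-of-2 VFs up to and including max_vf (itself a power of 2)."""
    vfs, vf = [], 1
    while vf <= max_vf:
        vfs.append(vf)
        vf *= 2
    return vfs


# Fixed-width case: a trip count that is a multiple of the maximum VF is
# also a multiple of every smaller power-of-2 VF, so no VF leaves a tail.
trip_count = 48
no_tail_fixed = not any(has_scalar_tail(trip_count, vf)
                        for vf in power_of_two_vfs(16))

# Scalable case with min_vf = 4 and vscale known to be at most 4.
min_vf, max_vscale, scalable_trip_count = 4, 4, 16

# If vscale may be any value in 1..4, vscale = 3 gives VF = 12 and a tail.
tail_if_any_vscale = any(has_scalar_tail(scalable_trip_count, min_vf * v)
                         for v in range(1, max_vscale + 1))

# If vscale is known to be a power of 2, only VFs 4, 8 and 16 are possible.
tail_if_pow2_vscale = any(has_scalar_tail(scalable_trip_count, min_vf * v)
                          for v in power_of_two_vfs(max_vscale))
```

With these numbers the model reports no tail for the fixed-width case, a possible tail when vscale is unconstrained, and no tail once vscale is restricted to powers of 2.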

Differential Revision: https://reviews.llvm.org/D146199
2023-03-27 08:34:30 +00:00
Florian Hahn
ea929a07b6 [LV] Set inbounds flag using CreateGEP in vectorizeInterleaveGroup(NFC).
This avoids having to cast the result of the builder to
GetElementPtrInst.
2023-03-22 11:29:57 +00:00
Florian Hahn
af99aa0ff7 [LV] Set inbounds flag using CreateGEP in VPWidenMemInst (NFC).
This avoids having to cast the result of the builder to
GetElementPtrInst.
2023-03-21 11:44:21 +00:00
Alexey Bataev
59ff9d3777 [SLP]Fix PR61554: use of missing vectorized value in buildvector nodes.
If the buildvector node matches the vector node, it reuses the vector
value from this vector node, but its VectorizedValue field is not
updated. Need to update this field to avoid misses during the analysis
of the reused gather/buildvector nodes.
2023-03-20 12:05:26 -07:00
Florian Hahn
371bb2c9d3 [VPlan] Move createReplicateRegion out of VPRecipeBuilder.h. (NFC)
The function doesn't use anything from VPRecipeBuilder, so move the
definition to where it is actually used and turn it into a simple static
function.

It also makes the VPRecipeBuilder argument for createAndOptimizeReplicateRegions
unnecessary.
2023-03-18 20:30:49 +00:00
Florian Hahn
6a6b65a84c [LV] Restructure code creating replicate region (NFC).
Re-order recipe and block creation to be in order, as suggested
post-commit for 2db71c9851e5.
2023-03-18 17:17:07 +00:00
Alexey Bataev
0ad87ffdcc [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars.
Currently the compiler does not support mixing shuffled nodes with a
gather/buildvector of the remaining scalar values. Supporting this may
reduce the total number of instructions and improve the performance of
the gather/buildvector sequences.

Part of D110978

Differential Revision: https://reviews.llvm.org/D146167
2023-03-17 11:18:36 -07:00
Florian Hahn
962c306a11 [LV] Don't consider pointer as uniform if it is also stored.
Update isVectorizedMemAccessUse to also check if the pointer is stored.
This prevents LV from incorrectly considering a pointer as uniform if it
is used as both the pointer operand and the stored value of the same StoreInst.

Fixes #61396.
2023-03-17 16:26:16 +00:00
Graham Hunter
9aa01c4e89 [LV] Remove scalable constraints on creating bitcasts
InnerLoopVectorizer::createBitOrPointerCast only supported fixed
length vectors since it hadn't been updated. Supporting scalable
vectors is just a matter of changing types and using elementcount
instead of numelements, since there's nothing which actually relies
on knowing the exact length of the vector.

Originally written by mgabka.

Split out from D145163.
2023-03-17 16:19:33 +00:00
Florian Hahn
eca14a810e [VPlan] Consolidate replicate region optimizations (NFC).
As suggested in D143865, consolidate replicate region creation and
optimization in a single helper that's exposed and used by LV.
2023-03-16 17:06:44 +00:00
Kazu Hirata
398af9b43b [llvm] Use *{Map,Set}::contains (NFC)
2023-03-15 18:06:32 -07:00
Michael Maitland
194f3dc8fd [VPlan] VPWidenIntOrFpInductionRecipe inherits from VPHeaderPHIRecipe
Differential Revision: https://reviews.llvm.org/D144125
2023-03-14 17:01:34 -07:00
Valery N Dmitriev
f9b438b519 [SLP] Outline GEP chain cost modeling into new TTI interface - NFCI.
Cost modeling for GEPs should actually be target dependent but is currently
done inside SLP in a target-independent way.
Sinking it into TTI enables target dependent implementation.
This patch adds new TTI interface and implementation of the basic functionality
trying to retain existing cost modeling.

Differential Revision: https://reviews.llvm.org/D144770
2023-03-14 14:01:34 -07:00
Alexey Bataev
641939baa9 [SLP]Remove CreateShuffle lambda and reuse ShuffleBuilder functions.
After merging the main part of the gather/buildvector code, the CreateShuffle
lambda can be removed and the ShuffleBuilder add functions can be used instead.
Also, part of the code from CreateShuffle was migrated to the
BaseShuffleAnalysis::createShuffle function for better code emission.

Differential Revision: https://reviews.llvm.org/D145988
2023-03-14 10:15:41 -07:00
Alexey Bataev
874c49f554 [SLP]Fix PR61395: need to adjust vector factor after emitting shuffle
operation for combined entries.

The vector factor after combining of the shuffle entries is defined by
the size of the mask, not by the vector factors of the original
entries. So, need to adjust it to emit correct code.
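
Why the mask size determines the vector factor can be seen in a minimal Python model of the IR-level `shufflevector` rule. This sketch models only that rule, not the SLP vectorizer code touched by the fix; the `shufflevector` function below is a hypothetical stand-in, and negative mask values are used here to represent poison lanes.

```python
def shufflevector(a, b, mask):
    """Model of shufflevector: each mask index selects a lane from the
    concatenation of the two equally sized inputs; a negative index is
    treated as a poison lane (represented as None)."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same number of lanes")
    lanes = list(a) + list(b)
    return [None if m < 0 else lanes[m] for m in mask]


lhs = [10, 11]  # vector factor 2
rhs = [20, 21]  # vector factor 2

# Combining both entries with a 4-element mask yields a 4-lane result.
combined = shufflevector(lhs, rhs, [0, 1, 2, 3])

# A shorter mask yields a narrower result, again regardless of the inputs.
narrowed = shufflevector(lhs, rhs, [3])
```

In both cases the number of lanes in the result equals `len(mask)`, so any bookkeeping that keeps using the input width after the shuffle would describe the wrong vector.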
2023-03-14 06:27:08 -07:00
Kazu Hirata
c8f9555c4d [Transforms] Use *{Set,Map}::contains (NFC)
2023-03-14 00:24:30 -07:00
Jakub Kuderski
b9db89fbcf [ADT][NFCI] Do not use non-const lvalue-refs with enumerate in llvm/
Replace references to `enumerate` results with either const lvalue
references or structured bindings. I did not use structured bindings
everywhere as it wasn't clear to me it would improve readability.

This is in preparation for the switch to `zip` semantics which won't
support non-const lvalue reference to elements:
https://reviews.llvm.org/D144503.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D145987
2023-03-13 20:59:06 -04:00
David Green
98481bc723 [LV][VPlan] Fix printing TripCount liveins. NFC
The TripCount liveins would currently be printed as badref in the vplan as they
are not allocated slots in the VPSlotTracker. This patch allocates them a slot
and adds them to the printed Live-Ins. It also makes a minor adjustment to
printing of Live-ins to reduce the empty lines when multiple Live-ins are
present.

Differential Revision: https://reviews.llvm.org/D145507
2023-03-13 19:44:12 +00:00
Philip Reames
dae682ce92 [IRBuilder] Add utilities for materializing scalable values [nfc]
These idioms already appear in a number of places in code, and upcoming changes to the various sanitizers continue to need more instances of the same patterns.

Differential Revision: https://reviews.llvm.org/D145945
2023-03-13 11:54:19 -07:00
Alexey Bataev
f3a68ac10c [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function.
Required for future changes with combining shuffled nodes and
buildvector sequences to improve cost/emission of the gather nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D145732
2023-03-13 06:11:05 -07:00
Florian Hahn
2db71c9851 [VPlan] Simplify code in createReplicateRegion (NFC).
Simplify the code as suggested in D143865.
2023-03-11 11:47:23 +01:00
Arthur Eubanks
7c3c981442 [Passes] Remove some legacy passes
DFAJumpThreading
JumpThreading
LibCallsShrink
LoopVectorize
SLPVectorizer
DeadStoreElimination
AggressiveDCE
CorrelatedValuePropagation
IndVarSimplify

These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.
2023-03-10 17:17:00 -08:00
Alexey Bataev
93a9be0cea [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes.
Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
But the compiler may do the same for other gather/buildvector nodes too, just need to check the
dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.

Part of D110978

Differential Revision: https://reviews.llvm.org/D144958
2023-03-10 13:19:43 -08:00
Florian Hahn
9be8d90e62 [VPlan] Add VPWidenSelectRecipe::getCond() (NFC).
Add helper to access condition, as suggested in D144489.
2023-03-10 17:49:23 +01:00
Florian Hahn
54558fd8f3 [VPlan] Replace InvariantCond field from VPWidenSelectRecipe.
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144489
2023-03-10 15:28:43 +01:00
Hans Wennborg
3b3a4c270b Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes."
This caused verifier errors:

  Instruction does not dominate all uses!
    %8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1
    %15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1>
  in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z

(or register allocator crash when the verifier was disabled).

See comment on the code review.

> Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
> But the compiler may do the same for other gather/buildvector nodes too, just need to check the
> dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.
>
> Part of D110978
>
> Differential Revision: https://reviews.llvm.org/D144958

This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d.
It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.
2023-03-10 14:40:12 +01:00
Florian Hahn
a8adb38a96 [VPlan] Replace invariance fields from VPWidenGEPRecipe.
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144487
2023-03-09 17:52:22 +01:00
Florian Hahn
79272ec028 [VPlan] Add predicate to VPReplicateRecipe, expand region later.
This patch adds the predicate as additional operand to VPReplicateRecipe
during initial construction. The predicated recipes are later moved into
replicate regions. This simplifies constructions and some VPlan
transformations, like fixed-order recurrence handling.

It also improves codegen in some cases (e.g. for in-loop reductions),
because the recipes remain in the same block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143865
2023-03-08 20:11:28 +01:00
Florian Hahn
3b2cf45d6b [VPlan] Check if recipe is in ReplicateRegion for IfPredicateInstr (NFC)
Check if a replicate recipe is in a replicate region when considering
whether to collect predicated instructions. This allows using IsPredicated
for recipes with a mask attached directly in D143865.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D145322
2023-03-08 11:39:44 +01:00
Alexey Bataev
a611b3f305 [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes.
Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
But the compiler may do the same for other gather/buildvector nodes too, just need to check the
dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.

Part of D110978

Differential Revision: https://reviews.llvm.org/D144958
2023-03-07 12:45:40 -08:00
Nikita Popov
ffe8f47d72 [IR] Add operator<< overload for CmpInst::Predicate (NFC)
I regularly try and fail to use this while debugging.
2023-03-07 15:10:56 +01:00
sgokhale
4f018e54c4 [LV][AArch64] Resolve test failure due to use of unordered container
AArch64/reg-usage.ll has an issue with the output ordering due to the use of an unordered container. This was discovered with the -DLLVM_REVERSE_ITERATION:BOOL=ON
cmake option.
This patch tries to address it by making use of an ordered container.

Differential Revision: https://reviews.llvm.org/D145472/
2023-03-07 16:42:21 +05:30
Alexey Bataev
c411965820 [SLP]Fix PR61224: Compiler hits infinite loop.
IRBuilder in many cases is able to fold constant code automatically,
but in some cases (for some intrinsics) it cannot. In these corner
cases, if a constant is provided, the calculation must be performed
manually to avoid an infinite loop.
2023-03-06 13:46:41 -08:00
Florian Hahn
be968dbeee [VPlan] VPWidenCallRecipe has side-effects if the call has.
Handle VPWidenCallRecipe in VPRecipeBase::mayHaveSideEffects by
delegating to the underlying call.
2023-03-05 12:08:56 +01:00
Graham Hunter
a180344589 [LV] Allow scalarization of function calls when masking is required
This patch adds support for scalarizing calls to a function when
there is a vector variant that cannot be used, either because there
isn't a masked variant or because the cost model indicated a VF
without a masked variant was better.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D134422
2023-03-03 15:26:04 +00:00
Nikita Popov
f7ca013332 [llvm-c] Remove bindings for creating legacy passes
Legacy passes are only supported for codegen, and I don't believe
it's possible to write backends using the C API, so we should drop
all of those. Reduces the number of places that need to be modified
when removing legacy passes.

Differential Revision: https://reviews.llvm.org/D144970
2023-03-02 09:53:50 +01:00
Sander de Smalen
c41b41eb11 [LoopVectorize] Use overflow-check analysis to improve tail-folding.
This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.

When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D142894
2023-03-01 14:17:58 +00:00
Sander de Smalen
fe1b51ffee [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow
(the next vector iteration's predicate is generated with the llvm.active.lane.mask
intrinsic and then tested for the backedge), the LoopVectorizer still inserts a
runtime check to see if the 'i + VF' may at any point overflow for the given
trip-count. When it does, it falls back to a scalar epilogue loop.

We can get rid of that runtime check in the pre-header and therefore also
remove the scalar epilogue loop. This reduces code-size and avoids a runtime
check.

Consider the following loop:

  void foo(char * __restrict__ dst, char *src, unsigned long N) {
      for (unsigned long  i=0; i<N; ++i)
          dst[i] = src[i] + 42;
  }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter
will overflow when calculating the predicate for the next vector iteration
at some point, because LLVM does:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

The solution:

What we can do instead is calculate the predicate before incrementing
the loop iteration counter, such that the llvm.active.lane.mask is
calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4

  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body

  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup
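
The walkthrough above can be replayed with a small Python model. This is a sketch under simplifying assumptions, not the vectorizer's implementation: the VF is a fixed 16 rather than `vscale x 16`, `active_lane_mask` models `llvm.get.active.lane.mask` as "lane i is active iff base + i < n" with a non-wrapping sum, and the function names and the `bits` parameter (counter width, kept small so the overflow case can be simulated quickly) are hypothetical.

```python
def active_lane_mask(base, n, vf):
    """Lane i is active iff base + i < n; the sum itself does not wrap."""
    return [base + i < n for i in range(vf)]


def run_old_scheme(n, vf, bits, max_iters=1000):
    """Mask computed from the incremented counter, which may wrap.
    Returns the processed element indices, or None if the loop never exits."""
    index, mask, processed = 0, active_lane_mask(0, n, vf), []
    for _ in range(max_iters):
        processed += [index + i for i, on in enumerate(mask) if on]
        index = (index + vf) % (1 << bits)       # %index.next, may wrap
        mask = active_lane_mask(index, n, vf)    # depends on the wrapped value
        if not mask[0]:
            return processed
    return None


def run_new_scheme(n, vf, bits, max_iters=1000):
    """Mask computed before the increment, against 'n > vf ? n - vf : 0'."""
    n_minus_vf = n - vf if n > vf else 0
    index, mask, processed = 0, active_lane_mask(0, n, vf), []
    for _ in range(max_iters):
        processed += [index + i for i, on in enumerate(mask) if on]
        next_mask = active_lane_mask(index, n_minus_vf, vf)
        index = (index + vf) % (1 << bits)       # may wrap, mask unaffected
        if not next_mask[0]:
            return processed
        mask = next_mask
    return None


# N = 20, VF = 16: two vector iterations covering elements 0..19.
small = run_new_scheme(20, 16, bits=64)

# 8-bit counter with N = 255: the old scheme wraps to 0 and never exits,
# while the new scheme covers elements 0..254 and terminates.
old_wrapping = run_old_scheme(255, 16, bits=8)
new_wrapping = run_new_scheme(255, 16, bits=8)
```

The 8-bit case stands in for `N = ULONG_MAX` with a 64-bit counter: the counter still wraps in the new scheme, but by then the exit decision has already been taken from a mask that did not depend on the wrapped value.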

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D142109
2023-03-01 09:01:19 +00:00
Valery N Dmitriev
ec7154fe70 [SLP] Add banner argument to SLP costs debug printer method - NFC.
Removed unnecessary warning workaround.

Differential Revision: https://reviews.llvm.org/D144992
2023-02-28 11:22:49 -08:00
Alexey Bataev
1d6b5b66bb [SLP]Fix PR61050: Assertion `I->use_empty() && "trying to erase instruction with users."
When gathering the counter for the reused scalars, need to use reduced
value, not the original reduced value. Same values counter is gathered
for reduced values, not original ones.
2023-02-28 07:51:34 -08:00
Nikita Popov
4bc254c664 [LoopVectorize] Only fetch BFI if profile summary available
BlockFrequencyInfo should generally only be fetched in PGO builds
where a PSI profile summary is available. However, LoopVectorize
was fetching it unconditionally.

This results in a small compile-time improvement for non-PGO builds.

Differential Revision: https://reviews.llvm.org/D144953
2023-02-28 14:16:21 +01:00
sgokhale
4f9a5447c6 [LV] Reland "Update logic for calculating register usage due to invariants"
Previously, while calculating register usage due to invariants, it was assumed that an invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which can't be legalized (check the newly added test for more details).

An invariant might not always need a vector register. For example, an invariant might just be used for the iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 17:32:39 +05:30
sgokhale
3c8ddbde37 Revert "[LV] Update logic for calculating register usage due to invariants"
Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll

This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.
2023-02-28 15:46:59 +05:30