1808 Commits

Author SHA1 Message Date
Sander de Smalen
c41b41eb11 [LoopVectorize] Use overflow-check analysis to improve tail-folding.
This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.

When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D142894
2023-03-01 14:17:58 +00:00
Sander de Smalen
fe1b51ffee [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow
(the next vector iteration's predicate is generated with the llvm.active.lane.mask
intrinsic and then tested for the backedge), the LoopVectorizer still inserts a
runtime check to see if the 'i + VF' may at any point overflow for the given
trip-count. When it does, it falls back to a scalar epilogue loop.

We can get rid of that runtime check in the pre-header and therefore also
remove the scalar epilogue loop. This reduces code-size and avoids a runtime
check.

Consider the following loop:

  void foo(char * __restrict__ dst, char *src, unsigned long N) {
      for (unsigned long  i=0; i<N; ++i)
          dst[i] = src[i] + 42;
  }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter
will overflow when calculating the predicate for the next vector iteration
at some point, because LLVM does:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

The solution:

What we can do instead is calculate the predicate before incrementing
the loop iteration counter, such that the llvm.active.lane.mask is
calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4

  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body

  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D142109
2023-03-01 09:01:19 +00:00
Nikita Popov
4bc254c664 [LoopVectorize] Only fetch BFI if profile summary available
BlockFrequencyInfo should generally only be fetched in PGO builds
where a PSI profile summary is available. However, LoopVectorize
was fetching it unconditionally.

This results in a small compile-time improvement for non-PGO builds.

Differential Revision: https://reviews.llvm.org/D144953
2023-02-28 14:16:21 +01:00
sgokhale
4f9a5447c6 [LV] Reland "Update logic for calculating register usage due to invariants"
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 17:32:39 +05:30
sgokhale
3c8ddbde37 Revert "[LV] Update logic for calculating register usage due to invariants"
Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll

This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.
2023-02-28 15:46:59 +05:30
sgokhale
d162826694 [LV] Update logic for calculating register usage due to invariants
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 11:05:26 +05:30
Sander de Smalen
9449deda12 [LV] NFC: Move logic to query maximum vscale to its own function.
To query the maximum value for vscale, the LV queries the vscale_range
attribute or a TTI hook. To avoid having to reimplement the same behaviour
for multiple uses (such as in D142894), it makes sense to move this code
to a separate function.
2023-02-23 15:12:35 +00:00
OCHyams
620a529760 [Assignment Tracking] Choose better passes for RemoveRedundantDbgInstrs call
Enabling assignment tracking without this patch, a significant amount of
additional compiler run time comes from the RemoveRedundantDbgInstrs call in
InstCombine. This patch reduces compiler run time by choosing better places to
call RemoveRedundantDbgInstrs.

In non-assignment-tracking builds, RemoveRedundantDbgInstrs is called by
InstCombine if LowerDbgDeclare makes a change (i.e. it is _sometimes_
called). In assignment tracking builds LowerDbgDeclare doesn't do anything. We
still need to clean up redundant intrinsics to avoid a large performance hit
due to the number of instructions, so the current approach is to have
InstCombine _always_ call RemoveRedundantDbgInstrs.

Instrumenting the compiler to run RemoveRedundantDbgInstrs after every pass and
dump the numbers and building CTMark/tramp3d-v4 indicates that SROA and
LoopVectorize give us a bigger bang (number removed) for buck (times pass is
run).

The compile time tracker reports that this patch reduces the number of
instructions retired building CTMark projects by an average of 1.1%.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D144483
2023-02-22 16:28:06 +00:00
Luke Lau
b02b1e0ed6 [LV][NFC] Use ElementCount for getMaxInterleaveFactor
In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor.
This is based off of the approach used here: 8d36708507

The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch.
See https://reviews.llvm.org/D143723#4132349

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144474
2023-02-22 10:15:05 +00:00
Liren Peng
529ee9750b [NFC] Use single quotes for single char output during printPipline
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144365
2023-02-22 02:35:13 +00:00
Florian Hahn
9333b97763
[VPlan] Replace AlsoPack field with shouldPack() method (NFC).
There is no need to update the AlsoPack field when creating
VPReplicateRecipes. It can be easily computed based on the VP def-use
chains when it is needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143864
2023-02-20 10:28:26 +00:00
Florian Hahn
a3d1de3e29
[LV] Move invalid cost remark code to separate function (NFC).
The code only needs access to INvalidCosts, ORE and TheLoop, so it can
easily be moved into a helper to make selectVectorizationFactor more
compact.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D143957
2023-02-16 11:28:19 +00:00
Hongtao Yu
eddec9de44 [Pseudo probe] Duplicate probes in vectorized loop body.
Prevoius pseudo probes were dropped out of a vectorized loop body during loop vectorization. This can result in the samples of the loop entry is used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicting allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", the vectorizer will duplicate it by the number of vector lanes.

For one internal service, I'm seeing the change causes the size increase of the .pseudoprobe section by 0.7%, which should count around 0.2% of the whole binary size.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D144066
2023-02-15 10:18:08 -08:00
Graham Hunter
0fa5df1959 [LV] Synthesize all true masks for masked vector function variants
When vectorizing code with function calls in it, if we encounter
a function which only has vectorized variants requiring a mask
we can synthesize an all-true mask to enable us to proceed.

Since we want the mask to be represented in vplan, the pointer
to the chosen Function is now stored as part of the
VPWidenCallRecipe, and mask arguments are added at the
appropriate index to the recipe operands.

Reviewed By: david-arm, fhahn, reames

Differential Revision: https://reviews.llvm.org/D132458
2023-02-14 14:33:18 +00:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Florian Hahn
2e6430666c
[LV] Update recipe builder functions to pass VPlan directly (NFC).
Passing VPlanPtr requires a dereference of std::unique_ptr on each
access, which is unnecessary. Just pass the plan by reference.
2023-02-12 22:35:14 +00:00
Sander de Smalen
5a115452c4 Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.

Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.

This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
2023-02-09 09:42:29 +00:00
Florian Hahn
c83fdc905a
[LV] Perform recurrence sinking directly on VPlan.
This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.

At the moment, there's a single test failure remaining.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142589
2023-02-08 15:49:29 +00:00
Sander de Smalen
1f01cdda68 Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."
This patch causes a regression, so reverting it while I investigate the issue.

This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.
2023-02-08 15:46:52 +00:00
Sander de Smalen
e6eb84a191 [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr  ..., i32 %vf.part1

Which InstCombine then changes into:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr  ..., i32 %vf.part1.zext

D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.

It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.

I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D143267
2023-02-07 11:47:51 +00:00
Sander de Smalen
005311399e [LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style.
This NFC (intended) patch has several small changes:
* It renames PredicationStyle to TailFoldingStyle.
* It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle()
* Simplifies some of its uses in the LoopVectorizer

Rationale: To my surprise PredicationStyle::None did not mean 'no
predication', but rather 'no active lane mask intrinsic', such that the
predicate is created using a splat + compare with stepvector. The enum is
also highly specific to tail folding, so it seems better to name this
around that feature, i.e. 'tail folding style'.

This also makes it more amenable to extend it to other tail folding styles,
such as the one added in D142109.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D142887
2023-02-03 14:59:57 +00:00
Kazu Hirata
f20b5071f3 [llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC) 2023-01-28 09:06:31 -08:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Florian Hahn
e2c43a547b
[VPlan] Add vp_depth_first_deep (NFC)
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142055
2023-01-19 20:34:23 +00:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Guillaume Chatelet
48f5d77eee [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:36:39 +00:00
Miguel Saldivar
3c5e0d87f8
[LoopVectorize] Clear cache of LoopAccessInfoManager
LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV.

Fixes #59319

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D139601
2023-01-11 09:03:40 +00:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Florian Hahn
68469a80cb
[LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
David Green
586fd86b0a [LoopVectorizer] Fix inloop reductions mask placement
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.

Differential Revision: https://reviews.llvm.org/D141003
2023-01-05 11:37:37 +00:00
Florian Hahn
89718815c6
[VPlan] Adjust mergeReplicateRegions to be in line with mergeBlock (NFC)
Adjust mergeReplicateRegions to be in line with
mergeBlocksIntoPredecessors added in 36d70a6aea6b by collecting only the
valid candidates first.

Also rename to mergeReplicateRegionsIntoSuccessors and add missing
doc-comment.

This addresses post-commit suggestions by @Ayal.
2023-01-01 19:48:49 +00:00
Michael Maitland
396b0b2b13 [LV] Remove duplicate name set of vector header basic block. NFC
The preheader was named explicitly in 256c6b0ba14e8a7ab6373b61b7193ea8c0a3651c
which makes setting the name in prior commit 95b2aa511eea1f31e183a2a3aed4d2aa852d089c
unnecessary.

Differential Revision: https://reviews.llvm.org/D140246
2022-12-27 17:19:08 -08:00
Florian Hahn
e91e62db14
[LV] Sink scalar operands and merge regions repeatedly.
Merging regions can enable new sinking opportunities (e.g. if users of a
scalar value are moved from different VPBBs into the same VPBB). Sinking
in turn can also enable new merging opportunities (e.g. if a recipe
between to merge-able regions is moved.

To enable more sinking opportunities, repeat sinking & merging if
regions could be merged.

Also fix mergeReplicateRegions to return the correct Changed status.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139788
2022-12-27 18:08:32 +00:00
Florian Hahn
36d70a6aea
[VPlan] Remove redundant blocks by merging them into predecessors.
Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This remove redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139927
2022-12-26 22:47:09 +00:00
Florian Hahn
e1650c8d52
[LV] Move exit cond simplification to separate transform.
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.

The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.

Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D135017
2022-12-23 12:51:21 +00:00
Florian Hahn
b7b1e5c96f
[LV] Assert that the executed plan contains selected VF & UF (NFC).
Add assertion to ensure the executed plan is valid for the selected VF
and UF.
2022-12-23 11:44:42 +00:00
Florian Hahn
96296922b6
[VPlan] Move VF and UF string generation to getName() (NFC).
The VFs and UFs may be more constrained as the plans are transformed
(e.g. see D135017 for an example).

To make sure the VFs/UFs included in the VPlan dump are accurate,
generate them when accessing a plan's name, rather than include them in
the name string set after initial construction.
2022-12-22 13:15:01 +00:00
Florian Hahn
a84064bcda
[LV] Add createTripCountSCEV helper (NFC).
Split off helper function in preparation for D135017.
2022-12-21 22:02:31 +00:00
Florian Hahn
7d8528dbf2
[LV] Move SCEV caching workaround to executePlan (NFC).
As suggested by @Ayal in D92132.

This avoids having to duplicate the workaround in multiple places.
2022-12-21 14:51:21 +00:00
Florian Hahn
f69ac9a22d
[LV] Support widened induction variables in epilogue vectorization.
Code generation now uses the start VPValue of induction recipes.

This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.

Fixes #59459.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D92132
2022-12-21 13:58:50 +00:00
Florian Hahn
41b45ce656
[LV] Remove unused AAResults argument (NFC).
AAResults is passed to LoopVectorizationLegality but no longer used.
Remove the dead code.
2022-12-19 20:37:47 +00:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
Kazu Hirata
6eb0b0a045 Don't include Optional.h
These files no longer use llvm::Optional.
2022-12-14 21:16:22 -08:00
Fangrui Song
1ec11d2d48 [Transforms/Vectorize] llvm::Optional => std::optional 2022-12-12 08:56:35 +00:00
Fangrui Song
c178ed33bd Transforms/Utils: llvm::Optional => std::optional 2022-12-12 08:29:05 +00:00
Kazu Hirata
f7dffc28b3 Don't include None.h (NFC)
I've converted all known uses of None to std::nullopt, so we no longer
need to include None.h.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-10 11:24:26 -08:00
Philip Reames
b0f904b6da [LV] Account for minimum vscale when rejecting scalable vectorization of short loops
The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus under estimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice.

Note that the code quality of the original scalable vectorization could (and probably should) be improved other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with.

This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length.

Differential Revision: https://reviews.llvm.org/D137285
2022-12-09 11:29:41 -08:00
Kazu Hirata
9f252e5567 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:31:17 -08:00
Florian Hahn
37809c867a
[VPlan] Support sinking VPScalarIVStepsRecipe.
This patch extends VP-based sinking to also sink VPScalarStepsRecipe.
This takes us a step closer towards retiring the IR based sinking.

The main change is extending VPScalarIVStepsRecipe::execute to support
executing in a replicate-region.

Depends on D133758.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133760
2022-12-04 22:59:17 +00:00
Kazu Hirata
343de6856e [Transforms] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:37 -08:00