2388 Commits

Author SHA1 Message Date
Sjoerd Meijer
f44ba25135 ExtractValue instruction costs
Instruction ExtractValue wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul which is not really accurate. Since it is free (most of the times),
this now gets a cost of 0 using getInstructionCost.

This is a follow-up of D92208, that required changing this regression test.
In a follow up I will look at InsertValue which also isn't handled yet.

Differential Revision: https://reviews.llvm.org/D92317
2020-12-01 10:42:23 +00:00
Florian Hahn
fe83adb05a
[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC).
VPPredInstPHIRecipe is one of the recipes that was missed during the
initial conversion. This patch adjusts the recipe to also manage its
operand using VPUser.
2020-11-30 13:09:58 +00:00
Sjoerd Meijer
5110ff0817 [AArch64][CostModel] Fix cost for mul <2 x i64>
This was modeled to have a cost of 1, but since we do not have a MUL.2d this is
scalarized into vector inserts/extracts and scalar muls.

Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.

Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:

  // The cost of executing VF copies of the scalar instruction. This opcode
  // is unknown. Assume that it is the same as 'mul'.

which I will address next as a follow up of this.

Differential Revision: https://reviews.llvm.org/D92208
2020-11-30 11:36:55 +00:00
Florian Hahn
4bc9b909d7
[VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe. 2020-11-29 18:28:27 +00:00
David Green
d939ba4c68 [ARM] MVE qabs vectorization test. NFC 2020-11-27 12:21:11 +00:00
David Green
e0c479cd0e [VPlan] Switch VPWidenRecipe to be a VPValue
Similar to other patches, this makes VPWidenRecipe a VPValue. Because of
the way it interacts with the reduction code it also slightly alters the
way that VPValues are registered, removing the up front NeedDef and
using getOrAddVPValue to create them on-demand if needed instead.

Differential Revision: https://reviews.llvm.org/D88447
2020-11-25 08:25:06 +00:00
David Green
00a6601136 [VPlan] Turn VPReductionRecipe into a VPValue
This converts the VPReductionRecipe into a VPValue, like other
VPRecipe's in preparation for traversing def-use chains. It also makes
it a VPUser, now storing the used VPValues as operands.

It doesn't yet change how the VPReductionRecipes are created. It will
need to call replaceAllUsesWith from the original recipe they replace,
but that is not done yet as VPWidenRecipe need to be created first.

Differential Revision: https://reviews.llvm.org/D88382
2020-11-25 08:25:05 +00:00
Ayal Zaks
32d9a386bf [LV] Keep Primary Induction alive when folding tail by masking
Fix PR47390.

The primary induction should be considered alive when folding tail by masking,
because it will be used by said masking; even when it may otherwise appear
useless: feeding only its own 'bump', which is correctly considered dead, and
as the 'bump' of another induction variable, which may wrongfully want to
consider its bump = the primary induction, dead.

Differential Revision: https://reviews.llvm.org/D92017
2020-11-24 15:12:54 +02:00
Yichao Yu
4bc88a0e9a Enable support for floating-point division reductions
Similar to fsub, fdiv can also be vectorized using fmul.

Also http://llvm.org/viewvc/llvm-project?view=revision&revision=215200

Differential Revision: https://reviews.llvm.org/D34078

Co-authored-by: Jameson Nash <jameson@juliacomputing.com>
2020-11-23 20:00:58 -05:00
Philip Reames
d6239b3ea6 [test] pre-comit test for D91451 2020-11-23 15:36:08 -08:00
Philip Reames
b06a2ad94f [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)
A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane.

This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but it's real purpose is to pave the way for a following change which will generalize our uniformity logic.

In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs.  The discussion on that item remains unsettled and is pending larger architectural discussion.  We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled.

Differential Revision: https://reviews.llvm.org/D91398
2020-11-23 15:32:17 -08:00
Sanjay Patel
e32bd35120 [CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI implementation
This is re-applying a combination of f7eac51b9b3f and 8ec7ea3ddce7 as one patch
to avoid regressions now that we have better testing in place.

Those were reverted with 32dd5870ee31 because of crashing in experimental intrinsics.
That bug should be fixed with 7ae346434.

Paraphrased original commit messages:

This is the last step in removing cost-kind as a consideration in the
basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that
had customization here in the basic class. This should remove a barrier
to retrying D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the
target-specific overrides have their own bugs, but we hope this is
less wrong - if something has a significant throughput cost, then it
should have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the
x86 scatter-gather test (which is misplaced or broken because it runs
the entire -O3 pipeline) - we unrolled less, and we assume that is
a improvement.

Exception: in general, we want the *size* cost for a scalar call to be
cheap even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-20 11:21:10 -05:00
Eric Christopher
32dd5870ee Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation"
as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up.

This reverts commit f7eac51b9b3f780c96ca41913293851c5acb465b.

Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert.

This reverts commit 8ec7ea3ddce7379e13e8dfb4a5260a6d2004aa1c.

Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert.

This reverts commit df09f825995b10da03f148133c119f52c94fd6e4.

Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert.

This reverts commit 618d555e8d926a83161774df2035519c387269db.
2020-11-19 22:10:23 -08:00
Sanjay Patel
4e68bc0999 Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask"
This reverts commit e56103d25016c9ce4e98f652ac1a09379793ccf5.
There is a stage2 msan failure blamed on this commit:
http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio
2020-11-16 14:48:09 -05:00
Sanjay Patel
e56103d250 [InstCombine] add multi-use demanded bits fold for add with low-bit mask
I noticed an add example like the one from D91343, so here's a similar patch.
The logic is based on existing code for the single-use demanded bits fold.
But I only matched a constant instead of using compute known bits on the
operands because that was the motivating patterni that I noticed.

I think this will allow removing a special-case (but incomplete) dedicated
fold within visitAnd(), but I need to untangle the existing code to be sure.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2

Differential Revision: https://reviews.llvm.org/D91415
2020-11-15 15:09:49 -05:00
Florian Hahn
0c119ba8a8 [VPlan] Use VPValue def for VPWidenGEPRecipe.
This patch turns VPWidenGEPRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84683
2020-11-15 15:12:47 +00:00
Florian Hahn
a70b511e78 Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts the revert commit c8d73d939fa4fda9c87b3979225d02d63062bd68.

It includes a fix for cases where we missed inserting VPValues
for some selects, which should fix PR48142.
2020-11-14 20:00:25 +00:00
Philip Reames
d4e81cd9dd [Tests][LoopVect] Exercise basic uniform memory operand logic 2020-11-12 20:34:31 -08:00
Sanjay Patel
9e0c35655b [LoopVectorize] regenerate test checks; NFC 2020-11-12 17:15:46 -05:00
Florian Hahn
c8d73d939f Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts commit a8e50f1c6e7b404aab8fedb972f003a4d6a6434e.

This reportedly breaks building the Linux kernel.
  https://bugs.llvm.org/show_bug.cgi?id=48142
2020-11-10 22:50:46 +00:00
Florian Hahn
a8e50f1c6e
[VPlan] Use VPValue def for VPWidenSelectRecipe.
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84682
2020-11-10 19:39:37 +00:00
Sanjay Patel
f7eac51b9b [CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation
This is the last step in removing cost-kind as a consideration in the basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that had customization here in the
basic class. This should remove a barrier to retrying
D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the target-specific overrides have their own
bugs, but we hope this is less wrong - if something has a significant throughput cost, then it should
have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the x86 scatter-gather test (which is
misplaced or broken because it runs the entire -O3 pipeline) - we unrolled less, and we assume that is
a improvement.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-10 08:19:31 -05:00
Joe Ellis
462dd4f803 [SVE][AArch64] Improve specificity of vectorization legality TypeSize test
The test was using -O2, where -loop-vectorize will suffice.

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D90685
2020-11-10 10:55:25 +00:00
Florian Hahn
f0d76275cb
[VPlan] Print result value for loads in VPWidenMemoryInst (NFC).
For loads, print the result value.
2020-11-09 14:01:29 +00:00
Florian Hahn
fec64de261
[VPlan] Use VPValue def for VPWidenCall.
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84681
2020-11-09 13:29:41 +00:00
Simon Pilgrim
28fc173819 [LoopVectorize] Remove unused check-prefixes 2020-11-09 12:18:20 +00:00
Simon Pilgrim
8a34e30d33 [LoopVectorize][AMDGPU] Regenerate packed-math test checks 2020-11-09 12:18:20 +00:00
Simon Moll
d3b33a7810 [VE][TTI] don't advertise vregs/vops
Claim to not have any vector support to dissuade SLP, LV and friends
from generating SIMD IR for the VE target.  We will take this back once
vector isel is stable.

Reviewed By: kaz7, fhahn

Differential Revision: https://reviews.llvm.org/D90462
2020-11-06 11:12:10 +01:00
Simon Pilgrim
f03be9df37 [LV][X86] Regenerate gather_scatter tests. NFCI.
Reduce diff in D90554
2020-11-02 11:57:37 +00:00
Arthur Eubanks
5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662d8d72eaac48d3e9b31ca8dc90df5a4.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Arthur Eubanks
10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Nikita Popov
20b386aae0 [LoopUtils] Fix neutral value for vector.reduce.fadd
Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
2020-10-29 21:45:13 +01:00
Philip Reames
4e4abd16a7 [Deref] Use maximum trip count instead of exact trip count
When trying to prove that a memory access touches only dereferenceable memory across all iterations of a loop, use the maximum exit count rather than an exact one.  In many cases we can't prove exact exit counts whereas we can prove an upper bound.

The test included is for a single exit loop with a min(C,V) exit count, but the true motivation is support for multiple exits loops.  It's just really hard to write a test case for multiple exits because the vectorizer (the primary user of this API), bails far before this.  For multiple exits, this allows a mix of analyzeable and unanalyzable exits when only analyzeable exits are needed to prove deref.
2020-10-28 14:33:30 -07:00
Nico Weber
2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c62c185632e3a75bf45b313eadab774b.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Florian Hahn
f067bc3c0a [LoopRotation] Allow loop header duplication if vectorization is forced.
-Oz normally does not allow loop header duplication so this loop wouldn't be
vectorized.  However the vectorization pragma should override this and allow
for loop rotation.

rdar://problem/49281061

Original patch by Adam Nemet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D59832
2020-10-27 09:28:01 +00:00
Arthur Eubanks
e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Joe Ellis
467e5cf40f [SVE][AArch64] Fix TypeSize warning in loop vectorization legality
The warning would fire when calling isDereferenceableAndAlignedInLoop
with a scalable load. Calling isDereferenceableAndAlignedInLoop with a
scalable load would result in the use of the now deprecated implicit
cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes this issue by:

- no longer considering vector loads as candidates in
  canVectorizeWithIfConvert. This doesn't make sense in the context of
  identifying scalar loads to vectorize.

- making use of getFixedSize inside isDereferenceableAndAlignedInLoop --
  this removes the dependency on the deprecated interface, and will
  trigger an assertion error if the function is ever called with a
  scalable type.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89798
2020-10-26 17:40:04 +00:00
Florian Hahn
1747aae9fc [LV] Add cost-model test for AArch64 select costs.
Currently, the cost of some compare/select patterns is overestimated on
AArch64.
2020-10-26 13:43:31 +00:00
Nikita Popov
0dda633317 [SCEV] Strength nowrap flags after constant folding
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
2020-10-25 18:00:22 +01:00
Venkataramanan Kumar
57cdc52c4d Initial support for vectorization using Libmvec (GLIBC vector math library)
Differential Revision: https://reviews.llvm.org/D88154
2020-10-22 16:01:39 -04:00
Arthur Eubanks
c76968d8b6 [test][NPM] Fix already-vectorized.ll under NPM
The NPM runs SpeculateAroundPHIs which breaks critical edges, causing a
branch we check for to not directly jump back to the same block.
2020-10-19 13:11:13 -07:00
Arthur Eubanks
fce64578bc [NPM][test] Fix some LoopVectorize tests under NPM 2020-10-19 12:05:37 -07:00
Arthur Eubanks
65e5006962 [NPM][opt] Run -O# after other passes in legacy PM compatibility mode
Generally tests run -O# before other passes, not after.
2020-10-19 11:48:44 -07:00
Evgeniy Brevnov
d0c95808e5 [LV] Unroll factor is expected to be > 0
LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679
2020-10-14 16:48:17 +07:00
David Green
be6e8e50f4 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail folded loops. Reductions are
generates with the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straight forward, except for the creation of
blockmasks which need to ensure they are created in dominance order. The
order they are added is altered to be after any phis, keeping the
requirements for the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
2020-10-11 16:58:34 +01:00
David Green
8f2cacae67 [LV] Extra predicated inloop reduction tests. NFC 2020-10-11 15:06:21 +01:00
David Green
6d8eea61b1 [AArch64][LV] Move vectorizer test to Transforms/LoopVectorize/AArch64. NFC 2020-10-10 10:15:43 +01:00
David Green
498f89d188 [LV] Collect dead induction truncates
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single use trunk from the ICmp, which allows
the Add to more readily be removed too.

This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.

Differential Revision: https://reviews.llvm.org/D88873
2020-10-08 08:28:58 +01:00
Florian Hahn
a73166a452 [LAA] Use DL to get element size for bound computation.
Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.

This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.

By using DataLayout to get the size of the element we can also correctly
handle pointer element types.

Note the changes to the existing test, which seems to also use the wrong
offset for the end.

Fixes PR47751.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D88953
2020-10-07 18:57:07 +01:00
Amara Emerson
322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00