828 Commits

Author SHA1 Message Date
Yevgeny Rouban
f7ef26ef0b [SLP] Fix crash in reduction for integer min/max
The SCEV commit b46c085d2b6d1 [NFCI] SCEVExpander:
    emit intrinsics for integral {u,s}{min,max} SCEV expressions
seems to reveal a new crash in SLPVectorizer.
SLP crashes expecting a SelectInst as an externally used value,
but a umin() call is found instead.

The patch relaxes the assumption to make the IR flag propagation safe.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D99328
2021-03-25 21:44:21 +07:00
Alexey Bataev
568c874117 [SLP]Improve and simplify extendSchedulingRegion.
We do not need to scan further if the upper or lower end of the
basic block has already been reached and the instruction has not been
found: in that case the instruction must be in the lower or upper part
of the basic block, respectively.
This should improve compile time for very big basic blocks.

Differential Revision: https://reviews.llvm.org/D99266
2021-03-25 05:31:58 -07:00
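To illustrate the early-exit idea in the commit above, here is a minimal standalone sketch in plain C++ (hypothetical names, not the actual SLPVectorizer code): scan outward from the current region toward both ends of the block, and as soon as one end is reached without finding the instruction, conclude that it must lie on the other side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a basic block: instructions are integer IDs in program
// order. [RegionBegin, RegionEnd) is the current scheduling region.
// Scans upward and downward alternately; as soon as one end of the block is
// reached without finding Inst, it must be on the other side (assuming Inst
// is in the block at all). Hypothetical helper, for illustration only.
bool isAboveRegion(const std::vector<int> &Block, std::size_t RegionBegin,
                   std::size_t RegionEnd, int Inst) {
  std::size_t Up = RegionBegin; // scans toward index 0
  std::size_t Down = RegionEnd; // scans toward Block.size()
  while (true) {
    if (Up > 0) {
      --Up;
      if (Block[Up] == Inst)
        return true;
    } else {
      return false; // hit the upper end first: Inst must be below the region
    }
    if (Down < Block.size()) {
      if (Block[Down] == Inst)
        return false;
      ++Down;
    } else {
      return true; // hit the lower end first: Inst must be above the region
    }
  }
}

int main() {
  std::vector<int> Block = {10, 20, 30, 40, 50, 60};
  // Region currently covers indices [2, 4): instructions 30 and 40.
  assert(isAboveRegion(Block, 2, 4, 20));  // found while scanning upward
  assert(!isAboveRegion(Block, 2, 4, 60)); // found while scanning downward
}
```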
Sander de Smalen
55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the bitwidth of a scalar register, a fixed-width vector
register, or a scalable vector register should be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
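As a rough sketch of what that interface shape looks like (hypothetical names and numbers, not the real TargetTransformInfo API): the caller states which kind of register it is asking about, and the answer carries both a bit count and a 'scalable' flag, which is the extra information a TypeSize-style return provides over a plain integer.

```cpp
#include <cstdio>

// Toy model of the query described above; names and values are
// illustrative, not the real TargetTransformInfo API.
enum class RegisterKind { Scalar, FixedWidthVector, ScalableVector };

struct TypeSizeLike {
  unsigned MinBits; // known minimum size in bits
  bool Scalable;    // true if the real size is MinBits * vscale
};

TypeSizeLike getRegisterBitWidth(RegisterKind K) {
  switch (K) {
  case RegisterKind::Scalar:
    return {64, false};  // e.g. a 64-bit GPR
  case RegisterKind::FixedWidthVector:
    return {128, false}; // e.g. a 128-bit NEON register
  case RegisterKind::ScalableVector:
    return {128, true};  // e.g. SVE: 128 bits times vscale
  }
  return {0, false};
}

int main() {
  TypeSizeLike S = getRegisterBitWidth(RegisterKind::ScalableVector);
  std::printf("min bits: %u, scalable: %d\n", S.MinBits, S.Scalable);
}
```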
Alexey Bataev
99203f2004 [Analysis]Add getPointersDiff function to improve compile time.
Added a getPointersDiff function to LoopAccessAnalysis and used it in
the SLP vectorizer instead of the direct calculation of the distance
between pointers and/or the isConsecutiveAccess function, to improve
compile time and the detection of consecutive store chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 14:25:36 -07:00
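A minimal standard-C++ illustration of the kind of query described above (hypothetical helpers, not the LoopAccessAnalysis implementation): the signed distance in elements between two pointers into the same buffer directly answers whether the accesses form a consecutive chain.

```cpp
#include <cassert>
#include <cstddef>

// Toy version of a "pointer difference" query: distance between two
// pointers into the same buffer, measured in elements. Illustrative only.
template <typename T>
std::ptrdiff_t pointersDiff(const T *A, const T *B) {
  return B - A;
}

// Two accesses are consecutive when the second starts exactly one element
// after the first.
template <typename T>
bool isConsecutive(const T *A, const T *B) {
  return pointersDiff(A, B) == 1;
}

int main() {
  int Buf[8] = {};
  assert(isConsecutive(&Buf[2], &Buf[3]));  // stores to Buf[2], Buf[3] chain
  assert(!isConsecutive(&Buf[2], &Buf[5])); // gap of 3 elements: no chain
}
```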
Alexey Bataev
f1b47ad278 Revert "[Analysis]Add getPointersDiff function to improve compile time."
This reverts commit 065a14a12d2694f26f4e894641f5ab8cfc5da8bd to
investigate and fix a crash in the SLP vectorizer.
2021-03-23 13:17:54 -07:00
Alexey Bataev
065a14a12d [Analysis]Add getPointersDiff function to improve compile time.
Added a getPointersDiff function to LoopAccessAnalysis and used it in
the SLP vectorizer instead of the direct calculation of the distance
between pointers and/or the isConsecutiveAccess function, to improve
compile time and the detection of consecutive store chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 12:58:42 -07:00
Sanjay Patel
3c8473ba53 [SLP] allow matching integer min/max intrinsics as reduction ops
As noted in D98152, we need to patch SLP to avoid regressions when
we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in:
7202f47508

Differential Revision: https://reviews.llvm.org/D98981
2021-03-23 08:56:44 -04:00
Bjorn Pettersson
688cdddafb [SLP] Honor min/max regsize and min/max VF in vectorizeStores
Make sure we use PowerOf2Floor instead of PowerOf2Ceil when
calculating the max number of elements that fit inside a vector
register (otherwise we could end up creating vectors larger
than the maximum vector register size).

Also make sure we honor the min/max VF (as given by TTI or
cmd line parameters) when doing vectorizeStores.

Reviewed By: anton-afanasyev

Differential Revision: https://reviews.llvm.org/D97691
2021-03-22 17:29:35 +01:00
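A small arithmetic illustration of the floor-vs-ceil point above (plain C++; the helpers mirror the intent of LLVM's PowerOf2Floor/PowerOf2Ceil but are written out here): with a 128-bit register holding 4 x i32 and a 6-element store chain, rounding up gives 8 lanes (256 bits, wider than the register), while rounding down gives 4 lanes, which fits.

```cpp
#include <cassert>
#include <cstdint>

// Round down / up to a power of two (for X > 0). Written out here for
// illustration; not the LLVM helpers themselves.
uint64_t powerOf2Floor(uint64_t X) {
  uint64_t P = 1;
  while (P * 2 <= X)
    P *= 2;
  return P;
}
uint64_t powerOf2Ceil(uint64_t X) {
  uint64_t P = 1;
  while (P < X)
    P *= 2;
  return P;
}

int main() {
  const unsigned RegBits = 128, ElemBits = 32;
  const unsigned MaxElems = RegBits / ElemBits; // 4 lanes fit in a register
  const unsigned ChainLen = 6;                  // candidate store chain

  // Rounding up would ask for an 8-wide vector: 256 bits, too wide.
  assert(powerOf2Ceil(ChainLen) * ElemBits > RegBits);
  // Rounding down (capped by MaxElems) keeps the VF within the register.
  unsigned VF = powerOf2Floor(ChainLen);
  if (VF > MaxElems)
    VF = MaxElems;
  assert(VF * ElemBits <= RegBits);             // 4 x i32 == 128 bits
}
```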
Alexey Bataev
b3ced9852c [SLP]Fix crash on extending scheduling region.
If the SLP vectorizer tries to extend the scheduling region and runs out
of budget too early, but still extends the region to the new ending
instructions (i.e., it was able to extend the region for the first
instruction in the bundle, but not for the second), the compiler needs to
recalculate dependencies in full, just as if the extension had been
successful. Without this, the schedule data chunks may end up with the
wrong number of (unscheduled) dependencies and we may end up with an
incorrect function, where the vectorized instruction does not dominate
the extractelement instruction.

Differential Revision: https://reviews.llvm.org/D98531
2021-03-18 06:11:08 -07:00
David Green
e2935dcfc4 [TTI] Add a Mask to getShuffleCost
This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask
is provided the backend can return a more accurate cost.
For example, VREV costs could be returned by the ARM backend. This should
be NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
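A sketch of why having the exact mask helps (hypothetical helper and costs, plain C++): with only "some shuffle of N elements" a backend must assume a generic cost, but with the concrete mask {3,2,1,0} it can recognize a whole-vector reverse, the kind of pattern a single ARM VREV covers, and price it more cheaply.

```cpp
#include <cassert>
#include <vector>

// A shuffle mask lists, for each output lane, which input lane it reads.
// This helper just recognizes the "reverse all lanes" pattern; a backend
// that sees the exact mask could map it to a single cheap instruction.
bool isReverseMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != N - 1 - I)
      return false;
  return true;
}

// Toy cost query: cheap when the mask is a recognized pattern, otherwise a
// generic (more expensive) shuffle. Numbers are illustrative only.
unsigned getShuffleCost(const std::vector<int> &Mask) {
  return isReverseMask(Mask) ? 1 : 4;
}

int main() {
  assert(getShuffleCost({3, 2, 1, 0}) == 1); // recognizable reverse
  assert(getShuffleCost({2, 0, 3, 1}) == 4); // arbitrary permutation
}
```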
Bu Le
9abe500473 [SLP] Fix the trunc instruction insertion problem
The current SLP pass has a piece of code that inserts a trunc instruction
after the vectorized instruction. In the case that the vectorized instruction
is a phi node and not the last phi node in the BB, the trunc instruction
will be inserted between two phi nodes, which triggers a verifier failure
in debug builds or unpredictable errors in later passes.
This patch changes the algorithm to 'if the last vectorized instruction
is a phi, insert the trunc after the last phi node in the current BB'
to fix this problem.
2021-03-17 13:51:08 +03:00
Sanjay Patel
7202f47508 [SLP] separate min/max matching from its instruction-level implementation; NFC
The motivation is to handle integer min/max reductions independently
of whether they are in the current cmp+sel form or the planned intrinsic
form.

We assumed that min/max included a select instruction, but we can
decouple that implementation detail by checking the instructions
themselves rather than relying on the recurrence (reduction) type.
2021-03-16 17:16:11 -04:00
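The two scalar forms the matcher now treats uniformly, written out as plain C++ for illustration (not LLVM IR): the older canonical shape spells a minimum as a compare feeding a select, while the intrinsic form is a direct min call; the reduction SLP wants to vectorize chains either form to the same result.

```cpp
#include <algorithm>
#include <cassert>

// "cmp + sel" form of a signed minimum: the shape SLP historically matched.
int minCmpSel(int A, int B) { return A < B ? A : B; }

// Intrinsic-style form: a direct min call (cf. llvm.smin in IR).
int minIntrinsic(int A, int B) { return std::min(A, B); }

// The min-reduction SLP wants to vectorize looks the same either way.
int reduceMin(const int *Data, int N) {
  int Acc = Data[0];
  for (int I = 1; I < N; ++I)
    Acc = minIntrinsic(Acc, Data[I]); // or minCmpSel: identical result
  return Acc;
}

int main() {
  int Data[8] = {7, 3, 9, -2, 5, 0, 4, 8};
  assert(reduceMin(Data, 8) == -2);
  assert(minCmpSel(3, 5) == minIntrinsic(3, 5));
}
```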
Sanjay Patel
40fdb43d30 [SLP] improve readability in reduction logic; NFC
We had 2 different and ambiguously-named 'I' variables.
2021-03-16 07:35:13 -04:00
Valery N Dmitriev
73f94969b2 [SLP] Fix crash when matching associative reduction for integer min/max.
The associative reduction matcher in SLP begins with a select instruction, but
when it reaches a call to llvm.umax (or similar) via the def-use chain, the
latter is also matched as the UMax kind. The routine's later code assumes the
matched instruction to be a select and thus dies on the first encountered cast
that does not fit.

Differential Revision: https://reviews.llvm.org/D98432
2021-03-11 11:52:57 -08:00
Sanjay Patel
23fd647cc6 [SLP] remove dead null check; NFC
We cast<> to Instruction (not dyn_cast<>), so we already
required/assumed that Cmp is not null.
2021-03-09 17:43:07 -05:00
Alexey Bataev
a054e94e9e [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the vectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D94992
2021-03-02 06:39:47 -08:00
David Green
bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying to work with those two values separately, this removes
the VF parameter and instead widens the RetTy/ArgTys by VF when called
from the vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

For most backends this looks like it will be an improvement (or they
were not using getIntrinsicInstrCost). AMDGPU needed the most changes to
keep the code from c230965ccf36af5c88c working. ARM removed the fix in
dfac521da1b90db683, WebAssembly happens to get a fixup for an SLP cost
issue, and both X86 and AArch64 now seem to be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
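A toy illustration of the interface change above (hypothetical type descriptors and cost numbers, not the real IntrinsicCostAttributes): rather than passing a scalar return type plus a separate VF, the vectorizer pre-widens the type by VF, so the cost hook only ever sees one vector form.

```cpp
#include <cassert>
#include <string>

// Toy type descriptor: a base scalar type plus a lane count (1 == scalar).
struct TypeDesc {
  std::string Base;
  unsigned Lanes;
};

// The "widen by VF" step the vectorizer now performs before asking for a cost.
TypeDesc widen(const TypeDesc &Scalar, unsigned VF) {
  return {Scalar.Base, Scalar.Lanes * VF};
}

// New-style cost query: only ever sees the final (possibly vector) type.
unsigned toyIntrinsicCost(const TypeDesc &RetTy) {
  return RetTy.Lanes; // illustrative: pretend cost scales with lane count
}

int main() {
  TypeDesc ScalarF32{"float", 1};
  // The old style would have been toyIntrinsicCost(ScalarF32, /*VF=*/4);
  // the new style widens first, so both former vector modes collapse into one.
  TypeDesc V4F32 = widen(ScalarF32, 4);
  assert(V4F32.Lanes == 4);
  assert(toyIntrinsicCost(V4F32) == 4);
}
```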
Alexey Bataev
9a4dd4de9d [SLP]No need to mark scatter load pointer as scalar as it gets vectorized.
The pointer operand of scatter loads does not remain scalar in the tree (it
gets vectorized) and thus must not be marked as a scalar that remains
scalar in the vectorized form.

Differential Revision: https://reviews.llvm.org/D96818
2021-02-22 11:58:28 -08:00
Jinsong Ji
9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f23901834e05071ab253889f671b5d9.

This exposes a failure in the test-suite build on PowerPC;
reverting to unblock the buildbot first.
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
David Green
502a67dd7f [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying to work with those two values separately, this removes
the VF parameter and instead widens the RetTy/ArgTys by VF when called
from the vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

For most backends this looks like it will be an improvement (or they
were not using getIntrinsicInstrCost). AMDGPU needed the most changes to
keep the code from c230965ccf36af5c88c working. ARM removed the fix in
dfac521da1b90db683, WebAssembly happens to get a fixup for an SLP cost
issue, and both X86 and AArch64 now seem to be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-05 09:34:24 +00:00
Sander de Smalen
171d12489f [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.
This also changes getReductionCost to return InstructionCost,
and it simplifies two expressions by removing a redundant 'isValid' check.
2021-01-25 12:27:01 +00:00
Sanjay Patel
77adbe6a8c [SLP] fix fast-math requirements for fmin/fmax reductions
a6f0221276 enabled intersection of FMF on reduction instructions,
so it is safe to ease the check here.

There is still some room to improve here - it looks like we
have nearly duplicate flag propagation logic inside of the
LoopUtils helper, but it is limited to targets that do not form
reduction intrinsics (they form the shuffle expansion instead).
2021-01-24 08:55:56 -05:00
Kazu Hirata
1238378f18 [llvm] Use pop_back_val (NFC) 2021-01-23 10:56:33 -08:00
Sanjay Patel
a6f0221276 [SLP] fix fast-math-flag propagation on FP reductions
As shown in the test diffs, we could miscompile by
propagating flags that did not exist in the original
code.

The flags required for fmin/fmax reductions will be
fixed in a follow-up patch.
2021-01-23 11:17:20 -05:00
Anton Rapetov
a4914dc1f2 [SLP] do not traverse constant uses
Walking the use list of a Constant (particularly, ConstantData)
is not scalable, since a given constant may be used by many
instructions in many functions in many modules.

Differential Revision: https://reviews.llvm.org/D94713
2021-01-22 08:14:09 -05:00
Sanjay Patel
2f03528f5e [SLP] rename reduction variable to avoid shadowing; NFC
The code structure can likely be improved now that
'OperationData' is gone.
2021-01-21 16:02:38 -05:00
Sanjay Patel
d77753381f [SLP] simplify reduction matching
This is NFC-intended and removes the "OperationData"
class which had become nothing more than a recurrence
(reduction) type.

I adjusted the matching logic to distinguish
instructions from non-instructions - that's all that
the "IsLeafValue" member was keeping track of.
2021-01-21 14:58:57 -05:00
Sanjay Patel
c09be0d2a0 [SLP] reduce reduction code for checking vectorizable ops; NFC
This is another step towards removing `OperationData` and
fixing FMF matching/propagation bugs when forming reductions.
2021-01-20 11:14:48 -05:00
Sanjay Patel
1c54112a57 [SLP] refactor more reduction functions; NFC
We were able to remove almost all of the state from
OperationData, so these don't make sense as members
of that class - just pass the RecurKind in as a param.

More streamlining is possible, but I'm trying to avoid
logic/typo bugs while fixing this. Eventually, we should
not need the `OperationData` class.
2021-01-20 11:14:48 -05:00
Sanjay Patel
8590d24543 [SLP] move reduction createOp functions; NFC
We were able to remove almost all of the state from
OperationData, so these don't make sense as members
of that class - just pass the RecurKind in as a param.
2021-01-20 11:14:48 -05:00
Alexey Bataev
e463bd53c0 Revert "[SLP]Merge reorder and reuse shuffles."
This reverts commit 438682de6a38ac97f89fa38faf5c8dc9b09cd9ad to fix a
bug with reducing the size of the resulting vector for an entry node
with multiple users.
2021-01-19 11:48:04 -08:00
Sanjay Patel
5b77ac32b1 [SLP] match maxnum/minnum intrinsics as FP reduction ops
After much refactoring over the last 2 weeks to the reduction
matching code, I think this change is finally ready.

We effectively broke fmax/fmin vector reduction optimization
when we started canonicalizing to intrinsics in instcombine,
so this should restore that functionality for SLP.

There are still FMF problems here as noted in the code comments,
but we should be avoiding miscompiles on those for fmax/fmin by
restricting to full 'fast' ops (negative tests are included).

Fixing FMF propagation is a planned follow-up.

Differential Revision: https://reviews.llvm.org/D94913
2021-01-18 17:37:16 -05:00
Sanjay Patel
3dbbadb8ef [SLP] rename reduction query for min/max ops; NFC
This will avoid confusion once we start matching
min/max intrinsics. All of these hacks to accommodate
cmp+sel idioms should disappear once we canonicalize
to min/max intrinsics.
2021-01-18 09:32:57 -05:00
Sanjay Patel
d1c4e859ce [SLP] reduce opcode API dependency in reduction cost calc; NFC
The icmp opcode is now hard-coded in the cost model call.
This will make it easier to eventually remove all opcode
queries for min/max patterns as we transition to intrinsics.
2021-01-18 09:32:57 -05:00
Sanjay Patel
49b96cd9ef [SLP] remove opcode field from reduction data class
This is NFC-intended and another step towards supporting
intrinsics as reduction candidates.

The remaining bits of the OperationData class do not make
much sense as-is, so I will try to improve that, but I'm
trying to take minimal steps because it's still not clear
how this was intended to work.
2021-01-16 13:55:52 -05:00
Sanjay Patel
fcfcc3cc6b [SLP] fix typos; NFC 2021-01-16 13:55:52 -05:00
Sanjay Patel
48dbac5b6b [SLP] remove unnecessary use of 'OperationData'
This is another NFC-intended patch to allow matching
intrinsics (example: maxnum) as candidates for reductions.

It's possible that the loop/if logic can be reduced now,
but it's still difficult to understand how this all works.
2021-01-16 13:55:52 -05:00
Sanjay Patel
ceb3cdccd0 [SLP] remove dead code in reduction matching; NFC
To get into this block we had: !A || B || C,
and we checked C in the first 'if' clause,
leaving !A || B. But the 2nd 'if' is checking
A && !B, which is !(!A || B), so it can never be true here.
2021-01-15 17:03:26 -05:00
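The reasoning above is just De Morgan's law; a tiny exhaustive check in plain C++ (illustrative only): wherever !A || B is already established, its exact negation A && !B can never hold, so the guarded block is dead.

```cpp
#include <cassert>
#include <initializer_list>

int main() {
  // Exhaustively verify: (A && !B) is exactly the negation of (!A || B).
  for (bool A : {false, true})
    for (bool B : {false, true}) {
      bool Established = !A || B; // what the enclosing checks guarantee
      bool SecondIf = A && !B;    // the condition guarding the dead block
      assert(SecondIf == !Established);
      // So inside code where Established is known true, SecondIf is false.
      if (Established)
        assert(!SecondIf);
    }
}
```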
Sanjay Patel
1f21de535d [SLP] remove unused reduction functions; NFC
These were made obsolete by simplifying the code in recent patches.
2021-01-15 14:59:33 -05:00
Sanjay Patel
b21905dfe3 [SLP] remove unnecessary state in matching reductions
This is NFC-intended. I'm still trying to figure out
how the loop where this is used works. It does not
seem like we require this data at all, but it's
hard to confirm given the complicated predicates.
2021-01-14 18:32:37 -05:00
Bjorn Pettersson
d58512b2e3 [SLP] Don't vectorize stores of non-packed types (like i1, i2)
In the spirit of commit fc783e91e0c0696e (llvm-svn: 248943) we
shouldn't vectorize stores of non-packed types (i.e. types that
have padding between consecutive elements in a scalar layout,
but are packed in a vector layout).

The problem was detected as a miscompile in a downstream test case.

Reviewed By: anton-afanasyev

Differential Revision: https://reviews.llvm.org/D94446
2021-01-14 11:30:33 +01:00
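A rough plain-C++ illustration of the "non-packed type" problem (conceptual only; LLVM's i1/i2 are not C++ types, and sizeof(bool) == 1 is assumed, as on typical platforms): consecutive scalar stores of a sub-byte value each occupy a whole byte, while a packed vector store would write the bits back to back, so swapping one layout for the other changes what ends up in memory.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Scalar layout: four 1-bit values stored as separate scalars each occupy
  // a full addressable byte (padding around the "real" bit).
  bool Scalars[4] = {true, false, true, true};
  assert(sizeof(Scalars) == 4); // 4 bytes for 4 one-bit values

  // Packed "vector" layout: the same four bits squeezed into one byte.
  uint8_t Packed = 0;
  for (int I = 0; I < 4; ++I)
    Packed |= static_cast<uint8_t>(Scalars[I]) << I;
  assert(sizeof(Packed) == 1);  // 1 byte for the packed form

  // The two layouts occupy different bytes at different offsets, so a packed
  // vector store is not a drop-in replacement for the scalar stores -- which
  // is why such types are rejected for store vectorization.
  assert(sizeof(Scalars) != sizeof(Packed));
}
```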
Sanjay Patel
123674a816 [SLP] simplify type check for reductions
This is NFC-intended. The 'valid' call allows int/FP/pointers
for other parts of SLP. The difference here is that we can't
reduce pointers.
2021-01-13 13:30:46 -05:00
Sanjay Patel
9e7895a868 [SLP] reduce code duplication while processing reductions; NFC 2021-01-12 16:03:57 -05:00
Sanjay Patel
92fb5c49e8 [SLP] rename variable to improve readability; NFC
The OperationData in the 2nd block (visiting the operands)
is completely independent of the 1st block.
2021-01-12 16:03:57 -05:00
Sanjay Patel
554be30a42 [SLP] reduce code duplication in processing reductions; NFC 2021-01-12 16:03:57 -05:00
Sanjay Patel
46507a96fc [SLP] reduce code duplication while matching reductions; NFC 2021-01-12 16:03:57 -05:00
David Sherwood
b7ccaca537 [NFC] Remove min/max functions from InstructionCost
Removed the InstructionCost::min/max functions because it's
fine to use std::min/max instead.

Differential Revision: https://reviews.llvm.org/D94301
2021-01-11 09:00:12 +00:00
Sanjay Patel
3f09c77d33 [SLP] fix typo in assert
This snuck into 0aa75fb12faa, but I didn't catch it locally.
2021-01-10 13:15:04 -05:00
Sanjay Patel
0aa75fb12f [SLP] put verifyFunction call behind EXPENSIVE_CHECKS
A severe compile-time slowdown from this call is noted in:
https://llvm.org/PR48689
My naive fix was to put it under LLVM_DEBUG ( 267ff79 ),
but that's not limiting in the way we want.
This is a quick fix (or we could just remove the call completely
and rely on some later pass to discover potentially wrong IR?).
A bigger/better fix would be to improve/limit verifyFunction()
as noted in:
https://llvm.org/PR47712

Differential Revision: https://reviews.llvm.org/D94328
2021-01-10 12:32:21 -05:00
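A sketch of the guard pattern itself (the verification call here is a stand-in; the real code calls verifyFunction, and EXPENSIVE_CHECKS is the macro named in the commit title, defined in expensive-checks builds): the costly verification only compiles in when that configuration is enabled, rather than in every debug build.

```cpp
#include <cstdio>

// Stand-in for the expensive verification; the real code calls
// llvm::verifyFunction on the just-vectorized function.
static void verifyEverything() { std::puts("running expensive verification"); }

void afterVectorization() {
#ifdef EXPENSIVE_CHECKS
  // Only compiled in when the build defines EXPENSIVE_CHECKS
  // (the expensive-checks configuration), not in ordinary debug builds.
  verifyEverything();
#endif
  // ...normal (cheap) path continues here...
}

int main() { afterVectorization(); }
```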
Alexander Belyaev
bcbdeafa9c Revert "[SLP]Need shrink the load vector after reordering."
This reverts commit 4284afdf9432f7d756f56b0ab21d69191adefa8d.

This changes computed values in fused_batchnorm_test_cpu.

Not equal to tolerance rtol=1e-06, atol=0.001
Mismatched value: a is different from b.
not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3,
       4, 5]))
not close lhs = [-0.6636615  -0.9804948  -1.148275   -0.68193716 -0.8572368  -0.65046215
 -0.6993756  -1.2244141  -1.0938729  -0.50369143 -0.51830524 -0.738452
 -0.7214286  -0.48115745 -0.9380924  -0.9341769  -0.5916775  -1.2896856
 -0.7264182  -0.9746917  -0.783249   -0.7659018  -0.86214024 -0.47784212]
not close rhs = [ 0.44102234  0.12418899 -0.04359123  0.42274666  0.24744703  0.45422167
  0.40530816 -0.11973029  0.01081094  0.6009924   0.5863786   0.3662318
  0.38325527  0.62352633  0.1665914   0.1705069   0.5130063  -0.18500176
  0.37826565  0.12999213  0.3214348   0.338782    0.24254355  0.62684166]
not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838
 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838
 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839
 1.1046839 1.1046838 1.1046838]
not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045
 0.00100041 0.00100012 0.00100001 0.0010006  0.00100059 0.00100037
 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019
 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]
2021-01-08 14:42:26 +01:00