2138 Commits

Author SHA1 Message Date
Nicolai Hähnle
c0cdd22c72 Introduce CfgTraits abstraction
The CfgTraits abstraction simplfies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in same cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
2020-10-20 13:50:52 +02:00
Artem Belevich
c36c0fabd1 [VectorCombine] Avoid crossing address space boundaries.
We can not bitcast pointers across different address spaces, and VectorCombine
should be careful when it attempts to find the original source of the loaded
data.

Differential Revision: https://reviews.llvm.org/D89577
2020-10-16 13:19:31 -07:00
Florian Hahn
89c0124273 [LoopVersion] Unify SCEVChecks and alias check handling (NFC).
This is an initial cleanup of the way LoopVersioning interacts with LAA.

Currently LoopVersioning has 2 ways of initializing things:

1. Passing LAI and passing UseLAIChecks = true
2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and
   setAliasChecks.

Both ways of initializing lead to the same result and the duplication
seems more complicated than necessary.

This patch removes the UseLAIChecks flag from the constructor and the
setSCEVChecks & setAliasChecks helpers and move initialization
exclusively to the constructor.

This simplifies things, by providing a single way to initialize
LoopVersioning and reducing duplication.

Reviewed By: Meinersbur, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84406
2020-10-15 22:02:17 +01:00
David Green
13ec3dd66f [LV] Add a getRecurrenceBinOp and make use of it. NFC 2020-10-15 18:21:41 +01:00
Florian Hahn
93f6c6b79c Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
This reverts the revert commit 710aceb645e7dba4de7053eef2c616311b9163d4
and includes a fix for a memsan failure.

Original message:

    This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
    during VPlan construction and codegeneration instead of the plain IR
    reference where possible.
2020-10-14 17:41:23 +01:00
Evgeniy Brevnov
d0c95808e5 [LV] Unroll factor is expected to be > 0
LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679
2020-10-14 16:48:17 +07:00
Vitaly Buka
710aceb645 Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
It introduced a memory leak.

This reverts commit 525b085a65d30a5f2ae2af38c0be252fe8d4781b.
2020-10-13 03:14:08 -07:00
Florian Hahn
525b085a65 [VPlan] Use VPValue def for VPMemoryInstructionRecipe.
This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84680
2020-10-12 18:02:33 +01:00
Florian Hahn
ea058d289c [VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe.
Now that operands of the recipe are managed through VPUser, we can
simplify the printing by just using the operands.
2020-10-12 16:51:54 +01:00
David Sherwood
c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of it's own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
David Green
be6e8e50f4 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail folded loops. Reductions are
generates with the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straight forward, except for the creation of
blockmasks which need to ensure they are created in dominance order. The
order they are added is altered to be after any phis, keeping the
requirements for the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
2020-10-11 16:58:34 +01:00
Simon Pilgrim
0716805c02 [SLP] optimizeGatherSequence - assert every Instruction in the worklist is non-null.
Fixes clang static analyzer warning.
2020-10-08 20:02:18 +01:00
David Green
498f89d188 [LV] Collect dead induction truncates
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single use trunk from the ICmp, which allows
the Add to more readily be removed too.

This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.

Differential Revision: https://reviews.llvm.org/D88873
2020-10-08 08:28:58 +01:00
Florian Hahn
348d85a6c7 [VPlan] Clean up uses/operands on VPBB deletion.
Update the code responsible for deleting VPBBs and recipes to properly
update users and release operands.

This is another preparation for D84680 & following patches towards
enabling modeling def-use chains in VPlan.
2020-10-05 14:43:52 +01:00
Florian Hahn
357bbaab66 [VPlan] Add VPRecipeBase::toVPUser helper (NFC).
This adds a helper to convert a VPRecipeBase pointer to a VPUser, for
recipes that inherit from VPUser. Once VPRecipeBase directly inherits
from VPUser this helper can be removed.
2020-10-04 19:43:27 +01:00
Florian Hahn
f5fe7abe8a [VPlan] Account for removed users in replaceAllUsesWith.
Make sure we do not iterate using an invalid iterator.

Another small fix/step towards traversing the def-use chains in VPlan.
2020-10-04 18:18:58 +01:00
Florian Hahn
82dcd383c4 [VPlan] Properly update users when updating operands.
When updating operands of a VPUser, we also have to adjust the list of
users for the new and old VPValues. This is required once we start
transitioning recipes to become VPValues.
2020-10-03 20:54:58 +01:00
Florian Hahn
0867a9e85a [VPlan] Use isa<> instead of directly checking VPRecipeID (NFC).
getVPRecipeID is intended to be only used in `classof` helpers. Instead
of checking it directly, use isa<> with the correct recipe type.
2020-10-02 17:47:35 +01:00
Florian Hahn
d856365470 [VPlan] Change recipes to inherit from VPUser instead of a member var.
Now that VPUser is not inheriting from VPValue, we can take the next
step and turn the recipes that already manage their operands via VPUser
into VPUsers directly. This is another small step towards traversing
def-use chains in VPlan.

This is NFC with respect to the generated code, but makes the interface
more powerful.
2020-09-30 14:39:00 +01:00
Sanjay Patel
0a349d5827 [SLP] clean up - use 'const' and ArrayRef constructor; NFC
Follow-on tidying suggested in the post-commit review of 6a23668.
2020-09-24 15:31:07 -04:00
Craig Topper
03f22b08e2 [SLP] Remove LHS and RHS from OperationData.
These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction.

For the first case we still have the operation we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in.

For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects.

Differential Revision: https://reviews.llvm.org/D88193
2020-09-24 10:57:11 -07:00
Craig Topper
7a3c643c35 [SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value*. NFCI
All of the callers already have an Instruction *. Many of them
from a dyn_cast.

Also update the OperationData constructor to use a Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.

Differential Revision: https://reviews.llvm.org/D88132
2020-09-23 10:51:03 -07:00
Simon Pilgrim
474dc33d07 Add missing namespace closure comment. NFCI.
Fixes clang-tidy llvm-namespace-comment warning.
2020-09-23 16:19:25 +01:00
Florian Hahn
31923f6b36 [VPlan] Disconnect VPValue and VPUser.
This refactors VPuser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84679
2020-09-23 14:44:31 +01:00
Alexey Bataev
d6ac649ccd [SLP]Fix coding style, NFC. 2020-09-22 17:44:29 -04:00
Stefanos Baziotis
89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Florian Hahn
c671e34bf2 [VPlan] Add dump() helper to VPValue & VPRecipeBase.
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
2020-09-22 15:55:16 +01:00
Sanjay Patel
0c3bfbe4bc [SLP] reduce code duplication for checking parent block; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel
bbd49a0266 [SLP] move misplaced code comments; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel
062276c691 [SLP] clean up code in gather(); NFC
1. Use range for-loop to avoid repeatedly accessing end index.
2. Better variable names.
2020-09-22 09:21:20 -04:00
Simon Pilgrim
d682a36ef9 [SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI. 2020-09-22 14:01:47 +01:00
Sanjay Patel
7451bf0b0b [SLP] use std::distance/find to reduce code; NFC
We were already using this code pattern right after
the loop, so this makes it consistent.
2020-09-21 16:22:55 -04:00
Sanjay Patel
be93505986 [LoopVectorize] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Sanjay Patel
a44238cb44 [SLP] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 13:54:06 -04:00
Sanjay Patel
1e6b240d7d [IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC
This reduces code duplication for common construct.
Follow-ups can use this in SLP, LoopVectorizer, and other passes.
2020-09-21 13:47:01 -04:00
Simon Pilgrim
005f826a05 [SLP] Use for-range loops across ValueLists. NFCI.
Also rename some existing loops that used a 'j' iterator to consistently use 'V'.
2020-09-21 18:24:23 +01:00
Sanjay Patel
46075e0b78 [SLP] simplify interface for gather(); NFC
The implementation of gather() should be reduced too,
but this change by itself makes things a little clearer:
we don't try to gather to a different type or
number-of-values than whatever is passed in as the value
list itself.
2020-09-21 12:57:28 -04:00
Simon Pilgrim
3ddecfd220 SLPVectorizer.cpp - fix include ordering. NFCI. 2020-09-21 17:17:11 +01:00
Alexey Bataev
3ff07fcd54 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has less
elements than the root node). In this case we just can not try to reorder
the tree + we may calculate the wrong number of nodes that requre the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrink to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-21 10:51:03 -04:00
Fangrui Song
6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Eric Christopher
ecfd8161bf Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions."
as it's infinite looping on occasion.

This reverts commit 455ca0ebb69210046928fedffe292420a30f89ad.
2020-09-18 12:50:04 -07:00
Alexey Bataev
455ca0ebb6 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has less
elements than the root node). In this case we just can not try to reorder
the tree + we may calculate the wrong number of nodes that requre the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrink to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-18 09:34:59 -04:00
Sanjay Patel
48a23bccf3 [VectorCombine] limit load+insert transform to one-use
As discussed in:
https://llvm.org/PR47558
...there are several potential fixes/follow-ups visible
in the test case, but this is the quickest and safest
fix of the perf regression.
2020-09-17 14:29:15 -04:00
Sanjay Patel
ddd9575d15 [VectorCombine] rearrange bailouts for load insert for efficiency; NFC 2020-09-17 13:50:37 -04:00
Sanjay Patel
03783f19dc [SLP] sort candidates to increase chance of optimal compare reduction
This is one (small) part of improving PR41312:
https://llvm.org/PR41312

As shown there and in the smaller tests here, if we have some member of the
reduction values that does not match the others, we want to push it to the
end (bring the matching members forward and together).

In the regression tests, we have 5 candidates for the 4 slots of the reduction.
If the one "wrong" compare is grouped with the others, it prevents forming the
ideal v4i1 compare reduction.

Differential Revision: https://reviews.llvm.org/D87772
2020-09-17 08:49:27 -04:00
Sanjay Patel
24238f09ed [SLP] fix formatting; NFC
Also move variable declarations closer to usage and add code comments.
2020-09-16 08:50:27 -04:00
Sanjay Patel
6a23668e78 [SLP] remove uses of 'auto' that obscure functionality; NFC 2020-09-16 08:26:21 -04:00
Sanjay Patel
0cee1bf5d1 [SLP] remove redundant size check; NFC
We bail out on small array size anyway.
2020-09-16 08:11:19 -04:00
Sanjay Patel
bbad998bab [SLP] move loop index variable declaration to its use; NFC 2020-09-16 07:59:31 -04:00
Sanjay Patel
158989184e [SLP] change poorly named variable; NFC
'V' shadows a function argument.
2020-09-16 07:59:31 -04:00