If all operands of the non-schedulable nodes were previously only
copyables, need to clear the dependencies of the original schedule data
for such copyable operands and recalculate them to correctly handle
number of dependecies.
Fixes#159406
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
If the commutable instruction can be represented as a non-commutable
vector instruction (like add 0, %v can be represented as a part of sub
nodes with operation sub %v, 0), its operands might still be reordered
and this should be accounted when checking for copyables in operands
Fixes#158293
If the original instruction is going to be scheduled after same
instruction being scheduled as copyable, need to recalculate
dependencies. Otherwise, the dependencies maybe calculated incorrectly.
If the user node of the SExt/ZExt node is a bitcast to a float point
type, the node itself should not be considered legal to demote, since
still the casting is required to match the size of the float point type.
Fixes#157277
If a standalone schedule data relates to a vectorized instruction, still
need to schedule it as a part of pseudo-bundle to correctly handle
dependencies between its child nodes.
If a standalone schedule data relates to a vectorized instruction, still
need to schedule it as a part of pseudo-bundle to correctly handle
dependencies between its child nodes.
Commutable instruction can be reordering during tree building, and if
the parent node is not scheduled, its ScheduleData elements are
considered independent and compiler do not looks for reordered operands.
Need to cancel scheduling of copyables in this case.
In the initial patch for FMAD, potential FMAD nodes were completely
excluded from the reduction analysis for the smaller patch. But it may
cause regressions.
This patch adds better detection of scalar FMAD reduction operations and
tries to correctly calculate the costs of the FMAD reduction operations
(also, excluding the costs of the scalar fmuls) and split reduction
operations, combined with regular FMADs.
Fixed the handling for reduced values with many uses.
Reviewers: RKSimon, gregbedwell, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/152787
In the initial patch for FMAD, potential FMAD nodes were completely
excluded from the reduction analysis for the smaller patch. But it may
cause regressions.
This patch adds better detection of scalar FMAD reduction operations and
tries to correctly calculate the costs of the FMAD reduction operations
(also, excluding the costs of the scalar fmuls) and split reduction
operations, combined with regular FMADs.
Reviewers: RKSimon, gregbedwell, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/152787
Currently stores are sorted by the stored values instruction types,
which do not include analysis for copyables. The compiler may miss some
potential vectorization opportunities because of that. Patch adds
detection of the copyables in stored values.
Reviewers: hiraditya, HanKuanChen, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/153213
If the value is checked for the reduction and it is a copyable element
in a root node, it should not be deleted, since it may still be used
after vectorization.
Fixes#155512
If the copyable instruction is a terminate instruction from the same
block, as the potential main instruction, such instruction cannot be
copyable and the value list cannot be modeled as instructions with same
(and copyables) opcodes.
Fixes#155183
Need to recalculate the dependencies only for nodes, which have valid
deps before they gets cleared because of the copyable nodes. Otherwise,
no need to recaculate the dependencies to prevent a crash.
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
If the copyable schedule data is created and the user is used several
times in the user node, no need to count same data for the same user
several times, need to include it only ones.
Fixes#153754
If the copyable schedule data is created and the user is used several
times in the user node, no need to count same data for the same user
several times, need to include it only ones.
Fixes#153754
Added support for LShr instructions as base for copyable elements. Also,
added simple analysis for best base instruction selection, if multiple
candidates are available.
Fixed scheduling after cancellation
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/153393
Added support for LShr instructions as base for copyable elements. Also,
added simple analysis for best base instruction selection, if multiple
candidates are available.
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/153393
After clearing the dependencies in copyable data, need to recalculate
dependencies for the original ScheduleData, if it can be marked as
control dependent.
Fixes#153289
Adds initial support for copyable elements, both schedulable and
non-schedulable.
Adds support only for add for now, other opcodes will added in future.
Still some cases are not handled, e.g. stores do not include this,
because currently do not check for copyable elements.
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/147366
Added initial check for potential fmad conversion in reductions and
operands vectorization.
Added the check for instruction to fix#152683
Skipped the code for reduction to avoid regressions.
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.
Fixes#151699
Fixed check for non-supported binary operations.