When attempting to perform complex deinterleaving on an unrolled loop
containing a reduction, the complex deinterleaving pass would fail to
accommodate the wider types when accumulating the unrolled paths.
Instead of trying to alter the incoming IR to fit its expectations, the
pass should decline to process any reduction that results in a
non-complex or non-vector value.
If a complex pattern had the shape of both a complex->complex reduction
and a complex->single reduction, the matching would recognise both and
deem the graph a valid transformation. Preventing this reprocessing
means that only one of the two can match, so an invalid graph is no
longer transformed regardless.
This reverts commit 76714be5fd4ace66dd9e19ce706c2e2149dd5716, fixing the
build failure that caused the revert.
The failure stemmed from the complex deinterleaving pass identifying a
series of add operations as a "complex to single reduction", so when it
tried to transform this erroneously identified pattern, it faulted. The
fix ensures that complex numbers (or patterns that match them) are used
throughout, by checking whether there is a deinterleave node in the
graph.
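The check described above can be sketched as a reachability query over the pattern graph. This is a minimal illustration under stated assumptions: `NodeKind`, `Node` and `containsDeinterleave` are hypothetical names, not the pass's actual data structures, and the real graph has many more node kinds.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical node kinds; the real pass distinguishes many more.
enum class NodeKind { Deinterleave, Add, Mul, Reduction };

struct Node {
  NodeKind Kind;
  std::vector<std::shared_ptr<Node>> Operands;
};

// Returns true if a Deinterleave node is reachable from Root.
// Visited prevents walking shared subgraphs more than once.
bool containsDeinterleave(const Node *Root,
                          std::unordered_set<const Node *> &Visited) {
  if (!Root || !Visited.insert(Root).second)
    return false;
  if (Root->Kind == NodeKind::Deinterleave)
    return true;
  for (const auto &Op : Root->Operands)
    if (containsDeinterleave(Op.get(), Visited))
      return true;
  return false;
}

bool containsDeinterleave(const Node *Root) {
  std::unordered_set<const Node *> Visited;
  return containsDeinterleave(Root, Visited);
}
```

A chain of plain adds with no deinterleave anywhere in it would then be rejected instead of being treated as a "complex to single reduction".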
The Complex Deinterleaving pass assumes that all values it emits are
complex numbers. This patch removes that assumption and adds support for
emitting just the real or the imaginary component, rather than both.
This patch moves the following intrinsics out of the experimental
namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
All of these intrinsics have existed in LLVM for more than a year and
are widely used, so they should no longer be considered experimental.
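For readers unfamiliar with these intrinsics, the following sketch models their lane-level behaviour on fixed-width vectors, represented here as `std::vector<int>`. The helper names (`interleave2`, `deinterleave2`, `reverseLanes`, `spliceLanes`) are hypothetical and are not LLVM APIs; the splice semantics follow the LangRef description for a fixed-width vector of N elements with -N <= Imm < N.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// interleave2(A, B) -> A0, B0, A1, B1, ...
Vec interleave2(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Vec R;
  R.reserve(A.size() * 2);
  for (std::size_t I = 0; I < A.size(); ++I) {
    R.push_back(A[I]);
    R.push_back(B[I]);
  }
  return R;
}

// deinterleave2(V) -> {even lanes, odd lanes}
std::pair<Vec, Vec> deinterleave2(const Vec &V) {
  assert(V.size() % 2 == 0);
  Vec Even, Odd;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Even.push_back(V[I]);
    Odd.push_back(V[I + 1]);
  }
  return {Even, Odd};
}

// reverse(V) -> lanes in reverse order
Vec reverseLanes(const Vec &V) { return Vec(V.rbegin(), V.rend()); }

// splice(A, B, Imm): concatenate A and B, then take N = A.size() lanes
// starting at Imm (Imm >= 0) or at N + Imm (Imm < 0).
Vec spliceLanes(const Vec &A, const Vec &B, int Imm) {
  assert(A.size() == B.size());
  const int N = static_cast<int>(A.size());
  assert(-N <= Imm && Imm < N);
  const int Start = Imm >= 0 ? Imm : N + Imm;
  Vec Concat(A);
  Concat.insert(Concat.end(), B.begin(), B.end());
  return Vec(Concat.begin() + Start, Concat.begin() + Start + N);
}
```

With real parts `{1, 2, 3}` and imaginary parts `{4, 5, 6}`, `interleave2` yields the interleaved complex layout `{1, 4, 2, 5, 3, 6}`, and `deinterleave2` recovers the two halves.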
It's becoming potentially unsafe to insert a PHI instruction using a plain
Instruction pointer. Switch all the remaining sites that create and insert
PHIs to use iterators instead. For example, the code in
ComplexDeinterleavingPass.cpp is definitely at risk of mixing PHIs and
debug-info.
When replacing ComplexDeinterleavingPass::ReductionOperation, we can do so
from either the Real or the Imaginary part. The correct choice is whichever
is later in the BasicBlock, but before this patch we always took the Real
part.
Fixes https://github.com/llvm/llvm-project/issues/65044
Differential Revision: https://reviews.llvm.org/D159209
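The selection rule above can be sketched in a few lines, assuming a hypothetical `Inst` record that stores an instruction's position in its basic block. In LLVM itself this ordering would come from the instructions' positions in the block; `Inst` and `pickReplacementPoint` are illustrative only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an instruction: its name and its position in
// the basic block (a smaller Position means earlier in the block).
struct Inst {
  std::string Name;
  unsigned Position;
};

// Pick whichever of the Real and Imaginary operations comes later in the
// block, instead of unconditionally picking the Real one.
const Inst &pickReplacementPoint(const Inst &Real, const Inst &Imag) {
  return Real.Position > Imag.Position ? Real : Imag;
}
```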
Cache all results of running `identifyNode`, even those that do not identify
potential complex operations. This patch prevents ComplexDeinterleaving pass
from repeatedly trying to identify Nodes for the same pair of instructions.
Fixes https://github.com/llvm/llvm-project/issues/64379
Differential Revision: https://reviews.llvm.org/D156916
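The caching scheme can be sketched as memoization keyed by the (Real, Imaginary) pair, where a failed identification is stored as `nullptr` so it is never retried. This is a sketch under stated assumptions: instructions are modelled as integer IDs, and `Identifier`, `identifyNodeUncached` and its matching rule are hypothetical placeholders for the pass's real matcher.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-ins: instructions are integer IDs, and a successfully
// identified pattern is represented by a Node.
struct Node {
  int Real;
  int Imag;
};
using NodePtr = std::shared_ptr<Node>;

class Identifier {
public:
  // Returns the cached result if this (Real, Imag) pair was seen before,
  // including cached failures (nullptr).
  NodePtr identifyNode(int Real, int Imag) {
    auto Key = std::make_pair(Real, Imag);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    NodePtr Result = identifyNodeUncached(Real, Imag);
    Cache.emplace(Key, Result); // failures are cached as nullptr too
    return Result;
  }

  unsigned Attempts = 0; // number of uncached identification attempts

private:
  // Hypothetical matcher: pretend only pairs with Imag == Real + 1 match.
  NodePtr identifyNodeUncached(int Real, int Imag) {
    ++Attempts;
    if (Imag == Real + 1)
      return std::make_shared<Node>(Node{Real, Imag});
    return nullptr;
  }

  std::map<std::pair<int, int>, NodePtr> Cache;
};
```

Repeated queries for the same pair, whether they succeed or fail, then cost a single map lookup.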
AArch64 introduced the CMLA and CADD instructions as part of SVE2. This
change allows such instructions to be generated when this architecture
feature is available.
Differential Revision: https://reviews.llvm.org/D153808
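As a reminder of what these instructions compute, the sketch below models a single complex element of CMLA and CADD with their rotation operand, following the Arm definitions as understood here. It is not target code, and `Cplx`, `cmla` and `cadd` are hypothetical names. A full complex multiply-accumulate is a rotation-0 CMLA followed by a rotation-90 CMLA.

```cpp
#include <cassert>

struct Cplx {
  int Re;
  int Im;
};

// One complex element of CMLA (complex multiply-add with rotation).
// Rot is in degrees: 0, 90, 180 or 270.
Cplx cmla(Cplx Acc, Cplx N, Cplx M, int Rot) {
  switch (Rot) {
  case 0:   return {Acc.Re + N.Re * M.Re, Acc.Im + N.Re * M.Im};
  case 90:  return {Acc.Re - N.Im * M.Im, Acc.Im + N.Im * M.Re};
  case 180: return {Acc.Re - N.Re * M.Re, Acc.Im - N.Re * M.Im};
  case 270: return {Acc.Re + N.Im * M.Im, Acc.Im - N.Im * M.Re};
  }
  assert(false && "invalid rotation");
  return Acc;
}

// One complex element of CADD (complex add with rotation): adds M, rotated
// by 90 or 270 degrees, to N.
Cplx cadd(Cplx N, Cplx M, int Rot) {
  assert(Rot == 90 || Rot == 270);
  if (Rot == 90)
    return {N.Re - M.Im, N.Im + M.Re};
  return {N.Re + M.Im, N.Im - M.Re};
}
```

For example, chaining rotations 0 and 90 on `(1+2i)` and `(3+4i)` with a zero accumulator gives `-5+10i`, which is their complex product.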
Using ACLE intrinsics, it is possible to create a loop that the
deinterleaving pass incorrectly classified as a reduction loop, even
though the values fed back into the PHIs do not depend on the PHIs
themselves. For example, for fixed-width vectors the loop looked like this:
vector.body:
  %a = phi <4 x float> [ %init.a, %entry ], [ %updated.a, %vector.body ]
  %b = phi <4 x float> [ %init.b, %entry ], [ %updated.b, %vector.body ]
  ...
  ; Does not depend on %a or %b:
  %updated.a = ...
  %updated.b = ...
Differential Revision: https://reviews.llvm.org/D154598
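One way to express the property that the loop above lacks is a dependency check: a PHI only forms a reduction if the value it receives from the loop (here `%updated.a`) is computed from the PHI itself. The sketch assumes a hypothetical mini-IR in which each value name maps to its operand names; it illustrates the condition and is not necessarily how the fix was implemented.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical mini-IR: each value name maps to the names of its operands.
using Graph = std::unordered_map<std::string, std::vector<std::string>>;

// Returns true if From transitively uses To.
bool dependsOn(const Graph &G, const std::string &From, const std::string &To,
               std::unordered_set<std::string> &Visited) {
  if (From == To)
    return true;
  if (!Visited.insert(From).second)
    return false;
  auto It = G.find(From);
  if (It == G.end())
    return false;
  for (const std::string &Op : It->second)
    if (dependsOn(G, Op, To, Visited))
      return true;
  return false;
}

// A PHI only forms a reduction if the value it receives from the loop
// latch (Updated) is computed from the PHI itself.
bool isReductionPhi(const Graph &G, const std::string &Phi,
                    const std::string &Updated) {
  std::unordered_set<std::string> Visited;
  return dependsOn(G, Updated, Phi, Visited);
}
```

In the example, `%updated.a` does not use `%a`, so `isReductionPhi` would report that `%a` is not a reduction.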
This commit allows complex number intrinsics to be generated for expressions
involving constants or loop invariants, which are represented as splats.
For instance, after vectorizing loops in the following code snippets,
the ComplexDeinterleaving pass will be able to generate complex number
intrinsics:
```
complex<> x = ...;
for (int i = 0; i < N; ++i)
c[i] = a[i] * b[i] * x;
```
or
```
for (int i = 0; i < N; ++i)
c[i] = a[i] * b[i] * (11.0 + 3.0i);
```
Differential Revision: https://reviews.llvm.org/D153355
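To show what a complex splat looks like in the interleaved layout, the sketch below broadcasts one complex constant to every lane (`{Re, Im, Re, Im, ...}`) and multiplies lane-wise, mirroring `c[i] = a[i] * b[i] * (11.0 + 3.0i)`. `complexSplat` and `complexMul` are hypothetical helpers that illustrate the data layout, not the pass's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaved complex data: {re0, im0, re1, im1, ...}.
using Interleaved = std::vector<double>;

// Broadcast one complex value (Re, Im) across Lanes complex lanes, giving
// an interleaved splat {Re, Im, Re, Im, ...}.
Interleaved complexSplat(double Re, double Im, std::size_t Lanes) {
  Interleaved R;
  for (std::size_t I = 0; I < Lanes; ++I) {
    R.push_back(Re);
    R.push_back(Im);
  }
  return R;
}

// Lane-wise complex multiply of two interleaved vectors.
Interleaved complexMul(const Interleaved &A, const Interleaved &B) {
  assert(A.size() == B.size() && A.size() % 2 == 0);
  Interleaved R(A.size());
  for (std::size_t I = 0; I < A.size(); I += 2) {
    R[I] = A[I] * B[I] - A[I + 1] * B[I + 1];
    R[I + 1] = A[I] * B[I + 1] + A[I + 1] * B[I];
  }
  return R;
}
```

For instance, with `a = {1+2i, i}` and `b = {3+4i, 1}`, multiplying by the splat of `11+3i` gives `{-85+95i, -3+11i}`.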
Add a missing check ensuring that ComplexDeinterleaving only analyzes a
reduction when its Real and Imaginary Instructions have the same type.
Differential Revision: https://reviews.llvm.org/D153862
Adds the capability to recognize SelectInsts that appear in the IR.
These instructions are generated during scalable vectorization of reductions
when the code contains conditions inside the loop body, or when
"-prefer-predicate-over-epilogue=predicate-dont-vectorize" is set.
Differential Revision: https://reviews.llvm.org/D152558
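The following sketch shows, at source level, the kind of select these instructions come from: a conditional complex sum whose branch is if-converted into selects, with the same condition choosing both the real and the imaginary accumulator. The functions are hypothetical illustrations of the input shape, not the pass's matcher.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Scalar reference: only elements whose condition holds are accumulated.
Complex conditionalSum(const std::vector<Complex> &X,
                       const std::vector<bool> &Cond) {
  Complex Acc{0.0, 0.0};
  for (std::size_t I = 0; I < X.size(); ++I)
    if (Cond[I])
      Acc += X[I];
  return Acc;
}

// If-converted form: the branch becomes a select, and the same condition
// selects both the real and the imaginary part of the new accumulator.
Complex conditionalSumSelect(const std::vector<Complex> &X,
                             const std::vector<bool> &Cond) {
  double AccRe = 0.0, AccIm = 0.0;
  for (std::size_t I = 0; I < X.size(); ++I) {
    double SumRe = AccRe + X[I].real();
    double SumIm = AccIm + X[I].imag();
    AccRe = Cond[I] ? SumRe : AccRe; // select on the real part
    AccIm = Cond[I] ? SumIm : AccIm; // select on the imaginary part
  }
  return {AccRe, AccIm};
}
```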
This reverts commit ab09654832dba5cef8baa6400fdfd3e4d1495624.
Reason: Reapplying after removing an unnecessary default case from a switch statement, which caused the following error:
ComplexDeinterleavingPass.cpp:1849:3: error: default label in switch which covers all enumeration values
This reverts commit 116953b82130df1ebd817b3587b16154f659c013.
Adds the capability to recognize SelectInsts that appear in the IR.
These instructions are generated during scalable vectorization of reductions
when the code contains conditions inside the loop body, or when
"-prefer-predicate-over-epilogue=predicate-dont-vectorize" is set.
Differential Revision: https://reviews.llvm.org/D152558
This patch fixes:
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1790:3: error:
default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
This commit enhances the ComplexDeinterleaving pass to handle unordered
reductions in simple one-block vectorized loops, supporting both
SVE and Neon.
Differential Revision: https://reviews.llvm.org/D152022
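The sketch below illustrates why such a reduction has to be unordered. A vectorized loop keeps one accumulator per complex lane and combines the lanes only after the loop, which changes the order of the additions compared with the scalar loop. It assumes interleaved input and a hypothetical vectorization factor `VF`; `complexSumVectorized` is illustrative, not code from the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of complex values stored interleaved {re0, im0, re1, im1, ...},
// computed as a vectorized loop would: VF complex lanes accumulate
// independently, and the lanes are combined only after the loop. This
// reorders the additions, so it is only valid for a reduction that may be
// reordered (for example integer data, or floating point with fast-math).
void complexSumVectorized(const std::vector<double> &Data, std::size_t VF,
                          double &OutRe, double &OutIm) {
  assert(VF > 0 && Data.size() % (2 * VF) == 0);
  std::vector<double> Acc(2 * VF, 0.0); // interleaved per-lane accumulators
  for (std::size_t I = 0; I < Data.size(); I += 2 * VF)
    for (std::size_t L = 0; L < 2 * VF; ++L)
      Acc[L] += Data[I + L];
  OutRe = OutIm = 0.0;
  for (std::size_t L = 0; L < VF; ++L) { // final horizontal reduction
    OutRe += Acc[2 * L];
    OutIm += Acc[2 * L + 1];
  }
}
```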
Code generated with -Ofast and with -O3 -ffp-contract=fast (adding
-ffinite-math-only to enable vectorization) can differ significantly.
Code compiled with -O3 can be deinterleaved using patterns because the
instruction order is preserved. With -Ofast, however, the computation
sequence can be reordered in many ways, and the real and imaginary parts
may not even be calculated in parallel.
For more details, refer to
llvm/test/CodeGen/AArch64/complex-deinterleaving-*-fast.ll and
llvm/test/CodeGen/AArch64/complex-deinterleaving-*-contract.ll tests.
This patch implements a more general approach that can handle most
-Ofast cases.
Differential Revision: https://reviews.llvm.org/D148558
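The core difficulty with -Ofast is that the same real part, for example `a.re*b.re - a.im*b.im`, may appear with its terms commuted or reassociated. A minimal sketch of order-insensitive matching represents the expression as a list of signed product terms and compares canonicalized (sorted) forms. The `Term` representation and functions are hypothetical illustrations of the idea, not the algorithm used in D148558.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// A signed product term such as -(a.im * b.im). Operands are named by
// strings in this hypothetical representation.
struct Term {
  bool Negated;
  std::string LHS, RHS;
};

// Canonical form: order the operands inside each product, then sort the
// terms, so that commuted or reassociated sums compare equal.
std::vector<Term> canonicalize(std::vector<Term> Terms) {
  for (Term &T : Terms)
    if (T.RHS < T.LHS)
      std::swap(T.LHS, T.RHS);
  auto Key = [](const Term &T) { return std::tie(T.Negated, T.LHS, T.RHS); };
  std::sort(Terms.begin(), Terms.end(),
            [&](const Term &A, const Term &B) { return Key(A) < Key(B); });
  return Terms;
}

// True if both lists describe the same sum of signed products.
bool sameSum(const std::vector<Term> &A, const std::vector<Term> &B) {
  std::vector<Term> CA = canonicalize(A), CB = canonicalize(B);
  return std::equal(CA.begin(), CA.end(), CB.begin(), CB.end(),
                    [](const Term &X, const Term &Y) {
                      return X.Negated == Y.Negated && X.LHS == Y.LHS &&
                             X.RHS == Y.RHS;
                    });
}
```

Under this model, `-(b.im*a.im) + b.re*a.re` is recognised as the same real part as `a.re*b.re - a.im*b.im`.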
This patch updates several functions in LLVM's IR generation code to accept
an IRBuilder object as an argument, rather than an Instruction that indicates
the insertion point for new instructions.
This change is necessary to handle sophisticated -Ofast optimization cases
from D148558 where it's unclear which instructions should be used as the
insertion point for new operations.
Differential Revision: https://reviews.llvm.org/D148703
Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from
`ComplexDeinterleavingPass.h` and move it to the corresponding source
file.
Add the missing includes, previously pulled in transitively through this
header, to 3 other source files.
This reduces the total number of preprocessing tokens across the LLVM
source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a
reduction of ~1.52%. This should result in a small improvement in
compilation time.
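For reference, the quoted percentage follows from the two token counts:

```latex
\frac{1{,}964{,}876{,}961 - 1{,}935{,}091{,}611}{1{,}964{,}876{,}961}
  = \frac{29{,}785{,}350}{1{,}964{,}876{,}961} \approx 0.01516 \approx 1.52\%
```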
Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from
`ComplexDeinterleavingPass.h` and move it to the corresponding source
file.
Add the missing includes, previously pulled in transitively through this
header, to 2 other source files.
This reduces the total number of preprocessing tokens across the LLVM
source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a
reduction of ~1.52%. This should result in a small improvement in
compilation time.
Differential Revision: https://reviews.llvm.org/D150514
This commit adds support for scalable vector types in the ComplexDeinterleaving
pass, allowing it to recognize and handle `llvm.vector.interleave2` and
`llvm.vector.deinterleave2` intrinsics for both fixed and scalable vectors.
Differential Revision: https://reviews.llvm.org/D147451
With this patch, ComplexDeinterleavingPass now has the ability to handle
any number of interconnected operations involving complex numbers.
For example, the patch enables the processing of code like the following:
for (int i = 0; i < 1000; ++i) {
  a[i] = w[i] * v[i];
  b[i] = w[i] * u[i];
}
This code has multiple arrays containing complex numbers and a common
subexpression `w` that appears in two expressions.
Differential Revision: https://reviews.llvm.org/D146988
This change initializes the TSI, LI, DT, PSI, and ORE pointer fields of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
This is a simple patch to make sure fast math flags are propagated through to
the newly created symmetric operations, which can help with later
simplifications.
Differential Revision: https://reviews.llvm.org/D146409
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.
Differential Revision: https://reviews.llvm.org/D114174
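The equivalence this pass relies on can be illustrated without LLVM. Before the pass, vectorized code deinterleaves interleaved complex data into real and imaginary halves, computes on them, and interleaves the result again. Conceptually, the pass replaces that with one complex operation on the interleaved data, which a target can lower to a native complex instruction. Both functions below are hypothetical illustrations; the second uses `std::complex` only as a stand-in for a target-specific complex multiply.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Before the pass: deinterleave into real/imaginary halves, compute the
// complex product on the halves, then interleave the result again.
std::vector<double> mulViaDeinterleave(const std::vector<double> &A,
                                       const std::vector<double> &B) {
  const std::size_t N = A.size() / 2;
  std::vector<double> ARe(N), AIm(N), BRe(N), BIm(N);
  for (std::size_t I = 0; I < N; ++I) { // deinterleave
    ARe[I] = A[2 * I];
    AIm[I] = A[2 * I + 1];
    BRe[I] = B[2 * I];
    BIm[I] = B[2 * I + 1];
  }
  std::vector<double> Out(2 * N);
  for (std::size_t I = 0; I < N; ++I) { // compute and re-interleave
    Out[2 * I] = ARe[I] * BRe[I] - AIm[I] * BIm[I];
    Out[2 * I + 1] = ARe[I] * BIm[I] + AIm[I] * BRe[I];
  }
  return Out;
}

// Conceptually after the pass: one complex multiply per lane, operating
// directly on the interleaved layout.
std::vector<double> mulDirect(const std::vector<double> &A,
                              const std::vector<double> &B) {
  std::vector<double> Out(A.size());
  for (std::size_t I = 0; I + 1 < A.size(); I += 2) {
    std::complex<double> P = std::complex<double>(A[I], A[I + 1]) *
                             std::complex<double>(B[I], B[I + 1]);
    Out[I] = P.real();
    Out[I + 1] = P.imag();
  }
  return Out;
}
```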