The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
The final reduction nodes should not be reordered, the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, reduction cost may compensate small tree cost.
Part of D111574
Differential Revision: https://reviews.llvm.org/D112467
This function is called when some predecessor of an empty return block
ends with a conditional branch, with both successors being empty ret blocks.
Now, because of the way SimplifyCFG works, it might happen to simplify
one of the blocks in a way that makes a conditional branch
into an unconditional one, since it's destinations are now identical,
but it might not have actually simplified said conditional branch
into an unconditional one yet.
So, we have to check that ourselves first,
especially now that SimplifyCFG aggressively tail-merges
all ret and resume blocks.
Even if it was an unconditional branch already,
`SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()`
by default.
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104597
This is a patch that disables the poison-unsafe select -> and/or i1 folding.
It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.
Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.
I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101191
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94061
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93793
This patch enables canonicalization of SPF_ABS and SPF_ABS
to the abs intrinsic.
This is a recommit, the original try was
05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.
Differential Revision: https://reviews.llvm.org/D87188
Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
This reverts commit 05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8.
mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.
To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.
It's pretty likely that this will uncover places that missing
handling for the abs() intrinsic. Please report any seen performance
regressions.
Differential Revision: https://reviews.llvm.org/D87188
binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0
This is a larger pattern than the existing extractelement folds because we can't
reasonably vectorize the sub-patterns with constants based on cost model calcs
(it doesn't usually make sense to replace a single extracted scalar op with
constant operand with a vector op).
I salvaged as much of the existing logic as I could, but there might be better
ways to share and reduce code.
The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that
particular pattern in D59710, but that caused crashing and regressions.
This patch is more general, but hopefully safer.
The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5
Differential Revision: https://reviews.llvm.org/D82474
This is the integer sibling to D81491.
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
Removing the "experimental" from these intrinsics is likely
not too far away.
This saves creating/destroying a builder every time we
perform some transform.
The tests show instruction ordering diffs resulting from
always inserting at the root instruction now, but those
should be benign.
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.
Differential Revision: https://reviews.llvm.org/D81491
Motivating examples are seen in the PhaseOrdering tests based on:
https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have
intrinsics there, some pass can fold them.
The intrinsics are still named "experimental" at this point, but
if there is no fallout from this patch, that will be a good
indicator that it is safe to finalize them.
Differential Revision: https://reviews.llvm.org/D80867
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.
SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.
This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.
In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.
I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".
Differential Revision: https://reviews.llvm.org/D80236
This is split off from D79799 - where I was proposing to fully iterate
over a function until there are no more transforms. I suspect we are
still going to want to do something like that eventually.
But we can achieve the same gains much more efficiently on the current
set of regression tests just by reversing the order that we visit the
instructions.
This may also reduce the motivation for D79078, but we are still not
getting the optimal pattern for a reduction.