InstCombine is supposed to be a superset of InstSimplify, but we
were not attempting simplification of insertvalue instructions.
As the test change illustrates, we failed to remove some aggregate
construction patterns because of that.
shuffle (fabs X), Mask --> fabs (shuffle X, Mask)
shuffle (fabs X), (fabs Y), Mask --> fabs (shuf X, Y, Mask)
https://alive2.llvm.org/ce/z/JH2nkf
This generalizes the existing fneg transforms to also work with fabs.
A likely follow-up would generalize this further to move any unary
intrinsic op.
This puts lower insert indexes before higher. This is independent
of endian, so it requires an adjustment to a fold added with
4446f71ce392, but it makes that fold more robust.
That's also where this patch was suggested - D139668.
This matches what we already do in DAGCombiner, but there is one
more constraint because there's an existing canonicalization for
insert-of-scalar-constant. I'm not sure if that is still needed,
so it may be adjusted/removed as a follow-up.
This transform was added with 4446f71ce392. However, as noted in
the post-commit feedback, the transform is not safe with an
arbitrary base vector because we may leak poison from a narrow
element into an adjacent element when bitcasting.
I made the least invasive code change in case we do figure out
a way to make this safe.
This replaces patches that tried to convert related patterns to shuffles
(D138872, D138873, D138874 - reverted/abandoned) but caused codegen
problems and were questionable as a canonicalization because an
insertelement is a simpler op than a shuffle.
This detects a larger pattern -- insert-of-insert -- and replaces with
another insert, so this hopefully does not cause any problems.
As noted by TODO items in the code and tests, this could go a lot further.
But this is enough to reduce the motivating test from issue #17113.
Example proofs:
https://alive2.llvm.org/ce/z/NnUv3a
I drafted a version of this for AggressiveInstCombine, but it seems that
would uncover yet another phase ordering gap. If we do generalize this to
handle the full range of potential patterns, that may be worth looking at
again.
Differential Revision: https://reviews.llvm.org/D139668
This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461.
As discussed in the planned follow-on to this patch (D138874),
this and the subsequent patches in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
This reverts commit b7c7fe3d0779b6e332fe6db64e87561deba2e56a.
As discussed in the planned follow-on to this patch (D138874),
this and the previous patch in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
This reverts commit dd8d0d21ce6d0665ef5d426372096aaed85b479a.
As discussed in the planned follow-on to this patch (D138874),
this and the previous patch in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This enhances the base fold from part 1 to allow mapping a
right-shift to an insert index.
Example of translating a middle chunk of the scalar to vector
for either endian:
https://alive2.llvm.org/ce/z/fRXCOZ
This only allows creating an identity shuffle (with optional
shortening/lengthening) because that is considered the safe
baseline for any target (can be inverted if needed). If we
tried this fold with target-specific costs/legality, then we
could do the transform more generally.
Differential Revision: https://reviews.llvm.org/D138873
As noted in issue #59266, the logic reduction could be
beyond the capabilities of an optimizing compiler, and
the code with ternary op is easier to read either way.
The first attempt was reverted because a clang test changed
unexpectedly - the file is already marked with a FIXME, so
I just updated it this time to pass.
Original commit message:
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue
The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.
In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask
Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC
The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.
Differential Revision: https://reviews.llvm.org/D138872
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue
The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.
In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask
Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC
The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.
Differential Revision: https://reviews.llvm.org/D138872
We were checking for a desirable integer type even when there
is no shift in the transform. This is unnecessary since we
are truncating directly to the destination type.
This removes an extractelt in more cases and seems to make the
canonicalization more uniform overall. There's still a potential
difference between patterns that need a shift vs. trunc-only.
I'm not sure if that is worth keeping at this point, but it can
be adjusted in another step (assuming this change does not cause
trouble).
In the most basic case where I noticed this, we missed a fold
that would have completely removed vector ops from a pattern
like:
https://alive2.llvm.org/ce/z/y4Qdte
An extractelt with a constant index which extracts an element from the
two vector operands of a select can be directly folded into a select.
extractelt (select %x, %vec1, %vec2), %const ->
select %x, %vec1[%const], %vec2[%const]
Note: the implementation currently only works for constant vector operands.
Reviewed By: foad, spatel
Differential Revision: https://reviews.llvm.org/D137934
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp
Differential Revision: https://reviews.llvm.org/D135876
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp
Differential Revision: https://reviews.llvm.org/D135876
Bug introduced in e239198cdbbf.
The assert() is making an assumption that the resulting shuffle mask
will always select elements from both vectors, this is untrue in the
case of two shuffles being folded if the former shuffle has a mask with
undef elements in it. In such a case folding the shuffles might result
in a mask which only selects from one of the vectors because the other
elements (in the mask) are undef.
Differential Revision: https://reviews.llvm.org/D136256
When Index is variable but still trivially known to be equal we can use Value
from before the insertion, possibly eliminating the vector.
Reverts a functional change from:
Author: Philip Reames <listmail@philipreames.com>
Date: Wed Dec 8 12:21:10 2021 -0800
[instcombine] A couple style tweaks to visitExtractElementInst [nfc]
Thanks to Michele Scandale for identifying the bug
Differential Revision: https://reviews.llvm.org/D135625
We don't combine generic shuffles together in IR, but select
shuffles are a special-case because a select shuffle of a
select shuffle is just another select shuffle; codegen is
expected to efficiently lower those (select shuffles are also
the canonical form of a vector select with constant condition).
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This extends the transform added with 0353c2c996c5.
If the shuffle reduces vector length, the transform
reduces the width of the cast, so that should be a
win for most codegen (if not, it can be inverted).
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This extends the transform added with 0353c2c996c5.
If the casts are to a larger element type, the transform
reduces shuffle bit width, so that should be a win for
most codegen (if not, it can be inverted).
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This is similar to a recent transform with fneg ( b331a7ebc1e0 ),
but this is intentionally the most conservative first step to
try to avoid regressions in codegen. There are several
restrictions that could be removed as follow-up enhancements.
Note that a cast with a unary shuffle is currently canonicalized
in the other direction (shuffle after cast - D103038 ). We might
want to invert that to be consistent with this patch.
For the unary shuffle pattern, this is opposite to what we try
to do with binops, but it seems better to keep it consistent
with the motivating binary shuffle pattern. On that, it is
clearly better on the usual no-extra uses case.
There is a chance that this will pull an fneg away from some
other binop and cause a regression in codegen, but that should
be invertible in the backend. The transform is birectional:
https://alive2.llvm.org/ce/z/kKaKCUhttps://alive2.llvm.org/ce/z/3DesfwFixes#45631
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.
We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.
Fixes#54364
Differential Revision: https://reviews.llvm.org/D122390
The transform can require an optional shuffle instruction to be sound,
so we need to use Builder to create all values and then replace the
original instruction with whatever that final value is.
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
This reorders existing transforms to put demanded elements last. The reasoning here is that when we have an example which can be scalarized or handled via demanded bits, we should prefer scalarization as that doesn't require dropping flags on arithmetic instructions.
This doesn't show major changes in the tests today, but once I add support for fast math flags to dropPoisonGeneratingFlags this becomes glaringly obvious.
Differential Revision: https://reviews.llvm.org/D115394
In D71220 a pattern was added to replace shuffle's insertelement operand
if inserted scalar is not demanded. The pattern was added only for
the case where the shuffle's mask size is equal to element's vector size.
However, that condition is not required because the pattern does not
change the shuffle vector size.
This patch extends the pattern to also include cases where shuffle's mask
size is not equal to element's vector size.
Differential Revision: https://reviews.llvm.org/D112318
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )
Differential Revision: https://reviews.llvm.org/D111891