This fixes a miscompile in #180005 where we didn't check that the first
incoming value isn't poison.
We should use the first non-poison incoming value if it exists, or just
poison if all the incoming values are poison.
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.
Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T
The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
Add a new VPInstruction opcode to compute the exiting value of an
induction variable after vectorization. This replaces the pattern of
extracting the last lane from the last part of the induction backedge
value when applicable.
This allows us to always use the pre-computed IV end value. It will also
allow unifying end value creation for both induction resume and exit
values.
PR: https://github.com/llvm/llvm-project/pull/175651
We have an optimization in VPPredicator when creating blends where if
all the incoming values are the same, we just return that value.
This extends it to handle cases like "phi [%x, %x, poison, %x]" by
ignoring poison values.
This is split off from #176143 to prevent regressions when maintaining
SSA by adding PHIs with a poison incoming value.
When converting phis to blends, the `VPPredicator` expects to have edge
masks to the phi node if the phi node has different incoming blocks.
This was not the case if the predecessor of the phi was a switch where a
conditional destination was the same as the default destination.
This was because when creating edge masks in `createSwitchEdgeMasks`,
edge masks are set in a loop through the *non-default* destinations. But
when there are no non-default destinations (but at least one condition,
otherwise an earlier condition would trigger and just forward the source
mask), this loop is never executed, so the masks are never set.
To resolve this, we explicitly forward the source mask for these cases
as well, which is correct because it is an unconditional branch, just a
very convoluted one.
fixes#179074
We can just directly use the incoming value. These single value blends
would get optimized later on in simplifyBlends, but by doing it early it
removes the notion of an "immediately normalized" blend, and simplifies
an upcoming patch.
Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.
The patch should be NFC modulo printing changes.
PR: https://github.com/llvm/llvm-project/pull/164124
Following up from #150368, this moves folding common edge masks into
simplifyBlends.
One test in uniform-blend.ll ended up regressing but after looking at it
closely, it came from a weird (x && !x) edge mask. So I've just included
a simplifcation in this PR to fold that to false.
If a phi is widened with tail folding, all of its predecessors will have
a mask of the form
%x = logical-and %active-lane-mask, %foo
%y = logical-and %active-lane-mask, %bar
%z = logical-and %active-lane-mask, %baz
...
We can remove the common %active-lane-mask from all of these edge masks,
which in turn allows us to simplify a lot of VPBlendRecipes.
In particular, it allows the header mask to be removed in selects with
EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for
525.x264_r, and supersedes #147243.
This also allows us to remove VPBlendRecipe and directly emit
VPInstruction::Select in another patch.
Add 3 new iterator ranges to VPPhiAccessors
* incoming_values(): returns a range over the incoming
values of a phi
* incoming_blocks(): returns a range over the incoming
blocks of a phi
* incoming_values_and_blocks: returns a range over pairs of
incoming values and blocks.
Depends on https://github.com/llvm/llvm-project/pull/124838.
PR: https://github.com/llvm/llvm-project/pull/138472
The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.
PR: https://github.com/llvm/llvm-project/pull/150847
This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f.
The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends nu collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.
Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator
are all incompatible with modeling loops in VPlans as region without
explicit back-edges.
Those pieces are not actively used and only exercised by a few gtest
unit tests. They are at the moment blocking progress towards unifying
the native and inner-loop vectorizer paths in D121624 and D123005.
I think we should not block forward progress on unused pieces of code,
so this patch removes the utilities for now. The plan is to re-introduce
them as needed in a way that is compatible with the unified VPlan scheme
used in both the inner loop vectorizer and the native path.
Reviewed By: sguggill
Differential Revision: https://reviews.llvm.org/D123017
In LLVM's common loop terminology, an exit block is a block outside a
loop with a predecessor inside the loop. An exiting block is a block
inside the loop which branches to an exit block outside the loop.
This patch updates a few places where VPlan was using ExitBlock for a
block exiting a region. Those instances have been updated to use
ExitingBlock.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D126173
VPlan-native path
Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.
Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.
Reviewers: fhahn, rengolin, hsaito, dcaballe
Reviewed By: fhahn
Patch by Satish Guggilla, thanks!
Differential Revision: https://reviews.llvm.org/D53349
llvm-svn: 351990