This patch fixes a bug in `DependencyGraph::extend()` when the old
interval contains no memory instructions. When this is the case we
should do a full dependency scan of the new interval.
When the operand comes from multiple inputs then we need additional
packing code. When the operands are scalar then we can use a single
InsertElementInst. But when the operands are vectors then we need a
chain of ExtractElementInst and InsertElementInst instructions to insert
the vector value into the destination vector. This is what this patch
implements.
This patch fixes a bug in the creation of shuffle masks when vectorizing
vectors in case of a diamond reuse with shuffle. The mask needs to
enumerate all elements of a vector, not treat the original vector value
as a single element. That is: if vectorizing two <2 x float> vectors
into a <4 x float> the mask needs to have 4 indices, not just 2.
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.
It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. It's job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.
Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
This patch implements the vectorizer's callback for getting notified
about new instructions being created. This updates the scheduler state,
which may involve removing dependent instructions from the ready list
and update the "scheduled" flag.
Since we need to remove elements from the ready list, this patch also
implements the `remove()` operation.
This patch fixes a bug in the dependency node iterators that would
incorrectly not skip nodes that are not in the current DAG. This
resulted in iterators returning nullptr when dereferenced.
The fix is to update the existing "skip" function to not only skip
non-instruction values but also to skip instructions not in the DAG.
SelectInsts need special treatment because they are not always
straightforward to vectorize. This patch disables vectorization unless
they are trivially vectorizable.
`sandboxir::Context` is defined at a pass-level scope with the
`SandboxVectorizerPass` class because the function pass manager `FPM`
object depends on it, and that is in pass-level scope to avoid
recreating the pass pipeline every single time `runOnFunction()` is
called.
This means that the Context's state lives on across function passes. The
problem is twofold:
(i) the LLVM IR to Sandbox IR map can grow very large including objects
from different functions, which is of no use to the vectorizer, as it's
a function-level pass.
(ii) this can result in stale data in the LLVM IR to Sandbox IR object
map, as other passes may delete LLVM IR objects.
To fix both issues this patch introduces a `Context::clear()` function
that clears the `LLVMValueToValueMap`.
This patch removes the assertion that checks for an existing function.
If one exists it will remove it and create a new one. This helps remove
a crash when a function declaration object already exists and we are
about to create a SandboxIR object for the definition.
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
This patch implements a wrapper function for the LLVM IR verifier for
functions, and calls it (flag-guarded) within the bottom-up-vectorizer
for finding IR bugs as soon as they happen.
This patch implements cost modeling for Region. All instructions that
are added or removed get their cost counted in the Scoreboard. This is
used for checking if the region before or after a transformation is more
profitable.
This patch fixes a bug in the maintenance of the MemDGNode chain of the
DAG. Whenever we move a memory instruction, the DAG gets notified about
the move and maintains the chain of memory nodes. The bug was that if
the destination of the move was not a memory instruction, then the
memory node's next node would end up pointing to itself.
Crossing BBs is not currently supported by the structures of the
vectorizer. This patch fixes instances where this was happening,
including:
- a walk of use-def operands that updates the UnscheduledSuccs counter,
- the dead instruction removal is now done per BB,
- the scheduler, which will reject bundles that cross BBs.
VecUtils::getLowest(Valse) returns the lowest instruction in the BB among Vals.
If the instructions are not in the same BB, or if none of them is an
instruction it returns nullptr.
This patch implements the diamond pattern where we are vectorizing
toward the top of the diamond from both edges, but the second edge may
use elements from a different vector or just scalar values. This
requires some additional packing code (see lit test).
InstrMaps is a helper data structure that maps scalars to vectors and
the reverse. This is used by the vectorizer to figure out which vectors
it can extract scalar values from.
When we vectorize loads or stores we only keep the address of the first
lane. The rest may become dead. This patch adds the address operands of
vectorized loads or stores to the dead candidates set, such that they
get erased if dead.
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
The DAG maintains a chain of MemDGNodes that links together all the
nodes that may touch memroy.
Whenever a new instruction gets created we need to make sure that this
chain gets updated. If the new instruction touches memory then its
corresponding MemDGNode should be inserted into the chain.
Load/Store isSimple is a necessary condition for VectorSeeds, but not
sufficient, so reverse the condition and return value, and continue the
check. Add relevant tests.
This patch adds the callback registration logic in the DAG's constructor
and the corresponding deregistration logic in the destructor. It also
implements the code that makes sure that SchedBundle and DGNodes can be
safely destroyed in any order.
Up until now we could only support packing of scalar elements. This
patch fixes this by implementing packing of vector elements, by
generating extractelement and insertelement instruction pairs.
This patch implements packing of scalar operands when the vectorizer
decides to stop vectorizing. Packing is implemented with a sequence of
InsertElement instructions.
Packing vectors requires different instructions so it's implemented in a
follow-up patch.
This patch adds support for re-scheduling already scheduled
instructions. For now this will clear and rebuild the DAG, and will
reschedule the code using the new DAG.