When an operand comes from multiple inputs, we need additional packing
code. Scalar inputs need only a single InsertElementInst each. Vector
inputs need a chain of ExtractElementInst and InsertElementInst
instructions that inserts each element of the vector value into the
destination vector. This is what this patch implements.
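As a rough illustration of the instruction sequence described above, here
is a minimal Python sketch that models vectors as lists. It does not use
the LLVM or Sandbox IR API; `pack_operands` and the instruction tuples it
logs are hypothetical names chosen for this example only.
```python
def pack_operands(operands):
    """Return (packed_vector, emitted_instructions) for a list of operands.

    Scalars are modelled as plain values, vector operands as Python lists.
    """
    packed = []   # stands in for the destination vector, filled lane by lane
    emitted = []  # log of the instructions a real implementation would create
    for op in operands:
        if isinstance(op, list):
            # Vector operand: one extract/insert pair per source lane.
            for src_lane, elem in enumerate(op):
                emitted.append(("extractelement", src_lane))
                emitted.append(("insertelement", len(packed)))
                packed.append(elem)
        else:
            # Scalar operand: a single insert is enough.
            emitted.append(("insertelement", len(packed)))
            packed.append(op)
    return packed, emitted
```
Packing a scalar followed by a 2-lane vector therefore logs one insert
for the scalar and two extract/insert pairs for the vector.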
This patch fixes a bug in the creation of shuffle masks when vectorizing
vector values in the case of a diamond reuse with a shuffle. The mask
needs to enumerate all elements of each vector, not treat the original
vector value as a single element. For example, when vectorizing two
<2 x float> vectors into a <4 x float>, the mask needs 4 indices, not 2.
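The difference between per-value and per-element indices can be sketched
as follows. This is a simplified Python model, not the vectorizer's code:
`shuffle_mask`, its parameters, and the assumption that the reuse is a
reordering of already vectorized values are all illustrative.
```python
def shuffle_mask(vectorized_order, requested_order, lanes_of):
    """Build a shuffle mask with one index per element.

    vectorized_order: values in the order they sit in the existing vector.
    requested_order:  values in the order the new user wants them.
    lanes_of:         lane count of each value (1 for a scalar).
    """
    # Starting lane of every value inside the existing wide vector.
    start = {}
    lane = 0
    for value in vectorized_order:
        start[value] = lane
        lane += lanes_of[value]
    # Expand every requested value into all of its lanes.
    mask = []
    for value in requested_order:
        mask.extend(range(start[value], start[value] + lanes_of[value]))
    return mask
```
Swapping two <2 x float> values yields the 4-index mask `[2, 3, 0, 1]`,
whereas treating each vector as a single element would give only `[1, 0]`.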
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.
It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. Its job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.
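The accept-or-revert decision can be pictured with the small Python model
below. The `Transaction` class, the snapshot-based revert, and the rule
"accept only if the cost went down" are assumptions made for illustration;
the actual pass and its cost threshold may differ.
```python
import copy


class Transaction:
    """Toy transaction over a list that stands in for the IR."""

    def __init__(self, ir):
        self.ir = ir
        self.snapshot = None

    def begin(self):
        # Remember the original state so it can be restored later.
        self.snapshot = copy.deepcopy(self.ir)

    def accept(self):
        # Keep the current IR and drop the saved state.
        self.snapshot = None

    def revert(self):
        # Restore the IR in place to the state saved by begin().
        self.ir[:] = self.snapshot
        self.snapshot = None


def accept_or_revert(tx, cost_before, cost_after):
    """Accept the changes if they are cheaper (assumed rule), else revert."""
    if cost_after < cost_before:
        tx.accept()
        return True
    tx.revert()
    return False
```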
Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
This patch implements the vectorizer's callback for getting notified
about new instructions being created. This updates the scheduler state,
which may involve removing dependent instructions from the ready list
and updating the "scheduled" flag.
Since we need to remove elements from the ready list, this patch also
implements the `remove()` operation.
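A minimal Python sketch of such a callback follows. The `Node` and
`ReadyList` classes, the integer positions, and the rule that instructions
created below the top of the schedule are marked scheduled while those
above it un-ready their predecessors are assumptions for illustration,
not a description of the actual scheduler.
```python
class Node:
    def __init__(self, name, position, preds=()):
        self.name = name
        self.position = position    # index in the block; larger means lower
        self.preds = list(preds)    # nodes this instruction depends on
        self.scheduled = False
        self.unscheduled_succs = 0


class ReadyList:
    def __init__(self):
        self._nodes = []

    def insert(self, node):
        self._nodes.append(node)

    def remove(self, node):
        # Removing a node that is not in the list is a no-op.
        if node in self._nodes:
            self._nodes.remove(node)

    def __contains__(self, node):
        return node in self._nodes


def notify_create_instr(node, top_of_schedule, ready):
    """Update scheduler state for a newly created instruction."""
    if top_of_schedule is not None and node.position > top_of_schedule:
        # Created inside the already scheduled region.
        node.scheduled = True
        return
    # Created above the schedule: its predecessors gain an unscheduled
    # successor, so they can no longer be considered ready.
    for pred in node.preds:
        ready.remove(pred)
        pred.unscheduled_succs += 1
```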
SelectInsts need special treatment because they are not always
straightforward to vectorize. This patch disables vectorization of
SelectInsts unless they are trivially vectorizable.
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
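The filtering behaviour can be sketched in Python as below. The `Instr`
record, the integer `position` field, and returning `None` when no value
lies in the block are assumptions made for this example; they are not the
real `VecUtils` interface.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    block: str
    position: int  # index within its block; larger means lower


def get_lowest(vals, bb):
    """Return the lowest instruction of `vals` in `bb`, ignoring other BBs."""
    in_bb = [v for v in vals if v.block == bb]
    if not in_bb:
        return None  # assumed behaviour when nothing is in `bb`
    return max(in_bb, key=lambda v: v.position)
```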
Crossing BBs is not currently supported by the structures of the
vectorizer. This patch fixes instances where this was happening,
including:
- the walk of use-def operands that updates the UnscheduledSuccs
  counter,
- dead instruction removal, which is now done per BB,
- the scheduler, which now rejects bundles that cross BBs.
This patch implements the diamond pattern where we are vectorizing
toward the top of the diamond from both edges, but the second edge may
use elements from a different vector or just scalar values. This
requires some additional packing code (see lit test).
InstrMaps is a helper data structure that maps scalars to vectors and
the reverse. This is used by the vectorizer to figure out which vectors
it can extract scalar values from.
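A bidirectional map of this kind might look like the Python sketch below.
The method names and the choice to record the lane alongside the vector
are hypothetical and only illustrate the idea, not the real InstrMaps
interface.
```python
class InstrMaps:
    """Toy two-way map between scalars and the vector that replaced them."""

    def __init__(self):
        self._scalar_to_vec = {}   # scalar -> (vector, lane)
        self._vec_to_scalars = {}  # vector -> [scalar per lane]

    def register_vector(self, vec, scalars):
        self._vec_to_scalars[vec] = list(scalars)
        for lane, scalar in enumerate(scalars):
            self._scalar_to_vec[scalar] = (vec, lane)

    def get_vector_for(self, scalar):
        # Returns (vector, lane), or None if the scalar was never vectorized.
        return self._scalar_to_vec.get(scalar)

    def get_scalars_for(self, vec):
        return self._vec_to_scalars.get(vec, [])
```
Knowing the lane tells a later user which element to extract from the
vector instead of recomputing the scalar.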
When we vectorize loads or stores we only keep the address of the first
lane. The rest may become dead. This patch adds the address operands of
vectorized loads or stores to the dead candidates set, such that they
get erased if dead.
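The effect can be modelled with a few lines of Python. Instructions are
plain dictionaries here, and `vectorize_loads` and `erase_dead` are
hypothetical helpers written for this sketch rather than vectorizer
functions.
```python
def vectorize_loads(loads, dead_candidates):
    """Replace scalar loads with one vector load using lane 0's address."""
    for load in loads:
        # Every lane's address may become dead, so record all of them.
        dead_candidates.add(load["addr"])
    return {"op": "vload", "addr": loads[0]["addr"]}


def erase_dead(dead_candidates, remaining_instrs):
    """Return the candidate addresses that no remaining instruction uses."""
    used = {instr["addr"] for instr in remaining_instrs}
    return sorted(addr for addr in dead_candidates if addr not in used)
```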
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
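One possible slicing scheme is sketched below in Python. Taking the
largest power-of-two slice that fits, and dropping a leftover single
seed, are assumptions for illustration; the commit message does not
specify how the slices are sized.
```python
def slice_seeds(seeds, max_lanes):
    """Split a seed bundle into consecutive power-of-two slices.

    Assumes max_lanes >= 2. A single leftover seed is not returned because
    one instruction alone cannot start vectorization.
    """
    slices = []
    i = 0
    while len(seeds) - i >= 2:
        remaining = len(seeds) - i
        # Largest power of two that fits both the remainder and the limit.
        width = 1
        while width * 2 <= min(remaining, max_lanes):
            width *= 2
        slices.append(seeds[i:i + width])
        i += width
    return slices
```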
Up until now we could only support packing of scalar elements. This
patch lifts that limitation by implementing packing of vector elements,
generating pairs of extractelement and insertelement instructions.
This patch implements packing of scalar operands when the vectorizer
decides to stop vectorizing. Packing is implemented with a sequence of
InsertElement instructions.
Packing vectors requires different instructions so it's implemented in a
follow-up patch.
This patch adds support for re-scheduling already scheduled
instructions. For now this will clear and rebuild the DAG, and will
reschedule the code using the new DAG.
This patch registers the "createInstr" callback that notifies the
scheduler about newly created instructions. This guarantees that all
newly created instructions have a corresponding DAG node associated with
them. Without this the pass crashes when the scheduler encounters the
newly created vector instructions.
This patch also changes the lifetime of the sandboxir Ctx variable in
the SandboxVectorizer pass. It needs to be destroyed after the passes
get destroyed. Without this change, Ctx will already have been freed by
the time components like the Scheduler get destroyed, which is not legal.
My previous attempt (#111904) hacked creation of Regions from metadata
into the bottom-up vectorizer. I got some feedback that it should be its
own pass. So now we have two SandboxIR function passes (`BottomUpVec`
and `RegionsFromMetadata`) that are interchangeable, and we could have
other SandboxIR function passes doing other kinds of transforms, so this
commit revamps pipeline creation and parsing.
First, `sandboxir::PassManager::setPassPipeline` now accepts pass
arguments in angle brackets. Pass arguments are arbitrary strings that
must be parsed by each pass. The only requirement is that nested angle
bracket pairs must be balanced, to allow for nested pipelines with more
arguments. For example:
```
bottom-up-vec<region-pass-1,region-pass-2<arg>,region-pass-3>
```
This has complicated the parser a little bit (the loop over pipeline
characters now contains a small state machine), and we now have some new
test cases to exercise the new features.
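To give a feel for the kind of state machine involved, here is a small
Python sketch of a parser for this syntax. It is not the code from
`setPassPipeline`; `parse_pipeline` and its error handling are
illustrative assumptions.
```python
def parse_pipeline(pipeline):
    """Split a pipeline string into (pass_name, args) pairs.

    Arguments are returned as raw strings so that each pass can parse
    them itself, possibly by calling parse_pipeline again.
    """
    passes = []
    name, args = [], []
    depth = 0          # current nesting level of angle brackets
    seen_args = False  # True once the current pass has had its <...>

    def flush():
        pass_name = "".join(name)
        if not pass_name:
            raise ValueError("empty pass name")
        passes.append((pass_name, "".join(args)))
        name.clear()
        args.clear()

    for ch in pipeline:
        if depth == 0:
            if ch == ",":
                flush()
                seen_args = False
            elif ch == "<":
                if seen_args:
                    raise ValueError("second argument list")
                depth = 1
                seen_args = True
            elif ch == ">":
                raise ValueError("unbalanced '>'")
            elif seen_args:
                raise ValueError("text after argument list")
            else:
                name.append(ch)
        else:
            if ch == "<":
                depth += 1
            elif ch == ">":
                depth -= 1
                if depth == 0:
                    continue  # closing bracket of the outermost pair
            args.append(ch)

    if depth != 0:
        raise ValueError("unbalanced '<'")
    if name or seen_args:
        flush()
    return passes
```
Calling it on the example above returns a single `bottom-up-vec` entry
whose argument string can be fed back into `parse_pipeline` to obtain the
three region passes.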
The main SandboxVectorizerPass now contains a customizable pipeline of
SandboxIR function passes, defined by the `sbvec-passes` flag. Region
passes for the bottom-up vectorizer pass are now in pass arguments (like
in the example above).
Because we now have several classes that can build sub-pass pipelines,
I've moved the logic that interacts with PassRegistry.def into its own
files (PassBuilder.{h,cpp}) so it can be easily reused.
Finally, I've added a `RegionsFromMetadata` function pass, which will
allow us to run region passes in isolation from lit tests without
relying on the bottom-up vectorizer, and a new lit test that does
exactly this.
Note that the new pipeline parser now allows empty pipelines. This is
useful for testing. For example, if we use
```
-sbvec-passes="bottom-up-vec<>"
```
SandboxVectorizer converts LLVM IR to SandboxIR and runs the bottom-up
vectorizer, but no region passes afterwards. Similarly, with
```
-sbvec-passes=""
```
SandboxVectorizer converts LLVM IR to SandboxIR and runs no passes on
it. This is useful to exercise SandboxIR conversion on its own.
https://github.com/llvm/llvm-project/pull/111223 was reverted because of
a build failure with `-DBUILD_SHARED_LIBS=on`.
The Passes component depends on Vectorizer (because PassBuilder needs to
be able to instantiate SandboxVectorizerPass). This resulted in CMake
doing the following:
1. When it builds lib/libLLVMVectorize.so.20.0git, it adds
lib/libLLVMSandboxIR.so.20.0git to the command line, because it's listed
as a dependency (as expected).
2. When it builds lib/libLLVMPasses.so.20.0git, it adds
lib/libLLVMVectorize.so.20.0git to the command line, because it's listed
as a dependency (also as expected), but it does not add
libLLVMSandboxIR.so.
When SandboxVectorizerPass had its ctors/dtors defined inline, this
caused "undefined reference to vtable" linker errors. This change works
around that by moving ctors/dtors out of line.
Also fix a bazel build problem by adding the new
`llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/PassRegistry.def`
as a textual header in the Vectorizer target.
The main change is that the main SandboxVectorizer pass no longer has a
pipeline of function passes. Now it is a wrapper that creates sandbox IR
from functions before calling BottomUpVec.
BottomUpVec now builds its own RegionPassManager from the `sbvec-passes`
flag, using a PassRegistry.def file. For now, these region passes are
not run (BottomUpVec doesn't create Regions yet), and only a null pass
for testing exists.
This commit also changes the ownership model for sandboxir::PassManager:
instead of having a PassRegistry that owns passes, and PassManagers that
contain non-owning pointers to the passes, now PassManager owns (via
unique pointers) the passes it contains.
PassRegistry is now deleted, and the logic to parse and create a pass
pipeline is now in PassManager::setPassPipeline.
This patch adds support for a user-defined pass-pipeline that overrides
the default pipeline of the vectorizer.
This will commonly be used by lit tests.
This patch implements a new empty pass for the Bottom-up vectorizer and
creates a pass pipeline that includes it.
The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.
This patch implements the new pass and registers it with the pass
manager. For context, this is a vectorizer that operates on Sandbox IR,
which is a transactional IR on top of LLVM IR.