The `SeedCollector` class gets two new arguments: `CollectStores` and
`CollectLoads`. These replace the `sbvec-collect-seeds` cl::opt flag.
This is done to help with reusing the SeedCollector class in a future
pass. The cl::opt flag is moved to the seed collection pass:
Passes/SeedCollection.cpp
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Transforms`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Removed a redundant `operator<<` from Attributor.h. IDS only
auto-annotates the 1st declaration, and the 2nd declaration being
un-annotated resulted in an "inconsistent linkage" error on Windows when
building LLVM as a DLL.
- `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and
remove the local declaration of the `vfs::FileSystem` class. This is
required because exporting the `PGOInstrumentationUse` constructor
requires the class be fully defined because it is used by an argument.
- Add #include "llvm/Support/Compiler.h" to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This patch implements a simple pass that tries to de-duplicate packs. If
there are two packing patterns inserting the exact same values in the
exact same order, then we will keep the top-most one of them. Even
though such patterns may be optimized away by subsequent passes it is
still useful to do this within the vectorizer because otherwise the cost
estimation may be off, making the vectorizer over conservative.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
RegionFromBBs is a helper Sandbox IR function pass that builds a region
for each BB. This helps test region passes in isolation without relying
on other parts of the vectorizer, which is especially useful for
stress-testing.
This patch updates/adds LLVM_DEBUG dumps.
It moves the DEBUG_TYPE into SandboxVectorizer/Debug.h such that it can
be shared across all components of the vectorizer.
This patch adds a helper flag for bisection debugging. This flag
force-stops vectorization after this many bundles have been considered
for vectorization.
Using -sbvec-stop-bndl=0 will not vectorize the code at all.
This new option lets you specify an allow-list of source files and
disables vectorization if the IR is not in the list. This can be used
for debugging miscompiles.
This patch implements the check for not allowing re-scheduling of
instructions that have already been scheduled in a scheduling bundle.
Rescheduling should only happen if the instructions were temporarily
scheduled in singleton bundles during a previous call to
`trySchedule()`.
Up until now the generation of vector instructions was taking place
during the top-down post-order traversal of vectorizeRec(). The issue
with this approach is that the vector instructions emitted during the
traversal can be reordered by the scheduler, making it challenging to
place them without breaking the def-before-uses rule.
With this patch we separate the vectorization decisions (done in
`vectorizeRec()`) from the code generation phase (`emitVectors()`). The
vectorization decisions are stored in the `Actions` vector and are used
by `emitVectors()` to drive code generation.
In a particular scenario (see test) we used to insert scheduled
instructions into the ready list. This patch fixes this by fixing the
trimSchedule() function.
This patch implements a small region pass that saves the context's
state. The patch is now used in the default pipeline to save the context
state instead of the hard-coded call to Context::save().
The concept behind this is that the passes themselves should not have to
do the actual saving/restoring of the IR state, because that would make
it challenging to reorder them in the pipeline. Having separate
save/restore passes makes the transformation passes more composable as
parts of arbitrary pipelines.
This patch moves the seed collection logic from the BottomUpVec pass
into a new Sandbox IR Function pass. The new "seed-collection" pass
collects the seeds, builds a region and runs the region pass pipeline.
This patch fixes a bug in `DependencyGraph::extend()` when the old
interval contains no memory instructions. When this is the case we
should do a full dependency scan of the new interval.
When the operand comes from multiple inputs then we need additional
packing code. When the operands are scalar then we can use a single
InsertElementInst. But when the operands are vectors then we need a
chain of ExtractElementInst and InsertElementInst instructions to insert
the vector value into the destination vector. This is what this patch
implements.
This patch fixes a bug in the creation of shuffle masks when vectorizing
vectors in case of a diamond reuse with shuffle. The mask needs to
enumerate all elements of a vector, not treat the original vector value
as a single element. That is: if vectorizing two <2 x float> vectors
into a <4 x float> the mask needs to have 4 indices, not just 2.
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.
It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. It's job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.
Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
This patch implements the vectorizer's callback for getting notified
about new instructions being created. This updates the scheduler state,
which may involve removing dependent instructions from the ready list
and update the "scheduled" flag.
Since we need to remove elements from the ready list, this patch also
implements the `remove()` operation.
This patch fixes a bug in the dependency node iterators that would
incorrectly not skip nodes that are not in the current DAG. This
resulted in iterators returning nullptr when dereferenced.
The fix is to update the existing "skip" function to not only skip
non-instruction values but also to skip instructions not in the DAG.
SelectInsts need special treatment because they are not always
straightforward to vectorize. This patch disables vectorization unless
they are trivially vectorizable.
`sandboxir::Context` is defined at a pass-level scope with the
`SandboxVectorizerPass` class because the function pass manager `FPM`
object depends on it, and that is in pass-level scope to avoid
recreating the pass pipeline every single time `runOnFunction()` is
called.
This means that the Context's state lives on across function passes. The
problem is twofold:
(i) the LLVM IR to Sandbox IR map can grow very large including objects
from different functions, which is of no use to the vectorizer, as it's
a function-level pass.
(ii) this can result in stale data in the LLVM IR to Sandbox IR object
map, as other passes may delete LLVM IR objects.
To fix both issues this patch introduces a `Context::clear()` function
that clears the `LLVMValueToValueMap`.
This patch removes the assertion that checks for an existing function.
If one exists it will remove it and create a new one. This helps remove
a crash when a function declaration object already exists and we are
about to create a SandboxIR object for the definition.
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
This patch implements a wrapper function for the LLVM IR verifier for
functions, and calls it (flag-guarded) within the bottom-up-vectorizer
for finding IR bugs as soon as they happen.
This patch implements cost modeling for Region. All instructions that
are added or removed get their cost counted in the Scoreboard. This is
used for checking if the region before or after a transformation is more
profitable.