Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high level
steps:
1. compute Strides;
2. compute Offsets;
3. collapseMemrefTo1D;
4. create Load gather or store_scatter op
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)
This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).
Updated operations:
* `vector.transfer_read`,
* `vector.transfer_write`.
For reference, these ops already use `base`:
* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
`vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
`vector.maskedload`.
This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.
To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.
Implements #131602
`let constructor` is deprecated since the table gen backend emits most
of the glue logic to build a pass. This PR retires the td method for
most (I need another pass) passes in the Conversion directory.
In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using an equal before the open curly
brace if we use a braced initializer list when initializing a variable.
See
https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
for more details.
The style guide does not explain the reason well. There is an article
from abseil, which mentions few benefits. E.g., we can avoid the most
vexing parse, etc. See https://abseil.io/tips/88 for more details.
Signed-off-by: hanhanW <hanhan0912@gmail.com>
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
Constrains Vector lowering to apply boundary checks only to data
transfers operating on block shapes.
This further aligns lowering with the current Xe instructions'
restrictions.
Relaxes vector.transfer_write lowering to allow out-of-bound writes.
This aligns lowering with the current hardware specification which does
not update bytes in out-of-bound locations during block stores.