Fix two issues in `MatchingSubsets::populateSubsetOpsAtIterArg`:
1. The `collectHoistableOps` parameter was declared but never used when
inserting subset ops via `insert(subsetOp)`. As a result, when recursing
into nested loops with `collectHoistableOps=false`, the nested loop's
subset ops were incorrectly added to the hoistable extraction/insertion
pairs of the parent loop. This caused spurious failures in the
`allDisjoint` check, preventing valid hoisting when nested loop ops
overlapped with outer loop ops. Fix by passing the parameter:
`insert(subsetOp, collectHoistableOps)`.
2. In the nested loop handling branch, there was no guard to detect when
a value has multiple nested loop uses (i.e., is used as an init arg in
more than one nested loop). Without the guard, `nextValue` would be
silently overwritten, leading to an incorrect use-def chain traversal.
Add `if (nextValue) return failure()` before setting `nextValue` for the
nested loop case, mirroring the existing guard for insertion ops.
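The guard in item 2 can be illustrated with a minimal standalone sketch. `UseKind` and `findNextUse` are hypothetical names used only for illustration, not the actual MLIR code; the sketch assumes that both insertion ops and nested-loop init uses continue the use-def chain.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical classification of the uses of a value.
enum class UseKind { Insertion, NestedLoopInit, Other };

// Picks the single use that continues the use-def chain. Returns false
// (failure) if a second candidate is found instead of silently overwriting
// `next`, mirroring the `if (nextValue) return failure()` guard.
bool findNextUse(const std::vector<UseKind> &uses,
                 std::optional<size_t> &next) {
  next.reset();
  for (size_t i = 0; i < uses.size(); ++i) {
    if (uses[i] == UseKind::Other)
      continue;
    if (next)
      return false; // a second chain-continuing use: bail out
    next = i;
  }
  return true;
}
```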
Fixes #147096
Assisted-by: Claude Code
SwitchOp::getEntrySuccessorRegions and getRegionInvocationBounds called
IntegerAttr::getInt() to retrieve the constant switch argument, but
getInt() asserts that the attribute type must be a signless integer or
index. For unsigned integer types (e.g. ui32), this assertion fired and
crashed the process.
Fix by selecting the appropriate accessor based on the attribute type:
getInt() for signless/index, getSInt() for signed, and getUInt() (cast
to int64_t) for unsigned integer types. Unknown types fall back to the
conservative "all regions possible" path.
The same fix is applied to getRegionInvocationBounds, which had an
identical call to getInt().
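A rough standalone sketch of the accessor selection follows. `IntAttrSketch`, `IntKind` and `getSwitchValueSketch` are hypothetical stand-ins, not the MLIR API; only the accessor names `getInt`, `getSInt` and `getUInt` come from the description above. Each accessor asserts on the wrong type, as the real `getInt()` does.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the type of an integer attribute.
enum class IntKind { Signless, Index, Signed, Unsigned, Other };

struct IntAttrSketch {
  IntKind kind;
  uint64_t bits;

  int64_t getInt() const {
    assert((kind == IntKind::Signless || kind == IntKind::Index) &&
           "getInt() requires a signless integer or index");
    return static_cast<int64_t>(bits);
  }
  int64_t getSInt() const {
    assert(kind == IntKind::Signed && "getSInt() requires a signed integer");
    return static_cast<int64_t>(bits);
  }
  uint64_t getUInt() const {
    assert(kind == IntKind::Unsigned &&
           "getUInt() requires an unsigned integer");
    return bits;
  }
};

// Selects the accessor from the attribute type. std::nullopt models the
// conservative "all regions possible" fallback for unknown types.
std::optional<int64_t> getSwitchValueSketch(const IntAttrSketch &attr) {
  switch (attr.kind) {
  case IntKind::Signless:
  case IntKind::Index:
    return attr.getInt();
  case IntKind::Signed:
    return attr.getSInt();
  case IntKind::Unsigned:
    return static_cast<int64_t>(attr.getUInt());
  case IntKind::Other:
    break;
  }
  return std::nullopt;
}
```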
Fixes #187973
Assisted-by: Claude Code
The RemoveDeadValuesPass previously emitted an error and skipped
optimization when the IR contained non-function symbol ops, non-call
symbol user ops, or branch ops. This restriction was later removed, but
the comments in RemoveDeadValues.cpp and Passes.td still described the
pass as operating "iff the IR doesn't have any non-function symbol ops,
non-call symbol user ops and branch ops."
Remove the stale restriction text from both the .cpp file comment and
the Passes.td description. Also add a test that verifies dead function
arguments are correctly removed inside a module that defines a symbol
(has a sym_name attribute), which was the original failure case reported
in issue #98700.
Fixes #98700
Assisted-by: Claude Code
This reverts commit d9402d087ab90610d3ff8a78a50eb66d3be4cffd.
This re-applies commit e5adddc5be63b8bb8c36572f68ac64c8042cb282
along with
62eafb5cd1
Co-authored-by: Yi Zhang <cathyzhyi@google.com>
Two related assertion failures in DeadCodeAnalysis when processing
OpenACC operations:
1. visitRegionBranchEdges (issue #187972): When a RegionSuccessor refers
to an empty region (no blocks), calling getSuccessor()->front()
dereferences a sentinel ilist iterator, crashing with
"!NodePtr->isKnownSentinel()". Fix: skip successors whose region is
empty.
2. isRegionOrCallableReturn (issue #188408): When iterating over ops in
a nested acc region whose blocks do not have a required terminator,
Block::getTerminator() is called without first checking
mightHaveTerminator(), triggering "Assertion `mightHaveTerminator()'
failed". Fix: guard the getTerminator() call with mightHaveTerminator().
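Both guards follow the same "check before dereferencing" shape. The sketch below models them on plain containers; `RegionSketch`, `BlockSketch` and the two helper functions are hypothetical and only mirror the structure of the fix, not the actual DeadCodeAnalysis code.

```cpp
#include <cassert>
#include <vector>

struct BlockSketch {
  bool hasTerminator; // models Block::mightHaveTerminator()
  int terminatorId;   // only meaningful when hasTerminator is true
};

struct RegionSketch {
  std::vector<BlockSketch> blocks;
  bool empty() const { return blocks.empty(); }
  const BlockSketch &front() const {
    assert(!blocks.empty() && "front() called on an empty region");
    return blocks.front();
  }
};

// Guard 1: skip successors whose region has no blocks before touching front().
int countVisitableEntryBlocks(const std::vector<RegionSketch> &successors) {
  int visited = 0;
  for (const RegionSketch &region : successors) {
    if (region.empty())
      continue;
    (void)region.front(); // safe: the region has at least one block
    ++visited;
  }
  return visited;
}

// Guard 2: only compare against the terminator if the block may have one.
bool isTerminatorOf(const BlockSketch &block, int opId) {
  if (!block.hasTerminator)
    return false;
  return block.terminatorId == opId;
}
```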
Fixes #187972, #188408
Assisted-by: Claude Code
This PR adds support for region control-flow. Region control-flow and
CFG can be mixed together in the same program. See the [accompanying
RFC](https://discourse.llvm.org/t/rfc-support-region-control-flow-in-mem2reg/90082)
for some design considerations.
Beyond the considerations in the RFC, a few minor changes were
introduced:
- Calling the visitor hook for defined values is now deferred to the end
of promotion.
- The lazy creation of default values has been moved to the places where
it happens to prepare for a future change where it is actually lazy.
Documentation about it not working as intended for now was also added.
All SCF operations are supported, including `forall` and `parallel`,
which is pretty cool I think.
I am sorry in advance for git diff displaying a really bad diff for
Mem2Reg.cpp around where the liveness analysis used to be. Do consider
simply reading this part of the code off the file.
As a disclaimer, I designed all the test cases myself, but I used a
large amount of matrix multiplications to produce the corresponding IR
and FileCheck tests. I have reviewed them carefully and they correspond
to my intent.
---------
Co-authored-by: Slava Zakharin <szakharin@nvidia.com>
The `--promote-buffers-to-stack` pass crashes when allocating a memref
whose element type is itself a memref (e.g., `memref<1xmemref<2xf32>>`).
This happens because `defaultIsSmallAlloc` calls
`DataLayout::getTypeSizeInBits` on the element type, but `MemRefType`
(and other types without `DataLayoutTypeInterface`) trigger a fatal
error when queried this way.
Fix the crash by checking whether the element type has data layout
support before computing its size. Types that are not int/float,
complex, index, or vector and do not implement `DataLayoutTypeInterface`
are silently skipped (i.e., the allocation is not promoted).
Fixes #60092
Assisted-by: Claude Code
`defaultIsSmallAlloc` called `ShapedType::getNumElements()` which
asserts when the static element count overflows `int64_t` (e.g. a
`memref<3090540x3090540x3090540xi32>` whose element count is ~29e18).
Switch to `ShapedType::tryGetNumElements()`, which returns
`std::nullopt` on overflow. An overflowing element count means the
allocation is definitely not small, so we return `false` immediately. A
secondary overflow guard is added for the final size comparison.
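The overflow handling can be sketched independently of MLIR. `tryGetNumElementsSketch` and `isSmallAllocSketch` below are hypothetical helpers that only imitate the described behavior: a checked product that yields `std::nullopt` on overflow, followed by a second guard on the size computation. The bit-size threshold is an assumed parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Overflow-checked product of static dimension sizes.
std::optional<int64_t>
tryGetNumElementsSketch(const std::vector<int64_t> &shape) {
  int64_t count = 1;
  for (int64_t dim : shape) {
    assert(dim >= 0 && "only static, non-negative dimensions are modeled");
    if (dim != 0 && count > std::numeric_limits<int64_t>::max() / dim)
      return std::nullopt; // count * dim would overflow int64_t
    count *= dim;
  }
  return count;
}

// An overflowing element count or total size means "definitely not small".
bool isSmallAllocSketch(const std::vector<int64_t> &shape, int64_t elementBits,
                        int64_t maxBits) {
  assert(elementBits > 0 && "element size must be positive");
  std::optional<int64_t> numElements = tryGetNumElementsSketch(shape);
  if (!numElements)
    return false;
  // Secondary guard for the final size computation.
  if (*numElements > std::numeric_limits<int64_t>::max() / elementBits)
    return false;
  return *numElements * elementBits <= maxBits;
}
```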
Fixes #64638
Assisted-by: Claude Code
When the MLIR inliner inlines a callable region that has more than one
block, it calls `handleTerminator(op, Block *newDest)` for the
terminator of every inlined block. `TestInlinerInterface` only
implemented the single-block variant (`handleTerminator(op,
ValueRange)`), so the default `llvm_unreachable` was hit when inlining a
`test.functional_region_op` whose body contained multiple blocks (e.g.
an explicit `cf.br` jump to a successor block whose terminator was
`test.return`).
Fix: add the missing `handleTerminator(op, Block *)` override to
`TestInlinerInterface`. Mirror the pattern used by
`FuncDialectInlinerExtension`: if the terminator is a `TestReturnOp`,
replace it with a `cf.br` to `newDest` carrying the return operands. Any
other terminator (e.g. `cf.br` for intra-region branches) is left
untouched — the existing `ControlFlowInlinerInterface` no-op already
handles those correctly.
Add a regression test in `test/Transforms/inlining.mlir` that inlines a
`call_indirect` into a `test.functional_region_op` with two blocks.
Fixes #185350
Assisted-by: Claude Code
Some function ops (e.g., gpu.func with workgroup memory arguments) have
more entry block arguments than their FunctionType has inputs. The
workgroup memory arguments are not part of the public function signature
but are present as additional block arguments.
`convertFuncOpTypes` previously created a `SignatureConversion` sized
only for `type.getNumInputs()`, then called `applySignatureConversion`
on the entry block. When the block had more arguments (e.g., workgroup
args), the loop in `applySignatureConversion` would call
`getInputMapping(i)` with out-of-bounds indices, causing an assertion
failure in `SmallVector::operator[]`.
Fix this by:
1. Sizing the `SignatureConversion` for all entry block arguments.
2. Adding identity mappings for extra block args beyond the function
type inputs.
3. Using only the converted function-type-input types when updating the
FunctionType (so extra block arg types are not included in the
signature).
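The three steps can be sketched on plain vectors of type names. `buildArgMappingSketch` and `newFunctionInputsSketch` are hypothetical helpers; the real code works with `SignatureConversion`, which is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using TypeList = std::vector<std::string>;

// Steps 1 and 2: one mapping entry per entry block argument. Arguments backed
// by a function-type input use the converted types; extra block arguments
// (e.g. workgroup memory) get an identity mapping.
std::vector<TypeList>
buildArgMappingSketch(const std::vector<TypeList> &convertedInputs,
                      const TypeList &entryBlockArgTypes) {
  assert(convertedInputs.size() <= entryBlockArgTypes.size());
  std::vector<TypeList> mapping(entryBlockArgTypes.size());
  for (size_t i = 0; i < mapping.size(); ++i) {
    if (i < convertedInputs.size())
      mapping[i] = convertedInputs[i];
    else
      mapping[i] = {entryBlockArgTypes[i]};
  }
  return mapping;
}

// Step 3: the new function type only uses the first `numInputs` entries.
TypeList newFunctionInputsSketch(const std::vector<TypeList> &mapping,
                                 size_t numInputs) {
  assert(numInputs <= mapping.size());
  TypeList inputs;
  for (size_t i = 0; i < numInputs; ++i)
    inputs.insert(inputs.end(), mapping[i].begin(), mapping[i].end());
  return inputs;
}
```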
Fixes #184744
Assisted-by: Claude Code
The stripmineSink helper splices loop body operations into a new inner
scf.for that has no iter_args. When the target loop carries iter_args,
values yielded by the spliced body are moved inside the inner loop, but
the outer loop's yield terminator still references those values,
creating an SSA invariant violation. In debug builds this triggers the
assertion
use_empty() && "Cannot destroy a value that still has uses!"
when the outer RewriterBase tries to erase the now-broken operations.
Fix: in extractFixedOuterLoops, skip the strip-mining transformation if
any of the collected perfectly-nested loops have iter_args.
Add a regression test to parametric-tiling.mlir.
Fixes #129044
Assisted-by: Claude Code
This patch allows creating a hierarchy of `SideEffects::Resource`s by adding
a virtual `getParent()` method, so that effects on *disjoint* resources
can be proven non-conflicting. It also adds a virtual `isAddressable()` method
that represents a property of a resource to be addressable via a pointer
value. The non-addressable resources may not be affected via any pointer.
This is unblocking CSE, LICM and alias analysis without per-pass
special-casing.
RFC:
https://discourse.llvm.org/t/rfc-mlir-memory-region-hierarchy-for-mlir-side-effects/89811
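As a rough illustration of how a parent chain enables disjointness reasoning, consider the sketch below. Only `getParent()` and `isAddressable()` are taken from the description above; `ResourceSketch`, `NodeResource` and `mayOverlap` are hypothetical, and the rule "two resources may overlap only if one is an ancestor of (or equal to) the other" is an assumption of this sketch, not a statement about the actual implementation.

```cpp
#include <cassert>

class ResourceSketch {
public:
  virtual ~ResourceSketch() = default;
  // Parent in the resource hierarchy; nullptr for a root resource.
  virtual const ResourceSketch *getParent() const { return nullptr; }
  // Whether the resource can be affected through a pointer value.
  virtual bool isAddressable() const { return true; }
};

// Hypothetical concrete resource with a fixed parent and addressability.
class NodeResource : public ResourceSketch {
public:
  NodeResource(const ResourceSketch *parent, bool addressable)
      : parent(parent), addressable(addressable) {}
  const ResourceSketch *getParent() const override { return parent; }
  bool isAddressable() const override { return addressable; }

private:
  const ResourceSketch *parent;
  bool addressable;
};

// Walks the parent chain of `resource` looking for `ancestor`.
bool isAncestorOrSelf(const ResourceSketch *ancestor,
                      const ResourceSketch *resource) {
  for (; resource; resource = resource->getParent())
    if (resource == ancestor)
      return true;
  return false;
}

// Assumed rule: resources on different branches of the hierarchy are disjoint.
bool mayOverlap(const ResourceSketch &a, const ResourceSketch &b) {
  return isAncestorOrSelf(&a, &b) || isAncestorOrSelf(&b, &a);
}
```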
The `handleTerminator` implementation in the test dialect's inliner
interface was asserting that the number of `test.return` operands equals
the number of values to replace. This assertion fires when inlining a
callee whose body uses `test.return` with values into a call site that
expects zero results (e.g., a void `llvm.func` calling a function whose
implementation uses `test.return` with operands).
Replace the assertion with a conditional early return so the inliner
gracefully skips replacement instead of crashing.
Fixes #108376
Assisted-by: Claude Code
This PR improves MLIR dialect conversion failure diagnostics when
legalization fails.
Previously, the diagnostic mostly included the operation name (and in
partial conversion, whether it was explicitly marked illegal). This
change keeps that prefix and appends the printed failing operation. This
provides immediate operand/result/type context directly in the same
error line.
### Example
Before:
```
failed to legalize operation 'test.type_consumer' that was explicitly marked illegal
```
After:
```
failed to legalize operation 'test.type_consumer' that was explicitly marked illegal: "test.type_consumer"(%arg0) : (f32) -> ()
```
### Tests
- Updated `mlir/test/Transforms/test-legalizer.mlir` expectations for
the richer emitted diagnostic.
This reverts commit 7ad2c6db54a0e77249f2edb3c589ccf4c930d455.
PR #183395 introduced the `exact` flag to `index_cast` and
`index_castui` and updated some canonicalization patterns.
These canonicalization patterns were found to be unsound. For example:
* `index_cast(index_cast(x)) -> x`
* where one first truncates and then widens x
the rewrite is unsound because the first cast **may** truncate the value
of x, losing information. The
`exact` flag was made to make this transformation sound. Its semantics
are that when the `exact` flag is present, then it is assumed that the
operand to index_cast does not lose information (i.e., fits perfectly in
the destination type).
In PR #183395, the canonicalization rule was rewritten such that it would
only match where the inner index_cast had the `exact` flag set.
* `index_cast(index_cast(x, exact)) -> x`
* where source type and destination type are the same
A post-merge review
[highlighted](https://github.com/llvm/llvm-project/pull/183395#discussion_r2880422529)
that the pattern above also disallows the following correct pattern:
* `index_cast(index_cast(x)) -> x`
* when the first index_cast widens and the second one truncates.
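A small worked example of the two orderings, using fixed 64-bit and 32-bit unsigned integers as stand-ins for the two types (the actual `index` width is target-specific, so these widths are assumptions). Unsigned types are used so that the truncation is well-defined modular arithmetic, which corresponds to the `index_castui` case.

```cpp
#include <cassert>
#include <cstdint>

// Truncate first, then widen: the upper 32 bits are lost for good.
uint64_t truncThenWiden(uint64_t x) {
  return static_cast<uint64_t>(static_cast<uint32_t>(x));
}

// Widen first, then truncate: every 32-bit value round-trips unchanged.
uint32_t widenThenTrunc(uint32_t x) {
  return static_cast<uint32_t>(static_cast<uint64_t>(x));
}
```

Folding the first chain to `x` is only sound when `x` is known to fit in the narrower type, which is what the `exact` flag asserts. The second chain can always be folded.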
Unfortunately the semantics of `index` are such that its bitwidth is
target specific. Attempts were made in
https://github.com/llvm/llvm-project/pull/184631 to automatically add
annotations where possible but no agreement was reached on the best way
to do this. Adding to the disagreement are the following points:
* [there are other unsound patterns that assume index is
64](https://github.com/llvm/llvm-project/pull/184631/changes#r2885181291)
* [The semantics of index is
contested](https://discourse.llvm.org/t/index-type-and-assumption-about-bitwidth/88287)
This led to the belief that a reversal and an RFC would be a good
approach to get some consensus from the community before proceeding
further.
Move the operand count and type checks for func.return from
ReturnOp::verify() into a new FuncOp::verify(). The verifier iterates
all blocks in the callable region, skipping terminators that are not
func.return (e.g. llvm.return or test.return that may appear during
dialect conversion).
Fix several invalid-IR tests that had func.func return types
inconsistent with the actual func.return operands. Previously these
mismatches were silent because block verification stopped at an earlier
expected error before reaching the func.return; now that
FuncOp::verify() runs before body verification, the return types must be
consistent.
The `exact` flag with the following semantics
> If the `exact` attribute is present, it is assumed that the index type
> width is such that the conversion does not lose information. When this
> assumption is violated, the result is poison.
can be added to index_cast and index_castui operations. This unlocks
the following lowerings:
* index_cast (signed) exact -> trunc nsw
* index_castui (unsigned) exact -> trunc nuw
* index_castui nneg exact -> trunc nuw nsw
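The three lowerings listed above can be written as a small decision function. `TruncFlags` and `truncFlagsFor` are hypothetical names for illustration; in this sketch `nneg` is only consulted for the unsigned cast, and a cast without `exact` gets no flags.

```cpp
#include <cassert>

// Overflow flags attached to the generated trunc.
struct TruncFlags {
  bool nsw;
  bool nuw;
};

// Maps the cast kind and its flags to the trunc flags from the list above.
TruncFlags truncFlagsFor(bool isUnsignedCast, bool nneg, bool exact) {
  TruncFlags flags{false, false};
  if (!exact)
    return flags; // without `exact`, truncation may lose information
  if (isUnsignedCast) {
    flags.nuw = true;  // index_castui exact -> trunc nuw
    flags.nsw = nneg;  // index_castui nneg exact -> trunc nuw nsw
  } else {
    flags.nsw = true;  // index_cast exact -> trunc nsw
  }
  return flags;
}
```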
Changes:
* Adds ArithExactFlagInterface.
* Updates Arith_IntBinaryOpWithExactFlag to use ArithExactFlagInterface
* Update IndexCastOp and IndexCastUIOp to declare
`ArithExactFlagInterface`
* Update canonicalization patterns
* Update roundtrip, lowering, and canonicalization tests.
`processFuncOp` asserts that all symbol uses of a function are
`CallOpInterface` operations. This is violated when a function is
referenced by a non-call operation such as `spirv.EntryPoint`, which
uses the function symbol for metadata purposes without calling it.
Fix this by replacing the assertion with an early return: if any user of
the function symbol is not a `CallOpInterface`, skip the function
entirely. This is safe because the pass cannot determine the semantics
of arbitrary non-call references, so it should leave such functions
alone.
Fixes #180416
Before erasing the operation, replace all result values that have live uses
with ub.poison values. This is important to maintain IR validity. For
example, if we have an op with one of its results used by another op,
erasing the op without replacing its corresponding result would leave us
with a dangling operand in the user op. By replacing the result with a
ub.poison value, we ensure that the user op still has a valid operand, even
though it's a poison value which will be cleaned up later if it can be
cleaned up. This keeps the IR valid for further simplification and
canonicalization while fixing a related crash in the canonicalizer.
Fixes https://github.com/llvm/llvm-project/issues/179944
Prior to this change, rollback of the `MoveBlockRewrite` could result in
segfault if the block wasn't contained in a region anymore.
That situation could arise if the previous rollback of another rewrite
orphaned the block by removing it from its region, as demonstrated by
the new test pattern.
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
This patch adds a side-effect check to `moveOperationDependencies` to
match the behavior of `moveValueDefinitions`. Previously,
`moveOperationDependencies` would move operations with side-effecting
dependencies, which could change program semantics.
**Note** that the existing test changes are needed because unregistered
operations (e.g., "moved_op"()) are treated as side-effecting. These
tests were updated to use pure operations for operations in the moved
slice, while keeping unregistered ops for operations that aren't moved
(e.g., "before"(), "foo"()). This ensures that tests continue to
exercise their intended functionality without being blocked by the new
side-effect check.
Instead of op-specific cleanup patterns for region branch ops to remove
unused results / block arguments, etc., add a set of patterns that can
handle all `RegionBranchOpInterface` ops. These patterns are enabled
only for selected SCF dialect ops at the moment:
* `scf.execute_region`
* `scf.for`
* `scf.if`
* `scf.index_switch`
* `scf.while`
It is currently not possible to register canonicalization patterns for op
interfaces and some ops have incorrect interface implementations. In
follow-up PRs, the set of ops will be gradually extended within the SCF
dialect (`scf.forall`) and across other dialects
(`gpu.warp_execute_on_lane0`, (maybe) various affine dialect ops, ...),
and maybe eventually to apply to all `RegionBranchOpInterface` ops.
This commit removes many similar canonicalization patterns from the SCF
dialect. The newly added canonicalization patterns allow users to get
the same canonicalizations for free for their own ops. And even a few
additional new canonicalizations
([example](https://github.com/llvm/llvm-project/pull/174094/files#diff-54318cd685386d5519c42be49818e388b09d934edcbe4280548baa3601802977R2241),
[example](https://github.com/llvm/llvm-project/pull/174094/files#diff-54318cd685386d5519c42be49818e388b09d934edcbe4280548baa3601802977R1101),
...).
Implementation outline: This commit adds 3 canonicalization patterns.
* `MakeRegionBranchOpSuccessorInputsDead`: Remove uses of successor
inputs, by swapping them for successor operand values.
* `RemoveDuplicateSuccessorInputUses`: Remove uses of successor inputs
that are duplicates. (Similar to `WhileRemoveDuplicatedResults` in the
SCF dialect.)
* `RemoveDeadRegionBranchOpSuccessorInputs`: Remove dead successor
inputs if all of their "tied" successor inputs are also dead. (Similar
to `WhileUnusedResult` in the SCF dialect.)
This commit simplifies the `remove-dead-values` pass and fixes a bug in
the handling of `RegionBranchOpInterface` ops. The pass used to produce
invalid IR ("null value found") for the newly added test case.
`remove-dead-values` is a pass for additional IR simplification that
cannot be performed by the canonicalizer pass. Based on a liveness
analysis, it erases dead values / IR. (The liveness analysis is a
dataflow analysis that has more information about the IR than a
canonicalization pattern, which can see only "local" information.)
Region-based ops are difficult. The liveness analysis may determine that
an SSA value is dead. However, that does not mean that the value can
actually be removed. Doing so may violate the region data flow (as
modeled by the `RegionBranchOpInterface`). As an example, consider the
case where a region branch terminator may dispatch to one of two region
successors with the same forwarded values. A successor input (block
argument) can be erased only if it is dead on both successors.
Before this commit, there used to be complex logic to determine when it
is safe to erase an SSA value. That logic was broken. The new
implementation does not remove any block arguments or op results of
region-based ops. Instead, operands of region-based ops and region
branch terminators are replaced with `ub.poison` if all of their
successor values are dead. This simplifies the IR well enough for the
canonicalizer to perform the remaining region simplification (i.e.,
dropping block arguments etc.).
RFC:
https://discourse.llvm.org/t/rfc-delegate-simplification-of-region-based-ops-from-remove-dead-values-to-canonicalizer/89194
dropRedundantArguments was incorrectly indexing into forwardedOperands
using the block argument index directly. This crashes when the block has
produced operands (generated by the terminator, not forwarded from
predecessors) because forwardedOperands doesn't include them.
The fix checks isOperandProduced() to skip produced arguments and uses
SuccessorOperands::operator[] which handles the offset correctly.
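The indexing issue can be modeled with a small standalone class. `SuccessorOperandsSketch` is a hypothetical simplification: the first `producedCount` block arguments are produced by the terminator, and the remaining ones map to the forwarded operand list at an offset. The assertion inside `operator[]` is a choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

class SuccessorOperandsSketch {
public:
  SuccessorOperandsSketch(size_t producedCount, std::vector<int> forwarded)
      : producedCount(producedCount), forwarded(std::move(forwarded)) {}

  size_t size() const { return producedCount + forwarded.size(); }
  bool isOperandProduced(size_t index) const { return index < producedCount; }

  // Indexed by block argument position; applies the produced-operand offset.
  int operator[](size_t index) const {
    assert(index < size() && !isOperandProduced(index));
    return forwarded[index - producedCount];
  }

private:
  size_t producedCount;
  std::vector<int> forwarded;
};

// Returns the forwarded value for a block argument, or std::nullopt for a
// produced argument, which has no forwarded operand to compare against.
std::optional<int> forwardedValueFor(const SuccessorOperandsSketch &operands,
                                     size_t argIndex) {
  if (operands.isOperandProduced(argIndex))
    return std::nullopt;
  return operands[argIndex];
}
```

Indexing the raw forwarded list with `argIndex` directly would read the wrong element, or run past the end, as soon as `producedCount` is nonzero.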
Add the `visitNonControlFlowArguments` API to
`SparseBackwardDataFlowAnalysis`. Previously, the analysis could not visit
every SSA value, such as a loop's induction variable (IV). Such values can
now be visited via `visitNonControlFlowArguments`. The API is applied in
LivenessAnalysis/RemoveDeadValues, which resolves the IV liveness issue in
loops.
https://discourse.llvm.org/t/rfc-add-visitbranchregionargument-interface-to-sparsedataflowanalysis/89061
This commit aligns the implementation of
`ConversionPatternRewriter::legalize` with its documentation:
```
/// Attempt to legalize the given region. This can be used within
...
LogicalResult legalize(Region *r);
```
This function now legalizes the entire region, including nested ops. The
implementation follows the same logic as the "main" traversal:
pre-order, forward-dominance.
Currently empty tensor elimination by constructing a SubsetExtractionOp
to match a SubsetInsertionOp at the end of a DPS chain will fail if any
operands required by the insertion op don't dominate the insertion point
for the extraction op.
This change improves the transformation by attempting to move all pure
producers of required operands to the insertion point of the extraction
op. In the process this improves a number of tests for empty tensor
elimination.
This commit fixes two crashes in the `-remove-dead-values` pass related
to private functions.
Private functions are considered entirely "dead" by the liveness
analysis, which drives the `-remove-dead-values` pass.
The `-remove-dead-values` pass removes dead block arguments from private
functions. Private functions are entirely dead, so all of their block
arguments are removed. However, the pass did not correctly update all
users of these dropped block arguments.
1. A side-effecting operation must be removed if one of its operands is
dead. Otherwise, the operation would end up with a NULL operand. Note:
The liveness analysis would not have marked an SSA value as "dead" if it
had a reachable side-effecting user. (Therefore, it is safe to erase
such side-effecting operations.)
2. A branch operation must be removed if one of its non-forwarded
operands is dead. (E.g., the condition value of a `cf.cond_br`.)
Whenever a terminator is removed, a `ub.unreachable` operation is
inserted. This fixes #158760.
This commit adds support for `replaceUsesWithIf` (and variants such as
`replaceAllUsesExcept`) to the `ConversionPatternRewriter`. This API is
supported only in no-rollback mode. An assertion is triggered in
rollback mode. (This missing assertion has been confusing for users
because it seemed that the API was supported, while it was actually not
working properly.)
This commit brings us a bit closer towards removing
[this](76ec25f729/mlir/lib/Transforms/Utils/DialectConversion.cpp (L1214))
workaround.
Additional changes are needed to support this API in rollback mode. In
particular, no entries should be added to the `ConversionValueMapping`
for conditional replacements. It's unclear at this point if this API can
be supported in rollback mode, so this is deferred to later.
This commit turns `replaceUsesWithIf` into a virtual function, so that
the `ConversionPatternRewriter` can override it. All other API functions
for conditional value replacements call that function.
Note for LLVM integration: If you are seeing failed assertions due to
this change, you are using unsupported API in your dialect conversion.
You have 3 options: (1) Migrate to the no-rollback driver. (2) Rewrite
your patterns without the unsupported API. (3) Last resort: bypass the
rewriter and call `replaceUsesWithIf` etc. directly on the `Value`
object.
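The "single virtual hook" design can be sketched as follows. All names are hypothetical stand-ins except `replaceUsesWithIf` and `replaceAllUsesExcept`. Unlike the real rewriter, which triggers an assertion in rollback mode, this sketch records the rejection in a flag so that the behavior can be observed.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct UseSketch {
  int ownerId; // operation that owns the use
  int value;   // value currently used
};

class RewriterSketch {
public:
  using Pred = std::function<bool(const UseSketch &)>;
  virtual ~RewriterSketch() = default;

  // The only virtual entry point for conditional replacements.
  virtual void replaceUsesWithIf(std::vector<UseSketch> &uses, int from, int to,
                                 const Pred &shouldReplace) {
    for (UseSketch &use : uses)
      if (use.value == from && shouldReplace(use))
        use.value = to;
  }

  // Variants are non-virtual and funnel through the hook above.
  void replaceAllUsesExcept(std::vector<UseSketch> &uses, int from, int to,
                            int exceptedOwner) {
    replaceUsesWithIf(uses, from, to, [exceptedOwner](const UseSketch &use) {
      return use.ownerId != exceptedOwner;
    });
  }
};

class ConversionRewriterSketch : public RewriterSketch {
public:
  explicit ConversionRewriterSketch(bool rollback) : allowRollback(rollback) {}

  void replaceUsesWithIf(std::vector<UseSketch> &uses, int from, int to,
                         const Pred &shouldReplace) override {
    if (allowRollback) {
      rejected = true; // unsupported in rollback mode
      return;
    }
    RewriterSketch::replaceUsesWithIf(uses, from, to, shouldReplace);
  }

  bool rejected = false;

private:
  bool allowRollback;
};
```

Because every variant goes through the one overridable function, the derived rewriter intercepts all conditional replacements with a single override.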
Reland https://github.com/llvm/llvm-project/pull/165725. Fix the failing
test by removing successor operands before deleting operations. Following
the deletion of cond.branch, its successor operands will subsequently be
removed.
This MR modifies side effect traits of some integer arithmetic
operations in the LLVM dialect.
Prior to this MR, the LLVM dialect `sdiv` and `udiv` operations were
marked as `Pure` through `tblgen` inheritance of the
`LLVM_ArithmeticOpBase` class. The `Pure` trait allowed incorrect
hoisting of `sdiv`/`udiv` operations by the
`loop-invariant-code-motion` pass.
This MR modifies the `sdiv` and `udiv` LLVM operations to have traits
and code motion behavior identical to their counterparts in the `arith`
dialect, which were established by the commit/review below.
ed39825be4
https://reviews.llvm.org/D137814
By default, the dialect conversion driver processes operations in
pre-order: the initial worklist is populated pre-order. (New/modified
operations are immediately legalized recursively.)
This commit adds a new API for selective post-order legalization.
Patterns can request an operation / region legalization via
`ConversionPatternRewriter::legalize`. They can call these helper
functions on nested regions before rewriting the operation itself.
Note: In rollback mode, a failed recursive legalization typically leads
to a conversion failure. Since recursive legalization is performed by
separate pattern applications, there is no way for the original pattern
to recover from such a failure.
When converting a function, convert only the entry block signature. The
remaining block signatures should be converted by the respective
branching ops. The `FuncToLLVM` / `ControlFlowToLLVM` patterns already
use that design.
```c++
struct BranchOpLowering : public ConvertOpToLLVMPattern<cf::BranchOp> {
  LogicalResult
  matchAndRewrite(cf::BranchOp op, OneToNOpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Convert successor block.
    SmallVector<Value> flattenedAdaptor = flattenValues(adaptor.getOperands());
    FailureOr<Block *> convertedBlock =
        getConvertedBlock(rewriter, getTypeConverter(), op, op.getSuccessor(),
                          TypeRange(ValueRange(flattenedAdaptor)));
    // ...
  }
};
```
This is consistent with the fact that operations from unreachable blocks
are not put on the initial worklist.
With this change, parent ops are no longer recursively legalized when
inserting a block, simplifying the conversion driver a bit.
Note for LLVM integration: If you are seeing failures, make sure to:
- Drop `converter.isLegal(&op.getBody())` when checking the legality of
a function op. Only the entry block signature / function type should be
taken into account.
- If you need to convert all reachable blocks and are using `cf`
branching ops, add `populateCFStructuralTypeConversionsAndLegality`.
- If you need to convert all reachable blocks and are using custom
branching ops, implement and populate custom structural type conversion
patterns, similar to `populateCFStructuralTypeConversionsAndLegality`.
Add structural type conversion patterns for CF dialect ops. These
patterns are similar to the SCF structural type conversion patterns.
This commit adds missing functionality and is in preparation of #165180,
which changes the way blocks are converted. (Only entry blocks are
converted.)
Fix https://github.com/llvm/llvm-project/issues/157934. In liveness
analysis, variables that are not analyzed are set as dead variables, but
some variables are definitely live.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Add hoist-dynamic-allocs-option to buffer-results-to-out-params. This PR
adds support for obtaining the size of a dynamically shaped memref through
the caller-callee relationship.
In #153973 I added correct handling of block arguments. Unfortunately,
this was gated on operations that also have results. This wasn't
intentional, and it excluded operations like functions from being
correctly processed.
## Problem
`RemoveDeadValues` can legally drop dead function arguments on private
`func.func` callees. But call-sites to such functions aren't fixed if
the call operation keeps its call arguments in a **segmented operand
group** (i.e., uses `AttrSizedOperandSegments`), unless the call op
implements `getArgOperandsMutable` and the RDV pass actually uses it.
## Fix
When RDV decides to drop callee function args, it should, for each
call-site that implements `CallOpInterface`, **shrink the call's
argument segment** via `getArgOperandsMutable()` using the same dead-arg
indices. This keeps both the flat operand list and the
`operand_segment_sizes` attribute in sync (that's what
`MutableOperandRange` does when bound to the segment).
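The invariant being maintained is that the flat operand list and the per-segment sizes change together. The sketch below shows that bookkeeping on plain vectors; `CallSketch` and `eraseArgsInSegment` are hypothetical and only imitate what a `MutableOperandRange` bound to a segment is described as doing. Dead indices are assumed to be unique and relative to the start of the segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

struct CallSketch {
  std::vector<int> operands;        // flat operand list
  std::vector<size_t> segmentSizes; // models operand_segment_sizes
};

// Erases the given argument indices from one operand segment and shrinks the
// recorded size of that segment by the same amount.
void eraseArgsInSegment(CallSketch &call, size_t segment,
                        std::vector<size_t> deadArgs) {
  assert(segment < call.segmentSizes.size());
  size_t start =
      std::accumulate(call.segmentSizes.begin(),
                      call.segmentSizes.begin() + segment, size_t{0});
  // Erase from the back so that earlier indices stay valid.
  std::sort(deadArgs.begin(), deadArgs.end(), std::greater<size_t>());
  for (size_t index : deadArgs) {
    assert(index < call.segmentSizes[segment]);
    call.operands.erase(call.operands.begin() + start + index);
    --call.segmentSizes[segment];
  }
}
```

Erasing only from the flat list would leave the segment sizes summing to more than the number of operands, which is the kind of inconsistency the fix avoids.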
## Note
This change is a no-op for:
* call ops without segment operands (they still get their flat operands
erased via the generic path)
* call ops whose callee args weren't dropped (public, external,
non-`func.func`, unresolved symbol, etc)
* `llvm.call`/`llvm.invoke` (RDV doesn't drop `llvm.func` args)
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This commit generalizes `replaceUsesOfBlockArgument` to
`replaceAllUsesWith`. In rollback mode, the same restrictions keep
applying: a value cannot be replaced multiple times and a call to
`replaceAllUsesWith` will replace all current and future uses of the
`from` value.
`replaceAllUsesWith` is now fully supported and its behavior is
consistent with the remaining dialect conversion API. Before this
commit, `replaceAllUsesWith` was immediately reflected in the IR when
running in rollback mode. After this commit, `replaceAllUsesWith`
changes are materialized in a delayed fashion, at the end of the dialect
conversion. This is consistent with the `replaceUsesOfBlockArgument` and
`replaceOp` APIs.
`replaceAllUsesExcept` etc. are still not supported and will be
deactivated on the `ConversionPatternRewriter` (when running in rollback
mode) in a follow-up commit.
Note for LLVM integration: Replace `replaceUsesOfBlockArgument` with
`replaceAllUsesWith`. If you are seeing failures, you may have patterns
that use `replaceAllUsesWith` incorrectly (e.g., being called multiple
times on the same value) or bypass the rewriter API entirely. E.g., such
failures were mitigated in Flang by switching to the walk-patterns
driver (#156171).
You can temporarily reactivate the old behavior by calling
`RewriterBase::replaceAllUsesWith`. However, note that that behavior is
faulty in a dialect conversion. E.g., the base
`RewriterBase::replaceAllUsesWith` implementation does not see uses of
the `from` value that have not materialized yet and will, therefore, not
replace them.