279 Commits

Author SHA1 Message Date
Kazu Hirata
e3a3f78f35
[CodeGen] Use llvm::append_range (NFC) (#133603) 2025-03-29 16:53:02 -07:00
Jeffrey Byrnes
3963d21482
[MachineSink] Fix typo in loop sinking (#127133)
Failure to sink a candidate should not block us from attempting to sink
other candidates. There are mechanisms in place to handle the case where
the failed to be sunk instruction uses an instruction that gets sunk (we
do not delete the original instruction corresponding with the sunk
instruction if it still has uses).
2025-03-03 17:30:12 -08:00
Akshat Oke
77f44a9642
[CodeGen][NewPM] Port MachineSink to NPM (#115434)
Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for
the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)
2025-03-03 15:49:37 +05:30
Craig Topper
81a8b5c579 [MachineSink] Use Register and MCRegUnit. NFC 2025-03-01 21:57:04 -08:00
Kazu Hirata
9fecb4f907 [CodeGen] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable
  'Preheader' [-Werror,-Wunused-variable]
2025-01-23 19:37:28 -08:00
Jeffrey Byrnes
acb7859f07
[MachineSink] Extend loop sinking capability (#117247)
The current MIR cycle sinking capabilities are rather limited. It only
support sinking copies into a single successor block while obeying
limits.

This opt-in feature adds a more aggressive option, that is not limited
to the above concerns. The feature will try to "sink" by duplicating any
top-level preheader instruction (that we are sure is safe to sink) into
any user block, then does some dead code cleanup. In particular, this is
useful for high RP situations when loop bodies have control flow.
2025-01-23 17:08:23 -08:00
Kazu Hirata
fa9fb2ae94
[CodeGen] Avoid repeated hash lookups (NFC) (#123447) 2025-01-18 09:44:23 -08:00
Pengcheng Wang
b6ad231666
[MachineSink] Use RegisterClassInfo::getRegPressureSetLimit (#119830)
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.

Separate from https://github.com/llvm/llvm-project/pull/118787
2024-12-18 14:51:01 +08:00
paperchalice
1562b70eaf
Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See

6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
2024-12-13 11:43:09 +08:00
paperchalice
553058f825
Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)
Reverts llvm/llvm-project#115111 Causes #119511
2024-12-11 14:25:17 +08:00
paperchalice
79047fac65
[DomTreeUpdater] Move critical edge splitting code to updater (#115111)
Support critical edge splitting in dominator tree updater. Continue the
work in #100856.

Compile time check:
https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au
2024-12-11 11:31:42 +08:00
Philip Reames
6657d4bd70
[TTI][RISCV] Unconditionally break critical edges to sink ADDI (#108889)
This looks like a rather weird change, so let me explain why this isn't
as unreasonable as it looks. Let's start with the problem it's solving.

```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) { bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}
```

Right now, we codegen this as:

```
	li	a3, 1
	li	a2, 13
	bne	a1, a3, .LBB0_2
	lw	a2, 4(a0)
.LBB0_2:
	mv	a0, a2
	ret
```

In this example, we have two values which must be assigned to a0 per the
ABI (%arg, and the return value). SelectionDAG ensures that all values
used in a successor phi are defined before exit the predecessor block.
This creates an ADDI to materialize the immediate in the entry block.

Currently, this ADDI is not sunk into the tail block because we'd have
to split a critical edges to do so. Note that if our immediate was
anything large enough to require two instructions we *would* split this
critical edge.

Looking at other targets, we notice that they don't seem to have this
problem. They perform the sinking, and tail duplication that we don't.
Why? Well, it turns out for AArch64 that this is entirely an accident of
the existance of the gpr32all register class. The immediate is
materialized into the gpr32 class, and then copied into the gpr32all
register class. The existance of that copy puts us right back into the
two instruction case noted above.

This change essentially just bypasses this emergent behavior aspect of
the aarch64 behavior, and implements the same "always sink immediates"
behavior for RISCV as well.
2024-11-25 18:59:31 -08:00
Ellis Hoag
e72209db35
[MachineSink] Fix stable sort comparator (#116705)
Fix the comparator in `stable_sort()` to satisfy the strict weak
ordering requirement.

In https://github.com/llvm/llvm-project/pull/115367 this comparator was
changed to use `getCycleDepth()` when `shouldOptimizeForSize()` is true.
However, I mistakenly changed to logic so that we use `LHSFreq <
RHSFreq` if **either** of them are zero. This causes us to fail the last
requirment (https://en.cppreference.com/w/cpp/named_req/Compare).

> if comp(a, b) == true and comp(b, c) == true then comp(a, c) == true
2024-11-19 16:15:35 -08:00
Akshat Oke
43bef75fd6
[NFC][CodeGen] Clang format MachineSink.cpp (#114027)
Preparing to port this pass to new pass manager.
2024-11-14 18:48:35 +05:30
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Ellis Hoag
57c33acac8
[MachineSink] Sink into consistent blocks for optsize funcs (#115367)
Do not consider profile data when choosing a successor block to sink
into for optsize functions. This should result in more consistent
instruction sequences which will improve outlining and ICF. We've
observed a slight codesize improvement in a large binary. This is
similar reasoning to https://github.com/llvm/llvm-project/pull/114607.

Using profile data to select a block to sink into was original added in
d04f7596e7.
2024-11-12 09:53:27 -08:00
Ruiling, Song
e33e087a17
[MachineSink] Update register dependency correctly (#109763)
The accumulateUsedDefed() was missing if block prologue interference
check does not pass. This would cause incorrect register dependency,
which cause incorrect sinking.
2024-09-25 08:42:40 +08:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Kazu Hirata
83fc989a22
[CodeGen] Construct SmallVector with iterator ranges (NFC) (#105622) 2024-08-22 09:15:47 -07:00
Pengcheng Wang
ed4e75d5e5
[CodeGen] Remove AA parameter of isSafeToMove (#100691)
This `AA` parameter is not used and for most uses they just pass
a nullptr.

The use of `AA` was removed since 8d0383e.
2024-07-26 15:47:47 +08:00
Craig Topper
495d3ea989
[MachineSink][RISCV] Only call isConstantPhysReg or isIgnorableUse for uses. (#99363)
The included test case contains X0 as a def register. X0 is considered a
constant register when it is a use. When its a def, it means to throw
away the result value.

If we treat it as a constant register here, we will execute the continue
and not assign `DefReg` to any register. This will cause a crash when
trying to get the register class for `DefReg` after the loop.

By only checking isConstantPhysReg for uses, we will reach the `return
false` a little further down and stop processing this instruction.
2024-07-17 13:06:58 -07:00
yozhu
7b135f7c08
[MachineSink] Check predecessor/successor relationship between two basic blocks involved in critical edge splitting (#98540)
Fix an issue in #97618 - if the two basic blocks involved are not
predecessor / successor to each other, treat the candidate as illegal
for critical edge splitting.

Closes #98477 (checked in test copied from its comment).
2024-07-13 01:39:27 +02:00
paperchalice
099899961c
[CodeGen][NewPM] Port machine-block-freq to new pass manager (#98317)
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
2024-07-12 15:45:01 +08:00
YongKang Zhu
04c8c95c24 Revert "[MachineSink] Only add sink candidate if ToBB is a successor of fromBB"
This reverts commit 546c09018a615388a36bdf898649fffbd2df529f.
2024-07-11 11:14:33 -07:00
YongKang Zhu
546c09018a [MachineSink] Only add sink candidate if ToBB is a successor of fromBB 2024-07-11 11:12:53 -07:00
Min-Yih Hsu
7e2f96194f
[MachineSink] Fix missing sinks along critical edges (#97618)
4e0bd3f improved early MachineLICM's capabilities to hoist COPY from
physical registers out of a loop. However, it accidentally broke one of
MachineSink's preconditions on sinking cheap instructions (in this case,
COPY) which considered those instructions being profitable to sink only
when there are at least two of them in the same def-use chain in the
same basic block. So if early MachineLICM hoisted one of them out,
MachineSink no longer sink rest of the cheap instructions. This results
in redundant load immediate instructions from the motivating example
we've seen on RISC-V.

This patch fixes this by teaching MachineSink that if there is more than
one demand to sink a register into the same block from different
critical edges, it should be considered profitable as it increases the
CSE opportunities.
This change also improves two of the AArch64's cases.
2024-07-09 10:48:22 -07:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
paperchalice
d38b518e04
Reapply "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858) (#96869)
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156.
In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR
when targeting PowerPC. Move test to `llc/new-pm`, which is X86
specific.
2024-06-28 10:59:23 +08:00
paperchalice
ab58b6d58e
Revert "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858)
Reverts llvm/llvm-project#96389
Some ppc bots failed.
2024-06-27 15:00:17 +08:00
paperchalice
73e46c2bb4
[CodeGen][NewPM] Port machine-branch-prob to new pass manager (#96389)
Like IR version `print<branch-prob>`, there is also a
`print<machine-branch-prob>`.
2024-06-27 14:04:51 +08:00
Kazu Hirata
50e222fa27
[MachineSink] Use SmallDenseMap (NFC) (#95676)
The use of SmallDenseMap saves 0.39% of heap allocations during the
compilation of a large preprocessed file, namely X86ISelLowering.cpp,
for the X86 target.
2024-06-15 13:21:07 -07:00
paperchalice
4b24c2dfb5
[CodeGen][NewPM] Split MachinePostDominators into a concrete analysis result (#95113)
`MachinePostDominators` version of #94571.
2024-06-12 14:29:22 +08:00
paperchalice
837dc542b1
[CodeGen][NewPM] Split MachineDominatorTree into a concrete analysis result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
2024-06-11 21:27:14 +08:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Jay Foad
2df652a691
[CodeGen] Simplify updateLiveIn in MachineSink (#79831)
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes #78942
2024-02-15 10:39:05 +00:00
Momchil Velikov
d7ee99a4fc
[MachineSink] Clear kill flags of sunk addressing mode registers (#75072)
When doing sink-and-fold, the MachineSink clears the "killed" flags of
the operands of the sunk (and deleted) instruction. However, this is not
always sufficient. In some cases we can create the new load/store
instruction with operands other than the ones present in the deleted
instruction. One such example is folding a zero word extend into a
memory load on AArch64. The zero-extend is represented by a pair of
instructions - `MOV` (i.e. `ORRwrs`) followed by a `SUBREG_TO_REG`. The
`SUBREG_TO_REG` is deleted (it is the sunk instruction), but the new
load instruction mentions operands "killed" in the `MOV`, which is no
longer correct.

To fix this, clear the "killed" flags of the registers participating in
the addressing mode.
2023-12-13 09:15:28 +00:00
Momchil Velikov
6b87d84ff4
[MachineSink] Some more preserving of debug location when rematerialising an instruction to replace a COPY (#73155)
Somewhat similar to ef9bcace834e63f25bbbc5e8e2b615f89d85fb2f
([MachineSink][AArch64] Preserve debug location when rematerialising
an instruction to replace a COPY (#72685))
reuse the debug location of the COPY, iff the rematerialised instruction
did not have a location.

Fixes a regression in `DebugInfo/AArch64/constant-dbgloc.ll` after
enabling sink-and-fold.
2023-11-24 09:46:03 +00:00
Momchil Velikov
ef9bcace83
[MachineSink][AArch64] Preserve debug location when rematerialising an instruction to replace a COPY (#72685)
Fixes a regression in `tools/lldb-dap/optimized/TestDAP_optimized.py`
caused by enabling "sink-and-fold" in MachineSink.
2023-11-21 10:10:23 +00:00
Momchil Velikov
e8209b2486
[MachineSink] Drop debug info for instructions deleted by sink-and-fold (#71443)
After performing sink-and-fold over a COPY, the original instruction is
replaced with one that produces its output in the destination of the
copy. Its value is still available (in a hard register), so if there are
debug instructions which refer to the (now deleted) virtual register
they could be updated to refer to the hard register, in principle.
However, it's not clear how to do that, moreover in some cases the debug
instructions may need to be replicated proportionally to the number of
the COPY instructions replaced and in some extreme cases we can end up
with quadratic increase in the number of debug instructions, e.g:

        int f(int);
    
        void g(int x) {
          int y = x + 1;
    
          int t0 = y;
          f(t0);
    
          int t1 = y;
          f(t1);
        }
2023-11-11 19:43:14 +00:00
Momchil Velikov
2ceabf6bdc
[MachineSink] Reduce the number of unnecessary invalidations of StoreInstrCache (NFC) (#68676)
Don't invalidate the cache when erasing instructions which cannot ever
appear in the cache.
2023-10-12 10:06:19 +01:00
Momchil Velikov
86d9faa5a9
[MachineSink] Use LLVM ADTs (NFC) (#68677)
Replace a few uses of `std::map` with `llvm::DenseMap`.
2023-10-12 10:04:41 +01:00
Amara Emerson
7510f32f90 [MachineSink] Fix crash due to use-after-free in a MachineInstr* cache.
After the SinkAndFold optimization was enabled, we saw some crashes with
GISel due to SinkAndFold erasing an MI while a reference was being held in a
cache.
2023-10-06 15:02:39 -07:00
Petar Avramovic
2fa7d652d0 AMDGPU: Fix temporal divergence introduced by machine-sink (#67456)
Temporal divergence that was present in input or introduced in IR
transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies
by changing sgpr source instr to vgpr instr.
After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare,
machine-sinking can introduce temporal divergence by sinking
instructions outside of the cycle.
Add isSafeToSink callback in TargetInstrInfo.
2023-10-06 15:00:08 +02:00
Petar Avramovic
ccf68ab432 Revert "MachineSink: Fix sinking VGPR def out of a divergent loop"
This reverts commit 3f8ef57bede94445b1a1042c987cc914a886e7ff.
2023-10-06 15:00:08 +02:00
Momchil Velikov
b30765caf8
[AArch64] Fix an incorrect handling of debug values in MachineSink (#68107) 2023-10-04 10:11:47 +01:00
Momchil Velikov
b454b04d68
[AArch64] Fix a compiler crash in MachineSink (#67705)
There were a couple of issues with maintaining register def/uses held
in `MachineRegisterInfo`:

* when an operand is changed from one register to another, the
corresponding instruction must already be inserted into the function,
or MRI won't be updated

* when traversing the set of all uses of a register, that set must not
change
2023-09-29 09:29:20 +01:00
Momchil Velikov
c649fd34e9 [MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode
This patch adds a new code transformation to the `MachineSink` pass,
that tries to sink copies of an instruction, when the copies can be folded
into the addressing modes of load/store instructions, or
replace another instruction (currently, copies into a hard register).

The criteria for performing the transformation is that:
* the register pressure at the sink destination block must not
  exceed the register pressure limits
* the latency and throughput of the load/store or the copy must not deteriorate
* the original instruction must be deleted

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D152828
2023-09-25 10:49:44 +01:00
Jay Foad
6551cfa8eb [CodeGen] Set regunitmasks for leaf regs to all instead of none
This simplifies every use of MCRegUnitMaskIterator.

Differential Revision: https://reviews.llvm.org/D157864
2023-08-14 15:22:35 +01:00
Jon Roelofs
f9ebcb4814
Remove a reference to rdar://problem/8030636
The surrounding comment has more than enough context to describe the problem.
2023-08-09 17:27:09 -07:00
Danila Kutenin
49d41de578 MachineSink: Fix strict weak ordering in GetAllSortedSuccessors
CodeGen/X86/pseudo_cmov_lower2.ll fails using libc++ debug mode
(D150264) without this change.

Reviewed By: MaskRay, aeubanks

Differential Revision: https://reviews.llvm.org/D155811
2023-08-02 12:52:55 -07:00