328 Commits

Author SHA1 Message Date
Kazu Hirata
3e53d4d386
[llvm] Remove unused includes (NFC) (#150265)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-23 15:18:46 -07:00
Peter Collingbourne
8b9bbd9ed6
MachineLICM: Merge logic for implicit and explicit definitions.
Anatoly Trosinenko found that when hasSideEffect was set to 0 in the
definition of LOADgotAUTH, MultiSource/Benchmarks/Ptrdist/ks/ks test
from llvm-test-suite started to crash. The issue was traced down to
MachineLICM pass placing LOADgotAUTH right after an unrelated copy to
x16 like rewriting this code:

````
bb.0:
  renamable $x16 = COPY renamable $x12
  B %bb.1

bb.1:
  ...
  /* use $x16 */
  ...
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  /* use $x20 */
  ...
````

like the following:

````
bb.0:
  renamable $x16 = COPY renamable $x12
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  B %bb.1

bb.1:
  ...
  /* use $x16 */
  ...
  /* use $x20 */
  ...
```

The issue was caused by inconsistent logic between implicit and explicit
operand definitions, where the implicit side was incorrectly skipping
checking RUDefs for dead operands, leading to RuledOut not being set
for the X16 operand.

Because there isn't really a semantic difference between implicit and
explicit operands at this point, let's remove the isImplicit check and
adjust the logic to do the same thing in both cases:

- For implicit operands, we now check and update RUDefs in the same way
  as explicit operands.
- For explicit operands, we now allow dead operands to be skipped.

Reviewers: arsenm, s-barannikov, atrosinenko

Reviewed By: arsenm, s-barannikov

Pull Request: https://github.com/llvm/llvm-project/pull/147624
2025-07-09 13:02:15 -07:00
Guy David
a38cf85738
[MachineLICM] Let targets decide if copy instructions are cheap (#146599)
When checking whether it is profitable to hoist an instruction, the pass
may override a target's ruling because it assumes that all COPY
instructions are cheap, and that may not be the case for all
micro-architectures (especially for when copying between different
register classes).

On AArch64 there's 0% difference in performance in LLVM's test-suite
with this change. Additionally, very few tests were affected which shows
how it is not so useful to keep it.

x86 performance is slightly better (but maybe that's just noise) for an
A/B comparison consisting of five iterations on LLVM's test suite (Ryzen
5950X on Ubuntu):
```
$ ./utils/compare.py build-a/results* vs build-b/results* --lhs-name base --rhs-name patch --absolute-diff
Tests: 3341
Metric: exec_time

Program                                       exec_time                 
                                              base      patch     diff  
LoopVector...meChecks4PointersDBeforeA/1000   824613.68 825394.06 780.38
LoopVector...timeChecks4PointersDBeforeA/32    18763.60  19486.02 722.42
LCALS/Subs...test:BM_MAT_X_MAT_LAMBDA/44217    37109.92  37572.52 462.60
LoopVector...ntimeChecks4PointersDAfterA/32    14211.35  14562.14 350.79
LoopVector...timeChecks4PointersDEqualsA/32    14221.44  14562.85 341.40
LoopVector...intersAllDisjointIncreasing/32    14222.73  14562.20 339.47
LoopVector...intersAllDisjointDecreasing/32    14223.85  14563.17 339.32
LoopVector...nLoopFrom_uint32_t_To_uint8_t_      739.60    807.45  67.86
harris/har...est:BENCHMARK_HARRIS/2048/2048    15953.77  15998.94  45.17
LoopVector...nLoopFrom_uint8_t_To_uint16_t_      301.94    331.21  29.27
LCALS/Subs...Raw.test:BM_DISC_ORD_RAW/44217      616.35    637.13  20.78
LCALS/Subs...Raw.test:BM_MAT_X_MAT_RAW/5001     3814.95   3833.70  18.75
LCALS/Subs...Raw.test:BM_HYDRO_2D_RAW/44217      812.98    830.64  17.66
LCALS/Subs...test:BM_IMP_HYDRO_2D_RAW/44217      811.26    828.13  16.87
ImageProce...ENCHMARK_BILATERAL_FILTER/64/4      714.77    726.23  11.46
           exec_time                            
l/r             base          patch         diff
count  3341.000000    3341.000000    3341.000000
mean   903.866450     899.732349    -4.134101   
std    20635.900959   20565.289417   115.346928 
min    0.000000       0.000000      -3380.455787
25%    0.000000       0.000000       0.000000   
50%    0.000000       0.000000       0.000000   
75%    1.806500       1.836397       0.000100   
max    824613.680801  825394.062500  780.381699
```
2025-07-05 14:06:33 +03:00
Austin
a550fef906
[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)
Use llvm::fill instead of std::fill
2025-07-04 14:10:28 +08:00
Kazu Hirata
3bc174ba77
[CodeGen] Remove unused includes (NFC) (#141320)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 00:00:00 -07:00
Sergei Barannikov
2ae9a74bf1
[CodeGen] Use TRI::regunits() (NFC) (#137356) 2025-04-26 08:49:17 +03:00
Ulrich Weigand
be7ef6c52b
[MachineLICM] Recognize registers clobbered at EH landing pad entry (#122446)
EH landing pad entry implicitly clobbers target-specific exception
pointer and exception selector registers. The post-RA MachineLICM pass
needs to take these into account when deciding whether to hoist an
instruction out of the loop that initializes one of these registers.

Fixes: https://github.com/llvm/llvm-project/issues/122315
2025-04-25 22:27:27 +02:00
Sergei Barannikov
c3959f22ab
[MachineLICM] Remove CurPreheader parameter that is always nullptr (#135554)
Also, rename `getCurPreheader` -> `getOrCreatePreheader` to make it
clear that this method may alter CFG.

Update `Changed` if the method modified a loop by splitting a critical
edge (this change is not strictly NFC).

The removed parameter was probably intended to save compile time by not
trying to a critical edge after the first attempt has failed, but it is
only tried once per loop.

PR: https://github.com/llvm/llvm-project/pull/135554
2025-04-15 05:03:33 +03:00
Kazu Hirata
dc5178cc41
[CodeGen] Use llvm::append_range (NFC) (#135567) 2025-04-13 16:36:03 -07:00
Craig Topper
aaaaa4d256 [MachineLICM] Use Register. NFC 2025-03-02 22:33:26 -08:00
Daniel Paoliello
19032bfe87
[aarch64][win] Update Called Globals info when updating Call Site info (#122762)
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
2025-01-13 14:00:31 -08:00
Nikita Popov
eeac0ffaf4 Revert "[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)"
This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a.

This causes a large compile-time regression.
2025-01-10 09:05:06 +01:00
Pengcheng Wang
b4e17d4a31
[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.

Separate from https://github.com/llvm/llvm-project/pull/118787
2025-01-09 21:05:52 +08:00
paperchalice
1562b70eaf
Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See

6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
2024-12-13 11:43:09 +08:00
paperchalice
553058f825
Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)
Reverts llvm/llvm-project#115111 Causes #119511
2024-12-11 14:25:17 +08:00
paperchalice
79047fac65
[DomTreeUpdater] Move critical edge splitting code to updater (#115111)
Support critical edge splitting in dominator tree updater. Continue the
work in #100856.

Compile time check:
https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au
2024-12-11 11:31:42 +08:00
Florian Hahn
ef102b4a63
[MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987)
The improvements in 63917e1 / #70796 do not check for memory
barriers/unmodelled sideeffects, which means we may incorrectly hoist
loads across memory barriers.

Fix this by checking any machine instruction in the loop is a load-fold
barrier.

PR: https://github.com/llvm/llvm-project/pull/116987
2024-11-21 10:25:04 +00:00
abhishek-kaushik22
46f43b6d92
[DebugInfo][InstrRef][MIR][GlobalIsel][MachineLICM] NFC Use std::move to avoid copying (#116935) 2024-11-21 13:37:56 +05:30
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
paperchalice
fe63669282
[Instrumentation] Support MachineFunction in OptNoneInstrumentation (#115471)
Support `MachineFunction` in `OptNoneInstrumentation`, also add
`isRequired` to all necessary passes.
2024-11-09 16:50:11 +08:00
abhishek-kaushik22
6b64f36536
[NFC] Use std::move to avoid copy (#113080) 2024-11-05 14:42:53 +00:00
Gaëtan Bossu
a0c318938a
[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573)
Both are based on MachineLICMBase, and the functionality there is
"switched" based on a PreRegAlloc flag. This commit is simply about
trusting the original value of that flag, defined by the `MachineLICM`
and `EarlyMachineLICM` classes.

The `PreRegAlloc` flag used to be overwritten it based on MRI.isSSA(),
which is un-reliable due to how it is inferred by the MIRParser. I see
that we can now define isSSA in MIR (thanks @gargaroff ), meaning the
fix isn’t really needed anymore, but redefining that flag still feels
wrong.

Note that I'm looking into upstreaming more changes to MachineLICM, see
[the discourse
thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).
2024-10-25 11:19:22 -07:00
Kazu Hirata
db9e1fb3bc
[MachineLICM] Avoid repeated hash lookups (NFC) (#110452) 2024-09-30 06:49:04 -07:00
Jeremy Morse
056a3f4673 [NFC] Reapply 3f37c517f, SmallDenseMap speedups
This time with 100% more building unit tests. Original commit message follows.

[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)

If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
2024-09-26 10:49:29 +01:00
Jeremy Morse
817e742ba5 Revert "[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)"
This reverts commit 3f37c517fbc40531571f8b9f951a8610b4789cd6.

Lo and behold, I missed a unit test
2024-09-25 14:31:30 +01:00
Jeremy Morse
3f37c517fb
[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)
If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
2024-09-25 14:22:23 +01:00
Akshat Oke
d2d78e584b
[NewPM][CodeGen] Port MachineLICM to NPM (#107376) 2024-09-20 11:34:18 +05:30
Pengcheng Wang
ed4e75d5e5
[CodeGen] Remove AA parameter of isSafeToMove (#100691)
This `AA` parameter is not used and for most uses they just pass
a nullptr.

The use of `AA` was removed since 8d0383e.
2024-07-26 15:47:47 +08:00
Jon Roelofs
b1f263e4c2
[llvm][MachineLICM] Fix a comment typo. NFC 2024-07-24 13:03:46 -07:00
paperchalice
099899961c
[CodeGen][NewPM] Port machine-block-freq to new pass manager (#98317)
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
2024-07-12 15:45:01 +08:00
Nikita Popov
6a907699d8 Revert "[CodeGen] Remove applySplitCriticalEdges in MachineDominatorTree (#97055)"
This reverts commit c5e5088033fed170068d818c54af6862e449b545.

Causes large compile-time regressions.
2024-07-11 09:13:37 +02:00
paperchalice
c5e5088033
[CodeGen] Remove applySplitCriticalEdges in MachineDominatorTree (#97055)
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.

Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.
2024-07-11 11:08:05 +08:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
Pierre van Houtryve
f0897ea4bb
[MachineLICM] Work-around Incomplete RegUnits (#95926)
Reverts the behavior introduced by 770393b while keeping the refactored
code.

Fixes a miscompile on AArch64, at the cost of a small regression on
AMDGPU.
#96146 opened to investigate the issue
2024-06-20 10:59:00 +02:00
Jay Foad
457e895479
[CodeGen] Do not include $noreg in any regmask operands. NFCI. (#95775)
Saying that a call preserves $noreg seems weird and required a
workaround in MachineLICM.
2024-06-17 13:42:03 +01:00
Pierre van Houtryve
770393bb99
[MachineLICM] Correctly Apply Register Masks (#95746)
Fix regression introduced in d4b8b72
2024-06-17 13:42:00 +02:00
Pierre van Houtryve
864981d72b
[NFC][MachineLICM] Use SmallDenseSet instead of SmallSet (#95201)
All values are small so no reason to ever use SmallSet really. In large
programs we'll end up using std::set which is extremely slow compared to
DenseSet. This brings a decent speedup to the pass in large programs.
2024-06-12 11:34:54 +02:00
paperchalice
837dc542b1
[CodeGen][NewPM] Split MachineDominatorTree into a concrete analysis result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
2024-06-11 21:27:14 +08:00
Pierre van Houtryve
d4b8b7217f
[CodeGen][MachineLICM] Use RegUnits in HoistRegionPostRA (#94608)
Those BitVectors get expensive on targets like AMDGPU with thousands of
registers, and RegAliasIterator is also expensive.

We can move all liveness calculations to use RegUnits instead to speed
it up for targets where RegAliasIterator is expensive, like AMDGPU.
On targets where RegAliasIterator is cheap, this alternative can be a little more expensive, but I believe the tradeoff is worth it.
2024-06-11 14:27:35 +02:00
Pengcheng Wang
4e0bd3fab4
[MachineLICM] Hoist copies of constant physical register (#93285)
Previously, we just check if the source is a virtual register and
this prevents some potential hoists.

We can see some improvements in AArch64/RISCV tests.
2024-05-29 14:10:01 +08:00
Matt Arsenault
39e24bdd8e
MachineLICM: Allow hoisting REG_SEQUENCE (#90638) 2024-05-01 16:52:04 +02:00
Matt Arsenault
114a59d4d3 MachineLICM: Remove unnecessary isReg checks
COPY operands are always registers.
2024-04-30 17:44:45 +02:00
michaelselehov
56ad6d1939
[MachineLICM] Hoist COPY instruction only when user can be hoisted (#81735)
befa925acac8fd6a9266e introduced preliminary hoisting of COPY
instructions when the user of the COPY is inside the same loop. That
optimization appeared to be too aggressive and hoisted too many COPY's
greatly increasing register pressure causing performance regressions for
AMDGPU target.

This is intended to fix the regression by hoisting COPY instruction only
if either:
 - User of COPY can be hoisted (other args are invariant) 
 or
 - Hoisting COPY doesn't bring high register pressure
2024-02-27 12:31:29 +00:00
Igor Kirillov
839abdb0d2
[MachineLICM] Fix incorrect CSE on hoisted const load (#73007)
When hoisting an invariant load, we should not combine it with an
existing load through common subexpression elimination (CSE). This is
because there might be memory-changing instructions between the existing
load and the end of the block entering the loop.

Fixes https://github.com/llvm/llvm-project/issues/72855
2023-11-27 14:37:18 +00:00
Rin
befa925aca
[MachineLICM][AArch64] Hoist COPY instructions with other uses in the loop (#71403)
When there is a COPY instruction in the loop with other uses, we want to
hoist the COPY, which in turn leads to the users being hoisted as well.

Co-authored-by David Green : David.Green@arm.com
2023-11-20 10:01:04 +00:00
Igor Kirillov
63917e1975
[MachineLICM] Allow hoisting loads from invariant address (#70796)
Sometimes, loads can appear in a loop after the LICM pass is executed
the final time. For example, ExpandMemCmp pass creates loads in a loop,
and one of the operands may be an invariant address.
This patch extends the pre-regalloc stage MachineLICM by allowing to
hoist invariant loads from loops that don't have any stores or calls
and allows load reorderings.
2023-11-16 11:12:10 +00:00
Hendrik Greving
2600aaab21
Revert "[MachineLICM] Relax overlay conservative PHI check (#67186)" (#68580)
This reverts commit 71a8d2e3064fcb3ff76565e6e8529613f90aa51b.
2023-10-09 05:26:58 -07:00
Hendrik Greving
71a8d2e306
[MachineLICM] Relax overlay conservative PHI check (#67186)
Skip LICM if PHI belongs to the current loop, e.g. is in the
loop's header. This prevents LICM from bailing for CFGs like

L1:
  R = LoopInvariant // can be LICM'd
  BR L1
L2:
  PHI(R, ..)
  BR L2
2023-10-09 04:49:11 -07:00
Karl-Johan Karlsson
fa3a685926
[MachineLICM] Clear subregister kill flags (#67240)
When hosting a loop invariant instruction the resulting register must be
live in
all the basic blocks of the loop body and the killed flags of the
register must
be cleared.

Before this patch killed flags of subregister to a hoisted superregister
was not
cleared in the loop body.

This was found in an out of tree target, but the testcase
mlicm-stack-write-check.mir was modified to trigger the case.
2023-09-28 07:26:39 +02:00
Jingu Kang
ff68e43c81 [MachineLICM] Handle Subloops
It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500.

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-26 14:25:11 +01:00