110 Commits

Author SHA1 Message Date
dianqk
66f158d918
[TailDuplicator] Determine if computed gotos using blockaddress (#132536)
Using `blockaddress` should be more reliable than determining if an
operand comes from a jump table index.

Alternative: Add the `MachineInstr::MIFlag::ComputedGoto` flag when
lowering `indirectbr`. But I don't think this approach is suitable to
backport.
2025-03-26 21:27:43 +08:00
Kazu Hirata
1b189cab5e
[llvm] Use *Set::insert_range (NFC) (#132509)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
2025-03-22 08:07:33 -07:00
Kazu Hirata
599005686a
[llvm] Use *Set::insert_range (NFC) (#132325)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 22:24:06 -07:00
DianQK
dd21aacd76
[TailDuplicator] Do not restrict the computed gotos (#114990)
Fixes #106846.

This is what I learned from GCC. I found that GCC does not duplicate the
BB that has indirect jumps with the jump table. I believe GCC has
provided a clear explanation here:

> Duplicate the blocks containing computed gotos. This basically
unfactors computed gotos that were factored early on in the compilation
process to speed up edge based data flow. We used to not unfactor them
again, which can seriously pessimize code with many computed jumps in
the source code, such as interpreters.
2025-03-10 19:34:07 +08:00
Craig Topper
1d9207fda0 [TailDuplicator] Use Register. NFC 2025-03-02 12:44:26 -08:00
Florian Hahn
0d0190815d
[TailDup] Allow large number of predecessors/successors without phis. (#116072)
This adjusts the threshold logic added in #78582 to only trigger for
cases where there are actually phis to duplicate in either TailBB or in
one of the successors.

In cases there are no phis, we only have to pay the cost of extra edges,
but have no explosion in PHI related instructions.

This improves performance of Python on some inputs by 2-3% on Apple
Silicon CPUs.

PR: https://github.com/llvm/llvm-project/pull/116072
2025-01-23 18:24:20 +00:00
Daniel Paoliello
19032bfe87
[aarch64][win] Update Called Globals info when updating Call Site info (#122762)
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
2025-01-13 14:00:31 -08:00
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Ellis Hoag
6ab26eab4f
Check hasOptSize() in shouldOptimizeForSize() (#112626) 2024-10-28 09:45:03 -07:00
Matt Arsenault
0f0cfcff2c
CodeGen: Avoid some references to MachineFunction's getMMI (#99652)
MachineFunction's probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
2024-07-19 22:09:05 +04:00
Kazu Hirata
559ea40d9a
[CodeGen] Use range-based for loops (NFC) (#96855) 2024-06-27 11:00:27 -07:00
Kazu Hirata
dae061f1b2
[CodeGen] Use range-based for loops (NFC) (#96777) 2024-06-26 16:49:00 -07:00
Quentin Dian
86a78284e7
[TailDuplicator] Add maximum predecessors and successors to consider tail duplicating blocks (#78582)
Fixes #78578.

Duplicating a BB which has both multiple predecessors and successors
will result in a complex CFG and also may cause huge amount of PHI
nodes. See
https://github.com/llvm/llvm-project/issues/78578#issuecomment-1962363580
for a detailed description of the limit.
2024-04-17 20:27:09 +08:00
Jay Foad
4b2381a5f0 [CodeGen] Make use of MachineBasicBlock::phis. NFC. 2023-05-02 13:39:01 +01:00
Mikael Holmen
f7bee65728 [TailDuplicator] Don't constrain register classes due to debug instructions
If cloning a DBG_VALUE instruction, register uses in that instruction could
lead to constraining of a virtual register that would not happen if the
DBG_VALUE was not present at all. This lead to different code with/without
debug info.

Now we only do that register class constraining if we dealing with a non
debug instruction.

Differential Revision: https://reviews.llvm.org/D149146
2023-04-26 08:17:42 +02:00
Bjorn Pettersson
df947febe2 [TailDuplicator] Fix old bugs in TailDuplicator::duplicateInstruction
This patch is updating TailDuplicator::duplicateInstruction to fix
some old bugs that has been found with an out-of-tree target. There
are three different things being addressed:

1) In one situation two subregister indices are combined using the
   composeSubRegIndices helper. But the order in which those indices
   are combined has been incorrect. For this problem I managed to
   create some kind of reproducer using AArch64 (see the test case
   touched in this patch).

2) Another fault was found in the else branch for the above situation.
   Here we do not compose the two subregisters, instead we insert a
   COPY to replace the PHI, and then the subreg index in the using
   MO remains. Thus, the virtual register created for the COPY should
   always match with the size of the original register. Therefore the
   optimization that "constrain" (or rather relax) the register
   class by looking at the instruction desc must be limited to the
   situation when there is no subregister access. Otherwise we create
   a vreg with the wrong class.

3) Last problem addressed in this patch is that when a new register
   class is picked by looking at the instruction desc, then it
   isn't guaranteed that the isAllocatable property is set for that
   class. So one need to use the getAllocatableClass helper to find
   a subclass that is allocatable before using createVirualRegister,
   or alternatively (as in this patch) just use the OrigRC instead
   of relaxing the register class for the COPY destination.

Haven't been able to find any in-tree reproducers for problem 2 and 3.
The tricky part is to find a target that has register hierarchies that
match with the problem to trigger those code paths (and with subreg
accesses involved).

Differential Revision: https://reviews.llvm.org/D140496
2023-02-06 19:21:23 +01:00
Craig Topper
e72ca520bb [CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D141715
2023-01-13 14:38:08 -08:00
Tim Northover
5b24d42106 TailDuplication: do not remove trivial PHIs from addr-taken blocks.
Unlike an anonymous block, it will not be removed even though we've resolved
all valid paths to get here. So removing a PHI can leave vregs with no
definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.
2023-01-09 11:12:33 +00:00
Nick Desaulniers
d7474bef77 [llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets
This fixes a crash observed after
https://reviews.llvm.org/D129997.

Similar to D88823.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130127
2022-08-31 13:07:10 -07:00
Mircea Trofin
4146c1756d [nfc] Remove unused parameter in TailDuplicator::duplicateSimpleBB
Differential Revision: https://reviews.llvm.org/D131008
2022-08-02 13:39:34 -07:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Momchil Velikov
b4ad28da19 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CGA
aware) nature of the unwind tables.

Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints taht LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, or
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-11 13:27:26 +01:00
Muhammad Omair Javaid
0320115c16 Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd223a8e628367144b8180117950bb364.

This commit had failing tests with clang crashing across various
AArch64/Linux buildots.

https://lab.llvm.org/buildbot/#/builders/179/builds/3346

Differential Revision: https://reviews.llvm.org/D114545
2022-04-05 13:12:30 +05:00
Momchil Velikov
980c3e6dd2 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.

Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, or
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-04 14:38:22 +01:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Kazu Hirata
630c847b1b [llvm] Use range-based for loops (NFC) 2021-12-07 09:17:03 -08:00
Stephen Tozer
98a021fcbf [DebugInfo] Attempt to preserve more information during tail duplication
Prior to this patch, tail duplication handled debug info poorly -
specifically, debug instructions would be dropped instead of being set
undef, potentially extending the lifetimes of prior debug values that
should be killed. The pass was also very aggressive with dropping debug
info, dropping debug info even when the SSA value it referred to was
still present. This patch attempts to handle debug info more carefully,
checking to see whether each affected debug value can still be live,
setting it undef if not.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D106875
2021-12-03 15:30:05 +00:00
Jun Ma
17eb6b61de Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches"
This reverts commit 1f9fa549841a2ec55aa5a131bfaf83f0383c4713.
2021-11-24 10:26:37 +08:00
Kazu Hirata
7f00806a6a [llvm] Use make_early_inc_range (NFC) 2021-11-15 21:28:46 -08:00
Jun Ma
1f9fa54984 [Taildup] Don't tail-duplicate loop header with multiple successors as its latches
when Taildup hit loop with multiple latches like:
  //    1 -> 2 <-> 3                 |
  //          \  <-> 4               |
  //           \   <-> 5             |
  //            \---> rest           |
it may transform this loop into multiple loops by duplicate loop header.
However, this change may has little benefit while makes cfg much complex.
In some uncommon cases, it causes large compile time regression (offered by
@alexfh in D106056).

This patch disable tail-duplicate of such cases.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D110613
2021-11-01 15:32:00 +08:00
Neubauer, Sebastian
c78640ee6a [TailDuplicator] Fix merging block with terminator
The TailDuplicator merged two blocks, even if the first one ended with
a terminator, resulting in invalid MIR, where a terminator is in the
middle of a block.

Abort merging if the first block ends with a terminator.

Differential Revision: https://reviews.llvm.org/D112226
2021-10-29 10:52:46 +02:00
Kazu Hirata
e2febc2ed4 [llvm] Use drop_begin (NFC) 2021-09-17 09:16:40 -07:00
Kazu Hirata
cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Stephen Tozer
31e7551217 [DebugInfo] Correctly update debug users of SSA values in tail duplication
During tail duplication, SSA values may be updated and have their uses
replaced with a virtual register, and any debug instructions that use
that value are deleted. This patch fixes the implementation of the debug
instruction deletion to work correctly for debug instructions that use
the SSA value multiple times, by batching deletions so that we don't
attempt to delete the same instruction twice.

Differential Revision: https://reviews.llvm.org/D106557
2021-07-26 17:27:57 +01:00
Hongtao Yu
bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Hongtao Yu
8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Kazu Hirata
a205fa5cd9 [CodeGen] Use range-based for loops (NFC) 2021-02-19 22:44:14 -08:00
Kazu Hirata
7bc76fd0ec [CodeGen] Construct SmallVector with iterator ranges (NFC) 2020-12-31 09:39:11 -08:00
Bill Wendling
d2c61d2bf9 [CodeGen][TailDuplicator] Don't duplicate blocks with INLINEASM_BR
Tail duplication of a block with an INLINEASM_BR may result in a PHI
node on the indirect branch. This is okay, but it also introduces a copy
for that PHI node *after* the INLINEASM_BR, which is not okay.

See: https://github.com/ClangBuiltLinux/linux/issues/1125

Differential Revision: https://reviews.llvm.org/D88823
2020-10-06 18:44:59 -07:00
James Y Knight
4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Matt Arsenault
edb4a5cb36 TailDuplicator: Use Register 2020-06-30 12:13:08 -04:00
James Y Knight
1978309db1 MachineBasicBlock::updateTerminator now requires an explicit layout successor.
Previously, it tried to infer the correct destination block from the
successor list, but this is a rather tricky propspect, given the
existence of successors that occur mid-block, such as invoke, and
potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in
particular would be problematic, because its successor blocks are not
distinct from "normal" successors, as EHPads are.)

Instead, require the caller to pass in the expected fallthrough
successor explicitly. In most callers, the correct block is
immediately clear. But, in MachineBlockPlacement, we do need to record
the original ordering, before starting to reorder blocks.

Unfortunately, the goal of decoupling the behavior of end-of-block
jumps from the successor list has not been fully accomplished in this
patch, as there is currently no other way to determine whether a block
is intended to fall-through, or end as unreachable. Further work is
needed there.

Differential Revision: https://reviews.llvm.org/D79605
2020-06-06 22:30:51 -04:00
Djordje Todorovic
016d91ccbd [CallSiteInfo] Handle bundles when updating call site info
This will address the issue: P8198 and P8199 (from D73534).

The methods was not handle bundles properly.

Differential Revision: https://reviews.llvm.org/D74904
2020-02-27 13:57:06 +01:00
Djordje Todorovic
a5ac8ca3e0 [CSInfo][TailDuplicator] Delete the call site info when removing dead MBBs
This is needed for the debug entry values feature.

Differential Revision: https://reviews.llvm.org/D74702
2020-02-18 12:29:51 +01:00
Guozhi Wei
369d086d78 [MBP] Partial tail duplication into hot predecessors
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.

This patch improves tail duplication in 3 aspects:

  A successor can be duplicated into part of its predecessors.
  A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
  If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.

Differential Revision: https://reviews.llvm.org/D73387
2020-02-12 15:22:33 -08:00
Hiroshi Yamauchi
ac8da31a0f [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Stanislav Mekhanoshin
8b417dd3d6 Process BUNDLE in tail duplication
When tail duplication estimates a size of tail it uses instruction
count. Account for a number of instrictions in a bundle too.

Differential Revision: https://reviews.llvm.org/D72783
2020-01-15 15:46:57 -08:00