204 Commits

Author SHA1 Message Date
Kazu Hirata
aa613777af
[llvm] Remove redundant control flow (NFC) (#138304) 2025-05-02 10:34:25 -07:00
Matt Arsenault
c3c97eab12
PeepholeOpt: Do not skip reg_sequence sources with subregs (#125667)
Contrary to the comment, this particular code is not responsible
for handling any composes that may be required, and unhandled cases
are already rejected later. Lift this restriction to permit composes
and reg_sequence subregisters later.
2025-03-13 21:49:16 +07:00
Matt Arsenault
c22db56d77
PeepholeOpt: Remove subreg def check for bitcast (#130086)
Subregister defs are illegal in SSA. Surprisingly this enables folding
into subregister insert patterns in one test.
2025-03-07 07:44:08 +07:00
Matt Arsenault
a6e69db52f
PeepholeOpt: Remove subreg def check for insert_subreg (#130085) 2025-03-07 07:40:51 +07:00
Matt Arsenault
f0c2c71d2c
PeepholeOpt: Remove dead checks for subregister def mismatch (#130084) 2025-03-07 07:31:33 +07:00
Matt Arsenault
83ccab35d4
PeepholeOpt: Remove pointless check for subregister def (#128850)
Subregister defs are illegal in SSA
2025-02-26 20:40:06 +07:00
Matt Arsenault
c53eb93dd7
PeepholeOpt: Immediately check if a reg_sequence compose supports a subregister (#128279)
This is a quick fix for EXPENSIVE_CHECKS bot failures. I still think we
could
defer looking for a compatible subregister further up the use-def chain,
and
should be able to check compatibilty with the ultimate found source.
2025-02-26 10:15:17 +07:00
Matt Arsenault
1bb43068f1
PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052)
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle
composing subregister extracts through reg_sequence.
2025-02-22 09:16:14 +07:00
Matt Arsenault
ed38d6702f
PeepholeOpt: Handle subregister compose when looking through reg_sequence (#127051)
Previously this would give up on folding subregister copies through
a reg_sequence if the input operand already had a subregister index.
d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these
subregister uses, and this is the first step to lifting that restriction.

I was expecting to be able to implement this only purely with compose /
reverse compose, but I wasn't able to make it work so relies on testing
the lanemasks for whether the copy reads a subset of the input.
2025-02-18 08:07:29 +07:00
Matt Arsenault
58a88001f3
PeepholeOpt: Fix looking for def of current copy to coalesce (#125533)
This fixes the handling of subregister extract copies. This
will allow AMDGPU to remove its implementation of
shouldRewriteCopySrc, which exists as a 10 year old workaround
to this bug. peephole-opt-fold-reg-sequence-subreg.mir will
show the expected improvement once the custom implementation
is removed.

The copy coalescing processing here is overly abstracted
from what's actually happening. Previously when visiting
coalescable copy-like instructions, we would parse the
sources one at a time and then pass the def of the root
instruction into findNextSource. This means that the
first thing the new ValueTracker constructed would do
is getVRegDef to find the instruction we are currently
processing. This adds an unnecessary step, placing
a useless entry in the RewriteMap, and required skipping
the no-op case where getNewSource would return the original
source operand. This was a problem since in the case
of a subregister extract, shouldRewriteCopySource would always
say that it is useful to rewrite and the use-def chain walk
would abort, returning the original operand. Move the process
to start looking at the source operand to begin with.

This does not fix the confused handling in the uncoalescable
copy case which is proving to be more difficult. Some currently
handled cases have multiple defs from a single source, and other
handled cases have 0 input operands. It would be simpler if
this was implemented with isCopyLikeInstr, rather than guessing
at the operand structure as it does now.

There are some improvements and some regressions. The
regressions appear to be downstream issues for the most part. One
of the uglier regressions is in PPC, where a sequence of insert_subrgs
is used to build registers. I opened #125502 to use reg_sequence instead,
which may help.

The worst regression is an absurd SPARC testcase using a <251 x fp128>,
which uses a very long chain of insert_subregs.

We need improved subregister handling locally in PeepholeOptimizer,
and other pasess like MachineCSE to fix some of the other regressions.
We should handle subregister composes and folding more indexes
into insert_subreg and reg_sequence.
2025-02-05 23:29:02 +07:00
Matt Arsenault
0e49c74f36 PeepholeOpt: Make copy ID methods static 2025-02-03 22:44:56 +07:00
Matt Arsenault
1a314b2472 PeepholeOpt: Fix copy current source index accounting bug
We were essentially using the current source index as a binary
value, and didn't actually use it for indexing so it did not
matter. Use the operand to ensure the value is actually correct.
2025-01-31 21:17:20 +07:00
Matt Arsenault
fe1f6b4855
PeepholeOpt: Avoid double map lookup (#124531) 2025-01-30 20:59:58 +07:00
Matt Arsenault
83ca720ef2
PeepholeOpt: Remove check for reg_sequence def of subregister (#124512)
The verifier does not allow reg_sequence to have subregister defs,
even if undef.
2025-01-30 20:55:48 +07:00
Matt Arsenault
8d506b9a5b
PeepholeOpt: Simplify tracking of current op for copy and reg_sequence (#124224)
Set the starting index in the constructor instead of treating
0 as a special case. There should also be no need for bounds
checking in the rewrite.
2025-01-30 20:49:43 +07:00
Matt Arsenault
d246cc618a
PeepholeOpt: Do not add subregister indexes to reg_sequence operands (#124111)
Given the rest of the pass just gives up when it needs to compose
subregisters, folding a subregister extract directly into a reg_sequence
is counterproductive. Later fold attempts in the function will give up
on the subregister operand, preventing looking up through the reg_sequence.

It may still be profitable to do these folds if we start handling
the composes. There are some test regressions, but this mostly
looks better.
2025-01-30 20:42:02 +07:00
Matt Arsenault
15c2d4baf1
PeepholeOpt: Remove check for subreg index on a def operand (#123943)
This is looking at operand 0 of a REG_SEQUENCE, which can never
have a subregister index.
2025-01-23 09:06:26 +07:00
Matt Arsenault
2646e2d487
PeepholeOpt: Stop allocating tiny helper classes (NFC) (#123936)
This was allocating tiny helper classes for every instruction
visited. We can just dispatch over the cases in the visitor
function instead.
2025-01-23 09:00:08 +07:00
Matt Arsenault
6f69adeed6
PeepholeOpt: Remove null TargetRegisterInfo check (#123933)
This cannot happen. Also simplify the LaneBitmask check from !none
to any.
2025-01-23 08:57:04 +07:00
Matt Arsenault
23d2a1862a
PeepholeOpt: Remove unnecessary check for null TargetInstrInfo (#123929)
This can never happen.
2025-01-23 08:46:59 +07:00
Daniel Paoliello
19032bfe87
[aarch64][win] Update Called Globals info when updating Call Site info (#122762)
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
2025-01-13 14:00:31 -08:00
Akshat Oke
3f9d02aae8
[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326)
With this, all machine SSA optimization passes are available in the new codegen pipeline.
2024-11-18 11:02:01 +05:30
Akshat Oke
00aa08119a
[NFC] Clang format PeepholeOptimizer (#116325) 2024-11-18 10:58:48 +05:30
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
Kazu Hirata
dae061f1b2
[CodeGen] Use range-based for loops (NFC) (#96777) 2024-06-26 16:49:00 -07:00
paperchalice
837dc542b1
[CodeGen][NewPM] Split MachineDominatorTree into a concrete analysis result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
2024-06-11 21:27:14 +08:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Shengchen Kan
550f0eb2ce [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 2024-01-26 20:50:58 +08:00
Guozhi Wei
9a091de7fe [X86, Peephole] Enable FoldImmediate for X86
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.

Also enhanced peephole by deleting identical instructions after FoldImmediate.

Differential Revision: https://reviews.llvm.org/D151848
2023-10-27 19:47:23 +00:00
Mogball
3fb5b18e81 Revert 24633ea and 760e7d0 "Enable FoldImmediate for X86"
This reverts commits 24633eac38d46cd4b253ba53258165ee08d886cd
and 760e7d00d142ba85fcf48c00e0acc14a355da7c3.

I have confirmed that these commits are introducing a new crash in the
peephole optimizer. I have minimized a test case, which you can find
below.

```llvmir
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "/mnt/big/modular/Kernels/mojo/Mogg/MOGG.mojo"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare dso_local void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } }, { ptr }, { ptr, i64, i8 })

define dso_local void @bad_fn(ptr %0, ptr %1, ptr %2) {
  %4 = load i64, ptr null, align 8
  %5 = insertvalue [4 x i64] poison, i64 12, 1
  %6 = insertvalue [4 x i64] %5, i64 poison, 2
  %7 = insertvalue [4 x i64] %6, i64 poison, 3
  %8 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %7, 1
  %9 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %8, [4 x i64] poison, 2
  %10 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %9, i1 poison, 3
  %11 = icmp ne i64 %4, 1
  %12 = or i1 false, %11
  %13 = select i1 %12, i64 %4, i64 0
  %14 = zext i1 %12 to i64
  %15 = insertvalue [4 x i64] poison, i64 12, 1
  %16 = insertvalue [4 x i64] %15, i64 poison, 2
  %17 = insertvalue [4 x i64] %16, i64 %13, 3
  %18 = insertvalue [4 x i64] poison, i64 %14, 3
  %19 = icmp eq i64 0, 0
  %20 = icmp eq i64 0, 0
  %21 = icmp eq i64 %13, 0
  %22 = and i1 %20, %19
  %23 = select i1 %22, i1 %21, i1 false
  %24 = select i1 %23, i1 %12, i1 false
  %25 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %17, 1
  %26 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %25, [4 x i64] %18, 2
  %27 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %26, i1 %24, 3
  %28 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } undef, { ptr, [4 x i64], [4 x i64], i1 } %10, 0
  %29 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %28, { ptr, [4 x i64], [4 x i64], i1 } %27, 1
  br label %31

30:                                               ; preds = %3
  br label %softmax_pass

31:                                               ; preds = %31
  %exitcond.not.i = icmp eq i64 poison, 3
  br i1 %exitcond.not.i, label %37, label %31

32:                                               ; preds = %31
  br i1 poison, label %34, label %33

33:                                               ; preds = %32
  br label %34

34:                                               ; preds = %33, %32
  br i1 poison, label %35, label %36

35:                                               ; preds = %34
  br label %softmax_pass

36:                                               ; preds = %34
  br i1 poison, label %37, label %.critedge.i

37:                                               ; preds = %36
  br i1 poison, label %38, label %.critedge.i

38:                                               ; preds = %37
  br i1 poison, label %40, label %39

39:                                               ; preds = %38
  br label %40

40:                                               ; preds = %39, %38
  br i1 poison, label %.lr.ph28.i, label %._crit_edge.i

.lr.ph28.i:                                       ; preds = %40
  br label %41

41:                                               ; preds = %51, %.lr.ph28.i
  br i1 poison, label %.thread, label %42

42:                                               ; preds = %41
  br i1 poison, label %43, label %44

43:                                               ; preds = %42
  br label %45

44:                                               ; preds = %42
  br label %45

45:                                               ; preds = %44, %43
  br i1 poison, label %46, label %.thread

46:                                               ; preds = %45
  br label %47

.thread:                                          ; preds = %45, %41
  br label %47

47:                                               ; preds = %.thread, %46
  br i1 poison, label %51, label %48

48:                                               ; preds = %47
  br i1 poison, label %49, label %50

49:                                               ; preds = %48
  br label %51

50:                                               ; preds = %48
  br label %51

51:                                               ; preds = %50, %49, %47
  call void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %29, { ptr } poison, { ptr, i64, i8 } poison)
  br i1 poison, label %._crit_edge.i, label %41

._crit_edge.i:                                    ; preds = %51, %40
  br label %softmax_pass

.critedge.i:                                      ; preds = %37, %36
  br i1 poison, label %.lr.ph.i, label %softmax_pass

.lr.ph.i:                                         ; preds = %.lr.ph.i, %.critedge.i
  store { ptr, [4 x i64], [4 x i64], i1 } %10, ptr poison, align 8
  br i1 poison, label %.lr.ph.i, label %softmax_pass

softmax_pass:                                     ; preds = %.lr.ph.i, %.critedge.i, %._crit_edge.i, %35, %30
  ret void
}
```
2023-10-24 07:08:38 +00:00
weiguozhi
24633eac38
[Peephole] Check instructions from CopyMIs are still COPY (#69511)
Function foldRedundantCopy records COPY instructions in CopyMIs and uses
it later. But other optimizations may delete or modify it. So before
using it we should check if the extracted instruction is existing and
still a COPY instruction.
2023-10-20 08:34:43 -07:00
Guozhi Wei
760e7d00d1 [X86, Peephole] Enable FoldImmediate for X86
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.

Also enhanced peephole by deleting identical instructions after FoldImmediate.

Differential Revision: https://reviews.llvm.org/D151848
2023-10-17 16:22:42 +00:00
Akshay Khadse
8bf7f86d79 Fix uninitialized pointer members in CodeGen
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148303
2023-04-17 16:32:46 +08:00
Craig Topper
e72ca520bb [CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D141715
2023-01-13 14:38:08 -08:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Philip Reames
7dbf2e7b57 Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV)
The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers.  On RISCV, this helps eliminate redundant CSR reads from VLENB.

Differential Revision: https://reviews.llvm.org/D125564
2022-05-16 16:38:30 -07:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Kazu Hirata
3a8c51480f [CodeGen] Use = default (NFC)
Identified with modernize-use-equals-default
2022-02-06 10:54:44 -08:00
Nikita Popov
0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Ahsan Saghir
31ef15e044 Teach peephole optimizer to not emit sub-register defs
Peephole optimizer should not be introducing sub-reg definitions
as they are illegal in machine SSA phase. This patch modifies
the optimizer to not emit sub-register definitions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103408
2021-06-28 09:24:07 -05:00
Barry Revzin
92310454bf Make LLVM build in C++20 mode
Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).

There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:

// 1) Missing const
namespace missing_const {
    struct A {
    #ifndef FIXED
        bool operator==(A const&);
    #else
        bool operator==(A const&) const;
    #endif
    };

    bool a = A{} == A{}; // error
}

// 2) Type mismatch on CRTP
namespace crtp_mismatch {
    template <typename Derived>
    struct Base {
    #ifndef FIXED
        bool operator==(Derived const&) const;
    #else
        // in one case changed to taking Base const&
        friend bool operator==(Derived const&, Derived const&);
    #endif
    };

    struct D : Base<D> { };

    bool b = D{} == D{}; // error
}

// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
    template <bool Const>
    struct iterator {
        using const_iterator = iterator<true>;

        iterator();

        template <bool B, std::enable_if_t<(Const && !B), int> = 0>
        iterator(iterator<B> const&);

    #ifndef FIXED
        bool operator==(const_iterator const&) const;
    #else
        friend bool operator==(iterator const&, iterator const&);
    #endif
    };

    bool c = iterator<false>{} == iterator<false>{} // error
          || iterator<false>{} == iterator<true>{}
          || iterator<true>{} == iterator<false>{}
          || iterator<true>{} == iterator<true>{};
}

// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
    enum Color { Red };

    struct C {
        C();
        C(Color);
        operator Color() const;
        bool operator==(Color) const;
        friend bool operator==(C, C);
    };

    bool c = C{} == C{}; // error
    bool d = C{} == Red;
}

Differential revision: https://reviews.llvm.org/D78938
2020-12-17 10:44:10 +00:00
Alexandre Ganea
4b64ce7428 Improve 723fea23079f9c85800e5cdc90a75414af182bfd - Silence 'warning: unused variable' when compiling with Clang 10.0 2020-09-24 09:07:22 -04:00
Alexandre Ganea
723fea2307 Silence 'warning: unused variable' when compiling with Clang 10.0 2020-09-22 12:17:40 -04:00
Michael Liao
534f6e1718 [PeepholeOptimizer] Enhance the redundant COPY elimination.
- Eliminate redundant COPYs from the same register & subregister pair.

Differential Revision: https://reviews.llvm.org/D87939
2020-09-22 10:11:37 -04:00
Matt Arsenault
e21bb31eb6 CodeGen: Require SSA to run PeepholeOptimizer 2020-09-11 18:03:04 -04:00
Jay Foad
4aaf772542 [PeepholeOptimizer] Remove dead code
At this point we have already ruled out all def operands, so we can't
possibly see a dead implicit def operand.
2020-08-20 16:48:57 +01:00
Matt Arsenault
f9c279b057 PeepholeOptimizer: Use Register 2020-08-10 08:49:36 -04:00