llvm-project

Author	SHA1	Message	Date
Kazu Hirata	aa613777af	[llvm] Remove redundant control flow (NFC) (#138304 )	2025-05-02 10:34:25 -07:00
Matt Arsenault	c3c97eab12	PeepholeOpt: Do not skip reg_sequence sources with subregs (#125667 ) Contrary to the comment, this particular code is not responsible for handling any composes that may be required, and unhandled cases are already rejected later. Lift this restriction to permit composes and reg_sequence subregisters later.	2025-03-13 21:49:16 +07:00
Matt Arsenault	c22db56d77	PeepholeOpt: Remove subreg def check for bitcast (#130086 ) Subregister defs are illegal in SSA. Surprisingly this enables folding into subregister insert patterns in one test.	2025-03-07 07:44:08 +07:00
Matt Arsenault	a6e69db52f	PeepholeOpt: Remove subreg def check for insert_subreg (#130085 )	2025-03-07 07:40:51 +07:00
Matt Arsenault	f0c2c71d2c	PeepholeOpt: Remove dead checks for subregister def mismatch (#130084 )	2025-03-07 07:31:33 +07:00
Matt Arsenault	83ccab35d4	PeepholeOpt: Remove pointless check for subregister def (#128850 ) Subregister defs are illegal in SSA	2025-02-26 20:40:06 +07:00
Matt Arsenault	c53eb93dd7	PeepholeOpt: Immediately check if a reg_sequence compose supports a subregister (#128279 ) This is a quick fix for EXPENSIVE_CHECKS bot failures. I still think we could defer looking for a compatible subregister further up the use-def chain, and should be able to check compatibility with the ultimate found source.	2025-02-26 10:15:17 +07:00
Matt Arsenault	1bb43068f1	PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052 ) This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle composing subregister extracts through reg_sequence.	2025-02-22 09:16:14 +07:00
Matt Arsenault	ed38d6702f	PeepholeOpt: Handle subregister compose when looking through reg_sequence (#127051 ) Previously this would give up on folding subregister copies through a reg_sequence if the input operand already had a subregister index. d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these subregister uses, and this is the first step to lifting that restriction. I was expecting to be able to implement this only purely with compose / reverse compose, but I wasn't able to make it work so relies on testing the lanemasks for whether the copy reads a subset of the input.	2025-02-18 08:07:29 +07:00
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533 ) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10 year old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Matt Arsenault	0e49c74f36	PeepholeOpt: Make copy ID methods static	2025-02-03 22:44:56 +07:00
Matt Arsenault	1a314b2472	PeepholeOpt: Fix copy current source index accounting bug We were essentially using the current source index as a binary value, and didn't actually use it for indexing so it did not matter. Use the operand to ensure the value is actually correct.	2025-01-31 21:17:20 +07:00
Matt Arsenault	fe1f6b4855	PeepholeOpt: Avoid double map lookup (#124531 )	2025-01-30 20:59:58 +07:00
Matt Arsenault	83ca720ef2	PeepholeOpt: Remove check for reg_sequence def of subregister (#124512 ) The verifier does not allow reg_sequence to have subregister defs, even if undef.	2025-01-30 20:55:48 +07:00
Matt Arsenault	8d506b9a5b	PeepholeOpt: Simplify tracking of current op for copy and reg_sequence (#124224 ) Set the starting index in the constructor instead of treating 0 as a special case. There should also be no need for bounds checking in the rewrite.	2025-01-30 20:49:43 +07:00
Matt Arsenault	d246cc618a	PeepholeOpt: Do not add subregister indexes to reg_sequence operands (#124111 ) Given the rest of the pass just gives up when it needs to compose subregisters, folding a subregister extract directly into a reg_sequence is counterproductive. Later fold attempts in the function will give up on the subregister operand, preventing looking up through the reg_sequence. It may still be profitable to do these folds if we start handling the composes. There are some test regressions, but this mostly looks better.	2025-01-30 20:42:02 +07:00
Matt Arsenault	15c2d4baf1	PeepholeOpt: Remove check for subreg index on a def operand (#123943 ) This is looking at operand 0 of a REG_SEQUENCE, which can never have a subregister index.	2025-01-23 09:06:26 +07:00
Matt Arsenault	2646e2d487	PeepholeOpt: Stop allocating tiny helper classes (NFC) (#123936 ) This was allocating tiny helper classes for every instruction visited. We can just dispatch over the cases in the visitor function instead.	2025-01-23 09:00:08 +07:00
Matt Arsenault	6f69adeed6	PeepholeOpt: Remove null TargetRegisterInfo check (#123933 ) This cannot happen. Also simplify the LaneBitmask check from !none to any.	2025-01-23 08:57:04 +07:00
Matt Arsenault	23d2a1862a	PeepholeOpt: Remove unnecessary check for null TargetInstrInfo (#123929 ) This can never happen.	2025-01-23 08:46:59 +07:00
Daniel Paoliello	19032bfe87	[aarch64][win] Update Called Globals info when updating Call Site info (#122762 ) Fixes the "use after poison" issue introduced by #121516 (see <https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>). The root cause of this issue is that #121516 introduced "Called Global" information for call instructions modeling how "Call Site" info is stored in the machine function, HOWEVER it didn't copy the copy/move/erase operations for call site information. The fix is to rename and update the existing copy/move/erase functions so they also take care of Called Global info.	2025-01-13 14:00:31 -08:00
Akshat Oke	3f9d02aae8	[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326 ) With this, all machine SSA optimization passes are available in the new codegen pipeline.	2024-11-18 11:02:01 +05:30
Akshat Oke	00aa08119a	[NFC] Clang format PeepholeOptimizer (#116325 )	2024-11-18 10:58:48 +05:30
paperchalice	79d0de2ac3	[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793 ) - Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.	2024-07-09 09:11:18 +08:00
Kazu Hirata	dae061f1b2	[CodeGen] Use range-based for loops (NFC) (#96777 )	2024-06-26 16:49:00 -07:00
paperchalice	837dc542b1	[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571 ) Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.	2024-06-11 21:27:14 +08:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Shengchen Kan	550f0eb2ce	[NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86	2024-01-26 20:50:58 +08:00
Guozhi Wei	9a091de7fe	[X86, Peephole] Enable FoldImmediate for X86 Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced peephole by deleting identical instructions after FoldImmediate. Differential Revision: https://reviews.llvm.org/D151848	2023-10-27 19:47:23 +00:00
Mogball	3fb5b18e81	Revert 24633ea and 760e7d0 "Enable FoldImmediate for X86" This reverts commits 24633eac38d46cd4b253ba53258165ee08d886cd and 760e7d00d142ba85fcf48c00e0acc14a355da7c3. I have confirmed that these commits are introducing a new crash in the peephole optimizer. I have minimized a test case, which you can find below. ```llvmir ; ModuleID = 'bugpoint-reduced-simplified.bc' source_filename = "/mnt/big/modular/Kernels/mojo/Mogg/MOGG.mojo" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" declare dso_local void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } }, { ptr }, { ptr, i64, i8 }) define dso_local void @bad_fn(ptr %0, ptr %1, ptr %2) { %4 = load i64, ptr null, align 8 %5 = insertvalue [4 x i64] poison, i64 12, 1 %6 = insertvalue [4 x i64] %5, i64 poison, 2 %7 = insertvalue [4 x i64] %6, i64 poison, 3 %8 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %7, 1 %9 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %8, [4 x i64] poison, 2 %10 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %9, i1 poison, 3 %11 = icmp ne i64 %4, 1 %12 = or i1 false, %11 %13 = select i1 %12, i64 %4, i64 0 %14 = zext i1 %12 to i64 %15 = insertvalue [4 x i64] poison, i64 12, 1 %16 = insertvalue [4 x i64] %15, i64 poison, 2 %17 = insertvalue [4 x i64] %16, i64 %13, 3 %18 = insertvalue [4 x i64] poison, i64 %14, 3 %19 = icmp eq i64 0, 0 %20 = icmp eq i64 0, 0 %21 = icmp eq i64 %13, 0 %22 = and i1 %20, %19 %23 = select i1 %22, i1 %21, i1 false %24 = select i1 %23, i1 %12, i1 false %25 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %17, 1 %26 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %25, [4 x i64] %18, 2 %27 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %26, i1 %24, 3 %28 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } undef, { ptr, [4 x i64], [4 x i64], i1 } %10, 0 %29 = 
insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %28, { ptr, [4 x i64], [4 x i64], i1 } %27, 1 br label %31 30: ; preds = %3 br label %softmax_pass 31: ; preds = %31 %exitcond.not.i = icmp eq i64 poison, 3 br i1 %exitcond.not.i, label %37, label %31 32: ; preds = %31 br i1 poison, label %34, label %33 33: ; preds = %32 br label %34 34: ; preds = %33, %32 br i1 poison, label %35, label %36 35: ; preds = %34 br label %softmax_pass 36: ; preds = %34 br i1 poison, label %37, label %.critedge.i 37: ; preds = %36 br i1 poison, label %38, label %.critedge.i 38: ; preds = %37 br i1 poison, label %40, label %39 39: ; preds = %38 br label %40 40: ; preds = %39, %38 br i1 poison, label %.lr.ph28.i, label %._crit_edge.i .lr.ph28.i: ; preds = %40 br label %41 41: ; preds = %51, %.lr.ph28.i br i1 poison, label %.thread, label %42 42: ; preds = %41 br i1 poison, label %43, label %44 43: ; preds = %42 br label %45 44: ; preds = %42 br label %45 45: ; preds = %44, %43 br i1 poison, label %46, label %.thread 46: ; preds = %45 br label %47 .thread: ; preds = %45, %41 br label %47 47: ; preds = %.thread, %46 br i1 poison, label %51, label %48 48: ; preds = %47 br i1 poison, label %49, label %50 49: ; preds = %48 br label %51 50: ; preds = %48 br label %51 51: ; preds = %50, %49, %47 call void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %29, { ptr } poison, { ptr, i64, i8 } poison) br i1 poison, label %._crit_edge.i, label %41 ._crit_edge.i: ; preds = %51, %40 br label %softmax_pass .critedge.i: ; preds = %37, %36 br i1 poison, label %.lr.ph.i, label %softmax_pass .lr.ph.i: ; preds = %.lr.ph.i, %.critedge.i store { ptr, [4 x i64], [4 x i64], i1 } %10, ptr poison, align 8 br i1 poison, label %.lr.ph.i, label %softmax_pass softmax_pass: ; preds = %.lr.ph.i, %.critedge.i, %._crit_edge.i, %35, %30 ret void } ```	2023-10-24 07:08:38 +00:00
weiguozhi	24633eac38	[Peephole] Check instructions from CopyMIs are still COPY (#69511 ) Function foldRedundantCopy records COPY instructions in CopyMIs and uses it later. But other optimizations may delete or modify it. So before using it we should check if the extracted instruction is existing and still a COPY instruction.	2023-10-20 08:34:43 -07:00
Guozhi Wei	760e7d00d1	[X86, Peephole] Enable FoldImmediate for X86 Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced peephole by deleting identical instructions after FoldImmediate. Differential Revision: https://reviews.llvm.org/D151848	2023-10-17 16:22:42 +00:00
Akshay Khadse	8bf7f86d79	Fix uninitialized pointer members in CodeGen This change initializes the members TSI, LI, DT, PSI, and ORE pointer fields of the SelectOptimize class to nullptr. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148303	2023-04-17 16:32:46 +08:00
Craig Topper	e72ca520bb	[CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC Use isPhysical/isVirtual methods. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D141715	2023-01-13 14:38:08 -08:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Philip Reames	7dbf2e7b57	Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV) The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers. On RISCV, this helps eliminate redundant CSR reads from VLENB. Differential Revision: https://reviews.llvm.org/D125564	2022-05-16 16:38:30 -07:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Kazu Hirata	3a8c51480f	[CodeGen] Use = default (NFC) Identified with modernize-use-equals-default	2022-02-06 10:54:44 -08:00
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger 32-bit. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
Ahsan Saghir	31ef15e044	Teach peephole optimizer to not emit sub-register defs Peephole optimizer should not be introducing sub-reg definitions as they are illegal in machine SSA phase. This patch modifies the optimizer to not emit sub-register definitions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103408	2021-06-28 09:24:07 -05:00
Barry Revzin	92310454bf	Make LLVM build in C++20 mode Part of the <=> changes in C++20 make certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that ceases to be an aggregate, and adding casts from u8 literals which no longer have type const char*). There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common: // 1) Missing const namespace missing_const { struct A { #ifndef FIXED bool operator==(A const&); #else bool operator==(A const&) const; #endif }; bool a = A{} == A{}; // error } // 2) Type mismatch on CRTP namespace crtp_mismatch { template <typename Derived> struct Base { #ifndef FIXED bool operator==(Derived const&) const; #else // in one case changed to taking Base const& friend bool operator==(Derived const&, Derived const&); #endif }; struct D : Base<D> { }; bool b = D{} == D{}; // error } // 3) iterator/const_iterator with only mixed comparison namespace iter_const_iter { template <bool Const> struct iterator { using const_iterator = iterator<true>; iterator(); template <bool B, std::enable_if_t<(Const && !B), int> = 0> iterator(iterator<B> const&); #ifndef FIXED bool operator==(const_iterator const&) const; #else friend bool operator==(iterator const&, iterator const&); #endif }; bool c = iterator<false>{} == iterator<false>{} // error || iterator<false>{} == iterator<true>{} || iterator<true>{} == iterator<false>{} || iterator<true>{} == iterator<true>{}; } // 4) Same-type comparison but only have mixed-type operator namespace ambiguous_choice { enum Color { Red }; struct C { C(); C(Color); operator Color() const; bool operator==(Color) const; friend bool operator==(C, C); }; bool c = C{} == C{}; // error bool d = C{} == Red; } Differential revision: https://reviews.llvm.org/D78938	2020-12-17 10:44:10 +00:00
Alexandre Ganea	4b64ce7428	Improve 723fea23079f9c85800e5cdc90a75414af182bfd - Silence 'warning: unused variable' when compiling with Clang 10.0	2020-09-24 09:07:22 -04:00
Alexandre Ganea	723fea2307	Silence 'warning: unused variable' when compiling with Clang 10.0	2020-09-22 12:17:40 -04:00
Michael Liao	534f6e1718	[PeepholeOptimizer] Enhance the redundant COPY elimination. - Eliminate redundant COPYs from the same register & subregister pair. Differential Revision: https://reviews.llvm.org/D87939	2020-09-22 10:11:37 -04:00
Matt Arsenault	e21bb31eb6	CodeGen: Require SSA to run PeepholeOptimizer	2020-09-11 18:03:04 -04:00
Jay Foad	4aaf772542	[PeepholeOptimizer] Remove dead code At this point we have already ruled out all def operands, so we can't possibly see a dead implicit def operand.	2020-08-20 16:48:57 +01:00
Matt Arsenault	f9c279b057	PeepholeOptimizer: Use Register	2020-08-10 08:49:36 -04:00

204 Commits