llvm-project

Author	SHA1	Message	Date
Shubham Sandeep Rastogi	92f916faba	Add a pass to collect dropped var statistics for MIR (#126686 ) This patch attempts to reland https://github.com/llvm/llvm-project/pull/120780 while addressing the issues that caused the patch to be reverted. Namely: 1. The patch had included code from the llvm/Passes directory in the llvm/CodeGen directory. 2. The patch increased the backend compile time by 2% due to adding a very expensive include in MachineFunctionPass.h The patch has been re-structured so that there is no dependency between the llvm/Passes and llvm/CodeGen directory, by moving the base class, `class DroppedVariableStats` to the llvm/IR directory. The expensive include in MachineFunctionPass.h has been changed to contain forward declarations instead of other header includes which was pulling a ton of code into MachineFunctionPass.h and should resolve any issues when it comes to compile time increase.	2025-02-12 14:08:18 -08:00
Akshat Oke	7b60e03d73	Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )" (#126684 ) `RegisterClassInfo` was supposed to be kept alive between pass runs, which wasn't being done leading to recomputations increasing the compile time. Now the Impl class is a member of the legacy and new passes so that it is not reconstructed on every pass run. --------- Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2025-02-12 18:54:39 +05:30
David Green	bf7af2d12e	[AArch64][DAG] Allow fptos/ui.sat to scalarized. (#126799 ) We were previously running into problems with fp128 types and certain integer sizes. Fixes an issue reported on #124984	2025-02-12 11:04:08 +00:00
Craig Topper	7dd82805d5	[SelectionDAGBuilder] Remove NodeMap updates from getValueImpl. NFC (#126849 ) Both callers already put the result in NodeMap immediately after the call.	2025-02-12 00:07:07 -08:00
Haohai Wen	ec28e9b757	[MC] Replace MCContext::GenericSectionID with MCSection::NonUniqueID (#126202 ) They have same semantics. NonUniqueID is more friendly for isUnique implementation in MCSectionELF. History: 97837b7 added support for unique IDs in sections and added GenericSectionID. Later, 1dc16c7 added NonUniqueID.	2025-02-12 14:28:37 +08:00
Jim Lin	31bfae35d2	[DAGCombiner] Add hasOneUse checks for folding (not (add X, -1)) to (neg X) (#126667 ) To get better codegen for AArch with bic, x86 with andn and riscv with andn.	2025-02-12 12:24:29 +08:00
Daniel Hoekwater	3a22cf9bd8	[CFIFixup] Fixup CFI for split functions with synchronous uwtables (#125299 ) - Precommit tests for synchronous uwtable CFI fixup - [CFIFixup] Fixup CFI for split functions with synchronous uwtables Commit `6e54fccede` disables CFI fixup for functions with synchronous tables, breaking CFI for split functions. Instead, we can disable block-level CFI fixup for functions with synchronous tables. Unwind tables can be: - N/A (not present) - Asynchronous - Synchronous Functions without unwind tables don't need CFI fixup (since they don't care about CFI). Functions with asynchronous unwind tables must be accurate for each basic block, so full CFI fixup is necessary. Functions with synchronous unwind tables only need to be accurate for each function (specifically, the portion of a function in a given section). Disabling CFI fixup entirely for functions with synchronous uwtables may break CFI for a function split between two sections. The portion in the first section may have valid CFI, while the portion in the second section is missing a call frame. Ex: ``` (.text.hot) Foo (BB1): <Call frame information> ... BB2: ... (.text.split) BB3: ... BB4: <epilogue> ``` Even if `Foo` has a synchronous unwind table, we still need to insert call frame information into `BB3` so that unwinding the call stack from `BB3` or `BB4` works properly.	2025-02-11 18:25:08 -05:00
Philip Reames	e4016bf5c3	[DAG] Use ArrayRef to simplify ShuffleVectorSDNode::isSplatMask	2025-02-11 12:47:10 -08:00
Benjamin Maxwell	19556eccf6	[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705 ) This makes the name more consistent with the other helpers.	2025-02-11 11:51:35 +00:00
Benjamin Maxwell	701223ac20	[IR] Add llvm.sincospi intrinsic (#125873 ) This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values. ``` declare { float, float } @llvm.sincospi.f32(float %Val) declare { double, double } @llvm.sincospi.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val) ``` Currently, the default lowering of this intrinsic relies on the `sincospi[f\|l]` functions being available in the target's runtime (e.g. libc).	2025-02-11 09:01:30 +00:00
Rahul Joshi	0f674cce82	[NFC][LLVM] Remove unused `TargetIntrinsicInfo` class (#126003 ) Remove `TargetIntrinsicInfo` class as it's practically unused (it's pure virtual with no subclasses) and its references in the code.	2025-02-10 14:56:30 -08:00
Jinsong Ji	5d2e2847e0	MachineCopyPropagation: Do not remove copies preserved by regmask (#125868 ) llvm/llvm-project@9e436c2daa tries to handle register masks and sub-registers; it avoids clobbering RegUnits preserved by regmask. But it then introduces invalid pointer issues. We delete the copies without invalidating all the uses in the CopyInfo, so we dereference invalid pointers in the next iteration, causing asserts. Fixes: #126107 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-02-10 12:26:33 -05:00
Shilei Tian	70fdd9f0a2	[GlobalISel] Check whether `G_CTLZ` is legal in `matchUMulHToLShr` (#126457 ) We need to check `G_CTLZ` because the combine uses `G_CTLZ` to get log base 2, and it is not always legal on a target. Fixes SWDEV-512440.	2025-02-10 00:11:09 -05:00
Kazu Hirata	d1af9ca9fd	[AsmPrinter] Avoid repeated map lookups (NFC) (#126431 )	2025-02-09 13:34:47 -08:00
Kazu Hirata	c741cf1617	[CodeGen] Avoid repeated hash lookups (NFC) (#126403 )	2025-02-09 08:55:43 -08:00
Abhishek Kaushik	7b348f9bfd	[MIR][NFC] Use `std::move` to avoid copying (#125930 )	2025-02-09 13:51:34 +05:30
Akshat Oke	564b9b7f4d	Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )" (#126268 ) This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.	2025-02-08 15:36:48 +05:30
Kazu Hirata	1c497c4837	[CodeGen] Avoid repeated hash lookups (NFC) (#126343 )	2025-02-08 00:48:01 -08:00
Yashas Andaluri	a361de6d13	[RDF] Create phi nodes for clobbering defs (#123694 ) When a def in a block A reaches another block B that is in A's iterated dominance frontier, a phi node is added to B for the def register. A clobbering def can be created at a call instruction, for a register clobbered by a call. However, phi nodes are not created for a register, when one of the reaching defs of the register is a clobbering def. This patch adds phi nodes for registers that have a clobbering reaching def. These additional phis help in checking reaching defs for an instruction in RDF based copy propagation and addressing mode optimizations.	2025-02-07 08:28:29 -06:00
Benjamin Maxwell	4bf97aa818	[IR] Add `llvm.modf` intrinsic (#121948 ) This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct). ``` declare { float, float } @llvm.modf.f32(float %Val) declare { double, double } @llvm.modf.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val) ``` This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.	2025-02-07 09:25:13 +00:00
Mingming Liu	5399782508	[IR] Generalize Function's {set,get}SectionPrefix to GlobalObjects, the base class of {Function, GlobalVariable, IFunc} (#125757 ) This is a split of https://github.com/llvm/llvm-project/pull/125756	2025-02-06 14:51:13 -08:00
Matt Arsenault	c268a3f093	DAG: Fix extract of load combine with mismatched vector element type Fix the case where the vector element type of the loaded extractelement input does not match the result type of the extract. This fixes a regression reported after c55a7659b38946350315ac4a18d9805deb1f0a54	2025-02-06 22:56:56 +07:00
Michael Buch	eb8901bda1	[llvm][DebugInfo] Add new DW_AT_APPLE_enum_kind to encode enum_extensibility (#124752 ) When creating `EnumDecl`s from DWARF for Objective-C `NS_ENUM`s, the Swift compiler tries to figure out if it should perform "swiftification" of that enum (which involves renaming the enumerator cases, etc.). The heuristics by which it determines whether we want to swiftify an enum is by checking the `enum_extensibility` attribute (because that's what `NS_ENUM` pretty much are). Currently LLDB fails to attach the `EnumExtensibilityAttr` to `EnumDecl`s it creates (because there's not enough info in DWARF to derive it), which means we have to fall back to re-building Swift modules on-the-fly, slowing down expression evaluation substantially. This happens around `4b3931c8ce/lib/ClangImporter/ImportEnumInfo.cpp (L37-L59)` To speed up Swift expression evaluation, this patch proposes encoding the C/C++/Objective-C `enum_extensibility` attribute in DWARF via a new `DW_AT_APPLE_enum_kind`. This would currently be only used from the LLDB Swift plugin. But may be of interest to other language plugins as well (though I haven't come up with a concrete use-case for it outside of Swift). I'm open to naming suggestions of the various new attributes/attribute constants proposed here. I tried to be as generic as possible if we wanted to extend it to other kinds of enum properties (e.g., flag enums). 
The new attribute would look as follows: ``` DW_TAG_enumeration_type DW_AT_type (0x0000003a "unsigned int") DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Closed) DW_AT_name ("ClosedEnum") DW_AT_byte_size (0x04) DW_AT_decl_file ("enum.c") DW_AT_decl_line (23) DW_TAG_enumeration_type DW_AT_type (0x0000003a "unsigned int") DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Open) DW_AT_name ("OpenEnum") DW_AT_byte_size (0x04) DW_AT_decl_file ("enum.c") DW_AT_decl_line (27) ``` Absence of the attribute means the extensibility of the enum is unknown and abides by whatever the language rules of that CU dictate. This does feel like a big hammer for quite a specific use-case, so I'm happy to discuss alternatives. Alternatives considered: * Re-using an existing DWARF attribute to express extensibility. E.g., a `DW_TAG_enumeration_type` could have a `DW_AT_count` or `DW_AT_upper_bound` indicating the number of enumerators, which could imply closed-ness. I felt like a dedicated attribute (which could be generalized further) seemed more applicable. But I'm open to re-using existing attributes. * Encoding the entire attribute string (i.e., `DW_TAG_LLVM_annotation ("enum_extensibility((open))")`) on the `DW_TAG_enumeration_type`. Then in LLDB somehow parse that out into an `EnumExtensibilityAttr`. I haven't found a great API in Clang to parse arbitrary strings into AST nodes (the ones I've found required fully formed C++ constructs). Though if someone knows of a good way to do this, happy to consider that too.	2025-02-06 08:58:35 +00:00
Min-Yih Hsu	5a1e16f6de	[IR][RISCV] Add llvm.vector.(de)interleave3/5/7 (#124825 ) These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results. For RISC-V, it's important to have them in order to support segmented load/store with factor of 2 to 8: factor of 2/4/8 can be synthesized from (de)interleave2; factor of 6 can be synthesized from factor of 2 and 3; factor 5 and 7 have their own intrinsics added by this patch. This patch only adds codegen support for these intrinsics, we still need to teach vectorizer to generate them as well as teaching InterleavedAccessPass to use them. --------- Co-authored-by: Craig Topper <craig.topper@sifive.com>	2025-02-05 15:30:33 -08:00
Christudasan Devadasan	d86e379fd2	[CodeGen][NewPM] Port StackSlotColoring to NPM. (#125876 )	2025-02-05 23:18:16 +05:30
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533 ) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10-year-old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. 
The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Matt Arsenault	92e3cd7069	X86: Remove hack in shouldRewriteCopySrc for subregister handling (#125224 ) In the problematic situation fixed in 61e556d2bdf3fa0a10dbaadd2dd03d01c341bd27, shouldRewriteCopySrc is called with identical register class arguments, but one has a subregister index. This was very surprising to me, and it probably shouldn't be valid for it to occur. It happens in cases with uncoalescable copies where the register class changes, and further up the chain there is a subregister operand. We could possibly just skip over uncoalescable instructions in the chain rather than letting this query deal with it (or pre-filter the obvious subreg with same class case). The generic implementation is supposed to account for checking for valid subregisters by checking getMatchingSuperRegClass already, but that was bypassed by the early exit for exact class match. Also adds a reduced mir test demonstrating the exact problematic case.	2025-02-05 23:25:04 +07:00
Kazu Hirata	7c2c7a4381	[AsmPrinter] Avoid repeated hash lookups (NFC) (#125814 )	2025-02-05 07:18:29 -08:00
Akshat Oke	f77f777f35	[CodeGen][NewPM] Port RenameIndependentSubregs to NPM (#125192 )	2025-02-05 17:54:57 +05:30
Yuta Mukai	e3abe940d8	[MachinePipeliner] Improve loop carried dependence analysis (#94185 ) The previous implementation had false positive/negative cases in the analysis of the loop carried dependency. A missed dependency case is caused by incorrect analysis of address increments. This is fixed by strict analysis of recursive definitions. See added test swp-carried-dep4.mir. Excessive dependency detection is fixed by improving the formula for determining the overlap of address ranges to be accessed. See added test swp-carried-dep5.mir.	2025-02-05 21:08:20 +09:00
Cullen Rhodes	1cf909208e	[MISched] Small debug improvements (#125072 ) Changes: 1. Fix inconsistencies in register pressure set printing. "Max Pressure" printing is inconsistent with "Bottom Pressure" and "Top Pressure". For the former, register class begins on the same line vs newline for latter. Also for the former, the first register class is on the same line, but subsequent register classes are newline separated. That's removed so all are on the same line. Before: Max Pressure: FPR8=1 GPR32=14 Top Pressure: GPR32=2 Bottom Pressure: FPR8=7 GPR32=17 After: Max Pressure: FPR8=1 GPR32=14 Top Pressure: GPR32=2 Bottom Pressure: FPR8=7 GPR32=17 2. After scheduling an instruction, don't print pressure diff if there isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g., Before: UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64 to UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12 to GPR32 -1 After: UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12 to GPR32 -1 3. Don't print excess pressure sets if there are none.	2025-02-05 09:14:51 +00:00
Christudasan Devadasan	44f638f88e	CodeGen][NewPM] Port PostRAScheduler to NPM. (#125798 )	2025-02-05 12:45:59 +05:30
Christudasan Devadasan	5aa4979c47	CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )	2025-02-05 12:17:59 +05:30
Christudasan Devadasan	68e7df395e	[CodeGen][MachineScheduler] Remove the unimplemented print method. (#125702 )	2025-02-05 12:10:12 +05:30
Christudasan Devadasan	1d22318b81	[MachineVerifier][NewPM] Add method to run MF through verifier. (#125701 )	2025-02-05 11:54:26 +05:30
Christudasan Devadasan	a47c35a699	[CodeGen] Move MISched target hooks into TargetMachine (#125700 ) The createSIMachineScheduler & createPostMachineScheduler target hooks are currently placed in the PassConfig interface. Moving it out to TargetMachine so that both legacy and the new pass manager can effectively use them.	2025-02-05 11:41:37 +05:30
Mingming Liu	5f247e76df	[NFC]Refactor static data splitter (#125758 ) This is a split of https://github.com/llvm/llvm-project/pull/125756	2025-02-04 17:49:45 -08:00
Tom Tromey	3c2807624d	Allow 128-bit discriminants in DWARF variants (#125578 ) If a variant part has a 128-bit discriminator, then DwarfUnit::constructTypeDIE will assert. This patch fixes the problem by allowing any size of integer to be used here. This is largely accomplished by moving part of DwarfUnit::addConstantValue to a new method. Fixes #119655	2025-02-04 13:36:22 -08:00
Min-Yih Hsu	005b23bb3b	[IA][RISCV] Support VP loads/stores in InterleavedAccessPass (#120490 ) Teach InterleavedAccessPass to recognize the following patterns: - vp.store an interleaved scalable vector - Deinterleaving a scalable vector loaded from vp.load Upon recognizing these patterns, IA will collect the interleaved / deinterleaved operands and delegate them over to their respective newly-added TLI hooks. For RISC-V, these patterns are lowered into segmented loads/stores. Right now we only recognize power-of-two (de)interleave cases, in which (de)interleave4/8 are synthesized from a tree of (de)interleave2. --------- Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>	2025-02-04 11:07:34 -08:00
Robert Imschweiler	21560fe6b9	GlobalISel: Fix defined register of invariant.start (#125664 ) In contrast to SelectionDAG, GlobalISel created a new virtual register for the return value of invariant.start, leaving subsequent users of the invariant.start value with an undefined reference. A minimal example: ``` %tmp = alloca i32, align 4, addrspace(5) %tmpI = call ptr @llvm.invariant.start.p5(i64 4, ptr addrspace(5) %tmp) #3 call void @llvm.invariant.end.p5(ptr %tmpI, i64 4, ptr addrspace(5) %tmp) #3 store i32 %i, ptr %tmpI, align 4 ``` Although the return value of invariant.start might not be intended for any use beyond invariant.end (the fuzzer might not have created a sensible situation here), an implicit definition of the corresponding virtual register avoids a segfault in the target instruction selector later. This LLVM defect was identified via the AMD Fuzzing project.	2025-02-04 23:59:03 +07:00
Michael Maitland	93b90a532d	[ReachingDefAnalysis] Fix management of MBBFrameObjsReachingDefs (#124943 ) MBBFrameObjsReachingDefs was not being built correctly since we were not inserting into a reference of Frame2InstrIdx. If there were multiple stack slot defs in the same basic block, then the bug would occur. This PR fixes this problem while simplifying the insertion logic. Additionally, when lookup into MBBFrameObjsReachingDefs was occurring, there was a chance that there was no entry in the map, in the case that there was no reaching def. This was causing us to return a default value, which may or may not have been correct. This patch returns the correct value now.	2025-02-04 10:04:19 -05:00
Alexander Peskov	358a48b293	[NVPTX] Fix DWARF address space for globals (#122715 ) Fix an issue with defining actual DWARF address space for module scope globals. Previously it was always `ADDR_global_space`. Also, this patch introduces CUDA-specific DWARF codes for address space specification in correspondence with: https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf-definitions Previously hardcoded constant values are replaced with enum values.	2025-02-04 09:16:21 -05:00
Matt Arsenault	c55a7659b3	DAG: Move scalarizeExtractedVectorLoad to TargetLowering (#122670 ) SimplifyDemandedVectorElts should be able to use this on loads	2025-02-04 17:37:12 +07:00
Akshat Oke	8fdd982668	[NewPM] MachineCopyPropagation: Remove dead ID (#125665 ) Fix for #125202 (4313345f2eeeb1e2ea7127a056ec4e1aaaa7fefb)	2025-02-04 16:04:14 +05:30
Akshat Oke	4313345f2e	[CodeGen][NewPM] Port MachineCopyPropagation to NPM (#125202 )	2025-02-04 15:45:03 +05:30
Matt Arsenault	2f2ac3de69	DAG: Avoid stack usage in bitcast operand promotion to legal vector (#125637 ) Fix introducing stack usage if a bitcast source operand is an illegal integer type cast to a legal vector type. This should cover more situations, but this is the first one I noticed.	2025-02-04 16:43:42 +07:00
Matt Arsenault	cdca04913a	DAG: Avoid introducing stack usage in vector->int bitcast int op promotion (#125636) Avoids stack usage in the v5i32 to i160 case for AMDGPU, which appears in fat pointer lowering.	2025-02-04 16:32:47 +07:00
Petar Avramovic	88814969dd	MachineUniformityAnalysis: Pass is incorrectly initialized as CFGOnly (#125511 ) Set CFGOnly in MachineUniformityAnalysisPass to false. If there were new registers created, uniformity analysis needs to be updated. Previously, with CFGOnly set to true, pass would be skipped if CFG was preserved.	2025-02-04 09:25:25 +01:00
Craig Topper	788bbd2ef6	[DAGCombiner] Improve chain handling in fold (fshl ld1, ld0, c) -> (ld0[ofs]) combine. (#124871 ) Happened to notice some odd things related to chains in this code. The code calls hasOneUse on LoadSDNode* which will check users of the data and the chain. I think this was trying to check that the data had one use so one of the loads would definitely be removed by the transform. Load chains don't always have users so our testing may not have noticed that the chains being used would block the transform. The code makes all users of ld1's chain use the new load's chain, but we don't know that ld1 becomes dead. This can cause incorrect dependencies if ld1's chain is used and it isn't deleted. I think the better thing to do is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1 depend on the new load and the original loads. If the old loads become dead, their chain will be cleaned up later. I'm having trouble getting a test for any ordering issue with the current code. areNonVolatileConsecutiveLoads requires the two loads to have the same input chain. Given that, I don't know how to use one of the load chain results without also using the other. If they are both used we don't do the transform because SDNode::hasOneUse will return false for both.	2025-02-03 11:48:41 -08:00
Matt Arsenault	3a2b552e44	TwoAddressInstruction: Fix assert on undef operand with -early-live-intervals (#125518 )	2025-02-03 23:48:28 +07:00
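
The fold named in 31bfae35d2, (not (add X, -1)) to (neg X), rests on a two's-complement identity: ~(X - 1) equals -X. The sketch below only checks that arithmetic on 32-bit values. It is not the DAGCombiner code and does not model the hasOneUse checks the commit adds; the function names are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: not (add X, -1), i.e. ~(X + (-1)).
// Unsigned arithmetic wraps modulo 2^32, like a two's-complement i32.
constexpr std::uint32_t notOfDecrement(std::uint32_t x) {
  return ~(x + 0xFFFFFFFFu);
}

// Right-hand side of the fold: neg X, i.e. 0 - X.
constexpr std::uint32_t negate(std::uint32_t x) {
  return 0u - x;
}

// Hypothetical helper: compares both sides on a few edge-case inputs.
inline bool foldHoldsOnSamples() {
  const std::uint32_t samples[] = {0u,          1u,          2u,
                                   0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (std::uint32_t x : samples)
    if (notOfDecrement(x) != negate(x))
      return false;
  return true;
}
```

The identity follows from ~Y = -Y - 1: substituting Y = X - 1 gives ~(X - 1) = -(X - 1) - 1 = -X.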

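Commits 5a1e16f6de and 005b23bb3b note that (de)interleave4/8 can be synthesized from a tree of (de)interleave2. A minimal sketch of the factor-4 case follows, modelling vectors as `std::vector<int>`. The helper names `interleave2` and `interleave4` are illustrative stand-ins for the intrinsics' element ordering, not LLVM APIs, and the operand pairing shown is one arrangement that yields the expected order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Models a factor-2 interleave: {a0, b0, a1, b1, ...}.
// Both inputs must have the same length.
inline Vec interleave2(const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  Vec out;
  out.reserve(a.size() * 2);
  for (std::size_t i = 0; i < a.size(); ++i) {
    out.push_back(a[i]);
    out.push_back(b[i]);
  }
  return out;
}

// Factor-4 interleave built from a two-level tree of factor-2 interleaves.
// Pairing (a, c) and (b, d) first makes the final interleave2 produce
// {a0, b0, c0, d0, a1, b1, c1, d1, ...}.
inline Vec interleave4(const Vec &a, const Vec &b, const Vec &c, const Vec &d) {
  return interleave2(interleave2(a, c), interleave2(b, d));
}
```

Factors 3, 5 and 7 cannot be built this way because they are not powers of two, which is consistent with the dedicated intrinsics added in 5a1e16f6de.
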
37227 Commits