37227 Commits

Author SHA1 Message Date
Shubham Sandeep Rastogi
92f916faba
Add a pass to collect dropped var statistics for MIR (#126686)
This patch attempts to reland
https://github.com/llvm/llvm-project/pull/120780 while addressing the
issues that caused the patch to be reverted.

Namely:

1. The patch had included code from the llvm/Passes directory in the
llvm/CodeGen directory.

2. The patch increased the backend compile time by 2% due to adding a
very expensive include in MachineFunctionPass.h

The patch has been re-structured so that there is no dependency between
the llvm/Passes and llvm/CodeGen directory, by moving the base class,
`class DroppedVariableStats` to the llvm/IR directory.

The expensive include in MachineFunctionPass.h has been changed to
contain forward declarations instead of other header includes which was
pulling a ton of code into MachineFunctionPass.h and should resolve any
issues when it comes to compile time increase.
2025-02-12 14:08:18 -08:00
Akshat Oke
7b60e03d73
Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.

Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.

---------

Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-12 18:54:39 +05:30
David Green
bf7af2d12e
[AArch64][DAG] Allow fptos/ui.sat to scalarized. (#126799)
We we previously running into problems with fp128 types and certain
integer sizes.

Fixes an issue reported on #124984
2025-02-12 11:04:08 +00:00
Craig Topper
7dd82805d5
[SelectionDAGBuilder] Remove NodeMap updates from getValueImpl. NFC (#126849)
Both callers already put the result in NodeMap immediately after the
call.
2025-02-12 00:07:07 -08:00
Haohai Wen
ec28e9b757
[MC] Replace MCContext::GenericSectionID with MCSection::NonUniqueID (#126202)
They have same semantics. NonUniqueID is more friendly for isUnique
implementation in MCSectionELF.

History: 97837b7 added support for unique IDs in sections and added
GenericSectionID. Later, 1dc16c7 added NonUniqueID.
2025-02-12 14:28:37 +08:00
Jim Lin
31bfae35d2
[DAGCombiner] Add hasOneUse checks for folding (not (add X, -1)) to (neg X) (#126667)
To get more better codegen for AArch with bic, x86 with andn and riscv
with andn.
2025-02-12 12:24:29 +08:00
Daniel Hoekwater
3a22cf9bd8
[CFIFixup] Fixup CFI for split functions with synchronous uwtables (#125299)
- **Precommit tests for synchronous uwtable CFI fixup**
- **[CFIFixup] Fixup CFI for split functions with synchronous uwtables**

Commit
6e54fccede
disables CFI fixup for
functions with synchronous tables, breaking CFI for split functions.
Instead, we can disable *block-level* CFI fixup for functions with
synchronous tables.

Unwind tables can be:
- N/A (not present)
- Asynchronous
- Synchronous

Functions without unwind tables don't need CFI fixup (since they don't
care about CFI).

Functions with asynchronous unwind tables must be accurate for each
basic block, so full CFI fixup is necessary.

Functions with synchronous unwind tables only need to be accurate for
each function (specifically, the portion of a function in a given
section). Disabling CFI fixup entirely for functions with synchronous
uwtables may break CFI for a function split between two sections. The
portion in the first section may have valid CFI, while the portion in
the second section is missing a call frame.

Ex:
```
(.text.hot)
Foo (BB1):
  <Call frame information>
  ...
BB2:
  ...

(.text.split)
BB3:
  ...
BB4:
  <epilogue>
```

Even if `Foo` has a synchronous unwind table, we still need to insert
call frame information into `BB3` so that unwinding the call stack from
`BB3` or `BB4` works properly.
2025-02-11 18:25:08 -05:00
Philip Reames
e4016bf5c3 [DAG] Use ArrayRef to simplify ShuffleVectorSDNode::isSplatMask 2025-02-11 12:47:10 -08:00
Benjamin Maxwell
19556eccf6
[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705)
This makes the name more consistent with the other helpers.
2025-02-11 11:51:35 +00:00
Benjamin Maxwell
701223ac20
[IR] Add llvm.sincospi intrinsic (#125873)
This adds the `llvm.sincospi` intrinsic, legalization, and lowering
(mostly reusing the lowering for sincos and frexp).

The `llvm.sincospi` intrinsic takes a floating-point value and returns
both the sine and cosine of the value multiplied by pi. It computes the
result more accurately than the naive approach of doing the
multiplication ahead of time, especially for large input values.

```
declare { float, float }          @llvm.sincospi.f32(float  %Val)
declare { double, double }        @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincospi.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincospi.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float>  %Val)
```

Currently, the default lowering of this intrinsic relies on the
`sincospi[f|l]` functions being available in the target's runtime (e.g.
libc).
2025-02-11 09:01:30 +00:00
Rahul Joshi
0f674cce82
[NFC][LLVM] Remove unused TargetIntrinsicInfo class (#126003)
Remove `TargetIntrinsicInfo` class as its practically unused (its pure
virtual with no subclasses) and its references in the code.
2025-02-10 14:56:30 -08:00
Jinsong Ji
5d2e2847e0
MachineCopyPropagation: Do not remove copies preserved by regmask (#125868)
llvm/llvm-project@9e436c2daa tries to handle register masks and
sub-registers, it avoids clobbering RegUnit presreved by regmask. But it
then introduces invalid pointer issues.

We delete the copies without invalidate all the use in the CopyInfo, so
we dereferenced invalid pointers in next interation, causing asserts.

Fixes: #126107

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-02-10 12:26:33 -05:00
Shilei Tian
70fdd9f0a2
[GlobalISel] Check whether G_CTLZ is legal in matchUMulHToLShr (#126457)
We need to check `G_CTLZ` because the combine uses `G_CTLZ` to get log
base 2,
and it is not always legal for on a target.

Fixes SWDEV-512440.
2025-02-10 00:11:09 -05:00
Kazu Hirata
d1af9ca9fd
[AsmPrinter] Avoid repeated map lookups (NFC) (#126431) 2025-02-09 13:34:47 -08:00
Kazu Hirata
c741cf1617
[CodeGen] Avoid repeated hash lookups (NFC) (#126403) 2025-02-09 08:55:43 -08:00
Abhishek Kaushik
7b348f9bfd
[MIR][NFC] Use std::move to avoid copying (#125930) 2025-02-09 13:51:34 +05:30
Akshat Oke
564b9b7f4d
Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268)
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I
investigate what's causing the compile-time regression.
2025-02-08 15:36:48 +05:30
Kazu Hirata
1c497c4837
[CodeGen] Avoid repeated hash lookups (NFC) (#126343) 2025-02-08 00:48:01 -08:00
Yashas Andaluri
a361de6d13
[RDF] Create phi nodes for clobbering defs (#123694)
When a def in a block A reaches another block B that is in A's iterated
dominance frontier, a phi node is added to B for the def register.

A clobbering def can be created at a call instruction, for a register
clobbered by a call.
However, phi nodes are not created for a register, when one of the
reaching defs of the register is a clobbering def.

This patch adds phi nodes for registers that have a clobbering reaching
def. These additional phis help in checking reaching defs for an
instruction in RDF based copy propagation and addressing mode
optimizations.
2025-02-07 08:28:29 -06:00
Benjamin Maxwell
4bf97aa818
[IR] Add llvm.modf intrinsic (#121948)
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly
reusing the lowering for sincos and frexp).

The `llvm.modf` intrinsic takes a floating-point value and returns both
the integral and fractional parts (as a struct).

```
declare { float, float }             @llvm.modf.f32(float  %Val)
declare { double, double }           @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 }       @llvm.modf.f80(x86_fp80  %Val)
declare { fp128, fp128 }             @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }     @llvm.modf.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float>  %Val)
```

This corresponds to the libm `modf` function but returns multiple values
in a struct (rather than take output pointers), which makes it easier to
vectorize.
2025-02-07 09:25:13 +00:00
Mingming Liu
5399782508
[IR] Generalize Function's {set,get}SectionPrefix to GlobalObjects, the base class of {Function, GlobalVariable, IFunc} (#125757)
This is a split of https://github.com/llvm/llvm-project/pull/125756
2025-02-06 14:51:13 -08:00
Matt Arsenault
c268a3f093 DAG: Fix extract of load combine with mismatched vector element type
Fix the case where the vector element type of the loaded extractelement
input does not match the result type of the extract.

This fixes a regression reported after
c55a7659b38946350315ac4a18d9805deb1f0a54
2025-02-06 22:56:56 +07:00
Michael Buch
eb8901bda1
[llvm][DebugInfo] Add new DW_AT_APPLE_enum_kind to encode enum_extensibility (#124752)
When creating `EnumDecl`s from DWARF for Objective-C `NS_ENUM`s, the
Swift compiler tries to figure out if it should perform "swiftification"
of that enum (which involves renaming the enumerator cases, etc.). The
heuristics by which it determines whether we want to swiftify an enum is
by checking the `enum_extensibility` attribute (because that's what
`NS_ENUM` pretty much are). Currently LLDB fails to attach the
`EnumExtensibilityAttr` to `EnumDecl`s it creates (because there's not
enough info in DWARF to derive it), which means we have to fall back to
re-building Swift modules on-the-fly, slowing down expression evaluation
substantially. This happens around
4b3931c8ce/lib/ClangImporter/ImportEnumInfo.cpp (L37-L59)

To speed up Swift exression evaluation, this patch proposes encoding the
C/C++/Objective-C `enum_extensibility` attribute in DWARF via a new
`DW_AT_APPLE_ENUM_KIND`. This would currently be only used from the LLDB
Swift plugin. But may be of interest to other language plugins as well
(though I haven't come up with a concrete use-case for it outside of
Swift).

I'm open to naming suggestions of the various new attributes/attribute
constants proposed here. I tried to be as generic as possible if we
wanted to extend it to other kinds of enum properties (e.g., flag
enums).

The new attribute would look as follows:
```
DW_TAG_enumeration_type
  DW_AT_type      (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind   (DW_APPLE_ENUM_KIND_Closed)
  DW_AT_name      ("ClosedEnum")
  DW_AT_byte_size (0x04)
  DW_AT_decl_file ("enum.c")
  DW_AT_decl_line (23)

DW_TAG_enumeration_type
  DW_AT_type      (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind   (DW_APPLE_ENUM_KIND_Open)
  DW_AT_name      ("OpenEnum")
  DW_AT_byte_size (0x04)
  DW_AT_decl_file ("enum.c")
  DW_AT_decl_line (27)
```
Absence of the attribute means the extensibility of the enum is unknown
and abides by whatever the language rules of that CU dictate.

This does feel like a big hammer for quite a specific use-case, so I'm
happy to discuss alternatives.

Alternatives considered:
* Re-using an existing DWARF attribute to express extensibility. E.g., a
`DW_TAG_enumeration_type` could have a `DW_AT_count` or
`DW_AT_upper_bound` indicating the number of enumerators, which could
imply closed-ness. I felt like a dedicated attribute (which could be
generalized further) seemed more applicable. But I'm open to re-using
existing attributes.
* Encoding the entire attribute string (i.e., `DW_TAG_LLVM_annotation
("enum_extensibility((open))")`) on the `DW_TAG_enumeration_type`. Then
in LLDB somehow parse that out into a `EnumExtensibilityAttr`. I haven't
found a great API in Clang to parse arbitrary strings into AST nodes
(the ones I've found required fully formed C++ constructs). Though if
someone knows of a good way to do this, happy to consider that too.
2025-02-06 08:58:35 +00:00
Min-Yih Hsu
5a1e16f6de
[IR][RISCV] Add llvm.vector.(de)interleave3/5/7 (#124825)
These three intrinsics are similar to llvm.vector.(de)interleave2 but
work with 3/5/7 vector operands or results.
For RISC-V, it's important to have them in order to support segmented
load/store with factor of 2 to 8: factor of 2/4/8 can be synthesized
from (de)interleave2; factor of 6 can be synthesized from factor of 2
and 3; factor 5 and 7 have their own intrinsics added by this patch.

This patch only adds codegen support for these intrinsics, we still need
to teach vectorizer to generate them as well as teaching
InterleavedAccessPass to use them.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-02-05 15:30:33 -08:00
Christudasan Devadasan
d86e379fd2
[CodeGen][NewPM] Port StackSlotColoring to NPM. (#125876) 2025-02-05 23:18:16 +05:30
Matt Arsenault
58a88001f3
PeepholeOpt: Fix looking for def of current copy to coalesce (#125533)
This fixes the handling of subregister extract copies. This
will allow AMDGPU to remove its implementation of
shouldRewriteCopySrc, which exists as a 10 year old workaround
to this bug. peephole-opt-fold-reg-sequence-subreg.mir will
show the expected improvement once the custom implementation
is removed.

The copy coalescing processing here is overly abstracted
from what's actually happening. Previously when visiting
coalescable copy-like instructions, we would parse the
sources one at a time and then pass the def of the root
instruction into findNextSource. This means that the
first thing the new ValueTracker constructed would do
is getVRegDef to find the instruction we are currently
processing. This adds an unnecessary step, placing
a useless entry in the RewriteMap, and required skipping
the no-op case where getNewSource would return the original
source operand. This was a problem since in the case
of a subregister extract, shouldRewriteCopySource would always
say that it is useful to rewrite and the use-def chain walk
would abort, returning the original operand. Move the process
to start looking at the source operand to begin with.

This does not fix the confused handling in the uncoalescable
copy case which is proving to be more difficult. Some currently
handled cases have multiple defs from a single source, and other
handled cases have 0 input operands. It would be simpler if
this was implemented with isCopyLikeInstr, rather than guessing
at the operand structure as it does now.

There are some improvements and some regressions. The
regressions appear to be downstream issues for the most part. One
of the uglier regressions is in PPC, where a sequence of insert_subrgs
is used to build registers. I opened #125502 to use reg_sequence instead,
which may help.

The worst regression is an absurd SPARC testcase using a <251 x fp128>,
which uses a very long chain of insert_subregs.

We need improved subregister handling locally in PeepholeOptimizer,
and other pasess like MachineCSE to fix some of the other regressions.
We should handle subregister composes and folding more indexes
into insert_subreg and reg_sequence.
2025-02-05 23:29:02 +07:00
Matt Arsenault
92e3cd7069
X86: Remove hack in shouldRewriteCopySrc for subregister handling (#125224)
In the problematic situation fixed in 61e556d2bdf3fa0a10dbaadd2dd03d01c341bd27,
shouldRewriteCopySrc is called with identical register class arguments,
but one has a subregister index. This was very surprising to me,
and it probably shouldn't be valid for it to occur. It happens in cases
with uncoalescable copies where the register class changes, and further
up the chain there is a subregister operand. We could possibly just
skip over uncoalsecable instructions in the chain rather than letting
this query deal with it (or pre-filter the obvious subreg with same
class case).

The generic implementation is supposed to account for checking for
valid subregisters by checking getMatchingSuperRegClass already,
but that was bypassed by the early exit for exact class match.

Also adds a reduced mir test demonstrating the exact problematic
case.
2025-02-05 23:25:04 +07:00
Kazu Hirata
7c2c7a4381
[AsmPrinter] Avoid repeated hash lookups (NFC) (#125814) 2025-02-05 07:18:29 -08:00
Akshat Oke
f77f777f35
[CodeGen][NewPM] Port RenameIndependentSubregs to NPM (#125192) 2025-02-05 17:54:57 +05:30
Yuta Mukai
e3abe940d8
[MachinePipeliner] Improve loop carried dependence analysis (#94185)
The previous implementation had false positive/negative cases in the
analysis of the loop carried dependency.

A missed dependency case is caused by incorrect analysis of address
increments. This is fixed by strict analysis of recursive definitions.
See added test swp-carried-dep4.mir.

Excessive dependency detection is fixed by improving the formula
for determining the overlap of address ranges to be accessed. See added test
swp-carried-dep5.mir.
2025-02-05 21:08:20 +09:00
Cullen Rhodes
1cf909208e
[MISched] Small debug improvements (#125072)
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure"
   printing is inconsistent with "Bottom Pressure" and "Top Pressure".
   For the former, register class begins on the same line vs newline for
   latter. Also for the former, the first register class is on the same
   line, but subsequent register classes are newline separated. That's
   removed so all are on the same line.

   Before:
     Max Pressure: FPR8=1
     GPR32=14
     Top Pressure:
     GPR32=2
     Bottom Pressure:
     FPR8=7
     GPR32=17

   After:
     Max Pressure: FPR8=1 GPR32=14
     Top Pressure: GPR32=2
     Bottom Pressure: FPR8=7 GPR32=17

2. After scheduling an instruction, don't print pressure diff if there
   isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,

   Before:
     UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64
                 to
     UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12
                 to GPR32 -1

   After:
     UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12
                        to GPR32 -1
3. Don't print excess pressure sets if there are none.
2025-02-05 09:14:51 +00:00
Christudasan Devadasan
44f638f88e
CodeGen][NewPM] Port PostRAScheduler to NPM. (#125798) 2025-02-05 12:45:59 +05:30
Christudasan Devadasan
5aa4979c47
CodeGen][NewPM] Port MachineScheduler to NPM. (#125703) 2025-02-05 12:17:59 +05:30
Christudasan Devadasan
68e7df395e
[CodeGen][MachineScheduler] Remove the unimplemented print method. (#125702) 2025-02-05 12:10:12 +05:30
Christudasan Devadasan
1d22318b81
[MachineVerifier][NewPM] Add method to run MF through verifier. (#125701) 2025-02-05 11:54:26 +05:30
Christudasan Devadasan
a47c35a699
[CodeGen] Move MISched target hooks into TargetMachine (#125700)
The createSIMachineScheduler & createPostMachineScheduler
target hooks are currently placed in the PassConfig interface.
Moving it out to TargetMachine so that both legacy and
the new pass manager can effectively use them.
2025-02-05 11:41:37 +05:30
Mingming Liu
5f247e76df
[NFC]Refactor static data splitter (#125758)
This is a split of https://github.com/llvm/llvm-project/pull/125756
2025-02-04 17:49:45 -08:00
Tom Tromey
3c2807624d
Allow 128-bit discriminants in DWARF variants (#125578)
If a variant part has a 128-bit discriminator, then
DwarfUnit::constructTypeDIE will assert.  This patch fixes the problem
by allowing any size of integer to be used here.  This is largely
accomplished by moving part of DwarfUnit::addConstantValue to a new
method.

Fixes #119655
2025-02-04 13:36:22 -08:00
Min-Yih Hsu
005b23bb3b
[IA][RISCV] Support VP loads/stores in InterleavedAccessPass (#120490)
Teach InterleavedAccessPass to recognize the following patterns:
  - vp.store an interleaved scalable vector
  - Deinterleaving a scalable vector loaded from vp.load

Upon recognizing these patterns, IA will collect the interleaved /
deinterleaved operands and delegate them over to their respective
newly-added TLI hooks.

For RISC-V, these patterns are lowered into segmented loads/stores

Right now we only recognized power-of-two (de)interleave cases, in which
(de)interleave4/8 are synthesized from a tree of (de)interleave2.

---------

Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>
2025-02-04 11:07:34 -08:00
Robert Imschweiler
21560fe6b9
GlobalISel: Fix defined register of invariant.start (#125664)
In contrast to SelectionDAG, GlobalISel created a new virtual register
for the return value of invariant.start, leaving subsequent users of the
invariant.start value with an undefined reference.
A minimal example:
```
  %tmp = alloca i32, align 4, addrspace(5)
  %tmpI = call ptr @llvm.invariant.start.p5(i64 4, ptr addrspace(5) %tmp) #3
  call void @llvm.invariant.end.p5(ptr %tmpI, i64 4, ptr addrspace(5) %tmp) #3
  store i32 %i, ptr %tmpI, align 4
```
Although the return value of invariant.start might not be intended for
any use beyond invariant.end (the fuzzer might not have created a
sensible situation here), an implicit definition of the corresponding
virtual register avoids a segfault in the target instruction selector
later.

This LLVM defect was identified via the AMD Fuzzing project.
2025-02-04 23:59:03 +07:00
Michael Maitland
93b90a532d
[ReachingDefAnalysis] Fix management of MBBFrameObjsReachingDefs (#124943)
MBBFrameObjsReachingDefs was not being built correctly since we were not
inserting into a reference of Frame2InstrIdx. If there was multiple
stack slot defs in the same basic block, then the bug would occur. This
PR fixes this problem while simplifying the insertion logic.

Additionally, when lookup into MBBFrameObjsReachingDefs was occurring,
there was a chance that there was no entry in the map, in the case that
there was no reaching def. This was causing us to return a default
value, which may or may not have been correct. This patch returns the
correct value now.
2025-02-04 10:04:19 -05:00
Alexander Peskov
358a48b293
[NVPTX] Fix DWARF address space for globals (#122715)
Fix an issue with defining actual DWARF address space for module scope
globals. Previously it was always `ADDR_global_space`.

Also, this patch introduces CUDA-specific DWARF codes for address space
specification in correspondence with:

https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf-definitions
Previously hardcoded constant values are replaced with enum values.
2025-02-04 09:16:21 -05:00
Matt Arsenault
c55a7659b3
DAG: Move scalarizeExtractedVectorLoad to TargetLowering (#122670)
SimplifyDemandedVectorElts should be able to use this on loads
2025-02-04 17:37:12 +07:00
Akshat Oke
8fdd982668
[NewPM] MachineCopyPropagation: Remove dead ID (#125665)
Fix for #125202 (4313345f2eeeb1e2ea7127a056ec4e1aaaa7fefb)
2025-02-04 16:04:14 +05:30
Akshat Oke
4313345f2e
[CodeGen][NewPM] Port MachineCopyPropagation to NPM (#125202) 2025-02-04 15:45:03 +05:30
Matt Arsenault
2f2ac3de69
DAG: Avoid stack usage in bitcast operand promotion to legal vector (#125637)
Fix introducing stack usage if a bitcast source operand is an illegal
integer type cast to a legal vector type. This should cover more
situations, but this is the first one I noticed.
2025-02-04 16:43:42 +07:00
Matt Arsenault
cdca04913a
DAG: Avoid introducing stack usage in vector->int bitcast int op promotion
(#125636)

Avoids stack usage in the v5i32 to i160 case for AMDGPU, which appears
in fat pointer lowering.
2025-02-04 16:32:47 +07:00
Petar Avramovic
88814969dd
MachineUniformityAnalysis: Pass is incorrectly initialized as CFGOnly (#125511)
Set CFGOnly in MachineUniformityAnalysisPass to false.
If there were new registers created, uniformity analysis needs to be
updated. Previously, with CFGOnly set to true, pass would be skipped
if CFG was preserved.
2025-02-04 09:25:25 +01:00
Craig Topper
788bbd2ef6 [DAGCombiner] Improve chain handling in fold (fshl ld1, ld0, c) -> (ld0[ofs]) combine. (#124871)
Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode* which will check users
of the data and the chain. I think this was trying to check that
the data had one use so one of the loads would definitely be
removed by the transform. Load chains don't always have users so
our testing may not have noticed that the chains being used would
block the transform.

The code makes all users of ld1's chain use the new load's chain, but
we don't know that ld1 becomes dead. This can cause incorrect dependencies if
ld1's chain is used and it isn't deleted. I think the better thing to do
is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1
depend on the new load and the original loads. If the olds loads become
dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue with the current code.
areNonVolatileConsecutiveLoads requires the two loads to have the same
input chain. Given that, I don't know how to use one of the load chain
results without also using the other. If they are both used we don't
do the transform because SDNode::hasOneUse will return false for both.
2025-02-03 11:48:41 -08:00
Matt Arsenault
3a2b552e44
TwoAddressInstruction: Fix assert on undef operand with -early-live-intervals (#125518) 2025-02-03 23:48:28 +07:00