637 Commits

Author SHA1 Message Date
int-zjt
0b3934d677
[SROA] Avoid redundant .oldload generation when memset fully covers a partition (#179643)
In our internal (ByteDance) builds we frequently hit very large
`DeadPhiWeb`s that cause serious compile-time slowdowns, especially in
some auto-generated code where a single file can take 20+ minutes to
compile. There were previous attempts to reduce `DeadPhiWeb` in
`InstCombine` (e.g. llvm/llvm-project#108876 and
llvm/llvm-project#158057), but in our workload we still see a lot of
time spent later in the pipeline (notably `JumpThreading` and
`CorrelatedValuePropagation`).

After digging into our cases, a big chunk of the `DeadPhiWeb` comes from
SROA rewriting `memset`s. We often end up with patterns like:
```
%.sroa.xxx.oldload = load <ty>, ptr %.sroa.xxx
%unused = ptrtoint ptr %.sroa.xxx.oldload to i64   ; or a bitcast-like use
store <ty> <new_value>, ptr %.sroa.xxx
```
Even if `%unused` is cleaned up by later DCE-style passes, the
load/store shape can still make `PromoteMem2Reg` conservatively treat
many blocks as live-in when computing IDF. With cyclic CFGs this can
easily create large, sticky dead phi webs, and the rest of the pipeline
pays for it.

The core issue is that `visitMemSetInst` was using the slice’s original
offsets (`BeginOffset`/`EndOffset`) when deciding whether it needs to
merge with an `.oldload` to preserve bytes not written by the `memset`.
First, there was a typo in the original condition (`EndOffset !=
NewAllocaBeginOffset` instead of `EndOffset != NewAllocaEndOffset`),
which effectively made the check always true and forced the merge path
in most cases. Second, even if the typo is fixed, comparing the original
slice range against the partition bounds is still too strict: cases
where the `memset` contains the partition (e.g. a large `memset` over
the whole alloca while the partition is just a subrange) would still be
misclassified as requiring an `.oldload`. Both issues lead to many
redundant loads and downstream dead phi webs.

This change switches the check to use the already-computed intersection
offsets (`NewBeginOffset`/`NewEndOffset`) against the partition bounds,
so we only generate `.oldload` when the `memset` actually writes only
part of the partition:
```diff
-      if (IntTy && (BeginOffset != NewAllocaBeginOffset ||
-                    EndOffset != NewAllocaBeginOffset)) {
+      if (IntTy && (NewBeginOffset != NewAllocaBeginOffset ||
+                    NewEndOffset != NewAllocaEndOffset)) {
    ; emit oldload + insertInteger merge
  }
```
In our workload this cuts down a lot of pointless `.oldload`s and helps
reduce the size of dead phi webs seen after `mem2reg`, improving compile
time without changing semantics (partial overwrites still merge, full
overwrites don’t).
2026-02-05 09:51:55 +08:00
Steffen Larsen
c7408d17fa
[AMDGPU][SROA] Unify cast chain implementations (#177945)
The AMDGPU promote alloca pass is missing a conversion link when casting
between vectors of pointers and pointers or vectors of pointers with
different number of elements. This causes codegen to crash due to
invalid casts being generated. To address this, this commit adds the
missing conversion link.

In addition to this, the commit moves the common load/store cast logic
into a new function `createLoadStoreCastChain`.

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
2026-02-03 11:12:02 +00:00
Jameson Nash
a460c8e8da
[NFCI][SROA] reduce calls to getAllocatedType() (#177437)
Replace repeated calls with getAllocationSize() and cached NewAllocaTy.
In AllocaSliceRewriter, the allocated type is already stored in the
NewAllocaTy member variable and now passed directly there. This change
replaces the remaining direct calls to `NewAI.getAllocatedType()` with
the cached NewAllocaTy to simplify and DRY the code.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 00:03:55 -05:00
Björn Pettersson
1c305aebd6
[SROA] Use shufflevector instead of select for vector blend (#175756)
A patch from May 2013, commit 1e211913b56f390, changed SROA into using a
select instruction to perform vector blend. Idea was that using a select
was the canonical form, and that we optimize select better than
shufflevector.

This patch is changing SROA back into using shufflevector instead of
select when doing the blend (inserting a smaller vector into a larger
vector).

Motivation:
Nowadays InstCombine is canonicalizing this kind of vector blends, using
vector select instructions, into a shufflevector instruction. So it is
assumed that shufflevector is the canonical form now. It is also assumed
that we are better at optimizing shufflevector today, compared to back
in 2013.

Commit f26710d97d9c272be8a55 includes links to a discussion from 2016
(https://discourse.llvm.org/t/ir-canonicalization-vector-select-or-shufflevector/42257/6)
about picking shufflevector as the canonical form.
2026-01-13 18:06:00 +00:00
Kirill Pertsev
5334c51486
[IR][InstCombine] Fix O(n^2) complexity in SliceUpIllegalIntegerPHI (#175468)
Add a hasInsertionPt() helper, which is equivalent to
getFirstInsertionPt() != end(), but performs the check in O(1) instead
of O(n). In particular, this avoids quadratic complexity inside
SliceUpIllegalIntegerPHI().

Fixes #175465.
Should fix https://github.com/rust-lang/rust/issues/129713.
2026-01-12 20:59:58 +00:00
Yonah Goldberg
f34d068687
[SROA] Refactor rewritePartition alloca type selection process (#167771)
This PR does two things:
1. Refactor the rewritePartition type selection process to move the
logic inside of a lambda. Previously the selection process would make
use of a mutable `SliceTy`. Each phase of the type selection would do
something like `if (!SliceTy) { // try to set sliceTy` }. But you also
have `if (!SliceTy && <condition>)` and `if (!SliceTy || <condition>)`.
I think this style makes the priority mechanism confusing. The new way I
wrote the selection process is equivalent (except for the second
contribution of this PR), and works by checking a condition, followed by
returning the selected type right away. I think it makes the priority
clearer.

2. What motivated the rewrite is that there are some cases with small
aggregate allocas that have mixed type loads and stores that SROA fails
to promote.

For example, given:
```
%alloca = alloca [2 x half]
store i32 42, ptr %alloca
%val = load float, ptr %alloca
```
Previously, SROA would:
- Find no common type between `i32` and `float`
- Use `getTypePartition` which returns `[2 x half]`
- Create a new `alloca [2 x half]`

This PR adds an additional check:

```
if (LargestIntTy &&
    DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() &&
    isIntegerWideningViable(P, LargestIntTy, DL))
  return {LargestIntTy, true, nullptr};
```

which allows the alloca above to get promoted as `i32`.

The larger rewrite helps with this because this check is super specific.
I only want it to apply when there is no common type and when all the
loads and stores can be widened for promotion. I also want to apply it
on the subtype returned from getTypePartition. The larger rewrite makes
it very clear when this optimization occurs. I also don't want it to
apply when vector promotion is possible.

After this change, the example is optimized to:
```
%0 = bitcast i32 42 to float
ret void
```
The alloca is completely eliminated.
2025-12-19 11:04:50 -08:00
Peter Collingbourne
b0d3405578
SROA: Recognize llvm.protected.field.ptr intrinsics.
When an alloc slice's users include llvm.protected.field.ptr intrinsics
and their discriminators are consistent, drop the intrinsics in order
to avoid unnecessary pointer sign and auth operations.

Reviewers: nikic

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/151650
2025-12-11 18:22:05 -08:00
Yonah Goldberg
4ca61f5661
[NFC][SROA] Clean up rewritePartition type selection process (#169106)
This change reverts
257251247a,
which landed on Aug 8, 2022. This change addressed the problem that if
you have IR that looks something like:

```
%alloca = alloca <4 x float>
store <4 x float> %data, ptr %alloca
%load = load half, ptr %alloca
```

`getCommonType` would return `<4 x float>` because the `load half` isn't
to the entire partition, so we skip the first `getTypePartition` check.
257251247a
added a later check that sees that `<4 x float>` is not vector
promotable because of the `load half`, and then calls
`getTypePartition`, which changes the `sliceTy` to `< 8 x half>`, which
is vector promotable because the store can be changed to `store <8 x
half>`. So we set the `sliceTy` to `<8 x half>`, we can promote the
alloca, and everyone is happy.

This code became unnecessary after
529eafd9be,
which landed ~3 months later, which fixes the issue in a different way.
`isVectorPromotionViable` was already smart enough to try `<8 x half>`
as a type candidate because it sees the `load half`. However, this
candidate didn't work because it conflicts with `store <4 x float>`.
This commit added logic to try integer-ifying candidates if there is no
common type. So the `<8 x half>` candidate gets converted to `<8 x
i16>`, which works because we can convert the alloca to `alloca <8 x
i16>` and the load to `load i16`, allowing promotion.

After
529eafd9be,
the original commit is pointless. It tries to refine the `SliceTy`, but
if `isVectorPromotionViable` succeeds, it returns a new type to use and
we will ignore the `SliceTy`.

This change is my first patch to try to simplify the type selection
process in rewritePartition. I had some other ideas that I tried in
https://github.com/llvm/llvm-project/pull/167771 and
https://github.com/llvm/llvm-project/pull/168796, but they need
refinement.
2025-12-03 10:58:30 -08:00
Drew Kersnar
9c78bc5de4
Revert "[LSV] Merge contiguous chains across scalar types" (#170381)
Reverts llvm/llvm-project#154069. I pointed out a number of issues
post-merge, most importantly examples of miscompiles:
https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.

While the motivation of the change is clear, I think the implementation
approach is flawed. It seems like the goal is to allow elements like
`load <2xi16>` and `load i32` to be vectorized together despite the
current algorithm not grouping them into the same equivalence classes. I
personally think that if we want to attempt this it should be a more
wholistic approach, maybe even redefining the concept of an equivalence
class. This current solution seems like it would be really hard to do
bug-free, and even if the bugs were not present, it is only able to
merge chains that happen to be adjacent to each other after
`splitChainByContiguity`, which seems like it is leaving things up to
chance whether this optimization kicks in. But we can discuss more in
the re-land. Maybe the broader approach I'm proposing is too difficult,
and a narrow optimization is worthwhile. Regardless, this should be
reverted, it needs more iteration before it is correct.
2025-12-02 18:27:58 -05:00
Shubham Sandeep Rastogi
23a22d0497
[SROA] Unify the names of new instructions created in SROA. (#167917)
In Debug builds, the names of adjusted pointers have a pointer-specific
name prefix which doesn't exist in non-debug builds.

This causes differences in output when looking at the output of SROA
with a Debug or Release compiler.

For most of our ongoing testing, we use essentially Release+Asserts
build (basically release but without NDEBUG defined), however we ship a
Release compiler. Therefore we want to say with reasonable confidence
that building a large project with Release vs a Release+Asserts build
gives us the same output when the same compiler version is used.

This difference however, makes it difficult to prove that the output is
the same if the only difference is the name when using LTO builds and
looking at bitcode.

Hence this change is being proposed.
2025-12-02 10:12:20 -08:00
Anshil Gandhi
fbdf8ab590
[LSV] Merge contiguous chains across scalar types (#154069)
This change enables the LoadStoreVectorizer to merge and vectorize
contiguous chains even when their scalar element types differ, as long
as the total bitwidth matches. To do so, we rebase offsets between
chains, normalize value types to a common integer type, and insert the
necessary casts around loads and stores. This uncovers more
vectorization opportunities and explains the expected codegen updates
across AMDGPU tests.

Key changes:
- Chain merging
  - Build contiguous subchains and then merge adjacent ones when:
- They refer to the same underlying pointer object and address space.
    - They are either all loads or all stores.
    - A constant leader-to-leader delta exists.
- Rebasing one chain into the other's coordinate space does not overlap.
    - All elements have equal total bit width.
- Rebase the second chain by the computed delta and append it to the
first.

- Type normalization and casting
- Normalize merged chains to a common integer type sized to the total
bits.
- For loads: create a new load of the normalized type, copy metadata,
and cast back to the original type for uses if needed.
  - For stores: bitcast the value to the normalized type and store that.
- Insert zext/trunc for integer size changes; use bit-or-pointer casts
when sizes match.

- Cleanups
  - Erase replaced instructions and DCE pointer operands when safe.
- New helpers: computeLeaderDelta, chainsOverlapAfterRebase,
rebaseChain, normalizeChainToType, and allElemsMatchTotalBits.

Impact:
- Increases vectorization opportunities across mixed-typed but
size-compatible access chains.
- Large set of expected AMDGPU codegen diffs due to more/changed
vectorization.

This PR resolves #97715.
2025-12-01 23:05:17 -05:00
Orlando Cazalet-Hyams
7bbb03d516
[NFC][SROA][DebugInfo] Reuse existing dbg_assigns where possible (#163938)
Addresses issue #145937

Without this patch SROA generates new dbg_assign for new stores. We can
simply steal the existing dbg_assigns linked to the old store when the
store is not being split.
2025-10-17 17:19:43 +01:00
Kazu Hirata
f2306b6304
[llvm] Replace LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] (NFC) (#163507)
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]].  Note
that this patch adjusts the placement of [[maybe_unused]] to comply
with the C++17 language.
2025-10-15 06:54:14 -07:00
Mircea Trofin
a848c1b738
[sroa][profcheck] Vector selects have "unknown" branch weights (#163319) 2025-10-14 17:36:31 -07:00
Mircea Trofin
3901f130e3
[sroa][profcheck] Propagate profile in unfoldGEPSelect (#163318)
Also applied `ProfcheckDisableMetadataFixes`​ to previous (PR #163317) PR.
2025-10-14 10:25:53 -07:00
Mircea Trofin
5fa41f0064
[sroa][profcheck] Propagate select profile (#163317) 2025-10-14 08:06:21 -07:00
Chengjun
8faeed042a
[SROA] Add Stored Value Size Check for Tree-Structured Merge (#162921)
The change fixes a bug in the SROA where tree-structured merge
optimization was incorrectly applied when the size of the stored value
was not a multiple of the new allocated element type size. The original
change is https://github.com/llvm/llvm-project/pull/152793. A simple
repro would be

```
define <1 x i32> @foo(<1 x i16> %a, <1 x i16> %b) {
entry:
  %alloca = alloca [1 x i32]

  %ptr0 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 0
  store <1 x i16> %a, ptr %ptr0

  %ptr1 = getelementptr inbounds [2 x i16], ptr %alloca, i32 0, i32 1
  store <1 x i16> %b, ptr %ptr1

  %result = load <1 x i32>, ptr %alloca
  ret <1 x i32> %result
}
```

Currently, this will lead to a compile time crash.

In this change, we will skip the tree-structured merge for this case and
fall back to normal SROA.
2025-10-11 00:23:04 +00:00
Chengjun
4bc9d29fba
[SROA] Use tree-structure merge to remove alloca (#152793)
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.

The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:

- 64 `shufflevector`s( `.vecexpand`)
- 64 `select`s ( `.vecblend` )
- All operations use masks of size 128
- These operations form a long dependency chain

This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.

This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.

Key features:

- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization

Example transformation:

```
// Before: (stores do not have to be in order) 
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca       ; offset 0-1
store <2 x float> %val2, ptr %alloca+16    ; offset 4-5  
store <2 x float> %val1, ptr %alloca+8     ; offset 2-3
store <2 x float> %val3, ptr %alloca+24    ; offset 6-7
%result = load <8 x float>, ptr %alloca

// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>  
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```

Benefits:

- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns

For some large cases, the compile time can be reduced from about 60s to
less than 3s.

---------

Co-authored-by: chengjunp <chengjunp@nividia.com>
2025-09-19 15:37:41 -07:00
Philip Reames
e6b4a21849
[IR] Add utilities for manipulating length of MemIntrinsic [nfc] (#153856)
Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
2025-08-20 13:50:11 -07:00
Orlando Cazalet-Hyams
d13341db26
[RemoveDIs][NFC] Remove getAssignmentMarkers (#153214)
getAssignmentMarkers was for debug intrinsics. getDVRAssignmentMarkers
is used for DbgRecords.
2025-08-13 10:56:19 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Tommy MᶜMichen
155359c1f2
[llvm][sroa] Disable support for invariant.group (#151743)
Resolves #151574.

> SROA pass does not perform aggregate load/store rewriting on a pointer
whose source is a `launder.invariant.group`.
> 
> This causes failed assertion in `AllocaSlices`.
> 
> ```
> void (anonymous
namespace)::AllocaSlices::SliceBuilder::visitStoreInst(StoreInst &):
> Assertion `(!SI.isSimple() || ValOp->getType()->isSingleValueType())
&&
>  "All simple FCA stores should have been pre-split"' failed.
> ```

Disables support for `{launder,strip}.invariant.group` intrinsics in
SROA.

Updates SROA test for `invariant.group` support.
2025-08-05 09:59:07 +02:00
Jeremy Morse
5328c732a4
[DebugInfo] Strip more debug-intrinsic code from local utils (#149037)
SROA and a few other facilities use generic-lambdas and some overloaded
functions to deal with both intrinsics and debug-records at the same time.
As part of stripping out intrinsic support, delete a swathe of this code
from things in the Utils directory.

This is a large diff, but is mostly about removing functions that were
duplicated during the migration to debug records. I've taken a few
opportunities to replace comments about "intrinsics" with "records",
and replace generic lambdas with plain lambdas (I believe this makes
it more readable).

All of this is chipping away at intrinsic-specific code until we get to
removing parts of findDbgUsers, which is the final boss -- we can't
remove that until almost everything else is gone.
2025-07-16 14:13:53 +01:00
Alex MacLean
0008af882d
[SROA] Allow as zext<i1> index when unfolding GEP select (#146929)
A zero-extension from an i1 is equivalent to a select with constant 0
and 1 values. Add this case when rewriting gep(select) -> select(gep) to
expose more opportunities for SROA.
2025-07-04 08:16:19 -07:00
Paul Walker
ea9046699e
[LLVM][SROA] Teach SROA how to "bitcast" between fixed and scalable vectors. (#130973)
For function whose vscale_range is limited to a single value we can size
scalable vectors. This aids SROA by allowing scalable vector load and
store operations to be considered for replacement whereby bitcasts
through memory can be replaced by vector insert or extract operations.
2025-06-11 11:02:32 +01:00
Nikita Popov
5c97397c2c
[SROA] Support load-only promotion with dynamic offset loads (#135609)
If we do load-only promotion, it is okay if we leave some loads alone.
We only need to know all stores that affect a specific location.

As such, we can handle loads with unknown offset via the "escaped
read-only" code path.

This is something we already support in LICM load-only promotion, but
doing this in SROA is much better from a phase ordering perspective.

Fixes https://github.com/llvm/llvm-project/issues/134513.
2025-04-17 10:42:07 +02:00
Nikita Popov
a9474191e0
[SROA] Improve handling of lifetimes in load-only promotion (#135382)
The propagateStoredValuesToLoads() transform currently bails out if
there is a lifetime intrinsic spanning the whole alloca, but the
individual loads/stores operate on some smaller part, because the slice
/ partition size does not match.
    
Fix this by ignoring assume-like slices early, regardless of which range
they cover.
    
I've changed the overall code structure here a bit because I was getting
confused by the different iterators.
2025-04-14 11:52:42 +02:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Kazu Hirata
73dc2afd2c
[Transforms] Use *Set::insert_range (NFC) (#132652)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);

In some cases, we can further fold that into the set declaration.
2025-03-23 19:42:53 -07:00
Nikita Popov
55b480ec3c
[SROA] Allow load-only promotion with read-only captures (#130735)
It's okay if the address or read-provenance of the pointer is captured.
We only have to make sure that there are no unanalyzable writes to the
pointer.
2025-03-13 09:53:02 +01:00
Veera
9d1fbbd2b9
[SROA][NFC] Remove Unused Parameter in promoteAllocas() (#128382)
Removing it because `Function &F` is not used by `promoteAllocas()`.
2025-02-23 11:17:43 -05:00
Harald van Dijk
1083ec647f
[reland][DebugInfo] Update DIBuilder insertion to take InsertPosition (#126967)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.

As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
2025-02-13 10:46:42 +00:00
Harald van Dijk
23209eb1d9 Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)"
This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.
2025-02-12 17:50:39 +00:00
Harald van Dijk
3ec9f7494b
[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
2025-02-12 17:38:59 +00:00
Jeremy Morse
34b139594a
[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.

Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
2025-01-27 16:44:14 +00:00
David Green
2a7ed2c1aa [SROA] Protect against calling the alloca ptr
In case we are calling the alloca ptr directly, check that the Use is a normal
operand to the call. Fortran is a funny language.
2024-12-17 09:21:15 +00:00
David Green
0032c151dc [SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.

The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
2024-12-14 18:07:21 +00:00
Kirill Stoimenov
e3676aa21f Revert "[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)"
Causing buffer overflow:

SUMMARY: AddressSanitizer: heap-buffer-overflow llvm/lib/Transforms/Scalar/SROA.cpp:5552:35

This reverts commit 5e247d726d7a54cf0acc997bc17b50e7494e6fa3.
2024-12-12 21:32:35 +00:00
Kirill Stoimenov
7071cd3885 Revert "[Transforms] Silence a warning in SROA.cpp (NFC)"
This reverts commit 5b077506de26b1dfce1926895548b86f2106bed9.
2024-12-12 21:31:15 +00:00
Jie Fu
5b077506de [Transforms] Silence a warning in SROA.cpp (NFC)
/llvm-project/llvm/lib/Transforms/Scalar/SROA.cpp:5526:48:
 error: '&&' within '||' [-Werror,-Wlogical-op-parentheses]
          if (!SI->isSimple() || PartitionType && UserTy != PartitionType)
                              ~~ ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
/llvm-project/llvm/lib/Transforms/Scalar/SROA.cpp:5526:48:
 note: place parentheses around the '&&' expression to silence this warning
          if (!SI->isSimple() || PartitionType && UserTy != PartitionType)
                                               ^
                                 (                                       )
1 error generated.
2024-12-12 18:44:19 +08:00
David Green
5e247d726d
[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.

The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
2024-12-12 10:27:27 +00:00
Kazu Hirata
5b19ed8bb4
[llvm] Migrate away from PointerUnion::{is,get,dyn_cast} (NFC) (#115626)
Note that PointerUnion::{is,get,dyn_cast} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>
2024-11-10 07:24:06 -08:00
Afanasyev Ivan
5a41800ea1
[SROA] Fix NumPromoted statistic for SROA pass (#115586)
`NumPromoted` stat should not be increased if `SROASkipMem2Reg` is set
and nothing is changed.
2024-11-09 10:08:58 +01:00
Kazu Hirata
94f9cbbe49
[Scalar] Remove unused includes (NFC) (#114645)
Identified with misc-include-cleaner.
2024-11-02 08:32:26 -07:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Bartłomiej Chmiel
c2b92a4250
[SROA] Use SetVector for PromotableAllocas (#105809)
Use SetVector to make operations more efficient
if there is a very large number of allocas.
2024-09-04 14:10:51 +02:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Shubham Sandeep Rastogi
359c704004
Handle #dbg_values in SROA. (#94070)
This patch properly handles #dbg_values in SROA by making sure that any
#dbg_values get moved to before a store just like #dbg_declares do, or
the #dbg_value is correctly updated with the right alloca after an
aggregate alloca is broken up.

The issue stems from swift where #dbg_values are emitted and not
dbg.declares, the SROA pass doesn't handle the #dbg_values correctly and
it causes them to all have undefs

If we look at this simple-ish testcase (This is all I could reduce it
down to, and I am still relatively bad at writing llvm IR by hand so I
apologize in advance):

```
%T4main1TV13TangentVectorV = type <{ %T4main1UV13TangentVectorV, [7 x i8], %T4main1UV13TangentVectorV }>
%T4main1UV13TangentVectorV = type <{ %T1M1SVySfG, [7 x i8], %T4main1VV13TangentVectorV }>
%T1M1SVySfG = type <{ ptr, %Ts4Int8V }>
%Ts4Int8V = type <{ i8 }>
%T4main1VV13TangentVectorV = type <{ %T1M1SVySfG }>
define hidden swiftcc void @"$s4main1TV13TangentVectorV1poiyA2E_AEtFZ"(ptr noalias nocapture sret(%T4main1TV13TangentVectorV) %0, ptr noalias nocapture dereferenceable(57) %1, ptr noalias nocapture dereferenceable(57) %2) #0 !dbg !44 {
entry:
  %3 = alloca %T4main1VV13TangentVectorV
  %4 = alloca %T4main1UV13TangentVectorV
  %5 = alloca %T4main1VV13TangentVectorV
  %6 = alloca %T4main1UV13TangentVectorV
  %7 = alloca %T4main1VV13TangentVectorV
  %8 = alloca %T4main1UV13TangentVectorV
  %9 = alloca %T4main1VV13TangentVectorV
  %10 = alloca %T4main1UV13TangentVectorV
  call void @llvm.lifetime.start.p0(i64 9, ptr %3)
  call void @llvm.lifetime.start.p0(i64 25, ptr %4)
  call void @llvm.lifetime.start.p0(i64 9, ptr %5)
  call void @llvm.lifetime.start.p0(i64 25, ptr %6)
  call void @llvm.lifetime.start.p0(i64 9, ptr %7)
  call void @llvm.lifetime.start.p0(i64 25, ptr %8)
  call void @llvm.lifetime.start.p0(i64 9, ptr %9)
  call void @llvm.lifetime.start.p0(i64 25, ptr %10)
  %.u1 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %1, i32 0, i32 0
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %4, ptr align 8 %.u1, i64 25, i1 false)
  %.u11 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %2, i32 0, i32 0
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %6, ptr align 8 %.u11, i64 25, i1 false)
  call void @llvm.dbg.value(metadata ptr %4, metadata !62, metadata !DIExpression(DW_OP_deref)), !dbg !75
  %.s = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %4, i32 0, i32 0
  %.s.c = getelementptr inbounds %T1M1SVySfG, ptr %.s, i32 0, i32 0
  %11 = load ptr, ptr %.s.c
  %.s.b = getelementptr inbounds %T1M1SVySfG, ptr %.s, i32 0, i32 1
  %.s.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s.b, i32 0, i32 0
  %12 = load i8, ptr %.s.b._value
  %.s2 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %6, i32 0, i32 0
  %.s2.c = getelementptr inbounds %T1M1SVySfG, ptr %.s2, i32 0, i32 0
  %13 = load ptr, ptr %.s2.c
  %.s2.b = getelementptr inbounds %T1M1SVySfG, ptr %.s2, i32 0, i32 1
  %.s2.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s2.b, i32 0, i32 0
  %14 = load i8, ptr %.s2.b._value
  %.v = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %4, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %3, ptr align 8 %.v, i64 9, i1 false)
  %.v3 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %6, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %5, ptr align 8 %.v3, i64 9, i1 false)
  %.s4 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %3, i32 0, i32 0
  %.s4.c = getelementptr inbounds %T1M1SVySfG, ptr %.s4, i32 0, i32 0
  %18 = load ptr, ptr %.s4.c
  %.s5 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %5, i32 0, i32 0
  %.s5.c = getelementptr inbounds %T1M1SVySfG, ptr %.s5, i32 0, i32 0
  %20 = load ptr, ptr %.s5.c
  %.u2 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %1, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %8, ptr align 8 %.u2, i64 25, i1 false)
  %.u26 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %2, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %10, ptr align 8 %.u26, i64 25, i1 false)
  %.s7 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %8, i32 0, i32 0
  %.s7.c = getelementptr inbounds %T1M1SVySfG, ptr %.s7, i32 0, i32 0
  %25 = load ptr, ptr %.s7.c
  %.s7.b = getelementptr inbounds %T1M1SVySfG, ptr %.s7, i32 0, i32 1
  %.s7.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s7.b, i32 0, i32 0
  %26 = load i8, ptr %.s7.b._value
  %.s8 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %10, i32 0, i32 0
  %.s8.c = getelementptr inbounds %T1M1SVySfG, ptr %.s8, i32 0, i32 0
  %27 = load ptr, ptr %.s8.c
  %.s8.b = getelementptr inbounds %T1M1SVySfG, ptr %.s8, i32 0, i32 1
  %.s8.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s8.b, i32 0, i32 0
  %28 = load i8, ptr %.s8.b._value
  %.v9 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %8, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %7, ptr align 8 %.v9, i64 9, i1 false)
  %.v10 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %10, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %9, ptr align 8 %.v10, i64 9, i1 false)
  %.s11 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %7, i32 0, i32 0
  %.s11.c = getelementptr inbounds %T1M1SVySfG, ptr %.s11, i32 0, i32 0
  %32 = load ptr, ptr %.s11.c
  %.s12 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %9, i32 0, i32 0
  %.s12.c = getelementptr inbounds %T1M1SVySfG, ptr %.s12, i32 0, i32 0
  %34 = load ptr, ptr %.s12.c
  call void @llvm.lifetime.end.p0(i64 25, ptr %10)
  call void @llvm.lifetime.end.p0(i64 9, ptr %9)
  call void @llvm.lifetime.end.p0(i64 25, ptr %8)
  call void @llvm.lifetime.end.p0(i64 9, ptr %7)
  call void @llvm.lifetime.end.p0(i64 25, ptr %6)
  call void @llvm.lifetime.end.p0(i64 9, ptr %5)
  call void @llvm.lifetime.end.p0(i64 25, ptr %4)
  call void @llvm.lifetime.end.p0(i64 9, ptr %3)
  ret void
}
!llvm.module.flags = !{!0, !1, !2, !3, !4, !6, !7, !8, !9, !10, !11, !12, !13, !14, !15}
!swift.module.flags = !{!33}
!llvm.linker.options = !{!34, !35, !36, !37, !38, !39, !40, !41, !42, !43}
!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 4]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA, no_dead_strip"}
!4 = !{i32 1, !"Objective-C Garbage Collection", i8 0}
!6 = !{i32 7, !"Dwarf Version", i32 4}
!7 = !{i32 2, !"Debug Info Version", i32 3}
!8 = !{i32 1, !"wchar_size", i32 4}
!9 = !{i32 8, !"PIC Level", i32 2}
!10 = !{i32 7, !"uwtable", i32 1}
!11 = !{i32 7, !"frame-pointer", i32 1}
!12 = !{i32 1, !"Swift Version", i32 7}
!13 = !{i32 1, !"Swift ABI Version", i32 7}
!14 = !{i32 1, !"Swift Major Version", i8 6}
!15 = !{i32 1, !"Swift Minor Version", i8 0}
!16 = distinct !DICompileUnit(language: DW_LANG_Swift, file: !17, imports: !18, sdk: "MacOSX14.4.sdk")
!17 = !DIFile(filename: "/Users/emilpedersen/swift2/swift/test/IRGen/debug_scope_distinct.swift", directory: "/Users/emilpedersen/swift2")
!18 = !{!19, !21, !23, !25, !27, !29, !31}
!19 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !20, file: !17)
!20 = !DIModule(scope: null, name: "main", includePath: "/Users/emilpedersen/swift2/swift/test/IRGen")
!21 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !22, file: !17)
!22 = !DIModule(scope: null, name: "Swift", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/Swift.swiftmodule/arm64-apple-macos.swiftmodule")
!23 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !24, line: 60)
!24 = !DIModule(scope: null, name: "_Differentiation", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_Differentiation.swiftmodule/arm64-apple-macos.swiftmodule")
!25 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !26, line: 61)
!26 = !DIModule(scope: null, name: "M", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/test-macosx-arm64/IRGen/Output/debug_scope_distinct.swift.tmp/M.swiftmodule")
!27 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !28, file: !17)
!28 = !DIModule(scope: null, name: "_StringProcessing", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_StringProcessing.swiftmodule/arm64-apple-macos.swiftmodule")
!29 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !30, file: !17)
!30 = !DIModule(scope: null, name: "_SwiftConcurrencyShims", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/shims")
!31 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !32, file: !17)
!32 = !DIModule(scope: null, name: "_Concurrency", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_Concurrency.swiftmodule/arm64-apple-macos.swiftmodule")
!33 = !{i1 false}
!34 = !{!"-lswiftCore"}
!35 = !{!"-lswift_StringProcessing"}
!36 = !{!"-lswift_Differentiation"}
!37 = !{!"-lswiftDarwin"}
!38 = !{!"-lswift_Concurrency"}
!39 = !{!"-lswiftSwiftOnoneSupport"}
!40 = !{!"-lobjc"}
!41 = !{!"-lswiftCompatibilityConcurrency"}
!42 = !{!"-lswiftCompatibility56"}
!43 = !{!"-lswiftCompatibilityPacks"}
!44 = distinct !DISubprogram( unit: !16, declaration: !52, retainedNodes: !53)
!45 = !DIFile(filename: "<compiler-generated>", directory: "/")
!46 = !DICompositeType(tag: DW_TAG_structure_type, scope: !47, elements: !48, identifier: "$s4main1TV13TangentVectorVD")
!47 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1TVD")
!48 = !{}
!49 = !DISubroutineType(types: !50)
!50 = !{!51}
!51 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1TV13TangentVectorVXMtD")
!52 = !DISubprogram( file: !45, type: !49, spFlags: DISPFlagOptimized)
!53 = !{!54, !56, !57}
!54 = !DILocalVariable( scope: !44, type: !55, flags: DIFlagArtificial)
!55 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !46)
!56 = !DILocalVariable( scope: !44, flags: DIFlagArtificial)
!57 = !DILocalVariable( scope: !44, type: !58, flags: DIFlagArtificial)
!58 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !51)
!62 = !DILocalVariable( scope: !63, type: !72, flags: DIFlagArtificial)
!63 = distinct !DISubprogram( type: !66, unit: !16, declaration: !69, retainedNodes: !70)
!64 = !DICompositeType(tag: DW_TAG_structure_type, scope: !65, identifier: "$s4main1UV13TangentVectorVD")
!65 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1UVD")
!66 = !DISubroutineType(types: !67)
!67 = !{!68}
!68 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1UV13TangentVectorVXMtD")
!69 = !DISubprogram( spFlags: DISPFlagOptimized)
!70 = !{!71, !73}
!71 = !DILocalVariable( scope: !63, flags: DIFlagArtificial)
!72 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !64)
!73 = !DILocalVariable( scope: !63, type: !74, flags: DIFlagArtificial)
!74 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !68)
!75 = !DILocation( scope: !63, inlinedAt: !76)
!76 = distinct !DILocation( scope: !44)

```

if we run
` opt -S -passes=sroa file.ll  -o -`

With this patch we will see
```
%.sroa.5.sroa.021 = alloca [7 x i8], align 8
tail call void @llvm.dbg.value(metadata ptr %.sroa.5.sroa.021, metadata !59, metadata !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 72, 56)), !dbg !72
%.sroa.5.sroa.014 = alloca [7 x i8], align 8
 ```
 
 Without this patch we will see:
 
```
%.sroa.5.sroa.021 = alloca [7 x i8], align 8
%.sroa.5.sroa.014 = alloca [7 x i8], align 8
```

Thus this patch ensures that llvm.dbg.values that use allocas that are broken up still have the correct metadata and debug information is preserved

This is part of a stack of patches and is preceded by: https://github.com/llvm/llvm-project/pull/94068
2024-08-21 17:52:37 -07:00
Orlando Cazalet-Hyams
57539418ba
[SROA] Fix debug locations for variables with non-zero offsets (#97750)
Fixes issue #61981 by adjusting variable location offsets (in the DIExpression)
when splitting allocas.

Patch [4/4] to fix structured bindings in SROA.

NOTE: There's still a bug in mem2reg which generates incorrect locations in some
situations: if the variable fragment has an offset into the new (split) alloca,
mem2reg will fail to convert that into a bit shift (the location contains a
garbage offset). That's not addressed here.

insertNewDbgInst - Now takes the address-expression and FragmentInfo as
  separate parameters because unlike dbg_declares dbg_assigns want those to go
  to different places. dbg_assign records put the variable fragment info in the
  value expression only (whereas dbg_declare has only one expression so puts it
  there - ideally this information wouldn't live in DIExpression, but that's
  another issue).

MigrateOne - Modified to correctly compute the necessary offsets and fragment
  adjustments. The previous implementation produced bogus locations for variables
  with non-zero offsets. The changes replace most of the body of this lambda, so
  it might be easier to review in a split-diff view and focus on the change as a
  whole than to compare it to the old implementation.

  This uses calculateFragmentIntersect and extractLeadingOffset added in previous
  patches in this series, and createOrReplaceFragment described below.

createOrReplaceFragment - Similar to DIExpression::createFragmentExpression
  except for 3 important distinctions:

    1. The new fragment isn't relative to an existing fragment.
    2. There are no checks on the the operation types because it is assumed
       the location this expression is computing is not implicit (i.e., it's
       always safe to create a fragment because arithmetic operations apply
       to the address computation, not to an implicit value computation).
    3. Existing extract_bits are modified independetly of fragment changes
       using \p BitExtractOffset. A change to the fragment offset or size
       may affect a bit extract. But a bit extract offset can change
       independently of the fragment dimensions.

  Returns the new expression, or nullptr if one couldn't be created.  Ideally
  this is only used to signal that a bit-extract has become zero-sized (and thus
  the new debug record has no size and can be dropped), however, it fails for
  other reasons too - see the FIXME below.

  FIXME: To keep the scope of this change focused on non-bitfield structured
  bindings the function bails in situations that
  DIExpression::createFragmentExpression fails. E.g. when fragment and bit
  extract sizes differ. These limitations can be removed in the future.
2024-07-18 09:08:25 +01:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00