Replace uses of getAllocatedType() with the more semantic
getAllocationSize() method in the alloca dereferenceability check
and zero-size alloca merging logic.
This simplifies the code by:
- Eliminating manual isArrayAllocation() checks (handled by
getAllocationSize)
- Eliminating superfluous isSized() checks (the verifier rejects them
already)
- Using TypeSize::isScalable() for scalable vector handling (before
casting to uint64_t)
- Using TypeSize::isZero() for zero-size checks
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Returns uint64_t to simplify callers. The goal is eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during create.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
#176352 introduced a new fold and a new test for this functionality.
Given the select condition is the same before and after, we can
propagate any profile information that may be attached to the select
instruction. We should not need to explicitly drop any metadata off the
select.
Add support for:
load(addrspacecast(select(Cond, &V1, &V2))) =>
select(Cond, load(addrspacecast(&V1)), load(addrspacecast(&V2)))
Note: alive does not support addrspacecasts and thus proofs are omitted.
Replace patterns that manually compute allocation sizes by multiplying
getTypeAllocSize(getAllocatedType()) by the array size with calls to the
getAllocationSize(DL) API, which handles this correctly and concisely,
returning nullopt for VLAs.
This fixes several places that were not accounting for array allocations
when computing sizes, simplifies code that was doing this manually, and
adds some explicit isFixed checks where implied convert was being used.
This PR is because now that we have opaque pointers, I hate that some
AllocaInst still has type information being consumed by some passes
instead of just using the size, since passes rarely handle that type
information well or correctly. I hope this will grow into a sequence of
commits to slowly eliminate uses of getAllocatedType from AllocaInst.
And similarly later to remove type information from GlobalValue too (it
can be replaced with just dereferenceable bytes, similar to arguments).
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Reverts llvm/llvm-project#154069. I pointed out a number of issues
post-merge, most importantly examples of miscompiles:
https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.
While the motivation of the change is clear, I think the implementation
approach is flawed. It seems like the goal is to allow elements like
`load <2xi16>` and `load i32` to be vectorized together despite the
current algorithm not grouping them into the same equivalence classes. I
personally think that if we want to attempt this it should be a more
wholistic approach, maybe even redefining the concept of an equivalence
class. This current solution seems like it would be really hard to do
bug-free, and even if the bugs were not present, it is only able to
merge chains that happen to be adjacent to each other after
`splitChainByContiguity`, which seems like it is leaving things up to
chance whether this optimization kicks in. But we can discuss more in
the re-land. Maybe the broader approach I'm proposing is too difficult,
and a narrow optimization is worthwhile. Regardless, this should be
reverted, it needs more iteration before it is correct.
This change enables the LoadStoreVectorizer to merge and vectorize
contiguous chains even when their scalar element types differ, as long
as the total bitwidth matches. To do so, we rebase offsets between
chains, normalize value types to a common integer type, and insert the
necessary casts around loads and stores. This uncovers more
vectorization opportunities and explains the expected codegen updates
across AMDGPU tests.
Key changes:
- Chain merging
- Build contiguous subchains and then merge adjacent ones when:
- They refer to the same underlying pointer object and address space.
- They are either all loads or all stores.
- A constant leader-to-leader delta exists.
- Rebasing one chain into the other's coordinate space does not overlap.
- All elements have equal total bit width.
- Rebase the second chain by the computed delta and append it to the
first.
- Type normalization and casting
- Normalize merged chains to a common integer type sized to the total
bits.
- For loads: create a new load of the normalized type, copy metadata,
and cast back to the original type for uses if needed.
- For stores: bitcast the value to the normalized type and store that.
- Insert zext/trunc for integer size changes; use bit-or-pointer casts
when sizes match.
- Cleanups
- Erase replaced instructions and DCE pointer operands when safe.
- New helpers: computeLeaderDelta, chainsOverlapAfterRebase,
rebaseChain, normalizeChainToType, and allElemsMatchTotalBits.
Impact:
- Increases vectorization opportunities across mixed-typed but
size-compatible access chains.
- Large set of expected AMDGPU codegen diffs due to more/changed
vectorization.
This PR resolves#97715.
The copied from constant memory analysis had a special case where
nocapture was not required for read-only calls without (or unused)
return. This is not correct, as the address can still be captured though
means other than memory and the return value, for example using
divergence. This code should not be trying to do its own nocapture
inference.
For loads that operate on aggregate type, instcombine unpacks the loads.
It does not preserve the invariant.load metadata. This patch fixes that,
it looks for the metadata in the parent load and attaches the metadata
to the unpacked loads.
```
%struct.double2 = type { double, double }
%struct.double1 = type { double }
define %struct.double2 @func1(ptr %a) {
%1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
ret %struct.double2 %1
}
!1 = !{}
```
Reproducer: https://godbolt.org/z/hcY8MMvYh
There is no need to first remove the instructions before and then
the ones after in two different worklist iterations. We don't need
to worry about change reporting here, as the functions do that
themselves.
This avoids the issue in #150338, but not really in a principled
way. It's possible that we will have to allow poison arguments
to lifetime.start/lifetime.end again if this turns out to be a
recurring problem.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The testcase I added previously failed because a SelectInst with invalid
operands was created (one side `addrspace(4)`, the other
`addrspace(5)`).
PointerReplacer needs to dig deeper if the true and/or false
instructions of the select are not available.
Fixes SWDEV-542957
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the uses of an instruction are replaced before replacing
the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
This reland PR includes a fix for an assertion failure caused by
https://github.com/llvm/llvm-project/pull/137215, which was reverted.
The failing test involved a phi and gep depending on each other, in
which case the PtrReplacer did not order them correctly for replacement.
This patch fixes it by adding a check during the definition of
`PostOrderWorklist`.
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In the case of PHIs with incoming values which have not yet been visited, they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simpletraversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
This is a reland of https://github.com/llvm/llvm-project/pull/137215.
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the operands of an instruction are replaced before
replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Part of the coverage-tracking feature, following #107279.
In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not drop incorrectly annotations and therefore create false
positives.
The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
When constructing a PHI node in `PointerReplacer::replace`, the incoming
operands are expected to have already been replaced and in the
replacement map. However, when one of the incoming operands is a load,
the search of the map is unsuccessful, and a nullptr is returned from
`getReplacement`. The reason is that, when a load is replaced, all the
uses of the load has been actually replaced by the new load. It is
useless to insert the original load into the map. Instead, we should
place the new load into the map to meet the expectation of the later map
search.
Fixes: SWDEV-516420
Proof: https://alive2.llvm.org/ce/z/mzVj-u
I will add some follow-up patches to avoid duplicate code, support more
memory instructions, and bypass gep instructions.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.
Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
This teaches unpackLoadToAggregate and unpackStoreToAggregate to unpack
scalable structs to individual loads/stores with insertvalues /
extractvalues. The gep used for the offsets uses an i8 ptradd as opposed
to a struct gep, as the geps for scalable structs are not supported and
we canonicalize to i8.
When replacing load with a select on the address with a select and 2
loads of the values, copy poison-generating metadata from the original
load to the newly created loads, which are placed at the same place as
the original loads. We cannot copy metadata that may trigger UB.
PR: https://github.com/llvm/llvm-project/pull/115605
A crash could happen in `PointerReplacer::replace` when constructing a
new
select instruction and there is no replacement for one of its operand.
This can
happen when the operand is a load instruction that has been replaced
earlier
such that the operand itself is already the new value. In this case, it
is not
in the replacement map and `getReplacement` simply returns nullptr.
Fix SWDEV-472192.
This was looking through an addrspacecast, and not finding a later
unfoldable cast to another address space. Fixes improperly deleting
a required alloca + memcpy and introducing an illegal addrspacecast.
This also required fixing some worklist management issues with
addrspacecast, and assuming that only memcpy sources could need
replacement.
Regresses one test function, but this looks like it optimized
before by accident. It never saw the pointer use by the call
to readonly_callee, which should require insertion of a new cast.
Fixes#68120
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
This is another step in the direction of fixing the `Fixed(0) !=
Scalable(0)` bugbear, although whilst weird I don't believe it's causing
us any real issues.
This is in preparation for the InferAlignment pass which handles
inferring alignment for instructions separately. It is better to handle
this as a separate pass as inferring alignment is quite costly, and
InstCombine running multiple times in the pass pipeline makes it even
more so.
Differential Revision: https://reviews.llvm.org/D158527