With a sized dead_on_return, we need to not eliminate stores if there
are to a pointer with a variable offset from the underlying object
marked dead_on_return. This manifested as an assertion failure as
BaseValue/V ended up not being equal. It's possible we could do a range
analysis to try and prove the variable offset stays within bounds, but
this case seems to come up relatively rarely (only reproducible with a
UBSan build of LLVM) and is probably not worth the compile time.
Fixes#180361.
This reverts commit fdb05bbf62b8d4fc8dc7ce1f1cfa570f3265a8ae.
This was causing failures in release-mode builds because the
GetPointerBaseWithConstantOffset call would never be run which leads to
ValueOffset being uninitialized and thus the behavior of the function is
unpredictable.
Remove the variable definition and move the function call directly into
the assert statement. Otherwise builds with -Werror that don't use
asserts would fail.
dead_on_return is made optionally sized in #171712. This patch adds
handling in DSE so that we can actually eliminate stores to pointer
parameters marked with a sized dead_on_return attribute. We do not
eliminate stores where the store may overlap with bytes that are not
known to be dead after return.
Reviewers: nikic, antoniofrighetto, alinas, aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/173694
This patch makes the dead_on_return parameter attribute optionally require a number
of bytes to be passed in to specify the number of bytes known to be dead
upon function return/unwind. This is aimed at enabling annotating the
this pointer in C++ destructors with dead_on_return in clang. We need
this to handle cases like the following:
```
struct X {
int n;
~X() {
this[n].n = 0;
}
};
void f() {
X xs[] = {42, -1};
}
```
Where we only certain that sizeof(X) bytes are dead upon return of ~X.
Otherwise DSE would be able to eliminate the store in ~X which would not
be correct.
This patch only does the wiring within IR. Future patches will make
clang emit correct sizing information and update DSE to only delete
stores to objects marked dead_on_return that are provably in bounds of
the number of bytes specified to be dead_on_return.
Reviewers: nikic, alinas, antoniofrighetto
Pull Request: https://github.com/llvm/llvm-project/pull/171712
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
fixes#151764
This fix has two parts first we track all lifetime intrinsics and if
they are users of an alloca of a target extention like dx.RawBuffer then
we eliminate those memory intrinsics when we visit the alloca.
We do step one to allow us to use the Dead Store Elimination Pass. This
removes the alloca and simplifies the use of the target extention back
to using just the global. That keeps things in a form the
DXILBitcodeWriter is expecting.
Obviously to pull this off we needed to bring back the legacy pass
manager plumbing for the DSE pass and hook it up into the DirectX
backend.
The net impact of this change is that DML shader pass rate went from
89.72% (4268 successful compilations) to 90.98% (4328 successful
compilations).
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
cc https://github.com/llvm/llvm-project/pull/138299
rustc sets `allockind("uninitialized")` - if we copy the attributes
as-is when creating a dummy function, Verify complains about
`allockind("uninitialized,zeroed")` conflicting, so we need to clear the
flag.
Co-authored-by: Jamie Hill-Daniel <jamie@osec.io>
Add `dead_on_return` attribute, which is meant to be taken advantage
by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is supposed to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as argument to
the callee (during codegen the copy is emitted as per byval contract),
whereas a `dead_on_return`-marked argument implies that the copy
already exists in the IR, is located at a specific stack offset within
the caller, and this memory will not be read further by the caller upon
callee return – or otherwise poison, if read before being written.
RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
Update the AA CaptureAnalysis providers to return CaptureComponents, so
we can distinguish between full provenance and read-only provenance
captures.
Use this to restrict "other" memory effects on call from ModRef to Ref.
Ideally we would also apply the same reasoning for escape sources, but
the current API cannot actually convey the necessary information (we can
only say NoAlias or MayAlias, not MayAlias but only via Ref).
Add a `alloc-variant-zeroed` function attribute which can be used to
inform folding allocation+memset. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.
Co-authored-by: Jamie <jamie@osec.io>
Migrate their usage to the `AnyMem*Inst` family, and add a isAtomic()
query on the base class for that hierarchy. This matches the idioms we
use for e.g. isAtomic on load, store, etc.. instructions, the existing
isVolatile idioms on mem* routines, and allows us to more easily share
code between atomic and non-atomic variants.
As with #138568, the goal here is to simplify the class hierarchy and
make it easier to reason about. I'm moving from easiest to hardest, and
will stop at some point when I hit "good enough". Longer term, I'd sorta
like to merge or reverse the naming on the plain Mem*Inst and the
AnyMem*Inst, but that's a much larger and more risky change. Not sure
I'm going to actually do that.
In isMaskedStoreOverwrite we process two stores that fully overwrite one
another, here we add support for predicated vector length stores so that
DSE will eliminate this variant of masked stores.
This is the follow up installment mentioned in:
https://reviews.llvm.org/D132700
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
The implementation doesn't use it, and is unlikely to use it in
the future.
The places that do set StoreCaptures=false, do so incorrectly and
would be broken if the parameter actually did anything.
In removePartiallyOverlappedStores we iterate over
InstOverlapIntervalsTy which is a DenseMap. Change that map into using
MapVector to ensure that we apply the transforms in a deterministic
order. I've only seen that the order matters if starting to use names
for the instructions created when doing the transforms. But such things
are a bit annoying when debugging etc.
Consider IR like this
call void @llvm.memset.p0.i64(ptr dereferenceable(28) %p, i8 0, i64 28,
i1 false)
store i32 1, ptr %p
In the past it has been optimized like this:
%p2 = getelementptr inbounds i8, ptr %p, i64 4
call void @llvm.memset.p0.i64(ptr dereferenceable(28) %p2, i8 0, i64 24,
i1 false)
store i32 1, ptr %p
As the input IR doesn't guarantee that it is OK to deref 28 bytes
starting at the adjusted pointer %p2 the transformation has been a bit
flawed.
With this patch we make sure to drop any
dereferenceable/dereferenceable_or_null attributes when doing such
transforms. An alternative would have been to adjust the amount of
dereferenceable bytes, but since a memset with a constant length already
implies dereferenceability by itself it is simpler to just drop the
attributes.
The new filtering of attributes is done using a helper that only keep
attributes that we explicitly handle. For the adjusted mem instrinsic
pointers that currently involve "NonNull", "NoUndef" and "Alignment"
(when the alignment is known to be fulfilled also after offsetting the
pointer).
Fixes#115976
There are two ways we can fix this problem, depending on how the
semantics of byval and initializes should interact:
* Don't infer initializes on byval arguments. initializes on byval
refers to the original caller memory (or having both attributes is made
a verifier error).
* Infer initializes on byval, but don't use it in DSE. initializes on
byval refers to the callee copy. This matches the semantics of readonly
on byval. This is slightly more powerful, for example, we could do a
backend optimization where byval + initializes will allocate the full
size of byval on the stack but not copy over the parts covered by
initializes.
I went with the second variant here, skipping byval + initializes in DSE
(FunctionAttrs already doesn't propagate initializes past byval). I'm
open to going in the other direction though.
Fixes https://github.com/llvm/llvm-project/issues/126181.
When comparing the instructions, enable attribute intersection to allow
differences in attributes.
Note that we don't actually have to intersect the attributes on the
earlier instruction, because we're not RAUWing, so there's no chance
that we make any values more poisonous.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
While update the read clobber check for the "initializes" attr, we
checked the aliasing among arguments, but didn't consider the aliasing
through global variable. It causes problems in this example:
```
int g_var = 123;
void update(int* ptr) {
*ptr = g_var;
void foo() {
g_var = 0;
bar(&g_var);
}
```
We mistakenly removed `g_var = 0;` as a dead store.
Fix the issue by requiring the CallBase only access argmem or
inaccessiblemem.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
Writing a test for this transitively exposed a number of places in
BuildLibCalls where
we were failing to propagate address spaces properly, which are
additionally fixed.
I'd like to use the name CaptureInfo to represent the new attribute
proposed at
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420,
but it's already taken by AA, and I can't think of great alternatives
(CaptureEffects would be something of a stretch).
As such, I'd like to rename CaptureInfo -> CaptureAnalysis in AA, which
also seems like the more accurate terminology.
Currently DSE unconditionally emits `calloc` as returning a pointer to
AS0. However, this is incorrect for targets that have a non-zero default
AS, as it'd not match the `malloc` signature. This patch addresses that
by piping through the AS for the pointer returned by `malloc` into the
`calloc` insertion call.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Refactor DSE with MemoryDefWrapper and MemoryLocationWrapper.
Normally, one MemoryDef accesses one MemoryLocation. With "initializes"
attribute, one MemoryDef (like call instruction) could initialize
multiple MemoryLocations.
Refactor DSE as a preparation to apply "initializes" attribute in DSE in
a follow-up PR
(58dd8a4403).