Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
This adjusts MemoryLocation to determine the MemoryLocation size from
the alloca size, instead of using the argument.
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
cc https://github.com/llvm/llvm-project/pull/138299
rustc sets `allockind("uninitialized")` - if we copy the attributes
as-is when creating a dummy function, Verify complains about
`allockind("uninitialized,zeroed")` conflicting, so we need to clear the
flag.
Co-authored-by: Jamie Hill-Daniel <jamie@osec.io>
Add `dead_on_return` attribute, which is meant to be taken advantage
by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is supposed to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as argument to
the callee (during codegen the copy is emitted as per byval contract),
whereas a `dead_on_return`-marked argument implies that the copy
already exists in the IR, is located at a specific stack offset within
the caller, and this memory will not be read further by the caller upon
callee return – or otherwise poison, if read before being written.
RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
MemoryLocation::getForDest() returns a (potentially) written location,
while still allowing other reads. Currently, this is limited to
argmemonly functions. However, we can ignore other (non-argmem) read
effects here for the same reason we can ignore argument reads.
Fixes https://github.com/llvm/llvm-project/issues/144300.
Proof: https://alive2.llvm.org/ce/z/LKq_dc
Add a `alloc-variant-zeroed` function attribute which can be used to
inform folding allocation+memset. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.
Co-authored-by: Jamie <jamie@osec.io>
In isMaskedStoreOverwrite we process two stores that fully overwrite one
another, here we add support for predicated vector length stores so that
DSE will eliminate this variant of masked stores.
This is the follow up installment mentioned in:
https://reviews.llvm.org/D132700
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
For the purposes of alias analysis, we should only consider provenance
captures, not address captures. To support this, change (or add)
CaptureTracking APIs to accept a Mask and StopFn argument. The Mask
determines which components we are interested in (for AA that would be
Provenance).
The StopFn determines when we can abort the walk early. Currently, we
want to do this as soon as any of the components in the Mask is
captured. The purpose of making this a separate predicate is that in the
future we will also want to distinguish between capturing full
provenance and read-only provenance. In that case, we can only stop
early once full provenance is captured. The earliest escape analysis
does not get a StopFn, because it must always inspect all captures.
Consider IR like this
call void @llvm.memset.p0.i64(ptr dereferenceable(28) %p, i8 0, i64 28,
i1 false)
store i32 1, ptr %p
In the past it has been optimized like this:
%p2 = getelementptr inbounds i8, ptr %p, i64 4
call void @llvm.memset.p0.i64(ptr dereferenceable(28) %p2, i8 0, i64 24,
i1 false)
store i32 1, ptr %p
As the input IR doesn't guarantee that it is OK to deref 28 bytes
starting at the adjusted pointer %p2 the transformation has been a bit
flawed.
With this patch we make sure to drop any
dereferenceable/dereferenceable_or_null attributes when doing such
transforms. An alternative would have been to adjust the amount of
dereferenceable bytes, but since a memset with a constant length already
implies dereferenceability by itself it is simpler to just drop the
attributes.
The new filtering of attributes is done using a helper that only keep
attributes that we explicitly handle. For the adjusted mem instrinsic
pointers that currently involve "NonNull", "NoUndef" and "Alignment"
(when the alignment is known to be fulfilled also after offsetting the
pointer).
Fixes#115976
There are two ways we can fix this problem, depending on how the
semantics of byval and initializes should interact:
* Don't infer initializes on byval arguments. initializes on byval
refers to the original caller memory (or having both attributes is made
a verifier error).
* Infer initializes on byval, but don't use it in DSE. initializes on
byval refers to the callee copy. This matches the semantics of readonly
on byval. This is slightly more powerful, for example, we could do a
backend optimization where byval + initializes will allocate the full
size of byval on the stack but not copy over the parts covered by
initializes.
I went with the second variant here, skipping byval + initializes in DSE
(FunctionAttrs already doesn't propagate initializes past byval). I'm
open to going in the other direction though.
Fixes https://github.com/llvm/llvm-project/issues/126181.
When comparing the instructions, enable attribute intersection to allow
differences in attributes.
Note that we don't actually have to intersect the attributes on the
earlier instruction, because we're not RAUWing, so there's no chance
that we make any values more poisonous.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
Relates to (but isn't dependent on) #120420.
This allows alias analysis o the intrinsic of the same quality as for
the libcall, which we want in order to move LoopIdiomRecognize over to
selecting the intrinsic.
While update the read clobber check for the "initializes" attr, we
checked the aliasing among arguments, but didn't consider the aliasing
through global variable. It causes problems in this example:
```
int g_var = 123;
void update(int* ptr) {
*ptr = g_var;
void foo() {
g_var = 0;
bar(&g_var);
}
```
We mistakenly removed `g_var = 0;` as a dead store.
Fix the issue by requiring the CallBase only access argmem or
inaccessiblemem.
As AliasAnalysis now has support for scalable sizes, add tests to
DeadStoreElimination covering the scalable vectors case, in preparation
to extend it.
Currently DSE unconditionally emits `calloc` as returning a pointer to
AS0. However, this is incorrect for targets that have a non-zero default
AS, as it'd not match the `malloc` signature. This patch addresses that
by piping through the AS for the pointer returned by `malloc` into the
`calloc` insertion call.
As d5c89cc proved not to be NFC, prior to this change, duplicate
`MemoryAccess` entries were being added to the worklist in
`isWriteAtEndOfFunction`, prematurely reaching the exploration
limit. When `MemorySSAScanLimit` cutoff is set to 4, the store
was previously not eliminated. Introduce a regression test for
additional validation. The test is a simplified variant of function
`ntlmssp_create_session_key`, coming from @dtcxzyw/llvm-opt-benchmark,
bench/wireshark/original/packet-ntlmssp.c.ll.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
DSE uses BatchAA, which caches queries using pairs of MemoryLocations.
At the moment, DSE may remove instructions that are used as pointers in
cached MemoryLocations. If a new instruction used by a new MemoryLoation
and this instruction gets allocated at the same address as a previosuly
cached and then removed instruction, we may access an incorrect entry in
the cache.
To avoid this delay removing all instructions except MemoryDefs until
the end of DSE. This should avoid removing any values used in BatchAA's
cache.
Test case by @vporpo from
https://github.com/llvm/llvm-project/pull/83181.
(Test not precommitted because the results are non-determinstic - memset
only sometimes gets removed)
PR: https://github.com/llvm/llvm-project/pull/83411
If a store is dominated by a condition that ensures that the value being
stored in a memory location is already present at that memory location,
consider the store a noop.
Fixes#63419
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
Unfortunately the commit (D123162) introduced a mis-compile
(https://github.com/llvm/llvm-project/issues/70547), which wasn't fixed
by the alternative fix (c0de28b92e98acbeb73)
I think as long as the call considered as ephemeral is not removed, we
need to be conservative. To address the correctness issue quickly, I
think we should revert the patch (as this patch does, it doens't revert
cleanly)
This reverts commit 17fdaccccfad9b143e4aadbcdda7f645de127153.
Fixes https://github.com/llvm/llvm-project/issues/70547
We're running into stack overflows for huge functions with lots of phis.
Even without the stack overflows, this is recursing >7000 in some
auto-generated code.
This fixes the stack overflow and brings down the compile time to
something reasonable.
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210