354 Commits

Nikita Popov
6b69985da4 [MemCpyOpt] Use helper for unwind check
This extends support to byval arguments. It could be further
extended to handle the case of non-captured noalias returns.
2022-01-26 12:43:31 +01:00
Nikita Popov
0d20407d1a Reapply [MemCpyOpt] Look through pointer casts when checking capture
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
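
A minimal sketch of the previously missed case (hypothetical IR, names
invented for illustration):

    %src = alloca i64
    %src.i8 = bitcast i64* %src to i8*
    call void @f(i8* %src.i8) ; capture of %src through a bitcast

Without stripping the bitcast, the check compared %src against the call
arguments and concluded the source was not captured.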
2022-01-20 09:30:21 +01:00
Nikita Popov
655a7024db Reapply [MemCpyOpt] Make capture check during call slot optimization more precise
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
dest has not been captured before or at the call and src has not
been captured before it, and b) there is no potential use of the
captured pointer before the lifetime of the source alloca ends,
either due to lifetime.end or a return from the function. At that
point the potentially captured pointer becomes dangling.
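
For illustration, a sketch of the pattern in question (hypothetical IR,
not from the patch itself):

    %src = alloca [16 x i8]
    %src.i8 = bitcast [16 x i8]* %src to i8*
    call void @init(i8* %src.i8) ; may capture %src.i8
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src.i8, i64 16, i1 false)
    call void @llvm.lifetime.end.p0i8(i64 16, i8* %src.i8)

Per a) and b), rewriting @init to write into %dst directly stays sound
here: any pointer @init captured dangles once the alloca's lifetime ends.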

Differential Revision: https://reviews.llvm.org/D115615
2022-01-20 09:30:20 +01:00
Nikita Popov
d7bff2e9d2 [MemCpyOpt] Fix metadata merging during call slot optimization
Call slot optimization currently merges the metadata between the
call and the load. However, we also need to merge in the metadata
of the store.

Part of the reason why we might have gotten away with this
previously is that the load and the store are usually the same
instruction (a memcpy); the issue can only show up if call slot
optimization occurs on an actual load/store pair.

This addresses the issue reported in
https://reviews.llvm.org/D115615#3251386.

Differential Revision: https://reviews.llvm.org/D117679
2022-01-20 09:25:13 +01:00
Nikita Popov
4dc4815f56 [MemCpyOpt] Add some debug output to call slot optimization (NFC) 2022-01-19 15:51:10 +01:00
Hans Wennborg
53a51acc36 Revert "[MemCpyOpt] Make capture check during call slot optimization more precise"
This caused a miscompile due to call slot optimization replacing a call
argument without considering the call's !noalias metadata; see the
discussion on the code review.

> Call slot optimization is currently supposed to be prevented if
> the call can capture the source pointer. Due to an implementation
> bug, this check currently doesn't trigger if a bitcast of the source
> pointer is passed instead. I'm somewhat afraid of the fallout of
> fixing this bug (due to heavy reliance on call slot optimization
> in rust), so I'd like to strengthen the capture reasoning a bit first.
>
> In particular, I believe that the capture is fine as long as a)
> the call itself cannot depend on the pointer identity, because
> dest has not been captured before or at the call and src has not
> been captured before it, and b) there is no potential use of the
> captured pointer before the lifetime of the source alloca ends,
> either due to lifetime.end or a return from the function. At that
> point the potentially captured pointer becomes dangling.
>
> Differential Revision: https://reviews.llvm.org/D115615

Also reverting the dependent commit:

> [MemCpyOpt] Look through pointer casts when checking capture
>
> The user scanning loop above looks through pointer casts, so we
> also need to strip pointer casts in the capture check. Previously
> the source was incorrectly considered not captured if a bitcast
> was passed to the call.

This reverts commit 487a34ed9d7d24a7b1fb388c8856c784a459b22b
and 00e6869463ae6023d0d48f30de8511d6d748b14f.
2022-01-18 17:41:49 +01:00
Simon Pilgrim
5bbcff6181 [MemCpyOptimizer] hasUndefContents - only look for underlying object if we've found an alloca
Provides an early-out if we fail to find an AllocaInst, and avoids a static analyzer warning about null dereferencing.
2022-01-06 15:15:03 +00:00
Simon Pilgrim
8399fa673b [MemCpyOptimizer] Use auto* for cast<> results (style). NFC. 2022-01-06 15:15:03 +00:00
Nikita Popov
00e6869463 [MemCpyOpt] Look through pointer casts when checking capture
The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
2022-01-05 09:50:33 +01:00
Nikita Popov
487a34ed9d [MemCpyOpt] Make capture check during call slot optimization more precise
Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
dest has not been captured before or at the call and src has not
been captured before it, and b) there is no potential use of the
captured pointer before the lifetime of the source alloca ends,
either due to lifetime.end or a return from the function. At that
point the potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615
2022-01-05 09:39:25 +01:00
Fraser Cormack
7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the
knowledge that they only occur on aggregate types. These should
never be scalably-sized.
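
As a sketch, a store like the following (hypothetical IR) used to crash
the pass when its size was queried as a constant and is now skipped:

    store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %p
    ; the store size is a scalable TypeSize, not a compile-time constant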

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Artem Belevich
30dfd3449e [MemCpyOpt] Allow specifying --enable-memcpyopt-without-libcalls more than once
so we can override it via clang's CLI if necessary.
2021-08-30 13:55:55 -07:00
Nikita Popov
17db125b48 [MemCpyOpt] Optimize MemoryDef insertion
When converting a store into a memset, we currently insert the new
MemoryDef after the store MemoryDef, which requires all uses to be
renamed to the new def using a whole block scan. Instead, we can
insert the new MemoryDef before the store and not rename uses,
because we know that the location is immediately overwritten, so
all uses should still refer to the old MemoryDef. Those uses will
get renamed when the old MemoryDef is actually dropped, which is
efficient.
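
A rough sketch in MemorySSA's printed form (access numbers invented):

    ; before the transform:
    ; 1 = MemoryDef(liveOnEntry)
      store i8 0, i8* %p
    ; after it, the memset's def is inserted *before* the store's def:
    ; 2 = MemoryDef(liveOnEntry)
      call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 1, i1 false)
    ; 1 = MemoryDef(2)
      store i8 0, i8* %p ; dead, erased afterwards

Uses of 1 stay valid and only get renamed once the store is dropped.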

I expect something similar can be done for some of the other MSSA
updates in MemCpyOpt. This is an alternative to D107513, at least
for this particular case.

Differential Revision: https://reviews.llvm.org/D107702
2021-08-10 21:28:29 +02:00
Nikita Popov
88003cea1c [MemCpyOpt] Remove MemDepAnalysis-based implementation
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.

I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.

Differential Revision: https://reviews.llvm.org/D102113
2021-08-07 22:35:44 +02:00
Artem Belevich
6a9cf21f5a [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
The attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact
that there are users that do not expect LLVM to materialize the `memset`
intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now, introduce a flag to force-enable the pass, and opt in only CUDA
compilation with the NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401
2021-08-06 11:13:52 -07:00
Michael Liao
d1cacd5928 [MemCpyOpt] Teach memcpyopt to handle loads from constant memory.
- Loads from constant memory (either explicit loads or the sources of
  memory transfer intrinsics) won't alias any stores, as sketched below.
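
A sketch (hypothetical IR):

    @c = external constant [16 x i8]
    %v = load i8, i8* getelementptr inbounds ([16 x i8], [16 x i8]* @c, i64 0, i64 0)
    ; a load from @c cannot alias any store, so it no longer blocks
    ; e.g. memcpy forwarding across it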

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605
2021-08-06 12:43:52 -04:00
Nikita Popov
bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.
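
For reference, the store-merging transform that remains gated on libcall
availability turns runs of adjacent stores into a memset, roughly
(hypothetical IR):

    store i8 0, i8* %p
    %p1 = getelementptr i8, i8* %p, i64 1
    store i8 0, i8* %p1
    %p2 = getelementptr i8, i8* %p, i64 2
    store i8 0, i8* %p2
    %p3 = getelementptr i8, i8* %p, i64 3
    store i8 0, i8* %p3
    ; => call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 4, i1 false)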

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00
Artem Belevich
1a43ee65d1 Revert "[MemCpyOpt] Enable memcpy optimizations unconditionally."
This reverts commit 2c98298a7559dfe4a264ef1adaad0921526768cc which breaks
sanitizers.
2021-07-19 14:27:41 -07:00
Artem Belevich
2c98298a75 [MemCpyOpt] Enable memcpy optimizations unconditionally.
The patch does not depend on the availability of the library functions for
memcpy/memset, as it operates on LLVM intrinsics. The optimizations are
useful on targets that have these functions disabled (e.g. NVPTX & AMDGPU).

Differential Revision: https://reviews.llvm.org/D104801
2021-07-19 11:58:02 -07:00
Arthur Eubanks
ab5693aa4a [OpaquePtr] Use byval type more 2021-07-13 09:34:34 -07:00
Jon Roelofs
37b6e03c18 [Intrinsics] Make MemCpyInlineInst a MemCpyInst
This opens up more optimization opportunities in passes that already handle MemCpyInsts.

Differential revision: https://reviews.llvm.org/D105247
2021-07-02 10:25:24 -07:00
Nikita Popov
9aa951e80e [MemCpyOpt] Preserve address space
Preserve address space when generating the cast to i8*.
2021-06-27 20:21:19 +02:00
Nikita Popov
f025053977 [MemCpyOpt] Handle unusual memcpy element type
Apparently, it is legal to use memcpy/memset with pointer types
other than i8*. Prior to 81fcdae68c5ff656c30032fd26c6a21af4c51dbb
this case was silently miscompiled, as the i8 offset calculation
was performed on some other type. Now it would crash due to a
type mismatch. Fix this by inserting an explicit bitcast to i8*.
2021-06-27 16:21:44 +02:00
Nikita Popov
81fcdae68c [MemCpyOpt] Support opaque pointers 2021-06-27 15:52:38 +02:00
Simon Pilgrim
596004a947 MemCpyOptimizer.cpp - hasUndefContentsMSSA - Pass DataLayout by reference. NFCI. 2021-06-08 10:41:02 +01:00
Arthur Eubanks
6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this; it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Hongtao Yu
30bb5be389 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

The optimizations unblocked are:
- In-block load propagation.
- In-block dead store elimination.
- Memory copy optimization that turns stores to consecutive memories into a
  memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Olle Fredriksson
f5446b769a [MemCpyOpt] Allow variable lengths in memcpy optimizer
This makes the memcpy-memcpy and memcpy-memset optimizations work for
variable sizes as long as they are equal, relaxing the old restriction
that they are constant integers. If they're not equal, the old
requirement that they are constant integers with certain size
restrictions is used.

The implementation works by pushing the length tests further down in the
code, which reveals some places where it's enough that the lengths are
equal (but not necessarily constant).
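
A sketch of the memcpy-memcpy case with equal, non-constant lengths
(hypothetical IR, assuming nothing writes %a or %b in between):

    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 %n, i1 false)
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %c, i8* %b, i64 %n, i1 false)
    ; the second memcpy may now read from %a directly, since both
    ; lengths are the same %n even though %n is not a constant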

Differential Revision: https://reviews.llvm.org/D100870
2021-04-21 23:23:38 +02:00
Liam Keegan
edf9565a86 [MemCpyOpt] Add missing MemorySSAWrapperPass dependency macro
Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass,
since MemCpyOpt now uses MemorySSA by default.

Differential Revision: https://reviews.llvm.org/D98484
2021-03-16 20:30:00 +01:00
Nikita Popov
5556660971 [MemCpyOpt] Handle read from lifetime.start with offset
This fixes a regression from the MemDep-based implementation:
MemDep completely ignores lifetime.start intrinsics that aren't
MustAlias -- this is probably unsound, but it does mean that the
MemDep-based implementation successfully eliminated memcpys from
lifetime.start if the memcpy happens at an offset, rather than at
the base address of the alloca.

Add a special case for when the lifetime.start spans the whole
alloca (which is pretty much the only kind of lifetime.start that
frontends ever emit), as we don't need to figure out the exact
aliasing relationship in that case: the whole alloca is dead prior
to the call.
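
A sketch of the offset case (hypothetical IR):

    %a = alloca [16 x i8]
    %a.i8 = bitcast [16 x i8]* %a to i8*
    call void @llvm.lifetime.start.p0i8(i64 16, i8* %a.i8) ; whole alloca
    %off = getelementptr inbounds i8, i8* %a.i8, i64 8
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %off, i64 8, i1 false)
    ; the copied bytes are undef, so the memcpy is removable even though
    ; %off is not the base address of the alloca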

If this doesn't cover all practically relevant cases, then it
would be possible to make use of the recently added PartialAlias
clobber offsets to make this more precise.
2021-03-13 20:38:09 +01:00
Nikita Popov
2902bdeea1 [MemCpyOpt] Use AA to check for MustAlias between memset and memcpy
Rather than checking for simple equality, check for MustAlias, as
we do in other transforms. This catches equivalent GEPs.
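
For instance (hypothetical IR), these are distinct values but MustAlias:

    %p1 = getelementptr inbounds i8, i8* %base, i64 4
    %p2 = getelementptr inbounds i8, i8* %base, i64 4
    ; simple pointer equality would miss that a memset of %p1 and a
    ; memcpy to %p2 touch the same location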
2021-03-13 11:41:15 +01:00
Nikita Popov
9080444f33 [MemCpyOpt] Don't generate zero-size memset
If a memset destination is overwritten by a memcpy and the sizes
are exactly the same, then the memset is simply dead. We can
directly drop it, instead of replacing it with a memset of zero
size, which is particularly ugly for the case of a dynamic size.
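
A sketch (hypothetical IR):

    call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 %n, i1 false)
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %p, i8* %src, i64 %n, i1 false)
    ; the memset is fully overwritten; drop it rather than emit a
    ; memset whose dynamic size works out to zero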
2021-03-13 11:41:15 +01:00
Nikita Popov
4125afc357 [MemCpyOpt] Fix handling of readnone byval arguments
If the call is readnone, then there may not be any MemoryAccess
associated with the call. Bail out in that case.

This fixes the issue reported at
https://reviews.llvm.org/D94376#2578312.
2021-02-22 18:48:31 +01:00
Nikita Popov
71a8e4e7d6 [MemCpyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Kazu Hirata
e53472de68 [Transforms] Use llvm::append_range (NFC) 2021-01-20 21:35:54 -08:00
Kazu Hirata
530c5af6a4 [Transforms] Construct SmallVector with iterator ranges (NFC) 2021-01-02 09:24:17 -08:00
Nikita Popov
624af932a8 [MemCpyOpt] Port to MemorySSA
This is a straightforward port of MemCpyOpt to MemorySSA following
the approach of D26739. MemDep queries are replaced with MSSA queries
without changing the overall structure of the pass. Some care has
to be taken to account for differences between these APIs
(MemDep also returns reads, MSSA doesn't).

Differential Revision: https://reviews.llvm.org/D89207
2020-12-01 17:57:41 +01:00
Nikita Popov
3e37543111 [MemCpyOpt] Move GEP during call slot optimization
When performing call slot optimization with a GEP destination, the
transform will currently usually fail, because the GEP is directly
before the memcpy and as such does not dominate the call. We should
move it above the call if that satisfies the domination requirement.

I think that a constant-index GEP is the only useful thing to move
here, as otherwise isDereferenceablePointer couldn't look through
it anyway. As such I'm not trying to generalize this further.
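
A sketch of the pattern (hypothetical IR):

    call void @init(i8* %tmp) ; call slot candidate writing %tmp
    %dst = getelementptr inbounds i8, i8* %p, i64 8 ; constant-index GEP
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %tmp, i64 8, i1 false)
    ; hoisting the GEP above the call makes %dst dominate it, so @init
    ; can be rewritten to write into %dst directly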

Differential Revision: https://reviews.llvm.org/D89623
2020-10-22 20:40:56 +02:00
Nikita Popov
50cc9a0e61 [MemCpyOpt] Extract common function for unwinding check
These two cases should be using the same logic. Not NFC, as this
resolves the TODO regarding use of the underlying object.
2020-10-17 15:30:39 +02:00
Nikita Popov
cd6f40f432 [MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.

Differential Revision: https://reviews.llvm.org/D89206
2020-10-13 21:45:05 +02:00
Nikita Popov
e79ca751fc [MemCpyOpt] Fix MemorySSA preservation
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.

The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degenerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.

Differential Revision: https://reviews.llvm.org/D88778
2020-10-13 21:39:09 +02:00
Nikita Popov
baa3b87015 [MemCpyOpt] Don't shorten memset if memcpy operands may be the same
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.
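
A sketch of the case that must not be shortened (hypothetical IR):

    call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 16, i1 false)
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %p, i8* %p, i64 8, i1 false)
    ; the memcpy is a no-op, so the first 8 bytes are still zeroed only
    ; by the memset; shrinking it to the last 8 bytes would be wrong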

Differential Revision: https://reviews.llvm.org/D89192
2020-10-13 21:19:19 +02:00
Nikita Popov
39c39e8a7f [MemCpyOpt] Don't shorten memset if destination observable through unwinding
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.
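
A sketch where shortening would be unsound (hypothetical IR; @maythrow
is an invented function that may unwind):

    call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 16, i1 false)
    call void @maythrow() ; if %p escapes, an unwind here observes %p
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %p, i8* %src, i64 8, i1 false)
    ; shortening the memset to the last 8 bytes changes what the
    ; unwinder can see in the first 8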

Differential Revision: https://reviews.llvm.org/D89190
2020-10-13 21:12:19 +02:00
Nikita Popov
5e855f1e80 [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while combining load+store pairs into a
memcpy. This hoisting can currently result in stores being executed
that weren't guaranteed to execute in the original program.
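
A sketch (hypothetical IR; @maythrow is an invented function):

    %v = load i32, i32* %src
    call void @maythrow()
    store i32 %v, i32* %dst
    ; hoisting the store above @maythrow() to form a memcpy would execute
    ; it even on paths where the original program unwinds first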

Differential Revision: https://reviews.llvm.org/D89154
2020-10-10 10:26:28 +02:00
Nikita Popov
616f545048 [MemCpyOpt] Use dereferenceable pointer helper
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.

I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.

This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.

Differential Revision: https://reviews.llvm.org/D88805
2020-10-06 18:41:19 +02:00
Nikita Popov
6b441ca523 [MemCpyOpt] Check for throwing calls during call slot optimization
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.
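
A sketch of the problematic pattern (hypothetical IR, names invented):

    call void @compute(i8* %tmp) ; writes its result into %tmp
    call void @mayunwind() ; the caller can observe %dst if this unwinds
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %tmp, i64 16, i1 false)
    ; rewriting @compute to store into a non-local %dst directly would
    ; make the early write visible to the caller on unwind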

This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.

As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.

Differential Revision: https://reviews.llvm.org/D88799
2020-10-06 18:24:40 +02:00
Nikita Popov
80cde02e85 [MemCpyOpt] Add separate statistic for call slot optimization (NFC) 2020-10-06 18:14:10 +02:00
Nikita Popov
fbf818724f [MemCpyOpt] Make moveUp() a member method (NFC)
So we don't have to pass through more parameters in the future.
2020-10-03 11:28:49 +02:00
Nikita Popov
94704ed008 [MemCpyOpt] Add helper to erase instructions (NFC)
In addition to erasing the instruction, we also always want to remove
it from MSSA and MD. Use a common function to do so.

This is a refactoring split out from D26739.
2020-10-02 21:52:10 +02:00
Nikita Popov
87b63c1726 [MemCpyOpt] Avoid double invalidation (NFCI)
The removal of the cpy instruction is left to the caller of
performCallSlotOptzn(), including the invalidation of MD. Both
call-sites already do this.

Also handle incrementation of NumMemCpyInstr consistently at the
call-site. One of the call-sites was already doing this, which
ended up incrementing the statistic twice.

This fix was part of D26739.
2020-10-02 21:50:46 +02:00