265 Commits

Author SHA1 Message Date
khei4
b02d349cbf Revert "Revert "[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas""
This reverts commit 36a6eb7d12a9f827bf3d5d4e5fdc68b8a62807b2.

[MemCpyOpt] check that load/store and dest/src alloca are all in the same bb

Differential Revision: https://reviews.llvm.org/D153453
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-07-15 16:27:38 +09:00
khei4
a92e197114 [MemCpyOpt] precommit tests to add multi-BB stack-move optimization to check crash for D153453 (NFC)
Differential Revision: https://reviews.llvm.org/D155179
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-07-15 16:27:38 +09:00
khei4
36a6eb7d12 Revert "[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas"
This reverts commit 96ae0851c26237378fa1280b0a9ad713e1b72bdb.
2023-07-13 18:04:49 +09:00
khei4
96ae0851c2 [MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas
Differential Revision: https://reviews.llvm.org/D153453
2023-07-13 14:52:30 +09:00
khei4
393215649b [MemCpyOpt] precommit test to add single BB stack-move optimization (NFC)
Differential Revision: https://reviews.llvm.org/D152277
2023-07-13 14:52:30 +09:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
khei4
361464c027 [MemCpyOpt] Use memcpy source directly if dest is known to be immutable from attributes
Differential Revision: https://reviews.llvm.org/D150970
2023-06-10 15:46:32 +09:00
khei4
f0d97c30e9 [MemCpyOpt] precommit test for memcpy removal for immutable arguments from attributes (NFC)
Differential Revision: https://reviews.llvm.org/D150967
2023-06-10 15:46:32 +09:00
khei4
0ad88c156e [MemCpyOpt] remove byval from memcpy size crtical test(NFC)
Differential Revision: https://reviews.llvm.org/D151626
Reviewed By: nikic
2023-05-29 16:56:07 +09:00
Bjorn Pettersson
198e0a12f6 [MemCpyOpt] Fix up debug loc for simplified memset in processMemSetMemCpyDependence
Make sure the code comments in processMemSetMemCpyDependence match
with the actual transform. They indicated that the memset being
rewritten was sunk to after a memcpy, while it actually is inserted
just before the memcpy.

Also make sure we use the debug location of the original memset
when creating the new simplified memset. In the past we've been
using the debug location for the memcpy which could be a bit
confusing.

Differential Revision: https://reviews.llvm.org/D135574
2023-05-16 16:54:51 +02:00
Mikhail Maltsev
ab8150acc5 [MemCpyOpt] Don't fold memcpy.inline into memmove
The llvm.memcpy.inline intrinsic must be expanded into code that
does not contain any function calls because it is intended for
the implementation of low-level functions like memcpy. Currently the
MemCpyOpt might covert llvm.memcpy.inline into llvm.memmove in
certain circumstances. This patch fixes the issue.

Fixes https://github.com/llvm/llvm-project/issues/61791.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147162
2023-03-30 13:14:59 +01:00
Owen Anderson
8256ddf78c Resolve a long-standing FIXME in memcpyopt.
Inspecting the downstream use of the cpyAlign, it is clear that
`performCallSlotOptzn` is expecting it to represent the alignment
of the copy destination, not the minimum of the src and dest
alignments. This patch renames the parameter to make this more
obvious.

I believe this change is NFC, because the downstream code has
alignment checks such that it all works out in the end. I have not
been able to construct a test case that actually triggers a change
in output.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D140603
2022-12-23 16:15:24 -07:00
Bjorn Pettersson
a11faeed44 [test] Switch to use -passes syntax in various test cases 2022-12-01 21:25:59 +01:00
Nikita Popov
4d33cf4166 [MemCpyOpt] Avoid moving lifetime marker above def (PR58903)
This is unlikely to happen with opaque pointers, so just bail out
of the transform, rather than trying to move bitcasts/etc as well.

Fixes https://github.com/llvm/llvm-project/issues/58903.
2022-11-11 15:06:34 +01:00
Nikita Popov
9a45e4beed [MemCpyOpt] Move lifetime marker before call to enable call slot optimization
Currently call slot optimization may be prevented because the
lifetime markers for the destination only start after the call.
In this case, rather than aborting the transform, we should move
the lifetime.start before the call to enable the transform.

Differential Revision: https://reviews.llvm.org/D135886
2022-11-07 15:26:00 +01:00
Bjorn Pettersson
211cf8a384 [test] Use -passes in more Transforms tests
Another step towards getting rid of dependencies to the legacy
pass manager.

Primary change here is to just do -passes=foo instead of -foo in
simple situations (when running a single transform pass). But also
updated a few test running multiple passes.

Also removed some "duplicated" RUN lines in a few tests that where
using both -foo and -passes=foo syntax. No need to do the same kind
of testing twice.
2022-10-21 17:02:02 +02:00
Nikita Popov
f386f7690d [MemCpyOpt] Add additional tests with lifetime intrinsics (NFC) 2022-10-13 17:29:59 +02:00
Nikita Popov
19aa1aab2e [MemCpyOpt] Don't run full pipeline in test (NFC)
Just memcpyopt is enough for this test.
2022-10-13 17:03:44 +02:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Nikita Popov
42d98d80fb [MemCpyOpt] Convert tests to opaque pointers (NFC)
Converted using the script at
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34.
2022-10-05 14:53:50 +02:00
Nikita Popov
5fa14ee835 [MemCpyOpt] Don't hoist above producer of pointer operand
This was already handled correctly below, but not checked for the
original store pointer operand. Encountered when converting tests
to opaque pointers, where the intermediate bitcast goes away.
2022-10-05 14:52:33 +02:00
Florian Hahn
b5e208fcba
[DSE] Support looking through memory phis at end of function.
Update isWriteAtEndOfFunction to look through MemoryPhis. The reason
MemoryPhis were skipped so far was the known AliasAnalysis issue with it
missing loop-carried dependences.

This problem is already addressed in other parts of the code by skipping
MemoryDefs that may be in difference loops. I think the same logic can
be applied here.

This can have a substantial impact on the number of stores removed in
some cases. For MultiSource/SPEC2006/SPEC2017 with -O3:

```
Metric: dse.NumFastStores

Program                                       dse.NumFastStores
                                              base              patch   diff
External/S...CINT2017rate/557.xz_r/557.xz_r     14.00             45.00 221.4%
External/S...te/538.imagick_r/538.imagick_r    439.00           1267.00 188.6%
MultiSourc...e/Applications/SIBsim4/SIBsim4      6.00             15.00 150.0%
MultiSourc...Prolangs-C/simulator/simulator      3.00              7.00 133.3%
MultiSource/Applications/siod/siod               3.00              7.00 133.3%
MultiSourc...arks/FreeBench/distray/distray      6.00              9.00  50.0%
MultiSourc...e/Applications/obsequi/Obsequi     22.00             30.00  36.4%
MultiSource/Benchmarks/Ptrdist/bc/bc            23.00             28.00  21.7%
External/S...NT2017rate/502.gcc_r/502.gcc_r   1258.00           1512.00  20.2%
External/S...te/520.omnetpp_r/520.omnetpp_r    954.00           1143.00  19.8%
External/S...rate/510.parest_r/510.parest_r   5961.00           7122.00  19.5%
External/S...C/CINT2006/445.gobmk/445.gobmk     47.00             56.00  19.1%
External/S...00.perlbench_r/500.perlbench_r    241.00            286.00  18.7%
External/S...NT2006/471.omnetpp/471.omnetpp     36.00             42.00  16.7%
External/S...06/400.perlbench/400.perlbench    183.00            210.00  14.8%
MultiSource/Applications/SPASS/SPASS            72.00             81.00  12.5%
External/S...17rate/541.leela_r/541.leela_r     72.00             80.00  11.1%
External/SPEC/CINT2006/403.gcc/403.gcc         585.00            642.00   9.7%
MultiSourc...e/Applications/sqlite3/sqlite3    120.00            131.00   9.2%
MultiSourc...Applications/hexxagon/hexxagon     11.00             12.00   9.1%
External/S.../CFP2006/453.povray/453.povray    566.00            615.00   8.7%
External/S...rate/511.povray_r/511.povray_r    578.00            627.00   8.5%
External/S...FP2006/482.sphinx3/482.sphinx3     12.00             13.00   8.3%
MultiSource/Applications/oggenc/oggenc         130.00            140.00   7.7%
MultiSourc...e/Applications/ClamAV/clamscan    250.00            268.00   7.2%
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg     19.00             20.00   5.3%
MultiSourc...ch/consumer-jpeg/consumer-jpeg     19.00             20.00   5.3%
External/S...te/526.blender_r/526.blender_r   3747.00           3928.00   4.8%
MultiSourc...OE-ProxyApps-C++/miniFE/miniFE    104.00            108.00   3.8%
MultiSourc...ch/consumer-lame/consumer-lame     54.00             56.00   3.7%
MultiSource/Benchmarks/Bullet/bullet          1222.00           1264.00   3.4%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    973.00           1005.00   3.3%
External/S.../CFP2006/447.dealII/447.dealII   2699.00           2780.00   3.0%
External/S...06/483.xalancbmk/483.xalancbmk    788.00            810.00   2.8%
External/S.../CFP2006/450.soplex/450.soplex    180.00            185.00   2.8%
MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR    338.00            345.00   2.1%
MultiSourc...Benchmarks/7zip/7zip-benchmark    685.00            699.00   2.0%
External/S...FP2017rate/544.nab_r/544.nab_r    158.00            160.00   1.3%
MultiSourc...sumer-typeset/consumer-typeset    772.00            781.00   1.2%
External/S...2017rate/525.x264_r/525.x264_r    410.00            414.00   1.0%
External/S...23.xalancbmk_r/523.xalancbmk_r    998.00           1002.00   0.4%
```

Compile-time is almost neutral:

https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions

NewPM-O3: +0.03%
NewPM-ReleaseThinLTO: -0.01%
NewPM-ReleaseLTO-g: +0.03%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132365
2022-08-30 13:27:51 +01:00
Augie Fackler
12c0bf8ba9 tests: add attributes that would normally come from inferattrs
As my goal is to remove at least _some_ functions from the static list
in MemoryBuiltins.cpp, these tests either need to run inferattrs or
statically declare these attributes to keep passing. A couple of tests
had alternate cases which are no longer meaningful, e.g.
`malloc-load-removal.ll`.

Differential Revision: https://reviews.llvm.org/D123087
2022-07-25 17:29:00 -04:00
Nikita Popov
a5c3b5748c [MemCpyOpt] Work around PR54682
As discussed on https://github.com/llvm/llvm-project/issues/54682,
MemorySSA currently has a bug when computing the clobber of calls
that access loop-varying locations. I think a "proper" fix for this
on the MemorySSA side might be non-trivial, but we can easily work
around this in MemCpyOpt:

Currently, MemCpyOpt uses a location-less getClobberingMemoryAccess()
call to find a clobber on either the src or dest location, and then
refines it for the src and dest clobber. This was intended as an
optimization, as the location-less API is cached, while the
location-affected APIs are not.

However, I don't think this really makes a difference in practice,
because I don't think anything will use the cached clobbers on
those calls later anyway. On CTMark, this patch seems to be very
mildly positive actually.

So I think this is a reasonable way to avoid the problem for now,
though MemorySSA should also get a fix.

Differential Revision: https://reviews.llvm.org/D122911
2022-04-04 10:19:51 +02:00
Nikita Popov
34135ae9e2 [MemCpyOpt] Add test for PR54682 (NFC) 2022-04-01 16:31:27 +02:00
Arthur Eubanks
55cf09ae26 [ValueTracking] Simplify llvm::isPointerOffset()
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.

Differential Revision: https://reviews.llvm.org/D120523
2022-03-14 09:32:36 -07:00
Florian Hahn
7662d1687b
[MemCpyOpt] Check all access for MemoryUses in writtenBetween.
Currently writtenBetween can miss clobbers of Loc between End and Start,
if End is a MemoryUse.

To guarantee we see all write clobbers of Loc between Start and End
for MemoryUses, restrict to Start and End being in the same block
and check all accesses between them.

This fixes 2 mis-compiles illustrated in
llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119929
2022-02-21 16:54:30 +00:00
Florian Hahn
3ba42a564a
[MemCpyOpt] Add non-local memcpy test with memory phi. 2022-02-18 11:59:24 +00:00
Nikita Popov
97c151de3d [MemCpyOpt] Fix broken check lines (NFC)
These are leftovers from when there were separate MSSA/non-MSSA
check lines.
2022-02-16 14:39:43 +01:00
Florian Hahn
85fd97e3b9
[MemCpyOpt] Add tests with incorrect memcpy->byval forwarding.
Add a few test cases with clobbers (writes and lifetime.end) that should
prevent memcpy->byval forwarding, but those clobbers are missed at the
moment.
2022-02-16 11:35:14 +00:00
Arthur Eubanks
a9029a33ff [OpaquePtr][ValueTracking] Check GEP source element type in isPointerOffset()
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119652
2022-02-13 10:35:38 -08:00
Arthur Eubanks
d050010ea2 [test][MemCpyOpt] Rename test function 2022-02-12 17:44:51 -08:00
Arthur Eubanks
12ba0659b4 [test][MemCpyOpt] Precommit test 2022-02-12 17:38:45 -08:00
Nikita Popov
6b69985da4 [MemCpyOpt] Use helper for unwind check
This extends support to byval arguments. It would be further
extended to handle the case of non-captured noalias returns.
2022-01-26 12:43:31 +01:00
Nikita Popov
ed4efee2a3 [MemCpyOpt] Add additiona call slot unwind tests (NFC)
Test a possibly unwinding call with a byval and sret argument.
2022-01-26 11:49:24 +01:00
Nikita Popov
0d20407d1a Reapply [MemCpyOpt] Look through pointer casts when checking capture
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
2022-01-20 09:30:21 +01:00
Nikita Popov
655a7024db Reapply [MemCpyOpt] Make capture check during call slot optimization more precise
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615
2022-01-20 09:30:20 +01:00
Nikita Popov
d7bff2e9d2 [MemCpyOpt] Fix metadata merging during call slot optimization
Call slot optimization currently merges the metadata between the
call and the load. However, we also need to merge in the metadata
of the store.

Part of the reason why we might have gotten away with this
previously is that usually the load and the store are the same
instruction (a memcpy), this can only happen if call slot
optimization occurs on an actual load/store pair.

This addresses the issue reported in
https://reviews.llvm.org/D115615#3251386.

Differential Revision: https://reviews.llvm.org/D117679
2022-01-20 09:25:13 +01:00
Nikita Popov
0db30adcfb [MemCpyOpt] Test invalid noalias metadata after call slot opt (NFC) 2022-01-19 15:51:10 +01:00
Hans Wennborg
53a51acc36 Revert "[MemCpyOpt] Make capture check during call slot optimization more precise"
This casued a miscompile due to call slot optimization replacing a call
argument without considering the call's !noalias metadata, see discussion on
the code review.

> Call slot optimization is currently supposed to be prevented if
> the call can capture the source pointer. Due to an implementation
> bug, this check currently doesn't trigger if a bitcast of the source
> pointer is passed instead. I'm somewhat afraid of the fallout of
> fixing this bug (due to heavy reliance on call slot optimization
> in rust), so I'd like to strengthen the capture reasoning a bit first.
>
> In particular, I believe that the capture is fine as long as a)
> the call itself cannot depend on the pointer identity, because
> neither dest has been captured before/at nor src before the
> call and b) there is no potential use of the captured pointer
> before the lifetime of the source alloca ends, either due to
> lifetime.end or a return from a function. At that point the
> potentially captured pointer becomes dangling.
>
> Differential Revision: https://reviews.llvm.org/D115615

Also reverting the dependent commit:

> [MemCpyOpt] Look through pointer casts when checking capture
>
> The user scanning loop above looks through pointer casts, so we
> also need to strip pointer casts in the capture check. Previously
> the source was incorrectly considered not captured if a bitcast
> was passed to the call.

This reverts commit 487a34ed9d7d24a7b1fb388c8856c784a459b22b
and 00e6869463ae6023d0d48f30de8511d6d748b14f.
2022-01-18 17:41:49 +01:00
Nikita Popov
00e6869463 [MemCpyOpt] Look through pointer casts when checking capture
The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
2022-01-05 09:50:33 +01:00
Nikita Popov
487a34ed9d [MemCpyOpt] Make capture check during call slot optimization more precise
Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615
2022-01-05 09:39:25 +01:00
Nikita Popov
c2e77c9122 [MemCpyOpt] Add additional call slot capture tests (NFC) 2022-01-05 09:33:04 +01:00
Nikita Popov
396370e889 [MemCpyOpt] Add additional call slot capture tests (NFC)
One test shows a miscompile when bitcasts are involved, the others
cases where we can perform the optimization despite a capture.
2021-12-13 10:57:06 +01:00
Nikita Popov
90ec6dff86 [OpaquePtr] Forbid mixing typed and opaque pointers
Currently, opaque pointers are supported in two forms: The
-force-opaque-pointers mode, where all pointers are opaque and
typed pointers do not exist. And as a simple ptr type that can
coexist with typed pointers.

This patch removes support for the mixed mode. You either get
typed pointers, or you get opaque pointers, but not both. In the
(current) default mode, using ptr is forbidden. In -opaque-pointers
mode, all pointers are opaque.

The motivation here is that the mixed mode introduces additional
issues that don't exist in fully opaque mode. D105155 is an example
of a design problem. Looking at D109259, it would probably need
additional work to support mixed mode (e.g. to generate GEPs for
typed base but opaque result). Mixed mode will also end up
inserting many casts between i8* and ptr, which would require
significant additional work to consistently avoid.

I don't think the mixed mode is particularly valuable, as it
doesn't align with our end goal. The only thing I've found it to
be moderately useful for is adding some opaque pointer tests in
between typed pointer tests, but I think we can live without that.

Differential Revision: https://reviews.llvm.org/D109290
2021-09-10 15:18:23 +02:00
Fraser Cormack
7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since the `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the knowledge
they are only called on aggregate types. These should never be
scalably-sized.

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Nikita Popov
88003cea1c [MemCpyOpt] Remove MemDepAnalysis-based implementation
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.

I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.

Differential Revision: https://reviews.llvm.org/D102113
2021-08-07 22:35:44 +02:00
Artem Belevich
6a9cf21f5a [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now introduce a flag to force-enable the flag and opt-in only CUDA
compilation with NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401
2021-08-06 11:13:52 -07:00
Michael Liao
d1cacd5928 [MemCpyOpt] Teach memcpyopt to handle loads from the constant memory.
- Loads from the constant memory (either explicit one or as the source
  of memory transfer intrinsics) won't alias any stores.

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605
2021-08-06 12:43:52 -04:00
Nikita Popov
bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00