408 Commits

Author SHA1 Message Date
Nikita Popov
04b717c423 [TLI] Check that malloc argument has type size_t
DSE assumes that this is the case when forming a calloc from a
malloc + memset pair.

For tests, either update the malloc signature or change the
data layout.
2022-03-14 17:22:24 +01:00
Florian Hahn
d03d3d7966
[DSE] Fall back to CFG scan for unreachable terminators.
Blocks with UnreachableInst terminators are considered as root nodes in
the PDT. This pessimize DSE, if there are no aliasing reads from the
potentially dead store and the block with the unreachable terminator.

If any of the root nodes of the PDF has UnreachableInst as terminator,
fall back to the CFG scan, even the common dominator of all killing
blocks does not post-dominate the block with potentially dead store.

It looks like the compile-time impact for the extra scans is negligible.
https://llvm-compile-time-tracker.com/compare.php?from=779bbbf27fe631154bdfaac7a443f198d4654688&to=ac59945f1bec1c6a7d7f5590c8c69fd9c5369c53&stat=instructions

Fixes #53800.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119760
2022-02-16 14:06:40 +00:00
Florian Hahn
48f1884333
[DSE] Add additional tests with unreachable exits.
Adds tests for #53800.
2022-02-14 14:28:49 +00:00
Nikita Popov
31c1842a7b [DSE] Add test with sret argument (NFC) 2022-01-26 14:25:31 +01:00
Nikita Popov
26f81984e7 [DSE] Handle inaccessiblememonly calloc
Change the DSE calloc handling to assume that it is
inaccessiblememonly, i.e. the defining access is liveOnEntry.

Differential Revision: https://reviews.llvm.org/D117543
2022-01-19 12:55:09 +01:00
Philip Reames
7ac65f6b2e [tests] Add coverage of writeonly attribute and operand bundle intersection 2022-01-18 12:08:14 -08:00
Florian Hahn
e3275cfa94
[BuildLibCalls] Add nounwind,willreturn to memset_pattern{4,8,16}.
Similar to memset, memset_pattern{4,8,16} all will return and do not
unwind. Use fallthrough to include all attributes also set for memset.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114904
2022-01-12 10:32:53 +00:00
Nikita Popov
3cef3cf02f [DSE] Check for noalias calls rather than alloc functions
For these "visible on unwind/ret" checks we only care about the
fact that no other code has access to the pointer (unless it
escapes). A noalias call is sufficient for this, it does not
have to be a known allocation function.

This is basically the same change as D116728, but for DSE rather
than LICM.
2022-01-11 12:22:16 +01:00
Nikita Popov
3d5179febe [DSE] Add additional tests for noalias calls (NFC)
Currently this is special-cased to TLI alloc functions only.
2022-01-11 12:09:54 +01:00
Nikita Popov
fe2c4af905 [DSE] Make test more robust (NFC)
If the allocation is not captured, then all the stores before the
ret are dead anyway.
2022-01-11 11:49:52 +01:00
Nikita Popov
b5a2627423 [DSE] Fix DSE test to use non-extern global (NFC)
The intended transform is not legal with an extern global, because
the actual global defined in a different TU might have larger
size. Make it non-extern to show that the desired transform already
works.
2022-01-03 09:38:04 +01:00
Nikita Popov
3478d64ee4 [DSE] Check for whole object overwrite even if dead store size not known
If the killing store overwrites the whole object, we know that the
preceding store is dead, regardless of the accessed offset or size.
This case was previously only handled if the size of the dead store
was also known.

This allows us to perform conventional DSE for calls that write to
an argument (but without known size).

Differential Revision: https://reviews.llvm.org/D116267
2022-01-03 09:36:44 +01:00
Nikita Popov
eb91d91b7a [DSE] Fix typo in recent commit
This fixes a typo in 81d69e1bda9e4b6a83f29ba1f614e43ab4700972.
Of course we should only skip the particular store if it isn't
removable, not bail out of the whole loop. Add a test to cover
this case.
2021-12-24 11:25:25 +01:00
Nikita Popov
ae64c5a0fd [DSE][MemLoc] Handle intrinsics more generically
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.

This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.

There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.

Differential Revision: https://reviews.llvm.org/D116210
2021-12-24 09:29:57 +01:00
Nikita Popov
58ad3428d1 [DSE] Add test for matrix store (NFC) 2021-12-23 09:44:01 +01:00
Nikita Popov
f8042492fe [DSE] Regenerate test checks (NFC) 2021-12-23 09:31:44 +01:00
Marianne Mailhot-Sarrasin
90d1786ba0 [DSE] Fix invalid removal of store instruction
Fix handling of alloc-like instructions in isGuaranteedLoopInvariant(). It was not valid when the 'KillingDef' was outside of the loop, while the 'CurrentDef' was inside the loop. In that case, the 'KillingDef' only overwrites the definition from the last iteration of the loop, and not the ones of all iterations. Therefor it does not make the 'CurrentDef' to be dead, and must not remove it.

Fixing issue : https://github.com/llvm/llvm-project/issues/52774

Reviewed by: Florian Hahn

Differential revision: https://reviews.llvm.org/D115965
2021-12-22 16:11:23 -05:00
Marianne Mailhot-Sarrasin
df590567aa [DSE] Add test case showing bug PR52774.
Pre-commiting the test case before the bug fix.

Reviewed by: Florian Hahn

Differential revision: https://reviews.llvm.org/D115965
2021-12-22 14:57:28 -05:00
Nikita Popov
8a0e35f3a7 [MemoryLocation] Don't require nocapture in getForDest()
As reames mentioned on related reviews, we don't need the nocapture
requirement here. First of all, from an API perspective, this is
not something that MemoryLocation::getForDest() should be checking
in the first place, because it does not affect which memory this
particular call can access; it's an orthogonal concern that should
be handled by the caller if necessary.

However, for both of the motivating users in DSE and InstCombine,
we don't need the nocapture requirement, because the capture can
either be purely local to the call (a pointer identity check that
is irrelevant to us), be part of the return value (which we check
is unused), or be written in the dest location, which we have
determined to be dead.

This allows us to remove the special handling for libcalls as well.

Differential Revision: https://reviews.llvm.org/D116148
2021-12-22 12:20:13 +01:00
Philip Reames
44d23d5345 [DSE] Remove calls with known writes to dead memory
This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test.   That test was a false positive because it was checking asan failures not accounting for the fact the call could be validly optimized out.  I hopefully managed to stablize that test in 9b955f.  (That's a speculative fix due to disk consumption needed to build compiler-rt tests locally being absurd.)

Original commit message follows..

The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this was unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-20 18:10:23 -08:00
Nikita Popov
1ba99eaf70 Revert "[DSE] Remove calls with known writes to dead memory"
This reverts commit a8a51fe55649f5e07f9f2973507dc20bc4e40765.

This breaks the strncpy-overflow.cpp test case.
2021-12-18 09:23:41 +01:00
Philip Reames
a8a51fe556 [DSE] Remove calls with known writes to dead memory
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this was unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-17 13:42:36 -08:00
Philip Reames
d9d6e6a048 [tests] Precommit tests from D115904 2021-12-17 12:42:51 -08:00
Florian Hahn
ddfac0759c
Revert "[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest."
This reverts commit ac60263ad173dbd2eba6e0c8d892d8c3dcc5306c.

It looks like the test fails on certain non-Darwin system, even though
the triple is explicitly set to macos. Revert while I investigate.
2021-12-14 14:48:47 +00:00
Florian Hahn
ac60263ad1
[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest.
memset_pattern{4,8,16} writes to the first argument. Use getForDest
to return the corresponding MemoryLocation.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114906
2021-12-14 14:41:28 +00:00
Florian Hahn
4a419ea400
[DSE] Add additional memset_chk tests. 2021-12-06 13:06:11 +00:00
Florian Hahn
829b29b619
[MemoryLocation] strcat/strncat/strcpy read/write after their args.
strcpy/strcat/strncat access memory starting from the passed in
pointers. Construct memory locations for their args using getAfter.

Discussed in D114872.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D114969
2021-12-03 08:48:23 +00:00
Florian Hahn
68782a860d
[DSE] Read after strcpy test. 2021-12-02 17:37:59 +00:00
Florian Hahn
5fe151f98f
[DSE] Add libcall tests for functions only available on Darwin.
Add a set of tests for memset_pattern{4,8,16} variants.
2021-12-01 20:30:15 +00:00
Florian Hahn
7de410440d
[DSE] Allow DSE to optimize MemorySSA by default.
This allows for better optimization of 'stores-of-existing-values' and
possibly helps passes further down the pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113712
2021-12-01 08:29:23 +00:00
Florian Hahn
c9ad356266
[DSE] Use optimized access if available for redundant store elimination.
Using the optimized access enables additional optimizations in cases
where the defining access is a non-aliasing store.

Alternatively we could also walk upwards and skip non-aliasing defs
here, but my experiments so far showed that this will noticeably
increase compile-time for little extra gain compared to just using the
optimized access.

Improvements of dse.NumRedundantStores on MultiSource/CINT2006/CPF2006
on X86 with -O3:

     test-suite...-typeset/consumer-typeset.test     1.00                  76.00              7500.0%
     test-suite.../Benchmarks/Bullet/bullet.test     3.00                  12.00              300.0%
     test-suite...006/453.povray/453.povray.test     3.00                   6.00              100.0%
     test-suite...telecomm-gsm/telecomm-gsm.test     1.00                   2.00              100.0%
     test-suite...ediabench/gsm/toast/toast.test     1.00                   2.00              100.0%
     test-suite...marks/7zip/7zip-benchmark.test     1.00                   2.00              100.0%
     test-suite...ications/JM/lencod/lencod.test     7.00                  10.00              42.9%
     test-suite...6/464.h264ref/464.h264ref.test     6.00                   8.00              33.3%
     test-suite...ications/JM/ldecod/ldecod.test     6.00                   7.00              16.7%
     test-suite...006/447.dealII/447.dealII.test    33.00                  33.00               0.0%
     test-suite...6/471.omnetpp/471.omnetpp.test    NaN                     1.00               nan%
     test-suite...006/450.soplex/450.soplex.test    NaN                     2.00               nan%
     test-suite.../CINT2006/403.gcc/403.gcc.test    NaN                     7.00               nan%
     test-suite...lications/ClamAV/clamscan.test    NaN                     1.00               nan%
     test-suite...CI_Purple/SMG2000/smg2000.test    NaN                     3.00               nan%

Follow-up to D111727.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112315
2021-11-30 15:40:14 +00:00
Florian Hahn
41d59a3645
[DSE] Add memset_chk tests. 2021-11-30 13:50:10 +00:00
Zarko Todorovski
7f7dac7126 [NFC][llvm] Inclusive language: reword uses of sanity test and check
Part of continuing work to use more inclusive language. Reworded uses
of sanity check and sanity test in llvm/test/
2021-11-25 07:21:42 -05:00
Fabian Wolff
7eec832def [DSE] Improve handling of strncpy in Dead Store Elimination
Fixes PR#52062 and one of the remaining cases of PR#47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D114035
2021-11-19 17:46:29 +00:00
Fabian Wolff
ffe1741b5c
[DSE] Add additional strncpy tests.
Test for PR#52062 and one of the remaining cases of PR#47644.
2021-11-19 16:18:54 +00:00
Florian Hahn
9c00afe926
[DSE] Add test case with multiple inbounds stores, followed by OOB.
This patch extends the existing out-of-bounds store tests with a case
with a bigger object and multiple inbounds stores, followed by an OOB
store. The OOB store is not used to remove the inbounds stores in this
case at the moment.
2021-11-12 09:40:03 +00:00
Florian Hahn
274a9b0f0b
[DSE] Support redundant stores eliminated by memset.
This patch adds support to remove stores that write the same value
as earlier memesets.

It uses isOverwrite to check that a memset completely overwrites a later
store. The candidate store must store the same bytewise value as the
byte stored by the memset.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112321
2021-10-29 22:19:53 +01:00
Dawid Jurczak
f87e0c68d7 [DSE] Eliminates redundant store of an exisiting value (PR16520)
That's https://reviews.llvm.org/D90328 follow-up.

This change eliminates writes to variables where the value that is being written is already stored in the variable.
This achieves the goal by looping through all memory definitions in the current state and getting defining access from each of them.
When there is defining access where the write instruction is identical to the original instruction it will remove this redundant write.

For example:

void f() {

x = 1;
if foo() {
   x = 1;
   g();
} else {
  h();
}

}
void g();
void h();

The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this:

void f() {

x = 1;
if foo() {
   g();
} else {
  h();
}

}
void g();
void h();

Differential Revision: https://reviews.llvm.org/D111727
2021-10-28 16:20:09 +02:00
Florian Hahn
1a2a7cca3e [DSE] Add test case with 2 memcpys that should not be eliminated. 2021-10-27 11:15:58 +01:00
Florian Hahn
286e98b97e
[DSE] Add test cases with more complex redundant stores.
This patch adds more complex test cases with redundant stores of an
existing memset, with other stores in between.

It also makes a few of the existing tests more robust.
2021-10-22 13:50:32 +01:00
Dávid Bolvanský
93fd30a163 [NFC] Added test for PR50339 2021-10-13 12:15:57 +02:00
Dávid Bolvanský
005b715b54 [NFC] Added test for PR49927 2021-10-13 12:15:57 +02:00
Dawid Jurczak
9e65929a8e [DSE] Re-enable calloc transformation with extra care (PR25892)
Transformation from malloc+memset to calloc is always correct and in many situations
it brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases when producing calloc cause performance drops [3].
As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios.
If optimizer is able to prove that after malloc call it's _very_ likely to reach memset branch then after
calloc emission we shouldn't observe any performance hits. Therefore finding "null pointer check" pattern
before memset basic block sounds like good justification for performing transformation.
Also that method was already suggested by GCC folks [4]. Main reason for change is that for now
to be safe we check for post dominance relation which is way too conservative approach making transformation
"almost" disabled in practice. This patch tends to enable transformation again but with extra care.

[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022

Differential Revision: https://reviews.llvm.org/D110021
2021-10-10 21:47:14 +02:00
Nikita Popov
ba664d9066 [AA] Move earliest escape tracking from DSE to AA
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.

This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.

Differential Revision: https://reviews.llvm.org/D110368
2021-09-25 22:40:41 +02:00
Nikita Popov
327bbbb10b [DSE] Make capture check more precise
It is sufficient that the object has not been captured before the
load that produces the pointer we're loading. A capture after that
can not affect the already loaded pointer.

This is small part of D110368 applied separately.
2021-09-25 22:23:19 +02:00
Nikita Popov
7774166499 [DSE] Add additional capture tests (NFC)
These test other escape sources and the case of multiple
underlying objects.
2021-09-24 21:13:29 +02:00
Florian Hahn
6f28fb7081
Recommit "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts the revert commit df56fc6ebbee6c458b0473185277b7860f7e3408.

This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.

It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.

This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
2021-09-24 17:13:27 +01:00
Nico Weber
df56fc6ebb Revert "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c.
Makes clang crash, see comments on https://reviews.llvm.org/D109844
2021-09-24 09:57:59 -04:00
Florian Hahn
5ce89279c0
[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b130 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
2021-09-23 12:45:05 +01:00
Florian Hahn
963d3a22b3
[DSE] Add additional tests to cover review comments.
Adds additional tests following comments from D109844.

Also removes unusued in.ptr arguments and places in the call tests that
used loads instead of a getval call.
2021-09-20 17:06:04 +01:00