508 Commits

Author SHA1 Message Date
Nikita Popov
81d69e1bda [DSE] Call isRemovable() after getLocForWriteEx() (NFCI)
The only non-trivial change here is that the isReadClobber()
check for redundant stores is now on the DefLoc, not the
UpperLoc. This is semantically the right location to use, though
in practice it makes no difference (the locations are either the
same, or the def inst does not read).
2021-12-24 10:01:25 +01:00
Nikita Popov
ba2b34b1c7 [DSE] Simplify isGuaranteedLoopInvariant() (NFC)
We have Value->stripInBoundsConstantOffsets() which does what we
want here, but the inbounds requirement isn't actually necessary.
We should probably add Value->stripConstantOffsets() as well.
2021-12-24 09:39:44 +01:00
Nikita Popov
ae64c5a0fd [DSE][MemLoc] Handle intrinsics more generically
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.

This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.

There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.

Differential Revision: https://reviews.llvm.org/D116210
2021-12-24 09:29:57 +01:00
Marianne Mailhot-Sarrasin
90d1786ba0 [DSE] Fix invalid removal of store instruction
Fix handling of alloc-like instructions in isGuaranteedLoopInvariant(). It was not valid when the 'KillingDef' was outside of the loop, while the 'CurrentDef' was inside the loop. In that case, the 'KillingDef' only overwrites the definition from the last iteration of the loop, and not the ones of all iterations. Therefor it does not make the 'CurrentDef' to be dead, and must not remove it.

Fixing issue : https://github.com/llvm/llvm-project/issues/52774

Reviewed by: Florian Hahn

Differential revision: https://reviews.llvm.org/D115965
2021-12-22 16:11:23 -05:00
Nikita Popov
eb2cad8329 [DSE] Make isRemovable() for calls more robust (NFCI)
We can only drop calls if they have an analyzable write, the return
value is not used, they don't throw and they don't diverge. The last
two conditions were previously not checked, because all the libcalls
with analyzable writes already happened to satisfy those conditions
anyway. This may not be true for generalizations (with D115904 in mind).

No test changes because the necessary attributes are already inferred
for currently supported libcalls.

Differential Revision: https://reviews.llvm.org/D115962
2021-12-17 20:52:34 +01:00
Kazu Hirata
7787a8f1b7 [llvm] Use llvm::reverse (NFC) 2021-12-13 21:54:51 -08:00
Florian Hahn
ead3979a92
[MemoryLocation] Move DSE intrinsic handling to MemoryLocation. (NFC)
Suggested in D114872.
2021-12-03 16:00:39 +00:00
Florian Hahn
f078536f46
[MemoryLocation] Move DSE's logic to new MemLoc::getForDest helper (NFC).
DSE has some extra logic to determine the write location of library
calls like str*cpy and str*cat. This patch moves the logic to a new
MemoryLocation:getForDest variant, which takes a call and TLI.

This patch should be NFC, because no other places take advantage of the
new helper yet.

Suggested by @reames post-commit 7eec832def571.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114872
2021-12-03 09:12:01 +00:00
Florian Hahn
7de410440d
[DSE] Allow DSE to optimize MemorySSA by default.
This allows for better optimization of 'stores-of-existing-values' and
possibly helps passes further down the pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113712
2021-12-01 08:29:23 +00:00
Florian Hahn
c9ad356266
[DSE] Use optimized access if available for redundant store elimination.
Using the optimized access enables additional optimizations in cases
where the defining access is a non-aliasing store.

Alternatively we could also walk upwards and skip non-aliasing defs
here, but my experiments so far showed that this will noticeably
increase compile-time for little extra gain compared to just using the
optimized access.

Improvements of dse.NumRedundantStores on MultiSource/CINT2006/CPF2006
on X86 with -O3:

     test-suite...-typeset/consumer-typeset.test     1.00                  76.00              7500.0%
     test-suite.../Benchmarks/Bullet/bullet.test     3.00                  12.00              300.0%
     test-suite...006/453.povray/453.povray.test     3.00                   6.00              100.0%
     test-suite...telecomm-gsm/telecomm-gsm.test     1.00                   2.00              100.0%
     test-suite...ediabench/gsm/toast/toast.test     1.00                   2.00              100.0%
     test-suite...marks/7zip/7zip-benchmark.test     1.00                   2.00              100.0%
     test-suite...ications/JM/lencod/lencod.test     7.00                  10.00              42.9%
     test-suite...6/464.h264ref/464.h264ref.test     6.00                   8.00              33.3%
     test-suite...ications/JM/ldecod/ldecod.test     6.00                   7.00              16.7%
     test-suite...006/447.dealII/447.dealII.test    33.00                  33.00               0.0%
     test-suite...6/471.omnetpp/471.omnetpp.test    NaN                     1.00               nan%
     test-suite...006/450.soplex/450.soplex.test    NaN                     2.00               nan%
     test-suite.../CINT2006/403.gcc/403.gcc.test    NaN                     7.00               nan%
     test-suite...lications/ClamAV/clamscan.test    NaN                     1.00               nan%
     test-suite...CI_Purple/SMG2000/smg2000.test    NaN                     3.00               nan%

Follow-up to D111727.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112315
2021-11-30 15:40:14 +00:00
Nikita Popov
3608e18a94 [DSE] Use MapVector for IOLs
I'm not sure whether this can cause any actual non-determinism,
but at least it makes the DSE debug log non-deterministic, which
makes it harder to debug other non-determinism issues.
2021-11-28 21:54:29 +01:00
Florian Hahn
25dad1064b
[DSE] Optimize defining access of defs while walking upwards.
This patch extends the code that walks memory defs upwards to find
clobbering accesses to also try to optimize the clobbering defining
access.

We should be able to find set the optimized access of our starting def
(KillingDef), if the following holds:

 1. It is the first call of getDomMemoryDef for KillingDef (so Current
    ==  KillingDef->getDefiningAccess().
 2. No potentially aliasing defs are skipped.

Then if a (partly) aliasing def is encountered, it can be used as
optimized access for KillingDef. No further optimizations can be
applied to KillingDef.

I'd appreciate a careful look, as the existing documentation is not too
clear on what is expected for optimized accesses.

The motivation for this patch is to use the optimized accesses to cover
more cases of redundant stores as follow-up to D111727.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112313
2021-11-27 13:04:28 +00:00
Evgeniy Brevnov
47e2644c89 [DSE][NFC] Introduce "doesn't overwrite" return code for isOverwrite
Add OR_None code to indicate that there is no overwrite. This has no any effect for current uses but will be used in one of the next patches building support for PHI translation.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D105098
2021-11-23 17:11:15 +07:00
Nikita Popov
aeba28bc62 [DSE] Drop hasAnalyzableMemoryWrite() (NFCI)
The functionality of hasAnalyzableMemoryWrite() is effectively
subsumed by getLocForWriteEx(), which will return None if the
instruction is not analyzable. The implementations don't match
exactly (e.g. getLocForWriteEx() does not limit non-calls to
stores), but in conjunction with the isRemovable() check, it ends
up being the same.
2021-11-20 23:20:12 +01:00
Fabian Wolff
7eec832def [DSE] Improve handling of strncpy in Dead Store Elimination
Fixes PR#52062 and one of the remaining cases of PR#47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D114035
2021-11-19 17:46:29 +00:00
Nikita Popov
46c26991ae [DSE] Remove getLocForWrite() (NFCI)
This implements nearly the same logic as getLocForWriteEx(), and
is only used in one place. In that context, we should also know
that getLocForWriteEx() returns a non-None result. As such,
consolidate everything to use one function.
2021-11-18 21:19:18 +01:00
Nikita Popov
f1295563f1 [DSE] Move removePartiallyOverlappedStores() into DSEState (NFC)
So it can use getLocForWriteEx().
2021-11-18 21:19:18 +01:00
Florian Hahn
274a9b0f0b
[DSE] Support redundant stores eliminated by memset.
This patch adds support to remove stores that write the same value
as earlier memesets.

It uses isOverwrite to check that a memset completely overwrites a later
store. The candidate store must store the same bytewise value as the
byte stored by the memset.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112321
2021-10-29 22:19:53 +01:00
Dawid Jurczak
f87e0c68d7 [DSE] Eliminates redundant store of an exisiting value (PR16520)
That's https://reviews.llvm.org/D90328 follow-up.

This change eliminates writes to variables where the value that is being written is already stored in the variable.
This achieves the goal by looping through all memory definitions in the current state and getting defining access from each of them.
When there is defining access where the write instruction is identical to the original instruction it will remove this redundant write.

For example:

void f() {

x = 1;
if foo() {
   x = 1;
   g();
} else {
  h();
}

}
void g();
void h();

The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this:

void f() {

x = 1;
if foo() {
   g();
} else {
  h();
}

}
void g();
void h();

Differential Revision: https://reviews.llvm.org/D111727
2021-10-28 16:20:09 +02:00
Simon Pilgrim
13185f0154 [Transforms] eliminateDeadStores - remove unused variable. NFC.
The initial MemoryAccess *Current assignment is never used, and all other uses are initialized/used within the worklist loop (and not across multiple iterations) - so move the variable internal to the loop.

Fixes scan-build unused assignment warning.
2021-10-14 18:10:03 +01:00
Dawid Jurczak
9e65929a8e [DSE] Re-enable calloc transformation with extra care (PR25892)
Transformation from malloc+memset to calloc is always correct and in many situations
it brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases when producing calloc cause performance drops [3].
As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios.
If optimizer is able to prove that after malloc call it's _very_ likely to reach memset branch then after
calloc emission we shouldn't observe any performance hits. Therefore finding "null pointer check" pattern
before memset basic block sounds like good justification for performing transformation.
Also that method was already suggested by GCC folks [4]. Main reason for change is that for now
to be safe we check for post dominance relation which is way too conservative approach making transformation
"almost" disabled in practice. This patch tends to enable transformation again but with extra care.

[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022

Differential Revision: https://reviews.llvm.org/D110021
2021-10-10 21:47:14 +02:00
Nikita Popov
14a49f5840 [DSE] Don't check getUnderlyingObject() return value (NFC)
getUnderlyingObject() never returns null. It will simply return
something that is not the "root" underlying object.

Also drop a stale comment.
2021-09-26 18:01:26 +02:00
Nikita Popov
f3c74b72f4 [DSE] Make DSEState non-copyable (NFC)
As it contains a self-reference, the default copy/move ctors
would not be safe.

Move the DSEState::get() method into the ctor to make sure no move
occurs here even without NRVO.

This is a speculative fix for test failures on
llvm-clang-x86_64-expensive-checks-win.
2021-09-26 17:54:38 +02:00
Nikita Popov
ba664d9066 [AA] Move earliest escape tracking from DSE to AA
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.

This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.

Differential Revision: https://reviews.llvm.org/D110368
2021-09-25 22:40:41 +02:00
Nikita Popov
327bbbb10b [DSE] Make capture check more precise
It is sufficient that the object has not been captured before the
load that produces the pointer we're loading. A capture after that
can not affect the already loaded pointer.

This is small part of D110368 applied separately.
2021-09-25 22:23:19 +02:00
Florian Hahn
6f28fb7081
Recommit "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts the revert commit df56fc6ebbee6c458b0473185277b7860f7e3408.

This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.

It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.

This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
2021-09-24 17:13:27 +01:00
Nico Weber
df56fc6ebb Revert "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c.
Makes clang crash, see comments on https://reviews.llvm.org/D109844
2021-09-24 09:57:59 -04:00
Florian Hahn
5ce89279c0
[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b130 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
2021-09-23 12:45:05 +01:00
Evgeniy Brevnov
129cf33604 [DSE][NFC] Rename Later->Killing, Earlier->Dead
First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones.

Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.

Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106947
2021-09-21 13:44:12 +07:00
Kazu Hirata
cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Alina Sbirlea
e8723abf43 [DSE] Check post-dominance for malloc+memset->calloc transform.
Aiming to address the regression discussed in
https://reviews.llvm.org/D103009.

Differential Revision: https://reviews.llvm.org/D108485
2021-08-23 12:39:51 -07:00
Dawid Jurczak
107401002e [NFC][DSE] Clean up KnownNoReads and MemorySSAScanLimit in DSE
Another simple cleanups set in DSE. CheckCache is removed since 1f1145006b32 and in consequence KnownNoReads is useless.
Also update description of MemorySSAScanLimit which default value is 150 instead 100.

Differential Revision: https://reviews.llvm.org/D107812
2021-08-14 11:26:57 +02:00
Dawid Jurczak
06206a8cd1 [BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc
Additionally with this patch aligned DSE which is the only user of emitCalloc.

Differential Revision: https://reviews.llvm.org/D103523
2021-08-05 16:18:38 +02:00
Dawid Jurczak
238139be09 [DSE][NFC] Clean up DeadStoreElimination from unused variables
Differential Revision: https://reviews.llvm.org/D106446
2021-08-04 19:44:40 +02:00
Dawid Jurczak
5c315bee8c [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-29 18:34:10 +02:00
Dawid Jurczak
bc536c7101 Revert "[DSE] Transform memset + malloc --> calloc (PR25892)"
This reverts commit 43234b1595125ba2b5c23e7b28f5a67041c77673.

Reason: We should detect that we are implementing 'calloc' and bail out.
2021-07-23 11:51:59 +02:00
Dawid Jurczak
43234b1595 [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-20 11:39:05 +02:00
Roman Lebedev
4e332cd41a
Revert "Transform memset + malloc --> calloc (PR25892)"
It broke `check-msan`, see e.g. https://lab.llvm.org/buildbot/#/builders/18/builds/1934

This reverts commit 375694a07bcba3be66864c42a5932be1a22831e2.
2021-07-09 16:26:48 +03:00
Dawid Jurczak
375694a07b Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-09 11:01:12 +02:00
Evgeniy Brevnov
9568811cb8 [NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure
With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D101044
2021-07-02 10:00:47 +07:00
Nikita Popov
e81702912e [DSE] Preserve address space
Preserve address space when inserting i8* cast.
2021-06-27 20:26:00 +02:00
Nikita Popov
f00941e061 [DSE] Support opaque pointers
For the start shortening optimization, always use a i8 type for
the GEP, as it is a raw offset calculation.

Handling of non-i8* memset/memcpy arguments requires insertion
of casts. These cases were previously miscompiled, as the offset
calculation was performed on the wrong type.
2021-06-27 17:41:40 +02:00
David Green
a24b02193a [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
David Green
297088d1ad Revert "[DSE] Remove stores in the same loop iteration"
Apparently non-dead stores are being removed, as noted in D100464.

This reverts commit 222aeb4d51a46c5a81c9e4ccb16d1d19dd21ec95.
2021-06-08 21:23:08 +01:00
David Green
222aeb4d51 [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-05-31 10:22:37 +01:00
Arthur Eubanks
6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Nikita Popov
fb9ed1979a [IR] Add BasicBlock::isEntryBlock() (NFC)
This is a recurring and somewhat awkward pattern. Add a helper
method for it.
2021-05-15 12:41:58 +02:00
David Green
f7cb654763 [DSE] Move isOverwrite into DSEState. NFC
This moves the isOverwrite function into the DSEState so that it can
share the analyses and members from the state.

A few extra loop tests were also added to test stores in and around
multi block loops for D100464.
2021-05-14 09:16:51 +01:00
Dávid Bolvanský
e81819377e [DSE] Eliminate zero memset after calloc
Solves PR11896

As noted, this can be improved futher (calloc -> malloc) in some cases. But for know, this is the first step.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101391
2021-04-28 03:30:52 +02:00
dfukalov
c1a88e007b [AA][NFC] Convert AliasResult to class containing offset for PartialAlias case.
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98718
2021-04-09 13:26:09 +03:00