llvm-project

Author	SHA1	Message	Date
Nikita Popov	04b717c423	[TLI] Check that malloc argument has type size_t DSE assumes that this is the case when forming a calloc from a malloc + memset pair. For tests, either update the malloc signature or change the data layout.	2022-03-14 17:22:24 +01:00
Florian Hahn	d03d3d7966	[DSE] Fall back to CFG scan for unreachable terminators. Blocks with UnreachableInst terminators are considered as root nodes in the PDT. This pessimizes DSE, if there are no aliasing reads from the potentially dead store and the block with the unreachable terminator. If any of the root nodes of the PDT has UnreachableInst as terminator, fall back to the CFG scan, even if the common dominator of all killing blocks does not post-dominate the block with the potentially dead store. It looks like the compile-time impact for the extra scans is negligible. https://llvm-compile-time-tracker.com/compare.php?from=779bbbf27fe631154bdfaac7a443f198d4654688&to=ac59945f1bec1c6a7d7f5590c8c69fd9c5369c53&stat=instructions Fixes #53800. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119760	2022-02-16 14:06:40 +00:00
Florian Hahn	48f1884333	[DSE] Add additional tests with unreachable exits. Adds tests for #53800.	2022-02-14 14:28:49 +00:00
Nikita Popov	31c1842a7b	[DSE] Add test with sret argument (NFC)	2022-01-26 14:25:31 +01:00
Nikita Popov	26f81984e7	[DSE] Handle inaccessiblememonly calloc Change the DSE calloc handling to assume that it is inaccessiblememonly, i.e. the defining access is liveOnEntry. Differential Revision: https://reviews.llvm.org/D117543	2022-01-19 12:55:09 +01:00
Philip Reames	7ac65f6b2e	[tests] Add coverage of writeonly attribute and operand bundle intersection	2022-01-18 12:08:14 -08:00
Florian Hahn	e3275cfa94	[BuildLibCalls] Add nounwind,willreturn to memset_pattern{4,8,16}. Similar to memset, memset_pattern{4,8,16} all will return and do not unwind. Use fallthrough to include all attributes also set for memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114904	2022-01-12 10:32:53 +00:00
Nikita Popov	3cef3cf02f	[DSE] Check for noalias calls rather than alloc functions For these "visible on unwind/ret" checks we only care about the fact that no other code has access to the pointer (unless it escapes). A noalias call is sufficient for this, it does not have to be a known allocation function. This is basically the same change as D116728, but for DSE rather than LICM.	2022-01-11 12:22:16 +01:00
Nikita Popov	3d5179febe	[DSE] Add additional tests for noalias calls (NFC) Currently this is special-cased to TLI alloc functions only.	2022-01-11 12:09:54 +01:00
Nikita Popov	fe2c4af905	[DSE] Make test more robust (NFC) If the allocation is not captured, then all the stores before the ret are dead anyway.	2022-01-11 11:49:52 +01:00
Nikita Popov	b5a2627423	[DSE] Fix DSE test to use non-extern global (NFC) The intended transform is not legal with an extern global, because the actual global defined in a different TU might have larger size. Make it non-extern to show that the desired transform already works.	2022-01-03 09:38:04 +01:00
Nikita Popov	3478d64ee4	[DSE] Check for whole object overwrite even if dead store size not known If the killing store overwrites the whole object, we know that the preceding store is dead, regardless of the accessed offset or size. This case was previously only handled if the size of the dead store was also known. This allows us to perform conventional DSE for calls that write to an argument (but without known size). Differential Revision: https://reviews.llvm.org/D116267	2022-01-03 09:36:44 +01:00
Nikita Popov	eb91d91b7a	[DSE] Fix typo in recent commit This fixes a typo in 81d69e1bda9e4b6a83f29ba1f614e43ab4700972. Of course we should only skip the particular store if it isn't removable, not bail out of the whole loop. Add a test to cover this case.	2021-12-24 11:25:25 +01:00
Nikita Popov	ae64c5a0fd	[DSE][MemLoc] Handle intrinsics more generically Remove the special casing for intrinsics in MemoryLocation::getForDest() and handle them through the general attribute based code. On the DSE side, this means that isRemovable() now needs to handle more than a hardcoded list of intrinsics. We consider everything apart from volatile memory intrinsics and lifetime markers to be removable. This allows us to perform DSE on intrinsics that DSE has not been specially taught about, using a matrix store as an example here. There is an interesting test change for invariant.start, but I believe that optimization is correct. It only looks a bit odd because the code is immediate UB anyway. Differential Revision: https://reviews.llvm.org/D116210	2021-12-24 09:29:57 +01:00
Nikita Popov	58ad3428d1	[DSE] Add test for matrix store (NFC)	2021-12-23 09:44:01 +01:00
Nikita Popov	f8042492fe	[DSE] Regenerate test checks (NFC)	2021-12-23 09:31:44 +01:00
Marianne Mailhot-Sarrasin	90d1786ba0	[DSE] Fix invalid removal of store instruction Fix handling of alloc-like instructions in isGuaranteedLoopInvariant(). It was not valid when the 'KillingDef' was outside of the loop, while the 'CurrentDef' was inside the loop. In that case, the 'KillingDef' only overwrites the definition from the last iteration of the loop, and not the ones of all iterations. Therefore it does not make the 'CurrentDef' dead, and must not remove it. Fixes issue: https://github.com/llvm/llvm-project/issues/52774 Reviewed by: Florian Hahn Differential revision: https://reviews.llvm.org/D115965	2021-12-22 16:11:23 -05:00
Marianne Mailhot-Sarrasin	df590567aa	[DSE] Add test case showing bug PR52774. Pre-committing the test case before the bug fix. Reviewed by: Florian Hahn Differential revision: https://reviews.llvm.org/D115965	2021-12-22 14:57:28 -05:00
Nikita Popov	8a0e35f3a7	[MemoryLocation] Don't require nocapture in getForDest() As reames mentioned on related reviews, we don't need the nocapture requirement here. First of all, from an API perspective, this is not something that MemoryLocation::getForDest() should be checking in the first place, because it does not affect which memory this particular call can access; it's an orthogonal concern that should be handled by the caller if necessary. However, for both of the motivating users in DSE and InstCombine, we don't need the nocapture requirement, because the capture can either be purely local to the call (a pointer identity check that is irrelevant to us), be part of the return value (which we check is unused), or be written in the dest location, which we have determined to be dead. This allows us to remove the special handling for libcalls as well. Differential Revision: https://reviews.llvm.org/D116148	2021-12-22 12:20:13 +01:00
Philip Reames	44d23d5345	[DSE] Remove calls with known writes to dead memory This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test. That test was a false positive because it was checking asan failures not accounting for the fact the call could be validly optimized out. I hopefully managed to stabilize that test in 9b955f. (That's a speculative fix due to disk consumption needed to build compiler-rt tests locally being absurd.) Original commit message follows.. The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size. Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet). In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest. Differential Revision: https://reviews.llvm.org/D115904	2021-12-20 18:10:23 -08:00
Nikita Popov	1ba99eaf70	Revert "[DSE] Remove calls with known writes to dead memory" This reverts commit a8a51fe55649f5e07f9f2973507dc20bc4e40765. This breaks the strncpy-overflow.cpp test case.	2021-12-18 09:23:41 +01:00
Philip Reames	a8a51fe556	[DSE] Remove calls with known writes to dead memory The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size. Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet). In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest. Differential Revision: https://reviews.llvm.org/D115904	2021-12-17 13:42:36 -08:00
Philip Reames	d9d6e6a048	[tests] Precommit tests from D115904	2021-12-17 12:42:51 -08:00
Florian Hahn	ddfac0759c	Revert "[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest." This reverts commit ac60263ad173dbd2eba6e0c8d892d8c3dcc5306c. It looks like the test fails on certain non-Darwin systems, even though the triple is explicitly set to macos. Revert while I investigate.	2021-12-14 14:48:47 +00:00
Florian Hahn	ac60263ad1	[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest. memset_pattern{4,8,16} writes to the first argument. Use getForDest to return the corresponding MemoryLocation. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114906	2021-12-14 14:41:28 +00:00
Florian Hahn	4a419ea400	[DSE] Add additional memset_chk tests.	2021-12-06 13:06:11 +00:00
Florian Hahn	829b29b619	[MemoryLocation] strcat/strncat/strcpy read/write after their args. strcpy/strcat/strncat access memory starting from the passed in pointers. Construct memory locations for their args using getAfter. Discussed in D114872. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D114969	2021-12-03 08:48:23 +00:00
Florian Hahn	68782a860d	[DSE] Read after strcpy test.	2021-12-02 17:37:59 +00:00
Florian Hahn	5fe151f98f	[DSE] Add libcall tests for functions only available on Darwin. Add a set of tests for memset_pattern{4,8,16} variants.	2021-12-01 20:30:15 +00:00
Florian Hahn	7de410440d	[DSE] Allow DSE to optimize MemorySSA by default. This allows for better optimization of 'stores-of-existing-values' and possibly helps passes further down the pipeline. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D113712	2021-12-01 08:29:23 +00:00
Florian Hahn	c9ad356266	[DSE] Use optimized access if available for redundant store elimination. Using the optimized access enables additional optimizations in cases where the defining access is a non-aliasing store. Alternatively we could also walk upwards and skip non-aliasing defs here, but my experiments so far showed that this will noticeably increase compile-time for little extra gain compared to just using the optimized access. Improvements of dse.NumRedundantStores on MultiSource/CINT2006/CFP2006 on X86 with -O3: test-suite...-typeset/consumer-typeset.test 1.00 76.00 7500.0% test-suite.../Benchmarks/Bullet/bullet.test 3.00 12.00 300.0% test-suite...006/453.povray/453.povray.test 3.00 6.00 100.0% test-suite...telecomm-gsm/telecomm-gsm.test 1.00 2.00 100.0% test-suite...ediabench/gsm/toast/toast.test 1.00 2.00 100.0% test-suite...marks/7zip/7zip-benchmark.test 1.00 2.00 100.0% test-suite...ications/JM/lencod/lencod.test 7.00 10.00 42.9% test-suite...6/464.h264ref/464.h264ref.test 6.00 8.00 33.3% test-suite...ications/JM/ldecod/ldecod.test 6.00 7.00 16.7% test-suite...006/447.dealII/447.dealII.test 33.00 33.00 0.0% test-suite...6/471.omnetpp/471.omnetpp.test NaN 1.00 nan% test-suite...006/450.soplex/450.soplex.test NaN 2.00 nan% test-suite.../CINT2006/403.gcc/403.gcc.test NaN 7.00 nan% test-suite...lications/ClamAV/clamscan.test NaN 1.00 nan% test-suite...CI_Purple/SMG2000/smg2000.test NaN 3.00 nan% Follow-up to D111727. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112315	2021-11-30 15:40:14 +00:00
Florian Hahn	41d59a3645	[DSE] Add memset_chk tests.	2021-11-30 13:50:10 +00:00
Zarko Todorovski	7f7dac7126	[NFC][llvm] Inclusive language: reword uses of sanity test and check Part of continuing work to use more inclusive language. Reworded uses of sanity check and sanity test in llvm/test/	2021-11-25 07:21:42 -05:00
Fabian Wolff	7eec832def	[DSE] Improve handling of `strncpy` in Dead Store Elimination Fixes PR#52062 and one of the remaining cases of PR#47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D114035	2021-11-19 17:46:29 +00:00
Fabian Wolff	ffe1741b5c	[DSE] Add additional strncpy tests. Test for PR#52062 and one of the remaining cases of PR#47644.	2021-11-19 16:18:54 +00:00
Florian Hahn	9c00afe926	[DSE] Add test case with multiple inbounds stores, followed by OOB. This patch extends the existing out-of-bounds store tests with a case with a bigger object and multiple inbounds stores, followed by an OOB store. The OOB store is not used to remove the inbounds stores in this case at the moment.	2021-11-12 09:40:03 +00:00
Florian Hahn	274a9b0f0b	[DSE] Support redundant stores eliminated by memset. This patch adds support to remove stores that write the same value as earlier memsets. It uses isOverwrite to check that a memset completely overwrites a later store. The candidate store must store the same bytewise value as the byte stored by the memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112321	2021-10-29 22:19:53 +01:00
Dawid Jurczak	f87e0c68d7	[DSE] Eliminates redundant store of an existing value (PR16520) This is a follow-up to https://reviews.llvm.org/D90328. This change eliminates writes to variables where the value that is being written is already stored in the variable. This achieves the goal by looping through all memory definitions in the current state and getting the defining access from each of them. When there is a defining access where the write instruction is identical to the original instruction, it will remove this redundant write. For example: void f() { x = 1; if (foo()) { x = 1; g(); } else { h(); } } void g(); void h(); The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this: void f() { x = 1; if (foo()) { g(); } else { h(); } } void g(); void h(); Differential Revision: https://reviews.llvm.org/D111727	2021-10-28 16:20:09 +02:00
Florian Hahn	1a2a7cca3e	[DSE] Add test case with 2 memcpys that should not be eliminated.	2021-10-27 11:15:58 +01:00
Florian Hahn	286e98b97e	[DSE] Add test cases with more complex redundant stores. This patch adds more complex test cases with redundant stores of an existing memset, with other stores in between. It also makes a few of the existing tests more robust.	2021-10-22 13:50:32 +01:00
Dávid Bolvanský	93fd30a163	[NFC] Added test for PR50339	2021-10-13 12:15:57 +02:00
Dávid Bolvanský	005b715b54	[NFC] Added test for PR49927	2021-10-13 12:15:57 +02:00
Dawid Jurczak	9e65929a8e	[DSE] Re-enable calloc transformation with extra care (PR25892) Transformation from malloc+memset to calloc is always correct and in many situations it brings significant observable benefits in terms of execution speed and memory consumption [1][2]. Unfortunately there are cases when producing calloc causes performance drops [3]. As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios. If the optimizer is able to prove that after the malloc call it's _very_ likely to reach the memset branch, then after calloc emission we shouldn't observe any performance hits. Therefore finding a "null pointer check" pattern before the memset basic block sounds like a good justification for performing the transformation. Also that method was already suggested by GCC folks [4]. The main reason for the change is that, to be safe, we currently check for a post-dominance relation, which is far too conservative and leaves the transformation "almost" disabled in practice. This patch aims to enable the transformation again but with extra care. [1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc [2] https://vorpus.org/blog/why-does-calloc-exist/ [3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html [4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022 Differential Revision: https://reviews.llvm.org/D110021	2021-10-10 21:47:14 +02:00
Nikita Popov	ba664d9066	[AA] Move earliest escape tracking from DSE to AA This is a followup to D109844 (and alternative to D109907), which integrates the new "earliest escape" tracking into AliasAnalysis. This is done by replacing the pre-existing context-free capture cache in AAQueryInfo with a replaceable (virtual) object with two implementations: The SimpleCaptureInfo implements the previous behavior (check whether object is captured at all), while EarliestEscapeInfo implements the new behavior from DSE. This combines the "earliest escape" analysis with the full power of BasicAA: It subsumes the call handling from D109907, considers a wider range of escape sources, and works with AA recursion. The compile-time cost is slightly higher than with D109907. Differential Revision: https://reviews.llvm.org/D110368	2021-09-25 22:40:41 +02:00
Nikita Popov	327bbbb10b	[DSE] Make capture check more precise It is sufficient that the object has not been captured before the load that produces the pointer we're loading. A capture after that can not affect the already loaded pointer. This is small part of D110368 applied separately.	2021-09-25 22:23:19 +02:00
Nikita Popov	7774166499	[DSE] Add additional capture tests (NFC) These test other escape sources and the case of multiple underlying objects.	2021-09-24 21:13:29 +02:00
Florian Hahn	6f28fb7081	Recommit "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts the revert commit df56fc6ebbee6c458b0473185277b7860f7e3408. This version of the patch adjusts the location where the EarliestEscapes cache is cleared when an instruction gets removed. The earliest escaping instruction does not have to be a memory instruction. It could be a ptrtoint instruction like in the added test @earliest_escape_ptrtoint, which subsequently gets removed. We need to invalidate the EarliestEscape entry referring to the ptrtoint when deleting it. This fixes the crash mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6	2021-09-24 17:13:27 +01:00
Nico Weber	df56fc6ebb	Revert "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c. Makes clang crash, see comments on https://reviews.llvm.org/D109844	2021-09-24 09:57:59 -04:00
Florian Hahn	5ce89279c0	[DSE] Track earliest escape, use for loads in isReadClobber. At the moment, DSE only considers whether a pointer may be captured at all in a function. This leads to cases where we fail to remove stores to local objects because we do not check if they escape before potential read-clobbers or after. Doing context-sensitive escape queries in isReadClobber was removed a while ago in d1a1cce5b130 to save compile-time. See PR50220 for more context. This patch introduces a new capture tracker, which keeps track of the 'earliest' capture. An instruction A is considered earlier than instruction B, if A dominates B. If 2 escapes do not dominate each other, the terminator of the common dominator is chosen. If not all uses can be analyzed, the earliest escape is set to the first instruction in the function entry block. If the query instruction dominates the earliest escape and is not in a cycle, then the pointer does not escape before the query instruction. This patch uses this information when checking if a load of a loaded underlying object may alias a write to a stack object. If the stack object does not escape before the load, they do not alias. I will share a follow-up patch to also use the information for call instructions to fix PR50220. In terms of compile-time, the impact is low in general, NewPM-O3: +0.05% NewPM-ReleaseThinLTO: +0.05% NewPM-ReleaseLTO-g: +0.03 with the largest change being tramp3d-v4 (+0.30%) http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Compared to always computing the capture information on demand, we get the following benefits from the caching: NewPM-O3: -0.03% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.04% The biggest speedup is tramp3d-v4 (-0.21%). http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Overall there is a small, but noticeable benefit from caching. I am not entirely sure if the speedups warrant the extra complexity of caching. The way the caching works also means that we might miss a few cases, as it is less precise. Also, there may be a better way to cache things. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109844	2021-09-23 12:45:05 +01:00
Florian Hahn	963d3a22b3	[DSE] Add additional tests to cover review comments. Adds additional tests following comments from D109844. Also removes unused in.ptr arguments and places in the call tests that used loads instead of a getval call.	2021-09-20 17:06:04 +01:00
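
The redundant-store example in the message of commit f87e0c68d7 above can be made concrete with a small C sketch. This is only an illustration at the source level, not the DSE implementation: `f_before` and `f_after` mirror the two versions of `f` quoted in that message, while the `cond` parameter (standing in for the `foo()` call), the counters, and the `check_equivalent` helper are hypothetical names introduced here. It shows that dropping the second `x = 1` changes nothing a callee can observe.

```c
#include <assert.h>

static int x;          /* the variable written by f */
static int seen_by_g;  /* value of x that g() observed (illustrative) */
static int h_calls;    /* how many times h() ran (illustrative) */

static void g(void) { seen_by_g = x; }
static void h(void) { h_calls++; }

/* Version from the commit message before the transform. */
static void f_before(int cond) {
    x = 1;
    if (cond) {
        x = 1;  /* redundant: x already holds 1 on this path */
        g();
    } else {
        h();
    }
}

/* Version after the redundant store has been removed. */
static void f_after(int cond) {
    x = 1;
    if (cond) {
        g();
    } else {
        h();
    }
}

/* Runs both versions from the same starting state and checks
   that they leave identical observable state behind. */
static void check_equivalent(int cond) {
    x = 0; seen_by_g = -1; h_calls = 0;
    f_before(cond);
    int x1 = x, g1 = seen_by_g, h1 = h_calls;

    x = 0; seen_by_g = -1; h_calls = 0;
    f_after(cond);
    assert(x == x1 && seen_by_g == g1 && h_calls == h1);
}
```

Calling `check_equivalent(0)` and `check_equivalent(1)` exercises both branches. The removal is safe here because no write to `x` happens between the two stores.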


408 Commits