llvm-project

Author	SHA1	Message	Date
Arthur Eubanks	813a7f1ad7	[MemorySSA] Properly handle liveOnEntry in the walker printer Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109177	2021-09-02 12:51:27 -07:00
Arthur Eubanks	a270de359f	[test] Remove missed RUN line after D109040	2021-09-02 11:44:45 -07:00
Arthur Eubanks	50153213c8	[test][NewPM] Remove RUN lines using -analyze Only tests in llvm/test/Analysis. -analyze is legacy PM-specific. This only touches files with `-passes`. I looked through everything and made sure that everything had a new PM equivalent. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D109040	2021-09-02 11:38:14 -07:00
Nikita Popov	c86e1ce73b	[SCEVExpander] Simplify pointer overflow check This is a followup to D104662 to generate slightly nicer code for pointer overflow checks. Bypass expandAddToGEP and instead explicitly generate i8 GEPs. This saves some bitcasts and negates the value in a more obvious way. In particular, this prevents SCEV from looking through the umul.with.overflow, same as in the integer case. The wrapping-pointer-ni.ll test deserves a comment: Previously, this generated a typed GEP which used the umulo argument rather than the multiplication result. This results in more compact IR in that case, but effectively does the multiplication twice, the second one is just hidden in the GEP. Reusing the umulo result seems pretty reasonable to me. Differential Revision: https://reviews.llvm.org/D109093	2021-09-02 20:15:59 +02:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit 4c4093e6e39fe6601f9c95a95a6bc242ef648cd5. This reverts commit 0a2b1ba33ae6dcaedb81417f7c4cc714f72a5968. This reverts commit d9873711cb03ac7aedcaadcba42f82c66e962e6e. This reverts commit 791006fb8c6fff4f33c33cb513a96b1d3f94c767. This reverts commit c22b64ef66f7518abb6f022fcdfd86d16c764caf. This reverts commit 72ebcd3198327da12804305bda13d9b7088772a8. This reverts commit 5fa6039a5fc1b6392a3c9a3326a76604e0cb1001. This reverts commit 9efda541bfbd145de90f7db38d935db6246dc45a. This reverts commit 94d3ff09cfa8d7aecf480e54da9a5334e262e76b.	2021-09-02 13:53:56 +03:00
David Sherwood	d581d94385	[SVE] Fix the FP arithmetic instruction costs for SVE Several FP instructions (fadd, fsub, etc.) were incorrectly assigned a higher cost for SVE because they have custom lowering, however we know they are legal. This patch explicitly assigns a cost of 2 to these opcodes. Tests added here: Analysis/CostModel/AArch64/arith-fp-sve.ll Differential Revision: https://reviews.llvm.org/D108993	2021-09-02 09:55:13 +01:00
Arthur Eubanks	1c503e923a	[test] Precommit/fix up existing test for MemorySSA/invariant.group	2021-09-01 22:58:17 -07:00
Arthur Eubanks	7b08d9da55	Reland [MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:58:57 -07:00
Arthur Eubanks	0f63496ea4	Revert "[MemorySSA] Add pass to print results of MemorySSA walker" This reverts commit 8f98477c2d2bcbf5b6aa36278b59bf2a861426a1. Breaks bots	2021-09-01 18:45:19 -07:00
Arthur Eubanks	8f98477c2d	[MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:29:15 -07:00
Philip Reames	29fa37ec9f	[SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll Differential Revision: https://reviews.llvm.org/D109015	2021-09-01 11:51:48 -07:00
David Sherwood	f024a4818d	[NFC] Re-run update_analyze_test_checks on Analysis/CostModel/AArch64/sve-intrinsics.ll	2021-09-01 12:09:58 +01:00
David Sherwood	930d5077f4	Revert "[NFC] Re-run update_analyze_test_checks on Analysis/CostModel/AArch64/sve-intrinsics.ll" This reverts commit aeb2bd68dcb1df682ad549b4033cfad072efabd4.	2021-09-01 11:52:29 +01:00
David Sherwood	aeb2bd68dc	[NFC] Re-run update_analyze_test_checks on Analysis/CostModel/AArch64/sve-intrinsics.ll	2021-09-01 11:44:02 +01:00
Philip Reames	c49503a76d	[SCEV] Add a testcase for zero max btc with non-constant exact btc Reduced from the ArchiveCommandLine.ll case seen in D108848.	2021-08-31 11:00:41 -07:00
Philip Reames	6600e1759b	[SCEV] If max BTC is zero, then so is the exact BTC [1 of N] This patch is specifically the howManyLessThan case. There will be a couple of followon patches for other codepaths. The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it. * The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0. * The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct. WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit. An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse. This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone. Differential Revision: https://reviews.llvm.org/D108921	2021-08-31 08:50:11 -07:00
Philip Reames	301fbf9b81	[SCEV] Clarify the overflow precondition of computeMaxBECountForLT [NFC] And add a test case to illustrate that we do in fact produce the right result for the multiple exit case. I have gotten myself confused at least three times when reading this code, so clarify to prevent future confusion.	2021-08-30 09:49:17 -07:00
Daniil Fukalov	5b3fad4966	[AMDGPU][CostModel] Update shuffle instruction tests. NFC. New tests ported over from test/Analysis/CostModel/AArch64/shuffle-other.ll.	2021-08-30 19:17:27 +03:00
Matthew Devereau	9b830c798e	[AArch64][SVE] Teach cost model masked gathers/scatters are cheap Tell the cost model to use the scalable calculation for non-neon fixed vector. This results in a cheaper cost for fixed-length SVE masked gathers/scatters allowing the vectorizer to emit them more frequently.	2021-08-26 11:17:47 +01:00
Philip Reames	4d235bf75d	[tests] Add a couple tests for intersection of ec8d87e and D108651	2021-08-24 14:29:36 -07:00
Philip Reames	ec8d87e9f5	[SCEV] Infer nuw from nw for addrecs This was previously committed in 914836b, and reverted due to confusion on the status of the review. Differential Revision: https://reviews.llvm.org/D108601	2021-08-24 14:24:05 -07:00
Philip Reames	35b0b1a64a	[test] Precommit tests for D108651	2021-08-24 14:18:58 -07:00
Philip Reames	58582bae63	Revert "[SCEV] Infer nsw/nuw from nw for addrecs" This reverts commit 914836b1c8b36d4a317ef6c233746f6ec37b57a5. Further comments on review came up after initial approval. Reverting while addressing.	2021-08-24 09:28:37 -07:00
Philip Reames	914836b1c8	[SCEV] Infer nsw/nuw from nw for addrecs If we know an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type can not occur. Differential Revision: https://reviews.llvm.org/D108601	2021-08-24 08:53:21 -07:00
Simon Pilgrim	9efda541bf	[CostModel][X86] Add costs for f32/f64 scalar and vector types. The f16 half types are still pretty useless as we don't have it as a legal type (we treat them as i16 most of the time)	2021-08-20 14:31:12 +01:00
Bjorn Pettersson	d52f506192	[NewPM] Use parameterized syntax for a couple of more passes A couple of passes that are parameterized in new-PM used different pass names (in cmd line interface) while using the same pass class name. This patch updates the PassRegistry to model pass parameters more properly using PASS_WITH_PARAMS. Reason for the change is to ensure that we have a 1-1 mapping between class name and pass name (when disregarding the params). With a 1-1 mapping it is more obvious which pass name to use in options such as -debug-only, -print-after etc. The opt -passes syntax is changed for the following passes: early-cse-memssa => early-cse<memssa> post-inline-ee-instrument => ee-instrument<post-inline> loop-extract-single => loop-extract<single> lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal> This patch is not updating pass names in docs/Passes.rst. Not quite sure what the status is for that document (e.g. when it comes to listing pass parameters). It is only loop-extract-single that is mentioned in Passes.rst today, out of the passes mentioned above. Differential Revision: https://reviews.llvm.org/D108362	2021-08-20 14:59:21 +02:00
Simon Pilgrim	72ebcd3198	[CostModel][X86] Add isnan half/float/double costs tests	2021-08-19 18:07:06 +01:00
Simon Pilgrim	9419729b6a	[CostModel][X86] Add VPOPCNTDQ/BITALG ctpop costs VPOPCNTDQ + BITALG add ctpop instructions for vXi64/vXi32 + vXi16/vXi8 vector types respectively	2021-08-19 15:40:09 +01:00
Simon Pilgrim	2d60fdd7aa	[CostModel][X86] Add VPOPCNT/BITALG test coverage for ctpop/cttz costs	2021-08-19 14:05:58 +01:00
Matthew Devereau	734708e04f	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.	2021-08-19 13:01:33 +01:00
Peter Collingbourne	6f85225ef3	StackLifetime: Remove asserts for multiple lifetime intrinsics. According to the langref, it is valid to have multiple consecutive lifetime start or end intrinsics on the same object. For llvm.lifetime.start: "If ptr [...] is a stack object that is already alive, it simply fills all bytes of the object with poison." For llvm.lifetime.end: "Calling llvm.lifetime.end on an already dead alloca is no-op." However, we currently fail an assertion in such cases. I've observed the assertion failure when the loop vectorization pass duplicates the intrinsic. We can conservatively handle these intrinsics by ignoring all but the first one, which can be implemented by removing the assertions. Differential Revision: https://reviews.llvm.org/D108337	2021-08-18 18:45:28 -07:00
Nikita Popov	3dd8c9176b	[LICM] Remove AST-based implementation MSSA-based LICM has been enabled by default for a few years now. This drops the old AST-based implementation. Using loop(licm) will result in a fatal error, the use of loop-mssa(licm) is required (or just licm, which defaults to loop-mssa). Note that the core canSinkOrHoistInst() logic has to retain AST support for now, because it is shared with LoopSink. Differential Revision: https://reviews.llvm.org/D108244	2021-08-18 20:21:53 +02:00
David Sherwood	219d4518fc	[Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive For tight loops like this: float r = 0; for (int i = 0; i < n; i++) { r += a[i]; } it's better not to vectorise at -O3 using fixed-width ordered reductions on AArch64 targets. Although the resulting number of instructions in the generated code ends up being comparable to not vectorising at all, there may be additional costs on some CPUs, for example perhaps the scheduling is worse. It makes sense to deter vectorisation in tight loops. Differential Revision: https://reviews.llvm.org/D108292	2021-08-18 17:01:56 +01:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Nikita Popov	e11354c0a4	[Tests] Remove explicit -enable-mssa-loop-dependency options (NFC) This is enabled by default. Drop explicit uses in preparation for removing the option. Also drop RUN lines that are now the same (typically modulo a -verify-memoryssa option).	2021-08-14 21:21:07 +02:00
Florian Hahn	f999312872	Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts the revert 28c04794df74ad3c38155a244729d1f8d57b9400. The failing MLIR test that caused the revert should be fixed in this version. Also includes a PPC test fix previously in 1f87c7c478a6.	2021-08-12 18:31:57 +01:00
Florian Hahn	a72cd6353c	Revert "[Matrix] Update column.major.load call in PPC test." Dependent commit a1ef81de35a4 has been reverted in a1ef81de35a4.	2021-08-12 13:13:52 +01:00
Florian Hahn	1f87c7c478	[Matrix] Update column.major.load call in PPC test. a1ef81de35a4bac6d3 adjusted the definition of the intrinsic, but did not update a PowerPC test. Fix the test by updating the call & declaration of @llvm.matrix.column.major.load.	2021-08-12 11:26:33 +01:00
Archibald Elliott	b764b1ef2f	[NFC][X86] New Test Requires Asserts D105263 introduced this new test. It fails when asserts are disabled, due to using a debug option on opt. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D107805	2021-08-10 10:22:04 +01:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Nikita Popov	88003cea1c	[MemCpyOpt] Remove MemDepAnalysis-based implementation The MemorySSA-based implementation has been enabled for a few months (since D94376). This patch drops the old MDA-based implementation entirely. I've kept this to only the basic cleanup of dropping various conditions -- the code could be further cleaned up now that there is only one implementation. Differential Revision: https://reviews.llvm.org/D102113	2021-08-07 22:35:44 +02:00
Zheng Chen	30b0c455b1	[LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D107618	2021-08-06 16:51:09 +00:00
David Green	649cf4514d	[AArch64] Expand the SVE min/max reduction costs to NEON This takes the existing SVE costing for the various min/max reduction intrinsics and expands it to NEON, where I believe it applies equally well. In the process it changes the lowering to use min/max cost, as opposed to summing up the cost of ICmp+Select. Differential Revision: https://reviews.llvm.org/D106239	2021-08-05 23:23:24 +01:00
Bardia Mahjour	0e08891ec1	[DA] control compile-time spent by MIV tests Function exploreDirections() in DependenceAnalysis implements a recursive algorithm for refining direction vectors. This algorithm has worst-case complexity of O(3^(n+1)) where n is the number of common loop levels. In this patch I'm adding a threshold to control the amount of time we spend in doing MIV tests (which most of the time end up resulting in over pessimistic direction vectors anyway). Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D107159	2021-08-05 09:50:11 -04:00
Irina Dobrescu	b01417d3c5	[AArch64] Optimise min/max lowering in ISel Differential Revision: https://reviews.llvm.org/D106561	2021-08-02 13:40:21 +01:00
Sjoerd Meijer	46a861af3d	[CostModel][AArch64] Add some shuffle concat tests. NFC. Test ported over from test/Analysis/CostModel/ARM/shuffle.ll.	2021-08-02 12:11:00 +01:00
Simon Pilgrim	872a950033	[CostModel] Treat 'widen subvector' patterns as zero cost As discussed on D107228, widening a subvector by inserting the whole subvector into the bottom of a larger undef vector should always be cheap enough that we can treat it as zero cost. NOTE: If this proves to cause issues we have the option of introducing a "SK_WidenSubvector" shuffle kind enum that targets could override the zero cost, but that doesn't seem necessary atm. Differential Revision: https://reviews.llvm.org/D107228	2021-08-02 11:43:10 +01:00
Simon Pilgrim	7397dcb403	[TTI] Add basic SK_InsertSubvector shuffle mask recognition This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix an x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188 This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion. The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic. The widening case will be addressed in a follow up patch that treats the cost as 0. Masks with a high number of undef elts will still struggle to match optimal subvector widths - it's currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors. Differential Revision: https://reviews.llvm.org/D107228	2021-08-02 11:23:44 +01:00
David Green	098984a80c	[AArch64] Update and expand min-max cost model test. NFC This expands the cost model test for min/max to many more types, including floating point minnum/maxnum and minimum/maximum, and FP16 with and without fullfp16. The old llc run lines are removed, as those are better tested by CodeGen tests.	2021-07-27 18:48:58 +01:00

2887 Commits