llvm-project

Author	SHA1	Message	Date
David Green	4e37e32563	[ARM] Update test comments after D114018. NFC The TODOs are now fixed as of D114018, so can be removed. Which is nice.	2021-11-16 22:48:45 +00:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency. The ARM loop hardware interaction in the test diff is opague to me, but the comments in the review from others knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
Arthur Eubanks	c95a9f46c9	[Loads] Handle addrspacecast constant expressions when determining dereferenceability Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114015	2021-11-16 11:17:57 -08:00
Arthur Eubanks	877d6e9b9a	[test] Precommit test for D114015	2021-11-16 11:07:20 -08:00
Sanjay Patel	8fce94f916	[InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2 If C is a high-bit mask: (trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?) If C is low-bit mask: (trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?) If C is not-of-power-of-2 (one clear bit): (trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?) This extends the fold added with: acabad9ff6bf (https://alive2.llvm.org/ce/z/aFr7qV) Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold. Here are Alive2 generalizations for these folds: https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch) https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch) https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask) https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit) Differential Revision: https://reviews.llvm.org/D112634	2021-11-16 09:27:30 -05:00
Alexey Bataev	900cc1a226	[SLP]Improve cost of the gather nodes. No need to count the final shuffle cost for the constants, gathering of the constants is just a constant vector + extra inserts, if required. Differential Revision: https://reviews.llvm.org/D113770	2021-11-16 06:25:07 -08:00
Alexey Bataev	51c0b6843a	[SLP][NFC]Add more tests for shuffles that can be optimized after SLP, NFC.	2021-11-16 05:42:18 -08:00
Sander.DeSmalen@arm.com	305816ff1e	[IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast. rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed an issue with IndVarSimplification for a loop where a i32 phi node is no longer replaced by a widened (i64) phi node, because the SCEVs of a sign-extend no longer folded the same way. I'm unsure how to properly explain this because it's all rather complicated, but in short: SCEVs don't fold as nicely as they used to and this caused a difference. While investigating this, I found that IndVarSimplify can actually optimise the case in the way we want to if it chooses the widened IV to be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough, there is some level of indeterminism in the way the algorithm works, it just picks the sign of the 'first' zext/sext user, where the order of the users-iterator is not guaranteed to be the same on each invocation of the pass (e.g. shown by first running loop-rotate, which puts the users in a different order). While I think the fix is valid in the sense that consistently picking _any_ order is better than having an nondeterministic order, I can use a bit of advice from people more familiar in this area of the code-base. For example, I'm not sure if this fix is hiding another issue where the IndVarSimplify pass could actually draw the same conclusions (i.e. that it only needs an i64 phi node) if it does a bit more work, regardless of whether it chooses the induction variable to be signed or unsigned. I'm also not sure if choosing signed is better than unsigned, or whether that just happens to be beneficial only in this individual case. Any feedback would be much appreciated! Reviewed By: reames Differential Revision: https://reviews.llvm.org/D112573	2021-11-16 12:41:04 +00:00
Matt Devereau	f526c600c0	[AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store and LD1/ST1 to load/store when a ptrue all predicate pattern operand is present. This allows existing IR optimizations such as dead-load removal to occur. Differential Revision: https://reviews.llvm.org/D113489	2021-11-16 11:10:23 +00:00
David Green	309f1e4ac8	[ARM] Add datalayout to costmodel tests. NFC This adds a sensible datalayout to the ARM cost model tests, to prevent the costs reported being incorrect for the size of pointers.	2021-11-16 09:49:42 +00:00
Mehrnoosh Heidarpour	62c51a72f9	[InstSimplify] Fold A\|B \| (A^B) --> A\|B This patch adds the following fold opportunity: A\|B \| (A^B) --> A\|B that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479 https://alive2.llvm.org/ce/z/33-My- Test cases with base results are added in D113860 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D113861	2021-11-15 18:55:04 -05:00
Stanislav Mekhanoshin	833cdb0a07	Revert "[InstSimplify] Fold A\|B \| (A^B) --> A\|B" This reverts commit 193c40e9667ca2b173232b393fc72ea9e4944aa3.	2021-11-15 14:56:20 -08:00
Arthur Eubanks	19867de9e7	[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved since we've already handled function analysis invalidation. So far this only touches the inliner, argpromotion, function-attrs, and updateCGAndAnalysisManager(), since they are the most used. This is part of an effort to investigate running the function simplification pipeline less on functions we visit multiple times in the inliner pipeline. However, this causes major memory regressions especially on larger IR. To counteract this, turn on the option to eagerly invalidate function analyses. This invalidates analyses on functions immediately after they're processed in a module or scc to function adaptor for specific parts of the pipeline. Within an SCC, if a pass only modifies one function, other functions in the SCC do not have their analyses invalidated, so in later function passes in the SCC pass manager the analyses may still be cached. It is only after the function passes that the eager invalidation takes effect. For the default pipelines this makes sense because the inliner pipeline runs the function simplification pipeline after all other SCC passes (except CoroSplit which doesn't request any analyses). Overall this has mostly positive effects on compile time and positive effects on memory usage. https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss D113196 shows that we slightly regressed compile times in exchange for some memory improvements when turning on eager invalidation. D100917 shows that we slightly improved compile times in exchange for major memory regressions in some cases when invalidating less in SCC passes. Turning these on at the same time keeps the memory improvements while keeping compile times neutral/slightly positive. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113304	2021-11-15 14:44:53 -08:00
Stanislav Mekhanoshin	193c40e966	[InstSimplify] Fold A\|B \| (A^B) --> A\|B This patch adds the following fold opportunity: A\|B \| (A^B) --> A\|B that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479 https://alive2.llvm.org/ce/z/33-My- Test cases with base results are added in D113860 (authored by MehrHeidar, committed by rampitec). Differential Revision: https://reviews.llvm.org/D113861	2021-11-15 13:49:20 -08:00
Philip Reames	da327e7290	Fix a misleading FIXME in an unroll test	2021-11-15 12:20:10 -08:00
Alexey Bataev	2d0cab9d3d	[SLP][NFC]Add a test for extra shuffle emission, NFC.	2021-11-15 12:14:43 -08:00
Mehrnoosh Heidarpour	7daa95c8fa	[InstCombine] Fold (A^B)\|~A-->~(A&B) https://alive2.llvm.org/ce/z/2v6rhF Fixes: https://llvm.org/PR52478 Differential Revision: https://reviews.llvm.org/D113783	2021-11-15 12:29:37 -05:00
Alexey Bataev	036207d5f2	[SLP]Improve splat detection. A bunch of scalars can be treated as a splat not only if all elements are the same but also if some of them are undefvalues. Differential Revision: https://reviews.llvm.org/D113774	2021-11-15 07:50:34 -08:00
Mehrnoosh Heidarpour	a7f7cf115b	[NFC][InstSimplify] add test cases with base results for or-xor fold This patch adds tests with baseline results as a pre-commit for D113861 Differential Revision: https://reviews.llvm.org/D113860	2021-11-15 10:08:31 -05:00
Alexey Bataev	6fb5bed7d1	[SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics. If the vector intrinsic has scalar argument, we currently still create a tree entry for this argument. This entry is not used, just consumes resources and increases the cost of the tree. Differential Revision: https://reviews.llvm.org/D113806	2021-11-15 06:11:19 -08:00
Florian Hahn	112c1c346a	[IVDescriptor] Make sure the sign is included for negative extension. At the moment, computeRecurrenceType does not include any sign bits in the maximum bit width. If the value can be negative, this means the sign bit will be missing and the sext won't properly extend the value. If the value can be negative, increment the bitwidth by one to make sure there is at least one sign bit in the result value. Note that the increment is also needed if the value is known to be negative, as a sign bit needs to be preserved for the sext to work. Note that this at the moment prevents vectorization, because the analysis computes i1 as type for the recurrence when looking through the AND in lookThroughAnd. Fixes PR51794, PR52485. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113056	2021-11-15 13:12:57 +00:00
Simon Pilgrim	fbe72e41b9	[LoopVectorize] Add PR41179 test case	2021-11-14 21:54:23 +00:00
Mehrnoosh Heidarpour	b69dc2d180	[InstCombine] add tests for or-xor logic fold; NFC Baseline tests for D113783 Differential Revision: https://reviews.llvm.org/D113846	2021-11-14 11:20:32 -05:00
Roman Lebedev	2c91f48c48	[NFC][SROA] Revisit test coverage in non-capturing-call.ll	2021-11-14 15:29:32 +03:00
David Green	355ee18c5d	[TypePromotion] Extend TypePromotion::isSafeWrap This modifies the preconditions of TypePromotion's isSafeWrap method, to allow it to work from all constants from the ICmp. Using the code: %a = add %x, C1 %c = icmp ult %a, C2 According to Alive, we can prove that is equivalent to icmp ult (add zext(%x), sext(C1)), zext(C2) given C1 <=s 0 and C1 >s C2. https://alive2.llvm.org/ce/z/CECYZB Which is similar to what is already present. We can also prove icmp ult (add zext(%x), sext(C1)), sext(C2) given C1 <=s 0 and C1 <=s C2. https://alive2.llvm.org/ce/z/KKgyeL The PrepareWrappingAdds method was removed, and the constants are now altered to sext or zext directly as required by the above methods. Differential Revision: https://reviews.llvm.org/D113678	2021-11-14 11:18:31 +00:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Alexey Bataev	e2a86ab847	[SLP][NFCAdd a test for vector intrinsic with scalar parameter, NFC.	2021-11-12 13:49:56 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Philip Reames	a1b496be6c	(re-)Autogen one last unroll-and-jam test This case was complicated because someone had added new non-autogened test to an autogened file. In particular, those new tests used two variables (%J and %j) which differeded only in capitalization. The auto-updater doesn't distinguish case, so this meant auto-gened versions of the new tests failed with non-obvious errors. There are two key lessons here: 1) Please don't use two values which differ only in case. This is problematic for automatic tooling, but is also hard to understand for a human. 2) Please DO NOT add new tests to an autogened test without running autogen again. If autogen doesn't pass on your new test, put them in a separate file.	2021-11-12 11:21:08 -08:00
Philip Reames	f453e23e67	Autogen a bunch of unrolling tests for ease of update	2021-11-12 10:34:50 -08:00
Philip Reames	5dd64ef528	Refresh an autogen test to reduce spurious diffs	2021-11-12 09:48:37 -08:00
Philip Reames	e01c91f242	[tests] Add coverage for cases we can prune exits when runtlme unrolling	2021-11-12 09:43:16 -08:00
Joel E. Denny	c9dfe322ee	[OpenMP] Fix main thread barrier for Pascal and amdgpu Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602	2021-11-12 11:18:45 -05:00
Florian Hahn	30ebdf8a6d	[LV] Precommit test case from PR52485.	2021-11-12 16:09:19 +00:00
Roman Lebedev	8d35c054e3	[NFC][SROA] Add more tests for non-capturing pointer-escaping calls	2021-11-12 18:08:39 +03:00
Kerry McLaughlin	7647822156	[AArch64][SVE] Remove i1 type from isElementTypeLegalForScalableVector `collectElementTypesForWidening` collects the types of load, store and reduction Phis in a loop. These types are later checked using `isElementTypeLegalForScalableVector` to prevent vectorisation of loops with instruction types that are unsupported. This patch removes i1 from the list of types supported for scalable vectors. This fixes an assert ("Cannot yet scalarize uniform stores") in `setCostBasedWideningDecision` when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113680	2021-11-12 14:24:38 +00:00
Alexey Bataev	352c46e707	[SLP]Improve vectorization of split loads. Need to fix ther cost estimation for split loads, since we look at the subregs already, no need to permute them, need just to estimate subregister insert, if it is smaller than the real register. Also, using split loads, it might be profitable already to vectorize smaller trees with gathering of the loads. Differential Revision: https://reviews.llvm.org/D107188	2021-11-12 06:13:22 -08:00
Florian Hahn	9c00afe926	[DSE] Add test case with multiple inbounds stores, followed by OOB. This patch extends the existing out-of-bounds store tests with a case with a bigger object and multiple inbounds stores, followed by an OOB store. The OOB store is not used to remove the inbounds stores in this case at the moment.	2021-11-12 09:40:03 +00:00
Nikita Popov	84e273cced	[InstCombine] Handle undefs in and of icmp eq zero fold For the scalar/splat case, this fold is subsumed by foldLogOpOfMaskedICmps(). However, the conjugated fold for "or" also supports splats with undef. Make both code paths consistent by using m_ZeroInt() for the "and" implementation as well. https://alive2.llvm.org/ce/z/tN63cu https://alive2.llvm.org/ce/z/ufB_Ue	2021-11-11 19:07:07 +01:00
Nikita Popov	1f568f2a25	[InstCombine] Add test for and of icmp ne zero with undefs (NFC) We handle this in the conjugated "or" fold, but not for "and".	2021-11-11 19:01:29 +01:00
Huihui Zhang	5a4bd07ea4	[InstCombine][NFC] Pre-commit baseline test for D113442. Add baseline test to check for folding select into binop operand enabled through D113442.	2021-11-10 19:45:27 -08:00
Stanislav Mekhanoshin	557f4ce0c3	[InstCombine] Precommit updated and-xor-or.ll tests. NFC. Tests for: ``` (a \| ~(b & c)) & ~(a & (b ^ c)) --> ~(a \| b) \| (a ^ b ^ c) (~(a & b) \| c) & ~(a & (b & c)) -> ~(a & b) (~(a & b) \| c) & ~(b & (a & c)) -> ~(a & b) (~a \| b \| c) & ~(a & b & c) -> ~a \| (b ^ c) (~a \| b \| c) & ~(a & b) -> (c & ~b) \| ~a ```	2021-11-10 17:10:06 -08:00
Nikita Popov	0242a6adf7	[InstCombine] Support splat vectors in some or of icmp folds Replace m_ConstantInt() with m_APInt() in order to support splat constants in addition to scalar integers.	2021-11-10 22:59:09 +01:00
Nikita Popov	861adaf2ad	[InstCombine] Support splat vectors in some and of icmp folds Replace m_ConstantInt() with m_APInt() to support splat vectors in addition to scalar integers.	2021-11-10 22:37:54 +01:00
Nikita Popov	fa4e9e64e2	[InstCombine] Add vector variants to merge-icmps.ll (NFC) And regenerate test checks.	2021-11-10 22:36:06 +01:00
Nikita Popov	58ebc79a64	[InstCombine] Strip offset when folding and/or of icmps When folding and/or of icmps, look through add of a constant and adjust the icmp range instead. Effectively, this decomposes X + C1 < C2 style range checks back into a normal range. This allows us to fold comparisons involving two range checks or one range check and some other condition. We had a fold for a really specific case of this (or of range check and eq, and only one one side!) while this handles it in fully generality. Differential Revision: https://reviews.llvm.org/D113510	2021-11-10 22:01:52 +01:00
Stanislav Mekhanoshin	5731381594	[InstCombine] Relax and reorganize one use checks in the ~(a \| b) & c Since there is just a single check for LHS in ~(A \| B) & C \| ... transforms and multiple RHS checks inside with more coming I am removing m_OneUse checks for LHS and adding new checks for RHS. This is non essential as long as there is total benefit. In addition (~(A \| B) & C) \| (~(A \| C) & B) --> (B ^ C) & ~A checks were overly restrictive, it should be good without any additional checks. Differential Revision: https://reviews.llvm.org/D113141	2021-11-10 10:14:12 -08:00
Nikita Popov	fb1a203e45	[InstCombine] Add additional test with signed range check (NFC)	2021-11-10 18:00:24 +01:00
Sanjay Patel	67299aa84f	[InstCombine] add check for integer source type from cast to prevent crash A problem was noted in the post-commit review for c36b7e21bd8f04a44d6 / D113035 : If the source type is not integer or integer vector, then we could crash when trying to ComputeNumSignBits().	2021-11-10 09:44:55 -05:00
Florian Hahn	e7f1232cb7	[LV] Move optimized IV recipes to phi section of header after sinking. Unfortunately sinking recipes for first-order recurrences relies on the original position of recipes. So if a recipes needs to be sunk after an optimized induction, it needs to stay in the original position, until sinking is done. This is causing PR52460. To fix the crash, keep the recipes in the original position until sink-after is done. Post-commit follow-up to c45045bfd04af9 to address PR52460.	2021-11-10 11:41:08 +00:00

1 2 3 4 5 ...

20217 Commits