1664 Commits

Author SHA1 Message Date
Alexey Bataev
2e0031b371 [SLP] Remove unused initial value from the variable, NFC.
llvm-svn: 295826
2017-02-22 12:57:58 +00:00
Alexey Bataev
f96465b9b8 [SLP] nullptr'ize initial value in findBuildAggregate(), NFC.
Initial value of V is set to nullptr, as it is not used.

llvm-svn: 295642
2017-02-20 08:04:11 +00:00
Alexey Bataev
2f6b124e01 [SLP] Rework findBuildAggregate() from recursive form to iterative, NFC.
Reviewers: mkuper

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30103

llvm-svn: 295641
2017-02-20 07:49:39 +00:00
Matthew Simpson
f68e183f91 [LV] Remove constant restriction for vector phi creation
We previously created a vector phi node for an induction variable only if its
step was a constant integer. However, the step actually only needs to be
loop-invariant. We only handle inductions having loop-invariant steps, so this
patch should enable vector phi node creation for all integer induction
variables that will be vectorized.
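For illustration only (a hypothetical sketch, not part of the original commit), a minimal C loop with an induction whose step is loop-invariant but not a constant integer:
```
void f(int *a, int n, int step) {
  /* 'j' is an integer induction variable; its step 'step' is loop-invariant
     but not a compile-time constant, so it previously did not get a vector
     phi node. */
  for (int i = 0, j = 0; i < n; ++i, j += step)
    a[i] = j;
}
```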

Differential Revision: https://reviews.llvm.org/D29956

llvm-svn: 295456
2017-02-17 16:09:07 +00:00
Michael Kuperstein
569162fefe [LV] Rename Induction to PrimaryInduction. NFC.
llvm-svn: 295111
2017-02-14 22:14:01 +00:00
Matthew Simpson
f09d13e5cc Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.

Differential Revision: https://reviews.llvm.org/D29847

llvm-svn: 295063
2017-02-14 16:28:32 +00:00
Alexey Bataev
2a2f35d59c [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put
back into a vector

Previously, the cost of existing ExtractElement/ExtractValue
instructions was counted as dead only if they were found to have a
single use. But these instructions may also be considered dead if their
users are themselves going to be vectorized, like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because even though `%x0` and `%x1` each have two users, these users are
part of the vectorized tree and we can consider these `extractelement`
instructions as dead.

Differential Revision: https://reviews.llvm.org/D29900

llvm-svn: 295056
2017-02-14 15:20:48 +00:00
Karl-Johan Karlsson
ec21b769ec Revert "[LoopVectorize] Added address space check when analysing interleaved accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.

llvm-svn: 295042
2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson
2ec409cca2 [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces from being part of
the same load/store group when analysing interleaved accesses.

This fixes PR31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

llvm-svn: 295038
2017-02-14 08:14:06 +00:00
Matthew Simpson
659f92e2aa Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.

llvm-svn: 294973
2017-02-13 18:02:35 +00:00
Matthew Simpson
7b7f40297f [LV] Extend trunc optimization to all IVs with constant integer steps
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.
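For illustration only (a hypothetical sketch, not from the commit), a loop where the truncated value comes from a non-primary induction variable with a constant step:
```
void f(short *a, int n) {
  /* 'j' is a secondary induction variable with constant step 2; its
     truncation to 'short' can now be folded into the vectorized induction
     rather than emitted as a separate vector truncate. */
  for (int i = 0, j = 0; i < n; ++i, j += 2)
    a[i] = (short)j;
}
```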

Differential Revision: https://reviews.llvm.org/D29847

llvm-svn: 294967
2017-02-13 16:48:00 +00:00
Alexey Bataev
e8b1536e21 [SLP] Fix for PR31690: Allow using of extra values in horizontal
reductions.

Currently, LLVM supports vectorization of horizontal reduction
instructions only when the initial value is set to 0. This patch supports
vectorization of reductions with non-zero initial values. It also supports
vectorization of reductions with some extra arguments, like:
```
float f(float x[], int a, int b) {
  float p = a % b;
  p += x[0] + 3;
  for (int i = 1; i < 32; i++)
    p += x[i];
  return p;
}
```
The patch allows vectorization of this kind of horizontal reduction.

Differential Revision: https://reviews.llvm.org/D29727

llvm-svn: 294934
2017-02-13 08:01:26 +00:00
Dehao Chen
fb02f7140a Encode duplication factor from loop vectorization and loop unrolling to discriminator.
Summary:
This patch starts the implementation as discussed in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html

When an optimization duplicates code in a way that scales down the execution count of a basic block, we record the duplication factor as part of the discriminator so that the offline processing tool can find the duplication factor and recover the accurate execution frequency of the corresponding source code. Two important optimizations that fall into this category are loop vectorization and loop unrolling. This patch records the duplication factor for these two optimizations.
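For example (an illustration added here, not from the original patch): if a loop body is vectorized with VF=4 and unrolled by 2, each execution of the resulting block covers 8 original iterations, so a duplication factor of 8 is encoded and the offline tool multiplies the sampled count by 8 to recover the source-level frequency.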

The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.

Reviewers: probinson, aprantl, davidxl, hfinkel, echristo

Reviewed By: hfinkel

Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26420

llvm-svn: 294782
2017-02-10 21:09:07 +00:00
Matthew Simpson
df124a7569 [LV] Remove type restriction for vector phi creation
We previously only created a vector phi node for an induction variable if its
type matched the type of the canonical induction variable.

Differential Revision: https://reviews.llvm.org/D29776

llvm-svn: 294755
2017-02-10 16:15:26 +00:00
Elena Demikhovsky
5267edd3e3 [Loop Vectorizer] Cost-based decision for vectorization form of memory instruction.
Make the cost model select between the Interleave, GatherScatter, or Scalar vectorization form of a memory instruction.
The right decision should be made for non-consecutive memory access instructions that may have more than one vectorization solution.

This patch includes the following changes:
- The cost model calculates the cost of the Load/Store vector forms and chooses the best option among Widening, Interleave, GatherScatter, and Scalarization. The cost model keeps the widening decision.
- The arrays of Uniform and Scalar values are moved from Legality to the cost model.
- The cost model collects Uniforms and Scalars per VF. The collection is based on the CM decision map of the Loads/Stores vectorization form.
- Vectorization of a memory instruction is performed according to the CM decision.

Differential Revision: https://reviews.llvm.org/D27919

llvm-svn: 294503
2017-02-08 19:25:23 +00:00
Michael Kuperstein
7a86bb2589 [SLP] Revert "Allow using of extra values in horizontal reductions."
This breaks when one of the extra values is also a scalar that
participates in the same vectorization tree which we'll end up
reducing.

llvm-svn: 294245
2017-02-06 21:50:59 +00:00
Michael Kuperstein
2a735b71b6 [SLP] Make sortMemAccesses explicitly return an error. NFC.
llvm-svn: 294029
2017-02-03 19:32:50 +00:00
Alexey Bataev
a16cfe6fa9 [SLP] Fix for PR31690: Allow using of extra values in horizontal reductions.
Currently, LLVM supports vectorization of horizontal reduction
instructions only when the initial value is set to 0. This patch supports
vectorization of reductions with non-zero initial values. It also supports
vectorization of reductions with some extra arguments, like:

float f(float x[], int a, int b) {
  float p = a % b;
  p += x[0] + 3;
  for (int i = 1; i < 32; i++)
    p += x[i];
  return p;
}

The patch allows vectorization of this kind of horizontal reduction.

Differential Revision: https://reviews.llvm.org/D28961

llvm-svn: 293994
2017-02-03 08:08:50 +00:00
Adam Nemet
0bf1b863b9 [LV] Also port failure remarks to new OptimizationRemarkEmitter API
llvm-svn: 293866
2017-02-02 05:41:51 +00:00
Matthew Simpson
ba5cf9dfee [LV] Move interleaved access helper functions to VectorUtils (NFC)
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.

Differential Revision: https://reviews.llvm.org/D29398

llvm-svn: 293788
2017-02-01 17:45:46 +00:00
Jonas Paulsson
3f71d6a38e [LoopVectorize] Improve getVectorCallCost() getScalarizationOverhead() call.
By calling getScalarizationOverhead with the CallInst instead of the types of
its arguments, we make sure that only unique call arguments are added to the
scalarization cost.
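For illustration only (a hypothetical sketch, not from the commit), a call with a repeated argument:
```
#include <math.h>

double g(double x) {
  /* 'x' feeds both call arguments; counting unique call arguments means the
     extract cost for 'x' is added once, not twice, when this call is
     scalarized. */
  return pow(x, x);
}
```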

getScalarizationOverhead() is extended to handle calls by only passing on the
actual call arguments (which are not all of the operands).

This also eliminates a wrapper function with the same name.

review: Hal Finkel
llvm-svn: 293459
2017-01-30 05:38:05 +00:00
Mohammad Shahid
3121334d32 [SLP] Vectorize loads of consecutive memory accesses that are accessed in a non-consecutive (jumbled) way.
The jumbled scalar loads are sorted while building the tree, and these accesses are marked so that a shufflevector with the proper mask is generated after the vectorized load.
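For illustration only (a hypothetical sketch, not from the patch), a jumbled but consecutive access pattern:
```
void f(const int *a, int *b) {
  /* The four loads cover a[0..3] consecutively but are used in a jumbled
     order; SLP can now sort them, emit one vector load, and insert a
     shufflevector with the matching mask. */
  b[0] = a[1];
  b[1] = a[0];
  b[2] = a[3];
  b[3] = a[2];
}
```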

Reviewers: hfinkel, mssimpso, mkuper

Differential Revision: https://reviews.llvm.org/D26905

Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
llvm-svn: 293386
2017-01-28 17:59:44 +00:00
Alexey Bataev
4015bf8372 [SLP] Refactoring of horizontal reduction analysis, NFC.
Some checks in the SLP horizontal reduction analysis function are performed
several times, though it is enough to perform them only once, during the
initial attempt at adding a candidate for the reduction instruction/reduced
value.

Differential Revision: https://reviews.llvm.org/D29175

llvm-svn: 293274
2017-01-27 10:54:04 +00:00
Chandler Carruth
6f4ed077d0 [LV] Fix an issue where forming LCSSA in the place that we did would
change the set of uniform instructions in the loop causing an assert
failure.

The problem is that the legalization checking also builds data
structures mapping various facts about the loop body. The immediate
cause was the set of uniform instructions. If these then change when
LCSSA is formed, the data structures would already have been built and
become stale. The included test case triggered an assert in loop
vectorize that was reduced out of the new PM's pipeline.

The solution is to form LCSSA early enough that no information is cached
across the changes made. The only really obvious position is outside of
the main logic to vectorize the loop. This also has the advantage of
removing one case where forming LCSSA could mutate the loop but we
wouldn't track that as a "Changed" state.

If it is significantly advantageous to do some legalization checking
prior to this, we can do a more careful positioning but it seemed best
to just back off to a safe position first.

llvm-svn: 293168
2017-01-26 10:41:09 +00:00
Jonas Paulsson
8e2f948ef0 [TargetTransformInfo] Refactor and improve getScalarizationOverhead()
Refactoring to remove duplications of this method.

A new method, getOperandsScalarizationOverhead(), looks at the unique operands
present and adds extract costs for them. The old behaviour was to always add
extract costs for one operand of the type, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.

This is a good start at improving this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().

Review: Hal Finkel
https://reviews.llvm.org/D29017

llvm-svn: 293155
2017-01-26 07:03:25 +00:00
Alexey Bataev
d28ab559a7 [SLP] Improve horizontal vectorization for non-power-of-2 number of
instructions.

If the number of instructions in the horizontal reduction list is not a power
of 2, then only the last PowerOf2Floor(NumberOfInstructions) elements are
actually vectorized; the other instructions remain scalar. The patch tries to
vectorize the remaining elements as well.
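For illustration only (a hypothetical sketch), a reduction list of six elements, which is not a power of two:
```
float f(const float *x) {
  /* PowerOf2Floor(6) = 4, so previously only four of these additions were
     vectorized and the remaining two stayed scalar; the patch tries to
     vectorize the leftover elements as well. */
  return x[0] + x[1] + x[2] + x[3] + x[4] + x[5];
}
```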

Differential Revision: https://reviews.llvm.org/D28959

llvm-svn: 293042
2017-01-25 09:54:38 +00:00
Alexey Bataev
9f8bb384af [SLP] Refactoring of HorizontalReduction class, NFC.
Removed the data members ReduxWidth and MinVecRegSize, plus some C++11 style
improvements.

Differential Revision: https://reviews.llvm.org/D29010

llvm-svn: 292899
2017-01-24 08:57:17 +00:00
Michael Kuperstein
807982359d [SLP] Make ReductionOpcode have the right (enum) type. NFC.
llvm-svn: 292703
2017-01-21 02:03:03 +00:00
Michael Kuperstein
f8458593cf [SLP] Delete useless helper. NFC.
The helper contained a branch for a special case that is unnecessary,
and a cast.

llvm-svn: 292698
2017-01-21 01:33:25 +00:00
Mikael Holmen
8bf15614fb Test commit access, remove trailing whitespace
llvm-svn: 292482
2017-01-19 13:35:13 +00:00
Michael Kuperstein
230867e583 [LV] Run loop-simplify and LCSSA explicitly instead of "requiring" them
This changes the vectorizer to explicitly use the loopsimplify and lcssa utils,
instead of "requiring" the transformations as if they were analyses.

This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA
for all loops, but rather only for the loops we expect to modify.

Differential Revision: https://reviews.llvm.org/D28868

llvm-svn: 292456
2017-01-19 00:42:28 +00:00
Michael Kuperstein
7cefb409b0 [LV] Allow reductions that have several uses outside the loop
We currently check whether a reduction has a single outside user. We don't
really need to require that - we just need to make sure a single value is
used externally. The number of external users of that value shouldn't actually
matter.
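For illustration only (a hypothetical sketch, not from the review), a reduction whose single final value has two users outside the loop:
```
int f(const int *a, int n, int *out) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];        /* sum reduction */
  *out = s;           /* first user of the final value outside the loop */
  return s;           /* second user of the same value outside the loop */
}
```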

Differential Revision: https://reviews.llvm.org/D28830

llvm-svn: 292424
2017-01-18 19:02:52 +00:00
Matthew Simpson
3fbdaa5906 [LV] Mark non-consecutive-like pointers non-uniform
If a memory instruction will be vectorized, but its pointer operand is
non-consecutive-like, the instruction is a gather or scatter operation. Its
pointer operand will be non-uniform. This should fix PR31671.

Reference: https://llvm.org/bugs/show_bug.cgi?id=31671
Differential Revision: https://reviews.llvm.org/D28819

llvm-svn: 292254
2017-01-17 20:51:39 +00:00
Chandler Carruth
ca68a3ec47 [PM] Introduce an analysis set used to preserve all analyses over
a function's CFG when that CFG is unchanged.

This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.

I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.

Differential Revision: https://reviews.llvm.org/D28627

llvm-svn: 292054
2017-01-15 06:32:49 +00:00
Eric Fiselier
0a9eb89cf9 Give comparator const call operator
llvm-svn: 292043
2017-01-15 02:06:44 +00:00
Malcolm Parsons
17d266bc96 Remove unused lambda captures. NFC
llvm-svn: 291916
2017-01-13 17:12:16 +00:00
Michael Kuperstein
f69e64662b [SLP] Remove bogus assert.
The removed assert seems bogus - it's perfectly legal for the roots of the
vectorized subtrees to be equal even if the original scalar values aren't,
if the original scalars happen to be equivalent.

This fixes PR31599.

Differential Revision: https://reviews.llvm.org/D28539

llvm-svn: 291692
2017-01-11 19:23:57 +00:00
Mohammed Agabaria
2c96c43388 [X86] Update TTI costs for arithmetic instructions on the X86/SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

A special optimization case replaces pmulld with a pmullw/pmulhw/pshuf sequence
when the real operand bit width is <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Chandler Carruth
410eaeb064 [PM] Rewrite the loop pass manager to use a worklist and augmented run
arguments much like the CGSCC pass manager.

This is a major redesign following the pattern established for the CGSCC layer to
support updates to the set of loops during the traversal of the loop nest and
to support invalidation of analyses.

An additional significant burden in the loop PM is that so many passes require
access to a large number of function analyses. Manually ensuring these are
cached, available, and preserved has been a long-standing burden in LLVM even
with the help of the automatic scheduling in the old pass manager. And it made
the new pass manager extremely unwieldy. With this design, we can package the
common analyses up while in a function pass and make them immediately available
to all the loop passes. While in some cases this is unnecessary, I think the
simplicity afforded is worth it.

This does not (yet) address loop simplified form or LCSSA form, but those are
the next things on my radar and I have a clear plan for them.

While the patch is very large, most of it is either mechanically updating loop
passes to the new API or the new testing for the loop PM. The code for it is
reasonably compact.

I have not yet updated all of the loop passes to correctly leverage the update
mechanisms demonstrated in the unittests. I'll do that in follow-up patches
along with improved FileCheck tests for those passes that ensure things work in
more realistic scenarios. In many cases, there isn't much we can do with these
until the loop simplified form and LCSSA form are in place.

Differential Revision: https://reviews.llvm.org/D28292

llvm-svn: 291651
2017-01-11 06:23:21 +00:00
Matthew Simpson
cf796478e9 [LV] Fix-up external IV users after updating dominator tree
This patch delays the fix-up step for external induction variable users until
after the dominator tree has been properly updated. This should fix PR30742.
The SCEVExpander in InductionDescriptor::transform can generate code in the
wrong location if the dominator tree is not up-to-date. We should work towards
keeping the dominator tree up-to-date throughout the transformation.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30742
Differential Revision: https://reviews.llvm.org/D28168

llvm-svn: 291462
2017-01-09 19:05:29 +00:00
Jonas Paulsson
cf7543c44b Remove unused method in LoopVectorize.cpp.
computeInterleaveCount() is not defined/used and is therefore removed.

Review: Davide Italiano
llvm-svn: 291423
2017-01-09 06:13:21 +00:00
Mohammed Agabaria
23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and needs some extra cost for address computation handling.
This heuristic is target dependent and may not be right for all targets.
The decision whether the given stride is complex or not is now passed to the target by sending the stride information via SCEV to getAddressComputationCost instead of an 'IsComplex' flag.

Specifically, on X86 targets we don't see any significant address computation cost for strided accesses in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Michael Kuperstein
fb7dd86fd6 [LV] Sink tripcount query to where it's actually used. NFC.
llvm-svn: 290142
2016-12-19 22:47:52 +00:00
Daniel Jasper
aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Matthew Simpson
a4964f291a Reapply "[LV] Enable vectorization of loops with conditional stores by default"
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975
2016-12-16 19:12:02 +00:00
Matthew Simpson
099af810de [LV] Don't attempt to type-shrink scalarized instructions
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
2016-12-16 16:52:35 +00:00
Chandler Carruth
48b4e614d8 Revert r289863: [LV] Enable vectorization of loops with conditional
stores by default

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934
2016-12-16 11:31:39 +00:00
Matthew Simpson
6a98bcfe33 [LV] Enable vectorization of loops with conditional stores by default
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863
2016-12-15 20:11:05 +00:00
Hal Finkel
3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Michael Kuperstein
3d23d4a234 [LV] Don't vectorize when we have a small static bound on trip count
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.
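For illustration only (a hypothetical sketch, not from the patch), a loop whose exact trip count is unknown but whose maximum trip count is small:
```
void f(int *a, unsigned n) {
  /* The exact trip count depends on 'n', but it can never exceed 3, so the
     loop should be treated as "tiny" and left scalar. */
  for (unsigned i = 0; i < (n & 3u); ++i)
    a[i] += 1;
}
```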

Differential Revision: https://reviews.llvm.org/D27690

llvm-svn: 289583
2016-12-13 20:38:18 +00:00