llvm-project

Author	SHA1	Message	Date
Nikita Popov	a34ae06c20	[SLPVectorizer] Convert some tests to opaque pointers (NFC)	2023-01-04 16:34:39 +01:00
Dinar Temirbulatov	55c600819f	[SLP][AArch64] Incorrectly estimated intrinsic as a function call. We incorrectly assume intrinsic as a function call and it prevents us from the opportunity to vectorize. On Aarch64 Cortex-A53 we think that llvm.fmuladd.f64 is a function call which is wrong. Differential Revision: https://reviews.llvm.org/D140392	2023-01-03 19:45:24 +00:00
Alexey Bataev	26fec4e845	[SLP]Fix crash on casting non-instruction extractelement. Need to check if the extractelement operation is an extraction before trying to move it around the buildblocks to avoid crash on cast.	2023-01-03 09:45:57 -08:00
Dinar Temirbulatov	3c205efe8b	[SLP][AArch64] Add fmuladd test coverage	2023-01-03 11:28:18 +00:00
Nikita Popov	580210a0c9	[SLP] Convert some tests to opaque pointers (NFC)	2022-12-23 10:02:57 +01:00
Alexey Bataev	2e972ea056	[SLP]Integrate looking through shuffles logic into ShuffleInstructionBuilder. Added BaseShuffleAnalysis as a base class for ShuffleInstructionBuilder and integrated shuffle logic from shuffles for externally used scalars into this class. This class is used as the main container that implements smart shuffle instruction builder logic. ShuffleInstructionBuilder uses this logic. ShuffleInstructionBuilder is also used in building of the shuffle for the externally used scalars instead of lambdas, which are now part of BaseShuffleAnalysis class. Differential Revision: https://reviews.llvm.org/D140100	2022-12-21 06:12:53 -08:00
Sjoerd Meijer	5c94faba0b	[TTI] [AArch64] getMemoryOpCost for ptr types Opaque ptr types have a size in bits of 0. The legalised type is an i64 or vector of i64s, which do have a size. Because of this difference in size, target hook getMemoryOpCost modelled stores of ptr types as extending/truncating load/stores. Now we just check for opaque ptr types and return the legalised cost. This makes stores of pointers cheaper, and as a result we now SLP vectorise the changed test case. Differential Revision: https://reviews.llvm.org/D140193	2022-12-16 15:38:17 +00:00
Sjoerd Meijer	e909c3d31f	[CostModel][AArch64] Precommit opaque ptr store tests. NFC.	2022-12-16 15:34:12 +00:00
Roman Lebedev	6697140ba1	[NFC] Port all SLPVectorizer tests to `-passes=` syntax	2022-12-07 21:44:09 +03:00
Alexey Bataev	2f8f17c157	[SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs. If the graph is only the buildvector node without main operation, need to inherit insrtpoint from the redution instruction. Otherwise the compiler crashes trying to insert instruction at the entry block.	2022-11-16 07:38:49 -08:00
Alexey Bataev	b505fd559d	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-10 10:59:54 -08:00
Alexey Bataev	ecd0b5a532	Revert "[SLP]Redesign vectorization of the gather nodes." This reverts commit 8ddd1ccdf89317be1c40fa9183e214878a56151e to fix buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839	2022-11-07 08:35:21 -08:00
Alexey Bataev	8ddd1ccdf8	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-07 07:04:38 -08:00
David Green	0e9dfff37e	[SLP][AArch64] Add a test case for SLP phi ordering of scalable vectors. NFC	2022-11-06 12:06:12 +00:00
Alexey Bataev	087dadfd37	[SLP]Generalize cost model. Generalized the cost model estimation. Improved cost model estimation for repeated scalars (no need to count their cost anymore), improved cost model for extractelement instructions. cpu2017 511.povray_r 0.57 520.omnetpp_r -0.98 521.wrf_r -0.01 525.x264_r 3.59 <+ 526.blender_r -0.12 531.deepsjeng_r -0.07 538.imagick_r -1.42 Geometric mean: 0.21 Differential Revision: https://reviews.llvm.org/D115757	2022-10-18 11:55:59 -07:00
Alexey Bataev	62267e8de0	Revert "[SLP]Generalize cost model." This reverts commit f12fb91188b836e1bddb36bacbbdb8e4ab70b9b6 and f5c747bfbe36b8f53e6fe2d85ffcaecba6d7153c to fix detected non-initialized var use.	2022-10-18 11:25:59 -07:00
Alexey Bataev	f12fb91188	[SLP]Generalize cost model. Generalized the cost model estimation. Improved cost model estimation for repeated scalars (no need to count their cost anymore), improved cost model for extractelement instructions. cpu2017 511.povray_r 0.57 520.omnetpp_r -0.98 521.wrf_r -0.01 525.x264_r 3.59 <+ 526.blender_r -0.12 531.deepsjeng_r -0.07 538.imagick_r -1.42 Geometric mean: 0.21 Differential Revision: https://reviews.llvm.org/D115757	2022-10-18 08:49:32 -07:00
Bjorn Pettersson	3be72f4029	[test][SLPVectorizer] Use -passes syntax in RUN lines. NFC	2022-10-13 10:44:38 +02:00
Alexey Bataev	1be3428ea0	[SLP]Fix PR58177: Improve isUndefVector function to avoid extra freeze. Freeze instruction in some cases makes codegen worse, so need to be very careful when emitting it. Instead improve analysis in isUndefVector function to generate mask of unused elements and use it in the analysis. Differential Revision: https://reviews.llvm.org/D135382	2022-10-12 07:32:54 -07:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Arthur Eubanks	e23aee7175	[test] Update some legacy PM tests	2022-09-30 11:31:02 -07:00
Simon Pilgrim	5849fcb635	Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions" Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI." Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds" Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.	2022-09-30 11:22:48 +01:00
Simon Pilgrim	28a3fc39a6	[SLP][AArch64] Add test case for vectorization regression case reported on D134605	2022-09-30 11:07:54 +01:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need either follow the original order of the operands for bool logical ops, or emit freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Florian Hahn	e68594ca20	[SLP] Add FMA test case with missing or partial fast-math flags. Add extra FMA tests with missing or partial fast-math flags.	2022-08-31 16:25:18 +01:00
Florian Hahn	3c5e24a51c	[SLP] Add tests showing over-eager SLP when scalar fma can be used. Add test cases for AArch64 that show over-eager SLP vectorization on AArch64, where keeping the things scalar allows efficient lowering using scalar fmas.	2022-08-29 18:58:56 +01:00
Vasileios Porpodas	f669030373	[TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2. 2xi64 is the legalized type for wide reductions (like 16xi64) and setting the cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable. The few performance measurments that I did on an aarch64 machine confirm that these patterns are actually faster when vectorized. Differential Revision: https://reviews.llvm.org/D130740	2022-08-01 13:03:14 -07:00
Vasileios Porpodas	2d9eae4152	[NFC][TTI][AArch64][SLP] Precommit test for a TTI cost fix of i64 add reductions.	2022-08-01 11:20:12 -07:00
David Green	2de05afc19	[SLP] Peek into loads when hitting the RecursionMaxDepth This patch slightly extends the limit on the RecursionMaxDepth inside the SLP vectorizer. It does it only when it hits a load (or zext/sext of a load), which allows it to peek through in the places where it will be the most valuable, without ballooning out the O(..) by any 2^n factors. Differential Revision: https://reviews.llvm.org/D122148	2022-07-04 14:22:50 +01:00
Alexey Bataev	2faacf61a5	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-24 09:28:01 -07:00
Fangrui Song	1ffd2d99c2	Revert D115462 "[SLP]Improve shuffles cost estimation where possible." This reverts commit cac60940b771a0685d058a5b471c84cea05fdc46. Caused -Os -fsanitize=memory -march=haswell miscompile to pytorch/cpuinfo. See my latest comment (may update) on D115462.	2022-06-22 23:16:31 -07:00
Fangrui Song	a411bc11d6	Revert "[SLP]Fix a crash when insert subvector is out of range." This reverts commit f1ee2738b3d70fea803ac1f3401c2fc9f61e514a. Revert due to the revert of a dependent commit `[SLP]Improve shuffles cost estimation where possible.`	2022-06-22 23:16:25 -07:00
Alexey Bataev	f1ee2738b3	[SLP]Fix a crash when insert subvector is out of range. If the OffsetBeg + InsertVecSz is greater than VecSz, need to estimate the cost as shuffle of 2 vector, not as insert of subvector. Otherwise, the inserted subvector is out of range and compiler may crash. Differential Revision: https://reviews.llvm.org/D128071	2022-06-21 07:16:35 -07:00
Alexey Bataev	cac60940b7	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-03 08:06:22 -07:00
Fangrui Song	df0f30dc36	Revert "[SLP]Improve shuffles cost estimation where possible." This reverts commit 9980c9971892378ea82475e000de8df210a58e69. Caused assertion failures: https://reviews.llvm.org/D115462#3555350	2022-06-03 00:30:34 -07:00
Alexey Bataev	9980c99718	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-02 11:18:14 -07:00
Alexey Bataev	73020b4540	Revert "[SLP]Improve shuffles cost estimation where possible." This reverts commit fd5a6ce9dcb77b7821c95355d73af0b3b2020647 to fix a crash detected by a buildbot https://lab.llvm.org/buildbot/#/builders/179/builds/3805/steps/11/logs/stdio.	2022-06-01 15:44:51 -07:00
Alexey Bataev	fd5a6ce9dc	[SLP]Improve shuffles cost estimation where possible. Improved/fixed cost modeling for shuffles by providing masks, improved cost model for non-identity insertelements. Differential Revision: https://reviews.llvm.org/D115462	2022-06-01 11:01:37 -07:00
Alexey Bataev	120d52b0ef	[SLP]Fix PR55653: emit undefs where required, not poison. Need to handle a corner case correctly, if all elements are Undefs/Poisons, need to emit actual values, not just poisons. Differential Revision: https://reviews.llvm.org/D126298	2022-05-26 08:38:50 -07:00
Alexey Bataev	10f41a2147	[SLP]Fix PR55688: Miscompile due to incorrect nuw/nsw handling. Need to use all ReductionOps when propagating flags for the reduction ops, otherwise transformation is not correct. Plus, need to drop nuw/nsw flags. Differential Revision: https://reviews.llvm.org/D126371	2022-05-25 13:59:06 -07:00
Alexey Bataev	2ac5ebedea	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-23 07:06:45 -07:00
Florian Hahn	aeb19817d6	Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly." This reverts commit fc9c59c355cb255446e571b4515b5e41a76503c4. The patch triggers an assertion when building SPEC on X86. Reduced reproducer shared at D107966. Also reverts follow-up commit 11a09af76d11ad5a9f1f95b561112af17ff81f80.	2022-05-21 21:00:01 +01:00
Alexey Bataev	fc9c59c355	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-20 05:58:09 -07:00
David Green	802e15c576	[SLP] Cluster ordering for loads Given a load without a better order, this patch partially sorts the elements to form clusters of adjacent elements in memory. These clusters can potentially be loaded in fewer loads, meaning less overall shuffling (for example loading v4i8 clusters of a v16i8 as a single f32 loads, as opposed to multiple independent bytes loads and inserts). Differential Revision: https://reviews.llvm.org/D122145	2022-05-07 14:38:11 +01:00
David Green	2db46db54d	[SLP] Add tests for awkward laod orders from SLP. NFC	2022-05-07 10:27:32 +01:00
David Green	6f81903e89	[LV][SLP] Mark fptosi_sat as vectorizable This adds fptosi_sat and fptoui_sat to the list of trivially vectorizable functions, mainly so that the loop vectorizer can vectorize the instruction. Marking them as trivially vectorizable also allows them to be SLP vectorized, and Scalarized. The signature of a fptosi_sat requires two type overrides (@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first operand of the intrinsic as a overloaded (but not scalar) operand. Differential Revision: https://reviews.llvm.org/D124358	2022-05-03 09:32:34 +01:00
Alexey Bataev	7ea03f0b4e	[SLP]Improve reductions analysis and emission, part 1. Currently SLP vectorizer walks through the instructions and selects 3 main classes of values: 1) reduction operations - instructions with same reduction opcode (add, mul, min/max, etc.), which build the reduction, 2) reduced values - instructions with the same opcodes, but different from the reduction opcode, 3) extra arguments - all other values, instructions from the different basic block rather than the root node, instructions with to many/less uses. This scheme is not very efficient. It excludes some instructions and all non-instruction values from the reductions (constants, proficient gathers), to many possibly reduced values are marked as extra arguments. Patch improves this process by introducing a bit extended analysis stage. During this stage, we still try to select 3 classes of the values: 1) reduction operations - same as before, 2) possibly reduced values - all instructions from the current block/non-instructions, which may build a vectorization tree, 3) extra arguments - instructions from the different basic blocks. Additionally, an extra sorting of the possibly reduced values occurs to build the scalar sequences which highly likely will bed vectorized, e.g. loads are grouped by the distance between them, constants are grouped together, cmp instructions are sorted by their compare types and predicates, extractelement instructions are sorted by the vector operand, etc. Also, these groups are reordered by their length so the longest group is the first in the list of the possibly reduced values. The vectorization process tries to emit the reductions for all these groups. These reductions, remaining non-vectorized possible reduced values and extra arguments are then combined into the final expression just like it was before. Differential Revision: https://reviews.llvm.org/D114171	2022-05-02 12:03:58 -07:00
David Green	c7d39fd61a	[LV][SLP] Add tests for vectorizing fptoi_sat intrinsics. NFC	2022-05-02 15:11:44 +01:00
Valery N Dmitriev	88b9e46fb5	[SLP] Steer for the best chance in tryToVectorize() when rooting with binary ops. tryToVectorize() method implements one of searching paths for vectorizable tree roots in SLP vectorizer, specifically for binary and comparison operations. Order of making probes for various scalar pairs was defined by its implementation: the instruction operands, then climb over one operand if the instruction is its sole user and then perform same actions for another operand if previous attempts failed. Problem with this approach is that among these options we can have more than a single vectorizable tree candidate and it is not necessarily the one that encountered first. Trying to build vectorizable tree for each possible combination for just evaluation is expensive. But we already have lookahead heuristics mechanism which we use for finding best pick among operands of commutative instructions. It calculates cumulative score for candidates in two consecutive lanes. This patch introduces use of the heuristics for choosing the best pair among several combinations. We only try one that looks as most promising for vectorization. Additional benefit is that we reduce total number of vectorization trees built for probes because we skip those looking non-profitable early. Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo) Differential Revision: https://reviews.llvm.org/D124309	2022-04-25 12:25:33 -07:00
Vasileios Porpodas	4e971efad4	Recommit "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64" This reverts commit 7052a0ad689b990265ec79bd2b0a7d6e8c131bfe.	2022-04-22 15:44:02 -07:00

1 2 3 4 5 ...

274 Commits