llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	212b2bbcd1	[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510) Based off the existing foldShuffleOfBinops fold Fixes #67803	2024-04-04 11:22:37 +01:00
Simon Pilgrim	d53b8291bf	[VectorCombine][X86] shuffle-of-casts.ll - adjust zext nneg tests to improve costs for testing Improves SSE vs AVX test results for #87510	2024-04-03 22:27:14 +01:00
Simon Pilgrim	b15d27e249	[VectorCombine][X86] Add additional tests for #87510 Add zext nneg tests and check we don't fold casts with different src types	2024-04-03 19:29:15 +01:00
Simon Pilgrim	4d8a3f5b35	[VectorCombine][X86] Add some tests showing failure to fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) Part of #67803	2024-04-03 16:34:52 +01:00
Simon Pilgrim	1d06f41b72	[VectorCombine] foldBitcastShuffle - peek through any residual bitcasts before creating a new bitcast on top (#86119) Encountered while working on #67803, wading through the chains of bitcasts that SSE intrinsics introduce - this patch helps prevent cases where the bitcast chains aren't cleared out and we can't perform further combines until after InstCombine/InstSimplify has run.	2024-04-02 10:58:45 +01:00
Paul Walker	900bea9b1c	[LLVM][test] Convert remaining instances of ConstantExpr based splats to use splat(). This is mostly NFC but some output does change due to consistently inserting into poison rather than undef and using i64 as the index type for inserts.	2024-02-27 13:37:23 +00:00
Luke Lau	b0edc1c452	[Loads] Fix crash in isSafeToLoadUnconditionally with scalable accessed type (#82650) This fixes #82606 by updating isSafeToLoadUnconditionally to handle fixed sized loads from a scalable accessed type.	2024-02-23 01:49:19 +08:00
Simon Pilgrim	769c22f25b	[VectorCombine] Fold reduce(trunc(x)) -> trunc(reduce(x)) iff cost effective (#81852) Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free. If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely. Fixes https://github.com/llvm/llvm-project/issues/81469	2024-02-19 11:32:23 +00:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Michael Maitland	acef83c142	[VectorCombine] Fix crash in scalarizeVPIntrinsic (#72039) When getSplatOp returns nullptr, the intrinsic cannot be scalarized. This patch includes a test case that fixes a crash from trying to scalarize the VPIntrinsic when getSplatOp returns nullptr. This fixes https://github.com/llvm/llvm-project/issues/72034.	2023-11-11 19:54:15 -05:00
Nikita Popov	6a06155c53	[VectorCombine] Discard ScalarizationResults if transform aborted Fixes https://github.com/llvm/llvm-project/issues/69820.	2023-10-31 11:24:30 +01:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is to differentiate `target datalayout = ""` in the IR file from a missing datalayout, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Nabeel Omer	8e31acf8ca	[VectorCombine] Add special handling for truncating shuffles (#70013) When dealing with a truncating shuffle, we can end up in a situation where the type passed to getShuffleCost is the type of the result of the shuffle, and the mask references an element which is out of bounds of the result vector. If dealing with truncating shuffles, pass the type of the input vectors to `getShuffleCost()` in order to avoid an out-of-bounds assertion.	2023-10-24 15:03:43 +01:00
Hans Wennborg	e2fc68c3db	Typos: 'maxium', 'minium'	2023-10-23 10:42:28 +02:00
Luke Lau	c35939b22e	[VectorCombine] Use isSafeToSpeculativelyExecute to guard VP scalarization (#69494) Previously we were just matching against a fixed list of VP intrinsics that we knew couldn't be speculated, but we can reuse the logic in isSafeToSpeculativelyExecuteWithOpcode. This also allows speculation in more cases, e.g. when the divisor is known to be non-zero. Unfortunately we can't reuse the exact same function call for VP intrinsics with functional intrinsics instead of opcodes, because isSafeToSpeculativelyExecute needs an instruction that already exists. So this just copies the logic by peeking into the function attributes of the intrinsic.	2023-10-19 12:45:21 -04:00
Luke Lau	3927b9ab11	[VectorCombine] Add tests for unspeculatable VP binops. NFC The current test cases to guard against speculative execution can actually be safely speculated because the denominator is known to be not 0 or -1, and isSafeToSpeculativelyExecuteWithOpcode will account for this. This adds some more test cases and rejigs some existing ones to use an unknown variable instead.	2023-10-18 14:23:02 -04:00
Alexey Bataev	c2ae16f6a7	[VectorCombine] Fix a crash during long vector analysis. If analysis of a single vector is requested, the original type must be used to avoid a crash.	2023-10-09 14:22:37 -07:00
Simon Pilgrim	a16f6462d7	[TTI] improveShuffleKindFromMask - detect SK_ExtractSubvector patterns from SK_PermuteSingleSrc	2023-10-06 11:59:51 +01:00
Simon Pilgrim	94795a37e8	[VectorCombine] foldBitcastShuf - add support for length changing shuffles Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold. It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask First stage towards addressing Issue #67803	2023-10-06 11:59:51 +01:00
Simon Pilgrim	3bae69ec8c	[VectorCombine][X86] Add additional length changing foldBitcastShuf tests Made these TODO instead of negative	2023-10-06 11:59:50 +01:00
Nikita Popov	3b82397965	[VectorCombine] Check for non-byte-sized element type We should check whether the element type is non-byte-sized, not the vector type. For types like <32 x i1> the whole type is byte-sized, but the individual elements (that we scalarize to) are not. Fixes https://github.com/llvm/llvm-project/issues/67060.	2023-09-28 14:18:30 +02:00
Nikita Popov	95606a58c1	[VectorCombine] Add tests for #67060 (NFC)	2023-09-28 14:18:29 +02:00
Ben Shi	ea0ee55c02	[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445) Enable the transform if a non constant index is guaranteed to be safe via a UREM/AND.	2023-09-26 09:41:53 +08:00
Michael Maitland	81b0c24cb1	[RISCV] Add llvm/test/Transforms/VectorCombine/RISCV/lit.local.cfg This directory was missing a lit.local.cfg which was causing some build bots to fail when #65706 was committed.	2023-09-20 15:50:19 -07:00
Michael Maitland	e0aaa1956d	[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats of the scalar operation (#65706) VP Intrinsics whose vector operands are both splat values may be simplified into the scalar version of the operation and the result is splatted. This issue is the intrinsic dual of #65072.	2023-09-20 18:27:51 -04:00
Ben Shi	068357d9b0	[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443) The transform 'scalarizeLoadExtract' can be applied to scalable vector types if the index is less than the minimum number of elements. The check whether the index is less than the minimum number of elements is located at lines 1175~1180. 'scalarizeLoadExtract' will call 'canScalarizeAccess' and check the returned result to determine whether this transform is safe. At the beginning of the function 'canScalarizeAccess', the index will be checked: 1. If it is less than the number of elements of a fixed vector type. 2. If it is less than the minimum number of elements of a scalable vector type. Otherwise 'canScalarizeAccess' will return unsafe and this transform will be prevented.	2023-09-18 10:49:18 +08:00
Ben Shi	f3fbea2cac	[VectorCombine][test] Supplement tests of the load-extractelement sequence (#65442) The newly added tests are all about scalable vector types.	2023-09-13 18:56:44 +08:00
Ben Shi	ad35d916cd	[VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types The transform 'foldSingleElementStore' can be applied to scalable vector types if the index is less than the minimum number of elements. Reviewed By: dmgreen, nikic Differential Revision: https://reviews.llvm.org/D157676	2023-08-23 17:12:36 +08:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
Ben Shi	ad648c974e	[VectorCombine][NFC][test] Supplement tests of the load-insert-store sequence The newly added tests are all about scalable vector types. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D157098	2023-08-14 14:25:36 +08:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is also overridden for integer vector at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of insert, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost on integer, especially for small vectors. The end result will be lower cost for float and long-integer types, some higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Nuno Lopes	d75fb17963	[VectorCombine] Use poison instead of undef as placeholder [NFC] These vector lanes are never accessed. They are used for shifting a value into the right lane and therefore only 1 value of the whole vector is actually used	2023-07-19 10:29:08 +01:00
Fangrui Song	d39b4ce3ce	[test] Replace aarch64-*-eabi with aarch64 Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid it elsewhere as well. Just use the common "aarch64" without other triple components.	2023-06-27 20:02:52 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Simon Pilgrim	4060042384	[CostModel][X86] Improve i8 and vXi8 MUL costs We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) which diverged from the costs from llvm-mca once we extended beyond legal types. Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates. Helps address some of the regressions identified in D148806	2023-04-20 19:38:51 +01:00
Alexey Bataev	0e1312fbe0	[SLP][X86] Fix the cost of reused gathers/buildvectors and floats insert. There are 2 problems in the cost estimation for buildvector/gather. 1. If the buildvector/gather node is the same as another node, the cost of this node needs to be estimated as 0. 2. The cost of inserting a floating-point register into a non-poison vector is not 0; it should not be considered free. Differential Revision: https://reviews.llvm.org/D148801	2023-04-20 09:34:46 -07:00
Sanjay Patel	af39acda88	[VectorCombine] fix insertion point of shuffles As shown in issue #60649, the new shuffles were being inserted before a phi, and that is invalid. It seems like most test coverage for this fold (foldSelectShuffle) lives in the AArch64 dir, but this doesn't repro there for a base target.	2023-02-10 10:57:11 -05:00
Nikita Popov	c00ffbe02b	[VectorCombine] Convert tests to opaque pointers (NFC)	2022-12-23 10:04:26 +01:00
Matt Devereau	ee4d6c8bf0	[VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for scalable vectors which caused poor scalable Binop codegen. Differential Revision: https://reviews.llvm.org/D138545	2022-11-23 13:17:21 +00:00
Sanjay Patel	b57819e130	[VectorCombine] widen a load with subvector insert This adapts/copies code from the existing fold that allows widening of load scalar+insert. It can help in IR because it removes a shuffle, and the backend can already narrow loads if that is profitable in codegen. We might be able to consolidate more of the logic, but handling this basic pattern should be enough to make a small difference on one of the motivating examples from issue #17113. The final goal of combining loads on those patterns is not solved though. Differential Revision: https://reviews.llvm.org/D137341	2022-11-10 14:11:32 -05:00
Sanjay Patel	6703d2ecf9	[VectorCombine] add test with addrspacecast; NFC D137341	2022-11-08 08:28:56 -05:00
Sanjay Patel	b62c81b836	[VectorCombine] add test with non-canonical shuffle mask; NFC D137341	2022-11-07 12:07:37 -05:00
Matt Devereau	a8c24d57b8	[InstCombine] Remove redundant splats in InstCombineVectorOps Splatting the first vector element of the result of a BinOp, where any of the BinOp's operands are the result of a first vector element splat can be simplified to splatting the first vector element of the result of the BinOp Differential Revision: https://reviews.llvm.org/D135876	2022-11-07 15:39:05 +00:00
Peter Waller	e1790c8c29	Revert "[InstCombine] Remove redundant splats in InstCombineVectorOps" This reverts commit 957eed0b1af2cb88edafe1ff2643a38165c67a40.	2022-11-03 07:56:03 +00:00
Sanjay Patel	b7d7e96006	[VectorCombine] add tests for load+shuffle and update to typeless ptr; NFC	2022-11-02 16:45:46 -04:00
Matt Devereau	957eed0b1a	[InstCombine] Remove redundant splats in InstCombineVectorOps Splatting the first vector element of the result of a BinOp, where any of the BinOp's operands are the result of a first vector element splat can be simplified to splatting the first vector element of the result of the BinOp Differential Revision: https://reviews.llvm.org/D135876	2022-11-02 11:57:05 +00:00

194 Commits