llvm-project

Author	SHA1	Message	Date
Graham Hunter	b070629c10	[LV] Increase max VF if vectorized function variants exist (#66639) If there are function calls in the candidate loop and we have vectorized variants available, try some wider VFs in case the conservative initial maximum based on the widest types in the loop won't actually allow us to make use of those function variants.	2023-11-13 10:27:10 +00:00
Philip Reames	3f2ed812f0	[InstCombine] Infer nneg on zext when forming from non-negative sext (#70706) Builds on #67982 which recently introduced the nneg flag on a zext instruction. InstCombine is one of our largest canonicalizers of zext from non-negative sext instructions, so set the flag there.	2023-10-30 12:09:43 -07:00
Igor Kirillov	70904226e1	[LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588) * Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known to be predicate tail-folded, delegating to `areRuntimeChecksProfitable` to decide on the profitability of vectorizing loops with runtime checks. * Update the `areRuntimeChecksProfitable` function to consider the `ScalarEpilogueLowering` setting when assessing vectorization of a loop. With this patch, we can make more informed decisions for loops with low trip counts, especially when leveraging Profile-Guided Optimization (PGO) data.	2023-10-30 13:43:26 +00:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout, which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is to differentiate a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Lou Knauer	852bac4439	[VPlan] Support scalable vectors in outer-loop vectorization This patch enables scalable vectors in the VPlan-native path. If a vectorization factor is specified via loop vectorization hints, that factor is used. If no vectorization factor is specified, but the target prefers scalable vectorization, a scalable vectorization factor is selected. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D157484	2023-10-20 23:17:35 +01:00
Graham Hunter	1abc28fea0	[NFC][LV] Add test for vectorizing fmuladd with another call (#68601) As requested in (#66521) I confirmed a crash with "return" instead of "continue" in setVectorizedCallDecision's fmuladd reduction recognition.	2023-10-20 10:23:31 +01:00
JolantaJensen	afdb18df4d	[NFC][AArch64][LV] Reorganise LV tests using symbols from SLEEF (#68207) The tests introduced by https://reviews.llvm.org/D134719 and later modified in https://reviews.llvm.org/D146839 are not testing LV in isolation. This patch: 1. Ensures that all tests test LV in isolation. 2. Adds LV tests using LLVM intrinsics that have libm mappings. llrint, llround and lrint are not included, as the IR verifier currently does not allow vector types to be used with them.	2023-10-13 12:10:21 +01:00
Rin	df8e0d057d	[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697) This patch is based off of https://github.com/llvm/llvm-project/pull/67543. We are currently using the exact trip count to make decisions regarding the maximum VF. We can instead use the upper bound TC, which will be the same as the constant trip count when that is known.	2023-10-09 16:26:19 +01:00
Dmitriy Smirnov	e13bed4c5f	[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP This patch tries to canonicalise add + gep to gep + gep. Co-authored-by: Paul Walker <paul.walker@arm.com> Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D155688	2023-10-06 12:29:06 +01:00
Rin	d3e4702c0f	[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543) Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.	2023-10-05 10:24:30 +01:00
JolantaJensen	01797dad86	Fix mechanism propagating mangled names for TLI function mappings (#66656) Currently the mappings from TLI are used to generate the list of available "scalar to vector" mappings attached to scalar calls as the "vector-function-abi-variant" LLVM IR attribute. Function names from TLI are wrapped in a mangled name following the pattern: _ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)] The problem is that the mangled name uses _LLVM_ as the ISA name, which prevents the compiler from computing the vectorization factor for scalable vectors, as it cannot make any decision based on the _LLVM_ ISA. If we use "s" as the ISA name, the compiler can make decisions based on the VFABI specification, where SVE-specific rules are described. This patch is only a refactoring stage; there is no change to the compiler's behaviour.	2023-10-02 18:58:39 +01:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduces active-lane-mask as a later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating the introduction of EVL as a similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
Kerry McLaughlin	5d814b3848	Revert "[AArch64][SVE2] Change the cost of extends with S/URHADD to 0" This reverts commit dda2cd2505301aa626fcd3e8dea2a447227d00ca.	2023-08-14 10:44:13 +00:00
Kerry McLaughlin	dda2cd2505	[AArch64][SVE2] Change the cost of extends with S/URHADD to 0 When SVE2 is enabled, we can combine an add of 1, an add and a shift right by 1 into a single s/urhadd instruction. If the operands to the adds are extended, these extends will fold into the s/urhadd and their costs should be 0. Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D157628	2023-08-14 10:32:06 +00:00
Florian Hahn	af635a5547	[VPlan] Model wrap flags directly, remove NUW opcodes (NFC) Model wrap flags directly using VPRecipeWithIRFlags and clean up the duplicated NUW opcodes. D157144 will build on this and also model FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157194	2023-08-08 12:12:30 +01:00
Florian Hahn	93c5bae00e	[VPlan] Use printOperands for VPInstruction. Use the printOperands for printing VPInstruction's operands to be more in line with other recipes and ensure consistent printing after D15719. Also removes some stray spaces in print output.	2023-08-08 11:31:21 +01:00
Jolanta Jensen	3feb63e112	[TLI][AArch64] Add SLEEF mappings to scalable vector functions for fmod and fmodf This patch adds SLEEF mappings to scalable vector functions for fmod and fmodf. Differential Revision: https://reviews.llvm.org/D156920	2023-08-03 14:33:33 +00:00
Florian Hahn	cdb7d5767c	[LV] Add test for select truncation. Add test coverage for truncating selects for D149903.	2023-08-01 18:53:36 +01:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Zhongyunde	497966f7f2	Reland [InstSimplify] Remove the remainder loop if we know the mask is always true We check that the loop trip count is known to be a power of 2 to determine whether the tail loop can be eliminated in D146199. However, the remainder loop of a masked scalable loop can also be removed if we know the mask is always going to be true for every vector iteration. Depends on the power-of-two vscale assume from D155350. Proofs: https://alive2.llvm.org/ce/z/bT62Wa Fix https://github.com/llvm/llvm-project/issues/63616. Reviewed By: goldstein.w.n, nikic, david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D154953	2023-08-01 22:20:22 +08:00
Florian Hahn	4e130420e3	[LV] Add test for op truncation from 245ec675a4e41. Add extra test for that issue in 245ec675a4e41. Also generate full check lines for tests, which should now be deterministic on all platforms.	2023-08-01 13:54:50 +01:00
Nikita Popov	d01aec4c76	[InstCombine] Set dead phi inputs to poison in more cases Set phi inputs to poison whenever we find a dead edge (either during initial worklist population or the main InstCombine run), instead of only doing this for successors of dead blocks. This means that the phi operand is set to poison even for critical edges without an intermediate block. There are quite a few test changes, because the pattern is fairly common in vectorizer output, for cases where we know the vectorized loop will be entered.	2023-08-01 11:53:47 +02:00
Nikita Popov	7c64449e44	[LoopVectorize] Regenerate test checks (NFC) To reduce spurious diffs in future changes.	2023-08-01 11:30:55 +02:00
Nikita Popov	eb9fce092a	Revert "[InstSimplify] Remove the remainder loop if we know the mask is always true" This reverts commit 3e386b227886e2fb77b0c1e9182026c4e049f346. Next to the original fold, this also implements an unnecessary and inappropriate simplifyICmpWithDominatingAssume() based fold.	2023-08-01 09:03:20 +02:00
Zhongyunde	3e386b2278	[InstSimplify] Remove the remainder loop if we know the mask is always true We check that the loop trip count is known to be a power of 2 to determine whether the tail loop can be eliminated in D146199. However, the remainder loop of a masked scalable loop can also be removed if we know the mask is always going to be true for every vector iteration. Depends on the power-of-two vscale assume from D155350. Proofs: https://alive2.llvm.org/ce/z/FkTMoy Fix https://github.com/llvm/llvm-project/issues/63616. Reviewed By: goldstein.w.n, nikic, david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D154953	2023-08-01 11:20:20 +08:00
Florian Hahn	4162f36bcb	[LV] Regenerate check lines for shrinking tests. Make sure the full IR is checked for loop-vectorization-factors.ll so that nothing gets missed, and add missing checks for type-shrinkage-insertelt.ll. Also removes some undef ops from tests.	2023-07-30 16:38:28 +01:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost for integers, especially for small vectors. The end result will be lower cost for float and long-integer types, and higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b c; short d } e; void f() { int g; char h; e i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00
Maciej Gabka	38cdb007a5	Add missing SLEEF mappings to scalable vector functions for log2 and log2f In the original commit adding SLEEF mappings (https://reviews.llvm.org/D146839), mappings for log2/log2f were missing. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D155801	2023-07-21 13:59:13 +00:00
Maciej Gabka	b172fbff68	Revert "[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f" This reverts commit 791c89600aaa288d7066aea95a1e06cd6d61b2e3.	2023-07-21 13:50:10 +00:00
Maciej Gabka	791c89600a	[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f In the original commit adding SLEEF mappings (https://reviews.llvm.org/D146839), mappings for log2/log2f were missing. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D155623	2023-07-21 13:46:03 +00:00
David Green	2e0bf67df1	[LV][AArch64] Fix reductions costs in strict-fadd-cost.ll. NFC These tests were originally added in 0aff1798b5721d5f95d16f465b99d, where they were measuring the cost of fadd and fmuladd reductions, which should be fairly high cost. For some reason, due to the forced vector factors, the debug costs of each instruction are printed twice by the vectorizer. Once as if the instruction is a simple fadd/fmuladd, and later with the correct reduction cost. In d827865e9f778f5b27edb2afe003c2a the costs were updated to match the first print statements, where they would be better to match the second to test the cost of the reduction. This patch returns them to testing the original reduction costs.	2023-07-20 10:34:05 +01:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
zhongyunde	4d2723bd00	[ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo This patch is separated from D154953 to see which tests are affected by this change alone, as suggested in a review comment. Depends on the related LangRef update in D155193. Reviewed By: paulwalker-arm, nikic, david-arm Differential Revision: https://reviews.llvm.org/D155350	2023-07-15 19:42:58 +08:00
Maciej Gabka	5b0e19a7ab	[TLI][AArch64] Add mappings to vectorized functions from ArmPL Arm Performance Libraries contain a math library which provides vectorized versions of common math functions. This patch allows it to be used with clang and LLVM via -fveclib=ArmPL or -vector-library=ArmPL, so loops with such calls can be vectorized. The executable needs to be linked with the amath library. Arm Performance Libraries are available at: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries Reviewed by: paulwalker-arm Differential Revision: https://reviews.llvm.org/D154508	2023-07-12 12:53:18 +00:00
Igor Kirillov	17bde328d6	[LV] Add mask support for vectorizing interleaved groups This patch extends LoopVectorize to handle the vectorization of interleaved memory accesses with scalable vectors when a mask is required and/or predicated tail folding is enabled. Differential Revision: https://reviews.llvm.org/D152258	2023-06-29 17:50:56 +00:00
Michael Platings	54c79fa53c	[test] Replace aarch64--eabi with aarch64 Also replace aarch64_be--eabi with aarch64_be Using "eabi" for aarch64 targets is a common mistake that the Clang driver warns about. We want to avoid it elsewhere as well. Just use the common "aarch64" without other triple components. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D153943	2023-06-29 09:06:00 +01:00
Igor Kirillov	7049393a58	[LV] Precommit masked interleaved access tests Precommit for D152258. Differential Revision: https://reviews.llvm.org/D153443	2023-06-28 09:23:23 +00:00
Fangrui Song	ebbfdca586	[test] Replace aarch64-arm-none-eabi with aarch64 Similar to 02e9441d6ca73314afa1973a234dce1e390da1da, but for llvm/test and one lld/test/ELF test.	2023-06-27 19:36:27 -07:00
Nikita Popov	9cf67f6ea0	[LoopVectorize] Convert most tests to opaque pointers (NFC) The unsized-pointee-crash.ll and zero-sized-pointee-crash.ll tests have been removed, because these issues are not relevant for opaque pointers.	2023-06-12 13:10:22 +02:00
Graham Hunter	95bfb1902d	[LV][AArch64] Allow (limited) interleaving for scalable vectors This patch uses the (de)interleaving intrinsics introduced in D141924 to handle vectorization of interleaving groups with a factor of 2 for scalable vectors. Reviewed By: fhahn, reames Differential Revision: https://reviews.llvm.org/D145163	2023-06-09 11:42:10 +01:00
zhongyunde	df19d87227	[LV] Add option to tune the cost model, NFC For Neon, the default nonconst stride cost is conservative, and it is a local variable, which makes it inconvenient to tune the loop vectorizer. So I use an option, similar to SVEGatherOverhead introduced in D115143. Fix https://github.com/llvm/llvm-project/issues/63082. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D152253	2023-06-07 22:08:29 +08:00
Florian Hahn	8098f2577e	[LV] Use Legal::isUniform to detect uniform pointers. Update collectLoopUniforms to identify uniform pointers using Legal::isUniform. This is more powerful and brings pointer classification here in sync with setCostBasedWideningDecision which uses isUniformMemOp. The existing mismatch in reasoning can cause crashes due to D134460, which is fixed by this patch. Fixes https://github.com/llvm/llvm-project/issues/60831. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D150991	2023-05-30 16:42:55 +01:00
Nikita Popov	d2502eb091	[KnownBits] Add support for nuw/nsw on shifts Implement precise nuw/nsw support in the KnownBits implementation, replacing the rather crude handling in ValueTracking. Differential Revision: https://reviews.llvm.org/D151208	2023-05-25 10:17:10 +02:00
Florian Hahn	299f0ff60e	[VPlan] Print IR flags for VPRecipeWithIRFlags. Now that IR flags are modeled as part of VPRecipeWithIRFlags, include the flags when printing recipes. Depends on D150027. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D150029	2023-05-23 20:36:16 +01:00
Dinar Temirbulatov	1ff828c6c8	[AArch64][LV] Disable maximising bandwidth for streaming compatible sve We noticed some runtime performance improvements by disabling maximising bandwidth for streaming compatible sve. Differential Revision: https://reviews.llvm.org/D150336	2023-05-23 12:58:19 +00:00
David Sherwood	c7dbe326df	[AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1 This patch enables the tail-folding of simple loops by default when targeting the neoverse-v1 CPU. Simple loops exclude those with recurrences or reductions or loops that are reversed. New tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll In terms of SPEC2017 only one benchmark is really affected when building with "-Ofast -mcpu=neoverse-v1 -flto", which is (+ faster, - slower): 525.x264: +7.0% Differential Revision: https://reviews.llvm.org/D130618	2023-05-18 10:35:57 +00:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00

484 Commits
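Commit 3f2ed812f0 tags a zext with `nneg` when InstCombine forms it from a sext of a non-negative value. The reason this is sound: when the sign bit of the source is clear, sign extension and zero extension produce the same bits. The following is a minimal Python model of that fact on bit patterns. The helpers `to_signed`, `sext` and `zext` are hypothetical illustrations, not LLVM APIs.

```python
# Illustrative model of LLVM's sext/zext on fixed-width bit patterns (Python ints).
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext(value: int, src_bits: int, dst_bits: int) -> int:
    """Sign-extend a src_bits-wide pattern to dst_bits by copying the sign bit upward."""
    return to_signed(value, src_bits) & ((1 << dst_bits) - 1)


def zext(value: int, src_bits: int) -> int:
    """Zero-extend: the new high bits are always zero, so only the source bits survive."""
    return value & ((1 << src_bits) - 1)


# When the i8 sign bit is clear (0..127), sext and zext give the same i32 pattern,
# which is the case in which a zext may carry the `nneg` flag.
print(all(sext(v, 8, 32) == zext(v, 8) for v in range(128)))  # True
# With the sign bit set they differ, so the rewrite needs a proof of non-negativity.
print(hex(sext(0x80, 8, 32)), hex(zext(0x80, 8)))  # 0xffffff80 0x80
```

The model does not cover the poison semantics of `nneg`. It only shows why the two extensions are interchangeable under the non-negativity precondition.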
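Commit e13bed4c5f canonicalises `add` + `gep` into `gep` + `gep`. At the level of plain address arithmetic, this is distributivity modulo 2^64: base + (a + b) * size equals (base + a * size) + b * size. Below is a small Python sketch that checks this identity on random 64-bit values. It assumes a single i64 index, 64-bit pointers, and no `inbounds`/wrap flags. Whether such flags can be kept on the new GEPs is a separate question that this sketch does not address. All helper names are hypothetical.

```python
import random

# Illustrative model: pointers and i64 indices as 64-bit patterns; no inbounds/nuw semantics.
MASK64 = (1 << 64) - 1


def add_i64(a: int, b: int) -> int:
    """`add i64 %a, %b` with two's-complement wraparound."""
    return (a + b) & MASK64


def gep(base: int, index: int, elem_size: int) -> int:
    """Address of `getelementptr T, ptr %base, i64 %index` when sizeof(T) == elem_size."""
    return (base + index * elem_size) & MASK64


def gep_of_add(base: int, a: int, b: int, elem_size: int) -> int:
    # Before: %idx = add i64 %a, %b ; %p = getelementptr T, ptr %base, i64 %idx
    return gep(base, add_i64(a, b), elem_size)


def gep_of_gep(base: int, a: int, b: int, elem_size: int) -> int:
    # After:  %p0 = getelementptr T, ptr %base, i64 %a ; %p = getelementptr T, ptr %p0, i64 %b
    return gep(gep(base, a, elem_size), b, elem_size)


rng = random.Random(0)
for _ in range(1000):
    base, a, b = (rng.getrandbits(64) for _ in range(3))
    size = rng.choice([1, 2, 4, 8, 12])
    assert gep_of_add(base, a, b, size) == gep_of_gep(base, a, b, size)
print("add+gep and gep+gep agree")
```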
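Commit 01797dad86 describes the mangled-name pattern `_ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]`. It also explains why the `_LLVM_` ISA token, unlike `s` (SVE), prevents reasoning about scalable VFs. Here is a rough Python parser for just that pattern. It is a sketch, not LLVM's VFABI demangler. It treats `<parameters>` as an opaque token string and does not decode the full VFABI parameter grammar (linear steps, alignment, and so on). `vlen == None` stands for the scalable `x`. The example redirection name `vec_sin` is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class VFABIName(NamedTuple):
    isa: str                       # "_LLVM_" or a single ISA letter such as "s" (SVE)
    masked: bool                   # "M" = masked variant, "N" = unmasked
    vlen: Optional[int]            # None for scalable vectors ("x")
    parameters: str                # opaque parameter tokens, e.g. "v" or "vv"
    scalar_name: str
    vector_redirection: Optional[str]


_VFABI_RE = re.compile(
    r"^_ZGV"
    r"(?P<isa>_LLVM_|[a-z])"
    r"(?P<mask>[MN])"
    r"(?P<vlen>x|\d+)"
    r"(?P<params>[A-Za-z0-9]*)"
    r"_(?P<name>[^(]+)"
    r"(?:\((?P<redirect>[^)]+)\))?$"
)


def parse_vfabi(mangled: str) -> VFABIName:
    """Split a VFABI-style mangled name into its fields (simplified grammar)."""
    m = _VFABI_RE.match(mangled)
    if m is None:
        raise ValueError(f"not a VFABI mangled name: {mangled!r}")
    vlen = None if m["vlen"] == "x" else int(m["vlen"])
    return VFABIName(m["isa"], m["mask"] == "M", vlen, m["params"], m["name"], m["redirect"])


print(parse_vfabi("_ZGVsMxv_sinf"))
print(parse_vfabi("_ZGV_LLVM_N2v_sin(vec_sin)"))
```

With ISA `s` and vlen `x`, the name itself says "scalable SVE variant". With `_LLVM_`, a consumer has no ISA-specific rules to derive the vectorization factor from.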
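Commit 97687b7aea makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical tail-folding mask and introduces active-lane-mask later as a VPlan-to-VPlan transform. This can work because, with BTC = TC - 1, "iv + i <= BTC" and "iv + i < TC" select the same lanes. The Python sketch below models both masks on unbounded integers, so it ignores wrapping, which the real IR widths must handle. The function names are hypothetical.

```python
from typing import List


def ule_btc_mask(iv: int, vf: int, backedge_taken_count: int) -> List[bool]:
    """(ICMP_ULE, wide canonical IV, backedge-taken-count): lane i is active iff iv + i <= BTC."""
    return [iv + lane <= backedge_taken_count for lane in range(vf)]


def active_lane_mask(base: int, vf: int, trip_count: int) -> List[bool]:
    """Model of llvm.get.active.lane.mask(base, n): lane i is active iff base + i < n."""
    return [base + lane < trip_count for lane in range(vf)]


# A tail-folded loop with 10 iterations and VF = 4 (BTC = 9): the last vector
# iteration starts at IV 8 and only its first two lanes are active.
print(ule_btc_mask(8, 4, 9))       # [True, True, False, False]
print(active_lane_mask(8, 4, 10))  # [True, True, False, False]
```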
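Commit dda2cd2505 (reverted in 5d814b3848) treats the extends feeding an SVE2 S/URHADD as free. In IR, the unsigned rounding halving add appears widened: extend, add, add 1, shift right by 1, truncate. A single URHADD produces the same value on the narrow element type, so the extends disappear. The Python check below compares the widened form with the well-known overflow-free average `(a | b) - ((a ^ b) >> 1)` on all u8 pairs. That identity is a standard bit trick used here for illustration, not a description of the hardware. The function names are hypothetical.

```python
def urhadd_widened(a: int, b: int, bits: int = 8) -> int:
    """IR form: extend a and b, add them, add 1, shift right by 1, truncate back to `bits`."""
    return ((a + b + 1) >> 1) & ((1 << bits) - 1)


def urhadd_narrow(a: int, b: int) -> int:
    """Same rounding average, with every intermediate value fitting in the element width."""
    return (a | b) - ((a ^ b) >> 1)


print(urhadd_widened(255, 254), urhadd_narrow(255, 254))  # 255 255
```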
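Commits 497966f7f2 and 3e386b2278 remove the remainder loop when the mask of a scalable loop is always true. They rely on the power-of-two vscale assume from D155350 (commit 4d2723bd00). The underlying arithmetic is as follows. If vscale is a power of two, then VF = vscale * k for a power-of-two k is also a power of two. A power-of-two trip count that is at least VF is then a multiple of VF, so no vector iteration is partial. The sketch below checks these facts numerically. It is not the InstSimplify fold itself, and `mask_always_true` is a hypothetical helper that assumes VF <= trip count.

```python
def is_power_of_two(x: int) -> bool:
    """True iff x has exactly one bit set."""
    return x > 0 and (x & (x - 1)) == 0


def mask_always_true(trip_count: int, vf: int) -> bool:
    """In a tail-folded loop, every vector iteration is full iff vf divides trip_count."""
    return trip_count % vf == 0


print(mask_always_true(16, 8))  # True: power-of-two TC, power-of-two VF <= TC
print(mask_always_true(12, 8))  # False: the second vector iteration is partial
```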
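Commit 17bde328d6 adds masks to interleaved-group vectorization. Semantically, each lane's mask bit has to govern all `factor` members of that lane's group in the wide memory access. The Python sketch below models this on fixed-length lists, with masked-off elements reading as `None` in place of poison. How D152258 actually builds this mask for scalable vectors is not described in the commit message, so this shows only the intended semantics. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def replicate_mask(lane_mask: Sequence[bool], factor: int) -> List[bool]:
    """Widen a per-iteration mask so each of the `factor` group members shares its lane's bit."""
    return [bit for bit in lane_mask for _ in range(factor)]


def masked_interleaved_load(memory: Sequence[str], lane_mask: Sequence[bool],
                            factor: int) -> List[Optional[str]]:
    """Load len(lane_mask) groups of `factor` elements; masked-off elements read as None."""
    wide_mask = replicate_mask(lane_mask, factor)
    return [value if active else None for value, active in zip(memory, wide_mask)]


memory = ["a0", "b0", "a1", "b1", "a2", "b2", "a3", "b3"]
print(masked_interleaved_load(memory, [True, False, True, False], 2))
# ['a0', 'b0', None, None, 'a2', 'b2', None, None]
```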
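Commit 95bfb1902d uses the (de)interleaving intrinsics from D141924 to vectorize factor-2 interleave groups with scalable vectors, for example an array of {re, im} pairs. As a hedged model of the lane semantics only, using fixed-length Python lists in place of scalable vectors: deinterleave2 splits a vector into its even and odd lanes, and interleave2 is its inverse. The Python function names are hypothetical stand-ins, not the intrinsic signatures.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def deinterleave2(vec: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split [a0, b0, a1, b1, ...] into ([a0, a1, ...], [b0, b1, ...]): even and odd lanes."""
    if len(vec) % 2:
        raise ValueError("deinterleave2 expects an even number of lanes")
    return list(vec[0::2]), list(vec[1::2])


def interleave2(evens: Sequence[T], odds: Sequence[T]) -> List[T]:
    """Inverse of deinterleave2: [e0, o0, e1, o1, ...]."""
    if len(evens) != len(odds):
        raise ValueError("interleave2 expects equal-length inputs")
    out: List[T] = []
    for even, odd in zip(evens, odds):
        out.extend((even, odd))
    return out


# A wide load of {re, im} pairs yields one vector of real parts and one of imaginary parts.
re_parts, im_parts = deinterleave2([1.0, 10.0, 2.0, 20.0, 3.0, 30.0])
print(re_parts, im_parts)  # [1.0, 2.0, 3.0] [10.0, 20.0, 30.0]
```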