llvm-project

Author	SHA1	Message	Date
annamthomas	866ac9a165	[LV] Address postcommit review for PR84782 (#84797 ) This testcase was added to show miscompile in https://github.com/llvm/llvm-project/issues/81872	2024-03-11 13:23:00 -04:00
annamthomas	34acdb3ec2	Precommit testcase for pr81872 (#84782 ) Testcase shows miscompile when dropping disjoint flag from disjoint or during vectorization.	2024-03-11 12:16:52 -04:00
Cameron McInally	416debf79b	[test] Move pr73894.ll to AArch64 directory and update the target triple (#84269 ) pr73894.ll is failing on a number of non-AArch64 buildbots. I'm not certain that this is a proper fix, but I think it's best to move the test to the test/Transforms/LoopVectorize/AArch64/ directory and replace the triple with one commonly used in that directory. llvm#73894	2024-03-06 21:25:28 -05:00
Cameron McInally	012d217174	[LV] Use scalar CMP for active-lane-mask with scalar VF (#83902 ) Instead of generating a <1 x i1> active lane mask intrinsic, generate the equivalent scalar ICMP instead. This allows us to avoid unnecessarily extracting the scalar part from the vector mask. Fixes llvm#73894.	2024-03-06 15:59:35 -05:00
Niwin Anto	eaf0d82529	[LV] Disable fold tail by masking when IV is used outside (#81609 ) When induction variables are used outside the loop body, tail folding by masking miscompiles, because for users outside of the loop the final value of the induction is computed separately from the vector loop. Fixes https://github.com/llvm/llvm-project/issues/76069 Fixes https://github.com/llvm/llvm-project/issues/51677	2024-03-04 11:33:30 +00:00
Shih-Po Hung	6ee9c8afbc	[RISCV][CostModel] Updates reduction and shuffle cost (#77342 ) - Make `andi` cost 1 in SK_Broadcast - Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL	2024-02-29 15:41:19 +08:00
Nilanjana Basu	1c211bc76e	[LV] Remove unused configuration option (#82955 ) A recent set of changes (PR #67725) to the loop interleaving algorithm removed the loop trip count threshold for allowing interleaving. Therefore the configuration option interleave-small-loop-scalar-reduction is no longer needed.	2024-02-28 10:17:25 -08:00
Niwin Anto	ce0687e2df	[LV] Add test for tail fold by masking with external IV users. (#82329 ) Test case for https://github.com/llvm/llvm-project/issues/76069	2024-02-28 13:46:00 +00:00
Florian Hahn	15d9d0fa8f	[VPlan] Also print final VPlan directly before codegen/execute. (#82269 ) Some optimizations are applied after UF and VF have been chosen. This patch adds an extra print of the final VPlan just before codegen/execution. In the future, there will be additional transforms that are applied later (interleaving for example). PR: https://github.com/llvm/llvm-project/pull/82269	2024-02-28 13:19:43 +00:00
Florian Hahn	e421c12e47	[VPlan] Remove left-over CHECK-NOT line. This removes a CHECK-NOT: vector.body line from the test which seems to imply the test does not get vectorized, but it does now. This line was left over from when the test was pre-committed, remove it.	2024-02-27 09:38:40 +00:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271 ) At the moment, some VPInstructions create only a single scalar value, but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes, which more accurately use (Part, Lane) to look up scalar values, and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Benjamin Kramer	e7c60915e6	Remove duplicated REQUIRES: asserts	2024-02-23 12:01:30 +01:00
Ramkumar Ramachandra	f5c8e9e531	LoopVectorize/test: guard pr72969 with asserts (#82653 ) Follow up on 695a9d8 (LoopVectorize: add test for crash in #72969) to guard pr72969.ll with REQUIRES: asserts, in order to be reasonably confident that it will crash reliably.	2024-02-22 19:55:18 +00:00
Benjamin Kramer	3168af56bc	LoopVectorize: Mark crash test as requiring assertions	2024-02-22 20:25:58 +01:00
Philip Reames	f67ef1a8d9	[RISCV][LV] Add additional small trip count loop coverage	2024-02-22 08:30:25 -08:00
Philip Reames	9eb5f94f9b	[RISCV][AArch64] Add vscale_range attribute to tests per architecture minimums Spent a bunch of time tracing down an odd issue "in SCEV" which turned out to be the fact that SCEV doesn't have access to TTI. As a result, the only way for it to get range facts on vscales (to avoid collapsing ranges of element counts and type sizes to trivial ranges on multiplies) is to look at the vscale_range attribute. Since vscale_range is set by clang by default, manually setting it in the tests shouldn't interfere with the test intent.	2024-02-22 08:11:24 -08:00
Ramkumar Ramachandra	695a9d84dc	LoopVectorize: add test for crash in #72969 (#74111 )	2024-02-22 16:00:33 +00:00
Florian Hahn	9923d29cfa	[VPlan] Merge main VPlan verifier with HCFG verifier. Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.	2024-02-20 16:43:57 +00:00
Florian Hahn	0dacba3ad1	[VPlan] Handle truncating ICMPs in truncateToMinimalBWs. Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs, the new target type will be the same as the original type. In that case, only truncate the operands, but skip the extend. This is in line with what the original truncateToMinimalBitwidths did for compares. Fixes https://github.com/llvm/llvm-project/issues/81415.	2024-02-16 12:58:56 +00:00
Rohit Aggarwal	36adfec155	Adding support of AMDLIBM vector library (#78560 ) Hi, AMD has its own implementation of vector calls. This patch includes the changes to enable the use of AMD's math library using -fveclib=AMDLIBM. Please refer to https://github.com/amd/aocl-libm-ose --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-02-15 12:13:07 +05:30
David Sherwood	1c10821022	[LoopVectorize] Fix divide-by-zero bug (#80836 ) (#81721 ) When attempting to use the estimated trip count to refine the costs of the runtime memory checks we should also check for sane trip counts to prevent divide-by-zero faults on some platforms. Fixes #80836	2024-02-14 16:07:51 +00:00
Fangrui Song	3d18c8cd26	[test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64 Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633 Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid them elsewhere as well. Just use the common "aarch64" without other triple components.	2024-02-12 18:29:55 -08:00
Nikita Popov	7c0d52ca91	[ValueTracking] Support dominating known bits condition in and/or (#74728 ) This extends computeKnownBits() support for dominating conditions to also handle and/or conditions. We'll look through either and or or depending on which edge we're considering. This change is mainly for the sake of completeness, so we don't start missing optimizations if SimplifyCFG decides to merge some branches.	2024-02-08 09:47:49 +01:00
Philip Reames	1aafe7605b	[test] Regen a test for naming changes	2024-02-06 18:06:24 -08:00
Philip Reames	c5bf1f4b8f	[test] Autogen a test for ease of update in forthcoming patch	2024-02-06 17:59:54 -08:00
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725 ) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Florian Hahn	8cb2de7fae	[VPlan] Implement type inference for ICmp. This fixes a crash in the attached test case due to missing type inference for ICmp VPInstructions.	2024-02-05 15:42:07 +00:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Florian Hahn	47abbf4fe9	[VPlan] Update VPInst::onlyFirstLaneUsed to check users. (#80269 ) A VPInstruction only has its first lane used if all users use its first lane only. Use vputils::onlyFirstLaneUsed to continue checking the recipe's users to handle more cases. Besides allowing additional introduction of scalar steps when interleaving in some cases, this also enables using an Add VPInstruction to model the increment - as a follow up.	2024-02-03 16:19:10 +00:00
Maciej Gabka	0f26441cb8	[TLI][AArch64] Adjust TLI mappings to vector functions taking linear pointers (#80296 ) The masked symbols in SLEEF are incorrectly implemented as calls to non-masked variants, which only works for functions that do not modify memory. For vector variants that modify memory we can only use non-masked symbols for now. The SVE ArmPL mappings need to be removed for now as well.	2024-02-02 08:42:29 +00:00
Florian Hahn	cec24f0d7e	[VPlan] Update stale test after 9536a6286, fix formatting.	2024-01-31 13:45:38 +00:00
Florian Hahn	9536a6286e	[VPlan] Preserve original induction order when creating scalar steps. Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.	2024-01-31 13:31:28 +00:00
Nilanjana Basu	c492eb6b28	[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651 ) Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.	2024-01-29 13:41:15 -08:00
Nilanjana Basu	155f24b11e	[Tests][LV][AArch64] Pre-commit tests for changing loop interleaving count computation for loops that need to run scalar iterations (#79640 ) This patch contains a set of pre-commit tests for changing the loop interleaving count computation in a subsequent patch in order to address loops that need to execute at least a single scalar iteration in the epilogue.	2024-01-29 10:21:23 -08:00
David Sherwood	962fbafecf	[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034 ) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6dcc296792b577ae8f0f674e61a0929b99d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.	2024-01-26 14:43:48 +00:00
Florian Hahn	731c2049a4	[VPlan] Relax IV user assertion after 0ab539f for epilogue vec. After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the epilogue vector loop may be used by a trunc. Relax the corresponding assert. This should fix some build-bot failures, including https://lab.llvm.org/buildbot/#/builders/187/builds/14113 https://lab.llvm.org/buildbot/#/builders/98/builds/32350 https://lab.llvm.org/buildbot/#/builders/239/builds/5473	2024-01-26 13:19:25 +00:00
Graham Hunter	d4c0171423	[LV] Fix handling of interleaving linear args (#78725 ) Currently when interleaving vector calls with linear arguments, the Part is ignored and all vector calls use the initial value from the first lane of the current iteration. Fix this to extract from the correct part of the linear vector.	2024-01-26 11:30:35 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
David Spickett	4a91206359	[llvm][LV] Move new test into X86 subfolder Added in a04f6152914ea21f3068aaba9d8fc21d2e703d3e. Failing on our Arm only bots: https://lab.llvm.org/buildbot/#/builders/245/builds/19684	2024-01-25 17:04:34 +00:00
Florian Hahn	a04f615291	[LV] Check for innermost loop instead of EnableVPlanNativePath in CM. Replace EnableVPlanNativePath checks in the cost-model by assertions that the code is only called for innermost loops. This ensures that the cost model isn't used in the VPlanNativePath, which is only used for outer-loop vectorization. Even with EnableVPlanNativePath, inner loops are processed by the inner loop vectorization path, not the native path, so checking for EnableVPlanNativePath may impact decisions for inner loops and can cause crashes, like in the attached test case.	2024-01-25 12:49:52 +00:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
wanglei	fcff4582f0	[LoongArch] Permit auto-vectorization using LSX/LASX with `auto-vec` feature (#78943 ) With enough codegen complete, we can now correctly report the size of vector registers for LSX/LASX, allowing auto vectorization (the `auto-vec` feature needs to be enabled simultaneously). As described, the `auto-vec` feature is an experimental one. It ensures that automatic vectorization is not enabled by default, because the information provided by the current `TTI` cannot yield additional benefits for automatic vectorization.	2024-01-23 09:06:35 +08:00
Alexandros Lamprineas	530c72b498	[TLI] Add missing ArmPL mappings (#78474 ) Adds TLI mappings for fixed and scalable vector variants of cospi(f), fmax(f), ilogb(f) and ldexp(f).	2024-01-22 17:15:17 +00:00
Jay Foad	7017efa1a1	Fix typo "widended"	2024-01-19 13:50:26 +00:00
Graham Hunter	689da340ed	[NFC][LV] Test precommit for interleaved linear args	2024-01-19 12:59:09 +00:00
Alexandros Lamprineas	92289db82f	[VFABI] Move the Vector ABI demangling utility to LLVMCore. (#77513 ) This fixes #71892 allowing us to check mangled names in the IR verifier.	2024-01-17 09:55:30 +00:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Maciej Gabka	279dfe77da	[TLI][AArch64] Add extra SLEEF mappings and tests (#78140 ) This patch is adding more scalar to vector mappings to the TLI for the SLEEF vector library. The added mappings are for the following functions: acosh, asinh, cbrt, copysign, cospi, erf, erfc, expm1, fdim, fma, fmax, fmin, hypot, ilogb, ldexp, log1p, nextafter, sinpi. It also brings back accidentally removed tests for sincospi.	2024-01-16 14:51:38 +00:00
Mel Chen	b6e8f6604c	[LV] Skipping all debug instructions when native vplan is enabled (#77413 ) The following internal error occurred when using native vplan to vectorize a program with debug info generation. Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed. This patch ignores all debug instructions to fix the error when native vplan is enabled.	2024-01-16 11:08:10 +08:00
Jonas Paulsson	62b7e35f10	[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009 ) This assert does not seem justified given that the LoopVectorizer can form interleave groups containing i128 elements where the number of elements per vector is indeed just one.	2024-01-15 17:31:18 +01:00

2363 Commits