llvm-project

Author	SHA1	Message	Date
Florian Hahn	3ec6f805c5	[VPlan] Don't created GEP x, 0 for interleave group pointers. The GEP with offet 0 is redundant, remove it. This addresses a TODO from 7f74651837b ((#106431).	2024-10-08 12:08:13 +01:00
Florian Hahn	ea83e1c05a	[LV] Assign cost to all interleave members when not interleaving. At the moment, the full cost of all interleave group members is assigned to the instruction at the group's insert position, even if the decision was to not form an interleave group. This can lead to inaccurate cost estimates, e.g. if the instruction at the insert position is dead. If the decision is to not vectorize but scalarize or scather/gather, then the cost will be to total cost for all members. In those cases, assign individual the cost per member, to more closely reflect to choice per instruction. This fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/108098.	2024-09-11 21:04:34 +01:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770 ) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reasoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Nikita Popov	a105877646	[InstCombine] Remove some of the complexity-based canonicalization (#91185 ) The idea behind this canonicalization is that it allows us to handle less patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.	2024-08-21 12:02:54 +02:00
David Green	dcd246cbde	[ARM] Add scalar add_sat costs. (#100988 ) These can usually generate: - qadd / qsub for signed i32 scalars - uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned i8/i16 - Are expanded to an add + cmp + sel otherwise This can lead to differences in unrolling etc, but should be a better cost for the instructions.	2024-08-05 18:56:04 +01:00
Florian Hahn	c8c0b18b5d	[LV] Update tests to not have dead interleave groups. Update existing tests with dead interleave groups by adding users. This ensures the tests keep testing what they were intended to test with a planned change to skip unused instructions in cost computations.	2024-07-21 14:03:40 +01:00
Florian Hahn	99d6c6d936	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651 ) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651	2024-07-05 10:08:42 +01:00
Ramkumar Ramachandra	0f111ba790	LoopInfo: introduce Loop::getLocStr; unify debug output (#93051 ) Introduce a Loop::getLocStr stolen from LoopVectorize's static function getDebugLocString in order to have uniform debug output headers across LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation for this change is to have UpdateTestChecks recognize the headers and automatically generate CHECK lines for debug output, with minimal special-casing.	2024-06-25 13:12:15 +01:00
Florian Hahn	3808ba78de	[VPlan] Model middle block via VPIRBasicBlock. (#95816 ) Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle block is only created after skeleton creation. Initially a regular VPBasicBlock is created, which will later be replaced by a VPIRBasicBlock once the middle IR basic block has been created. Note that this slightly changes the order of instructions created in the middle block; code generated by recipe execution in the middle block will now be inserted before the terminator (and in between the compare to used by the terminator). The original order will be restored in https://github.com/llvm/llvm-project/pull/92651. PR: https://github.com/llvm/llvm-project/pull/95816	2024-06-20 13:42:20 +01:00
David Green	46541a3636	[ARM] Add a extra MVE low-trip-count loop. NFC This makes use of half floats, which makes the masked stores expensive.	2024-05-23 21:50:47 +01:00
David Green	d486a4c29a	[ARM] Ensure extra uses are not dead in tail-folding-counting-down.ll. NFC This might help keep the test valid if vplan is removing dead intructions.	2024-04-29 15:47:24 +01:00
Michele Scandale	09eb9f1136	[InstCombine] Fix for folding `select` into floating point binary operators. (#83200 ) Folding a `select` into a floating point binary operators can only be done if the result is preserved for both case. In particular, if the other operand of the `select` can be a NaN, then the transformation won't preserve the result value.	2024-03-19 09:47:07 -07:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Nikita Popov	d77067d08a	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662 ) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about ~0.2%. It also improves stage2-O0-g builds by about ~0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242.	2023-12-06 14:17:18 +01:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV looking recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912 ) The disjoint flag was recently added to IR in #72583 We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Nikita Popov	5918f62301	[InstCombine] Infer zext nneg flag (#71534 ) Use KnownBits to infer the nneg flag on zext instructions. Currently we only set nneg when converting sext -> zext, but don't set it when we have a zext in the first place. If we want to use it in optimizations, we should make sure the flag inference is consistent.	2023-11-08 09:34:40 +01:00
David Sherwood	07f0e75b53	[LoopVectorize] Fix bug with code to hoist runtime checks (#70937 ) There was a silly mistake in the expandBounds function that was using the wrong type when calling expandCodeFor and always assuming the stride is 64 bits. I've added the following test to defend this fix: Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll	2023-11-02 10:02:50 +00:00
Igor Kirillov	b84977bcc1	Rename test to avoid overlapping with debug output	2023-10-19 12:21:31 +00:00
Florian Hahn	f108c6cdc1	[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform. Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A. Among other things, this enables additional simplifications after applying versioned strides, as follow up to D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159200	2023-09-18 21:45:34 +01:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Nikita Popov	d01aec4c76	[InstCombine] Set dead phi inputs to poison in more cases Set phi inputs to poison whenever we find a dead edge (either during initial worklist population or the main InstCombine run), instead of only doing this for successors of dead blocks. This means that the phi operand is set to poison even if for critical edges without an intermediate block. There are quite a few test changes, because the pattern is fairly common in vectorizer output, for cases where we know the vectorized loop will be entered.	2023-08-01 11:53:47 +02:00
Nikita Popov	7c64449e44	[LoopVectorize] Regenerate test checks (NFC) To reduce spurious diffs in future changes.	2023-08-01 11:30:55 +02:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b c; short d } e; void f() { int g; char h; e i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00
Nikita Popov	bb3763e497	Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values" This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288. https://reviews.llvm.org/D153966#4464594 reports an optimization regression in Rust. Additionally this change has caused an unexpected 0.3% compile-time regression.	2023-06-30 21:24:05 +02:00
Nikita Popov	20f0c68fd8	[SimplifyCFG] Allow dropping block that only contains ephemeral values Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if the block is empty except for ephemeral values. The ephemeral values will be dropped in that case. This makes sure that assumes don't block this transforms, as reported in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609. Differential Revision: https://reviews.llvm.org/D153966	2023-06-30 15:24:01 +02:00
Florian Hahn	d209084720	[VPlan] Replace versioned stride with constant during VPlan opts. After constructing the initial VPlan, replace VPValues for versioned strides with their constant counterparts. Differential Revision: https://reviews.llvm.org/D147783	2023-06-13 08:26:55 +01:00
Nikita Popov	9cf67f6ea0	[LoopVectorize] Convert most tests to opaque pointers (NFC) The unsized-pointee-crash.ll and zero-sized-pointee-crash.ll tests have been removed, because these issues are not relevant for opaque pointers.	2023-06-12 13:10:22 +02:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Nikita Popov	2ec1d0f427	[InstCombine] Don't reassociate GEPs for loop invariance Since D146813, LICM will reassociate GEPs to expose hoisting opportunities itself. Don't perform this transform in InstCombine, where it is fragile because it depends on an optional LoopInfo analysis.	2023-04-18 12:17:07 +02:00
Nikita Popov	39ca70392b	[LV] Regenerate test checks (NFC)	2023-04-18 11:52:36 +02:00
Philip Reames	78ae870f11	{tests] Rerun autogen to reduce a diff [nfc]	2023-03-31 12:47:08 -07:00
David Green	3f8160e9ce	[LV][ARM][AArch64] Add multi-exit Loop Vectorizer tests. NFC These are useful to test with the various predication schemes available on different targets.	2023-03-27 10:21:24 +01:00
Sanjay Patel	ef6f23535d	Revert "[InstCombine] use loop info when running the pass after loop vectorization" This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51. This was intended to be practically NFC in terms of the overall opt pipeline, but there is experimental data showing that code changes occurred here: https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text	2023-03-11 17:28:56 -05:00
Sanjay Patel	43ae4b62b2	[InstCombine] use loop info when running the pass after loop vectorization This is the follow-up to D144199 and suggestion from D144045. We make use of loop info explicit via InstCombine pass parameter rather than semi-arbitrary via caching. The only InstCombine transform that uses LoopInfo currently is a GEP fold in visitGEPOfGEP(), so that shows up as a failure in the dedicated test for the fold as well as several LoopVectorizer tests that run extra passes. I don't see any pass manager regression tests that actually check for pass options, but this is intended to be NFC for the pass pipeline behavior - we only try to use loop info where it would have been used before via caching . Differential Revision: https://reviews.llvm.org/D144274	2023-03-11 14:20:30 -05:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. Instcombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the prefered form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unncessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
Nikita Popov	2fab927546	[LoopVectorize] Convert some tests to opaque pointers (NFC) Check lines for some of these tests were regenerated. The difference is that with opaque pointers SCEVExpander always emits i8 GEPs, making the address calculation explicit. This is a known problem that will be solved long term by making all address calculations explicit.	2023-01-04 17:25:42 +01:00
Nikita Popov	5b40015063	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.	2022-12-14 15:27:31 +01:00
Nikita Popov	7d7577256b	[LoopVectorize] Convert some tests to opaque pointers (NFC)	2022-12-14 15:16:59 +01:00
Sanjay Patel	05dbdb0088	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)" This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461. As discussed in the planned follow-on to this patch (D138874), this and the subsequent patches in this set can cause trouble for the backend, and there's probably no quick fix. We may even want to canonicalize in the opposite direction (towards insertelt).	2022-12-08 14:16:46 -05:00
Roman Lebedev	1e08a08a87	[NFC] Port all LoopVectorize tests to `-passes=` syntax	2022-12-08 02:38:47 +03:00
Roman Lebedev	be51fa4580	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax	2022-12-05 22:17:30 +03:00
Sanjay Patel	e71b81cab0	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try) The first attempt was reverted because a clang test changed unexpectedly - the file is already marked with a FIXME, so I just updated it this time to pass. Original commit message: This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 14:52:20 -05:00
Sanjay Patel	5eacdcff06	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1" This reverts commit a4c466766db77cd1fb42d7f98f32bb87a3d38829. This broke clang tests that are wrongly dependent on the optimizer.	2022-11-30 14:10:50 -05:00

1 2 3 4 5

243 Commits