llvm-project

Author	SHA1	Message	Date
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Nikita Popov	2ec1d0f427	[InstCombine] Don't reassociate GEPs for loop invariance Since D146813, LICM will reassociate GEPs to expose hoisting opportunities itself. Don't perform this transform in InstCombine, where it is fragile because it depends on an optional LoopInfo analysis.	2023-04-18 12:17:07 +02:00
Nikita Popov	39ca70392b	[LV] Regenerate test checks (NFC)	2023-04-18 11:52:36 +02:00
Philip Reames	78ae870f11	{tests] Rerun autogen to reduce a diff [nfc]	2023-03-31 12:47:08 -07:00
David Green	3f8160e9ce	[LV][ARM][AArch64] Add multi-exit Loop Vectorizer tests. NFC These are useful to test with the various predication schemes available on different targets.	2023-03-27 10:21:24 +01:00
Sanjay Patel	ef6f23535d	Revert "[InstCombine] use loop info when running the pass after loop vectorization" This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51. This was intended to be practically NFC in terms of the overall opt pipeline, but there is experimental data showing that code changes occurred here: https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text	2023-03-11 17:28:56 -05:00
Sanjay Patel	43ae4b62b2	[InstCombine] use loop info when running the pass after loop vectorization This is the follow-up to D144199 and suggestion from D144045. We make use of loop info explicit via InstCombine pass parameter rather than semi-arbitrary via caching. The only InstCombine transform that uses LoopInfo currently is a GEP fold in visitGEPOfGEP(), so that shows up as a failure in the dedicated test for the fold as well as several LoopVectorizer tests that run extra passes. I don't see any pass manager regression tests that actually check for pass options, but this is intended to be NFC for the pass pipeline behavior - we only try to use loop info where it would have been used before via caching . Differential Revision: https://reviews.llvm.org/D144274	2023-03-11 14:20:30 -05:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. Instcombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the prefered form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unncessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
Nikita Popov	2fab927546	[LoopVectorize] Convert some tests to opaque pointers (NFC) Check lines for some of these tests were regenerated. The difference is that with opaque pointers SCEVExpander always emits i8 GEPs, making the address calculation explicit. This is a known problem that will be solved long term by making all address calculations explicit.	2023-01-04 17:25:42 +01:00
Nikita Popov	5b40015063	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.	2022-12-14 15:27:31 +01:00
Nikita Popov	7d7577256b	[LoopVectorize] Convert some tests to opaque pointers (NFC)	2022-12-14 15:16:59 +01:00
Sanjay Patel	05dbdb0088	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)" This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461. As discussed in the planned follow-on to this patch (D138874), this and the subsequent patches in this set can cause trouble for the backend, and there's probably no quick fix. We may even want to canonicalize in the opposite direction (towards insertelt).	2022-12-08 14:16:46 -05:00
Roman Lebedev	1e08a08a87	[NFC] Port all LoopVectorize tests to `-passes=` syntax	2022-12-08 02:38:47 +03:00
Roman Lebedev	be51fa4580	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax	2022-12-05 22:17:30 +03:00
Sanjay Patel	e71b81cab0	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try) The first attempt was reverted because a clang test changed unexpectedly - the file is already marked with a FIXME, so I just updated it this time to pass. Original commit message: This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 14:52:20 -05:00
Sanjay Patel	5eacdcff06	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1" This reverts commit a4c466766db77cd1fb42d7f98f32bb87a3d38829. This broke clang tests that are wrongly dependent on the optimizer.	2022-11-30 14:10:50 -05:00
Sanjay Patel	a4c466766d	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 13:22:04 -05:00
Simon Tatham	e45cbf9923	[ARM,MVE] Update MVE_VMLA_qr for architecture change. In revision B.q and before of the Armv8-M architecture reference manual, the vector/scalar forms of the `vmla` and `vmlas` instructions came in signed and unsigned integer forms, such as `vmla.s8 q0,q1,r2` or `vmlas.u32 q3,q4,r5`. Revision B.r has changed this. There are no longer signed and unsigned versions of these instructions, since they were functionally identical anyway. Now there is just `vmla.i8` (or `i16` or `i32`, and similarly for `vmlas`). Bit 28 of the instruction encoding, which was previously 0 for signed or 1 for unsigned, is now expected to be 0 always. This change updates LLVM to the new version of the architecture. The obsoleted encodings for unsigned integers are now decoding errors, and only the still-valid encoding is ever emitted. This shouldn't break any existing assembly code, because the old signed and unsigned versions of the mnemonic are still accepted by the assembler (which is standard practice anyway for all signedness-agnostic MVE integer instructions). Reviewed By: dmgreen, lenary Differential Revision: https://reviews.llvm.org/D138827	2022-11-29 08:47:00 +00:00
David Green	662b5f1846	[ARM] Add an extra test for low trip count MVE vectorization. NFC This is quite reduced from the original example, but hopefully shows where vectorization is unprofitable because of multiple factors including the low trip count of the loop.	2022-11-17 15:07:28 +00:00
David Green	093b4011e8	[ARM] Add a test demonstrating reductions with reused extend. NFC D136227 showed that tests for this case in getReductionPatternCost were missing.	2022-10-24 19:38:19 +01:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Vitaly Buka	bbef90ace4	[IRBuilder] Use PoisonValue in CreateMasked* Followup to 72b776168c7c80d2035c7226488462dcffc97e75 Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D133967	2022-09-19 11:01:41 -07:00
Vitaly Buka	ed188b39ab	[test] Regenerate few tests	2022-09-15 12:36:32 -07:00
Philip Reames	4c10646367	[LV] Refresh autogen tests to reflect naming changes [nfc] Purely so that these can be easily autogened without spurious diffs	2022-08-29 14:16:54 -07:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as they take two vectors inputs are extracting from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
David Green	04a68fce13	[ARM] Add a couple of MVE fixed-order-reduction tests. NFC	2022-08-22 10:58:14 +01:00
Sanjay Patel	15e3d86911	[InstCombine] reassociate bitwise logic chains based on uses (X op Y) op Z --> (Y op Z) op X This isn't a complete solution (see TODO tests for possible refinements), but it shows some nice wins and doesn't seem to cause any harm. I think the most potential danger is from conflicting with other folds and causing an infinite loop - that's the reason for avoiding patterns with constant operands. Alternatively, we could try this in the reassociate pass, but we would not immediately see all of the logic folds that instcombine provides. I also looked at improving ValueTracking's isImpliedCondition() (and we should still add some enhancements there), but that would not work in general for bitwise logic reduction. The tests that reduce completely to 0/-1 are motivated by issue #56653. Differential Revision: https://reviews.llvm.org/D131356	2022-08-21 09:42:14 -04:00
Florian Hahn	a34428f07d	[LV] Use variables instead of hard-coded metadata IDs in tests.	2022-08-16 12:21:49 +01:00
Nuno Lopes	a30e77b6f6	fix tests for commit 9df0b254d24eca098	2022-07-23 22:32:30 +01:00
David Green	438ffdb821	[ARM] Switch the costs of mve1beat and mve4beat These three subtarget features are meant to control where MVE instructions take 1 vs 2 vs 4 architectural beats. The mve1beat feature is described as "Model MVE instructions as a 1 beat per tick architecture", meaning MVE instruction will execute over 4 cycles. mve4beat is the opposite where the entire 4 beats of the MVE instruction execute in a single cycle. The costs for the two were backwards though, not matching the cycle counts like they should. This patch switches the costs on the two to bring them in-line with expectations. Differential Revision: https://reviews.llvm.org/D129141	2022-07-07 16:10:00 +01:00
David Sherwood	997ecb0036	[LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding Based on reviewer comments on https://reviews.llvm.org/D126692 I've added FastMathFlags to the select instruction used when tail-folding with reductions. These flags can then be used by InstCombine to decide upon the most optimal floating point identity value for fadd/fsub. Doing so unlocks further optimisations, such as folding selects into masked loads. Differential Revision: https://reviews.llvm.org/D126778	2022-06-07 10:21:31 +01:00
Nikita Popov	36cbdaa163	[InstCombine] Fix inbounds preservation when swapping GEPs (PR44206) When reassociating GEPs, we can only keep inbounds if both original GEPs were inbounds, and their offsets have the same sign. For the sake of simplicity, I only handle the case where both offsets are non-negative here. It would probably be fine to just not preserve inbounds at all here, but as I don't see a compile-time impact for adding the isKnownNonNegative() calls I went with this more conservative approach. Fixes https://github.com/llvm/llvm-project/issues/44206. Differential Revision: https://reviews.llvm.org/D126687	2022-05-31 15:45:02 +02:00
Florian Hahn	b7315ffc3c	[LAA,LV] Add initial support for pointer-diff memory checks. This patch adds initial support for a pointer diff based runtime check scheme for vectorization. This scheme requires fewer computations and checks than the existing full overlap checking, if it is applicable. The main idea is to only check if source and sink of a dependency are far enough apart so the accesses won't overlap in the vector loop. To do so, it is sufficient to compute the difference and compare it to the `VF * UF * AccessSize`. It is sufficient to check `(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards dependence in the vector loop with the given VF and UF. If Src >=u Sink, there is not dependence preventing vectorization, hence the overflow should not matter and using the ULT should be sufficient. Note that the initial version is restricted in multiple ways: 1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information) 2. Source and sink pointers must be add-recs, with matching steps 3. The step must be a constant. 3. abs(step) == AccessSize. Most of those restrictions can be relaxed in the future. See https://github.com/llvm/llvm-project/issues/53590. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119078	2022-05-16 15:27:22 +01:00
Dávid Bolvanský	872f7000fc	Revert "[NFCI] Regenerate SROA/LoopVectorize test checks" This reverts commit 14e3450fb57305aa9ff3e9e60687b458e43835c9.	2022-04-04 01:15:30 +02:00
Dávid Bolvanský	a113a582b1	[NFCI] Regenerate LoopVectorize test checks	2022-04-03 21:56:24 +02:00
Florian Hahn	95f76bff1c	[LV] Create & use VPScalarIVSteps for all scalar users. This patch is a follow-up to D115953. It updates optimizeInductions to also introduce new VPScalarIVStepsRecipes if an IV has both vector and scalar uses. It updates all uses that only need scalar values to use the newly created recipe for the scalar steps. This completes untangling of VPWidenIntOrFpInductionRecipe code-generation. Now the recipe only creates the widened vector values, as it says on the tin. The code to genereate IR has been moved directly to VPWidenIntOrFpInductionRecipe::execute. Note that the recipe has been updated to hold a reference to ScalarEvolution, which is needed to expand the step, until we can place the corresponding SCEV expansion in the pre-header. Depends on D120827. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D120828	2022-03-13 17:15:24 +00:00
Roman Lebedev	2f80ea7f4f	[NFC][LV] Use different braces in debug output The analysis passes output function name encapsulated in `'` braces, but LV uses `"`. Harmonizing this may help in creating an update script for the LV costmodel test checks. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D121105	2022-03-07 19:32:37 +03:00
Florian Hahn	b3e8ace198	Recommit "[VPlan] Introduce recipe to build scalar steps." This reverts the revert commit ff93260bf6bddfbad1fa65c4d5184988885b900f. The underlying issue causing the PPC bot failures has been fixed in cbaac1473403 and a corresponding test case has been added in ad2cad1c521c. Original message: This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-28 14:12:20 +00:00
Florian Hahn	ff93260bf6	Revert "[VPlan] Introduce recipe to build scalar steps." This reverts commit 49b23f451cf713036c99573a35daed308d2ac894. This appears to break some PPC build bots. Revert while I investigate.	2022-02-27 17:51:19 +00:00
Florian Hahn	49b23f451c	[VPlan] Introduce recipe to build scalar steps. This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-27 17:32:41 +00:00
Florian Hahn	da740492b0	[VPlan] Remove dead header-phi recipes. This patch adds a new transform to remove dead recipes. For now, it only removes dead recipes in the header, to keep the number tests that require updating manageable. Future patches will extend this to remove dead recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118051	2022-02-26 16:26:39 +00:00
Nikita Popov	a266af7211	[InstCombine] Canonicalize SPF to min/max intrinsics Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max. Once this sticks, we can stop matching SPF min/max in various places, and can remove hacks we have for preventing infinite loops and breaking of SPF canonicalization. Differential Revision: https://reviews.llvm.org/D98152	2022-02-24 09:01:20 +01:00
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
Florian Hahn	86d113a8b8	[SCEVExpand] Do not create redundant 'or false' for pred expansion. This patch updates SCEVExpander::expandUnionPredicate to not create redundant 'or false, x' instructions. While those are trivially foldable, they can be easily avoided and hinder code that checks the size/cost of the generated checks before further folds. I am planning on look into a few other similar improvements to code generated by SCEVExpander. I remember a while ago @lebedev.ri working on doing some trivial folds like that in IRBuilder itself, but there where concerns that such changes may subtly break existing code. Reviewed By: reames, lebedev.ri Differential Revision: https://reviews.llvm.org/D116696	2022-01-06 11:52:19 +00:00
Florian Hahn	65c4d6191f	[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable. At the moment, the primary induction variable for the vector loop is created as part of the skeleton creation. This is tied to creating the vector loop latch outside of VPlan. This prevents from modeling the whole vector loop in VPlan, which in turn is required to model preheader and exit blocks in VPlan as well. This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for VPInstruction to model the increment. This allows us to partly retire createInductionVariable. At the moment, a bit of patching up is done after executing all blocks in the plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113223	2022-01-05 10:46:06 +00:00
Philip Reames	e6ad9ef4e7	[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order. I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions. We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause. Differential Revision: https://reviews.llvm.org/D115387	2021-12-13 16:56:22 -08:00

1 2 3 4 5

211 Commits