llvm-project

Author	SHA1	Message	Date
Nikita Popov	330cb03269	[LoadStoreVectorizer] Check for guaranteed-to-transfer (PR52950) Rather than checking for nounwind in particular, make sure the instruction is guaranteed to transfer execution, which will also handle non-willreturn calls correctly. Fixes https://github.com/llvm/llvm-project/issues/52950.	2022-01-03 10:55:47 +01:00
Florian Hahn	6e0a333f71	[LV] Use Builder.CreateVectorReverse directly. (NFC) IRBuilder::CreateVectorReverse already handles all cases required by LoopVectorize. It can be used directly instead of reverseVector.	2022-01-02 19:09:30 +00:00
Kazu Hirata	7e163afd9e	Remove redundant void arguments (NFC) Identified by modernize-redundant-void-arg.	2022-01-02 10:20:19 -08:00
Florian Hahn	b1a333f0fe	[VPlan] Don't consider VPWidenCanonicalIVRecipe phi-like. VPWidenCanonicalIVRecipe does not create PHI instructions, so it does not need to be placed in the phi section of a VPBasicBlock. Also tidies the code so the WidenCanonicalIV recipe and the compare/lane-masks are created in the header. Discussed D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116473	2022-01-02 12:48:17 +00:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
Florian Hahn	7305798049	[VPlan] Remove VPWidenPHIRecipe constructor without start value (NFC). This was suggested as a separate cleanup in recent reviews.	2022-01-01 13:53:48 +00:00
Florian Hahn	e2f1c4c706	[LV] Turn check for unexpected VF into assertion (NFC). VF should always be non-zero in widenIntOrFpInduction. Turn check into assertion.	2021-12-31 13:19:03 +00:00
Alexey Bataev	e0efedd2c3	[SLP][NFC]Fix non-determinism in reordering, NFC. Need to clear CurrentOrder order mask if it is determined that extractelements form identity order and need to use a vector-like construct when iterating over ordered entries in the reorderTopToBottom function.	2021-12-30 13:10:25 -08:00
Florian Hahn	ba9016a030	[LV] Replace redundant tail-fold check with assert (NFC). The code path can only be reached when folding the tail, so turn the check into an assertion.	2021-12-29 19:00:41 +01:00
Florian Hahn	9d297c7894	[VPlan] Add prepareToExecute to set up live-ins (NFC). This patch adds a new prepareToExecute helper to set up live-ins, so VPTransformState doesn't need to hold values like TripCount. This also requires making the trip count operand for ActiveLaneMask explicit in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116320	2021-12-28 17:49:47 +01:00
Sanjay Patel	0edf99950e	[Analysis] allow caller to choose signed/unsigned when computing constant range We should not lose analysis precision if an 'add' has both no-wrap flags (nsw and nuw) compared to just one or the other. This patch is modeled on a similar construct that was added with D59386. I don't think it is possible to expose a problem with an unsigned compare because of the way this was coded (nuw is handled first). InstCombine has an assert that fires with the example from: https://github.com/llvm/llvm-project/issues/52884 ...because it was expecting InstSimplify to handle this kind of pattern with an smax. Fixes #52884 Differential Revision: https://reviews.llvm.org/D116322	2021-12-28 09:45:37 -05:00
Florian Hahn	c2275278c6	[VPlan] Add abstract base class for header phi recipes (NFC). Not all header phis widen the phi, e.g. like the new VPCanonicalIVPHIRecipe in D113223. To let those recipes also inherit from a phi-like base class, add a more generic VPHeaderPHIRecipe abstract base class. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116304	2021-12-28 15:37:47 +01:00
Florian Hahn	c66286ed59	[LV] Use specific first-order recurrence recipe as arg type (NFC). Required for further refactoring in D116304.	2021-12-28 10:58:21 +01:00
Florian Hahn	2e630eabd3	[LV] Sink BTC creation to actual use (NFC). Suggested separately in D116123.	2021-12-27 11:25:46 +01:00
Florian Hahn	511726c64d	[LV] Move getStepVector out of ILV (NFC). First step to split up induction handling and move it outside ILV. Used in D116123 and following.	2021-12-26 21:17:26 +01:00
Kazu Hirata	76f0f1cc5c	Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-12-24 21:43:06 -08:00
Florian Hahn	ede7c2438f	[VPlan] Create header & latch blocks for skeleton up front (NFC). By creating the header and latch blocks up front and adding blocks and recipes in between those 2 blocks we ensure that the entry and exits of the plan remain valid throughout construction. In order to avoid test changes and keep printing of the plans the same, we use the new header block instead of creating a new block on the first iteration of the loop traversing the original loop. We also fold the latch into its predecessor. This is a follow up to a post-commit suggestion in D114586. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115793	2021-12-22 12:44:25 +00:00
Florian Hahn	c83ef407df	[LV] Adjust comment to say the induction is created in header. Follow-up suggested post-commit for 1a54889f48fa.	2021-12-22 11:56:40 +00:00
Florian Hahn	1a54889f48	[LV] Ensure WidenCanonicalIVRecipe is always created in header (NFC). The VPWidenCanonicalIVRecipe must always be created in the phi section of the header block. Use that block as insert point.	2021-12-21 15:14:48 +00:00
Paul Walker	7c68ed8892	[SVE] Reintroduce -scalable-vectorization=preferred as an alias to "on". Some buildbots still rely on the experimental flag, so let's keep it until everything has been migrated to the new "on by default" state.	2021-12-21 12:54:04 +00:00
Kazu Hirata	500c4b68dc	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-12-20 23:43:24 -08:00
Sander de Smalen	b1ff20fd35	[LV] Enable scalable vectorization by default for SVE cores. The availability of SVE should be sufficient to enable scalable auto-vectorization. This patch adds a new TTI interface to query the target what style of vectorization it wants when scalable vectors are available. For other targets than AArch64, this currently defaults to 'FixedWidthOnly'. Differential Revision: https://reviews.llvm.org/D115651	2021-12-20 16:23:29 +00:00
Alexey Bataev	ab9078f3d3	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 07:21:20 -08:00
Alexey Bataev	4459a11f4d	Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy." This reverts commit fcaf290d0278bb83387e1a1d972c55e08b8c40e3 to fix test mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531	2021-12-20 07:21:18 -08:00
Florian Hahn	5b362e4c7f	[VPlan] Add Debugloc to VPInstruction. Upcoming changes require attaching debug locations to VPInstructions, e.g. adding induction increment recipes in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115123	2021-12-20 15:10:41 +00:00
Alexey Bataev	fcaf290d02	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 05:15:01 -08:00
Alexey Bataev	71fe59212c	[SLP][NFC]Adjust type in debug output loop. The ReuseShuffleIndices indices are integer, not unsigned, need to fix the type in the debug print loop.	2021-12-17 12:43:01 -08:00
Alexey Bataev	46ad66b817	[SLP][NFC]Use 'llvm::copy' instead of element-by-element copying.	2021-12-17 12:07:59 -08:00
Florian Hahn	564d109b35	[LV] Pass VectorHeader block to emitTransformedIndex (NFC). Pass in the vector header instead of relying on ILV::LoopVectorBody. This reduces the dependence on state from ILV. Where VPTransformState is available, State.CFG.PrevBB can be used.	2021-12-17 10:11:16 +00:00
Alexey Bataev	65fc992579	[SLP]Early exit out of the reordering if shuffled/perfect diamond match found. Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such pattern will result in not profitable reordering because of (false positive) external use of scalars. Differential Revision: https://reviews.llvm.org/D115811	2021-12-16 11:09:49 -08:00
Florian Hahn	3b35113ff0	[VPlan] Add VPBlockBase::successors() returning an iterator_range (NFC). This will also be helpful for D115793.	2021-12-16 14:28:50 +00:00
Arthur Eubanks	5a81a60391	[NFC] Remove more calls to getAlignment() These are deprecated and should be replaced with getAlign(). Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.	2021-12-15 14:40:57 -08:00
Alexey Bataev	6f2e087631	[SLP]Do not represent splats as node with the reused scalars. No need to represent splats as a node with the reused scalars, it may increase the cost (currently pass just ignores extra shuffle cost and it is still not correct). Differential Revision: https://reviews.llvm.org/D115800	2021-12-15 06:33:11 -08:00
Alexey Bataev	bd05376986	[SLP]Improve multinode analysis. Changes the preliminary multinode analysis: 1. Introduced scores for reversed loads/extractelements. 2. Improved shallow score calculation. 3. Lowered the cost of external uses (no need to consider it several times, just once). 4. The initial lane for analysis is the one with the minimal possible reorderings. These changes in general shall reduce compile time and improve the reordering in many cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D101109	2021-12-14 06:01:52 -08:00
Alexey Bataev	e5b191a433	[SLP]Improve/fix reordering for gather nodes with extractelements/undefs. If the gather node is a mix of undef values and extractelement instructions, need to take the ordering for such nodes into account too. It allows to reorder some (sub)trees and remove some extra shuffles, improving overall vectorization. Also, outlined common functionality into a separate function. Differential Revision: https://reviews.llvm.org/D115358	2021-12-13 10:59:38 -08:00
Nikita Popov	432c41ebe9	[SLP] Avoid getPointerElementType() call Use the load result type instead of the element type of the load pointer operand.	2021-12-13 15:46:13 +01:00
Evgeniy Brevnov	7002125cff	[LV][NFC] Fix debug message to print out resulting clamped VF	2021-12-13 18:54:05 +07:00
Florian Hahn	e90630e5a5	[VPlan] Remove unused createNaryOp (NFC).	2021-12-13 11:11:00 +00:00
Evgeniy Brevnov	2025e0985c	[LV] Make sure VF doesn't exceed compile time known TC For the simple copy loop (see test case) vectorizer selects VF equal to 32 while the loop is known to have 17 iterations only. Such behavior makes no sense to me since such vector loop will never be executed. The only case we may want to select VF larger than TC is masked vectorization. So I haven't touched that case. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D114528	2021-12-13 13:48:46 +07:00
Florian Hahn	b6a2ddb6c8	[LV] Use info from State in some helper functions (NFC). This updates several helper functions to use information provided by VPTransformState instead of ILV directly, to help with the transition out of ILV.	2021-12-12 20:48:38 +00:00
David Green	fed3041863	[LV][ARM] Improve reduction costmodel for mismatching extension types. Given a MLA reduction from two different types (say i8 and i16), we were previously failing to find the reduction pattern, often making us choose the lower vector factor. This improves that by using the largest of the two extension types, allowing us to use the larger VF as the type of the reduction. As per https://godbolt.org/z/KP549EEYM the backend handles this valiantly, leading to better performance. Differential Revision: https://reviews.llvm.org/D115432	2021-12-10 15:40:58 +00:00
Florian Hahn	505ad03c7d	[LV] Remove redundant IV casts using VPlan (NFCI). This patch simplifies handling of redundant induction casts, by removing dead cast instructions after initial VPlan construction. This has the following benefits: 1. fixes a crash (see @test_optimized_cast_induction_feeding_first_order_recurrence) 2. Simplifies VPWidenIntOrFpInduction to a single-def recipe 3. Retires recordVectorLoopValueForInductionCast. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115112	2021-12-10 13:57:03 +00:00
Florian Hahn	acea6e9cfa	[Passes] Only run extra vector passes if loops have been vectorized. This patch uses a similar trick as in D113947 to only run the extra passes after vectorization on functions where loops have been vectorized. The reason for running the 'extra vector passes' is simplification/unswitching of the runtime checks created by LV, there should be no need to run them if nothing got vectorized. To do that, a new dummy analysis ShouldRunExtraVectorPasses has been added. If loops have been vectorized for a function, LV will cache the analysis. At the moment it uses MadeCFGChanges as proxy for loop vectorized, which isn't perfect (it could be too aggressive, e.g. because no runtime checks have been added), but should be good enough for now. The extra passes are now managed by a new FunctionPassManager that runs its passes only if ShouldRunExtraVectorPasses has been cached. Without this patch, `-extra-vectorizer-passes` has the following compile-time impact: NewPM-O3: +4.86% NewPM-ReleaseThinLTO: +3.56% NewPM-ReleaseLTO-g: +7.17% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions With this patch, that gets reduced to NewPM-O3: +1.43% NewPM-ReleaseThinLTO: +1.00% NewPM-ReleaseLTO-g: +1.58% http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions It is probably still too high to enable by default, but much better. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115052	2021-12-10 11:42:45 +00:00
Florian Hahn	978883d254	[VPlan] Add InductionDescriptor to VPWidenIntOrFpInduction. (NFC) This allows easier access to the induction descriptor from VPlan, without needing to go through Legal. VPReductionPHIRecipe already contains a RecurrenceDescriptor in a similar fashion. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115111	2021-12-10 09:55:09 +00:00
Alexey Bataev	19c5cf4167	[SLP]Fix comparator for cmp instruction vectorization. The comparator for the sort functions should provide strict weak ordering relation between parameters. Current solution causes compiler crash with some standard c++ library implementations, because it does not meet these criteria. Tried to fix it + it improves the overall vectorization result. Differential Revision: https://reviews.llvm.org/D115268	2021-12-09 10:57:57 -08:00
Philip Reames	b24db85c0b	[recurrence] Delete dead flag/fmf handling [NFC] The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation. I don't know what the intent was here, but it certainly doesn't actually do anything today.	2021-12-09 10:43:53 -08:00
Florian Hahn	d74a8a78ad	[LV] Mark various functions as const (NFC). Make sure various accessors do not modify any state, in preparation for D115111.	2021-12-09 10:51:29 +00:00
Florian Hahn	e9a2944495	[VPlan] Verify plan entry and exit blocks, set correct exit block. Both the entry and exit blocks of the top-region of a plan must be VPBasicBlocks. They also must have no predecessors or successors respectively. This invariant was broken when splitting a block for sink-after. To fix the issue, set the exit block of the region after sink-after is done. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D114586	2021-12-07 16:26:31 +00:00
Cullen Rhodes	0395e01583	[IR] Split vscale_range interface Interface is split from: std::pair<unsigned, unsigned> getVScaleRangeArgs() into separate functions for min/max: unsigned getVScaleRangeMin(); Optional<unsigned> getVScaleRangeMax(); Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D114075	2021-12-07 10:38:26 +00:00
Alexey Bataev	a101a9b64b	[SLP]Fix compiler crash when calculating extract cost for undefs. Need to add an extra check for potential undef values in computeExtractCost function to avoid compiler crash on casting to instruction. Differential Revision: https://reviews.llvm.org/D115162	2021-12-06 10:46:13 -08:00

3059 Commits