llvm-project

Author	SHA1	Message	Date
Alexey Bataev	f9bc00e4bb	[SLP]Initial support for interleaved loads Adds initial support for interleaved loads, which allows emission of segmented loads for RISCV RVV. Vectorizes extra code for RISCV CFP2006/447.dealII, CFP2006/453.povray, CFP2017rate/510.parest_r, CFP2017rate/511.povray_r, CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc, CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/112042	2024-10-14 09:12:33 -04:00
Florian Hahn	ec42778071	[LV] Remove unused type declaration from ILV (NFC).	2024-10-12 20:36:42 +01:00
Alexey Bataev	3ed8acf2f0	[SLP][NFC]Simplify check for external user parent basic block, NFC.	2024-10-11 13:11:16 -07:00
vporpo	31b85c6ead	[SandboxVec][Interval] Implement Interval::comesBefore() (#112026) This patch implements `Interval::comesBefore(const Interval &Other)` which returns true if this interval is strictly before Other in program order. The function asserts that the intervals are disjoint.	2024-10-11 11:51:38 -07:00
vporpo	e8dd95e97b	[SandboxVec][DAG] Extend DAG (#111908) This patch implements growing the DAG towards the top or bottom. This does the necessary dependency checks and adds new mem dependencies.	2024-10-11 08:12:29 -07:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e., just lookup, no creation).	2024-10-11 05:26:03 -07:00
Florian Hahn	65da32c634	[LV] Account for any-of reduction when computing costs of blend phis. Any-of reductions are narrowed to i1. Update the legacy cost model to use the correct type when computing the cost of a phi that gets lowered to selects (BLEND). This fixes a divergence between legacy and VPlan-based cost models after 36fc291b6ec6d. Fixes https://github.com/llvm/llvm-project/issues/111874.	2024-10-11 11:27:22 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC, isIndvarOverflowCheckKnownFalse, computeMaxVF, and isMoreProfitable. I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Alexey Bataev	4b5018d231	[SLP]Track repeated reduced value as it might be vectorized Need to track changes with the repeated reduced value, since it might be vectorized in the next attempt for reduction vectorization, to correctly generate the code and avoid compiler crash. Fixes #111887	2024-10-10 13:41:56 -07:00
vporpo	69c0067927	[SandboxVec][DAG] Refactoring: Outline code that looks for mem nodes (#111750)	2024-10-10 13:25:03 -07:00
vporpo	a4916d2005	[SandboxVec][DAG] Refactoring: Move MemPreds from DGNode to MemDGNode (#111897)	2024-10-10 12:42:28 -07:00
Florian Hahn	bb937e276d	[LV] Compute value of escaped induction based on the computed end value. (#110576) Update fixupIVUsers to compute the value for escaped inductions using the already computed end value of the induction (EndValue), but subtracting the step. This results in slightly simpler codegen, as we avoid computing the full transformed index at VectorTripCount - 1. PR: https://github.com/llvm/llvm-project/pull/110576	2024-10-10 20:04:46 +01:00
vporpo	747d8f3fc9	[SandboxVec][DAG] Implement PredIterator (#111604) This patch implements an iterator for iterating over both use-def and mem dependencies of MemDGNodes.	2024-10-10 12:01:56 -07:00
Ramkumar Ramachandra	1f919aa778	VectorCombine: lift one-use limitation in foldExtractedCmps (#110902) There are artificial one-use limitations on foldExtractedCmps. Adjust the costs to account for multi-use, and strip the one-use matcher, lifting the limitations.	2024-10-10 14:10:41 +01:00
Piotr Fusik	a7a4daa429	[LV][NFC] Improve readability with `bool` instead of `auto` (#111532)	2024-10-10 12:30:18 +02:00
Jorge Gorbe Moya	756ec99c36	[SandboxVec] Re-land "Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec. (#111223)" (#111772) https://github.com/llvm/llvm-project/pull/111223 was reverted because of a build failure with `-DBUILD_SHARED_LIBS=on`. The Passes component depends on Vectorizer (because PassBuilder needs to be able to instantiate SandboxVectorizerPass). This resulted in CMake doing this: 1. when it builds lib/libLLVMVectorize.so.20.0git it adds lib/libLLVMSandboxIR.so.20.0git to the command line, because it's listed as a dependency (as expected); 2. when it's trying to build lib/libLLVMPasses.so.20.0git it adds lib/libLLVMVectorize.so.20.0git to the command line, because it's listed as a dependency (also as expected), but not libLLVMSandboxIR.so. When SandboxVectorizerPass has its ctors/dtors defined inline, this caused "undefined reference to vtable" linker errors. This change works around that by moving ctors/dtors out of line. Also fix a bazel build problem by adding the new `llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/PassRegistry.def` as a textual header in the Vectorizer target.	2024-10-09 18:00:17 -07:00
Alexey Bataev	f020bf1526	[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for stores Allows non-power-of-2 vectorization for stores, but still requires that the vectorized number of elements forms full vector registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/111194	2024-10-09 15:22:44 -04:00
Jorge Gorbe Moya	102c384b57	Revert "[SandboxVectorizer] Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec." (#111727) Reverts llvm/llvm-project#111223 It broke one of the build bots: LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx running on linaro-flang-aarch64-libcxx while building llvm at step 5 "build-unified-tree". Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/8127	2024-10-09 10:45:23 -07:00
Jorge Gorbe Moya	10ada4ae73	[SandboxVectorizer] Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec. (#111223) The main change is that the main SandboxVectorizer pass no longer has a pipeline of function passes. Now it is a wrapper that creates sandbox IR from functions before calling BottomUpVec. BottomUpVec now builds its own RegionPassManager from the `sbvec-passes` flag, using a PassRegistry.def file. For now, these region passes are not run (BottomUpVec doesn't create Regions yet), and only a null pass for testing exists. This commit also changes the ownership model for sandboxir::PassManager: instead of having a PassRegistry that owns passes, and PassManagers that contain non-owning pointers to the passes, now PassManager owns (via unique pointers) the passes it contains. PassRegistry is now deleted, and the logic to parse and create a pass pipeline is now in PassManager::setPassPipeline.	2024-10-09 10:37:05 -07:00
vporpo	ee0e17a4d8	[SandboxVec][DAG] Drop RAR and fix dependency scanning loop (#111715)	2024-10-09 10:29:48 -07:00
David Green	c136d3237a	[VectorCombine] Do not try to operate on OperandBundles. (#111635) This bails out if we see an intrinsic with an operand bundle on it, to make sure we don't process the bundles incorrectly. Fixes #110382.	2024-10-09 16:20:03 +01:00
Florian Hahn	fa3258ecb8	[VPlan] Sink retrieving legacy costs to more specific computeCost impls. (#109708) Make legacy cost retrieval independent of getInstructionForCost by sinking it to more specific ::computeCost implementation (specifically VPInterleaveRecipe::computeCost and VPSingleDefRecipe::computeCost). Inline getInstructionForCost to VPRecipeBase::cost(), as it is now only used to decide which recipes to skip during cost computation and when to apply forced costs. PR: https://github.com/llvm/llvm-project/pull/109708	2024-10-09 13:58:58 +01:00
Florian Hahn	01cbbc52dc	[VPlan] Request lane 0 for pointer arg in PtrAdd. After 7f74651, the pointer operand of a PtrAdd may be replicated. Instead of requesting a single scalar, request lane 0, which correctly handles the case when there is a scalar-per-lane. Fixes https://github.com/llvm/llvm-project/issues/111606.	2024-10-09 13:18:54 +01:00
Simon Pilgrim	00c1c589e0	DependencyGraph.cpp - fix MSVC "not all control paths return a value" warning. NFC.	2024-10-09 11:47:43 +01:00
David Sherwood	e080be5ac2	[NFC][LoopVectorize] Clean up some code around getting a context (#111114) There are several places in LoopVectorize where we do more work than necessary to obtain a LLVMContext. I've tried to make the code more efficient.	2024-10-09 09:28:16 +01:00
Vasileios Porpodas	267e852109	[SandboxVec][DAG][NFC] Rename enumerators	2024-10-08 20:01:43 -07:00
vporpo	04a8bffdf7	[SandboxVec][DAG] Build actual dependencies (#111094) This patch implements actual dependencies checking using BatchAA. This adds memory dep edges between MemDGNodes.	2024-10-08 16:18:57 -07:00
Vasileios Porpodas	56d2c626f7	[SandboxVec][Interval] Add print() and dump()	2024-10-08 15:27:07 -07:00
Florian Hahn	6fbbe152fa	[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486) This patch splits off intrinsic handling to a new VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type). It does not need access to an underlying IR call instruction or function. This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.	2024-10-08 22:37:20 +01:00
Alexey Bataev	9f3c55954e	[SLP]Fix loads sorting for loads from different basic blocks Patch fixes lookup for loads from different basic blocks. Originally, the code checked if the main key (combined with parent basic block) was created, but did not include the key into LoadsMap. When the code looked for the load pointer in LoadsMap, it skipped the check for parent basic block and could mix loads from different basic blocks (but the same underlying pointer). Currently, it does not lead to any issues, since later the code compares parent basic blocks and sorts loads properly. But it increases compile time. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/111521	2024-10-08 16:44:16 -04:00
Alexey Bataev	a65a5feb1a	[SLP]Improve masked loads vectorization, attempting gathered loads If the vector of loads can be vectorized as masked gather and there are several other masked gather nodes, the compiler can check whether it is possible to gather such nodes into a big consecutive/strided loads node, which provides better performance. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110151	2024-10-08 16:43:10 -04:00
Florian Hahn	36fc291b6e	[VPlan] Implement VPBlendRecipe::computeCost. Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently also used if only the first lane is used. This also requires pre-computing costs for forced scalars and instructions considered profitable to scalarize. For those, the cost will be computed separately in the legacy cost model. This will also be needed when implementing VPReplicateRecipe::computeCost.	2024-10-08 21:33:42 +01:00
Florian Hahn	3829fd75c8	[VPlan] Remove redundant getVPSingleValue for VPSingleDefRecipes (NFC).	2024-10-08 20:31:41 +01:00
Simon Pilgrim	d38addf099	Fix MSVC signed/unsigned mismatch warning	2024-10-08 17:36:35 +01:00
Alexey Bataev	45826513ef	[SLP][NFC]Fix clang-tidy suggestions, cleanup, NFC.	2024-10-08 08:31:23 -07:00
Florian Hahn	3ec6f805c5	[VPlan] Don't create GEP x, 0 for interleave group pointers. The GEP with offset 0 is redundant, remove it. This addresses a TODO from 7f74651837b (#106431).	2024-10-08 12:08:13 +01:00
Sterling-Augustine	3f5039323c	[SandboxVectorizer][NFC] Remove unused include (#111418)	2024-10-07 11:47:00 -07:00
Sterling-Augustine	93bfa7886b	[SandboxVectorizer] Define SeedBundle: a set of instructions to be vectorized [retry] (#111073) [Retry 110696 with a proper rebase.] Seed collection will assemble instructions to be vectorized into SeedBundles. This data structure is not intended to be used directly, but will be the basis for load bundles, store bundles, and so on.	2024-10-07 11:20:50 -07:00
David Sherwood	66b282014c	[LoopVectorize] Remove redundant code in emitSCEVChecks (#111132) There was some code in emitSCEVChecks to update the dominator tree if LoopBypassBlocks is empty, however there are no tests that fail when replacing this code with an assert. I built both SPEC2017 and the LLVM test suite and also didn't see any build failures. I've removed the code for now and added an assert to guard this in case anything changes, since it seems pointless to have code that's impossible to defend.	2024-10-07 07:58:27 +01:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	45b526afa2	[LV] Honor uniform-after-vectorization in setVectorizedCallDecision. The legacy cost model always computes the cost for uniforms as cost of VF = 1, but VPWidenCallRecipes would be created, as setVectorizedCallDecision would not consider uniform calls. Fix setVectorizedCallDecision to set to Scalarize, if the call is uniform-after-vectorization. This fixes a bug in VPlan construction uncovered by the VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/111040.	2024-10-06 10:35:06 +01:00
Florian Hahn	68210c7c26	[VPlan] Only generate first lane for VPPredInstPHI if no others used. If only the first lane of the result is used, only generate the first lane. Fixes https://github.com/llvm/llvm-project/issues/111042.	2024-10-05 19:15:05 +01:00
Alexey Bataev	7692d106b4	[SLP][NFC]Remove dead code + use nlogn lookups instead of n^2	2024-10-04 15:32:04 -07:00
Alexey Bataev	f74879cf0c	[SLP]Make PHICompare comparator follow strict weak ordering requirement Reviewers: efriedma-quic Reviewed By: efriedma-quic Pull Request: https://github.com/llvm/llvm-project/pull/110529	2024-10-04 14:23:48 -04:00
Alexey Bataev	d991e05452	[SLP]Fix compiler crash on vectorizing gathered loads with different types Need to check not only parents, but also types for compatible loads, when trying to build the vectorizable sequences. Fixes crash reported in https://github.com/llvm/llvm-project/pull/107461#issuecomment-2392980214	2024-10-04 08:36:57 -07:00
Han-Kuan Chen	f5815b9903	[SLP] NFC. Set NumOperands directly if VL[0] is IntrinsicInst. (#111103)	2024-10-04 19:38:33 +08:00
Vasileios Porpodas	45582ed240	[SandboxVec][DAG][NFC] Rename isMemDepCandidate() to isMemDepNodeCandidate()	2024-10-03 10:39:10 -07:00
Alexey Bataev	133c1224de	[SLP]Fix a crash on accessing element with index -1 for reused mask with PoisonMaskElem Need to check if the index from the ReuseShuffleIndices mask is not equal to PoisonMaskElem before trying to access the element by index.	2024-10-03 08:24:05 -07:00
Han-Kuan Chen	5901463ada	[SLP] NFC. BaseIndex is not used for getSameOpcode. (#110948)	2024-10-03 19:58:44 +08:00
Alexey Bataev	c1b911c579	[SLP]Do correct signedness analysis for clustered nodes Should get the signedness info from the original scalar instructions, if possible, to correctly generate sext/zext instructions. Also, the clustered node must be assigned a gather node user info to correctly estimate its bitwidth/sign.	2024-10-02 12:56:49 -07:00

5038 Commits