llvm-project

Author	SHA1	Message	Date
Alexey Bataev	9f3b6adb15	[SLP][NFC]Exit early if the graph is empty, NFC. No need to check anything if the graph is empty; just exit early.	2024-11-06 08:33:14 -08:00
Alexey Bataev	76422385c3	[SLP]Support reordered buildvector nodes for better clustering. Patch adds reordering of the buildvector nodes for better clustering of the compatible operations and future vectorization. Includes basic cost estimation; if the transformation is not profitable, it is reverted.
    AVX512, -O3+LTO
    Metric: size..text
    Program                                                                                    results     results0   diff
    test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test                             74565.00     75701.00   1.5%
    test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test                     75773.00     76397.00   0.8%
    test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test                    75773.00     76397.00   0.8%
    test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test                  2014462.00   2024494.00   0.5%
    test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                             395219.00    396979.00   0.4%
    test-suite :: MultiSource/Applications/JM/lencod/lencod.test                             857795.00    859667.00   0.2%
    test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test                        800472.00    802440.00   0.2%
    test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test                           590699.00    591403.00   0.1%
    test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test            203006.00    203102.00   0.0%
    test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test                 42408.00     42424.00   0.0%
    test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test               12451575.00  12451927.00   0.0%
    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test               1396480.00   1396448.00  -0.0%
    test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test                1396480.00   1396448.00  -0.0%
    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                           1047708.00   1047580.00  -0.0%
    test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test            111344.00    111328.00  -0.0%
    test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test                   1087660.00   1087500.00  -0.0%
    test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test           280664.00    280616.00  -0.0%
    test-suite :: MultiSource/Applications/sqlite3/sqlite3.test                              502646.00    502006.00  -0.1%
    test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test                         1033135.00   1031567.00  -0.2%
    test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test           2070917.00   2065845.00  -0.2%
    test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test          2070917.00   2065845.00  -0.2%
    test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test                             33893.00     33797.00  -0.3%
    test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test               39677.00     39549.00  -0.3%
    test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test                      39674.00     39546.00  -0.3%
    test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test     11560.00     11512.00  -0.4%
    test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test                     653867.00    649275.00  -0.7%
    test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test                      653867.00    649275.00  -0.7%
    CINT2006/401.bzip2 - extra code vectorized
    CINT2017rate/541.leela_r, CINT2017speed/641.leela_s - function _ZN9FastBoard25get_pattern3_augment_specEiib no longer inlined, better vectorization
    CFP2017rate/510.parest_r - better vectorization
    JM/ldecod - better vectorization
    JM/lencod - same
    CINT2006/464.h264ref - extra code vectorized
    CFP2006/447.dealII - extra vector code
    MiBench/consumer-lame - vectorized 2 previously scalar loops
    DOE-ProxyApps-C/miniGMG - small changes
    Benchmarks/7zip - extra code vectorized, better vectorization
    CFP2017rate/526.blender_r - extra vectorization
    CFP2017speed/638.imagick_s, CFP2017rate/538.imagick_r - extra vectorization
    MiBench/consumer-jpeg - extra vectorization
    CINT2006/400.perlbench - extra vectorization
    Prolangs-C/TimberWolfMC - small variations
    Applications/sqlite3 - extra function vectorized and inlined
    Benchmarks/tramp3d-v4 - extra code vectorized
    CINT2017rate/500.perlbench_r, CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets vectorized and inlined
    CINT2006/473.astar - extra code vectorized
    MiBench/telecomm-gsm - extra code vectorized, better vector code
    mediabench/gsm - same
    MiBench/security-blowfish - extra code vectorized
    CINT2017speed/625.x264_s, CINT2017rate/525.x264_r - sub4x4_dct function vectorized and inlined
    RISC-V, SiFive-p670, O3+LTO
    CFP2017rate/510.parest_r - extra vectorization
    CFP2017rate/526.blender_r - extra vectorization
    MiBench/consumer-lame - extra vectorized code
    Reviewers: RKSimon
    Reviewed By: RKSimon
    Pull Request: https://github.com/llvm/llvm-project/pull/114284	2024-11-06 10:51:15 -05:00
Simon Pilgrim	e3a0775651	[VectorCombine] foldExtractedCmps - (re-)enable fold on non-commutative binops. #114901 exposed that foldExtractedCmps didn't account for non-commutative binops, so the fold was disabled for them by 05e838f428555bcc4507bd37912da60ea9110ef6. This patch re-enables support for non-commutative binops by ensuring that the LHS/RHS arg order of the binop is retained.	2024-11-06 12:10:31 +00:00
David Sherwood	d77a36e01b	[LoopVectorize] Use new getUniqueLatchExitBlock routine (#108231). With PR #88385 I am introducing support for vectorising more loops with early exits that don't require a scalar epilogue. As such, a loop not having a unique exit block will not automatically imply that we require a scalar epilogue. Also, in all places in the code today where we use the variable LoopExitBlock, we actually mean the exit block from the latch. Therefore, it seemed reasonable to add a new getUniqueLatchExitBlock that allows the caller to determine the exit block taken from the latch, and to use this instead of getUniqueExitBlock. I also renamed LoopExitBlock to LatchExitBlock. I feel this not only better reflects how the variable is used today, but also prepares the code for PR #88385. While doing this, I also noticed that one of the comments in requiresScalarEpilogue is wrong: when we require a scalar epilogue, i.e. when we're not exiting from the latch block, this doesn't always imply we have multiple exits; e.g. see the test in Transforms/LoopVectorize/unroll_nonlatch.ll, where the latch unconditionally branches back to the only exiting block.	2024-11-06 10:35:35 +00:00
Mel Chen	4480a22c2b	[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641) Following #90184, this patch emits the vp.merge intrinsic, which is used to set the inactive lanes in a select operation to the RHS instead of undef. Currently, it is applied to out-loop reduction for EVL vectorization. This patch performs a transformation that converts select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL), and always uses the predicated reduction select to set the incoming value of the reduction phi, to support out-loop reduction when using tail folding with EVL. TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might be too early, which could lead to a situation where the predicated reduction select is adjusted, but the EVL recipes cannot be successfully generated during VPlanTransform.	2024-11-06 14:53:49 +08:00
Vasileios Porpodas	11b768af3e	[SandboxVec][BottomUpVec] Fix bug in invalidation of analyses. This makes sure we don't preserve analyses when we modify the IR. This was causing errors in the EXPENSIVE_CHECKS build.	2024-11-05 18:01:31 -08:00
vporpo	320389d428	[SandboxVec][BottomUpVec] Generate vector instructions (#115087). This patch implements some very basic code generation for some opcodes.	2024-11-05 16:27:24 -08:00
vporpo	ce0d085842	[SandboxVec][Legality] Query the scheduler for legality (#114616). This patch adds the legality check of whether the candidate instructions can be scheduled together. This uses a Scheduler object.	2024-11-05 14:54:21 -08:00
Alexey Bataev	0c18def2c1	[SLP]Allow interleaving check only if it is less than number of elements. Need to check if the interleaving factor is less than the total number of elements in the loads slice, to handle it correctly and avoid a compiler crash. Fixes report https://github.com/llvm/llvm-project/pull/112361#issuecomment-2457227670	2024-11-05 07:06:15 -08:00
Simon Pilgrim	05e838f428	[VectorCombine] foldExtractedCmps - disable fold on non-commutative binops. The fold needs to be adjusted to correctly track the LHS/RHS operands, which will take some refactoring; for now, just disable the fold in this case. Fixes #114901	2024-11-05 11:42:30 +00:00
Mel Chen	70de0b8bea	[LV][NFC] Simplify initialization of MinProfitableTripCount (#113445). The iteration runtime check confirms whether the trip count is greater than VFxUF at least. Therefore, there is no need to adjust the MinProfitableTripCount to VF if it is zero. Retaining the original MinProfitableTripCount information is also beneficial for supporting more profitable runtime checks in the future.	2024-11-05 15:13:59 +08:00
Florian Hahn	596fd103f8	[VPlan] Share logic to connect predecessors in VPBB/VPIRBB execute (NFC). This moves the common logic to connect IRBBs created for a VPBB to their predecessors in the VPlan CFG, making it easier to keep in sync in the future.	2024-11-04 19:01:39 +00:00
Alexey Bataev	899336735a	[SLP]Be more pessimistic about poisonous reductions. Consider all possible reduction ops as being non-poisoning boolean logical operations, which require freeze to be fully correct. https://alive2.llvm.org/ce/z/TKWDMP Fixes #114738	2024-11-04 06:13:52 -08:00
Kazu Hirata	aa825b74af	[Vectorize] Remove unused includes (NFC) (#114643). Identified with misc-include-cleaner.	2024-11-03 08:58:51 -08:00
Florian Hahn	af6ebb70d2	[VPlan] Implement computeCost for remaining VPSingleDefRecipes. Provide computeCost implementations for all remaining sub-classes of VPSingleDefRecipe. This pushes one of the last uses of getLegacyCost directly to its user: VPReplicateRecipe.	2024-11-02 19:38:31 +00:00
vporpo	083369fd99	[SandboxVec][Legality] Per opcode checks (#114145). This patch adds more opcode-specific legality checks.	2024-11-01 15:04:03 -07:00
Florian Hahn	17bad1a9da	[LV] Bail out on header phis in shouldConsiderInvariant. This fixes an infinite recursion in rare cases. Fixes https://github.com/llvm/llvm-project/issues/113794.	2024-11-01 20:51:25 +00:00
Han-Kuan Chen	a795a18bba	[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551)	2024-11-02 03:03:52 +08:00
Simon Pilgrim	718d50d6d0	[VectorCombine] foldPermuteOfBinops - prefer the new fold for matching costs. Minor tweak to #114101 - as we're reducing the instruction count, we should prefer the fold if the old/new costs are the same.	2024-11-01 17:28:37 +00:00
Han-Kuan Chen	e4aeeba84c	[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index should consider the number of elements of ScalarTy. (#114526)	2024-11-01 21:17:57 +08:00
David Sherwood	4ed7bcb4a6	[VPlan][NFC] Add new getMiddleBlock interface to VPlan (#113558). This work is in preparation for PRs #112138 and #88385, where the middle block is not guaranteed to be the immediate successor to the region block. I've simply added new getMiddleBlock() interfaces to VPlan that, for now, just return cast<VPBasicBlock>(VectorRegion->getSingleSuccessor()). Once PR #112138 lands, we'll need to do more work to discover the middle block.	2024-11-01 10:50:52 +00:00
Florian Hahn	3b4c45e4e5	[VPlan] Fix long comment added in b021464d35ca (NFC). Fix formatting of comment added in b021464d35ca.	2024-10-31 21:05:00 +00:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Alexey Bataev	e05def081e	[SLP]Do not vectorize code in EH and non-returning blocks. The code in EH and non-returning blocks can be skipped by the vectorizer, since vectorizing it does not improve performance and just consumes compile/link time. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112221	2024-10-31 13:50:02 -04:00
Alexey Bataev	19a34dded7	[SLP]Do not account external uses in EH block and in non-returning blocks. No need to account for the cost of the external uses in EH and non-returning basic blocks. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112045	2024-10-31 13:23:43 -04:00
Alexey Bataev	e7080fd735	[SLP]Extra check if the instruction marked for removal must be replaced in reduction ops. If the instruction is vectorized and is part of the reduced values gather/buildvector node, it should be properly replaced in the reduction operation instructions before removal, to avoid a compiler crash. Fixes #114371	2024-10-31 09:59:35 -07:00
Simon Pilgrim	92af82a48d	[VectorCombine] Fold "shuffle (binop (shuffle, shuffle)), undef" --> "binop (shuffle), (shuffle)" (#114101). Add foldPermuteOfBinops to fold a permute (single source shuffle) through a binary op that is being fed by other shuffles. Fixes #94546. Fixes #49736.	2024-10-31 10:58:09 +00:00
Florian Hahn	5bd1af5abc	[LV] Directly store VPlan in InnerLoopVectorizer (NFC). The current VPlan is already passed to multiple functions, and will be passed to more in the future. Store it once directly in InnerLoopVectorizer.	2024-10-30 18:39:50 +00:00
Mel Chen	8420dbf2b9	[VPlan] Refine the constructor of VPWidenIntrinsicRecipe. nfc (#113890). Infers members MayReadFromMemory, MayWriteToMemory, and MayHaveSideEffects based on intrinsic attributes. Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-10-30 12:22:28 +08:00
Piotr Fusik	3c02fea737	[LV][NFC] Remove stray semicolons (#114057)	2024-10-30 04:07:14 +01:00
vporpo	ca998b071e	[SandboxVec][Legality] Check wrap flags (#113975)	2024-10-29 15:37:03 -07:00
Florian Hahn	680901ed80	[VPlan] Implement VPHeaderPHIRecipe::computeCost. Fill out computeCost implementations for various header PHI recipes, matching the legacy cost model for now.	2024-10-29 21:04:31 +00:00
vporpo	a461869db3	[SandboxIR][Pass] Implement Analyses class (#113962). The Analyses class provides a way to pass around commonly used Analyses to SandboxIR passes through `runOnFunction()` and `runOnRegion()` functions.	2024-10-28 18:00:52 -07:00
vporpo	bf4b31ad54	[SandboxVec][Legality] Check Fastmath flags (#113967)	2024-10-28 15:32:20 -07:00
vporpo	5ea694816b	[SandboxVec][Legality] Check opcodes and types (#113741)	2024-10-28 14:05:58 -07:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Ellis Hoag	6ab26eab4f	Check hasOptSize() in shouldOptimizeForSize() (#112626)	2024-10-28 09:45:03 -07:00
Alexey Bataev	7152bf3bc8	[SLP]Do not create new vector node if scalars fully overlap with the existing one. If the list of scalars is already vectorized as part of the same vector node, there is no need to generate the vector node again; it will be handled as part of overlap matching. Fixes #113810	2024-10-28 06:59:41 -07:00
Florian Hahn	7fe149cdf0	[VPlan] Replace getIRBasicBlock with IRBB in VPIRBB::execute (NFC). Suggested in https://github.com/llvm/llvm-project/pull/109975. This makes the function consistent throughout.	2024-10-27 16:22:18 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974). Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Vasileios Porpodas	1540f772c7	Reapply "[SandboxVec][Legality] Reject non-instructions (#113190)". This reverts commit eb9f4756bc3daaa4b19f4f46521dc05180814de4.	2024-10-25 12:55:58 -07:00
Vasileios Porpodas	eb9f4756bc	Revert "[SandboxVec][Legality] Reject non-instructions (#113190)". This reverts commit 6c9bbbc818ae8a0d2849dbc1ebd84a220cc27d20.	2024-10-25 12:52:31 -07:00
vporpo	6c9bbbc818	[SandboxVec][Legality] Reject non-instructions (#113190)	2024-10-25 12:47:19 -07:00
Florian Hahn	e724226da7	[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value. In some cases, VPWidenCastRecipes are created but not considered in the legacy cost model, including truncates/extends when evaluating a reduction in a smaller type. Return 0 for such casts for now, to avoid divergences between VPlan and legacy cost models. Fixes https://github.com/llvm/llvm-project/issues/113526.	2024-10-25 21:25:44 +02:00
Florian Hahn	9648271a3c	[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC) This clarifies the flag, which is now only passed if the epilogue loop is being vectorized.	2024-10-25 20:39:47 +02:00
Florian Hahn	7b9f988a53	[VPlan] Limit stride replacement to vector region and middle VPBB (NFC). At the moment this is NFC, but it ensures we only replace uses that are dominated by runtime checks as we model more of the skeleton in VPlan.	2024-10-24 15:08:36 -07:00
Alexey Bataev	e914421d7f	[SLP]Do correct signedness analysis for externally used scalars. If an externally used scalar is in the root node, it may have incorrect signedness info because of a conflict with the demanded bits analysis. Need to perform exact signedness analysis and compute it rather than rely on the precomputed value, which might be incorrect for alternate zext/sext nodes. Fixes #113520	2024-10-24 08:59:24 -07:00
Alexey Bataev	d2e7ee77d3	[SLP]Do not check for clustered loads only. Since SLP supports "clusterization" of non-load instructions, the loads-only restriction for reduced values should be removed to avoid a compiler crash. Fixes #113516	2024-10-24 08:16:42 -07:00
Alexey Bataev	cb5046da26	[SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles. Need to consider undefs correctly when trying to replace them with potentially poisonous values in shuffles. Such elements should not be silently replaced by poison values; instead, complex analysis should be implemented to see if it is safe to do so. Fixes #113425	2024-10-24 07:47:23 -07:00
Florian Hahn	ef217a0f6b	[VPlan] Introduce and use getVectorPreheader (NFC). Introduce a dedicated function to retrieve the vector preheader. This ensures the correct block is used, even if the skeleton is extended.	2024-10-23 21:01:52 -07:00

5134 Commits