llvm-project

Author	SHA1	Message	Date
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reasoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
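The identity-value choice in 2c7786e94a can be checked with ordinary IEEE-754 doubles. Below is a minimal Python sketch. It assumes a hypothetical `ordered_fadd_reduce` helper that models only the strict left-to-right order of an ordered fadd reduction.

- -0.0 is an exact identity: x + -0.0 == x for every x, including -0.0.
- 0.0 changes the result only in the sign of a zero, because -0.0 + 0.0 is +0.0 under round-to-nearest.
- nsz permits ignoring that sign, which is why 0.0 becomes a valid start value even without reassoc.

```python
import math
from typing import Iterable

def ordered_fadd_reduce(start: float, values: Iterable[float]) -> float:
    """Strict left-to-right sum, modelling an ordered (non-reassoc) fadd reduction."""
    acc = start
    for v in values:
        acc = acc + v
    return acc

def is_negative_zero(x: float) -> bool:
    return x == 0.0 and math.copysign(1.0, x) < 0

# -0.0 is the exact identity: an all-negative-zero input keeps its sign.
neg_start = ordered_fadd_reduce(-0.0, [-0.0, -0.0])
# 0.0 is not: -0.0 + 0.0 rounds to +0.0, so the sign of zero is lost.
pos_start = ordered_fadd_reduce(0.0, [-0.0, -0.0])

print(is_negative_zero(neg_start))  # True
print(is_negative_zero(pos_start))  # False

# With a non-zero result, both start values give the same answer here.
vals = [1.5, -2.25, 3.0]
print(ordered_fadd_reduce(-0.0, vals) == ordered_fadd_reduce(0.0, vals))  # True
```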
Matt Arsenault	47d831f2c9	TTI: Check legalization cost of min/max ISD nodes (#100514) Instead of counting the cost of the assumed expansion. The AMDGPU costs for the i64 case look too high to me. Preserve default expansion logic	2024-08-08 17:06:11 +04:00
Matt Arsenault	e7630a0d60	AMDGPU: Improve cost handling of canonicalize (#101479)	2024-08-01 19:02:20 +04:00
Matt Arsenault	524795926b	AMDGPU: Enable vectorization of v2f16 copysign (#100799)	2024-07-30 08:48:13 +04:00
Matt Arsenault	3a2ef3a634	AMDGPU: Add some baseline cost model tests (#100797)	2024-07-29 12:13:51 +04:00
Matt Arsenault	01a489133e	AMDGPU: Add baseline test for vectorize of integer min/max (#100513)	2024-07-26 00:45:10 +04:00
Alexey Bataev	70a54bca6f	[SLP]Improve/fix extracts calculations for non-power-of-2 elements. One of the previous patches introduced initial support for non-power-of-2 number of elements but some parts of the SLP vectorizer still were not adjusted to handle the costs correctly. Patch fixes it by improving analysis of the non-power-of-2 number of elements and fixes in the cost of the extractelements instructions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/93213	2024-05-24 09:33:36 -04:00
Alexey Bataev	554c47c8e9	[SLP]Fix undef poison vector values shuffles with poisonous vectors. If trying to find vector value in shuffling of the extractelements and one of the vector values is undef value, need to generate real mask value for such vector and either undef vector, or incoming second vector, if non-poisonous.	2024-05-22 10:41:57 -07:00
Jeffrey Byrnes	ea43a30899	[AMDGPU] Vectorize more 16 bit shuffles (#90648) In the case of larger vectors, we should still prefer the vectorized version (i.e. shufflevector vs extract/insert chains). In arithmetic chains, vectorization results in chains of packed math instructions (as opposed to unpack/repack & scalarized arithmetic): https://godbolt.org/z/c5onaf6G5 In chains with PHIs, vectorization again removes the unnecessary pack / repack code around BBs: https://godbolt.org/z/vz7zYzvhs	2024-05-21 09:21:36 -07:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
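The canonicalization in a5f3415533 rests on demanded-elements reasoning. Each shufflevector result lane reads exactly one operand lane, chosen by its mask entry. An operand none of whose lanes is selected is therefore not demanded, and it can be poison instead of undef.

This is a minimal Python sketch, not InstCombine's implementation. It assumes the C++ API convention that an undefined mask element is encoded as -1. It also assumes that values 0..N-1 select from the first operand and N..2N-1 from the second. `demanded_lanes` and `canonicalize_operands` are hypothetical helpers.

```python
from typing import List, Sequence, Set, Tuple

POISON = -1  # an undefined mask element selects no input lane

def demanded_lanes(mask: Sequence[int], n: int) -> Tuple[Set[int], Set[int]]:
    """Lanes of operand 0 and operand 1 read by a shufflevector whose
    operands each have n elements."""
    lhs: Set[int] = set()
    rhs: Set[int] = set()
    for m in mask:
        if m == POISON:
            continue          # reads nothing
        if m < n:
            lhs.add(m)        # 0..n-1 selects from operand 0
        else:
            rhs.add(m - n)    # n..2n-1 selects from operand 1
    return lhs, rhs

def canonicalize_operands(operands: Sequence[str], mask: Sequence[int], n: int) -> List[str]:
    """Replace an 'undef' operand with 'poison' when none of its lanes is demanded."""
    lhs, rhs = demanded_lanes(mask, n)
    return ["poison" if op == "undef" and not used else op
            for op, used in zip(operands, (lhs, rhs))]

# A splat of lane 0 of %x never reads the second operand.
print(demanded_lanes([0, 0, 0, POISON], 4))                          # ({0}, set())
print(canonicalize_operands(["%x", "undef"], [0, 0, 0, POISON], 4))  # ['%x', 'poison']
```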
Alexey Bataev	019aee8327	[SLP]Improve costs in computeExtractCost() to avoid crash after D158449. Need to consider the length of the original vector for extractelements, not the length, matched number of the scalars. It fixes 2 issues: 1) improves cost estimation; 2) Fixes crashes after D158449.	2023-09-29 07:48:02 -07:00
Hans Wennborg	06f3b0ed43	Revert "[SLP]Improve costs in computeExtractCost() to avoid crash after D158449." This caused asserts: Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).", file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096 see comment on `59a67ea35d` > Need to consider the length of the original vector for extractelements, > not the length, matched number of the scalars. It fixes 2 issues: 1) > improves cost estimation; 2) Fixes crashes after D158449. This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.	2023-09-29 10:42:19 +02:00
Alexey Bataev	59a67ea35d	[SLP]Improve costs in computeExtractCost() to avoid crash after D158449. Need to consider the length of the original vector for extractelements, not the length, matched number of the scalars. It fixes 2 issues: 1) improves cost estimation; 2) Fixes crashes after D158449.	2023-09-28 09:36:08 -07:00
David Spickett	8f548610a6	Revert "[SLP]Use source vector type as the original vector type instead of" This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb. Due to test suite failures on all our SVE buildbots e.g.: https://lab.llvm.org/buildbot/#/builders/184/builds/7375 clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565: InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind, VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *, ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.	2023-09-22 07:52:16 +00:00
Alexey Bataev	9a99944df0	[SLP]Use source vector type as the original vector type instead of artificial for better cost estimation. Need to use original source vector type, not the one artificially constructed, based on the number of vectorized scalars. It affect the cost significantly.	2023-09-21 11:34:02 -07:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
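D157425 teaches `improveShuffleKindFromMask()` to recognize an insert-subvector shuffle. The Python sketch below matches one simplified form of that pattern under these assumptions:

- The mask has the same width N as both operands.
- It contains no undefined (-1) elements.
- The result keeps operand 0 everywhere except a contiguous window [idx, idx+k), which takes lanes 0..k-1 of operand 1.

`match_insert_subvector` is hypothetical, and LLVM's own matching is more general than this.

```python
from typing import List, Optional, Tuple

def match_insert_subvector(mask: List[int], n: int) -> Optional[Tuple[int, int]]:
    """Return (index, sub_len) if mask inserts lanes 0..sub_len-1 of operand 1
    into operand 0 at position index; otherwise None.

    Simplified: len(mask) == n, no undefined (-1) elements, and operand 0
    must be the base vector."""
    if len(mask) != n:
        return None
    # Result lanes that come from operand 1 (mask values n..2n-1).
    from_rhs = [i for i, m in enumerate(mask) if m >= n]
    if not from_rhs or len(from_rhs) == n:
        return None  # no insertion at all, or nothing left of operand 0
    index, sub_len = from_rhs[0], len(from_rhs)
    for i, m in enumerate(mask):
        if index <= i < index + sub_len:
            if m != n + (i - index):
                return None  # window must be contiguous and in order
        elif m != i:
            return None      # every other lane must be an identity lane of operand 0
    return index, sub_len

print(match_insert_subvector([0, 1, 8, 9, 4, 5, 6, 7], 8))  # (2, 2)
print(match_insert_subvector([0, 1, 9, 8, 4, 5, 6, 7], 8))  # None
```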
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers"" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
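f9c1ede254 describes an address space 7 pointer as a 128-bit buffer descriptor paired with a 32-bit offset, usable with GEP. A small Python model shows why address arithmetic only ever touches the offset half while the resource stays opaque.

This is a sketch under stated assumptions:

- The `FatPointer` class is hypothetical.
- Offsets are assumed to wrap modulo 2**32.
- The packing in `to_int` (resource in the high 128 bits, offset in the low 32 bits) is an illustrative layout, not a claim about LLVM's ptrtoint semantics.

```python
from dataclasses import dataclass

RSRC_BITS = 128   # buffer resource (ptr addrspace(8))
OFFSET_BITS = 32  # offset carried by ptr addrspace(7)

@dataclass(frozen=True)
class FatPointer:
    """Hypothetical model of a 160-bit ptr addrspace(7) value."""
    rsrc: int    # opaque 128-bit buffer resource
    offset: int  # 32-bit offset into the buffer

    def __post_init__(self) -> None:
        if not 0 <= self.rsrc < (1 << RSRC_BITS):
            raise ValueError("resource must fit in 128 bits")
        if not 0 <= self.offset < (1 << OFFSET_BITS):
            raise ValueError("offset must fit in 32 bits")

    def gep(self, byte_delta: int) -> "FatPointer":
        # Address arithmetic changes only the offset; wrapping mod 2**32 is an assumption.
        return FatPointer(self.rsrc, (self.offset + byte_delta) % (1 << OFFSET_BITS))

    def to_int(self) -> int:
        # Assumed layout: resource in the high 128 bits, offset in the low 32 bits.
        return (self.rsrc << OFFSET_BITS) | self.offset

p = FatPointer(rsrc=0x1234, offset=16)
q = p.gep(8)
print(q.offset, q.rsrc == p.rsrc)  # 24 True
print(hex(q.to_int()))             # 0x123400000018
```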
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Alexey Bataev	93a9be0cea	[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. But the compiler may do the same for other gather/buildvector nodes too, just need to check the dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. Part of D110978 Differential Revision: https://reviews.llvm.org/D144958	2023-03-10 13:19:43 -08:00
Hans Wennborg	3b3a4c270b	Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes." This caused verifier errors: Instruction does not dominate all uses! %8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1 %15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1> in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z (or register allocator crash when the verifier was disabled). See comment on the code review. > Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. > But the compiler may do the same for other gather/buildvector nodes too, just need to check the > dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. > > Part of D110978 > > Differential Revision: https://reviews.llvm.org/D144958 This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d. It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.	2023-03-10 14:40:12 +01:00
Alexey Bataev	a611b3f305	[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes. But the compiler may do the same for other gather/buildvector nodes too, just need to check the dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet. Part of D110978 Differential Revision: https://reviews.llvm.org/D144958	2023-03-07 12:45:40 -08:00
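a611b3f305, reverted in 3b3a4c270b and re-committed as 93a9be0cea, lets SLP build a later gather/buildvector node by shuffling an earlier one. A Python sketch of the core check, under simplifying assumptions:

- Scalars are compared by name.
- The earlier node is treated as a single source vector.
- The hypothetical `reuse_mask` returns a shuffle mask only when every scalar of the new node already appears in it.

The sketch omits the constraint highlighted by the revert: the source node must be emitted before, and dominate, the shuffle that reuses it.

```python
from typing import Dict, List, Optional, Sequence

def reuse_mask(source: Sequence[str], target: Sequence[str]) -> Optional[List[int]]:
    """Return a mask m with target[i] == source[m[i]] for every lane,
    or None if some target scalar is not present in source."""
    position: Dict[str, int] = {}
    for lane, scalar in enumerate(source):
        position.setdefault(scalar, lane)  # first occurrence wins
    mask: List[int] = []
    for scalar in target:
        if scalar not in position:
            return None  # would still need a real insertelement
        mask.append(position[scalar])
    return mask

gather_a = ["%a", "%b", "%c", "%d"]
print(reuse_mask(gather_a, ["%d", "%d", "%a", "%b"]))  # [3, 3, 0, 1]
print(reuse_mask(gather_a, ["%d", "%e"]))              # None
```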
Nikita Popov	580210a0c9	[SLP] Convert some tests to opaque pointers (NFC)	2022-12-23 10:02:57 +01:00
Roman Lebedev	6697140ba1	[NFC] Port all SLPVectorizer tests to `-passes=` syntax	2022-12-07 21:44:09 +03:00
Alexey Bataev	b505fd559d	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-10 10:59:54 -08:00
skc7	42bce72536	Reapply "[SLP] Extend reordering data of tree entry to support PHInodes". Reapplies 87a2086 (which was reverted in 656f1d8). Fix for scalable vectors in getInsertIndex merged in 46d53f4. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D137537	2022-11-08 21:21:28 +05:30
Alexey Bataev	ecd0b5a532	Revert "[SLP]Redesign vectorization of the gather nodes." This reverts commit 8ddd1ccdf89317be1c40fa9183e214878a56151e to fix buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839	2022-11-07 08:35:21 -08:00
Alexey Bataev	8ddd1ccdf8	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-07 07:04:38 -08:00
David Green	656f1d8b74	Revert "[SLP] Extend reordering data of tree entry to support PHI nodes" This reverts commit 87a20868eb2043420d48f591c3437472f7137834 as it has problems with scalable vectors and use-list orders. Test to follow.	2022-11-06 11:43:51 +00:00
skc7	87a20868eb	[SLP] Extend reordering data of tree entry to support PHI nodes Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D136757	2022-11-01 04:50:04 +00:00
skc7	e98501e27e	[SLP][NFC] Added test to check resulting mask in shufflevector as per order of phinodes Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D136553	2022-10-23 17:20:47 +00:00
Bjorn Pettersson	3be72f4029	[test][SLPVectorizer] Use -passes syntax in RUN lines. NFC	2022-10-13 10:44:38 +02:00
Alexey Bataev	7d8060bc19	[SLP]Improve reductions vectorization. The pattern matching and vectgorization for reductions was not very effective. Some of of the possible reduction values were marked as external arguments, SLP could not find some reduction patterns because of too early attempt to vectorize pair of binops arguments, the cost of consts reductions was not correct. Patch addresses these issues and improves the analysis/cost estimation and vectorization of the reductions. The most significant changes in SLP.NumVectorInstructions: Metric: SLP.NumVectorInstructions [140/14396] Program results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 920.00 3548.00 285.7% test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test 66.00 122.00 84.8% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 100.00 128.00 28.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 664.00 810.00 22.0% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 592.00 687.00 16.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 402.00 426.00 6.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1665.00 1745.00 4.8% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 135.00 139.00 3.0% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 135.00 139.00 3.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 388.00 397.00 2.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 895.00 914.00 2.1% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 240.00 244.00 1.7% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 240.00 244.00 1.7% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 820.00 832.00 1.5% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 820.00 832.00 1.5% test-suite :: 
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14804.00 14914.00 0.7% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 8125.00 8183.00 0.7% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1330.00 1338.00 0.6% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1330.00 1338.00 0.6% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9832.00 9880.00 0.5% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 5267.00 5291.00 0.5% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 4018.00 4024.00 0.1% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 4018.00 4024.00 0.1% test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 426.00 424.00 -0.5% test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 426.00 424.00 -0.5% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 201.00 192.00 -4.5% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 201.00 192.00 -4.5% 644.nab_s and 544.nab_r - reduced number of shuffles but increased number of useful vectorized instructions. 641.leela_s and 541.leela_r - the function `@_ZN9FastBoard25get_pattern3_augment_specEiib` is not inlined anymore but its body gets vectorized successfully. Before, the function was inlined twice and vectorized just after inlining, currently it is not required. The vector code looks pretty similar, just like as it was before. Differential Revision: https://reviews.llvm.org/D111574	2022-05-18 13:22:18 -07:00
Philip Reames	48cc9287f5	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3) The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. 
Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-25 10:39:23 -07:00
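The change relanded in 48cc9287f5 restricts scheduling to the instructions being vectorized plus their transitive users, instead of every instruction in the window. The Python sketch below computes that set with a worklist over a def-use map. The region representation (a dict from instruction name to its users) and the helper `schedulable_subgraph` are assumptions for illustration, not SLP's data structures.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def schedulable_subgraph(users: Dict[str, List[str]],
                         bundle: Iterable[str]) -> Set[str]:
    """Return the bundle plus every transitive user reachable via def-use edges."""
    seen: Set[str] = set()
    work = deque(bundle)
    while work:
        inst = work.popleft()
        if inst in seen:
            continue
        seen.add(inst)
        work.extend(users.get(inst, []))
    return seen

# Region: two loads feed an add; "%unrelated" never touches the bundle.
users = {
    "%l0": ["%add"],
    "%l1": ["%add"],
    "%add": ["%store"],
    "%unrelated": ["%other"],
}
print(sorted(schedulable_subgraph(users, ["%l0", "%l1"])))
# ['%add', '%l0', '%l1', '%store']
```

Instructions such as `%unrelated` are never visited. This is where the compile-time saving described in the commit message comes from.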
Philip Reames	deae979a2c	Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""" This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.	2022-03-03 11:35:34 -08:00
Philip Reames	738042711b	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. 
This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-02 10:47:20 -08:00
Arthur Eubanks	9c6250ee41	Revert "[SLP] Schedule only sub-graph of vectorizable instructions" This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d. Causes a miscompile, see comments on D118538. Required updating bottom-to-top-reorder.ll.	2022-03-01 17:31:16 -08:00
Philip Reames	0539a26d91	[SLP] Schedule only sub-graph of vectorizable instructions SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. 
While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-02-22 10:15:55 -08:00
Philip Reames	15f7857412	[tests] Refresh autogen tests for SLP	2022-01-24 17:05:58 -08:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add new type of tree node for `InsertElementInst` chain forming vector. These instructions could be either removed, or replaced by shuffles during vectorization and we can add this node to cost model, so naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this patch is the first step towards revectorization of partially vectorization (to fix PR42022 completely). After adding inserts to tree the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
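ab2c499d3a adds a tree node for a chain of `insertelement` instructions that forms a vector. A rough Python sketch of recognizing such a chain works in three steps:

1. Walk from the last insert back to its base vector.
2. Record which scalar lands in each lane.
3. Report whether every lane is written, which is the case where the chain can be replaced by vector code.

The tuple encoding of instructions and the `collect_buildvector` helper are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: result name -> (base vector, inserted scalar, lane).
Insert = Tuple[str, str, int]

def collect_buildvector(chain: Dict[str, Insert], last: str,
                        width: int) -> Optional[Dict[int, str]]:
    """Map lane -> scalar for the insertelement chain ending at `last`.
    Returns None unless each lane 0..width-1 is written exactly once."""
    lanes: Dict[int, str] = {}
    node = last
    while node in chain:
        base, scalar, lane = chain[node]
        if lane in lanes or not 0 <= lane < width:
            # Out of range, or an earlier (dead) write to an already-set lane:
            # keep the sketch simple and reject the chain.
            return None
        lanes[lane] = scalar
        node = base  # walk back towards the initial vector
    return lanes if len(lanes) == width else None

chain = {
    "%v0": ("poison", "%x", 0),
    "%v1": ("%v0", "%y", 1),
}
print(collect_buildvector(chain, "%v1", 2))  # {1: '%y', 0: '%x'}
print(collect_buildvector(chain, "%v0", 2))  # None (lane 1 never written)
```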
Anton Afanasyev	00a0595b25	[SLP][Test] Fix and precommit tests for D98714	2021-05-13 07:41:06 +03:00
Alexey Bataev	a3fd82c289	[SLP]Fix the crash on cost calculation if non-compatible vectors shuffled. If the extracts from the non-power-2 vectors are recognized as shuffles, need some extra checks to not crash cost calculations if trying to gext the ecost for subvector extracts. In this case need to check carefully that we do not exit out of bounds of the original vector, otherwise the TTI's cost model will crash on assert. Differential Revision: https://reviews.llvm.org/D101477	2021-04-30 09:34:20 -07:00
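a3fd82c289 guards the cost query for extracts recognized as a subvector shuffle, so that the requested range never runs past the end of the original, possibly non-power-of-2, vector. For example, a 4-element subvector starting at lane 4 fits an 8-element vector (4 + 4 <= 8) but not a 6-element one (4 + 4 > 6).

A minimal Python sketch of that bounds check is below. The `subvector_extract` helper and its contiguous-run requirement are assumptions, not the SLP code.

```python
from typing import Optional, Sequence, Tuple

def subvector_extract(indices: Sequence[int], src_width: int
                      ) -> Optional[Tuple[int, int]]:
    """If the extract indices form a contiguous run, return (start, length),
    provided the run fits inside a source vector of src_width elements."""
    if not indices:
        return None
    start, length = indices[0], len(indices)
    if list(indices) != list(range(start, start + length)):
        return None  # not a contiguous, in-order run
    if start < 0 or start + length > src_width:
        return None  # would read past the end of the source vector
    return start, length

print(subvector_extract([4, 5], 6))        # (4, 2)
print(subvector_extract([4, 5, 6, 7], 6))  # None
print(subvector_extract([4, 5, 6, 7], 8))  # (4, 4)
```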
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
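79b1b4a581 stops offering shuffle-based reduction IR in place of the reduction intrinsics, leaving the ExpandReductions IR pass as the way to get the shuffle form back. The classic shuffle expansion for a power-of-2 width halves the vector log2(N) times. Each step combines the upper half with the lower half.

A Python sketch of that pattern follows, with lists standing in for vectors; `shuffle_reduce` is hypothetical. It also shows why the expansion reassociates the operation. That is always fine for integer add, but for fadd it needs reassociation to be allowed.

```python
import functools
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")

def shuffle_reduce(vec: List[T], op: Callable[[T, T], T]) -> T:
    """Halving reduction: each step combines lane i with lane i + half."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("width must be a non-zero power of 2")
    while len(vec) > 1:
        half = len(vec) // 2
        # Models a shufflevector moving the upper half down, then one vector op.
        vec = [op(vec[i], vec[i + half]) for i in range(half)]
    return vec[0]

print(shuffle_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add))  # 36

# The pairing reassociates the sum, which matters for floating point.
vals = [1e16, 1.0, -1e16, 1.0]
print(shuffle_reduce(vals, operator.add))     # 2.0
print(functools.reduce(operator.add, vals))  # 1.0 (strict left-to-right)
```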
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	db7a2f347f	Precommit transform tests that have poison as insertelement's placeholder This commit copies existing tests at llvm/Transforms and replaces 'insertelement undef' in those files with 'insertelement poison'. (see https://reviews.llvm.org/D93586) Tests listed using this script: grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq | wc -l Tests updated: file_org=llvm/test/Transforms/$1 file=${file_org%.ll}-inseltpoison.ll cp $file_org $file sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' $file head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # I manually updated the script exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-24 11:46:17 +09:00

72 Commits