llvm-project

Author	SHA1	Message	Date
Yaxun (Sam) Liu	9d5adc7e49	Revert "reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost" This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1. Revert this patch since reviewers have different opinions regarding the approach in post-commit review. Will open RFC for further discussion. Differential Revision: https://reviews.llvm.org/D132408	2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu	bd7949bcd8	reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost Fixed compile time increase due to always constructing LocalCostTracker. Now only construct LocalCostTracker when needed.	2022-10-24 15:43:53 -04:00
bipmis	38f3e44997	[AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns. Differential Revision: https://reviews.llvm.org/D135137	2022-10-19 11:22:58 +01:00
bipmis	82e3056255	Add test for combinations of four i8-loads spliced into a 32-bit value	2022-10-18 15:40:56 +01:00
Nikita Popov	627a0c6b40	[PhaseOrdering] Name instructions in test (NFC) Run through opt -instnamer.	2022-10-05 17:04:11 +02:00
Simon Pilgrim	5849fcb635	Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions" Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI." Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds" Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.	2022-09-30 11:22:48 +01:00
Simon Pilgrim	1b7089fe67	[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs. I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first. This replaces my initial attempt in D124769. Differential Revision: https://reviews.llvm.org/D134605	2022-09-27 14:49:07 +01:00
Sanjay Patel	271f3b91bb	[PhaseOrdering] add test for issue #50778 ; NFC Several different passes are involved to get the expected IR, and we don't want that to break again.	2022-09-23 12:12:13 -04:00
Sanjay Patel	34f8112b79	Revert "[PhaseOrdering] add test for issue #50778 ; NFC" This reverts commit cdc012fa2696434689c872abc4797a1ee8284ddf. This accidentally deleted a test file (not sure how that became part of the commit).	2022-09-23 12:06:29 -04:00
Sanjay Patel	cdc012fa26	[PhaseOrdering] add test for issue #50778 ; NFC Several different passes are involved to get the expected IR, and we don't want that to break again.	2022-09-23 12:03:03 -04:00
Nikita Popov	dd61726d5b	Revert "[SimplifyCFG] accumulate bonus insts cost" This reverts commit e5581df60a35fffb0c69589777e4e126c849405f. This causes major compile-time regressions, about 2-3% end-to-end on CTMark.	2022-09-19 14:46:43 +02:00
Yaxun (Sam) Liu	e5581df60a	[SimplifyCFG] accumulate bonus insts cost SimplifyCFG folds bool foo() { if (cond1) return false; if (cond2) return false; return true; } as bool foo() { if (cond1 | cond2) return false; return true; } 'cond2' is called 'bonus insts' in branch folding since it introduces overhead: the original CFG could exit early, but the folded CFG always executes it. SimplifyCFG calculates the cost of the 'bonus insts' of folding a BB into its predecessor BB which shares the destination. If it is below bonus-inst-threshold, SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed. When SimplifyCFG calculates the cost of 'bonus insts', it only considers the 'bonus insts' in the current BB being considered for folding. This causes an issue for unrolled loops which share destinations, e.g. bool foo(int *a) { for (int i = 0; i < 32; i++) if (a[i] > 0) return false; return true; } After unrolling, it becomes bool foo(int *a) { if (a[0] > 0) return false; if (a[1] > 0) return false; //... if (a[31] > 0) return false; return true; } SimplifyCFG will merge each BB with its predecessor BB, and ends up with 32 'bonus insts' which are always executed, which is much slower than the original CFG. The root cause is that SimplifyCFG does not consider the accumulated cost of 'bonus insts' which are folded from different BBs. This patch fixes that by introducing a ValueMap to track costs of 'bonus insts' coming from different BBs into the same BB, and cuts off if the accumulated cost exceeds a threshold. Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault Differential Revision: https://reviews.llvm.org/D132408	2022-09-18 20:21:14 -04:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Valery N Dmitriev	18dde772d6	[SLP] Unify main/alternate selection for CmpInst instructions Make main/alternate operation selection logic for CmpInst consistent across SLP vectorizer. Differential Revision: https://reviews.llvm.org/D133430	2022-09-13 09:20:25 -07:00
Simon Pilgrim	626a84db47	[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers	2022-09-04 17:59:08 +01:00
Simon Pilgrim	8dc99180a6	[PhaseOrdering] Move X86 unsigned-multiply-overflow-check.ll test under X86	2022-09-04 17:54:32 +01:00
Simon Pilgrim	3edec9ba60	[CostModel][X86] Support cost kind specific look up tables (REAPPLIED) Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost; this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once it's matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. REAPPLIED: Added early out to prevent getCmpSelInstrCost being used for anything but generic integer/float scalar/vector types - getTypeLegalizationCost can't handle the "exotic" TypeID enums that some passes attempt to get costs for (aggregates etc.). Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 16:49:17 +01:00
Benjamin Kramer	ab85996e47	Revert "[CostModel][X86] Support cost kind specific look up tables" This reverts commit 45846854a2c1414c27bc819033f6de588dea56fe. This triggers an assertion failure during Clang selfhost Unknown type! UNREACHABLE executed at llvm/lib/CodeGen/ValueTypes.cpp:548! * SIGABRT received by PID 6107 (TID 6107) on cpu 218 from PID 6107; stack trace: * @ 0x556c8827c2d1 64 llvm::llvm_unreachable_internal() @ 0x556c82a5542a 32 llvm::MVT::getVT() @ 0x556c82a54a28 80 llvm::EVT::getEVT() @ 0x556c7dda1526 80 llvm::TargetLoweringBase::getValueType() @ 0x556c8174dd38 112 llvm::BasicTTIImplBase<>::getTypeLegalizationCost() @ 0x556c81755e72 144 llvm::X86TTIImpl::getCmpSelInstrCost() @ 0x556c8174cadf 512 llvm::TargetTransformInfoImplCRTPBase<>::getInstructionCost() @ 0x556c84ab4dd2 32 llvm::TargetTransformInfo::getInstructionCost() @ 0x556c82ead283 1968 llvm::sinkRegion()	2022-08-25 15:42:44 +02:00
Simon Pilgrim	2e5f16516a	[CostModel][X86] Add CodeSize handling for fdiv ops Eventually this will be part of the cost table lookup	2022-08-25 14:08:03 +01:00
Simon Pilgrim	45846854a2	[CostModel][X86] Support cost kind specific look up tables Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost; this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once it's matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 12:23:36 +01:00
Jay Foad	2754ff883d	[InstCombine] Try not to demand low order bits for Add Don't demand low order bits from the LHS of an Add if: - they are not demanded in the result, and - they are known to be zero in the RHS, so they can't possibly overflow and affect higher bit positions This is intended to avoid a regression from a future patch to change the order of canonicalization of ADD and AND. Differential Revision: https://reviews.llvm.org/D130075	2022-08-22 20:03:53 +01:00
Simon Pilgrim	53c0be28a7	[PhaseOrdering][X86] Regenerate vdiv.ll Noticed while cleaning up x86 cost tables for upcoming cost kind support and it affected this test	2022-08-21 13:39:55 +01:00
Simon Pilgrim	08d153d806	[ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range (PR38280) If computeKnownBits encounters a phi node, and we fail to determine any known bits through direct analysis, see if the incoming value is part of a branch condition feeding the phi. Handle cases where icmp(IncomingValue PRED Constant) is driving a branch instruction feeding that phi node - at the moment this only handles EQ/ULT/ULE predicate cases as they are the most straightforward to handle and most likely for branch-loop 'max upper bound' cases - we can extend this if/when necessary. I investigated a more general icmp(LHS PRED RHS) KnownBits system, but the hard limits we put on value tracking depth through phi nodes meant that we were mainly catching constants anyhow. Fixes the pointless vectorization in PR38280 / Issue #37628 (excessive unrolling still needs handling though) Differential Revision: https://reviews.llvm.org/D131838	2022-08-16 16:54:44 +01:00
Florian Hahn	1638ad1ebf	[PhaseOrdering] Add test showing excessive unrolling of vector loop. Test cases based on #42332 showing excessive unrolling with both known and runtime trip counts.	2022-08-16 16:29:15 +01:00
Simon Pilgrim	c771d0fd4b	[Instcombine] Add (simplified) pointless loop unroll / vectorization test for Issue #37628	2022-08-13 14:34:08 +01:00
Sanjay Patel	bfb9b8e075	[Passes] add a tail-call-elim pass near the end of the opt pipeline We call tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that get added later. In the motivating case from issue #47852, the missing 'tail' on memset leads to sub-optimal codegen. I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile-time: +0.15% vs. +0.08% time. "tailcall" shows adding the pass; "tailcall2" shows moving the pass to later, then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc ): https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright Note that there was an effort to split the tail call functionality into 2 passes - that could help reduce compile-time if we find that this change costs more in compile-time than expected based on the preliminary testing: D60031 Differential Revision: https://reviews.llvm.org/D130374	2022-07-25 15:25:47 -04:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Nikita Popov	1721ff1dfd	[GVN] Enable enable-split-backedge-in-load-pre option by default This option was added in D89854. It prevents GVN from performing load PRE in a loop, if doing so would require critical edge splitting on the backedge. From the review: > I know that GVN Load PRE negatively impacts peeling, > loop predication, so the passes expecting that latch has > a conditional branch. In the PhaseOrdering test in this patch, splitting the backedge negatively affects vectorization: After critical edge splitting, the loop gets rotated, effectively peeling off the first loop iteration. The effect is that the first element is handled separately, then the bulk of the elements use a vectorized reduction (but using unaligned, off-by-one memory accesses) and then a tail of 15 elements is handled separately again. It's probably worth noting that the loop load PRE from D99926 is not affected by this change (as it does not need backedge splitting). This is about normal load PRE that happens to occur inside a loop. Differential Revision: https://reviews.llvm.org/D126382	2022-05-30 09:55:58 +02:00
Nikita Popov	6346a026af	[PhaseOrdering] Add test for unprofitable loop load PRE backedge splitting (NFC)	2022-05-25 16:06:41 +02:00
Nikita Popov	b2a13d3e2d	[InstCombine] Use IRBuilder in freeze pushing transform (PR55619) Use IRBuilder so that the newly created freeze instructions automatically get inserted back into the IC worklist. The changed worklist processing order leads to some cosmetic differences in tests. Fixes https://github.com/llvm/llvm-project/issues/55619.	2022-05-24 15:48:28 +02:00
Florian Hahn	b7315ffc3c	[LAA,LV] Add initial support for pointer-diff memory checks. This patch adds initial support for a pointer diff based runtime check scheme for vectorization. This scheme requires fewer computations and checks than the existing full overlap checking, if it is applicable. The main idea is to only check if source and sink of a dependency are far enough apart so the accesses won't overlap in the vector loop. To do so, it is sufficient to compute the difference and compare it to `VF * UF * AccessSize`. It is sufficient to check `(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards dependence in the vector loop with the given VF and UF. If Src >=u Sink, there is no dependence preventing vectorization, hence the overflow should not matter and using the ULT should be sufficient. Note that the initial version is restricted in multiple ways: 1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information) 2. Source and sink pointers must be add-recs, with matching steps 3. The step must be a constant. 4. abs(step) == AccessSize. Most of those restrictions can be relaxed in the future. See https://github.com/llvm/llvm-project/issues/53590. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119078	2022-05-16 15:27:22 +01:00
Alexey Bataev	7ea03f0b4e	[SLP]Improve reductions analysis and emission, part 1. Currently SLP vectorizer walks through the instructions and selects 3 main classes of values: 1) reduction operations - instructions with same reduction opcode (add, mul, min/max, etc.), which build the reduction, 2) reduced values - instructions with the same opcodes, but different from the reduction opcode, 3) extra arguments - all other values, instructions from the different basic block rather than the root node, instructions with too many/few uses. This scheme is not very efficient. It excludes some instructions and all non-instruction values from the reductions (constants, proficient gathers), and too many possibly reduced values are marked as extra arguments. Patch improves this process by introducing a slightly extended analysis stage. During this stage, we still try to select 3 classes of the values: 1) reduction operations - same as before, 2) possibly reduced values - all instructions from the current block/non-instructions, which may build a vectorization tree, 3) extra arguments - instructions from the different basic blocks. Additionally, an extra sorting of the possibly reduced values occurs to build the scalar sequences which will most likely be vectorized, e.g. loads are grouped by the distance between them, constants are grouped together, cmp instructions are sorted by their compare types and predicates, extractelement instructions are sorted by the vector operand, etc. Also, these groups are reordered by their length so the longest group is the first in the list of the possibly reduced values. The vectorization process tries to emit the reductions for all these groups. These reductions, remaining non-vectorized possible reduced values and extra arguments are then combined into the final expression just like it was before. Differential Revision: https://reviews.llvm.org/D114171	2022-05-02 12:03:58 -07:00
Simon Pilgrim	6f80830f06	[PhaseOrdering][X86] Use passes="" instead of passes='' so DOS can evaluate the cmd lines Fix regenerating the tests on windows builds	2022-04-30 19:56:49 +01:00
Simon Pilgrim	c6994ec12e	[PhaseOrdering][X86] Use passes="default<O3>" instead of passes='default<O3>' so DOS can evaluate the cmd lines Fix regenerating the tests on windows builds	2022-04-30 19:53:07 +01:00
Nikita Popov	20cf4f8af8	[PhaseOrdering] Remove RUN lines for legacy PM (NFC)	2022-04-21 14:43:00 +02:00
Alexey Bataev	883571928c	Revert "[SLP]Improve reductions analysis and emission, part 1." This reverts commit 0e1f4d4d3cb08ff84df5adc4f5e41d0a2cebc53d to fix a crash reported in PR54976	2022-04-19 06:17:03 -07:00
Alexey Bataev	0e1f4d4d3c	[SLP]Improve reductions analysis and emission, part 1. Currently SLP vectorizer walks through the instructions and selects 3 main classes of values: 1) reduction operations - instructions with same reduction opcode (add, mul, min/max, etc.), which build the reduction, 2) reduced values - instructions with the same opcodes, but different from the reduction opcode, 3) extra arguments - all other values, instructions from the different basic block rather than the root node, instructions with too many/few uses. This scheme is not very efficient. It excludes some instructions and all non-instruction values from the reductions (constants, proficient gathers), and too many possibly reduced values are marked as extra arguments. Patch improves this process by introducing a slightly extended analysis stage. During this stage, we still try to select 3 classes of the values: 1) reduction operations - same as before, 2) possibly reduced values - all instructions from the current block/non-instructions, which may build a vectorization tree, 3) extra arguments - instructions from the different basic blocks. Additionally, an extra sorting of the possibly reduced values occurs to build the scalar sequences which will most likely be vectorized, e.g. loads are grouped by the distance between them, constants are grouped together, cmp instructions are sorted by their compare types and predicates, extractelement instructions are sorted by the vector operand, etc. Also, these groups are reordered by their length so the longest group is the first in the list of the possibly reduced values. The vectorization process tries to emit the reductions for all these groups. These reductions, remaining non-vectorized possible reduced values and extra arguments are then combined into the final expression just like it was before. Differential Revision: https://reviews.llvm.org/D114171	2022-04-12 17:46:11 -07:00
Muhammad Omair Javaid	a96638e50e	Revert "[NFCI] Regenerate PhaseOrdering test checks" This reverts commit e91fe08999d5f5d7e7777837c529bac692d06c1b. Breaks following buildbots: https://lab.llvm.org/buildbot/#/builders/171	2022-04-04 15:30:57 +05:00
Dávid Bolvanský	e91fe08999	[NFCI] Regenerate PhaseOrdering test checks	2022-04-04 00:28:57 +02:00
Simon Pilgrim	5dde9c1286	[CostModel][X86] Reduce cost of extracting bool vector elements For constant indices, these are now just a MOVMSK+TEST/BT	2022-03-18 19:02:47 +00:00
Nikita Popov	a266af7211	[InstCombine] Canonicalize SPF to min/max intrinsics Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max. Once this sticks, we can stop matching SPF min/max in various places, and can remove hacks we have for preventing infinite loops and breaking of SPF canonicalization. Differential Revision: https://reviews.llvm.org/D98152	2022-02-24 09:01:20 +01:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents an instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, and discarding it results in performance losses. This PR modifies LICM to accept a "speculative" parameter which allows LICM to be set to perform information-losing speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Roman Lebedev	07cf95942f	[NFC][PhaseOrdering] Improve test coverage for D119975	2022-02-17 14:58:22 +03:00
Roman Lebedev	a5b9987aab	[NFC][PhaseOrdering] spurious-peeling.ll: also test -O1/-O2 results	2022-02-17 00:18:53 +03:00
William S. Moses	73ee82871e	[NFC][PhaseOrdering] Precommit tests from D119965	2022-02-17 00:18:53 +03:00
Roman Lebedev	ee4ba9f3a1	Revert "[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`." Unfortunately, it seems we really do need to take the long route; start from the "merge" block, find (all the) "dispatch" blocks, and deal with each "dispatch" block separately, instead of simply starting from each "dispatch" block like it would logically make sense, otherwise we run into a number of other missing folds around `switch` formation, missing sinking/hoisting and phase ordering. This reverts commit 85628ce75b3084dc0f185a320152baf85b59aba7. This reverts commit c5fff9095342a792bf4b9a077fe3c3a83c4e566c. This reverts commit 34a98e1046e3aa55e5f26ab20a15e96b4034d25a. This reverts commit 1e353f092288309d74d380367aa50bbd383780ed.	2022-02-03 12:32:50 +03:00
Alexey Bataev	8a1dfbc4d8	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 842a2360a84692f2e4c37cc3e652640e6627d004 to fix the bugs reported by users in https://reviews.llvm.org/D115955#3291538.	2022-02-02 12:06:36 -08:00
Alexey Bataev	842a2360a8	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with same/swapped predicate but different (swapped) operand kinds or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-02 10:32:52 -08:00
Roman Lebedev	1e353f0922	[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`. The current `FoldTwoEntryPHINode()` is not quite designed correctly. It starts from the merge point, and then tries to detect the 'divergence' point. Because of that, it is limited to the simple two-predecessor case, where the PHI completely goes away. But that is rather pessimistic, and it doesn't make much sense from the costmodel side of things. For example if there is some other unrelated predecessor of the merge point, we could split the merge point so that the then/else blocks first branch to an empty block and then to the merge point, and then we'd be able to speculate the then/else code. But if we'd instead simply start at the divergence point, and look for the merge point, then we'll just natively support this case. There's also the fact that `SpeculativelyExecuteBB()` already does just that, but only if there is a single block to speculate, and with a much more restrictive cost model. But that also means we have code duplication. Now, sadly, while this is as much NFCI as possible, there is just no way to cleanly migrate to the proper implementation. The results are going to be different somewhat because of various phase ordering effects and SimplifyCFG block iteration strategy.	2022-02-02 17:53:56 +03:00
Benjamin Kramer	0c3d22a592	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 83620bd2ad867f706c699d0f2b8be10e43d9f3d7. It's causing miscompilations, see review comments at https://reviews.llvm.org/D115955	2022-02-02 13:08:51 +01:00
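
Note on commit b7315ffc3c ([LAA,LV] pointer-diff memory checks): the comparison `(Sink - Src) <u VF * UF * AccessSize` quoted in that message can be sketched in plain C++ as below. This is only an illustration under stated assumptions, not LLVM's implementation. The function name `pointerDiffCheckFails` and its parameters are hypothetical, pointers are modelled as 64-bit unsigned addresses, and the product `VF * UF * AccessSize` is assumed not to overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): returns true when the pointer-diff
// runtime check reports a possible conflict, i.e. the sink lies fewer than
// VF * UF * AccessSize bytes after the source.
bool pointerDiffCheckFails(std::uint64_t src, std::uint64_t sink,
                           std::uint64_t vf, std::uint64_t uf,
                           std::uint64_t accessSize) {
  assert(vf > 0 && uf > 0 && accessSize > 0 && "factors must be non-zero");
  // Unsigned subtraction wraps around when src > sink, which produces a very
  // large distance, so that case is reported as "no conflict".
  const std::uint64_t distance = sink - src;
  // Unsigned less-than, corresponding to the `<u` in the commit message.
  return distance < vf * uf * accessSize;
}
```

With VF = 4, UF = 2 and 4-byte accesses the threshold is 32 bytes: a sink 64 bytes after the source passes the check, a sink 16 bytes after it does not, and a sink located before the source passes because the subtraction wraps.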

178 Commits