llvm-project

Author	SHA1	Message	Date
Matt Arsenault	fbeda975d2	InstCombine: Drop some typed pointer cast handling	2023-07-31 10:34:31 -04:00
Alexey Bataev	662efdee9b	[SLP][NFC]Improve handling of MinBWs container, NFC. Replaced by DenseMap instead of MapVector(the order is not important, just lookup is used) + reduced number of lookups.	2023-07-31 07:26:55 -07:00
Nikita Popov	72ec2c007e	[InstCombine] Fix handling of irreducible loops (PR64259) Fixes a regression introduced by D75362 for irreducible control flow. In that case, we may visit the predecessor that renders the current block live only later, and incorrectly determine that a block is dead. Instead, switch to using the same DeadEdges based implementation we also use during the main InstCombine iteration. This temporarily regresses some cases that need replacement of dead phi operands with poison, which is currently only done during the main run, but not worklist population. This will be addressed in a followup, to keep it separate from the correctness fix here. Fixes https://github.com/llvm/llvm-project/issues/64259.	2023-07-31 16:20:22 +02:00
Alexey Bataev	85635c7f60	[SLP][NFC]Use ScalarTy consistently in getEntryCost, NFC.	2023-07-31 06:52:56 -07:00
Nikita Popov	09156b36c6	[InstCombine] Move worklist preparation into InstCombinerImpl (NFC)	2023-07-31 15:18:12 +02:00
Matt Arsenault	d74c89fdb4	InstCombine: Drop some typed pointer bitcasts	2023-07-31 08:05:58 -04:00
Matt Arsenault	d388222be2	InstCombine: Drop some typed pointer bitcast handling	2023-07-31 08:05:12 -04:00
Nikita Popov	41895843b5	[InstCombine] Only perform one iteration InstCombine is a worklist-driven algorithm, which works roughly as follows: * All instructions are initially pushed to the worklist. The initial order is in RPO program order. * All newly inserted instructions get added to the worklist. * When an instruction is folded, its users get added back to the worklist. * When the use-count of an instruction decreases, it gets added back to the worklist. * And a few other heuristics on when we should revisit instructions. On top of the worklist algorithm, InstCombine layers an additional fix-point iteration: If any fold was performed in the previous iteration, then InstCombine will re-populate the worklist from scratch and fold the entire function again. This continues until a fix-point is reached. In the vast majority of cases, InstCombine will reach a fix-point within a single iteration. However, a second iteration is performed to verify that this is indeed the fixpoint. We can see this in the statistics for llvm-test-suite: "instcombine.NumOneIteration": 411380, "instcombine.NumTwoIterations": 117921, "instcombine.NumThreeIterations": 236, "instcombine.NumFourOrMoreIterations": 2, The way to read these numbers is that in 411380 cases, InstCombine performs no folds. In 117921 cases it performs a fold and reaches the fix-point within one iteration (the second iteration verifies the fixpoint). In the remaining 238 cases, more than one iteration is needed to reach the fixpoint. In other words, only in 0.04% of cases are additional iterations needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine performs a completely useless extra iteration to verify the fix point. This patch removes the fixpoint iteration from InstCombine, and always performs only a single iteration. This results in a major compile-time improvement of around 4% at negligible codegen impact. This explicitly does accept that we will not reach a fixpoint in all cases.
However, this is mitigated by two factors: First, the data suggests that this happens very rarely in practice. Second, InstCombine runs many times during the optimization pipeline (8 times even without LTO), so there are many chances to recover such cases. In order to prevent accidental optimization regressions in the future, this implements a verify-fixpoint option, which is enabled by default when instcombine is specified in -passes and disabled when InstCombinePass() is constructed from C++. This means that test cases need to explicitly use the no-verify-fixpoint option if they fail to reach a fixed point (for a well understood reason we cannot / do not want to avoid). Differential Revision: https://reviews.llvm.org/D154579	2023-07-31 10:56:49 +02:00
Alexandros Lamprineas	893d3a61c0	Reland [FuncSpec] Add Phi nodes to the InstCostVisitor. This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. Differential Revision: https://reviews.llvm.org/D154852	2023-07-31 08:25:48 +01:00
Noah Goldstein	09b6765e7d	[InstCombine] Remove trailing space in comment; NFC	2023-07-30 18:05:56 -05:00
Nikita Popov	ad7f02010f	[InstCombine] Process blocks in RPO InstCombine currently processes blocks in an arbitrary depth-first order. This can break the usual invariant that the operands of an instruction should be simplified before the instruction itself, if uses across basic blocks (particularly inside phi nodes) are involved. This patch switches the initial worklist population to use RPO instead, which will ensure that predecessors are visited before successors (back-edges notwithstanding). This allows us to fold more cases within a single InstCombine iteration, in preparation for D154579. This change by itself is a minor compile-time regression of about 0.1%, which will be more than recovered by switching to single-iteration InstCombine. Differential Revision: https://reviews.llvm.org/D75362	2023-07-30 18:38:45 +02:00
Florian Hahn	822c749aec	[LV] Shrink operands before creating new instr to force eval order. Shrink operands before creating the new instruction to make sure the same evaluation order is used on all platforms. This fixes buildbot failures due to different argument evaluation order on different systems.	2023-07-30 17:16:37 +01:00
Aleksandr Popov	236e6787de	[IRCE] Add NSW to OverflowingBinaryOperator but not BinaryOperator Fix incorrect setting NSW flag to non-overflowing indvar base (D154954) Reviewed By: danilaml Differential Revision: https://reviews.llvm.org/D156577	2023-07-30 11:14:26 +02:00
Nuno Lopes	eb1617a582	[InstCombineVectorOps] Use poison instead of undef as placeholder [NFC] It's used to create a vector where only 1 element is used While at it, change OOB extractelement to yield poison per LangRef	2023-07-29 15:28:13 +01:00
Aaron Ballman	1a53b5c367	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292. Addressing issues found by: https://lab.llvm.org/buildbot/#/builders/245/builds/11732 https://lab.llvm.org/buildbot/#/builders/187/builds/12251 https://lab.llvm.org/buildbot/#/builders/186/builds/11099 https://lab.llvm.org/buildbot/#/builders/182/builds/6976	2023-07-28 09:41:38 -04:00
Alexey Bataev	48bc5b0a29	[SLP][PR64099]Fix unsound undef to poison transformation when handling insertelement instructions. If the original vector has undef (not poison) values that are not rewritten by later insertelement instructions, we need to transform the shuffle with the undef vector, not a poison vector, and with actual indices, not PoisonMaskElem; otherwise the transformation may produce more poison in the output than in the input.	2023-07-27 16:09:49 -07:00
William Huang	66ba71d913	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as key, so that existing code still works. It will check for MD5 collision (unlikely but not too unlikely, since we only take the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5)) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensured an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen).
Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-07-27 23:08:27 +00:00
Noah Goldstein	edf2e0e075	[InstCombine] Folding `@llvm.ptrmask` with itself `@llvm.ptrmask` is basically just `and` with a `ptr` operand. This is a trivial combine to do with `and` (many others could also be added). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D154006	2023-07-27 17:43:08 -05:00
Douglas Yung	32683b231e	Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor." This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35. The test in this change was failing on many buildbots: https://lab.llvm.org/buildbot/#/builders/164/builds/41292 https://lab.llvm.org/buildbot/#/builders/258/builds/4491 https://lab.llvm.org/buildbot/#/builders/192/builds/3566 https://lab.llvm.org/buildbot/#/builders/123/builds/20411 https://lab.llvm.org/buildbot/#/builders/58/builds/42553 https://lab.llvm.org/buildbot/#/builders/247/builds/7037 https://lab.llvm.org/buildbot/#/builders/139/builds/46259 https://lab.llvm.org/buildbot/#/builders/216/builds/24650 https://lab.llvm.org/buildbot/#/builders/234/builds/12571 https://lab.llvm.org/buildbot/#/builders/232/builds/12574 https://lab.llvm.org/buildbot/#/builders/235/builds/975	2023-07-27 13:47:52 -07:00
Alexandros Lamprineas	96ff464dd3	[FuncSpec] Add Phi nodes to the InstCostVisitor. This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. In addition to the last revision this one fixes the bug reported on Phabricator. Differential Revision: https://reviews.llvm.org/D154852	2023-07-27 19:24:11 +01:00
spupyrev	bc59faa863	A new code layout algorithm for function reordering [2/3] We are bringing a new algorithm for function layout (reordering) based on the call graph (extracted from profile data). The algorithm is an improvement on top of a known heuristic, C^3. It tries to co-locate hot functions and functions that are frequently executed together in the resulting ordering. Unlike C^3, it explores a larger search space and has an objective closely tied to the performance of instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort. The algorithm can be used at the linking or post-linking (e.g., BOLT) stage. The algorithm shares some similarities with C^3 and an approach for basic block reordering (ext-tsp). It works with chains (ordered lists) of functions. Initially all chains are isolated functions. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the objective, which is a weighted combination of frequency-based and distance-based locality. That is, we try to co-locate hot functions together (so they can share the cache lines) and functions frequently executed together. The merging process stops when there is only one chain left, or when merging does not improve the objective. In the latter case, the remaining chains are sorted by density in the decreasing order. Complexity: We regularly apply the algorithm for large data-center binaries containing 10K+ (hot) functions, and the algorithm takes only a few seconds. For some extreme cases with 100K-1M nodes, the runtime is within minutes. Perf-impact: We extensively tested the implementation on a benchmark of isolated binaries and prod services. The impact is measurable for "larger" binaries that are front-end bound: the cpu time improvement (on top of C^3) is in the range of [0% .. 1%], which is a result of a reduced i-TLB miss rate (by up to 20%) and i-cache miss rate (up to 5%).
Reviewed By: rahmanl Differential Revision: https://reviews.llvm.org/D152834	2023-07-27 09:20:53 -07:00
Maksim Kita	cbfcf90152	[AggressiveInstCombine] Fold strcmp for short string literals with size 2 Fold strcmp for short string literals with size 2. Depends D155742. Differential Revision: https://reviews.llvm.org/D155743	2023-07-27 18:45:21 +03:00
Nikita Popov	70aca7b122	[InstCombine] Explicitly track dead edges This allows us to handle dead blocks with multiple incoming edges, where we can determine that all of those edges are dead (or cycles). This allows InstCombine to handle certain dead code patterns that can be produced by LoopVectorize in a single iteration. This is in preparation for D154579.	2023-07-27 16:41:03 +02:00
Ramkumar Ramachandra	23caf9e9e7	Local: fix debug output of replaceDominatedUsesWith() The debug output of replaceDominatedUsesWith() prints incorrect information, and the user is left confused about what exactly was replaced. Fix this. Differential Revision: https://reviews.llvm.org/D156318	2023-07-27 13:23:38 +01:00
witstorm95	77ef88d7ee	[Coroutines] Add an O(n) algorithm for computing the cross suspend point information. Fixed https://github.com/llvm/llvm-project/issues/62348 Propagate cross suspend point information by visiting CFG. Just only go through two times at most, you can get all the cross suspend point information. Before the patch: ``` n: 20000 4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata 552352maxresident)k 0inputs+8848outputs (0major+126254minor)pagefaults 0swaps n: 40000 11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata 1788404maxresident)k 0inputs+17600outputs (0major+431105minor)pagefaults 0swaps n: 60000 21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata 3809836maxresident)k 0inputs+26352outputs (0major+934749minor)pagefaults 0swaps n: 80000 37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata 6602396maxresident)k 0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps n: 100000 51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata 10210736maxresident)k 0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps ``` After the patch: ``` n: 20000 3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata 551736maxresident)k 0inputs+8848outputs (0major+126192minor)pagefaults 0swaps n: 40000 6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata 1787848maxresident)k 0inputs+17600outputs (0major+432212minor)pagefaults 0swaps n: 60000 9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata 3809108maxresident)k 0inputs+26352outputs (0major+931280minor)pagefaults 0swaps n: 80000 12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata 6603432maxresident)k 0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps n: 100000 16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata 10212808maxresident)k 0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps ``` Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D154695	2023-07-27 17:28:51 +08:00
Chuanqi Xu	97615ed2f0	Revert "[Coroutines] Add an O(n) algorithm for computing the cross suspend point" This reverts commit bb4121e65251275b5b16a63423c2bb2be79aeebb. Sorry for forgetting to add the Differential Revision information. It may be worth reverting this one and committing it again, given that this is a relatively big patch.	2023-07-27 17:27:45 +08:00
witstorm95	bb4121e652	[Coroutines] Add an O(n) algorithm for computing the cross suspend point information. Fixed https://github.com/llvm/llvm-project/issues/62348 Propagate cross suspend point information by visiting CFG. Just only go through two times at most, you can get all the cross suspend point information. Before the patch: ``` n: 20000 4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata 552352maxresident)k 0inputs+8848outputs (0major+126254minor)pagefaults 0swaps n: 40000 11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata 1788404maxresident)k 0inputs+17600outputs (0major+431105minor)pagefaults 0swaps n: 60000 21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata 3809836maxresident)k 0inputs+26352outputs (0major+934749minor)pagefaults 0swaps n: 80000 37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata 6602396maxresident)k 0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps n: 100000 51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata 10210736maxresident)k 0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps ``` After the patch: ``` n: 20000 3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata 551736maxresident)k 0inputs+8848outputs (0major+126192minor)pagefaults 0swaps n: 40000 6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata 1787848maxresident)k 0inputs+17600outputs (0major+432212minor)pagefaults 0swaps n: 60000 9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata 3809108maxresident)k 0inputs+26352outputs (0major+931280minor)pagefaults 0swaps n: 80000 12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata 6603432maxresident)k 0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps n: 100000 16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata 10212808maxresident)k 0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps ```	2023-07-27 17:25:32 +08:00
Florian Mayer	9a67c6beb2	[NFC] [HWASan] simplify code Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D156382	2023-07-26 17:05:19 -07:00
Florian Mayer	12982d250d	[NFC] [HWASan] remove unused include	2023-07-26 14:34:31 -07:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not made on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Maksim Kita	ac357a4773	[InstCombine] Fold icmp or sub chain ((x1 - y1) \| (x2 - y2)) == 0 Improve ((x1 ^ y1) \| (x2 ^ y2)) == 0 transform to also support sub ((x1 - y1) \| (x2 - y2)) == 0. Depends D155703. Differential Revision: https://reviews.llvm.org/D155704	2023-07-26 19:16:41 +03:00
Ivan Kosarev	e9df4c9892	[ADT] Support iterating size-based integer ranges. It seems the ranges start with 0 in most cases. Reviewed By: dblaikie, gchatelet Differential Revision: https://reviews.llvm.org/D156135	2023-07-26 16:28:41 +01:00
Alexandros Lamprineas	c52ab9ea2f	Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor." Reverting due to the crash reported in D154852. Also reverting the subsequent commit as collateral damage: "[FuncSpec] Split the specialization bonus into CodeSize and Latency."	2023-07-26 12:33:41 +01:00
Alexandros Lamprineas	20c8f58c11	[FuncSpec] Split the specialization bonus into CodeSize and Latency. Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency when estimating the specialization bonus. This is suboptimal, and in some cases erroneous. For example we shouldn't be weighting the codesize decrease attributed to constant propagation by the block frequency of the dead code. Instead only the latency savings should be weighted by block frequency. The total codesize savings from all the specialization arguments should be deducted from the specialization cost. Differential Revision: https://reviews.llvm.org/D155103	2023-07-26 12:03:46 +01:00
Johannes Doerfert	2f7ef7be1f	[Attributor][FIX] Swap cases in ternary op to avoid nullptr reference The case was wrong before, and somehow I only looked at the condition before.	2023-07-26 00:03:06 -07:00
Johannes Doerfert	a598e39063	[AAInterFnReachabilityFunction][NFC] Remove unused members	2023-07-26 00:03:06 -07:00
Johannes Doerfert	b3fec1067a	[Attributor] Improve NonNull deduction We can improve our deduction if we stop at PHI and select instructions and also iterate the returned values explicitly. The latter helps with isImpliedByIR deductions.	2023-07-25 20:31:21 -07:00
Johannes Doerfert	88b5d23021	[Attributor] Allow multiple LHS/RHS values when simplifying comparisons We used to deal with multiple values but not in the handleCmp function. Now we also allow multiple simplified operands there.	2023-07-25 20:31:21 -07:00
Johannes Doerfert	0cd8a28941	[Attributor][FIX] No IntraFnReachability does not mean unreachable Also, first check inter fn reachability as it seems to be cheaper in practise.	2023-07-25 17:47:33 -07:00
Johannes Doerfert	ba0be698c5	[Attributor][NFC] Rename variable to be less confusing	2023-07-25 17:47:33 -07:00
Johannes Doerfert	4223c9b354	[Attributor] Always deduce nosync from readonly + non-convergent This adds the deduction also if the function is not IPO amendable.	2023-07-25 17:47:33 -07:00
Johannes Doerfert	ae6ad31763	[Attributor] Try to remove argmem effects after signature rewrite If the new signature has only readnone pointers we can remove any argmem effect from the function signature.	2023-07-25 17:47:33 -07:00
Johannes Doerfert	e31724f1a6	[Attributor] Check readonly call sites for nosync in AANoSync See @nosync_convergent_callee_test() in nosync.ll. The other changes are call sites now annotated with `nosync`.	2023-07-25 17:47:33 -07:00
Florian Mayer	6cc9244baa	Enable hwasan-use-after-scope by default This has been in use for a long time without any issues. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D156267	2023-07-25 17:36:15 -07:00
Teresa Johnson	5986559caa	[SimplifyCFG] Guard branch folding by speculate blocks flag Guard FoldBranchToCommonDest in SimplifyCFG with the SpeculateBlocks flag as it can also speculate instructions. This was split out of D155997. Differential Revision: https://reviews.llvm.org/D156194	2023-07-25 06:46:19 -07:00
Matt Arsenault	e93ae281db	Attributor: Fix typo	2023-07-25 07:25:11 -04:00
Alexandros Lamprineas	59a5c582dc	[FuncSpec][NFC] Leave a comment for future improvements. Adds a TODO for checking inlining opportunities while traversing the users of the specialization arguments. This was brought up in the review of D154852.	2023-07-25 11:58:42 +01:00
Alexandros Lamprineas	03f1d09fe4	[FuncSpec] Add Phi nodes to the InstCostVisitor. This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. Differential Revision: https://reviews.llvm.org/D154852	2023-07-25 11:00:20 +01:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b c; short d } e; void f() { int g; char h; e i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Fangrui Song	e8e7a959c7	[SLP] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D154891	2023-07-24 09:47:50 -07:00
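
The percentages quoted in the message of commit 41895843b5 ("[InstCombine] Only perform one iteration") can be reproduced from the four llvm-test-suite counters it lists. The following is only an illustrative check and is not part of any commit; the variable names are made up for this sketch, and only the counter values come from the commit message.

```python
# Counter values copied from the commit message of 41895843b5.
# All variable names here are hypothetical, chosen for this sketch only.
iteration_counts = {
    "NumOneIteration": 411380,       # no folds performed
    "NumTwoIterations": 117921,      # folds done, second iteration only verifies
    "NumThreeIterations": 236,
    "NumFourOrMoreIterations": 2,
}

total_runs = sum(iteration_counts.values())

# Share of runs whose second iteration only confirmed the fixpoint.
verify_only_share = iteration_counts["NumTwoIterations"] / total_runs

# Runs that really needed more than one iteration to reach the fixpoint.
multi_iteration_runs = (
    iteration_counts["NumThreeIterations"]
    + iteration_counts["NumFourOrMoreIterations"]
)
multi_iteration_share = multi_iteration_runs / total_runs

print(f"total runs: {total_runs}")
print(f"verification-only extra iteration: {verify_only_share:.1%}")
print(f"more than one iteration needed: {multi_iteration_share:.2%}")
```

With these inputs the two shares round to the 22.3% and 0.04% figures stated in the commit message.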

34262 Commits