llvm-project

Author	SHA1	Message	Date
Alexey Bataev	ef7f6aca14	[SLP][NFC]Add some extra checks/reorganize the code to improve compile time, NFC.	2024-02-01 10:53:39 -08:00
Nikita Popov	62ae7d976f	[LoopUnroll] Fix missing sign extension For integers larger than 64-bit, this would zero-extend a -1 value, instead of sign-extending it. Fixes https://github.com/llvm/llvm-project/issues/80289.	2024-02-01 16:08:25 +01:00
Alexey Bataev	15295d0135	[SLP][NFC]Introduce and use computeCommonAlignment function, NFC.	2024-02-01 06:13:39 -08:00
Florian Hahn	da437330be	[SCEVExp] Keep NUW/NSW if both original inc and isomorphic inc agree. (#79512) We are replacing with a wider increment. If both OrigInc and IsomorphicInc are NUW/NSW, then we can preserve them on the wider increment; the narrower IsomorphicInc would wrap before the wider OrigInc, so the replacement won't make IsomorphicInc's uses more poisonous. PR: https://github.com/llvm/llvm-project/pull/79512	2024-02-01 11:01:29 +00:00
Philip Reames	f264da4322	[lsr][term-fold] Restrict transform to low cost expansions (#74747 ) This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. It's also now come up as a possibility in a regression twice in two days, so I'd like to get this in to close out the possibility if nothing else. The original review dropped the threshold for short trip count loops. I will return to that in a separate review if this lands.	2024-01-31 14:48:20 -08:00
Nikita Popov	4f32f5d572	[AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294 ) JumpThreading may perform AA queries while the dominator tree is not up to date, which may result in miscompilations. Fix this by adding a new AAQI option to disable the use of the dominator tree in BasicAA. Fixes https://github.com/llvm/llvm-project/issues/79175.	2024-01-31 15:23:53 +01:00
Florian Hahn	cec24f0d7e	[VPlan] Update stale test after 9536a6286, fix formatting.	2024-01-31 13:45:38 +00:00
Florian Hahn	9536a6286e	[VPlan] Preserve original induction order when creating scalar steps. Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.	2024-01-31 13:31:28 +00:00
Yingwei Zheng	817d0cb485	[InstCombine] Simplify commutative compares of symmetric pairs (#80134 ) Fixes #78038.	2024-01-31 21:21:27 +08:00
Nikita Popov	cb6240d247	[BDCE] Also drop poison-generating metadata The comment was incorrect: !range also applies to calls, and we do need to drop it in some cases.	2024-01-31 12:22:58 +01:00
Yingwei Zheng	50e80e06d1	[ValueTracking] Merge `cannotBeOrderedLessThanZeroImpl` into `computeKnownFPClass` (#76360 ) This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into `computeKnownFPClass` to improve the signbit inference. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-01-31 18:26:50 +08:00
Nikita Popov	b210cbbd0e	[BDCE] Fix clearing of poison-generating flags If the demanded bits of an instruction are full, we don't have to recurse to its users, but we may still have to clear flags on the instruction itself. Fixes https://github.com/llvm/llvm-project/issues/80113.	2024-01-31 11:24:13 +01:00
Yingwei Zheng	f292f90bc2	[InstCombine] Fold select with signbit idiom into fabs (#76342 ) This patch folds: ``` ((bitcast X to int) <s 0 ? -X : X) -> fabs(X) ((bitcast X to int) >s -1 ? X : -X) -> fabs(X) ((bitcast X to int) <s 0 ? X : -X) -> -fabs(X) ((bitcast X to int) >s -1 ? -X : X) -> -fabs(X) ``` Alive2: https://alive2.llvm.org/ce/z/rGepow	2024-01-31 15:42:09 +08:00
Yingwei Zheng	f2816ff60c	[InstCombine] Simplify and/or by replacing operands with constants (#77231 ) This patch tries to simplify `X \| Y` by replacing occurrences of `Y` in `X` with 0. Similarly, it tries to simplify `X & Y` by replacing occurrences of `Y` in `X` with -1. Alive2: https://alive2.llvm.org/ce/z/cNjDTR Note: As the current implementation is too conservative in the one-use checks, I cannot remove other existing hard-coded simplifications if they involves more than two instructions (e.g, `A & ~(A ^ B) --> A & B`). Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=a085402ef54379758e6c996dbaedfcb92ad222b5&to=9d655c6685865ffce0ad336fed81228f3071bd03&stat=instructions%3Au \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.01%\|-0.00%\|+0.00%\|-0.02%\|+0.01%\|+0.02%\|-0.01%\| Fixes #76554.	2024-01-31 14:30:55 +08:00
Yingwei Zheng	a034e65e97	[CVP] Check whether the default case is reachable (#79993 ) This patch eliminates unreachable default cases using context-sensitive range information.	2024-01-31 13:11:10 +08:00
Fangrui Song	9b91c54d9b	[msan] Unpoison indirect outputs for userspace using memset for large operands (#79924 ) Modify #77393 to clear shadow memory using `llvm.memset.` when the size is large, similar to `shouldUseBZeroPlusStoresToInitialize` in clang for `-ftrivial-auto-var-init=`. The intrinsic, if lowered to libcall, will use the msan interceptor. The instruction selector lowers a `StoreInst` to multiple stores, not utilizing `memset`. When the size is large (e.g. `store { [100 x i32] } zeroinitializer, ptr %12, align 1`), the generated code will be long (and `CodeGenPrepare::optimizeInst` will even crash for a huge size). ``` // Test stack size template <class T> void DoNotOptimize(const T& var) { // deprecated by https://github.com/google/benchmark/pull/1493 asm volatile("" : "+m"(const_cast<T&>(var))); } int main() { using LargeArray = std::array<int, 1000000>; auto large_stack = []() { DoNotOptimize(LargeArray()); }; /////// CodeGenPrepare::optimizeInst triggers an assertion failure when creating an integer type with a bit width>2*23 large_stack(); } ```	2024-01-30 13:45:47 -08:00
Alexey Bataev	285bc69846	[SLP]Fix PR80027: Fix costs processing for minbitwidth types. Need to switch the types, the destination is first in getCastInstrCost function.	2024-01-30 10:32:55 -08:00
Alexey Bataev	976374d982	[SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, NFC.	2024-01-30 06:21:47 -08:00
ampandey-1995	67f0a6917c	[ASan][AMDGPU] Fix Assertion Failure. (#79795) Assertion failure `(i >= FTy->getNumParams() \|\| FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"'. The 'llvm.memcpy' intercepted by the ASan instrumentation pass is implemented by its own __asan_memcpy implementation. The second argument of llvm.memcpy accepts ptr to addrspace(4), so __asan_memcpy also has to follow the ptr to addrspace(4) convention. --------- Co-authored-by: Amit Pandey <amit.pandey@amd.com>	2024-01-30 12:31:40 +05:30
Nilanjana Basu	c492eb6b28	[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651 ) Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.	2024-01-29 13:41:15 -08:00
Alexey Bataev	8d89dd4a58	[SLP]Fix PR79743: Check that all users are demoted before trying to demote the tree entry. Need to check if all user nodes are marked for demotion before demoting the node. Otherwise, some data info might be lost after vectorization.	2024-01-29 10:51:20 -08:00
Antonio Frighetto	20737825c9	[BDCE] Handle multi-use binary ops upon demanded bits Simplify multi-use `and`/`or`/`xor` when these last do not affect the demanded bits being considered. Fixes: https://github.com/llvm/llvm-project/issues/78596. Proofs: https://alive2.llvm.org/ce/z/EjuWHa.	2024-01-29 19:03:24 +01:00
Florian Hahn	743946e8ef	[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090 ) Move simplification of VPBlendRecipes from early VPlan construction to VPlan-to-VPlan based recipe simplification. This simplifies initial construction. Note that some in-loop reduction tests are failing at the moment, due to the reduction predicate being created after the reduction recipe. I will provide a patch for that soon. PR: https://github.com/llvm/llvm-project/pull/76090	2024-01-29 09:52:05 +00:00
Kazu Hirata	fc15731183	[Transforms] Use a range-based for loop (NFC)	2024-01-28 18:03:35 -08:00
Florian Hahn	2d0d65b3ba	[VPlan] Create edge masks all cases up front needed.(NFC) Similarly to how block masks are created up front and later only retrieved also make sure masks are created in cases where edge masks are needed, i.e. blend recipes. Creating block-in masks for all blocks in the loop also ensures edge masks for all relevant edges have been created. Later, the new getEdgeMask can be used to look up cached edge masks. This makes sure edge masks are available in all cases for https://github.com/llvm/llvm-project/pull/76090.	2024-01-28 21:20:18 +00:00
Florian Hahn	1b37e8087e	[VPlan] use getVPValueOrAddLiveIn in VPlan::duplicate. Instead of creating live-ins manually, use getOrAddLiveIn which automatically takes care of adding them to VPLiveInsToFree. Also use it to create the VPValue for the trip-count. This fixes a leak: https://lab.llvm.org/buildbot/#/builders/168/builds/18308/steps/10/logs/stdio	2024-01-28 12:39:39 +00:00
Kazu Hirata	687136e7cd	[Transforms] Use a range-based for loop (NFC)	2024-01-27 22:20:25 -08:00
Florian Hahn	7c03d5d41d	[VPlan] Use unique_ptr to clean up duplicated plan.	2024-01-27 20:51:55 +00:00
Kazu Hirata	f1cee6b0ba	[Transforms] Use a range-based for loop (NFC)	2024-01-27 09:32:19 -08:00
Florian Hahn	ec402a2e53	[VPlan] Implement cloning of VPlans. (#73158 ) This patch implements cloning for VPlans and recipes. Cloning is used in the epilogue vectorization path, to clone the VPlan for the main vector loop. This means we won't re-use a VPlan when executing the VPlan for the epilogue vector loop, which in turn will enable us to perform optimizations based on UF & VF.	2024-01-27 13:30:52 +00:00
Mikhail Gudim	701ec45f2f	[InstCombine] Fix a comment. (#79422 )	2024-01-26 23:10:19 -05:00
Craig Topper	55c6d91034	[InstCombine] Preserve nuw/nsw/exact flags when transforming (C shift (A add nuw C1)) --> ((C shift C1) shift A). (#79490 ) If we weren't shifting out any non-zero bits or changing the sign before the transform, we shouldn't be after. Alive2: https://alive2.llvm.org/ce/z/mB-rWz	2024-01-26 11:33:53 -08:00
Krzysztof Drewniak	63fe80fb18	[SeparateConstOffsetFromGEP] Handle `or disjoint` flags (#76997) This commit extends separate-const-offset-from-gep to look at the newly-added `disjoint` flag on `or` instructions so as to preserve additional opportunities for optimization. The tests were pre-committed in #76972.	2024-01-26 09:56:06 -06:00
David Sherwood	962fbafecf	[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034 ) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6dcc296792b577ae8f0f674e61a0929b99d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.	2024-01-26 14:43:48 +00:00
Florian Hahn	731c2049a4	[VPlan] Relax IV user assertion after 0ab539f for epilogue vec. After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the epilogue vector loop may be used by a trunc. Relax the corresponding assert. This should fix some build-bot failures, including https://lab.llvm.org/buildbot/#/builders/187/builds/14113 https://lab.llvm.org/buildbot/#/builders/98/builds/32350 https://lab.llvm.org/buildbot/#/builders/239/builds/5473	2024-01-26 13:19:25 +00:00
lifengxiang1025	6ccb06a7ab	[MemProf] Fix assert when direct recursion exists (#78264) Fix assert in `MemProfContextDisambiguation::applyImport` when direct recursion exists.	2024-01-26 20:55:44 +08:00
Graham Hunter	d4c0171423	[LV] Fix handling of interleaving linear args (#78725 ) Currently when interleaving vector calls with linear arguments, the Part is ignored and all vector calls use the initial value from the first lane of the current iteration. Fix this to extract from the correct part of the linear vector.	2024-01-26 11:30:35 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
Kazu Hirata	d7ff7c3d18	[Transforms] Use llvm::pred_size and llvm::pred_successors (NFC)	2024-01-25 18:17:20 -08:00
Enna1	e0ade45991	[MemProf][NFC] Rename DefaultShadowGranularity to DefaultMemGranularity in instrumentation code, be consistent with runtime (#79412) In runtime code, the size of the memory block mapped to a single shadow location is called MEM_GRANULARITY. In instrumentation code, the size of the memory block mapped to a single shadow location is called DefaultShadowGranularity. Actually, the SHADOW_GRANULARITY is 8 (1 << SHADOW_SCALE), and the MEM_GRANULARITY is 64. The wording of DefaultShadowGranularity in instrumentation code is a bit misleading, so this patch renames DefaultShadowGranularity to DefaultMemGranularity to be consistent with the runtime.	2024-01-26 10:04:48 +08:00
Jeremy Morse	19b65a9c02	[DebugInfo][RemoveDIs] Add a DPValue implementation for instcombine sinking (#77930) In instcombine, when we sink an instruction into a successor block, we try to clone and salvage all the variable assignments that use that Value. This is a behaviour that's (IMO) flawed, but there are important use cases where we want to avoid regressions, thus we're implementing this for the non-instruction debug-info representation. This patch refactors the dbg.value sinking code into its own function, and installs a parallel implementation for DPValues, the non-instruction debug-info container. This is mostly identical to the dbg.value implementation, except that we don't have an easy-to-access ordering between DPValues, and have to jump through extra hoops to establish one in the (rare) cases where that ordering is required. The test added represents a common use-case in LLVM where these behaviours are important: a loop has been completely optimised away, leaving several dbg.values in a row referring to an instruction that's going to sink. The dbg.values should sink in both dbg.value and RemoveDIs mode, and additionally only the last assignment should sink.	2024-01-25 23:28:56 +00:00
Florian Hahn	d88e3658ce	[SCEVExp] Move logic to replace congruent IV increments to helper (NFC). Move logic to replace congruent IV increments to helper function, to reduce the indentation by using early returns. This is in preparation for a follow-up patch.	2024-01-25 21:40:31 +00:00
Alexey Bataev	92ae2ca12b	[SLP][NFC]Improve BottomToTop reordering of orders for multi-iterations attempts, NFC. If several iterations of reordering of orders are required, a different algorithm needs to be used.	2024-01-25 13:04:01 -08:00
Kazu Hirata	28a2b85602	[DeadStoreElimination] Use SmallSetVector (NFC) (#79410 ) The use of SmallSetVector saves 0.58% of heap allocations during the compilation of a large preprocessed file, namely X86ISelLowering.cpp, for the X86 target. During the experiment, the final size of ToCheck was 8 or less 88% of the time.	2024-01-25 11:01:11 -08:00
Jeremy Morse	a19629dae7	Reapply 215b8f1e252, reverted in c3f7fb1421e Turns out I was using DbgMarker::getDbgValueRange rather than the helper utility in Instruction::getDbgValueRange, which checks for null-ness. Original commit message follows. [DebugInfo][RemoveDIs] Convert debug-info modes when loading bitcode (#78967) As part of eliminating debug-intrinsics in LLVM, we'll shortly be pushing the conversion from "old" dbg.value mode to "new" DPValue mode out from when the pass manager runs, to when modules are loaded. This patch adds that conversion process and some (temporary) options to llvm-lto{,2} to help test it. Specifically: now whenever we load a bitcode module, consider a flag of whether to "upgrade" it into the new debug-info mode, and if we're lazily materializing functions then do that lazily too. Doing this exposes an error in the IRLinker/materializer handling of DPValues, where we need to transfer the debug-info format flag correctly, and in ValueMapper we need to remap the Values that DPValues point at. I've added some test coverage in the modified tests; these will be exercised by our llvm-new-debug-iterators buildbot. This upgrading of debug-info won't be happening for the llvm18 release, instead we'll turn it on after the branch date, then push the boundary of where "new" debug-info starts and ends down into the existing debug-info upgrade path over the course of the next release.	2024-01-25 18:37:13 +00:00
Alexey Bataev	6fe21bc1da	[SLP]Fix PR79229: Do not erase extractelement, if it is used in a multiregister node. If the node can span several registers and the same extractelement instruction is used in several parts, it may be required to keep such an extractelement instruction to avoid a compiler crash.	2024-01-25 06:20:53 -08:00
Jeremy Morse	c3f7fb1421	Revert "[DebugInfo][RemoveDIs] Convert debug-info modes when loading bitcode (#78967 )" This reverts commit 215b8f1e252b4f30cf1b734faa370c0ac4b88659. Numerous builders exploded from this X_X, for example https://lab.llvm.org/buildbot/#/builders/46/builds/62657	2024-01-25 14:18:31 +00:00
John Brawn	a04d4a03f7	[LoopFlatten] Use loop versioning when overflow can't be disproven (#78576 ) Implement the TODO in loop flattening to version the loop when we can't prove that the trip count calculation won't overflow.	2024-01-25 13:57:19 +00:00
Jeremy Morse	215b8f1e25	[DebugInfo][RemoveDIs] Convert debug-info modes when loading bitcode (#78967) As part of eliminating debug-intrinsics in LLVM, we'll shortly be pushing the conversion from "old" dbg.value mode to "new" DPValue mode out from when the pass manager runs, to when modules are loaded. This patch adds that conversion process and some (temporary) options to llvm-lto{,2} to help test it. Specifically: now whenever we load a bitcode module, consider a flag of whether to "upgrade" it into the new debug-info mode, and if we're lazily materializing functions then do that lazily too. Doing this exposes an error in the IRLinker/materializer handling of DPValues, where we need to transfer the debug-info format flag correctly, and in ValueMapper we need to remap the Values that DPValues point at. I've added some test coverage in the modified tests; these will be exercised by our llvm-new-debug-iterators buildbot. This upgrading of debug-info won't be happening for the llvm18 release, instead we'll turn it on after the branch date, then push the boundary of where "new" debug-info starts and ends down into the existing debug-info upgrade path over the course of the next release.	2024-01-25 13:27:40 +00:00
Florian Hahn	a04f615291	[LV] Check for innermost loop instead of EnableVPlanNativePath in CM. Replace EnableVPlanNativePath checks in the cost-model by assertions that the code is only called for innermost loops. This ensures that the cost model isn't used in the VPlanNativePath, which is only used for outer-loop vectorization. Even with EnableVPlanNativePath, inner loops are processed by the inner loop vectorization path, not the native path, so checking for EnableVPlanNativePath may impact decisions for inner loops and can cause crashes, like in the attached test case.	2024-01-25 12:49:52 +00:00
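
The select-to-fabs fold described in f292f90bc2 can be checked on concrete values outside the compiler. The following is a minimal sketch in plain C++, not the InstCombine implementation: it assumes 32-bit IEEE-754 `float`, and the helper names (`bitsOf`, `selectAbs`, `selectNegAbs`, `checkFabsIdiom`) are hypothetical, introduced only for illustration. It compares bit patterns so that `+0.0` and `-0.0` are distinguished, and it skips NaN inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Models "bitcast X to int": reinterpret the float's bits as a signed i32.
static int32_t bitsOf(float x) {
  int32_t i;
  std::memcpy(&i, &x, sizeof i);
  return i;
}

// (bitcast X to int) <s 0 ? -X : X, which the commit folds to fabs(X).
static float selectAbs(float x) { return bitsOf(x) < 0 ? -x : x; }

// (bitcast X to int) <s 0 ? X : -X, which the commit folds to -fabs(X).
static float selectNegAbs(float x) { return bitsOf(x) < 0 ? x : -x; }

// Bitwise equality, so that +0.0 and -0.0 compare as different values.
static bool sameBits(float a, float b) { return bitsOf(a) == bitsOf(b); }

// Asserts both identities on a few samples and returns how many were checked.
static int checkFabsIdiom() {
  const float inf = std::numeric_limits<float>::infinity();
  const float samples[] = {0.0f, -0.0f, 1.5f, -1.5f, inf, -inf};
  int checked = 0;
  for (float x : samples) {
    assert(sameBits(selectAbs(x), std::fabs(x)));
    assert(sameBits(selectNegAbs(x), -std::fabs(x)));
    ++checked;
  }
  return checked;
}
```

Calling `checkFabsIdiom()` from a `main` function runs the checks. The `-0.0f` sample is the interesting case: an ordinary `x < 0` comparison would treat it as non-negative, whereas the signed integer comparison on the bit pattern sees the sign bit, which is why the idiom matches `fabs` exactly.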

35632 Commits