llvm-project

Author	SHA1	Message	Date
Florian Hahn	fbcf8a8cbb	[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262 ) The constraint system used for ConstraintElimination assumes all variables to be signed. This can cause missed optimizations in the unsigned system, due to missing the information that all variables are unsigned (non-negative). Variables can be marked as non-negative by adding Var >= 0 for all variables. This is done for arguments on ConstraintInfo construction and after adding new variables. This handles cases like the ones outlined in https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341 The original example shared above is now handled without this change, but adding another variable means that instcombine won't be able to simplify examples like https://godbolt.org/z/hTnra7zdY Adding the extra variables comes with a slight compile-time increase https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u stage1-O3: +0.04%, stage1-ReleaseThinLTO: +0.07%, stage1-ReleaseLTO-g: +0.05%, stage1-O0-g: +0.02%, stage2-O3: +0.05%, stage2-O0-g: +0.05%, stage2-clang: +0.05% https://github.com/llvm/llvm-project/pull/76262	2023-12-23 15:53:48 +01:00
Kazu Hirata	03dc806b12	[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)	2023-12-22 14:51:22 -08:00
Nikita Popov	54067c5fbe	[SROA] Use memcpy if type size does not match store size The original memcpy also copies the padding, so make sure that this is still the case after splitting. Fixes https://github.com/llvm/llvm-project/issues/64081.	2023-12-22 10:19:22 +01:00
Shan Huang	06a9c6738a	[CVP] Fix #76058 : missing debug location in processSDiv function (#76118 ) This PR fixes #76058.	2023-12-22 09:26:32 +01:00
boxu.zhang	d3ef867082	[LoopUnroll] Make UnrollMaxUpperBound to be overridable by target (#76029 ) The UnrollMaxUpperBound should be target dependent, since different chips provide different register sets, which gives them different abilities to store the temporary values of a program. So I add a MaxUpperBound value in UnrollingPreference which can be overridden by targets. All uses of UnrollMaxUpperBound are replaced with UP.MaxUpperBound. The default value is still 8 and the command line argument '--unroll-max-upperbound' takes final effect if provided.	2023-12-21 09:47:46 +01:00
Florian Hahn	18170d0f28	[ConstraintElim] Extend AND implication logic to support OR as well. (#76044 ) Extend the logic check if an operand of an AND is implied by the other to also support OR. This is done by checking if !op1 implies op2 or vice versa.	2023-12-20 18:13:41 +01:00
Florian Hahn	7cf499c63b	[ConstraintElim] Check if second op implies first for And. (#75750 ) Generalize checkAndSecondOpImpliedByFirst to also check if the second operand implies the first.	2023-12-20 11:58:35 +01:00
Paul Walker	dea16ebd26	[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217 ) The specialisation will not be valid when ConstantInt gains native support for vector types. This is largely a mechanical change but with extra attention paid to constant folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to remove the need to call `getIntegerType()`. Co-authored-by: Nikita Popov <github@npopov.com>	2023-12-18 11:58:42 +00:00
Paul Walker	930b5b52ff	[ConstantHoisting] Add a TTI hook to prevent hoisting. (#69004 ) Code generation can sometimes simplify expensive operations when an operand is constant. An example of this is divides on AArch64 where they can be rewritten using a cheaper sequence of multiplies and subtracts. Doing this is often better than hoisting expensive constants which are likely to be hoisted by MachineLICM anyway.	2023-12-13 17:20:36 +00:00
Kazu Hirata	f0ac6f92a7	[Transforms] Fix a warning This patch fixes: llvm/lib/Transforms/Scalar/ConstraintElimination.cpp:1112:13: error: unused function 'dumpUnpackedICmp' [-Werror,-Wunused-function]	2023-12-13 08:31:46 -08:00
Yingwei Zheng	26fbdff458	[ConstraintElim] Refactor `checkCondition`. NFC. (#75319 ) This patch refactors `checkCondition` to handle min/max intrinsic calls in #75306.	2023-12-13 23:20:01 +08:00
Jeremy Morse	4b64138ba4	[DebugInfo][RemoveDIs] Switch some insertion routines to use iterators (#75330 ) As part of RemoveDIs, we need instruction insertion to be done with iterators rather than instruction pointers, so that we can communicate some debug-info facts about the position. This patch is an entirely mechanical replacement of Instruction * with BasicBlock::iterator, plus using insertBefore to insert some instructions because we don't have iterator-taking constructors yet. Sadly it's not NFC because it causes dbg.value intrinsics / their DPValue equivalents to shift location.	2023-12-13 14:04:35 +00:00
Bruno De Fraine	3c9236c0bf	[LoopVersioningLICM] add comment regarding dubious check (NFC)	2023-12-13 12:20:08 +01:00
Kazu Hirata	4b4dcb4988	[Transforms] Fix a warning This patch fixes: llvm/lib/Transforms/Scalar/SROA.cpp:4855:9: error: unused variable 'NewAssign' [-Werror,-Wunused-variable]	2023-12-12 08:50:49 -08:00
Orlando Cazalet-Hyams	3d42557872	[RemoveDI] Handle DPValues in SROA (#74089 ) Handle dbg.declares in SROA using DPValues. In order to reduce duplication, the migrate-debug-info loop has been changed to a generic lambda with some helper function overloads, which is called for dbg.declares, dbg.assigns, and DPValues alike. The tests will become "live" once #74090 lands (see for more info).	2023-12-12 15:49:24 +00:00
Nikita Popov	6ab663be8d	[LVI] Require UndefAllowed argument to getConstantRangeAtUse() (NFC) For the remaining uses set it to true, matching the current behavior.	2023-12-12 12:45:49 +01:00
Nabeel Omer	1f71db78ce	[NFC][DSE] Fix typo comment in eliminateDeadStores (#75166 ) > We are re-using tryToMergePartialOverlappingStores, which requires DeadSI to dominate DeadSI. Should be "DeadSI to dominate KillingSI" because that's what the check is for.	2023-12-12 11:13:40 +00:00
Nikita Popov	967e84eee3	[CVP] Don't use undef range for LHS of div/rem transforms Using it for RHS is fine, as undef is UB in that case.	2023-12-12 12:06:19 +01:00
Nikita Popov	84df226c4a	[CVP] Don't use undef ranges in willNotOverflow()	2023-12-12 11:54:33 +01:00
Nikita Popov	4949fb7954	[CVP] Don't allow undef range when inferring nowrap flags	2023-12-12 11:26:00 +01:00
Orlando Cazalet-Hyams	2d9d9a1a55	[NFC] Change FindDbgDeclareUsers interface to match findDbgUsers/values (#73498 ) This simplifies an upcoming patch to support the RemoveDIs project (tracking variable locations without using intrinsics). Next in this series is #73500.	2023-12-12 09:43:58 +00:00
Thomas Symalla	b010747a4f	[NFC] Fix typo in ConstraintElimination assertion. (#75151 ) unsinged => unsigned Co-authored-by: Thomas Symalla <tsymalla@amd.com>	2023-12-12 10:24:11 +01:00
Wang Pengcheng	6aa6ef73ec	[MemCpyOpt] Don't perform call slot opt if alloc type is scalable (#75027 ) This fixes #75010.	2023-12-11 19:45:13 +08:00
Kazu Hirata	a16429365c	[Transforms] Remove unnecessary includes (NFC)	2023-12-09 18:23:06 -08:00
Yingwei Zheng	312cb34da6	[Reassociate] Preserve NUW flags after expr tree rewriting (#72360 ) Alive2: https://alive2.llvm.org/ce/z/38KiC_	2023-12-09 16:45:48 +08:00
XiangZhang	1d6a678591	[LoopUnroll] Make use of MaxTripCount for loops with "#pragma unroll" (#74703 ) Fix loop unroll failures caused by branch folding. For example, SimplifyCFG folds loop branches, which then causes loop unrolling to fail for a "#pragma unroll" loop. ``` #pragma unroll for (int I = 0; I < ConstNum; ++I) { /* "I < ConstNum" and "Cond2" get folded */ if (Cond2) { break; } /* loop body */ } ``` The pragma unroll metadata only takes effect if there is an exact trip count, but not if there is an upper bound trip count. This patch makes it work with an upper bound trip count as well in shouldPragmaUnroll(). Loop unrolling is important on stack-sensitive devices (e.g. GPUs, which is why a lot of GPU code marks loops with "#pragma unroll"). It usually greatly simplifies the address (offset) calculations of the original iterations, which enables many other optimizations, e.g. SROA, on these simplified addresses (removing the allocas for whole aggregates).	2023-12-08 19:43:10 +08:00
alex-t	d8cd7fc1f4	AlignmentFromAssumptions should only track pointer operand users (#73370 ) AlignmentFromAssumptions uses SCEV to update the load/store alignment. It tracks down the use-def chains for the pointer which it takes from the assumption cache until it reaches the load or store instruction. It mistakenly adds to the worklist the users of the load result irrespective of the fact that the load result has no connection with the original pointer; moreover, it is not a pointer at all in most cases. Thus the def-use chain contains irrelevant load users. When it is a store instruction the algorithm attempts to adjust its alignment to the alignment of the original pointer. The problem appears when the load and store memory operand pointers belong to different address spaces and possibly have different sizes. Commit 4bf015c035e4e5b63c7222dfb15ff274a5ed905c was an attempt to address a similar problem. The truncation or zero extension was added to make pointers the same size. That looks strange to me because the zero extension of the pointer is not legal. The test in 4bf015c035e4e5b63c7222dfb15ff274a5ed905c does not work any longer, as for the explicit address space conversion an addrspacecast is generated. To summarize: 1. For the alloca to global address space conversion addrspacecasts are used, so the code added by 4bf015c035e4e5b63c7222dfb15ff274a5ed905c is no longer functional. 2. The AlignmentFromAssumptions algorithm should not add the load users to the worklist as they have nothing to do with the original pointer. 3. Instead we only track users that are GetElementPtrInsts or PHINodes.	2023-12-07 17:35:35 +01:00
Craig Topper	533a0856bf	Recommit "[Reassociate] Use disjoint flag to convert Or to Add. (#72772 )" Original message: We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.	2023-12-06 14:16:56 -08:00
Craig Topper	92fccea2e5	Revert "[Reassociate] Use disjoint flag to convert Or to Add. (#72772 )" This reverts commit 78964457cf1bafe57a54629fafbd081452a9e528. Looks like I didn't rebase this correctly before commit	2023-12-06 13:50:21 -08:00
Craig Topper	78964457cf	[Reassociate] Use disjoint flag to convert Or to Add. (#72772 ) We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.	2023-12-06 13:48:15 -08:00
Jeremy Morse	384f916ea8	Reapply 34cdc913214fd (#74455 ), call-site-splitting for RemoveDIs Original commit message below; asan complained about this commit because it transpires that the final comparison with CurrentI is in fact a comparison of a pointer that has been freed. This seems to work fine most of the time, but using the iterator for such an instruction causes the freed instruction memory to be accessed, causing a use-after-free. The fix is to perform the comparison as an instruction, not an iterator. [NFC][DebugInfo][RemoveDIs] Use iterators to insert in callsite-splitting (#74455) This patch gets call site splitting to use iterators for insertion rather than instruction pointers. When we switch on non-instr debug-info this becomes significant, as the iterators are going to signal whether or not a position is before or after debug-info. NFC as this isn't going to affect the output of any existing test.	2023-12-06 16:52:10 +00:00
Nikita Popov	ea4ce16da2	[ConstraintElim] Use disjoint flag for decomposition (#74478 ) Use the or disjoint flag for decomposing or into add, which will handle cases that haveNoCommonBitsSet() may not be able to reinfer (e.g. because they require context-sensitive facts, which the call here does not use.)	2023-12-06 10:36:55 +01:00
Jeremy Morse	989e8f9d51	Revert "[NFC][DebugInfo][RemoveDIs] Use iterators to insert in callsite-splitting (#74455 )" This reverts commit 34cdc913214fd9561b6ec8d535bd3d0313772cb5. Two buildbots say this is bad: https://lab.llvm.org/buildbot/#/builders/265/builds/861 https://lab.llvm.org/buildbot/#/builders/168/builds/17272	2023-12-05 17:32:23 +00:00
Jeremy Morse	34cdc91321	[NFC][DebugInfo][RemoveDIs] Use iterators to insert in callsite-splitting (#74455 ) This patch gets call site splitting to use iterators for insertion rather than instruction pointers. When we switch on non-instr debug-info this becomes significant, as the iterators are going to signal whether or not a position is before or after debug-info. NFC as this isn't going to affect the output of any existing test.	2023-12-05 16:24:26 +00:00
Joshua Cao	72ffaa9156	[IR][TRE] Support associative intrinsics (#74226 ) There is support for intrinsics in Instruction::isCommutative, but there is no equivalent implementation for isAssociative. This patch builds support for associative intrinsics with TRE as an application. TRE can now have associative intrinsics as an accumulator. For example: ``` struct Node { Node *next; unsigned val; }; unsigned maxval(struct Node *n) { if (!n) return 0; return std::max(n->val, maxval(n->next)); } ``` Can be transformed into: ``` unsigned maxval(struct Node *n) { struct Node *head = n; unsigned max = 0; /* Identity of unsigned std::max */ while (true) { if (!head) return max; max = std::max(max, head->val); head = head->next; } return max; } ``` This example results in about a 5x speedup in local runs. We conservatively only consider min/max as associative for this patch to limit testing scope. There are probably other intrinsics that could be considered associative. There are a few consumers of isAssociative() that could be impacted. Testing has only required the Reassociate pass to be updated.	2023-12-04 22:35:59 -08:00
Nikita Popov	4275da2278	[ValueTracking] Add isGuaranteedNotToBeUndef() variant (NFC) We have a bunch of places where we have to guard against undef to avoid multi-use issues, but would be fine with poison. Use a different function for these to make it clear, and to indicate that this check can be removed once we no longer support undef. I've replaced some of the obvious cases, but there's probably more. For now, the implementation is the same as UndefOrPoison, it just has a more precise name.	2023-12-04 12:04:41 +01:00
Kazu Hirata	3406a2bc5f	[llvm] Stop including tuple (NFC) Identified with clangd.	2023-12-03 23:01:26 -08:00
Kazu Hirata	92c2529ccd	[llvm] Stop including vector (NFC) Identified with clangd.	2023-12-03 22:32:21 -08:00
Joshua Cao	bd382032f6	[BBUtils][NFC] Delete SplitLandingPadPredecessors with DT (#73406 ) Function is marked for deprecation. There is only one consumer which is converted to use DomTreeUpdater.	2023-12-02 11:33:43 -08:00
Jeremy Morse	4424903156	[DebugInfo][RemoveDIs] Handle DPValues at remaining dbg.value using sites (#73788 ) This patch updates the last few places in LLVM using findDbgValues that don't also collect and handle DPValue objects. This largely involves instcombine and mem2reg changes, and are largely mechanical, calling existing utilities on collections of DPValues instead of just DbgValuesInsts. A variety of tests have had RemoveDIs RUN lines added to them to cover these behaviours. We have some technical debt of the instcombine sinking code for DPValues not being implemented yet, so I've left FIXME stubs indicating that we intend to cover tests with RemoveDIs but haven't yet.	2023-11-30 16:30:32 +00:00
Jeremy Morse	5ba5211a47	[DebugInfo][RemoveDIs] Have LICM insert at iterator positions (#73671 ) Because we're storing some extra debug-info information in the iterator class, we need to insert new LICM-created stores using such iterators. Switch LICM to storing iterators instead of pointers when it promotes variables in loops, add a test for the desired behaviour, and enable RemoveDIs instrumentation on a variety of other LICM tests for good measure. (This would appear to be the only pass in LLVM that needs to store iterators on the heap).	2023-11-30 13:00:26 +00:00
Jeremy Morse	2425e2940e	[DebugInfo][RemoveDIs] Have getInsertionPtAfterDef return an iterator (#73149 ) Part of the "RemoveDIs" project to remove debug intrinsics requires passing block-positions around in iterators rather than as instruction pointers, allowing some debug-info to reside in BasicBlock::iterator. This means getInsertionPointAfterDef has to return an iterator, and as it can return no-instruction that means returning an optional iterator. This patch changes the signature for getInsertionPtAfterDef and then patches up the various places that use it to handle the different type. This would overall be an NFC patch, however in InstCombinerImpl::freezeOtherUses I've started skipping any debug intrinsics at the returned insert-position. This should not have any _meaningful_ effect on the compiler output: at worst it means variable assignments that are skipped will now cover the freeze instruction and anything inserted before it, which should be inconsequential. Sadly: this makes the function signature ugly. This is probably the ugliest piece of fallout for the "RemoveDIs" work, but it serves the overall purpose of improving compile times and not allowing `-g` to affect compiler output, so should be worthwhile in the end.	2023-11-30 12:19:57 +00:00
Simon Pilgrim	3246a32d3f	Fix MSVC "not all control paths return a value" warning. NFC.	2023-11-30 10:07:01 +00:00
Nikita Popov	6d2dfd37bd	[LPM] Set gen_crash_diag=false for non-MSSA pass in MSSA pipeline When a loop pass that does not preserve MSSA is run as part of a loop-mssa pipeline, this is user error and we should not ask for a bug report. Fixes https://github.com/llvm/llvm-project/issues/73554.	2023-11-30 10:21:35 +01:00
Philip Reames	e947f95337	[LSR][TTI][RISCV] Enable terminator folding for RISC-V If looking for a miscompile revert candidate, look here! The transform being enabled prefers comparing to a loop invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one. On RISC-V, which has a large number of scalar registers, this is generally a profitable transform. We lose the ability to use a beqz on what is typically a count down IV, and pay the cost of computing the exit value on the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%. Code size wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for an uncompressible beq. We also add instructions in the preheader. Net result is a slight regression overall, but neutral or better inside the loop. Previous versions of this transform had numerous corner-case correctness bugs. All of the ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky. This patch is a reworked version of D134893 by @eop. That patch has been abandoned since May, so I picked it up, reworked it a bit, and am landing it.	2023-11-29 12:04:06 -08:00
Nikita Popov	4b3ea337ad	[ValueTracking] Convert isKnownNonNegative() to use SimplifyQuery (NFC)	2023-11-29 10:52:52 +01:00
Florian Hahn	d045e23c2d	[ConstraintElim] Refactor GEP offset collection. Move GEP offset collection to separate helper function and collect variable and constant offsets in OffsetResult. For now, this only supports 1 VariableOffset, but the new code structure can be more easily extended to handle more offsets in the future. The refactoring drops the check that the VariableOffset >= -1 * constant offset. This is not needed to check whether the constraint is monotonically increasing. The constant factors can be ignored, the constraint will be monotonically increasing if all variables are positive. See https://alive2.llvm.org/ce/z/ah2uSQ, https://alive2.llvm.org/ce/z/NCADNZ	2023-11-27 09:05:58 +00:00
Aiden Grossman	5eb85c052e	[JumpThreading] Remove LVI printer flag (#73426 ) This patch removes the -print-lvi-after-jump-threading flag now that we can print everything in the LVI cache using the print<lazy-value-info> pass.	2023-11-27 00:19:23 -08:00
Nikita Popov	2b646b5989	[CVP] Don't try to fold load/store operands to constant (#73338 ) CVP currently tries to fold load/store pointer operands to constants using LVI. If there is a dominating condition of the form `icmp eq ptr %p, @g`, then `%p` will be replaced with `@g`. LVI is geared towards range-based optimizations, and is very inefficient at handling simple pointer equality conditions. We have other passes that can handle this optimization in a more efficient way, such as IPSCCP and GVN. Removing this optimization gives a geomean 0.4-1.2% compile-time improvement depending on configuration. At the same time, there is no impact on codegen.	2023-11-27 09:17:03 +01:00
Youngsuk Kim	c419f3e10e	[llvm][SROA] Replace calls to Type::getPointerTo (NFC) NFC cleanup towards removing method Type::getPointerTo. * Remove unnecessary call to Type::getPointerTo * Replace call to Type::getPointerTo with IRB.getPtrTy	2023-11-26 20:17:06 -06:00
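
The accumulator rewrite described in the 72ffaa9156 entry ([IR][TRE] Support associative intrinsics) can be sketched by hand in C++. This is only an illustration of the source-level shape of the transform, not the pass itself or its output: the `Node` type and the function names `maxvalRecursive`, `maxvalIterative` and `checkAgreement` are hypothetical, and the loop is written manually rather than produced by TRE.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical list node used only for illustration; not an LLVM type.
struct Node {
  Node *next;
  unsigned val;
};

// Recursive form: the call is not in tail position, because std::max
// still has to run after the recursive call returns.
unsigned maxvalRecursive(const Node *n) {
  if (!n)
    return 0;
  return std::max(n->val, maxvalRecursive(n->next));
}

// Hand-written accumulator form. Because max is associative and
// commutative, the pending std::max calls can be folded into a running
// accumulator that starts at 0, the identity of unsigned max.
unsigned maxvalIterative(const Node *n) {
  unsigned acc = 0;
  for (const Node *head = n; head; head = head->next)
    acc = std::max(acc, head->val);
  return acc;
}

// Small self-check helper: both forms must agree on any list.
void checkAgreement(const Node *n) {
  assert(maxvalRecursive(n) == maxvalIterative(n));
}
```

Calling `checkAgreement` on a few short lists, including the empty list (`nullptr`), is a quick way to confirm that the two forms compute the same value.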


12812 Commits