llvm-project

Author	SHA1	Message	Date
Florian Hahn	2f7ccaf4a8	[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777 ) This can help in cases where pointer alignment info is missing, e.g. https://github.com/llvm/llvm-project/pull/108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. PR: https://github.com/llvm/llvm-project/pull/108777	2024-09-28 14:19:57 +01:00
Nikita Popov	5bcc82d433	[LoopPeel] Fix LCSSA phi node invalidation In the test case, the BECount of the second loop uses %load, but we only have an LCSSA phi node for %add, so that is what gets invalidated. Use the forgetLcssaPhiWithNewPredecessor() API instead, which will invalidate the roots of the expression instead. Fixes https://github.com/llvm/llvm-project/issues/109333.	2024-09-20 17:01:41 +02:00
Nikita Popov	4ec4ac15ed	[SCEVExpander] Fix addrec cost model (#106704 ) The current isHighCostExpansion cost model for addrecs computes the cost for some kind of polynomial expansion that does not appear to have any relation to addrec expansion whatsoever. A literal expansion of an affine addrec is a phi and add (plus the expansion of start and step). For a non-affine addrec, we get another phi+add for each additional addrec nested in the step recurrence. This partially `fixes` https://github.com/llvm/llvm-project/issues/53205 (the runtime unroll test case in this PR).	2024-09-19 09:39:35 +02:00
Ganesh	02e4186d0b	[X86] AMD Zen 5 Initial enablement (#107964 ) This patch enables the basic skeleton enablement of AMD next gen zen5 CPUs.	2024-09-13 17:45:33 +01:00
Nikita Popov	52b879594f	[LoopUnroll] Avoid undef values in test (NFC) Avoid most of the code being optimized away as a result of optimization improvements.	2024-09-03 12:10:29 +02:00
Nikita Popov	fe1a1eee2f	[Tests] Regenerate test checks (NFC)	2024-09-03 11:42:47 +02:00
Nikita Popov	9edd998e10	[LoopUnroll] Add test for #53205 (NFC)	2024-08-29 16:43:56 +02:00
Nikita Popov	fe182ddf1f	[LoopUnrollAnalyzer] Use constant folding API for loads Use ConstantFoldLoadFromConst() instead of a partial re-implementation. This makes the code slightly more generic by not depending on the exact structure of the constant.	2024-08-28 11:53:25 +02:00
James Y Knight	dfeb3991fb	Remove the `x86_mmx` IR type. (#98505 ) It is now translated to `<1 x i64>`, which allows the removal of a bunch of special casing. This _incompatibly_ changes the ABI of any LLVM IR function with `x86_mmx` arguments or returns: instead of passing in mmx registers, they will now be passed via integer registers. However, the real-world incompatibility caused by this is expected to be minimal, because Clang never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>` or `double`, depending on ABI. This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type. That type simply no longer corresponds to an IR type, and is used only by MMX intrinsics and inline-asm operands. Because SelectionDAGBuilder only knows how to generate the operands/results of intrinsics based on the IR type, it thus now generates the intrinsics with the type MVT::v1i64, instead of MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus have the X86 backend fix them up in DAGCombine. (This may be a short-lived hack, if all the MMX intrinsics can be removed in upcoming changes.) Works towards issue #98272.	2024-07-25 09:19:22 -04:00
v01dXYZ	cff8d716bd	[SCEV] forgetValue: support (with-overflow-inst op0, op1) (#98015 ) The use-def walk in forgetValue() was skipping instructions with non-SCEVable types. However, SCEV may look past with.overflow intrinsics returning aggregates. Fixes #97586.	2024-07-09 09:14:33 +02:00
Nikita Popov	37c736e035	[LoopUnroll] Use poison instead of undef for another preheader value	2024-06-25 12:17:20 +02:00
Nikita Popov	eeb0884e66	[LoopUnroll] Use poison instead of undef for preheader value	2024-06-25 12:09:58 +02:00
Stephen Tozer	094572701d	[RemoveDIs] Print IR with debug records by default (#91724 ) This patch makes the final major change of the RemoveDIs project, changing the default IR output from debug intrinsics to debug records. This is expected to break a large number of tests: every single one that tests for uses or declarations of debug intrinsics and does not explicitly disable writing records. If this patch has broken your downstream tests (or upstream tests on a configuration I wasn't able to run): 1. If you need to immediately unblock a build, pass `--write-experimental-debuginfo=false` to LLVM's option processing for all failing tests (remember to use `-mllvm` for clang/flang to forward arguments to LLVM). 2. For most test failures, the changes are trivial and mechanical, enough that they can be done by script; see the migration guide for a guide on how to do this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates 3. If any tests fail for reasons other than FileCheck check lines that need updating, such as assertion failures, that is most likely a real bug with this patch and should be reported as such. For more information, see the recent PSA: https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578	2024-06-14 15:07:27 +01:00
Sameer Sahasrabuddhe	e0ac087ff0	[LoopUnroll] Consider convergence control tokens when unrolling (#91715 ) - There is no restriction on a loop with controlled convergent operations when the relevant tokens are defined and used within the loop. - When a token defined outside a loop is used inside (also called a loop convergence heart), unrolling is allowed only in the absence of remainder or runtime checks. - When a token defined inside a loop is used outside, such a loop is said to be "extended". This loop can only be unrolled by also duplicating the extended part lying outside the loop. Such unrolling is disabled for now. - Clean up loop hearts: When unrolling a loop with a heart, duplicating the heart will introduce multiple static uses of a convergence control token in a cycle that does not contain its definition. This violates the static rules for tokens, and needs to be cleaned up into a single occurrence of the intrinsic. - Spell out the initializer for UnrollLoopOptions to improve readability. Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.	2024-06-06 13:13:46 +05:30
Sergey Kachkov	f34dedbf44	[LoopPeel] Support min/max intrinsics in loop peeling (#93162 ) This patch adds processing of min/max intrinsics in LoopPeel in the similar way as it was done for conditional statements: for min/max(IterVal, BoundVal) we peel iterations where IterVal < BoundVal for monotonically increasing IterVal; for monotonically decreasing IterVal we peel iterations where IterVal > BoundVal (strict comparision predicates are used to minimize number of peeled iterations).	2024-05-31 13:58:10 +03:00
Sergey Kachkov	60a890d855	[LoopPeel] Add pre-commit test for min/max intrinsics	2024-05-31 13:06:08 +03:00
Simon Pilgrim	54e52aa5eb	[X86] Reduce znver3/4 LoopMicroOpBufferSize to practical loop unrolling values (#91340 ) The znver3/4 scheduler models have previously associated the LoopMicroOpBufferSize with the maximum size of their op caches, and when this led to quadratic complexity issues this were reduced to a value of 512 uops, based mainly on compilation time and not its effectiveness on runtime performance. From a runtime performance POV, a large LoopMicroOpBufferSize leads to a higher number of loop unrolls, meaning the cpu has to rely on the frontend decode rate (4 ins/cy max) for much longer to fill the op cache before looping begins and we make use of the faster op cache rate (8/9 ops/cy). This patch proposes we instead cap the size of the LoopMicroOpBufferSize based off the maximum rate from the op cache (znver3 = 8op/cy, znver4 = 9op/cy) and the branch misprediction penalty from the opcache (~12cy) as a estimate of the useful number of ops we can unroll a loop by before mispredictions are likely to cause stalls. This isn't a perfect metric, but does try to be closer to the spirit of how we use LoopMicroOpBufferSize in the compiler vs the size of a similar naming buffer in the cpu.	2024-05-16 14:44:00 +01:00
Dmitri Gribenko	83974a4b92	Revert "[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize (#67657 )" This reverts commit f0b3654701bde1cf7821d60698b42383edaff9f3. This commit triggers UB by reading an uninitialized variable. `UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when it is called from `LoopVectorizationPlanner::executePlan()`. In this case the `UP` variable is created on the stack and its fields are not initialized. ``` ==8802==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17 #2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19 #3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree, bool, llvm::DenseMap<llvm::SCEV const, llvm::Value, llvm::DenseMapInfo<llvm::SCEV const, void>, llvm::detail::DenseMapPair<llvm::SCEV const, llvm::Value>> const) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7 #4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13 #5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo, llvm::TargetLibraryInfo, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30 #6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9 [...] Uninitialized value was created by an allocation of 'UP' in the stack frame #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree, bool, llvm::DenseMap<llvm::SCEV const, llvm::Value, llvm::DenseMapInfo<llvm::SCEV const, void>, llvm::detail::DenseMapPair<llvm::SCEV const, llvm::Value>> const) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3 ```	2024-05-16 12:11:42 +02:00
Nikita Popov	f0b3654701	[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize (#67657 ) The znver3/znver4 scheduler models are outliers, specifying very large LoopMicroOpBufferSizes at 512, while typical values for other subtargets are on the order of ~50. Even if this information is micro-architecturally correct (), this does not mean that we want to runtime unroll all loops to a size that completely fills the loop buffer. Unless this is the single hot loop in the entire application, the massive code size increase will bust the micro-op and instruction caches. Protect against this by clamping to the default PartialThreshold of 150, which is the same as the default full-unroll threshold and half the aggressive full-unroll threshold. Allowing more partial unrolling than full unrolling certainly does not make sense. () I strongly doubt that this is actually correct -- I believe this may derive from an incorrect reading of Agner Fog's micro-architecture guide. The number 4096 that was originally used here is the size of the general micro-op cache, not that of a loop buffer. A separate loop buffer is not listed for the Zen microarchitecture. Comparing this to the listing for Skylake, it has a 1536 micro-op buffer, but only a 64 micro-op loopback buffer, with a note that it's rarely fully utilized. Our scheduling model specifies LoopMicroOpBufferSize of 50 in that case.	2024-05-16 10:21:22 +09:00
chenlin	79643565a8	[LoopUnroll] Remove redundant debug instructions after blocks have been merged (#91246 ) Remove redundant debug instructions after blocks have been merged into the predecessor, It can reduce some compile time in some cases. This change only fixes the situation of loop unrolling, and other situations are not considered. "RemoveRedundantDbgInstrs" seems to be very time-consuming. Thus, we just add here after the "Dest" has been merged into the "Fold", this may be a more targeted solution!!! fixes: https://github.com/llvm/llvm-project/issues/89073	2024-05-13 09:42:04 -07:00
Jorge Gorbe Moya	2cde0e2f97	Revert "[BasicBlockUtils] Remove redundant llvm.dbg instructions after blocks to reduce compile time (#89069 )" This reverts commit 2e3e0868748635b779ba89a772eae3664bd822e4. It caused quadratic slowdown at compilation time in some cases. See the comments in the original PR: https://github.com/llvm/llvm-project/pull/89069	2024-05-03 13:05:08 -07:00
annamthomas	46c2d93662	[StandardInstrumentation] Annotate loops with the function name (#90756 ) When analyzing pass debug output it is helpful to have the function name along with the loop name.	2024-05-03 14:13:59 -04:00
Florian Hahn	175d297102	[LoopUnroll] Add CSE to remove redundant loads after unrolling. (#83860 ) This patch adds loadCSE support to simplifyLoopAfterUnroll. It is based on EarlyCSE's implementation using ScopeHashTable and is using SCEV for accessed pointers to check to find redundant loads after unrolling. This applies to the late unroll pass only, for full unrolling those redundant loads will be cleaned up by the regular pipeline. The current approach constructs MSSA on-demand per-loop, but there is still small but notable compile-time impact: stage1-O3 +0.04% stage1-ReleaseThinLTO +0.06% stage1-ReleaseLTO-g +0.05% stage1-O0-g +0.02% stage2-O3 +0.09% stage2-O0-g +0.04% stage2-clang +0.02% https://llvm-compile-time-tracker.com/compare.php?from=c089fa5a729e217d0c0d4647656386dac1a1b135&to=ec7c0f27cb5c12b600d9adfc8543d131765ec7be&stat=instructions:u This benefits some workloads with runtime-unrolling disabled, where users use pragmas to force unrolling, as well as with runtime unrolling enabled. On SPEC/MultiSource, this removes a number of loads after unrolling on AArch64 with runtime unrolling enabled. ``` External/S...te/526.blender_r/526.blender_r 96 MultiSourc...rks/mediabench/gsm/toast/toast 39 SingleSource/Benchmarks/Misc/ffbench 4 External/SPEC/CINT2006/403.gcc/403.gcc 18 MultiSourc.../Applications/JM/ldecod/ldecod 4 MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 6 MultiSourc...OE-ProxyApps-C/miniGMG/miniGMG 9 MultiSourc...e/Applications/ClamAV/clamscan 4 MultiSourc.../MallocBench/espresso/espresso 3 MultiSourc...dence-flt/LinearDependence-flt 2 MultiSourc...ch/office-ispell/office-ispell 4 MultiSourc...ch/consumer-jpeg/consumer-jpeg 6 MultiSourc...ench/security-sha/security-sha 11 MultiSourc...chmarks/McCat/04-bisect/bisect 3 SingleSour...tTests/2020-01-06-coverage-009 12 MultiSourc...ench/telecomm-gsm/telecomm-gsm 39 MultiSourc...lds-flt/CrossingThresholds-flt 24 MultiSourc...dence-dbl/LinearDependence-dbl 2 External/S...C/CINT2006/445.gobmk/445.gobmk 6 MultiSourc...enchmarks/mafft/pairlocalalign 53 External/S...31.deepsjeng_r/531.deepsjeng_r 3 External/S...rate/510.parest_r/510.parest_r 58 External/S...NT2006/464.h264ref/464.h264ref 29 External/S...NT2017rate/502.gcc_r/502.gcc_r 45 External/S...C/CINT2006/456.hmmer/456.hmmer 6 External/S...te/538.imagick_r/538.imagick_r 18 External/S.../CFP2006/447.dealII/447.dealII 4 MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 12 External/S...2017rate/525.x264_r/525.x264_r 36 MultiSourc...Benchmarks/7zip/7zip-benchmark 33 MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk 2 MultiSourc...chmarks/VersaBench/8b10b/8b10b 1 MultiSourc.../Applications/JM/lencod/lencod 116 MultiSourc...lds-dbl/CrossingThresholds-dbl 24 MultiSource/Benchmarks/McCat/05-eks/eks 15 ``` PR: https://github.com/llvm/llvm-project/pull/83860	2024-05-02 11:01:24 +01:00
CL	2e3e086874	[BasicBlockUtils] Remove redundant llvm.dbg instructions after blocks to reduce compile time (#89069 ) this patch is to fix the compile time for some cases, before this change, some targets (riscv-64, ve) will spend much more compile time on this case (https://godbolt.org/z/rrov17cTo). With this change, the compile time was reduced a lot. Fixes https://github.com/llvm/llvm-project/issues/89073 PR: https://github.com/llvm/llvm-project/pull/89069	2024-04-26 10:57:37 +01:00
Florian Hahn	01f8da908c	[LoopUnroll] Add tests for performing load CSE after unrolling. Precommit tests for https://github.com/llvm/llvm-project/pull/83860.	2024-04-24 12:00:16 +01:00
Graham Hunter	03f852f704	[AArch64] Improve cost model for legal subvec insert/extract (#81135 ) Currently we model subvector inserts and extracts as shuffles, potentially going as far as scalarizing. If the types are legal then they can just be simple zip/unzip operations, or possible even no-ops. Change the cost to a relatively small one to ensure that simple loops featuring such operations between fixed and scalable vector types that are effectively the same at a given sve width can be unrolled and further optimized.	2024-03-04 16:17:01 +00:00
Vedant Paranjape	e209178d64	[SimplifyIndVar] LCSSA form is destroyed by simplifyLoopIVs, preserve it (#78696 ) In LoopUnroll, peelLoop is called on the loop. After the loop is peeled it calls simplifyLoopAfterUnroll on the loop. This call to simplifyLoopAfterUnroll doesn't preserve the LCSSA form of the parent loop and thus during the next call to peelLoop the LCSSA form is already broken. LoopPeel util takes in the PreserveLCSSA argument and it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true). This causes an assert in simplifyLoop when (PreserveLCSSA = true), as during the last call LCSSA for the loop wasn't preserved, and thus crashes at the following assert. assert(L->isRecursivelyLCSSAForm(DT, LI) && "Requested to preserve LCSSA, but it's already broken."); Upon debugging, it is evident that simplifyLoopIVs call inside simplifyLoopAfterUnroll breaks the LCSSA form. This patch fixes llvm#77118, it checks if the replacement of IV Users with Loop Invariant preserves the LCSSA form. If it does not, it emits the required LCSSA Phi instructions.	2024-02-21 17:51:56 +05:30
Graham Hunter	ad78e210bd	[NFC][AArch64] Tests for guarding unrolling with scalable vec ins/ext (#81132 )	2024-02-19 09:47:49 +00:00
Sergey Kachkov	ffd79b3312	[LoopUnroll] Consider simplified operands while retrieving TTI instruction cost (#70929 ) Get more precise cost of instruction after LoopUnroll considering that some operands of it can be simplified, e.g. induction variable will be replaced by constant after full unrolling.	2024-02-06 17:01:38 +03:00
modiking	99ddd77ed9	[LoopUnroll] Introduce PragmaUnrollFullMaxIterations as a hard cap on how many iterations we try to unroll (#78648 ) Fixes [PR77842](https://github.com/llvm/llvm-project/issues/77842) where UBSAN causes pragma full unroll to try and unroll INT_MAX times. This sets a cap to make sure we don't attempt this and crash the compiler. Testing: ninja check-all with new test --------- Co-authored-by: Nikita Popov <github@npopov.com>	2024-02-05 17:01:00 -08:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Nikita Popov	62ae7d976f	[LoopUnroll] Fix missing sign extension For integers larger than 64-bit, this would zero-extend a -1 value, instead of sign-extending it. Fixes https://github.com/llvm/llvm-project/issues/80289.	2024-02-01 16:08:25 +01:00
Nikita Popov	f7b05e055f	[LoopUnroll] Add test for #80289 (NFC)	2024-02-01 16:08:25 +01:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Yingwei Zheng	b7f50e13d8	[InstCombine] Improve `foldICmpWithDominatingICmp` with DomConditionCache (#75370 ) This patch uses affected values from DomConditionCache(introduced by #73662), instead of a cheap/incomplete check `getSinglePredecessor`.	2023-12-14 21:02:10 +08:00
XiangZhang	1d6a678591	[LoopUnroll] Make use of MaxTripCount for loops with "#pragma unroll" (#74703 ) Fix loop unroll fail caused by branches folding. For example: SimplifyCFG foldloop branches then cause loop unroll failed for "#program unroll" loop. ``` #program unroll for (int I = 0; I < ConstNum; ++I) { // folding "I < ConstNum" and "Cond2" if (Cond2) { break; } xxx loop body; } ``` The pragma unroll metadata only takes effect if there is an exact trip count, but not if there is an upper bound trip count. This patch make it work with an upper bound trip count as well in shouldPragmaUnroll(). Loop unroll is important in stack nervous devices (e.g. GPU, and that is why a lot of GPU code mark loop with "#program unroll"). It usually much simplify the address (offset) calculations in old iterations, then we can do a lot of others optimizations, e.g, SROA, for these simplifed address (escape alloca the whole aggregates).	2023-12-08 19:43:10 +08:00
Philip Reames	ffb2af3ed6	[SCEVExpander] Attempt to reinfer flags dropped due to CSE (#72431 ) LSR uses SCEVExpander to generate induction formulas. The expander internally tries to reuse existing IR expressions. To do that, it needs to strip any poison generating flags (nsw, nuw, exact, nneg, etc..) which may not be valid for the newly added users. This is conservatively correct, but has the effect that LSR will strip nneg flags on zext instructions involved in trip counts in loop preheaders. To avoid this, this patch adjusts the expanded to reinfer the flags on the CSE candidate if legal for all possible users. This should fix the regression reported in https://github.com/llvm/llvm-project/issues/71200. This should arguably be done inside canReuseInstruction instead, but doing it outside is more conservative compile time wise. Both canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so right now we are performing work which is roughly O(N^2) in the size of the operand graph. We should fix that before making the per operand step more expensive. My tenative plan is to land this, and then rework the code to sink the logic into more core interfaces.	2023-12-07 13:20:36 -08:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV looking recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Joshua Cao	5602636835	[LoopPeel] Peel iterations based on and, or conditions (#73413 ) For example, this allows us to peel this loop with a `and`: ``` for (int i = 0; i < N; ++i) { if (i % 2 == 0 && i < 3) // can peel based on \|\| as well f1(); f2(); ``` into: ``` for (int i = 0; i < 3; ++i) { // peel three iterations if (i % 2 == 0) f1(); f2(); } for (int i = 3; i < N; ++i) f2(); ```	2023-12-02 11:24:02 -08:00
Jeremy Morse	d2d9dc8eb4	[DebugInfo][RemoveDIs] Make debugify pass convert to/from RemoveDIs mode (#73251 ) Debugify is extremely useful as a testing and debugging tool, and a good number of LLVM-IR transform tests use it. We need it to support "new" non-instruction debug-info to get test coverage, but it's not important enough to completely convert right now (and it'd be a large undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and exit of the pass, which gives us the functionality without any further work. The cost is compile-time, but again this is only happening during tests. Tested by: the large set of debugify tests enabled here. Note the InstCombine test (cast-mul-select.ll) that hasn't been fully enabled: this is because there's a debug-info sinking piece of code there that hasn't been instrumented.	2023-11-29 13:19:50 +00:00
Nikita Popov	a86bce6577	[LoopPeel] Regenerate test checks (NFC)	2023-11-28 15:42:07 +01:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702 ) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Jeremy Morse	c672ba7dde	[DebugInfo][RemoveDIs] Instrument inliner for non-instr debug-info (#72884 ) With intrinsics representing debug-info, we just clone all the intrinsics when inlining a function and don't think about it any further. With non-instruction debug-info however we need to be a bit more careful and manually move the debug-info from one place to another. For the most part, this means keeping a "cursor" during block cloning of where we last copied debug-info from, and performing debug-info copying whenever we successfully clone another instruction. There are several utilities in LLVM for doing this, all of which now need to manually call cloneDebugInfo. The testing story for this is not well covered as we could rely on normal instruction-cloning mechanisms to do all the hard stuff. Thus, I've added a few tests to explicitly test dbg.value behaviours, ahead of them becoming not-instructions.	2023-11-26 21:24:29 +00:00
Jeremy Morse	59fab22642	[DebugInfo][RemoveDIs] Support cloning and remapping DPValues (#72546 ) This patch adds support for CloneBasicBlock duplicating the DPValues attached to instructions, and adds facilities to remap them into their new context. The plumbing to achieve this is fairly straightforwards and mechanical. I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch and SimplifyCFG. The former only updates for the epilogue right now so I've added CHECK lines just for the end of an unrolled loop (further updates coming later). SimpleLoopUnswitch had no debug-info tests so I've added a new one. The two modified parts of SimplifyCFG are covered by the two modified SimplifyCFG tests. These are scenarios where we have to do extra cloning for copying of DPValues because they're no longer instructions, and remap them too.	2023-11-24 15:17:32 +00:00
Ramkumar Ramachandra	4c01a58008	update_analyze_test_checks: support output from LAA (#67584 ) update_analyze_test_checks.py is an invaluable tool in updating tests. Unfortunately, it only supports output from the CostModel, ScalarEvolution, and LoopVectorize analyses. Many LoopAccessAnalysis tests use hand-crafted CHECK lines, and it is moreover tedious to generate these CHECK lines, as the output fom the analysis is not stable, and requires the test-writer to hand-craft FileCheck matches. Alleviate this pain, and support output from: $ opt -passes='print<loop-accesses>' This patch includes several non-trivial changes including: - Preserving whitespace at the beginning of the line, so that the LAA output can be properly indented. - Regexes matching the unstable output, which is basically a pointer address hex. - Separating is_analyze from preserve_names clearly, as the former was formerly used as an overload for the latter. To demonstate the utility of this patch, several tests in LoopAccessAnalysis have been auto-generated by update_analyze_test_checks.py.	2023-10-31 14:33:53 +00:00
Aleksandr Popov	e8d5db206c	[LoopPeeling] Fix weights updating of peeled off branches (#70094 ) In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies. It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where: - F is a weight of the edge from latch to header. - E is a weight of the edge from latch to exit. - I is a number of peeling iteration. E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be: 0: (100,300) 1: (100,200) 2: (100,100) 3: (100,1) https://godbolt.org/z/93KnoEsT6 So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate. Underestimated trip count can lead to a performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable. Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this: 0: (100,300) 1: (100,200) 2: (100,100) 3: (100,100) Co-authored-by: Aleksandr Popov <apopov@azul.com>	2023-10-31 14:02:42 +01:00
David Green	75b3c3d267	[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709 ) For MVE tail predicated loops, better code can be generated by keeping the loop whole than to unroll to an upper bound, which requires the expansion of active lane masks that can be difficult to generate good code for. This patch disables UpperBound unrolling when we find a active_lane_mask in the loop.	2023-10-31 09:51:30 +00:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible to differentiate from a missing datalayout `target datalayout = ""` in the IR file since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Alex Richardson	e86d6a43f0	Regenerate test checks for tests affected by D141060	2023-10-04 10:51:35 -07:00
Nikita Popov	4d5525e0cb	[LoopUnroll] Add tests for excessive znver3 unrolls (NFC)	2023-09-28 12:10:38 +02:00

1 2 3 4 5 ...

608 Commits