llvm-project

Author	SHA1	Message	Date
Kazu Hirata	890c4bece2	[memprof] Use SmallVector for InlinedCallStack (NFC) (#114599 ) We can stay within 8 inlined elements more than 99% of the time while building a large application.	2024-11-01 19:52:11 -07:00
Min-Yih Hsu	64314dedeb	[InlineCost] Print inline cost for invoke call sites as well (#114476 ) Previously InlineCostAnnotationPrinter only prints inline cost for call instructions. I don't think there is any reason not to analyze invoke and its callee, and this patch adds such support.	2024-11-01 09:55:17 -07:00
Yingwei Zheng	a77dedcacb	[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280 ) Previously we folded `div/rem X, C` into `poison` if any element of the constant divisor `C` is zero or undef. However, this is incorrect when threading udiv over a vector select: https://alive2.llvm.org/ce/z/3Ninx5 ``` define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) { %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1> %div = udiv <2 x i32> <i32 42, i32 -7>, %sel ret <2 x i32> %div } ``` In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32 -1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into `zeroinitializer` and `poison`, respectively. One solution is to introduce a new flag indicating that we are threading over a vector select. But that requires modifying both `InstSimplify` and `ConstantFold`. However, this optimization doesn't provide benefits to real-world programs: https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908 https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107 This patch moves the fold into InstCombine to avoid breaking numerous existing tests. Fixes #114191 and #113866 (only poison-safety issue).	2024-11-01 22:56:22 +08:00
David Green	0f919444ad	[ValueTracking] Handle recursive phis in knownFPClass (#114008 ) As a follow-on to 113686, this breaks the recursion between phi nodes that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be calculated from the classes of p1 and p2.	2024-11-01 13:38:29 +00:00
c8ef	cf0b6cc711	Revert "[ConstantFold] Fold `tgamma` and `tgammaf` when the input parameter is a constant value." (#114496 ) Reverts llvm/llvm-project#114065	2024-11-01 09:26:11 +08:00
c8ef	1f07f995cc	[ConstantFold] Fold `tgamma` and `tgammaf` when the input parameter is a constant value. (#114065 ) This patch adds support for constant folding for the `tgamma` and `tgammaf` libc functions.	2024-11-01 09:07:55 +08:00
Manish Kausik H	0856592f6f	Ensure `collectTransitivePredecessors` returns Pred only from the Loop. (#113831 ) It's possible that we encounter Irreducible control flow, due to which, we may find that a few predecessors of BB are not a part of the CurLoop. Currently we crash in the function for such cases. This patch ensures that we only return Predecessors that are a part of CurLoop and gracefully ignore other Predecessors. For example, consider Irreducible IR of this form: ``` define i64 @baz() { bb: br label %bb1 bb1: ; preds = %bb3, %bb br label %bb3 bb2: ; No predecessors! br label %bb3 bb3: ; preds = %bb2, %bb1 %load = load ptr addrspace(1), ptr addrspace(1) null, align 8 br label %bb1 } ``` This crashes when `collectTransitivePredecessors` is called on the `%bb1<Header>, %bb3<latch>` loop, because the loop body has a predecessor `%bb2` which is not a part of the loop. See https://godbolt.org/z/E9fM1q3cT for the crash	2024-10-31 11:08:15 -07:00
Kenji Mouri / 毛利研二	7e877fc0ac	[Reland][TLI] Add support for hypot libcall. (#114343 ) This patch adds basic support for `hypot`. Constant folding support will be submitted in a subsequent patch. Related issue: https://github.com/llvm/llvm-project/issues/113711 Note: It's my first time contributing to the LLVM with encouragement from one of my friends, @fawdlstty. I learned a lot from https://github.com/llvm/llvm-project/pull/99611, and thanks for that. Note: I had created the same PR and merged (https://github.com/llvm/llvm-project/pull/113724), but reverted caused by the merging issue. (The CI issue happened in 3 A.M. at my timezone. So, I need to fall asleep again after I replied about why issue happened.) So, I rebased to the latest main branch and recreate the PR and hope I won't have the third time to create the same PR. I hope @arsenm can help me review the code again. I’m sorry for that. Kenji Mouri	2024-10-31 07:50:29 -07:00
David Green	9735c05186	[ValueTracking] Compute KnownFP state from recursive select/phi. (#113686 ) Given a recursive phi with select: %p = phi [ 0, entry ], [ %sel, loop] %sel = select %c, %other, %p The fp state can be calculated using the knowledge that the select/phi pair can only be the initial state (0 here) or from %other. This adds a short-cut into computeKnownFPClass for PHI to detect that the select is recursive back to the phi, and if so use the state from the other operand. This helps to address a regression from #83200.	2024-10-31 07:50:44 +00:00
gulfemsavrun	36d5692570	Revert "[TLI] Add support for hypot libcall." (#114312 ) Reverts llvm/llvm-project#113724	2024-10-30 15:10:29 -07:00
Kenji Mouri / 毛利研二	feb2d867fa	[TLI] Add support for hypot libcall. (#113724 ) This patch adds basic support for `hypot`. Constant folding support will be submitted in a subsequent patch. Related issue: https://github.com/llvm/llvm-project/issues/113711 Note: It's my first time contributing to the LLVM with encouragement from one of my friends, @fawdlstty. I learned a lot from https://github.com/llvm/llvm-project/pull/99611, and thanks for that. Kenji Mouri	2024-10-30 10:34:32 -07:00
Fangrui Song	318bdd0aeb	[StackSafetyAnalysis] Bail out when calling ifunc An assertion failure arises when a call instruction calls a GlobalIFunc. Since we cannot reason about the underlying function, just bail out. Fix #87923 Pull Request: https://github.com/llvm/llvm-project/pull/113841	2024-10-29 09:26:47 -07:00
Rohit Aggarwal	dfb60bb919	Adding more vector calls for -fveclib=AMDLIBM (#109662 ) AMD has its own implementation of vector calls. New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos. Please refer to [https://github.com/amd/aocl-libm-ose] --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-10-29 10:09:55 +00:00
Kyungwoo Lee	0dd9fdcf83	[StructuralHash] Support Differences (#112638 ) This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. Depends on https://github.com/llvm/llvm-project/pull/112621. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 20:02:05 -07:00
Nashe Mncube	e37d736def	Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289 ) This is a recommit of #107120 . The original PR was approved but failed buildbot. The newly added tests should only be run for compilers that support the ARM target. This has been resolved by adding a config file for these tests. - Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works at the midend level	2024-10-24 10:12:01 +01:00
Thomas Fransham	b8fddca7bd	[llvm] Support llvm::Any across shared libraries on windows (#108051 ) This is part of the effort to support enabling plugins on Windows by adding better support for building LLVM as a DLL. The export macros used here were added in #96630. Since shared library symbols aren't deduplicated across multiple libraries on Windows as they are on Linux, we have to explicitly import and export `Any::TypeId` template instantiations for the uses of `llvm::Any` in the LLVM codebase to support LLVM Windows shared library builds. This change ensures that external code, including LLVM's own tests, can use PassManager callbacks when LLVM is built as a DLL. I also removed the only use of llvm::Any for LoopNest, which only existed in debug code; there also doesn't seem to be any code creating `Any<LoopNest>`.	2024-10-24 08:07:13 +03:00
Ramkumar Ramachandra	d897ea37db	LAA: check nusw on GEP in place of inbounds (#112223 ) With the introduction of the nusw flag in GEPNoWrapFlags, it should be safe to weaken the check in LoopAccessAnalysis to just check the nusw flag on the GEP, instead of inbounds.	2024-10-22 09:58:54 +01:00
Ramkumar Ramachandra	f719cfa868	LAA: be less conservative in isNoWrap (#112553 ) isNoWrap has exactly one caller which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns Strides == 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.	2024-10-22 09:55:51 +01:00
c8ef	b90ea5caad	[ConstantFold] Fold `erf` and `erff` when the input parameter is a constant value. (#113079 ) This patch adds support for constant folding for the `erf` and `erff` libc functions.	2024-10-22 12:58:11 +08:00
Farzon Lotfi	dcbf2c2ca0	[Scalarizer][DirectX] support structs return types (#111569 ) Based on this RFC: https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306 LLVM intrinsics do not support out params. To get around this limitation implementers will make intrinsics return structs to capture a return type and an out param. This implementation detail should not impact scalarization since these cases should be elementwise operations. ## Three changes are needed. - The CallInst visitor needs to be updated to handle Structs - A new visitor is needed for `ExtractValue` instructions - finish needs to be updated to handle structs so that insert elements are properly propagated. ## Testing changes - Add support for `llvm.frexp` - Add support for `llvm.dx.splitdouble` fixes https://github.com/llvm/llvm-project/issues/111437	2024-10-21 12:51:01 -04:00
Nikita Popov	a18dd29077	[ConstantFolding] Set signed/implicitTrunc when handling GEP offsets GEP offsets have sext_or_trunc semantics. We were already doing this for the outer-most GEP, but not for the inner ones. I believe one of the sanitizer buildbot failures was due to this, but I did not manage to reproduce the issue or come up with a test case. Usually the problematic case will already be folded away due to index type canonicalization.	2024-10-21 12:47:02 +02:00
Fawdlstty	20bda93e43	[TLI] Add basic support for scalbnxx (#112936 ) This patch adds basic support for `scalbln, scalblnf, scalblnl, scalbn, scalbnf, scalbnl`. Constant folding support will be submitted in a subsequent patch. Related issue: <#112631>	2024-10-20 14:17:15 -07:00
c8ef	1336e3d0b9	[ConstantFold] Fold `ilogb` and `ilogbf` when the input parameter is a constant value. (#113014 ) This patch adds support for constant folding for the `ilogb` and `ilogbf` libc functions.	2024-10-20 10:46:35 +08:00
Teresa Johnson	5995e4b97b	[MemProf] Disable memprof ICP support by default (#112940 ) A failure showed up after this was committed, rather than revert simply disable this new support to simplify investigation and further testing.	2024-10-18 10:40:27 -07:00
Teresa Johnson	6264288d70	[MemProf] Fix the option to disable memprof ICP (#112917 ) The -enable-memprof-indirect-call-support meant to guard the recently added memprof ICP support was not used in enough places. Specifically, it was not checked in mayHaveMemprofSummary, which is called from the ThinLTO backend applyImports. This led to failures when checking the callsite records, as we incorrectly expected records for indirect calls. Fix the option to be checked in all necessary locations, and add testing.	2024-10-18 10:12:23 -07:00
Mohammed Keyvanzadeh	721b796809	[llvm] prefer isa_and_nonnull over v && isa (#112541 ) Use `isa_and_nonnull<T>(v)` instead of `v && isa<T>(v)`, where `v` is evaluated twice in the latter.	2024-10-18 19:12:04 +03:30
Yingwei Zheng	c89d731c5d	[LVI] Infer non-zero from equality icmp (#112838 ) The following pattern is common in loop headers: ``` %101 = sub nuw i64 %78, %98 %103 = icmp eq i64 %78, %98 br i1 %103, label %.thread.i.i, label %.preheader.preheader.i.i .preheader.preheader.i.i: %invariant.umin.i.i = call i64 @llvm.umin.i64(i64 %101, i64 9) %umax.i = call i64 @llvm.umax.i64(i64 %invariant.umin.i.i, i64 1) br label %.preheader.i.i .preheader.i.i: ... %116 = add nuw nsw i64 %.011.i.i, 1 %exitcond.not.i = icmp eq i64 %116, %umax.i br i1 %exitcond.not.i, label %.critedge.i.i, label %.preheader.i.i ``` As `%78` is not equal to `%98` in BB `.preheader.preheader.i.i`, we can prove `%101` is non-zero. Then we can simplify the loop exit condition. Addresses regression introduced by https://github.com/llvm/llvm-project/pull/112742.	2024-10-18 21:19:02 +08:00
c8ef	761fa5844e	[TLI] Add support for the `ilogb` libcall. (#112725 ) This patch adds the `ilogb` libcall. Constant folding will be handled in subsequent patches.	2024-10-18 14:20:34 +08:00
Kazu Hirata	b47849b4cb	[SCEV] Avoid repeated hash lookups (NFC) (#112656 )	2024-10-17 07:46:32 -07:00
Nashe Mncube	370fd74361	Revert "[llvm][ARM]Add widen global arrays pass" (#112701 ) Reverts llvm/llvm-project#107120 Unexpected build failures in post-commit pipelines. Needs investigation	2024-10-17 13:38:01 +01:00
Nashe Mncube	ab90d2793c	[llvm][ARM]Add widen global arrays pass (#107120 ) - Pass optimizes memcpy's by padding out destinations and sources to a full word to make backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant array. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works within GlobalOpt but is disabled by default on all targets except ARM.	2024-10-17 11:56:00 +01:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Yingwei Zheng	aad3a1630e	[ValueTracking] Respect `samesign` flag in `isKnownInversion` (#112390 ) In https://github.com/llvm/llvm-project/pull/93591 we introduced `isKnownInversion` and assumed that `X` being poison implies `Y` is poison because they share common operands. But after the introduction of `samesign` this assumption no longer holds if `X` is an icmp with the `samesign` flag. Alive2 link: https://alive2.llvm.org/ce/z/rj3EwQ (Please run it locally with this patch and https://github.com/AliveToolkit/alive2/pull/1098). This approach is the most conservative way in my mind to address this problem. If `X` has the `samesign` flag, it checks whether `Y` also has this flag and makes sure constant RHS operands have the same sign. Fixes https://github.com/llvm/llvm-project/issues/112350.	2024-10-17 00:27:21 +08:00
Rahul Joshi	6924fc0326	[LLVM] Add `Intrinsic::getDeclarationIfExists` (#112428 ) Add `Intrinsic::getDeclarationIfExists` to lookup an existing declaration of an intrinsic in a `Module`.	2024-10-16 07:21:10 -07:00
Amr Hesham	4ba1800be6	[LLVM][NFC] Reduce copying of parameter in lambda (#110299 ) Reduce redundant copy parameter in lambda Fixes #95642	2024-10-16 09:55:01 +01:00
Alexey Bader	583fa4f5b7	[InstCombine] Extend fcmp+select folding to minnum/maxnum intrinsics (#112088 ) Today, InstCombine can fold fcmp+select patterns to minnum/maxnum intrinsics when the nnan and nsz flags are set. The ordering of the operands in both the fcmp and select instructions is important for the folding to occur. maxnum patterns: 1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge} 2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult} The second pattern is supposed to make the order of the operands in the select instruction irrelevant. However, the pattern matching code uses the CmpInst::getInversePredicate method to invert the comparison predicate. This method doesn't take into account the fast-math flags, which can lead to missing the folding opportunity. The patch extends the pattern matching code to handle unordered fcmp instructions. This allows the folding to occur even when the select instruction has the operands in the inverse order. New maxnum patterns: 1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge} 2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt} The same changes are applied to the minnum intrinsic.	2024-10-15 22:05:16 +04:00
c8ef	47a6da2d4d	[ConstantFold] Fold `log1p` and `log1pf` when the input parameter is a constant value. (#112113 ) This patch adds support for constant folding for the `log1p` and `log1pf` libc functions.	2024-10-16 00:19:26 +08:00
Ramkumar Ramachandra	1c6c850937	InstCombine: extend select-equiv to support vectors (#111966 ) foldSelectEquivalence currently doesn't support GVN-like replacements on vector types. Put in the checks for potentially lane-crossing operations, and lift the limitation.	2024-10-15 11:10:45 +01:00
Alexey Bataev	f9bc00e4bb	[SLP]Initial support for interleaved loads Adds initial support for interleaved loads, which allows emission of segmented loads for RISCV RVV. Vectorizes extra code for RISCV CFP2006/447.dealII, CFP2006/453.povray, CFP2017rate/510.parest_r, CFP2017rate/511.povray_r, CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc, CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/112042	2024-10-14 09:12:33 -04:00
Ramkumar Ramachandra	bdf241cab3	ValueTracking: handle more ops in isNotCrossLaneOperation (#112183 ) Reuse llvm::isTriviallyVectorizable in llvm::isNotCrossLaneOperation, in order to get it to handle more intrinsics. Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/XSV_GT	2024-10-14 14:08:12 +01:00
Florian Hahn	7f06d8afb0	[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824 ) Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if there is an UDiv operand that may divide by 0 or poison PR: https://github.com/llvm/llvm-project/pull/110824	2024-10-14 13:08:49 +01:00
Ramkumar Ramachandra	c5f82f7893	ValueTracking: introduce llvm::isNotCrossLaneOperation (#112011 ) Factor out and unify common code from InstSimplify and InstCombine that partially guard against cross-lane vector operations into llvm::isNotCrossLaneOperation in ValueTracking. Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka	2024-10-14 11:37:30 +01:00
Kazu Hirata	c9a1cffd3d	[Analysis] Simplify code with DenseMap::operator[] (NFC) (#112082 )	2024-10-12 08:04:38 -07:00
Tim Renouf	76007138f4	[LLVM] New NoDivergenceSource function attribute (#111832 ) A call to a function that has this attribute is not a source of divergence, as used by UniformityAnalysis. That allows a front-end to use known-name calls as an instruction extension mechanism (e.g. https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call being a source of divergence.	2024-10-12 09:34:45 +01:00
Teresa Johnson	1de71652fd	[MemProf] Support cloning for indirect calls with ThinLTO (#110625 ) This patch enables support for cloning in indirect callsites. This is done by synthesizing callsite records for each virtual call target from the profile metadata. In the thin link all the synthesized records for a particular indirect callsite initially share the same context node, but support is added to partition the callsites and outgoing edges based on the callee function, creating a separate node for each target. In the LTO backend, when cloning is needed we first perform indirect call promotion, then change the target of the new direct call to the desired clone. Note this is ThinLTO-specific, since for regular LTO indirect call promotion should have already occurred.	2024-10-11 13:53:35 -07:00
Shilei Tian	e34e27f198	[TTI][AMDGPU] Allow targets to adjust `LastCallToStaticBonus` via `getInliningLastCallToStaticBonus` (#111311 ) Currently we will not be able to inline a large function even if it only has one live use because the inline cost is still very high after applying `LastCallToStaticBonus`, which is a constant. This could significantly impact the performance because CSR spill is very expensive. This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to allow targets to customize this value. Fixes SWDEV-471398.	2024-10-11 10:19:54 -04:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928 ) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC isIndvarOverflowCheckKnownFalse computeMaxVF isMoreProfitable I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
c8ef	923566a67d	[ConstantFold] Fold `logb` and `logbf` when the input parameter is a constant value. (#111232 ) This patch adds support for constant folding for the `logb` and `logbf` libc functions.	2024-10-10 07:56:16 +08:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Paul Walker	87cdc8328d	[LLVM][ConstFolds] Verify a scalar src before attempting scalar->vector bitcast transformation. (#111149 ) It was previously safe to assume isa<Constant{Int,FP}> meant a scalar value. This is not true when use-constant-##-for-###-splat are enabled.	2024-10-08 13:28:44 +01:00
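
The maxnum patterns listed in commit 583fa4f5b7 (#112088) can be checked numerically with a small standalone sketch. This is not LLVM code: the helper names (`selectOgt`, `selectUgt`, `selectOle`, `selectUle`, `allFormsMatchFmax`) are hypothetical, plain C++ comparisons stand in for the fcmp predicates (an unordered predicate is written as the negation of its ordered inverse), and `std::fmax` stands in for `maxnum`. It assumes non-NaN inputs, mirroring `nnan`, and compares with `==` so that +0.0 and -0.0 count as equal, mirroring `nsz`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Select-based forms of "max", written with C++ comparisons.
// For non-NaN inputs an ordered predicate and its unordered counterpart
// give the same answer, so all four forms should agree.
inline double selectOgt(double a, double b) { return (a > b) ? a : b; }     // (a ogt b) ? a : b
inline double selectUgt(double a, double b) { return !(a <= b) ? a : b; }   // (a ugt b) ? a : b
inline double selectOle(double a, double b) { return (a <= b) ? b : a; }    // (a ole b) ? b : a
inline double selectUle(double a, double b) { return !(a > b) ? b : a; }    // (a ule b) ? b : a

// Returns true if every select form equals std::fmax(a, b) for all pairs
// drawn from `values`. Inputs must not be NaN (the nnan assumption).
inline bool allFormsMatchFmax(const std::vector<double> &values) {
  for (double a : values) {
    for (double b : values) {
      assert(!std::isnan(a) && !std::isnan(b));
      const double expected = std::fmax(a, b);
      if (selectOgt(a, b) != expected || selectUgt(a, b) != expected ||
          selectOle(a, b) != expected || selectUle(a, b) != expected)
        return false;
    }
  }
  return true;
}
```

Calling `allFormsMatchFmax({-3.5, -0.0, 0.0, 1.0, 2.0, 1e300})` exercises all four forms on ordinary values and signed zeros. With a NaN operand the forms diverge (`selectOgt(NAN, 1.0)` yields `1.0` while `selectUgt(NAN, 1.0)` yields NaN), which is why the fold is only described for `nnan`.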


13651 Commits