llvm-project

Author	SHA1	Message	Date
Matt Arsenault	f5c8242042	SimplifyLibCalls: Prefer to emit intrinsic in pow(2, x) -> ldexp(1, x) (#92363)	2024-05-17 14:28:03 +02:00
David Sherwood	0ad275c158	[InstCombine] Fold vector.reduce.op(vector.reverse(X)) -> vector.reduce.op(X) (#91743) For all of the following reductions: vector.reduce.or, vector.reduce.and, vector.reduce.xor, vector.reduce.add, vector.reduce.mul, vector.reduce.umin, vector.reduce.umax, vector.reduce.smin, vector.reduce.smax, vector.reduce.fmin, vector.reduce.fmax; if the input operand is the result of a vector.reverse, then we can perform the reduction on the vector.reverse input instead, since the answer is the same. If reassociation is permitted, we can also do the same folds for vector.reduce.fadd and vector.reduce.fmul.	2024-05-17 12:58:14 +01:00
Florian Hahn	1e7d047c71	[VPlan] Mark LoopInfo preserved in native-path as well (NFC). LoopInfo is updated during VPlan execution now, so it will also be updated correctly in the native path.	2024-05-17 12:18:01 +01:00
Shan Huang	d0e2808f80	[DebugInfo][LoopLoadElim] Fix missing debug location updates (#91839)	2024-05-17 10:56:05 +01:00
DianQK	c79690040a	[GlobalOpt] Don't replace aliasee with alias that has weak linkage (#91483) Fixes #91312. Don't perform the transform if the alias may be replaced at link time.	2024-05-17 05:51:49 +08:00
Noah Goldstein	23f1047daa	[InstCombine] Fold `(icmp pred (trunc nuw/nsw X), C)` -> `(icmp pred X, (zext/sext C))` This is valid as long as the sign of the wrap flag doesn't differ from the sign of the `pred`. Proofs: https://alive2.llvm.org/ce/z/35NsrR NB: The online Alive2 hasn't been updated with `trunc nuw/nsw` support, so the proofs must be reproduced locally. Closes #87935	2024-05-16 13:03:32 -05:00
Matt Arsenault	cdb41e416a	PlaceSafepoints: Fix using default constructed TargetLibraryInfo (#92411)	2024-05-16 17:54:26 +02:00
Jie Fu	e948da1021	[Transforms] Fix -Wunused-variable in DemoteRegToStack.cpp (NFC) llvm-project/llvm/lib/Transforms/Utils/DemoteRegToStack.cpp:58:21: error: unused variable 'BB' [-Werror,-Wunused-variable] BasicBlock *BB = SplitCriticalEdge(II, i); ^ 1 error generated.	2024-05-16 20:52:56 +08:00
Jie Fu	03d8e61391	[Transforms] Fix -Wsign-compare in DemoteRegToStack.cpp (NFC) llvm-project/llvm/lib/Transforms/Utils/DemoteRegToStack.cpp:54:23: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare] for (int i = 0; i < CBI->getNumSuccessors(); i++) { ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated.	2024-05-16 20:29:54 +08:00
XChy	fdaad73875	[Reg2Mem] Handle CallBr instructions (#90953) Fixes #90900	2024-05-16 20:13:39 +08:00
Matt Arsenault	0ea178b085	SimplifyLibCalls: Emit vector ldexp intrinsics in exp2->ldexp combine (#92219) Co-authored-by: Nikita Popov <github@npopov.com>	2024-05-16 10:24:56 +02:00
Matt Arsenault	8389177710	SimplifyLibCalls: Use IRBuilder helpers for creating intrinsics (#92288)	2024-05-16 09:20:18 +02:00
Matt Arsenault	ce1ce5d30c	InstCombine: Try to use exp10 intrinsic instead of libcall (#92287) Addresses old TODO about the exp10 intrinsic not existing.	2024-05-16 09:09:02 +02:00
Nikita Popov	b4d1a606c7	[SeparateConstOffsetFromGEP] Check correct index for non-negativity We were checking the index of GEP twice, instead of checking both GEP and PtrGEP.	2024-05-16 11:59:07 +09:00
AdityaK	b42d245b77	[GVNHoist] Replace combineKnownMetadata with combineMetadataForCSE (#92197) There is no reason to call combineMetadata directly with a list of MD_ nodes. The combineMetadataForCSE function handles all the metadata correctly. Partially fixes: #30866	2024-05-15 07:44:34 -07:00
Jie Fu	7c8176ebd3	[Coroutines] Remove unused function (NFC) llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:1223:1: error: unused function 'scanPHIsAndUpdateValueMap' [-Werror,-Wunused-function] scanPHIsAndUpdateValueMap(Instruction Prev, BasicBlock NewBlock, ^ 1 error generated.	2024-05-15 22:08:17 +08:00
Hans	3bb39690d7	[coro] Lower `llvm.coro.await.suspend.handle` to resume with tail call (#89751) The C++ standard requires that symmetric transfer from one coroutine to another is performed via a tail call. Failure to do so is a miscompile and often breaks programs by quickly overflowing the stack. Until now, the coro split pass tried to ensure this in the `addMustTailToCoroResumes()` function by searching for `llvm.coro.resume` calls to lower as tail calls if the conditions were right: the right function arguments, attributes, calling convention etc., and if a `ret void` was sure to be reached after traversal with some ad-hoc constant folding following the call. This was brittle, as the kind of implicit invariants required for a tail call to happen could easily be broken by other passes (e.g. if some instruction got in between the `resume` and `ret`), see for example 9d1cb18d19862fc0627e4a56e1e491a498e84c71 and 284da049f5feb62b40f5abc41dda7895e3d81d72. Also, the logic seemed backwards: instead of searching for possible tail call candidates and doing them if the circumstances are right, it seems better to start with the intention of making the tail calls we need, and forcing the circumstances to be right. Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since f78688134026686288a8d310b493d9327753a022) which corresponds exactly to symmetric transfer, change the lowering of that to also include the `resume` part, always lowered as a tail call.	2024-05-15 15:29:08 +02:00
Pietro Ghiglio	83d9aa2768	[VPlan] Add scalar inferencing support for addrspace cast (#92107) Fixes https://github.com/llvm/llvm-project/issues/91434 PR: https://github.com/llvm/llvm-project/pull/92107	2024-05-15 14:03:21 +01:00
Jay Foad	1650f1b3d7	Fix typo "indicies" (#92232)	2024-05-15 13:10:16 +01:00
Florian Hahn	d187005cad	[VPlan] Update VPBlendRecipe codegen for first-lane only. Update VPBlendRecipe::execute to support generating code for first-lane only. This fixes a crash in the newly added test @test_not_first_lane_only_wide_compare_incoming_order_swapped.	2024-05-15 11:00:15 +01:00
Daniel Kiss	45726c1a3a	[LLVM] Make sanitizers respect the disable_sanitizer_instrumentation attribute. (#91732) `disable_sanitizer_instrumentation` is attached to functions that shall not be instrumented, e.g. ifunc resolvers, because those run before everything is initialised. Some sanitizers already handle this attribute; this patch adds it to DataFlow and Coverage too.	2024-05-15 08:40:16 +02:00
Matt Arsenault	d7bb0723fe	InstCombine: Emit ldexp intrinsic in exp2->ldexp combine (#92039) Prefer to emit the intrinsic over a libcall in the intrinsic or no-math-errno case.	2024-05-15 07:41:28 +02:00
Matt Arsenault	847c83f7cc	InstCombine: Process addrspacecast uses in PointerReplacer (#91953) This was looking through an addrspacecast, and not finding a later unfoldable cast to another address space. Fixes improperly deleting a required alloca + memcpy and introducing an illegal addrspacecast. This also required fixing some worklist management issues with addrspacecast, and assuming that only memcpy sources could need replacement. Regresses one test function, but this looks like it optimized before by accident. It never saw the pointer use by the call to readonly_callee, which should require insertion of a new cast. Fixes #68120	2024-05-15 07:02:31 +02:00
Nikita Popov	71fbbb69d6	[IR] Move GlobalValue::getGUID() out of line (NFC) Avoid including MD5.h in a core IR header.	2024-05-15 10:49:25 +09:00
AtariDreams	4d1ecf1923	[Transforms] Preserve inbounds attribute of transformed GEPs when flattening loops (#86961) When flattening the loop, if the GEP was inbounds, it should stay inbounds, because the only thing that changed is how the pointers are calculated, not the elements being accessed. Proof: https://alive2.llvm.org/ce/z/dApMpQ	2024-05-15 10:26:23 +09:00
Philip Reames	baca93fc83	[LSR] Tweak debug output to always print initial cost	2024-05-14 13:34:20 -07:00
Florian Hahn	67d840b60f	[VPlan] Relax over-aggressive assertion in VPTransformState::get(). There are cases where a vector value has some users that demand the single scalar value only (NeedsScalar), while other users demand the vector value (see attached test cases). In those cases, the NeedsScalar users should only demand the first lane. Fixes https://github.com/llvm/llvm-project/issues/91883.	2024-05-14 19:10:49 +01:00
Mingming Liu	6c8ebc0535	[NFC][CallPromotionUtils] Extract a helper function versionCallSiteWithCond from versionCallSite (#81181) This is to be used by https://github.com/llvm/llvm-project/pull/81378 to implement a variant of versionCallSite that compares vtables. The parent patch is https://github.com/llvm/llvm-project/pull/81051	2024-05-14 10:13:57 -07:00
Graham Hunter	2b15c4a62b	[AArch64] Postcommit fixes for histogram intrinsic (#92095) A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.	2024-05-14 15:16:42 +01:00
AdityaK	bf7a0f9958	Fix incorrect codegen with respect to GEPs #85333 (#92047) As mentioned in #68882 and https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699, GEP arithmetic isn't consistent across different types. GVNSink didn't realize this and sank all GEPs as long as their operands could be wired via PHIs in a post-dominator. Fixes: #85333 Reapply: #88440 after fixing the non-determinism issues in #90995	2024-05-14 06:13:11 -07:00
Florian Hahn	b1e99a699d	[LV] Drop redundant comment from createEdgeMask (NFC). Follow-up to remove a redundant comment post-commit https://github.com/llvm/llvm-project/pull/91897	2024-05-14 12:43:47 +01:00
Ramkumar Ramachandra	d7ef34bfe3	[LV] update comment following 63d8058 (NFC) (#91120) Address a review comment post landing 63d8058 (LoopVectorize: guard appending InstsToScalarize; fix bug) to update a comment.	2024-05-14 10:59:26 +01:00
Florian Hahn	632317e9ab	[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897) Add a new opcode to model non-poison propagating logical AND operations used when generating edge masks. This follows the similar decision to model Not as dedicated opcode as well, to improve clarity. This also helps to simplify the matchers for https://github.com/llvm/llvm-project/pull/89386. PR: https://github.com/llvm/llvm-project/pull/91897	2024-05-14 09:42:49 +01:00
Lei Wang	5b6f151104	[SampleFDO] Improve stale profile matching by diff algorithm (#87375) This change improves the matching algorithm by using a diff algorithm. The current matching algorithm only processes the callsites grouped by same-name functions; it doesn't consider the order relationships between functions with different names, so it sometimes fails to handle the ambiguous anchor case. For example (`Foo:1` means a callsite [callee_name: callsite_location]): ``` IR : foo:1 bar:2 foo:4 bar:5 Profile : bar:3 foo:5 bar:6 ``` The `foo:1` is matched to the 2nd `foo:5`, and using the diff algorithm (finding the longest common subsequence) can help with this issue. One well-known diff algorithm is the Myers diff algorithm (paper "An O(ND) Difference Algorithm and Its Variations", Eugene W. Myers); its variations have been implemented and used in many well-known tools, like GNU diff or git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching. There are several variations/refinements of the algorithm, but as in our case the number of function callsites is usually very small, we implemented the basic greedy version in this change, which should be good enough. We observed better matchings and positive perf improvement on our internal services.	2024-05-13 16:01:29 -07:00
Florian Hahn	e122380445	[LV] Use VPBuilder to create Select (NFCI).	2024-05-13 20:44:39 +01:00
chenlin	79643565a8	[LoopUnroll] Remove redundant debug instructions after blocks have been merged (#91246) Remove redundant debug instructions after blocks have been merged into the predecessor. It can reduce compile time in some cases. This change only fixes the situation of loop unrolling; other situations are not considered. "RemoveRedundantDbgInstrs" seems to be very time-consuming. Thus, we just add it here after the "Dest" has been merged into the "Fold", which may be a more targeted solution. Fixes: https://github.com/llvm/llvm-project/issues/89073	2024-05-13 09:42:04 -07:00
Paul Kirth	89a080cb79	[llvm][NFC] Document cl::opt MisExpectTolerance and fix typo Pull Request: https://github.com/llvm/llvm-project/pull/90670	2024-05-13 16:19:09 +00:00
Matt Arsenault	8823abea6f	InstCombine: Simplify vector initialization	2024-05-13 13:59:45 +02:00
Orlando Cazalet-Hyams	91d7ca904c	[DebugInfo] Remap extracted DIAssignIDs in hotcoldsplit (#91940) Fix #91814. When instructions are extracted into a new function the `DIAssignID` metadata uses and attachments need to be remapped so that the stores and assignment markers don't link to stores and assignment markers in the original function. This matches existing inlining behaviour for DIAssignIDs.	2024-05-13 12:49:42 +01:00
Matt Arsenault	c5b0da9d83	InstCombine: Preserve inbounds in PointerReplacer (#91735) This avoids spurious test changes in a future commit.	2024-05-13 13:49:09 +02:00
Graham Hunter	fbb37e9606	[AArch64] Add an all-in-one histogram intrinsic Based on discussion from https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788 Current interface is: llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask) The integer type used by 'inc_amount' needs to match the type of the buckets in memory. The intrinsic covers the following operations: gather load; histogram on the elements of 'ptrs'; multiply the histogram results by 'inc_amount'; add the result of the multiply to the values loaded by the gather; scatter store the results of the add. Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.	2024-05-13 11:35:28 +01:00
Yingwei Zheng	b5f4210e9f	[InstCombine] Drop nuw flag when CtlzOp is a sub nuw (#91776) See the following case: ``` define i32 @src1(i32 %x) { %dec = sub nuw i32 -2, %x %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false) %sub = sub nsw i32 32, %ctlz %shl = shl i32 1, %sub %ugt = icmp ult i32 %x, -2 %sel = select i1 %ugt, i32 %shl, i32 1 ret i32 %sel } define i32 @tgt1(i32 %x) { %dec = sub nuw i32 -2, %x %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false) %sub = sub nsw i32 32, %ctlz %and = and i32 %sub, 31 %shl = shl nuw i32 1, %and ret i32 %shl } ``` `nuw` in `%dec` should be dropped after the select instruction is eliminated. Alive2: https://alive2.llvm.org/ce/z/7S9529 Fixes https://github.com/llvm/llvm-project/issues/91691.	2024-05-13 14:27:59 +08:00
Kazu Hirata	e6785fd752	[Scalar] Fix a warning This patch fixes: llvm/lib/Transforms/Scalar/GVNSink.cpp:270:33: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] While I am at it, this patch replaces llvm::for_each with a range-based for loop.	2024-05-12 23:02:37 -07:00
AdityaK	abe3c5ac19	[GVNSink] Fix non-determinism by using a deterministic ordering (#90995) GVNSink used to order instructions based on their pointer values and was prone to non-determinism because of that. This patch ensures all the values stored are using a deterministic order. I have also added a verifier (`ModelledPHI::verifyModelledPHI`) to assert when ordering isn't preserved. Additionally, I have added a test case (mirror graph image of an existing test) that would have failed before this patch. Fixes: #77852	2024-05-12 19:41:54 -07:00
David Green	b7ed097f29	[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000) This is probably the most involved addition, as it tries to make use of isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a number of different intrinsics that are all lane-wise. Additional tests have been added for some of the different intrinsics from isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.	2024-05-12 20:31:11 +01:00
Shan Huang	cdd782183d	[DebugInfo][LICM] Fix missing debug location updates (#91729)	2024-05-11 16:26:04 +01:00
Shan Huang	3773191fc4	[DebugInfo][JumpThreading] Fix missing debug location updates (#91581)	2024-05-11 16:10:00 +01:00
Alex Bradbury	3be8e2c95d	[InstCombine] Prefer to keep power-of-2 constants when combining ashr exact and slt/ult of a constant (#86111) We have flexibility in what constant to use when combining an `ashr exact` with a slt or ult of a constant, and it's not possible to revisit this decision later in the compilation pipeline after the `ashr exact` is removed. Keeping a constant close to power-of-2 (pow2val + 1) should be no worse than neutral, and in some cases may allow better codegen later on for targets that can more cheaply generate power of 2 (which may be selectable if converting back to setle/setge) or near power of 2 constants. Alive2 proofs: <https://alive2.llvm.org/ce/z/2BmPnq> and <https://alive2.llvm.org/ce/z/DtuhnR>	2024-05-10 13:50:03 +01:00
Florian Hahn	28767afd53	[LAA] Support backward dependences with non-constant distance. (#91525) Following up to 933f49248, also update the code reasoning about backwards dependences to support non-constant distances. Update the code to use the signed minimum distance instead of a constant distance. This means we check the lower bound of the dependence distance, and the distance may be larger at runtime (and safe for vectorization). Whether to classify it as Unknown or Backwards depends on the vector width, and LAA was updated to take TTI to get the maximum vector register width. If the minimum dependence distance is larger than the max vector width, we consider it as backwards-vectorizable. Otherwise we classify them as Unknown, so we re-try with runtime checks. PR: https://github.com/llvm/llvm-project/pull/91525	2024-05-10 11:47:13 +01:00
Graham Hunter	2e8d815596	[TTI] Support scalable offsets in getScalingFactorCost (#88113) Part of the work to support vscale-relative immediates in LSR.	2024-05-10 11:22:11 +01:00
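
The stale-profile matching described in 5b6f151104 (#87375) aligns the IR callsites and the profile callsites as two sequences of callee names and keeps the pairs that fall on a longest common subsequence. The sketch below illustrates that idea only. It assumes a plain dynamic-programming LCS in place of the greedy Myers variant the commit implements, and `match_anchors` and the `Anchor` type are hypothetical names, not LLVM's.

```python
from typing import Dict, List, Tuple

# Hypothetical anchor representation: (callee_name, callsite_location).
Anchor = Tuple[str, int]


def match_anchors(ir: List[Anchor], profile: List[Anchor]) -> Dict[int, int]:
    """Map IR callsite locations to profile locations via an LCS on callee names."""
    n, m = len(ir), len(profile)
    # lcs[i][j] holds the LCS length of the suffixes ir[i:] and profile[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ir[i][0] == profile[j][0]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the front to recover one optimal alignment.
    matches: Dict[int, int] = {}
    i = j = 0
    while i < n and j < m:
        if ir[i][0] == profile[j][0]:
            matches[ir[i][1]] = profile[j][1]
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1  # Skip an IR anchor that has no counterpart.
        else:
            j += 1  # Skip a profile anchor that has no counterpart.
    return matches
```

On the example from the commit message (IR `foo:1 bar:2 foo:4 bar:5`, profile `bar:3 foo:5 bar:6`), this sketch leaves `foo:1` unmatched and returns `{2: 3, 4: 5, 5: 6}`, because the ordering of the surrounding `bar` anchors is taken into account.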

36407 Commits
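
The sequence of operations listed for the histogram intrinsic in fbb37e9606 (gather, histogram, multiply, add, scatter) can be modelled in scalar code. The sketch below is an illustration under stated assumptions, not the lowering LLVM performs: pointers are modelled as indices into a Python list of buckets, integer wraparound is ignored, and `histogram_update` is a hypothetical name.

```python
from collections import Counter
from typing import List


def histogram_update(memory: List[int], ptrs: List[int], inc_amount: int,
                     mask: List[bool]) -> None:
    """Scalar model: memory is a list of buckets, ptrs are indices into it."""
    # Only lanes whose mask bit is set take part in the operation.
    active = [p for p, m in zip(ptrs, mask) if m]
    # Gather load: read each addressed bucket.
    gathered = {p: memory[p] for p in active}
    # Histogram: count how many active lanes address each bucket.
    counts = Counter(active)
    # Multiply by inc_amount, add to the gathered values, scatter store.
    for p, count in counts.items():
        memory[p] = gathered[p] + count * inc_amount
```

For instance, with buckets `[0, 10, 20, 30]`, `ptrs = [1, 1, 3, 2]`, `inc_amount = 2` and the last lane masked off, bucket 1 is hit twice and bucket 3 once, giving `[0, 14, 20, 32]`.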