llvm-project

Author	SHA1	Message	Date
Thurston Dang	d398f476c5	[msan] Rename '-msan-dump-strict-intrinsics' to '-msan-dump-heuristic-instructions' (#143186 ) This updates the flag from https://github.com/llvm/llvm-project/pull/123381 Also expands the description of msan-dump-strict-instructions	2025-06-06 13:06:00 -07:00
Alex MacLean	107601ed06	[InstCombine] Allow min/max in constant BOp min/max folding (#142878 ) Extend folding for `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp C1` to allow min and max as `BOp`. This ensures a constant clamping pattern is folded into a pair of min/max instructions. Here is a simplified example of a case where this folding is not occurring currently. int clampToU8(int v) { if (v < 0) return 0; if (v > 255) return 255; return v; } https://godbolt.org/z/78jhKPWbv Generic proof: https://alive2.llvm.org/ce/z/cdpLYy	2025-06-06 12:44:04 -07:00
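A minimal sketch of the equivalence behind this fold, not taken from the patch: the clamp quoted in the commit message computes the same value as a max followed by a min. The names `clampViaMinMax` and `clampFormsAgree` are hypothetical and used only for illustration.

```cpp
#include <algorithm>

// The clamp from the commit message, written with explicit branches.
int clampToU8(int v) {
  if (v < 0)
    return 0;
  if (v > 255)
    return 255;
  return v;
}

// Hypothetical helper: the same clamp expressed as a max/min pair, which is
// the shape the fold is meant to reach.
int clampViaMinMax(int v) { return std::min(std::max(v, 0), 255); }

// Returns true if both forms agree on every input in [lo, hi].
// Assumes hi is smaller than INT_MAX so the loop counter cannot overflow.
bool clampFormsAgree(int lo, int hi) {
  for (int v = lo; v <= hi; ++v)
    if (clampToU8(v) != clampViaMinMax(v))
      return false;
  return true;
}
```

Calling `clampFormsAgree(-1000, 1000)` checks the two forms against each other over a small range; the generic proof linked in the commit message covers all inputs.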
Peter Collingbourne	e3c72e1075	LowerTypeTests: Shrink check size by 1 instruction on x86. We currently generate code like this on x86 for a jump table with 5 elements, assuming the call target is in rbx: lea global_addr(%rip), %rax # initialize temporary rax with base address mov %rbx, %rcx # initialize another temporary rcx for index (rbx will be used for the call, so it is still live) sub %rax, %rcx # compute `address - base` ror $0x3, %rcx # compute `(address - base) ror 3` i.e. index cmp $0x4, %rcx # check index <= 4 ja .Ltrap [...] .Ltrap: ud1 A more efficient instruction sequence, that only needs one temporary register and one fewer instruction, is possible by subtracting the address we are testing from the fixed address instead of vice versa: lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element sub %rbx, %rax # compute `last element - address` ror $0x3, %rax # compute `(last element - address) ror 3` i.e. 4 - index cmp $0x4, %rax # check 4 - index <= 4 (same as above) ja .Ltrap [...] .Ltrap: ud1 Change LowerTypeTests to generate that sequence. As a consequence, the order of bits in the bitsets is reversed. Because it doesn't matter how we do the subtraction on other architectures (to the best of my knowledge), do so unconditionally. Reviewers: fmayer, vitalybuka Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/142887	2025-06-06 12:43:24 -07:00
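The two instruction sequences above can be modelled in a few lines, as a sketch rather than the pass's actual code. It assumes 8-byte entries and a last index of 4, as in the example; `ror3`, `checkFromBase`, `checkFromLast` and `checksAgree` are hypothetical names. A misaligned or out-of-range address leaves nonzero bits that the rotate moves to the top of the word, so a single unsigned compare rejects both cases.

```cpp
#include <cstdint>

// Rotate a 64-bit value right by 3 bits, as `ror $0x3` does in the listings.
std::uint64_t ror3(std::uint64_t x) { return (x >> 3) | (x << 61); }

// Old sequence: ror(address - base, 3) <= last index.
bool checkFromBase(std::uint64_t addr, std::uint64_t base,
                   std::uint64_t lastIndex) {
  return ror3(addr - base) <= lastIndex;
}

// New sequence: ror(last element - address, 3) <= last index.
// The rotated value is (last index - index), so the index order is reversed.
bool checkFromLast(std::uint64_t addr, std::uint64_t base,
                   std::uint64_t lastIndex) {
  std::uint64_t last = base + lastIndex * 8;
  return ror3(last - addr) <= lastIndex;
}

// Returns true if both checks give the same answer for every byte address in
// a window that extends 64 bytes to either side of the table.
bool checksAgree(std::uint64_t base, std::uint64_t lastIndex) {
  std::uint64_t end = base + lastIndex * 8 + 64;
  for (std::uint64_t addr = base - 64; addr != end; ++addr)
    if (checkFromBase(addr, base, lastIndex) !=
        checkFromLast(addr, base, lastIndex))
      return false;
  return true;
}
```

Both checks accept exactly the addresses `base + 8 * k` for `k` in 0..4; only the mapping from address to rotated value differs, which is why the commit notes that the bit order in the bitsets is reversed.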
vporpo	47d9473e49	[SandboxVec][BottomUpVec] Fix ownership of Legality (#143018 ) Fix the ownership of `Legality` member variable of BottomUpVec. It should get created in runOnFunction() and get destroyed when the function returns.	2025-06-06 12:21:25 -07:00
Kazu Hirata	445974547d	[llvm] Ensure newline at the end of files (NFC) (#143061 ) Without newlines at the end, git diff would display: No newline at end of file	2025-06-05 22:58:15 -07:00
Kazu Hirata	ad6631fb0d	[SCCP] Directly call SCCPSolver::isOverdefined (NFC) (#143059 ) We don't need a lambda here.	2025-06-05 22:58:10 -07:00
Yingwei Zheng	4eac8daa38	[LoopPeel] Handle non-local instructions/arguments when updating exiting values (#142993 ) Similar to `7e14161f49`, the exiting value may be a non-local instruction or an argument. Closes https://github.com/llvm/llvm-project/issues/142895.	2025-06-06 12:56:28 +08:00
Teresa Johnson	b58b3e1d36	[MemProf] Add dot graph dumping immediately after stack node update (#143025 ) To aid in debugging, (optionally) dump the dot graph immediately after the stack update phase (which matches nodes to interior callsites) and before we cleanup mismatched callee edges (either via tail call fixup, indirect call fixup, or nulling otherwise).	2025-06-05 13:49:11 -07:00
Florian Hahn	01b9828a66	[VPlan] Remove unneeded friend classes from VPValue (NFC). None of the removed classes makes use of the friendship relationship.	2025-06-05 21:40:21 +01:00
Peter Collingbourne	0a85b31a81	LowerTypeTests: Fix UAF.	2025-06-05 12:33:13 -07:00
Jon Roelofs	7b2ac8ff54	[Matrix] Pass ShapeInfo to Visit* methods (NFC). (#142487 ) They all require it now.	2025-06-05 11:22:17 -07:00
Peter Collingbourne	b88e8cceb9	LowerTypeTests: Avoid zext of ptrtoint ConstantExpr. In the LowerTypeTests pass we used to create IR like this: %3 = zext i8 ptrtoint (ptr @__typeid_allones7_align to i8) to i64 %4 = lshr i64 %2, %3 %5 = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_allones7_align to i8)) to i64 %6 = shl i64 %2, %5 %7 = or i64 %4, %6 This is because when this code was originally written there were no funnel shifts and as I recall it was necessary to create an i8 and zext to pointer width (instead of just having a ptrtoint of pointer width) in order for the shl/shr/or to be pattern matched to ror. At the time this caused no problems because there existed a zext ConstantExpr. But after zext ConstantExpr was removed in #71040, the newly present zext instruction can prevent pattern matching the rotate, for example if the zext gets hoisted to a loop preheader or common ancestor of the check. LowerTypeTests was made to use fshr in #141735 so now we can ptrtoint to pointer width and stop creating the zext. Reviewers: fmayer, nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/142886	2025-06-05 11:10:35 -07:00
Peter Collingbourne	3fa231f47c	Add SimplifyTypeTests pass. This pass figures out whether inlining has exposed a constant address to a lowered type test, and removes the test if so and if the address is known to pass the test. Unfortunately this pass ends up needing to reverse engineer what LowerTypeTests did; this is currently inherent to the design of ThinLTO importing where LowerTypeTests needs to run at the start. Reviewers: teresajohnson Reviewed By: teresajohnson Pull Request: https://github.com/llvm/llvm-project/pull/141327	2025-06-05 11:09:20 -07:00
Peter Collingbourne	d1b0b4bb44	Add -funique-source-file-identifier option. This option complements -funique-source-file-names and allows the user to use a different unique identifier than the source file path. Reviewers: teresajohnson Reviewed By: teresajohnson Pull Request: https://github.com/llvm/llvm-project/pull/142901	2025-06-05 10:52:01 -07:00
Florian Hahn	eb83c43fe9	[Matrix] Don't update Changed based on Visit* return value (NFC). (#142417 ) Visit* are always modifying the IR, remove the boolean result. Depends on https://github.com/llvm/llvm-project/pull/142416. PR: https://github.com/llvm/llvm-project/pull/142417	2025-06-05 17:54:06 +01:00
Vasileios Porpodas	79861d2db7	Reapply "[SandboxVec] Add a simple pack reuse pass (#141848 )" This reverts commit 31abf0774232735ad7a7d45e531497305bf99fae.	2025-06-05 09:14:17 -07:00
Snehasish Kumar	16c7b3c9f5	[MemProf] Split MemProfiler into Instrumentation and Use. (#142811 ) Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.	2025-06-05 07:36:50 -07:00
Ryotaro Kasuga	1e5f7f64b0	[LoopInterchange] Handle confused dependence correctly (#140709 ) This patch fixes the handling of a confused `Dependence` object. Such an object doesn’t contain any information about dependencies, so we must process it conservatively. However, it was converted into a direction vector like `[I I ... I]`. As a result, it was treated as if there are no loop-carried dependencies, which can lead to illegal loop exchanges. Fixes #140238	2025-06-05 16:44:02 +09:00
Florian Hahn	2e337349f4	[VPlan] Remove unnecessary DomTreeUpdater flush (NFC). The current version does not need the explicit flush at this point.	2025-06-05 08:17:42 +01:00
Vasileios Porpodas	31abf07742	Revert "[SandboxVec] Add a simple pack reuse pass (#141848 )" This reverts commit 1268352656f81ea173860a8002aadb88844137e7.	2025-06-04 14:24:49 -07:00
vporpo	1268352656	[SandboxVec] Add a simple pack reuse pass (#141848 ) This patch implements a simple pass that tries to de-duplicate packs. If there are two packing patterns inserting the exact same values in the exact same order, then we will keep the top-most one of them. Even though such patterns may be optimized away by subsequent passes, it is still useful to do this within the vectorizer because otherwise the cost estimation may be off, making the vectorizer overly conservative.	2025-06-04 14:12:06 -07:00
Teresa Johnson	3ec2de2753	[MemProf] Optionally save context size info on largest cold allocations (#142837 ) Reapply PR142507 with fix for test: add in the same x86_64-linux requirement as other tests as the stack ids are currently computed differently on big endian systems. This will be investigated separately. In order to allow selective reporting of context hinting during the LTO link, and in the future to allow selective more aggressive cloning, add an option to specify a minimum percent of the max cold size in the profile summary. Contexts that meet that threshold will get context size info metadata (and ThinLTO summary information) on the associated allocations. Specifying -memprof-report-hinted-sizes during the pre-LTO compile step will continue to cause all contexts to receive this metadata. But specifying -memprof-report-hinted-sizes only during the LTO link will cause only those that meet the new threshold and have the metadata to get reported. To support this, because the alloc info summary and associated bitcode requires the context size information to be in the same order as the other context information, 0s are inserted for contexts without this metadata. The bitcode writer uses a more compact format for the context ids to allow better compression of the 0s. As part of this change several helper methods are added to query whether metadata contains context size info on any or all contexts.	2025-06-04 13:08:56 -07:00
PiJoules	f32f048719	[llvm] Use ABI instead of preferred alignment for const prop checks (#142500 ) We'd hit an assertion checking proper alignment for an i8 when building chromium because we used the preferred alignment (which is 4 bytes) instead of the ABI alignment (which is 1 byte). The ABI alignment should be used because that's the actual alignment needed to load a constant from the vtable. This also updates the two `virtual-const-prop-small-alignment-*` tests to explicitly give ABI alignments for i64s.	2025-06-04 10:57:59 -07:00
John Brawn	81d3189891	[LAA] Keep pointer checks on partial analysis (#139719 ) Currently if there's any memory access that AccessAnalysis couldn't analyze then all of the runtime pointer check results are discarded. This patch makes this able to be controlled with the AllowPartial option, which makes it so we generate the runtime check information for those pointers that we could analyze, as transformations may still be able to make use of the partial information. Of the transformations that use LoopAccessAnalysis, only LoopVersioningLICM changes behaviour as a result of this change. This is because the others either: * Check canVectorizeMemory, which will return false when we have partial pointer information as analyzeLoop() will return false. * Examine the dependencies returned by getDepChecker(), which will be empty as we exit analyzeLoop if we have partial pointer information before calling areDepsSafe(), which is what fills in the dependency information.	2025-06-04 16:47:20 +01:00
Yingwei Zheng	519cb460f6	[SCCP] Remove masking operations (#142736 ) CVP version: `2d5820cd72` Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=3ec0c5c7fef03985b43432c6b914c289d8a5435e&to=92b4df90695dd37defdabf8a30f0b0322b648a00&stat=instructions:u	2025-06-04 22:31:08 +08:00
clubby789	8ed3cb0e64	[DSE] Fix uninitialized variable (#142768 ) Introduced by accident in #138299 (https://lab.llvm.org/buildbot/#/builders/164/builds/10604)	2025-06-04 15:00:28 +02:00
Yingwei Zheng	5e2dcfe42c	[InstCombine] Avoid infinite loop in `foldSelectValueEquivalence` (#142754 ) Before this patch, InstCombine hung because it replaced a value with a more complex one: ``` %sel = select i1 %cmp, i32 %smax, i32 0 -> %sel = select i1 %cmp, i32 %masked, i32 0 -> %sel = select i1 %cmp, i32 %smax, i32 0 -> ... ``` This patch makes this replacement more conservative. It only performs the replacement iff the new value is one of the operands of the original value. Closes https://github.com/llvm/llvm-project/issues/142405.	2025-06-04 19:42:56 +08:00
Yingwei Zheng	e2c698c7e8	[InstCombine] Fix miscompilation in `sinkNotIntoLogicalOp` (#142727 ) Consider the following case: ``` define i1 @src(i8 %x) { %cmp = icmp slt i8 %x, -1 %not1 = xor i1 %cmp, true %or = or i1 %cmp, %not1 %not2 = xor i1 %or, true ret i1 %not2 } ``` `sinkNotIntoLogicalOp(%or)` calls `freelyInvert(%cmp, /*IgnoredUser=*/%or)` first. However, as `%cmp` is also used by `Op1 = %not1`, the RHS of `%or` is set to `%cmp.not = xor i1 %cmp, true`. Thus `Op1` is out of date in the second call to `freelyInvert`. Similarly, the second call may change `Op0`. According to the analysis above, I decided to avoid this fold when one of the operands is also a user of the other. Closes https://github.com/llvm/llvm-project/issues/142518.	2025-06-04 17:48:01 +08:00
Florian Hahn	370d01765c	[Matrix] Use shape info for StoreInst directly. (#142664 ) ShapeInfo for the store operand may be dropped, e.g. because the operand got folded by transpose optimizations to another instruction w/o shape info. This was exposed by the assertion added in https://github.com/llvm/llvm-project/pull/142416. This updates VisitStore to use the shape-info directly from the instruction, which is in line with the other Visit* functions and ensures that we won't lose shape info. PR: https://github.com/llvm/llvm-project/pull/142664	2025-06-04 09:15:57 +01:00
clubby789	c7c79d2590	[IR][DSE] Support non-malloc functions in malloc+memset->calloc fold (#138299 ) Add a `alloc-variant-zeroed` function attribute which can be used to inform folding allocation+memset. This addresses https://github.com/rust-lang/rust/issues/104847, where LLVM does not know how to perform this transformation for non-C languages. Co-authored-by: Jamie <jamie@osec.io>	2025-06-04 09:35:20 +02:00
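For orientation, the source-level pattern behind this fold can be sketched as follows. This is only an illustration of why the two forms are interchangeable, not the pass's logic and not a use of the new attribute; all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Allocation followed by an explicit zero fill: the pattern the fold targets.
unsigned char *allocThenZero(std::size_t n) {
  unsigned char *p = static_cast<unsigned char *>(std::malloc(n));
  if (p)
    std::memset(p, 0, n);
  return p;
}

// The zeroed allocation variant that the pair above can be replaced with.
unsigned char *allocZeroed(std::size_t n) {
  return static_cast<unsigned char *>(std::calloc(n, 1));
}

// Returns true if both forms hand back n bytes that are all zero.
bool bothFormsYieldZeroedMemory(std::size_t n) {
  unsigned char *a = allocThenZero(n);
  unsigned char *b = allocZeroed(n);
  bool ok = a && b && std::memcmp(a, b, n) == 0;
  for (std::size_t i = 0; ok && i < n; ++i)
    ok = a[i] == 0;
  std::free(a);
  std::free(b);
  return ok;
}
```

For C the optimizer already knows that `calloc` is the zeroed counterpart of `malloc`; per the commit message, the attribute lets allocators from other languages state the same relationship.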
Nikita Popov	9a0197c3a4	[EarlyCSE] Check attributes for commutative intrinsics (#142610 ) Commutative intrinsics go through a separate code path, which did not check for attribute compatibility, resulting in a later assertion failure. Fixes https://github.com/llvm/llvm-project/issues/142462.	2025-06-04 09:04:27 +02:00
Yingwei Zheng	7e1fa09ce2	[SimplifyCFG] Bail out on vector GEPs in `passingValueIsAlwaysUndefined` (#142526 ) Closes https://github.com/llvm/llvm-project/issues/142522.	2025-06-04 12:37:30 +08:00
Vitaly Buka	3531cc1cc7	[PromoteMem2Reg] Optimize memory usage in PromoteMem2Reg (#142474 ) When a BasicBlock has a large number of allocas and successors, we had to copy the entire IncomingVals and IncomingLocs vectors for successors. Also, updates to IncomingVals and IncomingLocs are infrequent (only Load/Store into alloca affect the arrays). Given the nature of DFS traversal, instead of copying the entire vector, we can keep track of the changes and undo all changes done by successors. Fixes #142461 On the IR attached to issue #142461, RSS drops from 35Gb to 1.8Gb. But it does not affect compile time on average https://llvm-compile-time-tracker.com/compare.php?from=2e98ed8caa0b47ee79af4ad24b5436a89fe49dfa&to=effac6d1fd600e544f8bc21382c7e541973b1378&stat=instructions:u	2025-06-03 19:34:36 -07:00
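The "track changes and undo them" idea can be sketched on a toy tree. This is not the PromoteMem2Reg code: `Node`, `visit` and `run` are hypothetical, the walk assumes a tree (no visited set), and the per-node snapshot exists only so the result can be inspected.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node: on entry it overwrites some slots of the shared state,
// standing in for the loads/stores that update IncomingVals.
struct Node {
  std::vector<std::pair<std::size_t, int>> writes; // (slot, new value)
  std::vector<std::size_t> children;               // indices into the node list
};

// Depth-first walk that shares one state vector across the whole traversal.
// Each call records the old values it overwrites and restores them before
// returning, so every sibling starts from the state its parent produced.
void visit(const std::vector<Node> &nodes, std::size_t id,
           std::vector<int> &state, std::vector<std::vector<int>> &seen) {
  std::vector<std::pair<std::size_t, int>> undo; // (slot, old value)
  for (const auto &w : nodes[id].writes) {
    undo.emplace_back(w.first, state[w.first]);
    state[w.first] = w.second;
  }
  seen[id] = state; // snapshot for inspection only; a real pass would not copy
  for (std::size_t child : nodes[id].children)
    visit(nodes, child, state, seen);
  // Undo in reverse order so repeated writes to one slot unwind correctly.
  for (auto it = undo.rbegin(); it != undo.rend(); ++it)
    state[it->first] = it->second;
}

// Walks from node 0 over `slots` zero-initialized slots and returns the state
// each node observed.
std::vector<std::vector<int>> run(const std::vector<Node> &nodes,
                                  std::size_t slots) {
  std::vector<int> state(slots, 0);
  std::vector<std::vector<int>> seen(nodes.size());
  visit(nodes, 0, state, seen);
  return seen;
}
```

The undo log grows with the number of writes a node performs rather than with the size of the state, which is the property the commit relies on when updates are infrequent.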
Teresa Johnson	6c1091ea3f	Revert "[MemProf] Optionally save context size info on largest cold allocations" (#142688 ) Reverts llvm/llvm-project#142507 due to buildbot failures that I will look into tomorrow.	2025-06-03 16:05:16 -07:00
Vitaly Buka	ac4893dd77	[NFC][PromoteMem2Reg] Move IncomingVals, IncomingLocs, Worklist into class (#142468 ) They are all DFS state related, like `Visited`. But `Visited` is already a class member, so this makes things more consistent and leaves fewer parameters to pass around. By itself, the patch has little value, but it simplifies #142474. For #142461	2025-06-03 14:47:45 -07:00
Teresa Johnson	f2adae5780	[MemProf] Optionally save context size info on largest cold allocations (#142507 ) In order to allow selective reporting of context hinting during the LTO link, and in the future to allow selective more aggressive cloning, add an option to specify a minimum percent of the max cold size in the profile summary. Contexts that meet that threshold will get context size info metadata (and ThinLTO summary information) on the associated allocations. Specifying -memprof-report-hinted-sizes during the pre-LTO compile step will continue to cause all contexts to receive this metadata. But specifying -memprof-report-hinted-sizes only during the LTO link will cause only those that meet the new threshold and have the metadata to get reported. To support this, because the alloc info summary and associated bitcode requires the context size information to be in the same order as the other context information, 0s are inserted for contexts without this metadata. The bitcode writer uses a more compact format for the context ids to allow better compression of the 0s. As part of this change several helper methods are added to query whether metadata contains context size info on any or all contexts.	2025-06-03 14:20:38 -07:00
Craig Topper	96336b2330	[AggressiveInstCombine] Improve popcount matching if the input has known zero bits (#142501 ) If the input has known zero bits, InstCombine may have simplified one of the expected And masks. Teach AggressiveInstCombine to use MaskedValueIsZero to recover these missing bits. Fixes #142042.	2025-06-03 12:56:53 -07:00
Vitaly Buka	3cb967a2cd	[NFCI][PromoteMem2Reg] Don't handle the first successor out of order (#142464 ) Just for consistency, to avoid confusing conditions. `reverse` helps to avoid test updates, as nothing changes for successor counts <= 2. For #142461	2025-06-03 10:26:55 -07:00
Vitaly Buka	b9dec5aa79	[NFC] Remove goto in PromoteMem2Reg::RenamePass (#142454 ) 'goto' is essentially a shortcut for push/pop for worklist. It can be expensive if we copy vectors, but if we move them, it should not be an issue. Without 'goto' it's easier to reason about the code, when `PromoteMem2Reg::RenamePass` processes exactly one edge at a time. There is out of order processing of the first successor, I keep it just to make this patch pure NFC. I'll remove this in follow up patches. For #142461	2025-06-03 09:14:04 -07:00
Ramkumar Ramachandra	b40e4ceaa6	[ValueTracking] Make Depth last default arg (NFC) (#142384 ) Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique, or writing an easily-invalidable KnownBits analysis, make the Depth argument in APIs in ValueTracking uniformly the last argument with a default value. This would aid in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.	2025-06-03 17:12:24 +01:00
Ramkumar Ramachandra	6716d4eaa8	[LV] Prefer DenseMap::lookup over find (NFC) (#141809 ) Apart from the stylistic improvement, lookup has the nice property of returning a default-constructed object on failure-to-find, while find returns the end iterator, which cannot be dereferenced.	2025-06-03 14:37:19 +01:00
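The difference described above can be illustrated with a standard container, as a sketch only. `lookupOrDefault` is a hypothetical free function that mimics what the commit says about `DenseMap::lookup`; it is not LLVM's implementation.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for lookup-style access: returns a copy of the mapped
// value, or a default-constructed value when the key is absent. Unlike
// operator[], it never inserts; unlike find, the caller has no end iterator
// to check before use.
template <typename Map>
typename Map::mapped_type lookupOrDefault(const Map &m,
                                          const typename Map::key_type &key) {
  auto it = m.find(key);
  return it == m.end() ? typename Map::mapped_type() : it->second;
}
```

With a `std::unordered_map<std::string, int>` holding only `"a"`, a call for `"b"` yields `0` and leaves the map unchanged. The trade-off is that a missing key and a key mapped to the default value look the same to the caller.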
Florian Hahn	5520ab3d50	[VPlan] Add ComputeAnyOfResult VPInstruction (NFC) (#141932 ) Add a dedicated opcode for any-of reduction, similar to https://github.com/llvm/llvm-project/pull/132689 and https://github.com/llvm/llvm-project/pull/132690. The patch also explicitly adds the start value to not require RecurrenceDescriptor during execute. It also allows freezing the start value to make it poison-safe. PR: https://github.com/llvm/llvm-project/pull/141932	2025-06-03 14:33:53 +01:00
Luke Lau	ddfeecf4c5	[VPlan] Convert to concrete recipes before dissolving loop regions. NFCI (#141999 ) After updating #118638 on tip of tree, expanding VPWidenIntOrFpInductionRecipes fails because it needs the loop region to get the latch to insert the increment into: VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock(); Builder.setInsertPoint(ExitingBB, ExitingBB->getTerminator()->getIterator()); auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags, WidenIVR->getDebugLoc(), "vec.ind.next"); However, after #117506 the region is dissolved, so it doesn't work. This shuffles the dissolveLoopRegions steps to be after convertToConcreteRecipes so we can use the region when expanding VPWidenIntOrFpInductionRecipes.	2025-06-03 12:05:13 +01:00
Weibo He	038dc2c63b	[CoroSplit] Always erase lifetime intrinsics for spilled allocas (#142551 ) If the control flow between `lifetime.start` and `lifetime.end` is too complex, it is acceptable to give up the optimization opportunity and collect the alloca to the frame. However, storing to the frame will lengthen the lifetime of the alloca, and the sanitizer will complain. I propose we always erase lifetime intrinsics of spilled allocas. Fix #124612 --------- Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>	2025-06-03 18:52:41 +08:00
Florian Hahn	2eab83f618	[VPlan] Remove CanonicalIV when dissolving loop regions (NFC). (#142372 ) Directly replace the canonical IV when we dissolve the containing region. That ensures that it won't get removed before the region gets removed, which would result in an invalid region. This removes the current ordering constraint between convertToConcreteRecipes and dissolving regions. PR: https://github.com/llvm/llvm-project/pull/142372	2025-06-03 10:05:28 +01:00
Kazu Hirata	54d836a080	[llvm] Use *Set::insert_range (NFC) (#138237 )	2025-06-02 19:48:13 -07:00
Usama Hameed	cc400d4417	[HWASan][bugfix] Fix kernel check in ShadowMapping::init (#142226 ) The function currently checks only the command line argument to determine whether it is compiling for the kernel. This is incorrect, as the setting can also be passed programmatically.	2025-06-02 10:39:15 -07:00
Florian Hahn	adba40e188	[Matrix] Assert that there's shapeinfo in Visit* (NFC). (#142416 ) We should only call Visit* for instructions with shape info. Turn early exit into assert. PR: https://github.com/llvm/llvm-project/pull/142416	2025-06-02 18:18:42 +01:00
Florian Hahn	11713e86b0	[LV] Move VPlan-based calculateRegisterUsage to VPlanAnalysis (NFC). (#135673 ) Move VPlan-based calculateRegisterUsage from LoopVectorize to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps to reduce the size of LoopVectorize. PR: https://github.com/llvm/llvm-project/pull/135673	2025-06-02 17:40:50 +01:00
Kazu Hirata	c261bb7649	[memprof] Deduplicate alloc site matches (#142334 ) With: commit 2425626d803002027cbf71c39df80cb7b56db0fb Author: Kazu Hirata <kazu@google.com> Date: Sun Jun 1 08:09:58 2025 -0700 we print out a lot of duplicate alloc site matches. This patch partially reverts the patch above. The core idea of using a map to deduplicate entries remains the same, but details are different. Specifically: - This PR uses the [FullStackID, MatchLength] as the key, where MatchLength is the length of an alloc site match. - AllocMatchInfo in this PR no longer has Matched because we always report matches. - AllocMatchInfo in this PR no longer has NumFramesMatched because it has become part of the key. This deduplication roughly halves the number of messages printed out.	2025-06-02 07:59:34 -07:00
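Keying a map on the pair described above might look like the following sketch. The `AllocSiteMatch` struct, the `dedupe` function and the choice of an occurrence count as the mapped value are assumptions made for illustration; the patch's own `AllocMatchInfo` carries different data.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record of one alloc site match.
struct AllocSiteMatch {
  std::uint64_t FullStackID;
  std::size_t MatchLength;
};

using MatchKey = std::pair<std::uint64_t, std::size_t>;

// Collapses matches that share (FullStackID, MatchLength) into one entry and
// counts how often each pair occurred.
std::map<MatchKey, unsigned> dedupe(const std::vector<AllocSiteMatch> &matches) {
  std::map<MatchKey, unsigned> unique;
  for (const AllocSiteMatch &m : matches)
    ++unique[std::make_pair(m.FullStackID, m.MatchLength)];
  return unique;
}
```

Reporting then iterates over the map, so each distinct pair is printed once however many times it was matched.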


40045 Commits