llvm-project

Author	SHA1	Message	Date
Teresa Johnson	edb7f6c0da	[MemProf] Add more assertion checking to the edge removal helper (#125017 ) Check a few unexpected cases (edge already removed, edge not in its caller or callee edge lists).	2025-01-29 19:23:35 -08:00
Teresa Johnson	6c3bf34114	[MemProf] Fix summary identification for imported locals (#124659 ) When we apply cloning decisions in the ThinLTO backend, we need to find the corresponding summary for each function in the IR, and in some cases for callee functions. This is complicated when the function was a promoted local, in which case the GUID was formed from the hash of the original source file prepended to the function name. Those functions can be identified by the fact that they were given a ".llvm." suffix during promotion. We previously didn't do this correctly for promoted locals imported from other modules, as we only tried the current module source name. This led to crashes, in particular when the current module also had a local function of the same original name. In particular, we were attempting to iterate through the wrong summary's callsites, and there were fewer than in the actual function so we accessed data off the end (in a release build with assertion checking off - with assertion checking on we double check the stack ids and that would have failed). Even if we hadn't crashed or hit an assert, we could have applied the wrong cloning decisions, leading to unsats at link time. Luckily, function importing attaches thinlto_src_file metadata containing the original source file name to all imported functions. It normally doesn't do this by default; however, it always does if MemProf context disambiguation is enabled. Therefore, we can just look to see if the function contains this metadata and if so use it to recreate the original GUID. A similar issue can occur when looking for the ValueInfo / GUID of a direct tail call to see if we synthesized a callsite record for a missing tail call frame. In that case, the callee function may be a declaration, if we imported its caller but not the callee function definition. Because imported declarations don't get the thinlto_src_file metadata, we instead look at its caller (which works because this happens very early in the backend before any inlining).	2025-01-29 18:22:14 -08:00
Alex MacLean	de7438e472	[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (#119261 ) Add a new AutoUpgrade function to convert some legacy nvvm.annotations metadata to function level attributes. These attributes are quicker to look up, so they improve compile time, and they are more idiomatic than metadata, which should not include required information that changes the meaning of the program. Currently supported annotations are: - !"kernel" -> ptx_kernel calling convention - !"align" -> alignstack parameter attributes (return not yet supported)	2025-01-29 16:27:27 -08:00
vporpo	e094c0fa67	[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479 ) This patch adds a legality check that checks for repeated instrs in a bundle and won't vectorize if such pattern is found.	2025-01-29 15:54:15 -08:00
Kazu Hirata	774b12c4a0	[memprof] Initialize AllocInfoIter and CallSitesIter (NFC) (#124972 ) This patch initializes AllocInfoIter and CallSitesIter to their respective end(). I'm doing this not because I'm worried about uninitialized iterators, but because the resulting code looks shorter and makes it clear which data structure each iterator is associated with.	2025-01-29 14:31:00 -08:00
Teresa Johnson	8a86e6aefe	[MemProf] Constify a couple of methods used during cloning (#124994 ) This also helps ensure we don't inadvertently create map entries by forcing use of at() instead of operator[].	2025-01-29 14:18:11 -08:00
Simon Pilgrim	5921295dca	Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962 ) Reverts llvm/llvm-project#124129 as it's currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost	2025-01-29 22:17:53 +00:00
Joel E. Denny	18f8106f31	[KernelInfo] Implement new LLVM IR pass for GPU code analysis (#102944 ) This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass runs at the end of LTO, and options like ``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.	2025-01-29 12:40:19 -05:00
Nikita Popov	8a43d0e873	[Attributor] Check correct IRPosition in AANoCapture::isImpliedByIR() This case is intended to check the callee argument, not the call-site. Fixes an issue introduced in #123181.	2025-01-29 17:34:10 +01:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Alexey Bataev	4a1a697427	[SLP][NFC]Unify ScalarToTreeEntries and MultiNodeScalars, NFC Currently, SLP has 2 distinct storages to manage mapping between vectorized instructions and their corresponding vectorized TreeEntry nodes. It leads to inefficient lookup for the matching TreeEntries and makes it harder to correctly track instructions associated with multiple nodes. There is a plan to extend this support for instructions that require scheduling, to allow support for copyable elements. Merging ScalarToTreeEntry and MultiNodeScalars will reduce the maintenance of the feature. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124914	2025-01-29 09:05:54 -05:00
Yingwei Zheng	cf37ae5cae	[InstCombine] Add one-use check when folding fabs over selects (#122270 ) Fixes multi-use issue introduced by https://github.com/llvm/llvm-project/pull/86390. It allows the folding of `fabs (select Cond, TrueC, FalseC)` to avoid performance regression in ocio	2025-01-29 21:44:59 +08:00
Florian Hahn	2b55ef187c	[VPlan] Add helper to run VPlan passes, verify after run (NFC). (#123640 ) Add new runPass helpers to run a VPlan transformation. This makes it easier to add additional checks/functionality for each transform run. In this patch, an option is added to run the verifier after each VPlan transform. Follow-ups will use the same helper to also support printing VPlans after each transform. Note that the verifier at the moment requires there to be a canonical IV and vector loop region, so the final lowering transforms aren't run via runPass yet. PR: https://github.com/llvm/llvm-project/pull/123640	2025-01-29 10:50:01 +00:00
Ryotaro Kasuga	690f251063	[LoopInterchange] Handle LE and GE correctly (#124901 ) LoopInterchange has converted `DVEntry::LE` and `DVEntry::GE` in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' is lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead. Resolve #123920	2025-01-29 19:30:54 +09:00
Ryotaro Kasuga	89e767f127	[LoopIdiom] Move up atomic checks for memcpy/memmove (NFC) (#124535 ) This patch moves up the checks that verify if it is legal to replace the atomic load/store with memcpy. Currently these checks are done after we determine to convert the load/store to memcpy/memmove, which makes the logic a bit confusing. This patch is a prelude to #50892	2025-01-29 19:21:45 +09:00
Fangrui Song	4c7aa6f983	[msan] Fix -Wunused-variable in non-assertion builds after #124421	2025-01-28 20:20:25 -08:00
Thurston Dang	fdadef9be3	[msan] Handle x86_avx512_(min\|max)_p[sd]_512 intrinsics (#124421 ) The AVX/SSE variants are already handled heuristically (maybeHandleSimpleNomemIntrinsic via handleUnknownIntrinsic), but the AVX512 variants contain an additional parameter (the rounding method) which fails to match heuristically. This patch generalizes maybeHandleSimpleNomemIntrinsic to allow additional flags (ignored by MSan) and explicitly call it to handle AVX512 min/max ps/pd intrinsics. It also updates the test added in https://github.com/llvm/llvm-project/pull/123980	2025-01-28 19:12:44 -08:00
vporpo	79cbad188a	[SandboxVec] Clear Context's state within runOnFunction() (#124842 ) `sandboxir::Context` is defined at a pass-level scope with the `SandboxVectorizerPass` class because the function pass manager `FPM` object depends on it, and that is in pass-level scope to avoid recreating the pass pipeline every single time `runOnFunction()` is called. This means that the Context's state lives on across function passes. The problem is twofold: (i) the LLVM IR to Sandbox IR map can grow very large including objects from different functions, which is of no use to the vectorizer, as it's a function-level pass. (ii) this can result in stale data in the LLVM IR to Sandbox IR object map, as other passes may delete LLVM IR objects. To fix both issues this patch introduces a `Context::clear()` function that clears the `LLVMValueToValueMap`.	2025-01-28 18:28:08 -08:00
Thurston Dang	4a426079d6	[msan] Use horizontal add to compute shadow for horizontal sub (#124835 ) This improves the horizontal sub handling (from https://github.com/llvm/llvm-project/pull/124159), by always using horizontal add for the shadow, as recommended by Vitaly. Fixes https://github.com/llvm/llvm-project/issues/124662	2025-01-28 14:56:05 -08:00
Florian Hahn	6338bde568	[VPlan] Use cast<VPRecipeBase> in verifier (NFC). All users of VPValue must be a VPRecipeBase, use cast.	2025-01-28 21:01:02 +00:00
Thurston Dang	7bd9c780e3	[msan][NFCI] Generalize handleIntrinsicByApplyingToShadow to allow alternative intrinsic for shadows (#124831 ) https://github.com/llvm/llvm-project/pull/124159 uses handleIntrinsicByApplyingToShadow for horizontal add/sub, but Vitaly recommends always using the add version to avoid false negatives for fully uninitialized data (https://github.com/llvm/llvm-project/issues/124662). This patch lays the groundwork by generalizing handleIntrinsicByApplyingToShadow to allow using a different intrinsic (of the same type as the original intrinsic) for the shadow. Planned work will apply it to horizontal sub.	2025-01-28 12:35:07 -08:00
Thurston Dang	063db51cd4	Reapply "[msan] Add handlers for AVX masked load/store intrinsics (#123857 )" This reverts commit b9d301cc7e4fe4c442ec15169686fa4a18f5cdfc i.e., relands db79fb2a91df31a07f312f8e061936927ac5c506. I had mistakenly thought this caused a buildbot breakage (the actual culprit was my other patch, https://github.com/llvm/llvm-project/pull/123980, which landed at the same time) and thus had reverted it even though AFAIK it is not broken.	2025-01-28 18:11:44 +00:00
Alexey Bataev	947d8ebbf3	[SLP]Unify getNumberOfParts use Adds getNumberOfParts and uses it instead of similar code across code base, fixes analysis of non-vectorizable types in computeMinimumValueSizes. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124774	2025-01-28 12:16:44 -05:00
Ramkumar Ramachandra	d76ea250c8	Reland [InstCombine] Teach foldSelectOpOp about samesign (#124320 ) Changes: There was a serious bug in the previous patch, leading to a miscompile. See #122723 for the miscompile report from Alexander, and the follow-up investigation by Nikita. The patch has since been reworked, and now includes the testcase from the miscompile. Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid of one of the FIXMEs it introduced by replacing a predicate comparison with CmpPredicate::getMatching. Co-authored-by: Nikita Popov <npopov@redhat.com>	2025-01-28 16:53:01 +00:00
Thurston Dang	ef92e6b99f	[BoundsChecking] Update ubsantrap to use GuardKind (#124613 ) This change makes it consistent with other uses of ubsantrap. This also updates the tests. Notably, BoundsChecking/runtimes.ll had guard=3 which passed only because the method of calculating the parameter (`IRB.GetInsertBlock()->getParent()->size()`) happened to give the same answer.	2025-01-28 08:52:31 -08:00
Alexey Bataev	a1ab5b4c87	[SLP]Check the MainOp matches the requirements for the instructions Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.	2025-01-28 06:00:52 -08:00
Alexey Bataev	1d5fbe83c3	[SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash	2025-01-28 05:45:13 -08:00
Jeremy Morse	304a99091c	[NFC][DebugInfo] Use iterators for insertion at some final callsites These are the callsites that have materialised in the last three weeks since I last built with deprecation warnings.	2025-01-28 11:37:11 +00:00
Nicholas Guy	cdea38f91a	Reland "[LoopVectorizer] Add support for chaining partial reductions #120272 " (#124282 ) Change `getScaledReduction` to take an existing vector, rather than creating and returning a new one each call. Rename `getScaledReduction` to `getScaledReductions` to more accurately reflect what it's now doing. --------- Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>	2025-01-28 10:40:35 +00:00
David Sherwood	0f61558b97	[LoopVectorize][NFC] Remove unused variable in addUsersInExitBlocks (#124553 ) We were allocating a VPTypeAnalysis object on the stack, but never using it for anything.	2025-01-28 09:11:27 +00:00
vporpo	334a1cdbfa	[SandboxIR] createFunction() should always create a function (#124665 ) This patch removes the assertion that checks for an existing function. If one exists it will remove it and create a new one. This helps remove a crash when a function declaration object already exists and we are about to create a SandboxIR object for the definition.	2025-01-27 20:16:30 -08:00
Thurston Dang	fa9ac62d02	[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211 ) This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994). Future work will connect -fsanitize-skip-hot-cutoff (introduced in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.	2025-01-27 20:08:53 -08:00
Han-Kuan Chen	08d14e10ca	[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244 ) We have two types of mask in SLP: a scalar mask and a vector mask. When vectorizing four i32 additions into <4 x i32>, SLP creates a mask of length 4. When vectorizing four <2 x i32> additions into <8 x i32>, SLP also creates a mask of length 4. We refer to the first case as a scalar mask (because the mask element represents a scalar, i32), and the second case as a vector mask (because the mask element represents a vector, <2 x i32>). At some point, we must convert the scalar mask into a vector mask (otherwise, calling TTI cost functions or IRBuilderBase functions may yield incorrect results). Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify the CommonMask, we have decided to perform the mask transformation only within createShuffle. However, we do not store the transformed result, as createShuffle may be called multiple times.	2025-01-28 12:02:37 +08:00
Joseph Huber	760a786d15	[Clang] Prevent `mlink-builtin-bitcode` from internalizing the RPC client (#118661 ) Summary: Currently, we only use `-mlink-builtin-bitcode` for non-LTO NVIDIA compilations. This has the problem that it will internalize the RPC client symbol which needs to be visible to the host. To counteract that, I put `retain` on it, but this also prevents optimizations on the global itself, so the passes we have that remove the symbol don't work on OpenMP anymore. This patch does the dumbest solution, adding a special string check for it in clang. Not the best solution, the runner up would be to have a clang attribute for `externally_initialized` because those can't be internalized, but that might have some unfortunate side-effects. Alternatively we could make NVIDIA compilations do LTO all the time, but that would affect some users and it's harder than I thought.	2025-01-27 19:30:59 -06:00
Florian Hahn	713482fccf	[VPlan] Use State.get to extract lane mask for BranchOnMask. Simplifies the code slightly and avoids redundant extracts/broadcasts if the operand is live-in or already scalar.	2025-01-27 21:35:36 +00:00
Florian Hahn	ad9da92cf6	[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462 ) Add an extra knob to RuntimeUnrollMultiExit to let backends control whether to allow multi-exit unrolling on a per-loop basis. This gives backends more fine-grained control on deciding if multi-exit unrolling is profitable for a given loop and uarch. Similar to 4226e0a0c75. PR: https://github.com/llvm/llvm-project/pull/124462	2025-01-27 21:20:04 +00:00
Jeremy Morse	285009f202	[NFC][DebugInfo] Rewrite more call-sites to insert with iterators (#124288 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. The call-sites updated in this patch are those where the dyn_cast_or_null cast utility doesn't compose well with iterator insertion. It can distinguish between nullptr and a "present" (non-null) Instruction pointer, but not between a legal and illegal instruction iterator. This can lead to end-iterator dereferences and thus crashes. We can improve this in the future (as parent-pointers can now be accessed from ilist nodes), but for the moment, add explicit tests for end() iterators at the five call sites affected by this.	2025-01-27 20:30:45 +00:00
Kazu Hirata	e0c5a8553d	[memprof] Migrate away from PointerUnion::dyn_cast (NFC) (#124505 ) Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses cast because we know which alternative to expect in the ternary expression.	2025-01-27 10:35:37 -08:00
Jeremy Morse	34b139594a	[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283 ) To finalise the "RemoveDIs" work removing debug intrinsics, we're updating call sites that insert instructions to use iterators instead. This set of changes are those where it's not immediately obvious that just calling getIterator to fetch an iterator is correct, and one or two places where more than one line needs to change. Overall the same rule holds though: iterators generated for the start of a block such as getFirstNonPHIIt need to be passed into insert/move methods without being unwrapped/rewrapped, everything else can use getIterator.	2025-01-27 16:44:14 +00:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
Florian Hahn	09a29fcc8d	[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819 ) Live-ins don't need to be handled, other than adding to the exit phi recipe. Do that early and assert that otherwise the exit value is defined in the vector loop region. This should enable simply skipping other exit values that do not need further fixing, e.g. if handling the exit value from the early exit directly in handleUncountableEarlyExit. PR: https://github.com/llvm/llvm-project/pull/123819	2025-01-27 16:12:07 +00:00
Nikita Popov	212f344b84	[InstCombine] Handle constant expression result in tryFactorization() If IRBuilder folds the result to a constant expression, don't try to set nowrap flags on it. Fixes https://github.com/llvm/llvm-project/issues/124526.	2025-01-27 16:25:37 +01:00
Jeremy Morse	e14962a39c	[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.	2025-01-27 15:25:17 +00:00
Alexey Bataev	f1d5e70a00	[SLP][NFC]Do not check poison values for corresponding vectorized entries No need to check poison values if they have been vectorized and/or mark them as vectorized, it should work only for instructions.	2025-01-27 06:38:23 -08:00
Thurston Dang	b9d301cc7e	Revert "[msan] Add handlers for AVX masked load/store intrinsics (#123857 )" This reverts commit db79fb2a91df31a07f312f8e061936927ac5c506. Reason: buildbot breakage (https://lab.llvm.org/buildbot/#/builders/144/builds/16636/steps/6/logs/FAIL__LLVM__avx512-intrinsics-upgrade_ll)	2025-01-27 01:10:35 +00:00
Thurston Dang	db79fb2a91	[msan] Add handlers for AVX masked load/store intrinsics (#123857 ) This patch adds explicit support for AVX masked load/store intrinsics, largely by applying the intrinsics to the shadows (but subtly different to handleIntrinsicByApplyingToShadow()). We do not reuse the handleMaskedLoad/Store functions. The key challenge is that the LLVM masked intrinsics require a vector of booleans, while AVX masked intrinsics use the MSBs of a vector of integers. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad mentions that the x86 backend does not know how to efficiently convert from a vector of booleans back into the AVX mask format; therefore, they (and we) do not reduce AVX masked intrinsics into LLVM masked intrinsics.	2025-01-26 15:40:55 -08:00
Vasileios Porpodas	1c4341d176	[SandboxVec][DAG] Fix interval check without Node This patch moves the check of whether a node exists before the check of whether it is contained in the interval.	2025-01-26 11:54:09 -08:00
Florian Hahn	1395cd015f	[VPlan] Support multi-exit loops in HCFG builder. Update HCFG construction to support multi-exit loops. If there is no unique exit block, map the middle block of the initial plan to the exit block from the latch. This further unifies HCFG construction and prepares for use to also build an initial VPlan (VPlan0) for inner loops. Effectively NFC as this isn't used on the default code path yet.	2025-01-25 21:55:15 +00:00
Fangrui Song	2131115be5	[InstCombine] Drop Range attribute when simplifying 'fshl' based on demanded bits (#124429 ) When simplifying operands based on demanded bits, the return value range of llvm.fshl might change. Keeping the Range attribute might cause llvm.fshl to generate poison and lead to a miscompile. Drop the Range attribute similar to `dropPoisonGeneratingFlags` elsewhere. Fix #124387	2025-01-25 13:35:11 -08:00
Vasileios Porpodas	b178c2d63e	[SandboxVec][DAG] Fix trim schedule Fix trimSchedule by skipping instructions without a DAG Node.	2025-01-25 09:42:14 -08:00
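
The LoopInterchange fix in 690f251063 above can be illustrated with a small model. This is a sketch, not the pass's actual code: the bit values follow the `DVEntry` enum in LLVM's DependenceAnalysis.h as far as I know it (LT=1, EQ=2, GT=4, with LE, GE and ALL as unions), while the two mapping functions and `covers` are hypothetical helpers written for this example only.

```python
# Direction bits, mirroring (to my understanding) LLVM's DVEntry encoding.
LT, EQ, GT = 1, 2, 4
LE, GE, ALL = LT | EQ, GT | EQ, LT | EQ | GT

# Set of directions that each direction-vector character stands for.
CHAR_TO_BITS = {"<": LT, "=": EQ, ">": GT, "*": ALL}


def to_char_old(direction):
    """Hypothetical model of the old handling: LE -> '<' and GE -> '>'."""
    return {LT: "<", LE: "<", EQ: "=", GT: ">", GE: ">"}.get(direction, "*")


def to_char_fixed(direction):
    """Hypothetical model of the fix: LE and GE become the conservative '*'."""
    return {LT: "<", EQ: "=", GT: ">"}.get(direction, "*")


def covers(char, direction):
    """True if the character accounts for every direction in the entry."""
    return direction & ~CHAR_TO_BITS[char] == 0


for name, d in (("LE", LE), ("GE", GE)):
    old, new = to_char_old(d), to_char_fixed(d)
    print(name, "old:", old, covers(old, d), "fixed:", new, covers(new, d))
```

In this model the old mapping fails `covers` for both LE and GE because the EQ bit is dropped, whereas '*' over-approximates and therefore stays safe, at the cost of precision.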

38786 Commits