llvm-project

Author	SHA1	Message	Date
Harald van Dijk	23209eb1d9	Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059 )" This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.	2025-02-12 17:50:39 +00:00
Harald van Dijk	3ec9f7494b	[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059 ) After #124287 updated several functions to return iterators rather than Instruction , it was no longer straightforward to pass their result to DIBuilder. This commit updates DIBuilder methods to accept an InsertPosition instead, so that they can be called with an iterator (preferred), or with a deprecation warning an Instruction , or a BasicBlock *. This commit also updates the existing calls to the DIBuilder methods to pass in iterators.	2025-02-12 17:38:59 +00:00
Alireza Torabian	3c74430320	[DependenceAnalysis][NFC] Removing PossiblyLoopIndependent parameter (#124615 ) Parameter PossiblyLoopIndependent has lost its intended purpose. This flag is always set to true in all cases when depends() is called, hence we want to reconsider the utility of this variable and remove it from the function signature entirely. This is an NFC patch.	2025-02-11 16:23:28 -05:00
Florian Hahn	e258bca950	[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235 ) Update getOrCreateVPValueForSCEVExpr to only skip expansion of SCEVUnknown if the underlying value isn't an instruction. Instructions may be defined in a loop and using them without expansion may break LCSSA form. SCEVExpander will take care of preserving LCSSA if needed. We could also try to pass LoopInfo, but there are some users of the function where it won't be available and main benefit from skipping expansion is slightly more concise VPlans. Note that SCEVExpander is now used to expand SCEVUnknown with floats. Adjust the check in expandCodeFor to only check the types and casts if the type of the value is different to the requested type. Otherwise we crash when trying to expand a float and requesting a float type. Fixes https://github.com/llvm/llvm-project/issues/121518. PR: https://github.com/llvm/llvm-project/pull/125235	2025-02-11 13:03:12 +01:00
Ramkumar Ramachandra	d0f472c246	SimplifyIndVar: teach widenLoopCompare about samesign (#125851 ) Proof: https://alive2.llvm.org/ce/z/NVXaeo	2025-02-06 11:13:01 +00:00
Yingwei Zheng	9fbd5fbcc6	[IR][NFC] Switch to use `LifetimeIntrinsic` (#125528 )	2025-02-04 02:18:33 +08:00
Yingwei Zheng	84ba55d18b	[NFC][SimplifyCFG] Refactor `passingValueIsAlwaysUndefined` to work on `Use` (#125519 ) Address comment https://github.com/llvm/llvm-project/pull/125383#discussion_r1938436526	2025-02-04 00:42:25 +08:00
Ramkumar Ramachandra	88f858d858	IndVarSimplify: thread CmpPredicate to SCEV (NFC) (#125240 ) Relevant parts of ScalarEvolution's API accept a CmpPredicate instead of a CmpInst::Predicate after 60dc450 (SCEV: migrate to CmpPredicate (NFC)). After auditing the callers of these APIs, it was found that IndVarSimplify was dropping samesign information. Fix this.	2025-01-31 18:07:44 +00:00
Lou	c75b251103	[GVN] Load-store forwaring of scalable store to fixed load. (#124748 ) When storing a scalable vector and loading a fixed-size vector, where the scalable vector is known to be larger based on vscale_range, perform store-to-load forwarding through temporary @llvm.vector.extract calls. InstCombine then folds the insert/extract pair away. The usecase is shown in https://godbolt.org/z/KT3sMrMbd, which shows that clang generates IR that matches this pattern when the "arm_sve_vector_bits" attribute is used: ```c typedef svfloat32_t svfloat32_fixed_t __attribute__((arm_sve_vector_bits(512))); struct svfloat32_wrapped_t { svfloat32_fixed_t v; }; static inline svfloat32_wrapped_t add(svfloat32_wrapped_t a, svfloat32_wrapped_t b) { return {svadd_f32_x(svptrue_b32(), a.v, b.v)}; } svfloat32_wrapped_t foo(svfloat32_wrapped_t a, svfloat32_wrapped_t b) { // The IR pattern this patch matches is generated for this return: return add(a, b); } ```	2025-01-30 11:49:42 +01:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Florian Hahn	ad9da92cf6	[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462 ) Add an extra knob to RuntimeUnrollMultiExit to let backends control whether to allow multi-exit unrolling on a per-loop basis. This gives backends more fine-grained control on deciding if multi-exit unrolling is profitable for a given loop and uarch. Similar to 4226e0a0c75. PR: https://github.com/llvm/llvm-project/pull/124462	2025-01-27 21:20:04 +00:00
Jeremy Morse	34b139594a	[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283 ) To finalise the "RemoveDIs" work removing debug intrinsics, we're updating call sites that insert instructions to use iterators instead. This set of changes are those where it's not immediately obvious that just calling getIterator to fetch an iterator is correct, and one or two places where more than one line needs to change. Overall the same rule holds though: iterators generated for the start of a block such as getFirstNonPHIIt need to be passed into insert/move methods without being unwrapped/rewrapped, everything else can use getIterator.	2025-01-27 16:44:14 +00:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
Jeremy Morse	e14962a39c	[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.	2025-01-27 15:25:17 +00:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00
Jeremy Morse	6292a808b3	[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>	2025-01-24 13:27:56 +00:00
Jeremy Morse	8e70273509	[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).	2025-01-24 10:53:11 +00:00
Artem Pianykh	196f7c2a4f	[Utils] Identity map module-level debug info on first use in CloneFunction* (#118627 ) Summary: To avoid cloning module-level debug info (owned by the module rather than the function), CloneFunction implementation used to eagerly identity map such debug info into ValueMap's MD map. In larger modules with meaningful volume of debug info this gets very expensive. By passing such debug info metadata via an IdentityMD set for the ValueMapper to map on first use, we get several benefits: 1. Mapping metadata is not cheap, particularly because of tracking. When cloning a Function we identity map lots of global module-level metadata to avoid cloning it, while only a fraction of it is actually used by the function. Mapping on first use is a lot faster for modules with meaningful amount of debug info. 2. Eagerly identity mapping metadata makes it harder to cache module-level data (e.g. a set of metadata nodes in a \a DICompileUnit). With this patch we can cache certain module-level metadata calculations to speed things up further. Anecdata from compiling a sample cpp file with full debug info shows that this moderately speeds up CoroSplitPass which is one of the heavier users of cloning: \| \| Baseline \| IdentityMD set \| \|-----------------\|----------\|----------------\| \| CoroSplitPass \| 306ms \| 221ms \| \| CoroCloner \| 101ms \| 72ms \| \| Speed up \| 1x \| 1.4x \| Test Plan: ninja check-llvm-unit ninja check-llvm	2025-01-24 08:38:38 +00:00
David Green	775d0f36f7	[GVN] Handle scalable vectors with the same size in VNCoercion (#123984 ) This allows us to forward to a load even if the types do not match (nxv4i32 vs nxv2i64 for example). Scalable types are allowed in canCoerceMustAliasedValueToLoad so long as the size (minelts * scalarsize) is the same, and some follow-on code is adjusted to make sure it handles scalable sizes correctly. Methods like analyzeLoadFromClobberingWrite and analyzeLoadFromClobberingStore still do nothing for scalable vectors, as Offsets and mismatching types are not supported.	2025-01-23 18:43:50 +00:00
Joshua Cranmer	b0d35cf22e	[SSAUpdater] Avoid scanning basic blocks to find instruction order. (#123803 ) This fixes a compile-time regression caused by #116645, where an entry basic block with a very large number of allocas and other instructions caused SROA to take ~100× its expected runtime, as every alloca (with ~2 uses) now calls this method to find the order of those few instructions, rescanning the very large basic block every single time. Since this code was originally written, Instructions now have ordering numbers available to determine relative order without unnecessarily scanning the basic block.	2025-01-22 11:11:41 -05:00
Mats Larsen	b95ed30ea2	[IR] Remove unused variables from #123617 Failed to notice them when landing that patch - apologies!	2025-01-21 01:47:38 +09:00
Mats Jun Larsen	416f1c465d	[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617 ) In accordance with https://github.com/llvm/llvm-project/issues/123569 In order to keep the patch at reasonable size, this PR only covers for the llvm subproject, unittests excluded.	2025-01-21 00:32:56 +09:00
Nikita Popov	193ea83dd7	[SimplifyLibCalls] Don't infer call-site nocapture on atoi This is already inferred on the function declaration by BLC, there is no need to also do it at the call-site.	2025-01-14 17:13:45 +01:00
Pedro Lobo	7e19103895	[DebugInfo] Map VAM args to `poison` instead of `undef` [NFC] (#122756 ) If an argument cannot be mapped in `Mapper::mapValue`, it can be mapped to `poison` instead of `undef`.	2025-01-13 21:38:40 +00:00
Nikita Popov	22e9024c9f	[IR] Introduce captures attribute (#116990 ) This introduces the `captures` attribute as described in: https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420 This initial patch only introduces the IR/bitcode support for the attribute and its in-memory representation as `CaptureInfo`. This will be followed by a patch to upgrade and remove the `nocapture` attribute, and then by actual inference/analysis support. Based on the RFC feedback, I've used a syntax similar to the `memory` attribute, though the only "location" that can be specified is `ret`. I've added some pretty extensive documentation to LangRef on the semantics. One non-obvious bit here is that using ptrtoint will not result in a "return-only" capture, even if the ptrtoint result is only used in the return value. Without this requirement we wouldn't be able to continue ordinary capture analysis on the return value.	2025-01-13 14:40:25 +01:00
Nikita Popov	71f7b972c3	[Local] Make combineAAMetadata() more principled (#122091 ) This moves combineAAMetadata() into Local and implements it via a new AAOnly flag, which will intersect only AA metadata and keep other known metadata. The existing KnownIDs list is dropped, because it is redundant with the switch in combineMetadata(), which already drops unknown metadata. I tried a few variants of this, and ultimately went with the AAOnly flag because this way we make an explicit choice for each metadata kind supported by combineMetadata(), and ignoring the flag gives you conservatively correct behavior. I checked that the memcpy tests still pass if we adjust the logic for MD_memprof/MD_callsite to drop the metadata instead of arbitrarily picking one. Fixes https://github.com/llvm/llvm-project/issues/121495.	2025-01-09 09:34:46 +01:00
Akshat Oke	f6c76d5180	[PM] Remove is_analysis label for LoopSimplify (#121433 ) This reverts part of the changes in #118779	2025-01-09 10:11:14 +05:30
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
Mircea Trofin	4312075efa	[nfc][thinlto] remove unnecessary return from `renameModuleForThinLTO` (#121851 ) Same goes for `FunctionImportGlobalProcessing::run`. The return value was used, but it was always `false`.	2025-01-06 15:19:09 -08:00
Yingwei Zheng	a77346bad0	[IRBuilder] Refactor FMF interface (#121657 ) Up to now, the only way to set specified FMF flags in IRBuilder is to use `FastMathFlagGuard`. It makes the code ugly and hard to maintain. This patch introduces a helper class `FMFSource` to replace the original parameter `Instruction *FMFSource` in IRBuilder. To maximize the compatibility, it accepts an instruction or a specified FMF. This patch also removes the use of `FastMathFlagGuard` in some simple cases. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au	2025-01-06 14:37:04 +08:00
Fangrui Song	e6f76378c2	EntryExitInstrumenter: skip available_externally linkage gnu::always_inline functions, which lower to available_externally, may not have definitions external to the module. -finstrument-function family options instrumentating the function (which takes the function address) may lead to a linker error if the function is not optimized out, e.g. ``` // -std=c++17 or above with libstdc++ #include <string> std::string str; int main() {} ``` Simplified reproduce: ``` template <typename T> struct A { [[gnu::always_inline]] T bar(T a) { return a * 2; } }; extern template class A<int>; int main(int argc, char **argv) { return A<int>().bar(argc); } ``` GCC's -finstrument-function instrumentation skips such functions (https://gcc.gnu.org/PR78333). Let's skip such functions (available_externally) as well. Fix #50742 Pull Request: https://github.com/llvm/llvm-project/pull/121452	2025-01-03 09:25:08 -08:00
Teresa Johnson	3a423a10ff	[MemProf][PGO] Prevent dropping of profile metadata during optimization (#121359 ) This patch fixes a couple of places where memprof-related metadata (!memprof and !callsite) were being dropped, and one place where PGO metadata (!prof) was being dropped. All were due to instances of combineMetadata() being invoked. That function drops all metadata not in the list provided by the client, and also drops any not in its switch statement. Memprof metadata needed a case in the combineMetadata switch statement. For now we simply keep the metadata of the instruction being kept, which doesn't retain all the profile information when two calls with memprof metadata are being combined, but at least retains some. For the memprof metadata being dropped during call CSE, add memprof and callsite metadata to the list of known ids in combineMetadataForCSE. Neither memprof nor regular prof metadata were in the list of known ids for the callsite in MemCpyOptimizer, which was added to combine AA metadata after optimization of byval arguments fed by memcpy instructions, and similar types of optimizations of memcpy uses. There is one other callsite of combineMetadata, but it is only invoked on load instructions, which do not carry these types of metadata.	2025-01-02 12:11:59 -08:00
Yingwei Zheng	eafbab6fac	[EntryExitInstrumenter][AArch64][RISCV][LoongArch] Pass `__builtin_return_address(0)` into `_mcount` (#121107 ) On RISC-V, AArch64, and LoongArch, the `_mcount` function takes `__builtin_return_address(0)` as an argument since `__builtin_return_address(1)` is not available on these platforms. This patch fixes the argument passing to match the behavior of glibc/gcc. Closes https://github.com/llvm/llvm-project/issues/121103.	2025-01-01 15:02:08 +08:00
DaPorkchop_	cea738bc9a	[SimplifyCFG] Replace unreachable switch lookup table holes with poison (#94990 ) As discussed in #94468, this causes switch lookup table entries which are unreachable to be poison instead of filling them with a value from one of the reachable cases. --------- Co-authored-by: DianQK <dianqk@dianqk.net>	2024-12-26 07:47:26 +08:00
Owen Anderson	bc8fa9c443	Revert "SimplifyLibCalls: Use default globals address space when building new global strings. (#118729 )" (#119616 ) This reverts commit cfa582e8aaa791b52110791f5e6504121aaf62bf.	2024-12-21 09:33:39 +13:00
Dominik Steenken	fa9cef50b1	Only guard loop metadata that has non-debug info in it (#118825 ) This PR is motivated by a mismatch we discovered between compilation results with vs. without `-g3`. We noticed this when compiling SPEC2017 testcases. The specific instance we saw is fixed in this PR by modifying a guard (see below), but it is likely similar instances exist elsewhere in the codebase. The specific case fixed in this PR manifests itself in the `SimplifyCFG` pass doing different things depending on whether DebugInfo is generated or not. At the end of this comment, there is reduced example code that shows the behavior in question. The differing behavior has two root causes: 1. Commit https://github.com/llvm/llvm-project/commit/c07e19b adds loop metadata including debug locations to loops that otherwise would not have loop metadata 2. Commit https://github.com/llvm/llvm-project/commit/ac28efa6c100 adds a guard to a simplification action in `SImplifyCFG` that prevents it from simplifying away loop metadata So, the change in 2. does not consider that when compiling with debug symbols, loops that otherwise would not have metadata that needs preserving, now have debug locations in their loop metadata. Thus, with `-g3`, `SimplifyCFG` behaves differently than without it. The larger issue is that while debug info is not supposed to influence the final compilation result, commits like 1. blur the line between what is and is not debug info, and not all optimization passes account for this. This PR does not address that and rather just modifies this particular guard in order to restore equivalent behavior between debug and non-debug builds in this one instance. --- Here is a reduced version of a file from `f526.blender_r` that showcases the behavior in question: ```C struct LinkNode; typedef struct LinkNode { struct LinkNode next; void link; } LinkNode; void do_projectpaint_thread_ph_v_state() { int ps = do_projectpaint_thread_ph_v_state; LinkNode node; while (do_projectpaint_thread_ph_v_state) for (node = ps; node; node = node->next) ; } ``` Compiling this with and without DebugInfo, and then disassembling the results, leads to different outcomes (tested on SystemZ and X86). The reason for this is that the `SimplifyCFG` pass does different things in either case.	2024-12-20 15:15:51 +01:00
Abhay Kanhere	cc246d4a29	[Transforms][CodeExtraction] bug fix regions with stackrestore (#118564 ) Ensure code extraction for outlining to a function does not create a function with stacksave of caller to restore stack (e.g. tail call).	2024-12-19 09:19:11 -07:00
Florian Hahn	a487b792e2	[TySan] Add initial Type Sanitizer (LLVM) (#76259 ) This patch introduces the LLVM components of a type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. It goes together with the corresponding clang changes (https://github.com/llvm/llvm-project/pull/76260) and compiler-rt changes (https://github.com/llvm/llvm-project/pull/76261) PR: https://github.com/llvm/llvm-project/pull/76259	2024-12-17 13:57:34 +00:00
Artem Pianykh	fbdbb13d5b	[NFC][Utils] Eliminate DISubprogram set from BuildDebugInfoMDMap (#118625 ) Summary: Previously, we'd add all SPs distinct from the cloned one into a set. Then when cloning a local scope we'd check if it's from one of those 'distinct' SPs by checking if it's in the set. We don't need to do that. We can just check against the cloned SP directly and drop the set. Test Plan: ninja check-llvm-unit check-llvm	2024-12-17 08:57:59 +00:00
Artem Pianykh	8402a0fab0	[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto (#118624 ) Summary: This and previously extracted `CloneFunction*Into` functions will be used in later diffs. Test Plan: ninja check-llvm-unit check-llvm	2024-12-16 22:30:56 +00:00
Artem Pianykh	a9237b1a10	[NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto (#118623 ) Summary: The new API expects the caller to populate the VMap. We need it this way for a subsequent change around coroutine cloning. Test Plan: ninja check-llvm-unit check-llvm	2024-12-16 20:50:05 +00:00
Vedant Paranjape	b21fa18b44	[LoopVersioning] Add a check to see if the input loop is in LCSSA form (#116443 ) Loop Optimizations expect the input loop to be in LCSSA form. But it seems that LoopVersioning doesn't have any check to see if the loop is actually in LCSSA form. As a result, if we give it a loop which is not in LCSSA form but still correct semantically, the resulting transformation fails to pass through verifier pass with the following error. Instruction does not dominate all uses! %inc = add nsw i16 undef, 1 store i16 %inc, ptr @c, align 1 As the loop is not in LCSSA form, LoopVersioning's transformations leads to invalid IR! As some instructions do not dominate all their uses. This patch checks if a loop is in LCSSA form, if not it will call formLCSSARecursively on the loop before passing it to LoopVersioning. Fixes: #36998	2024-12-16 11:55:19 -05:00
David Green	0032c151dc	[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645 ) Given an alloca that potentially has many uses in big complex code and escapes into a call that is readonly+nocapture, we cannot easily split up the alloca. There are several optimizations that will attempt to take a value that is stored and a reload, and replace the load with the original stored value. Instcombine has some simple heuristics, GVN can sometimes do it, as can CSE in limited situations. They all suffer from the same issue with complex code - they start from a load/store and need to prove no-alias for all code between, which in complex cases might be a lot to look through. Especially if the ptr is an alloca with many uses that is over the normal escape capture limits. The pass that does do well with allocas is SROA, as it has a complete view of all of the uses. This patch adds a case to SROA where it can detect allocas that are passed into calls that are no-capture readonly. It can then optimize the reloaded values inside the alloca slice with the stored value knowing that it is valid no matter the location of the loads/stores from the no-escaping nature of the alloca.	2024-12-14 18:07:21 +00:00
Florian Hahn	c4a78b6fe3	[SimplifyCFG] Always allow hoisting if all instructions match. (#97158 ) Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to `AllInstsEqOnly` and always allow hoisting if all instructions match. In that case, all instructions can be hoisted and the original branch will be replaced and selects for PHIs are added. This allows preserving metadata in more cases, using the existing hoisting logic, whereas previously FoldTwoEntryPHINode would drop the metadata. https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u PR: https://github.com/llvm/llvm-project/pull/97158	2024-12-13 21:26:27 +00:00
Ramkumar Ramachandra	4a0d53a0b0	PatternMatch: migrate to CmpPredicate (#118534 ) With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key pieces of infrastructure that must be updated to match a CmpInst respecting samesign information. Implement this change to Cmp-matchers. This is a preparatory step in migrating the codebase over to CmpPredicate. Since we no functional changes are desired at this stage, we have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls to use CmpPredicate::getMatching(), as that would have visible impact on tests that are not yet written: instead, we call CmpPredicate::operator==(Predicate), preserving the old behavior, while also inserting a few FIXME comments for follow-ups.	2024-12-13 14:18:33 +00:00
Antonio Frighetto	d26df32255	[SimplifyCFG] Consider preds to switch in `simplifyDuplicateSwitchArms` Allow a duplicate basic block with multiple predecessors to the jump table to be simplified, by considering that the same basic block may appear in more switch cases.	2024-12-13 09:07:24 +01:00
Kirill Stoimenov	e3676aa21f	Revert "[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645 )" Causing buffer overflow: SUMMARY: AddressSanitizer: heap-buffer-overflow llvm/lib/Transforms/Scalar/SROA.cpp:5552:35 This reverts commit 5e247d726d7a54cf0acc997bc17b50e7494e6fa3.	2024-12-12 21:32:35 +00:00
David Green	5e247d726d	[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645 ) Given an alloca that potentially has many uses in big complex code and escapes into a call that is readonly+nocapture, we cannot easily split up the alloca. There are several optimizations that will attempt to take a value that is stored and a reload, and replace the load with the original stored value. Instcombine has some simple heuristics, GVN can sometimes do it, as can CSE in limited situations. They all suffer from the same issue with complex code - they start from a load/store and need to prove no-alias for all code between, which in complex cases might be a lot to look through. Especially if the ptr is an alloca with many uses that is over the normal escape capture limits. The pass that does do well with allocas is SROA, as it has a complete view of all of the uses. This patch adds a case to SROA where it can detect allocas that are passed into calls that are no-capture readonly. It can then optimize the reloaded values inside the alloca slice with the stored value knowing that it is valid no matter the location of the loads/stores from the no-escaping nature of the alloca.	2024-12-12 10:27:27 +00:00
Nikita Popov	5013c81b78	[GlobalOpt][Evaluator] Don't evaluate calls with signature mismatch (#119548 ) The global ctor evaluator tries to evalute function calls where the call function type and function type do not match, by performing bitcasts. This currently causes a crash when calling a void function with non-void return type. I've opted to remove this functionality entirely rather than fixing this specific case. With opaque pointers, there shouldn't be a legitimate use case for this anymore, as we don't need to look through pointer type casts. Doing other bitcasts is very iffy because it ignores ABI considerations. We should at least leave adjusting the signatures to make them line up to InstCombine (which also does some iffy things, but is at least somewhat more constrained). Fixes https://github.com/llvm/llvm-project/issues/118725.	2024-12-12 10:44:52 +01:00
Mel Chen	b3cba9be41	[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812 ) Consider the following loop: ``` int rdx = init; for (int i = 0; i < n; ++i) rdx = (a[i] > b[i]) ? i : rdx; ``` We can vectorize this loop if `i` is an increasing induction variable. The final reduced value will be the maximum of `i` that the condition `a[i] > b[i]` is satisfied, or the start value `init`. This patch added new RecurKind enums - IFindLastIV and FFindLastIV. --------- Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>	2024-12-12 16:48:31 +08:00

1 2 3 4 5 ...

7688 Commits