llvm-project

Author	SHA1	Message	Date
Hari Limaye	e19a5fc6d3	[FuncSpec] Improve accounting of specialization codesize growth (#113448 ) Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.	2024-10-29 11:53:12 +00:00
Jay Foad	2443549b85	[IR] Remove some uses of StructType::setBody. NFC. (#113685 ) It is simple to create the struct body up front, now that we have transitioned to opaque pointers.	2024-10-29 11:44:53 +00:00
Hari Limaye	06664fdc76	[FuncSpec] Enable SpecializeLiteralConstant by default (#113442 ) Enable specialization on literal constant arguments by default in Function Specialization. --------- Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>	2024-10-29 11:41:25 +00:00
Yingwei Zheng	18311093ab	[InstCombine] Do not fold `shufflevector(select)` if the select condition is a vector (#113993 ) Since `shufflevector` is not element-wise, we cannot do fold it into select when the select condition is a vector. For shufflevector that doesn't change the length, it doesn't crash, but it is still a miscompilation: https://alive2.llvm.org/ce/z/s8saCx Fixes https://github.com/llvm/llvm-project/issues/113986.	2024-10-29 10:39:07 +08:00
c8ef	0c1c37bfbe	[TLI] Add support for the `tgamma` libcall. (#113791 ) This patch adds the `tgamma` libcall.	2024-10-29 10:08:38 +08:00
vporpo	a461869db3	[SandboxIR][Pass] Implement Analyses class (#113962 ) The Analyses class provides a way to pass around commonly used Analyses to SandboxIR passes throught `runOnFunction()` and `runOnRegion()` functions.	2024-10-28 18:00:52 -07:00
Igor Kudrin	757d0e4764	Revert "[CFI][LowerTypeTests] Fix indirect call with alias" (#113978 ) Reverts llvm/llvm-project#106185 This is breaking Sanitizer bots: https://lab.llvm.org/buildbot/#/builders/66/builds/5449/steps/8/logs/stdio	2024-10-28 16:13:32 -07:00
David Majnemer	902acde341	[InstCombine] Optimize away certain additions using modular arithmetic We can turn: ``` %add = add i8 %arg, C1 %and = and i8 %add, C2 %cmp = icmp eq i1 %and, C3 ``` into: ``` %and = and i8 %arg, C2 %cmp = icmp eq i1 %and, (C3 - C1) & C2 ``` This is only worth doing if the sequence is the sole user of the addition operation.	2024-10-28 22:51:35 +00:00
Matthias Braun	5903c6af44	InstCombine: Fold shufflevector(select) and shufflevector(phi) (#113746 ) - Transform `shufflevector(select(c, x, y), C)` to `select(c, shufflevector(x, C), shufflevector(y, C))` by re-using the `FoldOpIntoSelect` helper. - Transform `shufflevector(phi(x, y), C)` to `phi(shufflevector(x, C), shufflevector(y, C))` by re-using the `foldOpInotPhi` helper.	2024-10-28 15:35:17 -07:00
vporpo	bf4b31ad54	[SandboxVec][Legality] Check Fastmath flags (#113967 )	2024-10-28 15:32:20 -07:00
vporpo	5ea694816b	[SandboxVec][Legality] Check opcodes and types (#113741 )	2024-10-28 14:05:58 -07:00
Igor Kudrin	67bcce2141	[CFI][LowerTypeTests] Fix indirect call with alias (#106185 ) Motivation example: ``` > cat test.cpp extern "C" [[gnu::weak]] void f() {} void alias() __attribute__((alias("f"))); int main() { auto p = alias; p(); } > clang test.cpp -fsanitize=cfi-icall -flto=thin -fuse-ld=lld > ./a.out [1] 1868 illegal hardware instruction ./a.out ``` If the address of a function was only taken through its alias, the function was not considered exported and therefore was not included in the CFI jumptable. This resulted in `@llvm.type.test()` being lowered to `false`, and consequently the indirect call to the function was eventually optimized to `ubsantrap()`.	2024-10-28 13:14:42 -07:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Lei Wang	e517cfc531	[InstrPGO] Support cold function coverage instrumentation (#109837 ) This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes(a bundle of passes in `addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Config the flags in one place, i,e. adding `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) makes linker to link `compiler_rt.profile` lib(see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry count to determine cold function. Overall, the full command is like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```	2024-10-28 10:13:45 -07:00
Ellis Hoag	6ab26eab4f	Check hasOptSize() in shouldOptimizeForSize() (#112626 )	2024-10-28 09:45:03 -07:00
Alexey Bataev	7152bf3bc8	[SLP]Do not create new vector node if scalars fully overlap with the existing one If the list of scalars vectorized as the part of the same vector node, no need to generate vector node again, it will be handled as part of overlapping matching. Fixes #113810	2024-10-28 06:59:41 -07:00
Yingwei Zheng	f78610af3f	[InstCombine] Add function attribute `instcombine-no-verify-fixpoint` (#113822 ) This patch introduces a function attribute `instcombine-no-verify-fixpoint` to avoids disabling fix-point verification for unrelated tests in the same file. Address comment https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.	2024-10-28 17:45:08 +08:00
Yingwei Zheng	5155c38cee	[InstCombine] Don't check uses of constant exprs (#113684 ) This patch skips constant expressions to avoid iterating over uses on other functions. Fix crash reported in https://github.com/llvm/llvm-project/pull/105510#issuecomment-2437521147.	2024-10-28 15:09:20 +08:00
David Majnemer	5d4a0d54b5	[InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and We left some easy opportunities for further simplifications. log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that trunc is NUW because it means that the truncation didn't drop any bits. It is also safe if the caller is OK with zero as a possible answer. log2(x >>u y) is simply `log2(x) - y`. log2(x & y) is a funny one. It comes up when doing something like: ``` unsigned int f(unsigned int x, unsigned int y) { unsigned char a = 1u << x; return y / a; } ``` LLVM would canonicalize this to: ``` %shl = shl nuw i32 1, %x %conv1 = and i32 %shl, 255 %div = udiv i32 %y, %conv1 ``` In cases like these, we can ignore the mask entirely. This is equivalent to `y >> x`.	2024-10-28 05:13:04 +00:00
Florian Hahn	7fe149cdf0	[VPlan] Replace getIRBasicBlock with IRBB in VPIRBB::execute (NFC). Suggested in https://github.com/llvm/llvm-project/pull/109975. This makes the function consistent throughout.	2024-10-27 16:22:18 +01:00
Eirik Byrkjeflot Anonsen	d2e9532fe1	[DemoteRegToStack] Use correct variable for branch instructions in DemoteRegToStack (#113798 ) I happened to see this code, and it seems "obviously" wrong to me. So here's what I think this code is supposed to look like.	2024-10-27 17:09:39 +08:00
Teresa Johnson	355e6948d4	[MemProf] Fix clone edge comparison (#113753 ) The issue fixed in PR113337 exposed a bug in the comparisons done in allocTypesMatch, which compares a vector of alloc types to those in the given vector of Edges. The form of std::equal used, which didn't provide the end iterator for the Edges vector, will iterate through as many entries in the Edges vector as in the InAllocTypes vector, which can fail if there are fewer entries in the Edges vector, because we may dereference a bogus Edge pointer. This function is called twice, once for the Node, with its callee edges, in which case the number of edges should always match the number of entries in allocTypesMatch, which is computed from the Node's callee edges. It was also called for Node's clones, and it turns out that after cloning and edge modifications done for other allocations, the number of callee edges in Node and its clones may no longer match. In some cases, more common with memprof ICP before the PR113337, the number of clone edges can be smaller leading to a bad dereference. I found for a large application even before adding memprof ICP support we sometimes call this with fewer entries in the clone's callee edges, but were getting lucky as they had allocation type None, and we didn't end up attempting to dereference the bad edge pointer. Fix this by passing Edges.end() to std::equal, which means std::equal will fail if the number of entries in the 2 vectors are not equal. However, this is too conservative, as clone edges may have been added or removed since it was initially cloned, and in fact can be wrong as we may not be comparing allocation types corresponding to the same callee. Therefore, a couple of enhancements are made to avoid regressing and improve the checking and cloning: - Don't bother calling the alloc type comparison when the clone and the Node's alloc type for the current allocation are precise (have a single allocation type) and are the same (which is guaranteed by an earlier check, and an assert is added to confirm that). In that case we can trivially determine that the clone can be used. - Split the alloc type matching handling into a separate function for the clone case. In that case, for each of the InAllocType entries, attempt to find and compare to the clone callee edge with the same callee as the corresponding original node callee. To create a test case I needed to take a spec application (xalancbmk), and repeatedly apply random hot/cold-ness to the memprof contexts when building, until I hit the problematic case. I then reduced that full LTO IR using llvm-reduce and then manually.	2024-10-26 20:53:20 -07:00
Kyungwoo Lee	1941c5180b	Reland (2nd attempt) [StructuralHash] Refactor (#112621 ) This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 16:12:28 -07:00
Kyungwoo Lee	d104b8e827	Revert "Reland [StructuralHash] Refactor (#112621 )" This reverts commit 98ca9a635bd2fb98cee473a9558687a5b522e219.	2024-10-26 13:55:46 -07:00
Kyungwoo Lee	98ca9a635b	Reland [StructuralHash] Refactor (#112621 ) This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 12:07:57 -07:00
Kyungwoo Lee	9672375623	Revert "[StructuralHash] Refactor (#112621 )" This reverts commit b667d161f0a9ff6b29cda0ccdb0081610c1e8b8c.	2024-10-26 09:55:21 -07:00
Kyungwoo Lee	b667d161f0	[StructuralHash] Refactor (#112621 ) This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 09:20:26 -07:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
davidtrevelyan	4102625380	[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155 ) # What This PR renames the newly-introduced llvm attribute `sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise, sibling variables such as `SanitizeRealtimeUnsafe` are renamed to `SanitizeRealtimeBlocking` respectively. There are no other functional changes. # Why? - There are a number of problems that can cause a function to be real-time "unsafe", - we wish to communicate what problems rtsan detects and why they're unsafe, and - a generic "unsafe" attribute is, in our opinion, too broad a net - which may lead to future implementations that need extra contextual information passed through them in order to communicate meaningful reasons to users. - We want to avoid this situation and make the runtime library boundary API/ABI as simple as possible, and - we believe that restricting the scope of attributes to names like `sanitize_realtime_blocking` is an effective means of doing so. We also feel that the symmetry between `[[clang::blocking]]` and `sanitize_realtime_blocking` is easier to follow as a developer. # Concerns - I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been part of the tree for a few weeks now (introduced here: https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't been released in version 20 yet, am I correct in considering this to not be a breaking change?	2024-10-26 13:06:11 +01:00
Vasileios Porpodas	1540f772c7	Reapply "[SandboxVec][Legality] Reject non-instructions (#113190 )" This reverts commit eb9f4756bc3daaa4b19f4f46521dc05180814de4.	2024-10-25 12:55:58 -07:00
Vasileios Porpodas	eb9f4756bc	Revert "[SandboxVec][Legality] Reject non-instructions (#113190 )" This reverts commit 6c9bbbc818ae8a0d2849dbc1ebd84a220cc27d20.	2024-10-25 12:52:31 -07:00
vporpo	6c9bbbc818	[SandboxVec][Legality] Reject non-instructions (#113190 )	2024-10-25 12:47:19 -07:00
Florian Hahn	e724226da7	[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value. In some cases, VPWidenCastRecipes are created but not considered in the legacy cost model, including truncates/extends when evaluating a reduction in a smaller type. Return 0 for such casts for now, to avoid divergences between VPlan and legacy cost models. Fixes https://github.com/llvm/llvm-project/issues/113526.	2024-10-25 21:25:44 +02:00
Florian Hahn	9648271a3c	[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC) This clarifies the flag, which is now only passed if the epilogue loop is being vectorized.	2024-10-25 20:39:47 +02:00
Teresa Johnson	144ddca9ed	[MemProf] Avoid duplicate edges between nodes (#113337 ) The recent change to add support for cloning indirect calls inadvertantly caused duplicate edges to be created between the same caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller not properly guarding the addition of a new edge (ironically I was testing for that in an assertion, but failed to handle that case specially otherwise). Now simply move the context ids over to any existing edge. This issue in turn led to some assumptions in cloning being violated, resulting in a later crash. Add a test for this case to checkNode.	2024-10-25 11:09:57 -07:00
Jay Foad	90cdc03e7f	[IR] Fix undiagnosed cases of structs containing scalable vectors (#113455 ) Type::isScalableTy and StructType::containsScalableVectorType failed to detect some cases of structs containing scalable vectors because containsScalableVectorType did not call back into isScalableTy to check the element types. Fix this, which requires sharing the same Visited set in both functions. Also change the external API so that callers are never required to pass in a Visited set, and normalize the naming to isScalableTy.	2024-10-25 12:56:10 +01:00
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback to leave the IR in an invalid state and still continue the codegen process, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue always returning 'success'.	2024-10-25 11:30:16 +01:00
Farzon Lotfi	21b3769d1d	[Scalarizer] Fix to only scalarize if intrinsic was marked as isTriviallyScalarizable (#113625 ) fixes #113624	2024-10-24 23:26:02 -07:00
Vitaly Buka	cf8d24531e	[msan] Reduces overhead of #113200 , by 10% (#113201 ) CTMark #113200 size overhead was 5.3%, now it's 4.7%. The patch affects only signed integers. https://alive2.llvm.org/ce/z/Lv5hyi * The patch replaces code which extracted sign bit, maximized/minimized it, then packed it back, with simple sign bit flip. The another way to think about transformation is as a subtraction of MIN_SINT from A/B. Then we map MIN_SINT to 0, 0 to -MIN_SINT, and MAX_SINT to MAX_UINT. * Then to maximize/minimize A/B we don't need to extract sign bit, we can apply shadow the same way as to other bits. * After sign bit flip, we had to switch to unsigned version of the predicates. * After change above getHighestPossibleValue/getLowestPossibleValue became very similar, so we can combine into a single function. * Because the function does sign bit flip and requires unsigned predicates used for returned values, there is no point in keeping it as a member of class, to hide, we switch to function local lambda.	2024-10-24 20:46:49 -07:00
Haopeng Liu	a31ce36f56	Apply initializes attribute to DSE (#113630 ) retry #107282 Fixed with `MadeChange \|= Changed;` and confirmed it works. ``` cmake -DLLVM_CCACHE_BUILD=ON -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON -DLLVM_ENABLE_WERROR=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-U_GLIBCXX_DEBUG '-DLLVM_LIT_ARGS=-v -vv -j96' '-DLLVM_ENABLE_PROJECTS=llvm;lld' -DLLVM_ENABLE_ASSERTIONS=ON -GNinja ../llvm ninja check-llvm ```	2024-10-24 18:43:20 -07:00
Florian Hahn	7b9f988a53	[VPlan] Limit stride replacement to vector region and middle VPBB (NFC). At the moment this in NFC, but ensures we only replace uses that are dominated by runtime checks as we model more of the skeleton in VPlan.	2024-10-24 15:08:36 -07:00
Alexey Bataev	e914421d7f	[SLP]Do correct signedness analysis for externally used scalars If the scalars is used externally is in the root node, it may have incorrect signedness info because of the conflict with the demanded bits analysis. Need to perform exact signedness analysis and compute it rather than rely on the precomputed value, which might be incorrect for alternate zext/sext nodes. Fixes #113520	2024-10-24 08:59:24 -07:00
Arthur Eubanks	3cec720449	Revert "[DSE] Apply initializes attribute to DSE" (#113589 ) Reverts llvm/llvm-project#107282 Seems to be causing invalid analysis caching as mentioned in https://github.com/llvm/llvm-project/pull/107282#issuecomment-2435083978.	2024-10-24 08:51:31 -07:00
Alexey Bataev	d2e7ee77d3	[SLP]Do not check for clustered loads only Since SLP support "clusterization" of the non-load instructions, the restriction for reduced values for loads only should be removed to avoid compiler crash. Fixes #113516	2024-10-24 08:16:42 -07:00
Alexey Bataev	cb5046da26	[SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles Need to consider undefs correctly, when trying to replace them with potentially poisonous values in shuffles. Such elements should not be silently replaced by poison values, instead complex analysis should be implemented to see if it is safe to do it. Fixes #113425	2024-10-24 07:47:23 -07:00
Nashe Mncube	e37d736def	Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289 ) This is a recommit of #107120 . The original PR was approved but failed buildbot. The newly added tests should only be run for compilers that support the ARM target. This has been resolved by adding a config file for these tests. - Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works at the midend level	2024-10-24 10:12:01 +01:00
Mingming Liu	60944177b8	[WPD][ThinLTO]Add cutoff option for WPD (#113383 ) This option applies for _import_ WPD (i.e., when `DevirtModule` pass de-virtualizes according to an imported summary, in ThinLTO backend pipeline). It's meant for debugging (e.g., bisection).	2024-10-23 23:47:27 -07:00
Haopeng Liu	089237c0d0	[DSE] Apply initializes attribute to DSE (#107282 ) Apply the initializes attribute to DSE and guard with a flag, "enable-dse-initializes-attr-improvement". The attribute support has been landed in: https://github.com/llvm/llvm-project/pull/84803 The attribute inference will be landed after this PR: https://github.com/llvm/llvm-project/pull/97373	2024-10-23 22:18:59 -07:00
Thomas Fransham	b8fddca7bd	[llvm] Support llvm::Any across shared libraries on windows (#108051 ) This is part of the effort to support for enabling plugins on windows by adding better support for building llvm as a DLL. The export macros used here were added in #96630 Since shared library symbols aren't deduplicated across multiple libraries on windows like Linux we have to manually explicitly import and export `Any::TypeId` template instantiations for the uses of `llvm::Any` in the LLVM codebase to support LLVM Windows shared library builds. This change ensures that external code, including LLVM's own tests, can use PassManager callbacks when LLVM is built as a DLL. I also removed the only use of llvm::Any for LoopNest that only existed in debug code and there also doesn't seem to be any code creating `Any<LoopNest>`	2024-10-24 08:07:13 +03:00
Florian Hahn	ef217a0f6b	[VPlan] Introduce and use getVectorPreheader (NFC). Introduce a dedicated function to retrieve the vector preheader. This ensures the correct block is used, even if the skeleton is exetended.	2024-10-23 21:01:52 -07:00

1 2 3 4 5 ...

37965 Commits