llvm-project

Author	SHA1	Message	Date
Hari Limaye	e19a5fc6d3	[FuncSpec] Improve accounting of specialization codesize growth (#113448 ) Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.	2024-10-29 11:53:12 +00:00
Hari Limaye	06664fdc76	[FuncSpec] Enable SpecializeLiteralConstant by default (#113442 ) Enable specialization on literal constant arguments by default in Function Specialization. --------- Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>	2024-10-29 11:41:25 +00:00
Rohit Aggarwal	dfb60bb919	Adding more vector calls for -fveclib=AMDLIBM (#109662 ) AMD has its own implementation of vector calls. New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos. Please refer to [https://github.com/amd/aocl-libm-ose] --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-10-29 10:09:55 +00:00
Yingwei Zheng	18311093ab	[InstCombine] Do not fold `shufflevector(select)` if the select condition is a vector (#113993 ) Since `shufflevector` is not element-wise, we cannot fold it into the select when the select condition is a vector. For a shufflevector that doesn't change the length, it doesn't crash, but it is still a miscompilation: https://alive2.llvm.org/ce/z/s8saCx Fixes https://github.com/llvm/llvm-project/issues/113986.	2024-10-29 10:39:07 +08:00
c8ef	0c1c37bfbe	[TLI] Add support for the `tgamma` libcall. (#113791 ) This patch adds the `tgamma` libcall.	2024-10-29 10:08:38 +08:00
Igor Kudrin	757d0e4764	Revert "[CFI][LowerTypeTests] Fix indirect call with alias" (#113978 ) Reverts llvm/llvm-project#106185 This is breaking Sanitizer bots: https://lab.llvm.org/buildbot/#/builders/66/builds/5449/steps/8/logs/stdio	2024-10-28 16:13:32 -07:00
David Majnemer	902acde341	[InstCombine] Optimize away certain additions using modular arithmetic We can turn: ``` %add = add i8 %arg, C1 %and = and i8 %add, C2 %cmp = icmp eq i8 %and, C3 ``` into: ``` %and = and i8 %arg, C2 %cmp = icmp eq i8 %and, (C3 - C1) & C2 ``` This is only worth doing if the sequence is the sole user of the addition operation.	2024-10-28 22:51:35 +00:00
Matthias Braun	5903c6af44	InstCombine: Fold shufflevector(select) and shufflevector(phi) (#113746 ) - Transform `shufflevector(select(c, x, y), C)` to `select(c, shufflevector(x, C), shufflevector(y, C))` by re-using the `FoldOpIntoSelect` helper. - Transform `shufflevector(phi(x, y), C)` to `phi(shufflevector(x, C), shufflevector(y, C))` by re-using the `foldOpIntoPhi` helper.	2024-10-28 15:35:17 -07:00
Igor Kudrin	67bcce2141	[CFI][LowerTypeTests] Fix indirect call with alias (#106185 ) Motivation example: ``` > cat test.cpp extern "C" [[gnu::weak]] void f() {} void alias() __attribute__((alias("f"))); int main() { auto p = alias; p(); } > clang test.cpp -fsanitize=cfi-icall -flto=thin -fuse-ld=lld > ./a.out [1] 1868 illegal hardware instruction ./a.out ``` If the address of a function was only taken through its alias, the function was not considered exported and therefore was not included in the CFI jumptable. This resulted in `@llvm.type.test()` being lowered to `false`, and consequently the indirect call to the function was eventually optimized to `ubsantrap()`.	2024-10-28 13:14:42 -07:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Lei Wang	e517cfc531	[InstrPGO] Support cold function coverage instrumentation (#109837 ) This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes (a bundle of passes in `addPGOInstrPasses`) under the sampling PGO pipeline. This is controlled by the `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Configures the flags in one place, i.e. adding `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) Makes the linker link the `compiler_rt.profile` lib (see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the entry count used to determine a cold function. Overall, the full command is like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```	2024-10-28 10:13:45 -07:00
Sushant Gokhale	c9f01f699c	[SLP][AArch64][NFC] Add more tests for SLP vectorization of div (#113876 ) Currently, we don't have many tests that show the SLP outcome for integer divisions. This patch adds tests for the same. In certain scenarios, for Neon, vectorization is profitable. An attempt will be made in the future to improve the cost model for the same.	2024-10-28 20:37:41 +05:30
Alexey Bataev	7152bf3bc8	[SLP]Do not create new vector node if scalars fully overlap with the existing one If the list of scalars is already vectorized as part of the same vector node, there is no need to generate the vector node again; it will be handled as part of overlapping matching. Fixes #113810	2024-10-28 06:59:41 -07:00
Yingwei Zheng	f78610af3f	[InstCombine] Add function attribute `instcombine-no-verify-fixpoint` (#113822 ) This patch introduces a function attribute `instcombine-no-verify-fixpoint` to avoid disabling fix-point verification for unrelated tests in the same file. Addresses comment https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.	2024-10-28 17:45:08 +08:00
Yingwei Zheng	5155c38cee	[InstCombine] Don't check uses of constant exprs (#113684 ) This patch skips constant expressions to avoid iterating over uses on other functions. Fix crash reported in https://github.com/llvm/llvm-project/pull/105510#issuecomment-2437521147.	2024-10-28 15:09:20 +08:00
Serge Pavlov	819abe412d	[Test] Fix usage of constrained intrinsics (#113523 ) Some tests contain errors in constrained intrinsic usage, such as missing or extra type parameters, wrong type parameter order, and others. --------- Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>	2024-10-28 14:07:32 +07:00
David Majnemer	5d4a0d54b5	[InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and We left some easy opportunities for further simplifications. log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that trunc is NUW because it means that the truncation didn't drop any bits. It is also safe if the caller is OK with zero as a possible answer. log2(x >>u y) is simply `log2(x) - y`. log2(x & y) is a funny one. It comes up when doing something like: ``` unsigned int f(unsigned int x, unsigned int y) { unsigned char a = 1u << x; return y / a; } ``` LLVM would canonicalize this to: ``` %shl = shl nuw i32 1, %x %conv1 = and i32 %shl, 255 %div = udiv i32 %y, %conv1 ``` In cases like these, we can ignore the mask entirely. This is equivalent to `y >> x`.	2024-10-28 05:13:04 +00:00
Teresa Johnson	355e6948d4	[MemProf] Fix clone edge comparison (#113753 ) The issue fixed in PR113337 exposed a bug in the comparisons done in allocTypesMatch, which compares a vector of alloc types to those in the given vector of Edges. The form of std::equal used, which didn't provide the end iterator for the Edges vector, will iterate through as many entries in the Edges vector as in the InAllocTypes vector, which can fail if there are fewer entries in the Edges vector, because we may dereference a bogus Edge pointer. This function is called twice, once for the Node, with its callee edges, in which case the number of edges should always match the number of entries in allocTypesMatch, which is computed from the Node's callee edges. It was also called for Node's clones, and it turns out that after cloning and edge modifications done for other allocations, the number of callee edges in Node and its clones may no longer match. In some cases, more common with memprof ICP before the PR113337, the number of clone edges can be smaller leading to a bad dereference. I found for a large application even before adding memprof ICP support we sometimes call this with fewer entries in the clone's callee edges, but were getting lucky as they had allocation type None, and we didn't end up attempting to dereference the bad edge pointer. Fix this by passing Edges.end() to std::equal, which means std::equal will fail if the number of entries in the 2 vectors are not equal. However, this is too conservative, as clone edges may have been added or removed since it was initially cloned, and in fact can be wrong as we may not be comparing allocation types corresponding to the same callee. Therefore, a couple of enhancements are made to avoid regressing and improve the checking and cloning: - Don't bother calling the alloc type comparison when the clone and the Node's alloc type for the current allocation are precise (have a single allocation type) and are the same (which is guaranteed by an earlier check, and an assert is added to confirm that). In that case we can trivially determine that the clone can be used. - Split the alloc type matching handling into a separate function for the clone case. In that case, for each of the InAllocType entries, attempt to find and compare to the clone callee edge with the same callee as the corresponding original node callee. To create a test case I needed to take a spec application (xalancbmk), and repeatedly apply random hot/cold-ness to the memprof contexts when building, until I hit the problematic case. I then reduced that full LTO IR using llvm-reduce and then manually.	2024-10-26 20:53:20 -07:00
Kyungwoo Lee	1941c5180b	Reland (2nd attempt) [StructuralHash] Refactor (#112621 ) This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 16:12:28 -07:00
Kyungwoo Lee	d104b8e827	Revert "Reland [StructuralHash] Refactor (#112621 )" This reverts commit 98ca9a635bd2fb98cee473a9558687a5b522e219.	2024-10-26 13:55:46 -07:00
Kyungwoo Lee	98ca9a635b	Reland [StructuralHash] Refactor (#112621 ) This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 12:07:57 -07:00
Kyungwoo Lee	9672375623	Revert "[StructuralHash] Refactor (#112621 )" This reverts commit b667d161f0a9ff6b29cda0ccdb0081610c1e8b8c.	2024-10-26 09:55:21 -07:00
Kyungwoo Lee	b667d161f0	[StructuralHash] Refactor (#112621 ) This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.	2024-10-26 09:20:26 -07:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Matthias Braun	054c23d78f	X86: Improve cost model of fp16 conversion (#113195 ) Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer transforms the patterns: - Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (an SSE register can hold 4xfloat converted/stored to 4xf16); this is necessary as fp16 stores are neither modeled as trunc-stores nor can we mark direct Xxfp16 stores as legal, as we generally expand fp16 operations. - Add missing cost entries to `X86TTIImpl::getCastInstrCost` for conversion from/to fp16. Note that conversion from f64 to f16 is not supported by an X86 instruction.	2024-10-25 16:22:24 -07:00
Florian Hahn	e724226da7	[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value. In some cases, VPWidenCastRecipes are created but not considered in the legacy cost model, including truncates/extends when evaluating a reduction in a smaller type. Return 0 for such casts for now, to avoid divergences between VPlan and legacy cost models. Fixes https://github.com/llvm/llvm-project/issues/113526.	2024-10-25 21:25:44 +02:00
ssijaric-nv	14db069468	[InstCombine] Fix a cycle when folding fneg(select) with scalable vector types (#112465 ) The two folding operations are causing a cycle for the following case with scalable vector types: define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1> %cond, <vscale x 2 x double> %b) { %1 = select <vscale x 2 x i1> %cond, <vscale x 2 x double> zeroinitializer, <vscale x 2 x double> %b %2 = fneg fast <vscale x 2 x double> %1 ret <vscale x 2 x double> %2 } 1) fold fneg: -(Cond ? C : Y) -> Cond ? -C : -Y 2) fold select: (Cond ? -X : -Y) -> -(Cond ? X : Y) 1) results in the following since '<vscale x 2 x double> zeroinitializer' passes the check for the immediate constant: %.neg = fneg fast <vscale x 2 x double> zeroinitializer %b.neg = fneg fast <vscale x 2 x double> %b %1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> %.neg, <vscale x 2 x double> %b.neg and so we end up going back and forth between 1) and 2). Attempt to fold scalable vector constants, so that we end up with a splat instead: define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1> %cond, <vscale x 2 x double> %b) { %b.neg = fneg fast <vscale x 2 x double> %b %1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> shufflevector (<vscale x 2 x double> insertelement (<vscale x 2 x double> poison, double -0.000000e+00, i64 0), <vscale x 2 x double> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x double> %b.neg ret <vscale x 2 x double> %1 }	2024-10-25 10:47:39 -07:00
Jonas Paulsson	aba39c3974	[System] Precommit of test for #112491 (#113704 )	2024-10-25 17:40:00 +02:00
David Green	577c7dd7cc	[AArch64] Add a phase-ordering test for vectorizing predicated selects. NFC	2024-10-25 15:20:24 +01:00
Farzon Lotfi	21b3769d1d	[Scalarizer] Fix to only scalarize if intrinsic was marked as isTriviallyScalarizable (#113625 ) fixes #113624	2024-10-24 23:26:02 -07:00
Haopeng Liu	a31ce36f56	Apply initializes attribute to DSE (#113630 ) retry #107282 Fixed with `MadeChange |= Changed;` and confirmed it works. ``` cmake -DLLVM_CCACHE_BUILD=ON -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON -DLLVM_ENABLE_WERROR=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-U_GLIBCXX_DEBUG '-DLLVM_LIT_ARGS=-v -vv -j96' '-DLLVM_ENABLE_PROJECTS=llvm;lld' -DLLVM_ENABLE_ASSERTIONS=ON -GNinja ../llvm ninja check-llvm ```	2024-10-24 18:43:20 -07:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add aarch64 specializations for lowering, like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Alexey Bataev	e914421d7f	[SLP]Do correct signedness analysis for externally used scalars If an externally used scalar is in the root node, it may have incorrect signedness info because of the conflict with the demanded bits analysis. Need to perform exact signedness analysis and compute it rather than rely on the precomputed value, which might be incorrect for alternate zext/sext nodes. Fixes #113520	2024-10-24 08:59:24 -07:00
Arthur Eubanks	3cec720449	Revert "[DSE] Apply initializes attribute to DSE" (#113589 ) Reverts llvm/llvm-project#107282 Seems to be causing invalid analysis caching as mentioned in https://github.com/llvm/llvm-project/pull/107282#issuecomment-2435083978.	2024-10-24 08:51:31 -07:00
Alexey Bataev	d2e7ee77d3	[SLP]Do not check for clustered loads only Since SLP supports "clusterization" of non-load instructions, the restriction of reduced values to loads only should be removed to avoid a compiler crash. Fixes #113516	2024-10-24 08:16:42 -07:00
Alexey Bataev	cb5046da26	[SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles Need to consider undefs correctly, when trying to replace them with potentially poisonous values in shuffles. Such elements should not be silently replaced by poison values, instead complex analysis should be implemented to see if it is safe to do it. Fixes #113425	2024-10-24 07:47:23 -07:00
Nashe Mncube	e37d736def	Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289 ) This is a recommit of #107120 . The original PR was approved but failed buildbot. The newly added tests should only be run for compilers that support the ARM target. This has been resolved by adding a config file for these tests. - Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works at the midend level	2024-10-24 10:12:01 +01:00
Mingming Liu	60944177b8	[WPD][ThinLTO]Add cutoff option for WPD (#113383 ) This option applies for _import_ WPD (i.e., when `DevirtModule` pass de-virtualizes according to an imported summary, in ThinLTO backend pipeline). It's meant for debugging (e.g., bisection).	2024-10-23 23:47:27 -07:00
Haopeng Liu	089237c0d0	[DSE] Apply initializes attribute to DSE (#107282 ) Apply the initializes attribute to DSE and guard with a flag, "enable-dse-initializes-attr-improvement". The attribute support has been landed in: https://github.com/llvm/llvm-project/pull/84803 The attribute inference will be landed after this PR: https://github.com/llvm/llvm-project/pull/97373	2024-10-23 22:18:59 -07:00
Florian Hahn	2dfb1c664c	[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945 ) In some cases, Previous (and its operands) can be hoisted. This allows supporting additional cases where sinking all users of a FOR fails, e.g. due to having to sink recipes with side-effects. This fixes a crash where we fail to create a scalar VPlan for a first-order recurrence, but can create a vector VPlan, because the trunc instruction of an IV which generates the previous value of the recurrence has been optimized to a truncated induction recipe, thus hoisting it to the beginning. Fixes https://github.com/llvm/llvm-project/issues/106523. PR: https://github.com/llvm/llvm-project/pull/108945	2024-10-23 13:12:03 -07:00
Alexey Bataev	b65b2b4ab6	[SLP]Expand vector to the whole register size in extracts adjustment Need to expand the number of elements to the whole register to correctly process estimation and avoid compiler crash. Fixes #113462	2024-10-23 12:04:40 -07:00
Alexey Bataev	a3508e0246	[SLP]Small buildvector only graph should contain scalars from same block If the graph is small and has a single buildvector node, all scalar instructions must be from the same basic block to prevent a compiler crash. Fixes #113451	2024-10-23 10:46:38 -07:00
Noah Goldstein	294726d738	Reapply "[InstCombine] Folding `(icmp eq/ne (and X, -P2), INT_MIN)`" (#111236 ) The underlying issue with msan was fixed by #113200	2024-10-23 09:12:08 -05:00
Hari Limaye	c6931c2525	[FuncSpec] Only compute Latency bonus when necessary (#113159 ) Only compute the Latency component of a specialisation's Bonus when necessary, to avoid unnecessarily computing the Block Frequency Information for a Function.	2024-10-23 09:05:44 +01:00
Florian Hahn	ddbb382a7c	[LV] Regenerate check-lines for some tests.	2024-10-23 04:34:13 +01:00
Alex MacLean	4c1b1f6d21	[NVPTX] Add support for clamped funnel shift intrinsics (#113228 ) Add support for ``llvm.nvvm.fshl.clamp`` and ``llvm.nvvm.fshr.clamp`` intrinsics. These intrinsics are similar to the generic llvm funnel shift, except that the shift value is clamped to the integer width. Currently only ``i32`` is supported and is implemented with the `shf.[rl].clamp.b32` PTX instruction.	2024-10-22 16:39:44 -07:00
Michael O'Farrell	10f0c1aadd	[PGO] Ensure non-zero entry-count after `populateCounters` (#112029 ) With sampled instrumentation (#69535), profile counts may appear corrupt and `fixFuncEntryCount` may assert. In particular a function can have a 0 block count for its entry, while later blocks are non zero. This is only likely to happen for colder functions, so it is reasonable to take any action that does not crash. Here we simply bail from fixing the entry count.	2024-10-22 16:05:40 -07:00
Justin Fargnoli	8a12e0131f	Revert "[LLVM] Add IRNormalizer Pass" (#113392 ) Reverts llvm/llvm-project#68176 Introduced BuildBot failure: https://github.com/llvm/llvm-project/pull/68176#issuecomment-2428243474	2024-10-22 16:01:32 -07:00
Michael O'Farrell	b4fcaa137f	[PGO][SampledInstr] Correct off by 1s and allow 100% sampling (#113350 ) This corrects a couple of off by ones related to the sampling of instrumented counters, and enables setting 100% rates for burst sampling (burst duration = period). Off by ones: Prior to this change it was impossible to set a period of 65535 because this was converted to fast sampling, which rolls over at USHRT_MAX + 1 (65536). Similarly, the burst durations would collect burst duration + 1 counts as they used a ULE comparison. 100% sampling: Although this is not useful for a productionized use case, it does allow for more deterministic testing with the sampling checks in place. After all the off by ones are fixed, allowing for 100% sampling is a matter of letting burst duration = period.	2024-10-22 16:01:13 -07:00
Paul Walker	5bb34803a4	[NFC] Migrate tests to use autoupdate for CHECK lines.	2024-10-22 12:55:15 +00:00
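
The modular-arithmetic fold described in commit 902acde341 above can be sanity-checked by brute force. The commit message only says "certain additions" and does not spell out the precondition, so the sketch below assumes that C2 is a low-bit mask (2^k - 1) and that C3 <= C2; these conditions are assumptions made for illustration and are not taken from the patch. The function names `sides_agree` and `fold_agrees_for_low_masks` are hypothetical.

```python
def sides_agree(arg: int, c1: int, c2: int, c3: int, width: int = 8) -> bool:
    """Return True if the original and the folded comparison give the same result."""
    full = (1 << width) - 1  # wrap to `width` bits, like iN arithmetic
    # Original: icmp eq (and (add arg, C1), C2), C3
    before = (((arg + c1) & full) & c2) == c3
    # Folded:   icmp eq (and arg, C2), (C3 - C1) & C2
    after = (arg & c2) == (((c3 - c1) & full) & c2)
    return before == after


def fold_agrees_for_low_masks(width: int) -> bool:
    """Exhaustively check the fold for every low-bit mask C2 and every C3 <= C2."""
    for k in range(width + 1):
        c2 = (1 << k) - 1  # assumed precondition: C2 is a low-bit mask
        for c1 in range(1 << width):
            for c3 in range(c2 + 1):  # assumed precondition: C3 <= C2
                for arg in range(1 << width):
                    if not sides_agree(arg, c1, c2, c3, width):
                        return False
    return True


if __name__ == "__main__":
    # Exhaustive check at a small bit width keeps the run time short.
    print(fold_agrees_for_low_masks(5))
    # A mask that is not a low-bit mask: carries from the low bits break the fold.
    print(sides_agree(arg=0, c1=1, c2=0xF0, c3=0x10))
```

Under the assumed precondition the two forms agree for every input at the tested width, because masking with 2^k - 1 is arithmetic modulo 2^k. The second call uses the mask 0xF0, where the addition can carry out of bits the mask ignores, and the two forms disagree.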

30205 Commits