llvm-project

Author	SHA1	Message	Date
Min-Yih Hsu	64314dedeb	[InlineCost] Print inline cost for invoke call sites as well (#114476) Previously, InlineCostAnnotationPrinter only printed inline cost for call instructions. I don't think there is any reason not to analyze invoke and its callee, and this patch adds such support.	2024-11-01 09:55:17 -07:00
Steven Perron	f405c683ba	[OPT] Search whole BB for convergence token. (#112728) The spec for llvm.experimental.convergence.entry says that it must be in the entry block of a function, and must precede any other convergent operation. It does not have to be the first instruction in the entry block. Inlining assumes that the call to llvm.experimental.convergence.entry will be the first instruction after any phi instructions. This commit modifies inlining to search the entire block for the call.	2024-10-30 11:19:23 -04:00
goldsteinn	69a798a996	Reapply "[Inliner] Propagate more attributes to params when inlining (#91101)" (2nd Attempt) (#112749) The root cause of the bug was code hanging onto the `range` attr after the BitWidth changed. This was fixed in PR #112633.	2024-10-17 20:28:47 -05:00
Arthur Eubanks	9e6d24f61f	Revert "[Inliner] Propagate more attributes to params when inlining (#91101)" This reverts commit ae778ae7ce72219270c30d5c8b3d88c9a4803f81. Creates broken IR; see comments in #91101.	2024-10-16 21:21:34 +00:00
goldsteinn	ae778ae7ce	[Inliner] Propagate more attributes to params when inlining (#91101) - [Inliner] Add tests for propagating more parameter attributes; NFC - [Inliner] Propagate more attributes to params when inlining Add support for propagating: - `dereferenceable` - `dereferenceable_or_null` - `align` - `nonnull` - `range` These are only propagated if the parameter to the to-be-inlined callsite matches the exact parameter used in the to-be-inlined function.	2024-10-16 11:53:21 -05:00
goldsteinn	3c777f04f0	[Inliner] Don't propagate access attr to byval params (#112256) - [Inliner] Add tests for bad propagation of access attr for `byval` param; NFC - [Inliner] Don't propagate access attr to `byval` params We previously only handled the case where the `byval` attr was in the callbase's param attr list. This PR also handles the case where `byval` is a param attr in the function's param attr list.	2024-10-15 09:25:16 -05:00
Shilei Tian	e34e27f198	[TTI][AMDGPU] Allow targets to adjust `LastCallToStaticBonus` via `getInliningLastCallToStaticBonus` (#111311) Currently, we cannot inline a large function even if it has only one live use, because the inline cost is still very high after applying `LastCallToStaticBonus`, which is a constant. This can significantly hurt performance because CSR (callee-saved register) spills are very expensive. This PR adds a new function, `getInliningLastCallToStaticBonus`, to TTI to allow targets to customize this value. Fixes SWDEV-471398.	2024-10-11 10:19:54 -04:00
Teresa Johnson	79b32bcda6	[MemProf] Strip callsite metadata when inlining an unprofiled callsite (#110998) We weren't correctly flagging inlined callee functions that have callsite metadata but no memprof metadata. As a result, the callsite metadata was not stripped when such a function was inlined into a callsite that didn't itself have callsite metadata. In practice, this meant that we went into the LTO link with many more calls than necessary having callsite metadata / summary records, which in turn made the graph larger than necessary. Fixing this oversight resulted in huge reductions in the thin link of a large target: 99% fewer duplicated context ids (recall we have to duplicate when callsites containing the same stack ids are in different functions), 71% fewer graph edges, 17% fewer graph nodes, 13% fewer functions cloned, 44% smaller peak memory, and 47% less time.	2024-10-03 08:06:56 -07:00
goldsteinn	a9352a0d31	[Inliner] Fix bug where attributes are propagated incorrectly (#109347) - [Inliner] Add tests for incorrect propagation of return attrs; NFC - [Inliner] Fix bug where attributes are propagated incorrectly The bug stems from the fact that we assume the new (inlined) callsite is calling the same function as the original (callee) callsite. This is typically the case, but because `VMap` simplifies the new instructions, callsites of callee intrinsics can end up not corresponding to the same function. This can lead to buggy propagation.	2024-09-20 19:57:35 -05:00
Simon Pilgrim	b065ec0af5	[Inline][X86] Regenerate inline-target-cpu-* tests	2024-08-30 12:06:24 +01:00
Aiden Grossman	085587e1a9	Reland "[MLGO] Remove Python <3.8 from unsupported config (#106132)" This reverts commit c3776c11c26e5c0e27b772e6694e6c76f73ac9e8. This relands commit a959d70eb5b6d47c0b32eb34fc409e50c01d722d. This was originally causing bot failures on Python version 3.8. This reland fixes that by adjusting the relevant type annotations that are not supported in earlier versions.	2024-08-26 18:45:34 -07:00
Aiden Grossman	c3776c11c2	Revert "[MLGO] Remove Python <3.8 from unsupported config (#106132)" This reverts commit a959d70eb5b6d47c0b32eb34fc409e50c01d722d. This was causing bot failures. https://lab.llvm.org/buildbot/#/builders/174/builds/3975	2024-08-26 23:36:56 +00:00
Aiden Grossman	a959d70eb5	[MLGO] Remove Python <3.8 from unsupported config (#106132) Now that Python 3.8 is the minimum version supported by LLVM, we don't need to explicitly check in the MLGO tests that the Python version we are using is at least 3.8.	2024-08-26 13:57:43 -07:00
David Green	83a5c7cb62	[ConstantFolding] Ensure TLI is valid when simplifying fp128 intrinsics. TLI might not be valid in all contexts in which constant folding is performed. Add a quick guard that it is not null.	2024-08-24 14:39:20 +01:00
Matt Arsenault	edded8d7b5	AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics attribute (#101699) This is now autoupgraded to annotate atomicrmw instructions in old bitcode.	2024-08-13 22:02:25 +04:00
Andreas Jonson	04da77308f	Allow empty range attribute and add assert for full range (#100601) Fixes #99619.	2024-08-08 18:07:09 +02:00
Sander de Smalen	fb470db7b3	[AArch64] Avoid inlining if ZT0 needs preserving. (#101343) Inlining may result in different behaviour when the callee clobbers ZT0, because normally the call-site will have code to preserve ZT0. When the function is inlined, this code to preserve ZT0 is no longer emitted, and so the behaviour of the program changes.	2024-08-02 10:29:08 +01:00
Daniel Kiss	1782810b84	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819) So far, the branch-protection, sign-return-address, and guarded-control-stack attributes have only been emitted as module flags to indicate that the functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule. If one of the modules is not built with sign-return-address, the features are turned off for all functions, because the functions take the branch-protection and sign-return-address features from the module flags. Sign-return-address is a function-level option, so functions from files compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might inline functions with a different set of flags, as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for code generation. The module flag is still used for generating the ELF markers. The patch also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, and guarded-control-stack attributes, as the presence of the attribute means on and its absence means off; there is no other option. Reland with test fixes.	2024-07-10 11:32:41 +02:00
Daniel Kiss	4b2daeccc7	Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284) Reverts llvm/llvm-project#82819	2024-07-10 10:22:38 +02:00
Daniel Kiss	e15d67cfc2	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819) So far, the branch-protection, sign-return-address, and guarded-control-stack attributes have only been emitted as module flags to indicate that the functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule. If one of the modules is not built with sign-return-address, the features are turned off for all functions, because the functions take the branch-protection and sign-return-address features from the module flags. Sign-return-address is a function-level option, so functions from files compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might inline functions with a different set of flags, as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for code generation. The module flag is still used for generating the ELF markers. The patch also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, and guarded-control-stack attributes, as the presence of the attribute means on and its absence means off; there is no other option.	2024-07-10 10:06:14 +02:00
Yingwei Zheng	be7239e5a6	[Inline] Remove bitcast handling in `CallAnalyzer::stripAndComputeInBoundsConstantOffsets` (#97988) As we are now using opaque pointers, bitcast handling is no longer needed. Closes https://github.com/llvm/llvm-project/issues/97590.	2024-07-09 15:08:04 +08:00
Arthur Eubanks	94471e6d23	[MLInliner] Handle CGSCC changes from #94815 (#96274) With #94815, the nodes belonging to dead functions are no longer invalidated, but kept around to batch delete at the end of the call graph walk. The ML inliner needs to be updated to handle this. This fixes some asserts getting hit, e.g. https://crbug.com/348376263.	2024-07-03 10:14:49 -07:00
Daniil Fukalov	12c1156207	[NFC][AlwaysInliner] Reduce AlwaysInliner memory consumption. (#96958) Refactored AlwaysInliner to remove some of the inlined functions earlier. Before the change, AlwaysInliner walked through all functions in the module and inlined them into calls where appropriate. Dead inlined functions were removed only after all inlining was done. For the test case from issue [59126](https://github.com/llvm/llvm-project/issues/59126), the compiler consumes all of the memory on a 64GB machine and is killed. The change checks whether a just-inlined function can be removed from the module and, if so, removes it.	2024-07-02 10:43:49 +02:00
Matt Arsenault	e47359a925	Inline: Fix handling of byval using non-alloca addrspace (#97306) Use the address space of the original pointer argument instead of querying the datalayout. This avoids producing a verifier error, since the old code was changing the address space for the user instructions. Fixes #97086	2024-07-01 21:09:41 +02:00
Mingming Liu	1518b260ce	[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (#81442) Clang's `-fwhole-program-vtables` is required for this optimization to take place. If `-fwhole-program-vtables` is not enabled, this change is a no-op. * Function-comparison (before): ``` %vtable = load ptr, ptr %obj %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1 %func = load ptr, ptr %vfn %cond = icmp eq ptr %func, @callee br i1 %cond, label %bb1, label %bb2 bb1: call @callee bb2: call %func ``` * VTable-comparison (after): ``` %vtable = load ptr, ptr %obj %cond = icmp eq ptr %vtable, @vtable-address-point br i1 %cond, label %bb1, label %bb2 bb1: call @callee bb2: %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1 %func = load ptr, ptr %vfn call %func ``` Key changes: 1. Find virtual calls and the vtables they come from. - The ICP relies on the type intrinsic `llvm.type.test` to find virtual calls and the compatible vtables, and relies on type metadata to find the address point for comparison. 2. The ICP pass does cost-benefit analysis and compares vtables only when the number of vtables for a function candidate is within an (option-specified) threshold. 3. Sink the function addressing and vtable load instructions to the indirect fallback. - The sink helper functions are simplified versions of `InstCombinerImpl::tryToSinkInstruction`. Currently, debug intrinsics are not handled. Ideally, `InstCombinerImpl::tryToSinkInstructionDbgValues` and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be moved into Transforms/Utils/Local.cpp (or another util cpp file) to handle debug intrinsics when moving instructions across basic blocks. 4. Keep value profiles updated: 1) Update vtable value profiles after inlining. 2) For either function-based comparison or vtable-based comparison, update both vtable and indirect call value profiles.	2024-06-29 23:21:33 -07:00
Mircea Trofin	600ff28772	[mlgo] add 2 new features whether caller/callee is `available_externally` (#96585) AvailableExternally linkage is interesting because, in ThinLTO cases, it means the function may get elided if it survives inlining; see the `elim-avail-extern` pass.	2024-06-25 12:36:40 -07:00
Noah Goldstein	db03d9d33a	Recommit "[Inliner] Propagate callee argument memory access attributes before inlining" (2nd Try) This re-commit just drops the propagation of `writeonly`, as that is the only attribute that can play poorly with call slot optimization (see issue #95152 for more details). Closes #95888	2024-06-21 16:14:28 +08:00
Mircea Trofin	6037a698b9	[mlgo] inline for size: add bypass mechanism for perserving performance (#95616) This allows shrinking the cold part of the code for size without sacrificing performance.	2024-06-17 14:18:55 -07:00
Stephen Tozer	094572701d	[RemoveDIs] Print IR with debug records by default (#91724) This patch makes the final major change of the RemoveDIs project, changing the default IR output from debug intrinsics to debug records. This is expected to break a large number of tests: every single one that tests for uses or declarations of debug intrinsics and does not explicitly disable writing records. If this patch has broken your downstream tests (or upstream tests on a configuration I wasn't able to run): 1. If you need to immediately unblock a build, pass `--write-experimental-debuginfo=false` to LLVM's option processing for all failing tests (remember to use `-mllvm` for clang/flang to forward arguments to LLVM). 2. For most test failures, the changes are trivial and mechanical, enough that they can be done by script; see the migration guide for a guide on how to do this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates 3. If any tests fail for reasons other than FileCheck check lines that need updating, such as assertion failures, that is most likely a real bug with this patch and should be reported as such. For more information, see the recent PSA: https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578	2024-06-14 15:07:27 +01:00
Nikita Popov	5f99a7a51a	Revert "[Inliner] Propagate callee argument memory access attributes before inlining" This exposes a miscompile reported in https://github.com/llvm/llvm-project/issues/95152. Whether the new inference or MemCpyOpt is at fault depends on the precise semantics of writeonly attributes. Revert the patch while this is being pinned down. This reverts commit 285dbed147e243f416b003e150d67ffb0922ff16. This reverts commit cda5790e38af5da3ad455eddab36ef16bf3e8104.	2024-06-12 12:32:50 +02:00
Arthur Eubanks	71497cc7a4	[CGSCC] Fix compile time blowup with large RefSCCs (#94815) In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, while each call to removeInternalRefEdges() takes time proportional to the size of the RefSCC. There are two places that call removeInternalRefEdges(): updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction(). 1) Since LazyCallGraph can deal with spurious ref edges (edges that exist in the graph but not in the IR), we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass(). 2) With the above change, LazyCallGraph::removeDeadFunction() still takes the brunt of compile time, for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler. Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll, this seems to expose an inlining order issue with the inliner, which is probably unimportant for this type of intentionally weird call graph. Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u	2024-06-11 09:50:13 -07:00
Min-Yih Hsu	1fe4f2d1a4	[Inliner][test] Fix incorrect REQUIRE line in `inline-switch-default.ll` (NFC) (#95009) It should be `x86-registered-target` because we only need the X86 target in this case. `x86_64-linux` would be too strict here, as it puts a prerequisite on the default target triple.	2024-06-10 15:32:35 -07:00
Nikita Popov	deab451e7a	[IR] Remove support for icmp and fcmp constant expressions (#93038) Remove support for the icmp and fcmp constant expressions. This is part of: https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179 As usual, many of the updated tests will no longer test what they were originally intended to -- this is hard to preserve when constant expressions get removed, and in many cases just impossible as the existence of a specific kind of constant expression was the cause of the issue in the first place.	2024-06-04 08:31:03 +02:00
Andreas Jonson	5c214eb0c6	[Inline] Clone return range attribute on the callsite into inlined call (#92666)	2024-05-29 12:05:05 +02:00
Krzysztof Pszeniczny	cda5790e38	[Inliner] Don't propagate memory attributes to byval params (#93381) Memory restrictions for params to the inlined function do not apply to the copies logically made when that function further passes its own params as byval. In other words, imagine that `@foo()` calls `@bar(ptr readonly %p)` which in turn calls `@baz(ptr byval("...") %p)` (passing the same `%p`). This is fully legal - `baz` is allowed to modify its copy of the object referenced by `%p` because the argument is passed by value. However, when inlining `@bar` into `@foo`, we can't say that the callsite is now `@baz(ptr readonly byval("...") %p)`, as this would mean that `@baz` is not allowed to modify its copy of the object pointed to by `%p`. LangRef says: "The copy is considered to belong to the caller not the callee (for example, readonly functions should not write to byval parameters)". This fixes a miscompile introduced by PR #89024 in a program in the Google codebase.	2024-05-26 18:05:13 +02:00
Alex Voicu	10edb4991c	[Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (#88182) At the moment, Clang is rather liberal in assuming that 0 (and by extension unqualified) is always a safe default. This does not work for targets that actually use a different value for the default / generic AS (for example, the SPIRV obtained from HIPSPV or SYCL). This patch is a first, fairly safe step towards clearing things up by querying a module's default AS from the target, rather than assuming it's 0, alongside fixing a few places where things break / we encode the 0 == DefaultAS assumption. A bunch of existing tests are extended to check for non-zero default AS usage.	2024-05-19 14:59:03 +01:00
David Green	220756f1f9	[AArch64][Inline] Regenerate Inline/AArch64/binop.ll test check lines. NFC Should hopefully help with #91854	2024-05-13 09:49:09 +01:00
DianQK	d48bf8aef2	Reapply "[InlineCost] Correct the default branch cost for the switch statement (#85160)" This reverts commit c6e4f6309184814dfc4bb855ddbdb5375cc971e0.	2024-05-10 21:18:53 +08:00
Mingming Liu	64f4ceb09e	[Inline][PGO] After inline, update InvokeInst profile counts in caller and cloned callee (#83809) A related change is https://reviews.llvm.org/D133121, which correctly preserves both branch weights and value profiles for invoke instructions. * If the branch weight of the `invokeinst` specifies taken / not-taken branches, there is no scaling.	2024-05-08 15:48:40 -07:00
DianQK	c6e4f63091	Revert "[InlineCost] Correct the default branch cost for the switch statement (#85160)" This reverts commit 882814edd33cab853859f07b1dd4c4fa1393e0ea.	2024-05-05 21:54:30 +08:00
Quentin Dian	882814edd3	[InlineCost] Correct the default branch cost for the switch statement (#85160) Fixes #81723. The earliest commit of the related code is `919f9e8d65`. I tried to understand the following code with https://github.com/llvm/llvm-project/pull/77856#issuecomment-1993499085. `5932fcc478/llvm/lib/Analysis/InlineCost.cpp (L709-L720)` I think only scenarios where there is a default branch were considered.	2024-05-05 21:28:31 +08:00
Noah Goldstein	285dbed147	[Inliner] Propagate callee argument memory access attributes before inlining To avoid losing information, we can propagate some access attributes from the to-be-inlined callee to its callsites. We can propagate argument memory access attributes to callsite parameters if they are from the same underlying object. Closes #89024	2024-05-03 14:10:24 -05:00
Noah Goldstein	f8ff51e1b0	[Inliner] Add tests for not propagating `writable` if `readonly` is present; NFC	2024-05-03 14:10:24 -05:00
Matt Arsenault	9f9856d623	AMDGPU: Update name for amdgpu.no.remote.memory metadata	2024-05-03 11:50:59 +02:00
Antonio Frighetto	1bb929833b	[Inline][Cloning] Drop incompatible attributes from `NewFunc` Performing `instSimplify` while cloning is unsafe due to incomplete remapping (as reported in #87534). Ideally, `instSimplify` ought to reason about the updated newly-cloned function, after returns have been rewritten and the callee entry basic block / call-site have been fixed up. This is in contrast to the behaviour of `CloneAndPruneIntoFromInst`, which is inherently expected to clone basic blocks (with pruning on top, if any) and not to actually fix up returns / the CFG, which should be up to the Inliner. We may solve this by letting `instSimplify` work on the newly-cloned function while maintaining the old function attributes, so as to avoid inconsistencies between the yet-to-be-resolved return type and the new function's return type attributes.	2024-05-02 16:29:09 +02:00
Antonio Frighetto	42c7cb6969	Reapply "[Inline][Cloning] Defer simplification after phi-nodes resolution" Original commit: a61f9fe31750cee65c726fb51f1b14e31e177258 Multiple 2-stage buildbots were reporting failures. These issues have been addressed separately. Fixes: https://github.com/llvm/llvm-project/issues/87534.	2024-05-02 16:29:09 +02:00
Vitaly Buka	29c98e59cd	Revert "[Inline][Cloning] Defer simplification after phi-nodes resolution" #87963 Reopens #87534. Breaks multiple bots: https://lab.llvm.org/buildbot/#/builders/168/builds/20028 https://lab.llvm.org/buildbot/#/builders/74/builds/27773 A reproducer is in a61f9fe31750cee65c726fb51f1b14e31e177258. This reverts commit a61f9fe31750cee65c726fb51f1b14e31e177258.	2024-04-24 14:51:54 -07:00
Antonio Frighetto	a61f9fe317	[Inline][Cloning] Defer simplification after phi-nodes resolution A logic issue arose when inlining via `CloneAndPruneFunctionInto`, which, besides cloning, also performs instruction simplification. By the time a newly cloned instruction is being simplified, phi-nodes have not been remapped yet, as the whole CFG needs to be processed first. As the `VMap` state at this stage is incomplete, `threadCmpOverPHI` and variants could lead to unsound optimizations. This issue has been addressed by performing basic constant folding while cloning, and postponing instruction simplification until phi-nodes are revisited. Fixes: https://github.com/llvm/llvm-project/issues/87534.	2024-04-24 16:55:33 +02:00
Antonio Frighetto	c1d00510ab	[Inline][Cloning] Introduce test for PR87963 (NFC)	2024-04-24 16:55:33 +02:00
Matt Arsenault	f433c3b380	AMDGPU: Add tests for atomicrmw handling of new metadata (#89248) Add baseline tests which should comprehensively test the new atomic metadata. Test codegen / expansion, and preservation in a few transforms. New metadata defined in #85052	2024-04-20 00:43:36 +02:00


1014 Commits