6131 Commits

Author SHA1 Message Date
Jeremy Morse
0065d06760
[NFC][DebugInfo] Maintain RemoveDIs flag when attributor creates functions (#79143)
We're using this flag (IsNewDbgInfoFormat) to detect the boundaries in
LLVM of what's treating debug-info as intrinsics (i.e. dbg.value), and
what's using DPValue objects (the non-intrinsic replacement). The
attributor tends to create new wrapper functions and doesn't insert them
into Modules in the usual way, thus we have to manually update that flag
to signal what debug-info mode it's using.

I've added some --try-experimental-debuginfo-iterators RUN lines to
tests that would otherwise crash because of this, so that they're
exercised by our new-debuginfo-iterators buildbot.

NB: there's an attributor test with a dbg.value in it, however
attributes re-order themselves in RemoveDIs mode for various reasons, so
we're going to address that in a different patch.
2024-01-24 15:20:05 +00:00
Paul Kirth
9d476e1e1a
[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
2024-01-23 14:04:52 -08:00
Jeremy Morse
4782ac8dd3
[DebugInfo][RemoveDIs] Use splice in Outliner rather than moveBefore (#79124)
This patch replaces a utility in the outliner that moves the contents of
one basic block into another basic block, with a call to splice instead.
I think it's NFC, however I'd like a second pair of eyes to look at it
just in case.

The reason for doing this is an edge case in the handling of DPValue
objects, the replacement for dbg.values. If there's a variable
assignment "dangling" at the end of a block (which happens when we
delete the terminator), inserting instructions at end() doesn't shift
the DPValue up into the block. We could probably fix this; but it's much
easier to use splice at the only call site that does this.

Patch adds --try-experimental-debuginfo-iterators to a test to exercise
this code path.
2024-01-23 16:23:48 +00:00
Teresa Johnson
070738ba88
[MemProf][NFC] Explicitly specify llvm version of function_ref (#77783)
As suggested in https://github.com/llvm/llvm-project/pull/75823, to
avoid confusion with std::function_ref, qualify all uses with llvm::
(we were already using the llvm version, but this avoids ambiguity).
2024-01-16 11:20:55 -08:00
Nikita Popov
6c2fbc3a68
[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582)
This abstracts over the common pattern of creating a gep with i8 element
type.
2024-01-12 14:21:21 +01:00
Kazu Hirata
7b9bc4729b [IPO] Use a range-based for loop (NFC) 2024-01-11 22:48:22 -08:00
Teresa Johnson
c37699b9e3
[MemProf] Add missing <unordered_map> include to fix buildbot (#77788)
Should fix buildbot failure
https://lab.llvm.org/buildbot/#/builders/54/builds/8451
from #75823.
2024-01-11 07:48:55 -08:00
Teresa Johnson
26a8664ed4
[MemProf] Handle missing tail call frames (#75823)
If tail call optimization was not disabled for the profiled binary, the
call contexts will be missing frames for tail calls. Handle this by
performing a limited search through tail call edges for the profiled
callee when a discontinuity is detected. The search depth is adjustable
but defaults to 5.

If we are able to identify a short sequence of tail calls, update the
graph for those calls. In the case of ThinLTO, synthesize the necessary
CallsiteInfos for carrying the cloning information to the backends.
2024-01-11 06:57:48 -08:00
Bill Wendling
fc6b5666db
[NFC][ObjectSizeOffset] Use classes instead of std::pair (#76882)
The use of std::pair makes the values it holds opaque. Using classes
improves this while keeping the POD aspect of a std::pair. As a nice
addition, the "known" functions held inappropriately in the Visitor
classes can now properly reside in the value classes. :-)
2024-01-05 18:08:53 -08:00
Yingwei Zheng
7e405eb722
[FuncAttrs] Don't infer noundef for functions with sanitize_memory attribute (#76691)
MemorySanitizer assumes that the definition and declaration of a
function will be consistent. If we add `noundef` for some definitions,
it will break msan.

Fix buildbot failure caused by #76553.
2024-01-02 06:59:56 +08:00
Yingwei Zheng
1228becf7d
[FuncAttrs] Deduce noundef attributes for return values (#76553)
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any 
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding 
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|

The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
2023-12-31 20:44:48 +08:00
Yingwei Zheng
4358e6e0c5
[FuncAttrs] Infer norecurse for funcs with calls to nocallback callees (#76372)
This patch adds missing `norecurse` attrs to funcs that only call intrinsics with `nocallback` attrs.
Fixes the regression found in https://github.com/dtcxzyw/llvm-opt-benchmark/pull/45#discussion_r1436148743.
The function loses `norecurse` attr because it calls `@llvm.fabs.f64`, which is not marked as `norecurse`.

Since `norecurse` is not a default attribute of intrinsics and it is
ambiguous for intrinsics, I decided to use the existing `callback`
attributes.

> nocallback
This attribute indicates that the function is only allowed to jump back
into caller’s module by a return or an exception, and is not allowed to
jump back by invoking a callback function, a direct, possibly
transitive, external function call, use of longjmp, or other means. It
is a compiler hint that is used at module level to improve dataflow
analysis, dropped during linking, and has no effect on functions defined
in the current module.

See also https://llvm.org/docs/LangRef.html#function-attributes.
2023-12-27 03:16:43 +08:00
Benjamin Kramer
9423e45987 [ProfileData] Copy CallTargetMaps a bit less. NFCI 2023-12-24 17:48:18 +01:00
Nikita Popov
658b260dbf [Attributor] Don't construct pretty GEPs
Bring this in line with other transforms like ArgPromotion/SROA/
SCEVExpander and always produce canonical i8 GEPs.
2023-12-22 16:48:13 +01:00
Ivan R. Ivanov
39f09ec245
Invalidate analyses after running Attributor in OpenMPOpt (#74908)
Using the LoopInfo from OMPInfoCache after the Attributor ran resulted
in a crash due to it being in an invalid state.

---------

Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>
2023-12-20 15:01:21 -08:00
Paul Walker
dea16ebd26
[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217)
The specialisation will not be valid when ConstantInt gains native
support for vector types.

This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.

Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18 11:58:42 +00:00
Mircea Trofin
ed10fba1b2
[ThinLTO] Allow importing based on a workload definition (#74545)
An example of a "workload definition" would be "the transitive closure of functions actually called to satisfy a RPC request", i.e. a (typically significantly) smaller subset of the transitive closure (static + possible indirect call targets) of callees. This means this workload definition is a type of flat dynamic profile.

Producing one is not in scope - it can be produced offline from traces, or from sample-based profiles, etc.

This patch adds awareness to ThinLTO of such a concept. A workload is defined as a root and a list of functions. All function references are by-name (more readable than GUIDs). In the case of aliases, the expectation is the list contains all the alternative names.

The workload definitions are presented to the linker as a json file, containing a dictionary. The keys are the roots, the values are the list of functions.

The import list for a module defining a root will be the functions listed for it in the profile.

Using names this way assumes unique names for internal functions, i.e. clang's `-funique-internal-linkage-names`.

Note that the behavior affects the entire module where a root is defined (i.e. different workloads best be defined in different modules), and does not affect modules that don't define roots.
2023-12-14 15:10:48 -08:00
Nikita Popov
696cc20d4e [LVI] Make UndefAllowed argument of getConstantRange() required
For the two remaining uses that did not explicitly specify it,
set UndefAllowed=false. In both cases, I believe that treating
undef as a full range is the correct behavior.
2023-12-12 11:43:52 +01:00
Youngsuk Kim
9071a15d4b
[llvm][Attributor] Strip AddressSpaceCast from 'constructPointer' (#74742)
* Remove pointer AddressSpaceCast in function `constructPointer`
* Remove 1st parameter (`ResTy`) of function `constructPointer`

1st input argument to function `constructPointer` in all 4 call-sites is
`ptr addrspace(0)`. Function `constructPointer` performs a pointer
AddressSpaceCast to `ResTy`, making the returned pointer have type `ptr
addrspace(0)` in all 4 call-sites.

Unless there's a clear reason to discard the addrspace info of input
parameter `Ptr`, I think we should keep and forward that info to the
returned pointer of `constructPointer`.

Opaque ptr cleanup effort.
2023-12-11 09:29:01 -05:00
Oskar Wirga
81360ec582
[CFI] Fix Direct Call Issues in CFI Dispatch Table (#69663)
I discovered two issues for when a CFI dispatch table entry is used as a
direct call.
# Inlining
There is the possibility that the dispatch table entry contains only a
single function pointer:
```
; Function Attrs: naked nocf_check
define private void @.cfi.jumptable() #6 align 8 { entry:
  call void asm sideeffect "jmp ${0:c}@plt\0Aint3\0Aint3\0Aint3\0A", "s"(ptr @_Z7throw_ei)
  unreachable
}
```
If this function is inlined, the unreachable follows and ruins the
containing function.
# Exception Handling
The dispatch table is always marked NoUnwind. This is fine if the
entries are never used directly, but if a direct call is used which the
containing function expects to throw, it will no longer throw and the
exception handling code will be lost.
2023-12-06 12:56:59 -08:00
Teresa Johnson
88fbc4d3df
[ThinLTO] Add tail call flag to call edges in summary (#74043)
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.

The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.

I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
2023-12-06 08:41:44 -08:00
Kazu Hirata
92c2529ccd [llvm] Stop including vector (NFC)
Identified with clangd.
2023-12-03 22:32:21 -08:00
Paul Kirth
cfe1ece833
[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it make FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
2023-11-30 17:09:34 -08:00
Youngsuk Kim
c57ef2c698
[llvm][OpenMPOpt] Remove no-op ptr-to-ptr bitcast (NFC) (#73869)
* Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds
a no-op ptr-to-ptr bitcast.

* Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts
in relevant LIT tests
2023-11-29 20:47:37 -05:00
Davide Italiano
61ab43ae00
[IROutliner] Skip dbg values during the candidate search. (#72945)
dbg value don't really have a value number associated as they have no
semantic value associated, i.e. they don't change the code being
generated. Use the correct API to go over them.

Fixes https://github.com/llvm/llvm-project/issues/62876
2023-11-22 11:26:36 -08:00
Mats Petersson
2fb51fba8c
[FuncSpec] Update function specialization to handle phi-chains (#72903)
When using the LLVM flang compiler with alias analysis (AA) enabled,
SPEC2017:548.exchange2_r was running significantly slower than wihtout
the AA.

This was caused by the GVN pass replacing many of the loads in the
pre-AA code with phi-nodes that form a long chain of dependencies, which
the function specialization was unable to follow.

This adds a function to discover phi-nodes in a transitive set, with
some limitations to avoid spending ages analysing phi-nodes.

The minimum latency savings also had to be lowered - fewer load
instructions means less saving.

Adding some more prints to help debugging the isProfitable decision.

No significant change in compile time or generated code-size.

(A previous attempt to fix this was abandoned: https://github.com/llvm/llvm-project/pull/71442)

---------

Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2023-11-22 10:41:01 +00:00
Jeremy Morse
f42482def2
[DebugInfo][RemoveDIs] Don't convert debug-intrinsics to Unreachable (#72380)
It might seem obvious, but it's not a good idea to convert a
debug-intrinsic instruction into an UnreachableInst, as this means
things operate differently with and without the -g option. However this
can happen due to the "mutate the next instruction" API calls we make.
With RemoveDIs eliminating debug intrinsics, this behaviour is at risk
of changing, hence this patch ensures we only ever mutate the next _non_
debuginfo instruction into an Unreachable.

The tests instrumented with the --try... flag all exercise this, I've
added some metadata to a SCCP test to ensure it's exercised.
2023-11-20 20:53:24 +00:00
Davide Italiano
615ebfc3e5
[SampleProfileProbe] Downgrade probes too large from error to warning. (#72574) 2023-11-16 15:57:51 -08:00
Matthias Braun
cb4627d150
Add setBranchWeigths convenience function. NFC (#72446)
Add `setBranchWeights` convenience function to ProfDataUtils.h and use
it where appropriate.
2023-11-16 10:55:19 -08:00
Teresa Johnson
24a618f69e
[MemProf] Look through alias when applying cloning in ThinLTO backend (#72156)
Mirror the handling in ModuleSummaryAnalysis to look through aliases
when handling call instructions in the ThinLTO backend MemProf handling.

Fixes #72094
2023-11-15 13:14:19 -08:00
Kazu Hirata
d4360e428f [llvm] Stop including llvm/ADT/DenseMap.h (NFC)
Ientified with clangd.
2023-11-11 10:07:19 -08:00
Kazu Hirata
84a48ee9fb [llvm] Stop including llvm/ADT/SetVector.h (NFC)
Identified with clangd.
2023-11-10 23:50:23 -08:00
Vidhush Singhal
754b93e466
[Attributor] New attribute to identify what byte ranges are alive for an allocation (#66148)
Changes the size of allocations automatically.
For now, implements the case when a single range from start of the
allocation is alive and the allocation can be reduced.
2023-11-10 16:26:37 -08:00
William Junda Huang
683f2df6e5
[SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479)
Normally SampleContext does not allow using an empty StirngRef to
construct an object, this is to prevent bugs reading the profile.
However empty names may be emitted by a function which its name is
intentionally set to empty, or a bug in the remapper that returns an
empty string. Regardless, converting it to FunctionId first will prevent
the assert, and that assert check is unnecessary, which will be
addressed in another patch
2023-11-10 21:38:13 +00:00
Jeremy Morse
f1b0a54451 Reapply 7d77bbef4ad92, adding new debug-info classes
This reverts commit 957efa4ce4f0391147cec62746e997226ee2b836.

Original commit message below -- in this follow up, I've shifted
un-necessary inclusions of DebugProgramInstruction.h into being forward
declarations (fixes clang-compile time I hope), and a memory leak in the
DebugInfoTest.cpp IR unittests.

I also tracked a compile-time regression in D154080, more explanation
there, but the result of which is hiding some of the changes behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.

[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info

This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues is plumbed into the metadata hierachy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-08 16:42:35 +00:00
Nikita Popov
e360a16fee
[GlobalOpt] Cache whether CC is changeable (#71381)
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic
complexity if there are many cold calls to a function: We're going to
visit each call of the function, and then for each of them iterate all
the users of the function.

We've recently encountered a case where GlobalOpt spends more than an
hour in these hasAddressTaken() checks when full LTO is used.

Avoid this by moving the hasAddressTaken() check into hasChangeableCC()
and caching its result, so it is only computed once per function.
2023-11-07 10:36:45 +01:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Nikita Popov
c4c0ac10f1 [IPO] Remove unnecessary bitcasts (NFC) 2023-11-06 16:49:45 +01:00
Nikita Popov
16a595e398 [Attributor] Avoid use of ConstantExpr::getFPTrunc() (NFC)
Use the constant folding API instead. For simplificity I'm using
the DL-independent API here.
2023-11-06 15:27:01 +01:00
Dominik Adamski
2cce0f6c57
[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000)
Added support for LLVM IR code generation which is used for handling omp
target parallel code. The call for __kmpc_parallel_51 is generated and
the parallel region is outlined to separate function.

The proper setup of kmpc_target_init mode is not included in the commit.
It is assumed that the SPMD mode for target initialization is properly
set by other codegen functions.
2023-11-06 11:44:00 +01:00
Nikita Popov
a682a9cfd0 Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)"
This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f.

PR landed without approval, with severe quality issues.
2023-11-03 21:15:46 +01:00
Manman Ren
19b5495b65
Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)
See RFC for details:
https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778

We will need to refactor extension to FunctionComparator/FunctionHash to
StructuralHash. This patch adds a new pass which is ported from Swift,
and will need to discuss on how to migrate Swift’s pass over after we
land this in llvm.

Create this PR to get some early review on the patch.

---------

Co-authored-by: Manman Ren <mren@meta.com>
2023-11-03 11:13:58 -07:00
Johannes Doerfert
d3e7a48cbd [OpenMP][NFC] Remove a no-op function 2023-11-03 10:28:36 -07:00
Jeremy Morse
957efa4ce4 Revert "[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info"
And some intervening fixups. There are two remaining problems:
 * A memory leak via https://lab.llvm.org/buildbot/#/builders/236/builds/7120/steps/10/logs/stdio
 * A performance slowdown with -g where I'm not completely sure what the cause it

These might be fairly straightforwards to fix, but it's the end of the day
hear, so I figure I'll clear the buildbots til tomorrow.

This reverts commit 7d77bbef4ad9230f6f427649373fe46a668aa909.
This reverts commit 9026f35afe6ffdc5e55b6615efcbd36f25b11558.
This reverts commit d97b2b389a0e511c65af6845119eb08b8a2cb473.
2023-11-02 17:41:36 +00:00
Jeremy Morse
7d77bbef4a [DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues is plumbed into the metadata hierachy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-02 12:44:53 +00:00
Johannes Doerfert
a8152086ff [Attributor][FIX] Ensure new BBs are registered 2023-11-01 12:12:14 -07:00
Nikita Popov
6b8ed78719 [IR] Add writable attribute
This adds a writable attribute, which in conjunction with
dereferenceable(N) states that a spurious store of N bytes is
introduced on function entry. This implies that this many bytes
are writable without trapping or introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the second point is important.

This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).

In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.

Followup to the discussion on D157499.

Differential Revision: https://reviews.llvm.org/D158081
2023-11-01 10:46:31 +01:00
Davide Italiano
954af75ceb
[PGO] Skip optimizing probes that don't fit. (#70875)
The discriminator can only pack 16 bits, so anything exceeding that
value will cause the packing code to crash. Emit a diagnostic and skip
the optimization instead.
2023-10-31 21:30:36 -07:00
Joseph Huber
e8c0ae60d7
[OpenMP] Add optimization to remove the RPC client (#70683)
Summary:
Part of the work done in the `libc` project is to provide host services
for things like `printf` or `malloc`, or generally any syscall-like
behaviour. This scheme works by emitting an externally visible global
called `__llvm_libc_rpc_client` that the host runtime can pick up to get
a handle to the global memory associated with the client. We use the
presence of this symbol to indicate whether or not we need to run an RPC
server. Normally, this symbol is only present if something requiring an
RPC server was linked in, such as `printf`. However, if this call to
`printf` was subsequently optimizated out, the symbol would remain and
cannot be removed (rightfully so) because of its linkage. This patch
adds a special-case optimization to remove this symbol so we can
indicate that an RPC server is no longer needed.

This patch puts this logic in `OpenMPOpt` as the most readily available
place for it. In the future, we should think how to move this somewhere
more generic. Furthermore, we use a hard-coded runtime name (which isn't
uncommon given all the other magic symbol names). But it might be nice
to abstract that part away.
2023-10-31 17:23:24 -05:00
Sander de Smalen
00a831421f
[AArch64][SME] Extend Inliner cost-model with custom penalty for calls. (#68416)
This is a stacked PR following on from #68415 

This patch has two purposes:
(1) It tries to make inlining more likely when it can avoid a
streaming-mode change.
(2) It avoids inlining when inlining causes more streaming-mode changes.

An example of (1) is:
```
  void streaming_compatible_bar(void);

  void foo(void) __arm_streaming {
    /* other code */
    streaming_compatible_bar();
    /* other code */
  }

  void f(void) {
    foo();            // expensive streaming mode change
  }

  ->

  void f(void) {
    /* other code */
    streaming_compatible_bar();
    /* other code */
  }
```
where it wouldn't have inlined the function when foo would be a
non-streaming function.

An example of (2) is:
```
  void streaming_bar(void) __arm_streaming;

  void foo(void) __arm_streaming {
    streaming_bar();
    streaming_bar();
  }

  void f(void) {
    foo();            // expensive streaming mode change
  }

  -> (do not inline into)

  void f(void) {
    streaming_bar();  // these are now two expensive streaming mode changes
    streaming_bar();
  }```
2023-10-31 10:28:40 +00:00