217 Commits

Author SHA1 Message Date
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
Florian Hahn
9e66206638
[Passes] Generalize ShouldRunExtraVectorPasses to allow re-use (NFCI). (#118323)
Generalize ShouldRunExtraVectorPasses to ShouldRunExtraPasses, to allow
re-use for other transformations.

PR: https://github.com/llvm/llvm-project/pull/118323
2024-12-04 16:55:06 +00:00
Kyungwoo Lee
d23c5c2d65
[CGData] Global Merge Functions (#112671)
This implements a global function merging pass. Unlike traditional
function merging passes that use IR comparators, this pass employs a
structurally stable hash to identify similar functions while ignoring
certain constant operands. These ignored constants are tracked and
encoded into a stable function summary. When merging, instead of
explicitly folding similar functions and their call sites, we form a
merging instance by supplying different parameters via thunks. The
actual size reduction occurs when identically created merging instances
are folded by the linker.

Currently, this pass is wired to a pre-codegen pass, enabled by the
`-enable-global-merge-func` flag.
In a local merging mode, the analysis and merging steps occur
sequentially within a module:
- `analyze`: Collects stable function hashes and tracks locations of
ignored constant operands.
- `finalize`: Identifies merge candidates with matching hashes and
computes the set of parameters that point to different constants.
- `merge`: Uses the stable function map to optimistically create a
merged function.

We can enable a global merging mode similar to the global function
outliner
(https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/),
which will perform the above steps separately.
- `-codegen-data-generate`: During the first round of code generation,
we analyze local merging instances and publish their summaries.
- Offline using `llvm-cgdata` or at link-time, we can finalize all these
merging summaries that are combined to determine parameters.
- `-codegen-data-use`: During the second round of code generation, we
optimistically create merging instances within each module, and finally,
the linker folds identically created merging instances.
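
The `analyze` and `finalize` steps above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the pass's implementation: functions are modeled as lists of `(opcode, operand)` pairs, every integer operand is treated as an ignorable constant, and `stable_hash` and `find_merge_candidates` are hypothetical names chosen for this example.

```python
import hashlib
from collections import defaultdict


def stable_hash(body):
    """Hash opcodes and non-constant operands; record the ignored constants.

    Assumption for this sketch: every integer operand is an ignorable constant.
    """
    h = hashlib.sha256()
    ignored = []  # (instruction index, constant value)
    for idx, (opcode, operand) in enumerate(body):
        h.update(opcode.encode() + b"|")
        if isinstance(operand, int):
            ignored.append((idx, operand))
            h.update(b"<const>|")
        else:
            h.update(str(operand).encode() + b"|")
    return h.hexdigest(), ignored


def find_merge_candidates(functions):
    """Group functions by hash and report which constant slots differ."""
    groups = defaultdict(list)
    for name, body in functions.items():
        digest, ignored = stable_hash(body)
        groups[digest].append((name, ignored))

    candidates = []
    for members in groups.values():
        if len(members) < 2:
            continue  # nothing to merge with
        names = [name for name, _ in members]
        slots = [idx for idx, _ in members[0][1]]
        # A slot becomes a parameter only if the constants disagree.
        params = [
            slot
            for pos, slot in enumerate(slots)
            if len({ignored[pos][1] for _, ignored in members}) > 1
        ]
        candidates.append((names, params))
    return candidates


functions = {
    "f1": [("load", "x"), ("add", 1), ("mul", 8), ("ret", None)],
    "f2": [("load", "x"), ("add", 2), ("mul", 8), ("ret", None)],
    "g": [("load", "y"), ("ret", None)],
}
```

In this toy input, `f1` and `f2` hash identically and differ only in the constant at instruction index 1, so that slot is the one that would be passed as a parameter through a thunk.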

Depends on #112664
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-13 17:34:07 -08:00
Lei Wang
bc1aa2863b
[SampleFDO] Support enabling sample loader pass in O0 mode (#113985)
Add support for enabling the sample loader pass in O0 mode (under
`-fprofile-sample-use`). This can help verify PGO raw profile count
quality or provide a more accurate performance proxy (predictor), as O0
mode has minimal or no compiler optimizations that might otherwise
impact profile count accuracy.
- Explicitly disable the sample loader inlining to ensure it only emits
sampling annotation.
- Use flattened profile for O0 mode.
- Add the pass after `AddDiscriminatorsPass` pass to work with
`-fdebug-info-for-profiling`.
2024-11-08 15:29:44 -08:00
Yuxuan Chen
c6414970d7
[Coroutines] Inline the .noalloc ramp function marked coro_safe_elide (#114004) 2024-11-07 22:41:32 -08:00
Hari Limaye
fbd89bcc66
Reland "[LTO] Run Argument Promotion before IPSCCP" (#111853)
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more
constants to be propagated. We also run PostOrderFunctionAttrs to
improve the information available to ArgumentPromotion's alias analysis,
and SROA to clean up allocas.

Relands #111163.
2024-11-06 13:54:48 +00:00
Shilei Tian
390300d9f4
[PassBuilder] Add ThinOrFullLTOPhase to optimizer pipeline (#114577) 2024-11-03 23:25:29 -05:00
Shilei Tian
dc45ff1d2a
[PassBuilder] Add ThinOrFullLTOPhase to early simplification EP call backs (#114547)
The early simplification pipeline is used in non-LTO compilation and in the
(Thin/Full)LTO pre-link stage. Some passes should run in non-LTO mode but not
at the LTO pre-link stage, and there is currently no way to control that. This
PR adds the support. To demonstrate the use, we only enable the internalization
pass in non-LTO mode for AMDGPU because having it run in the pre-link stage
causes some issues.
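
A rough model of the idea, for illustration only: the extension-point callback receives the LTO phase and decides whether to add its pass. The enum below is a toy stand-in for `ThinOrFullLTOPhase`; the callback name, the pipeline representation, and the pass-name strings are assumptions made for this sketch and are not LLVM APIs.

```python
from enum import Enum, auto


class LTOPhase(Enum):
    """Toy stand-in for the phase value handed to pipeline callbacks."""
    NONE = auto()
    THIN_LTO_PRE_LINK = auto()
    THIN_LTO_POST_LINK = auto()
    FULL_LTO_PRE_LINK = auto()
    FULL_LTO_POST_LINK = auto()


def internalize_callback(pipeline, phase):
    # Hypothetical target callback: only add the pass outside of LTO.
    if phase is LTOPhase.NONE:
        pipeline.append("internalize")


def build_early_simplification(phase, callbacks):
    # The pipeline is modeled as a plain list of pass names.
    pipeline = ["placeholder-simplification-pass"]
    for callback in callbacks:
        callback(pipeline, phase)
    return pipeline
```

With a plain boolean the callback could only tell pre-link from everything else; passing the phase lets it distinguish non-LTO compilation from each LTO stage.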
2024-11-03 23:24:10 -05:00
Shilei Tian
5445edb5d6
[PassBuilder] Replace bool LTOPreLink with ThinOrFullLTOPhase Phase (#114564)
This will allow more fine-grained control in the future.
2024-11-01 14:56:35 -04:00
Lei Wang
bef3b54ea1
[InstrPGO] Avoid using global variable to fix potential data race (#114364)
https://github.com/llvm/llvm-project/pull/109837 set a global
variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp,
which introduced a data race detected by TSan. To fix this, the flag
settings are decoupled and the flags are now set separately
(`instrument-cold-function-only-path` is required to be used with
`--pgo-instrument-cold-function-only`).
2024-10-31 21:28:13 -07:00
Paul Kirth
913cd11f94
[llvm][fatlto] Drop any CFI related instrumentation after emitting bitcode (#112788)
We want to support CFI instrumentation for the bitcode section, without
miscompiling the object code portion of a FatLTO object. We can reuse
the existing mechanisms in the LowerTypeTestsPass to do that, by just
adding the pass to the FatLTO pipeline after the EmbedBitcodePass with
the correct options set.

Fixes #112053
2024-10-31 12:40:21 -07:00
Dmitry Chernenkov
d924a9ba03
Revert "[InstrPGO] Support cold function coverage instrumentation (#109837)"
This reverts commit e517cfc531886bf6ed64b4e7109bb3141ac7f430.
2024-10-31 10:55:17 +00:00
Paul Kirth
b01e2a8b56
[llvm] Allow always dropping all llvm.type.test sequences
Currently, the `DropTypeTests` parameter only fully works with phi nodes
and llvm.assume instructions. However, we'd like CFI to work in
conjunction with FatLTO, in so far as the bitcode section should be able
to contain the CFI instrumentation, while any incompatible bits are
dropped when compiling the object code.

To do that, we need to drop the llvm.type.test instructions everywhere,
and not just their uses in phi nodes. This patch updates the
LowerTypeTest pass so that uses are removed, and replaced with `true` in
all cases, and not just in phi nodes.

Addressing this will allow us to fix #112053 by modifying the FatLTO
pipeline.

Reviewers: pcc, nikic

Reviewed By: pcc

Pull Request: https://github.com/llvm/llvm-project/pull/112787
2024-10-30 16:56:30 -07:00
Lei Wang
e517cfc531
[InstrPGO] Support cold function coverage instrumentation (#109837)
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.

More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes (a bundle of passes in
`addPGOInstrPasses`) under the sampling PGO pipeline. This is controlled by
the `--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`, which:
  1) Configures the flags in one place, i.e. adds
  `--instrument-cold-function-only-path=<...>` and
  `--pgo-function-entry-coverage`. Note that the instrumentation file path
  is passed through `--instrument-cold-function-only-path`, because we
  cannot use `PGOOptions.ProfileFile` as it's already used by
  `-fprofile-sample-use=<...>`.
  2) Makes the linker link the `compiler_rt.profile` library (see
  [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)).
- Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the entry
count that determines whether a function is cold.

Overall, the full command is like:

```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code
```
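
The selection rule described in this message (skip functions that the sampling profile already covers) can be sketched roughly as below. This is a simplified model: the function name, the dictionary input, and the exact comparison against the threshold are assumptions made for illustration, not the pass's actual logic.

```python
def functions_to_instrument(entry_counts, cold_entry_threshold=0):
    """Return the functions that would still receive coverage instrumentation.

    entry_counts maps a function name to its sampled entry count.
    Assumption: a function counts as cold when its entry count does not
    exceed the threshold; hotter functions are skipped.
    """
    return sorted(
        name
        for name, count in entry_counts.items()
        if count <= cold_entry_threshold
    )


sampled = {"hot_loop": 12000, "init": 3, "legacy_handler": 0, "debug_dump": 0}
```

With the default threshold of 0 only `debug_dump` and `legacy_handler` would be instrumented; raising the threshold to 5 would add `init`.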
2024-10-28 10:13:45 -07:00
Teresa Johnson
1de71652fd
[MemProf] Support cloning for indirect calls with ThinLTO (#110625)
This patch enables support for cloning in indirect callsites.

This is done by synthesizing callsite records for each virtual call
target from the profile metadata. In the thin link all the synthesized
records for a particular indirect callsite initially share the same
context node, but support is added to partition the callsites and
outgoing edges based on the callee function, creating a separate node
for each target.

In the LTO backend, when cloning is needed we first perform indirect
call promotion, then change the target of the new direct call to the
desired clone.

Note this is ThinLTO-specific, since for regular LTO indirect call
promotion should have already occurred.
2024-10-11 13:53:35 -07:00
Arthur Eubanks
e34d614e7d
[Passes] Remove -enable-infer-alignment-pass flag (#111873)
This flag has been on for a while without any complaints.
2024-10-10 12:28:46 -07:00
Hari Limaye
0a0f100a70
Revert "[LTO] Run Argument Promotion before IPSCCP" (#111839)
Reverts llvm/llvm-project#111163, as this was merged prematurely.
2024-10-10 15:03:01 +01:00
Hari Limaye
b9754e9d28
[LTO] Run Argument Promotion before IPSCCP (#111163)
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more
constants to be propagated. We also run PostOrderFunctionAttrs to
improve the information available to ArgumentPromotion's alias analysis,
and SROA to clean up allocas.
2024-10-10 06:08:27 -04:00
Mircea Trofin
9c0ba62010
[ctx_prof] Relax the "profile use" case around PGOOpt (#108265)
`PGOOpt` could have a value if, for instance, debug info for profiling
is requested. Relax the requirement for now; eventually we would
refactor `PGOOpt` to better capture the supported interplay between the
various profiling options.
2024-09-11 13:39:50 -07:00
Yuxuan Chen
761bf333e3
[LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897)
After landing https://github.com/llvm/llvm-project/pull/99285 we found
that the call graph update was causing the following crash when
expensive checks are turned on
```
llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(TargetRC)) && "New call edge is not trivial!"' failed.
```
I have to admit I believe that the call graph update process I did for
that patch could be wrong.

After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced
that `CoroAnnotationElidePass` can be a FunctionPass and rely on the
adaptor to update the call graph for us, so long as we properly
invalidate the caller's analyses.

After this patch,
`llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer
fails under expensive checks.
2024-09-09 18:57:39 -07:00
Mircea Trofin
3b22618094
[ctx_prof] Insert the ctx prof flattener after the module inliner (#107499)
This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore.

Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
2024-09-09 18:16:24 -07:00
Yuxuan Chen
a416267a5f
[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the noalloc variant (#99285)
This patch is episode three of the middle end implementation for the
coroutine HALO improvement project published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

After we attribute the calls to some coroutines as "coro_elide_safe" in
the C++ FE and create a `noalloc` ramp function, we use a new middle
end pass to move the call to coroutines to the noalloc variant.

This pass should be run after CoroSplit. For each node we process in
CoroSplit, we look for its callers and replace the attributed ones in
presplit coroutines to the noalloc one. The transformed `noalloc` ramp
function will also require a frame pointer to a block of memory it can
use as an activation frame. We allocate this on the caller's frame with
an alloca.

Please note that we cannot safely transform such attributed calls in
post-split coroutines due to memory lifetime reasons. The CoroSplit pass
is responsible for creating the coroutine frame spills for all the
allocas in the coroutine. Therefore it will be unsafe to create new
allocas like this one in post-split coroutines. This happens relatively
rarely because CGSCC performs the passes on the callees before the
caller. However, if multiple coroutines coexist in one SCC, this
situation does happen (and prevents us from having potentially unbound
frame size due to recursion.)

You can find episode 1: Clang FE of this patch series at
https://github.com/llvm/llvm-project/pull/99282
Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283
2024-09-08 23:09:40 -07:00
Mingming Liu
d4ddf06b0c
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).

While I'm at it, this PR cleans up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to invest further in them.

With this patch, the bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and the bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump the bitcode version.
2024-09-06 16:38:17 -07:00
Mircea Trofin
775c50709c
[ctx_prof] Flattened profile lowering pass (#107329)
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened.

Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all a function's context nodes.

We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point.
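
As a rough illustration of the flattening step only (summing counters over all of a function's context nodes), consider the sketch below. The node layout (`guid`, `counters`, `callsites`) is a hypothetical toy representation chosen for this example, and it assumes every context node of a given function carries the same number of counters.

```python
def flatten(roots):
    """Sum counters element-wise over all context nodes of each function."""
    flat = {}
    stack = list(roots)
    while stack:
        node = stack.pop()
        counters = node["counters"]
        totals = flat.setdefault(node["guid"], [0] * len(counters))
        for i, value in enumerate(counters):
            totals[i] += value
        # Each callsite may have several callee context nodes.
        for targets in node.get("callsites", {}).values():
            stack.extend(targets)
    return flat


# Function 2 is reached from two different callsites of function 1.
profile = [
    {
        "guid": 1,
        "counters": [10, 4],
        "callsites": {
            0: [{"guid": 2, "counters": [4, 1]}],
            1: [{"guid": 2, "counters": [6, 6]}],
        },
    }
]
```

Here the two contexts of function 2 collapse into a single counter vector `[10, 7]`, which is the kind of flat per-function data that the propagation step would then consume.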

Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
2024-09-06 13:47:08 -07:00
Arthur Eubanks
fb14f1df54
[PGO][Pipeline] Enable PGOForceFunctionAttrs in PGO optimization pipelines (#106790)
Remove flag that turns on the PGOForceFunctionAttrs pass and always add
it to default pipelines when using PGO.

This is NFC by default since PGOOpt->ColdOptType is by default
ColdFuncOpt::Default.

Remove -O2 RUN line in basic.ll since we now have the pipeline tests.
2024-09-03 10:35:08 -07:00
Shengchen Kan
87c86aa6b9
[X86,SimplifyCFG] Support hoisting load/store with conditional faulting (Part I) (#96878)
This is the SimplifyCFG part of
https://github.com/llvm/llvm-project/pull/95515

In this PR, we support hoisting load/store with conditional faulting in
`SimplifyCFGOpt::speculativelyExecuteBB` to eliminate conditional
branches.
This is for cases like
```
void test(int a, int *b) {
  if (a)
    *b = a;
}
```

In the following patches, we will support the hoist in
`SimplifyCFGOpt::hoistCommonCodeFromSuccessors`.
That is for cases like
```
void test(int a, int *c, int *d) {
  if (a)
    *c = a;
  else
    *d = a;
}
```
2024-08-29 10:42:44 +08:00
Mircea Trofin
3f18a0a71c
[nfc] Improve testability of PGOInstrumentationGen (#104490)
Pass a flag to the `PGOInstrumentationGen` pass that says whether it needs to produce contextual profiling instrumentation. In the process, slightly restructure the places that need to be aware of that (some were unnecessarily in `PGOInstrumentationUse`).
2024-08-16 09:45:29 -07:00
Mircea Trofin
aca01bff07
[ctx_prof] CtxProfAnalysis: populate module data (#102930)
Continuing from #102084, which introduced the analysis, we now populate
it with info about functions contained in the module.

When we update the profile due to e.g. inlined callsites, we'll
ingest the callee's counters and callsites into the caller. We'll move
those to the caller's respective index space (counter and callers), so
we need to know and maintain where those currently end.
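
To see why the analysis has to track where the caller's index spaces currently end, here is a toy sketch of ingesting a callee during inlining. It is not how the analysis is implemented: the dictionary layout and the function name `ingest_callee` are assumptions for illustration, and the sketch simply appends the callee's indices after the caller's.

```python
def ingest_callee(caller, callee):
    """Move the callee's counters and callsites into the caller's index space.

    Returns the index remapping (callee index -> caller index) for both spaces.
    """
    counter_base = len(caller["counters"])   # where caller counters end
    callsite_base = caller["num_callsites"]  # where caller callsites end

    counter_map = {i: counter_base + i for i in range(len(callee["counters"]))}
    callsite_map = {i: callsite_base + i for i in range(callee["num_callsites"])}

    # Grow the caller so the next inlining starts from the new end points.
    caller["counters"].extend(callee["counters"])
    caller["num_callsites"] += callee["num_callsites"]
    return counter_map, callsite_map


caller = {"counters": [5, 3], "num_callsites": 2}
callee = {"counters": [3, 1, 2], "num_callsites": 1}
counter_map, callsite_map = ingest_callee(caller, callee)
```

After the call, the caller's end points have advanced, so a second inlined callee would be placed after the first without colliding.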

We also don't need to keep profiles not pertinent to this module.

This patch also introduces an arguably much simpler way to track the
GUID of a function from the frontend compilation, through ThinLTO, and
into the post-thinlink compilation step, which doesn't rely on keeping
names around. A separate RFC and patches will discuss extending this to
the current PGO (instrumented and sampled) and other consumers as an
infrastructural component.
2024-08-14 18:46:25 -07:00
Mircea Trofin
4a2bf05980
Reapply "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3.

The problem was link dependencies, moved `UseCtxProfile` to `Analysis`.
2024-08-08 17:04:00 -07:00
Aiden Grossman
967185eeb8
Revert "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 1a6d60e0162b3ef767c87c95512dd453bf4f4746.

Broke some buildbots.
2024-08-08 21:14:56 +00:00
Mircea Trofin
1a6d60e016
[ctx_prof] Fix the pre-thinlink "use" case (#102511)
Didn't notice in #101338 that the instrumentation in `llvm/test/Transforms/PGOProfile/ctx-prof-use-prelink.ll` was actually incorrect.
2024-08-08 16:45:04 -04:00
Mircea Trofin
dbbf0762b6
[ctx_prof] CtxProfAnalysis (#102084)
This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.
2024-08-07 14:39:48 -04:00
Mircea Trofin
ba4da5a087
[ctx_prof] "Use" support for pre-thinlink. (#101338)
There is currently no plan to support contextual profiling use in a
non-ThinLTO scenario.

In the pre-link phase, we only instrument and then immediately bail out
to let the linker group functions under an entrypoint in the same module
as the entrypoint. We don't actually care what the profile contains -
just that we want to use a contextual profile.

After that, in post-thinlink, we require the profile be passed again so
we can actually use it. The earlier instrumentation will be used to
match counter values.

While the feature is in development, we add a hidden flag for the use
scenario, but we can eventually tie it to the `PGOOptions` mechanism. We
will use the same flag in both pre- and post-thinlink, because it
simplifies things - usually the post-thinlink args are the same as the
ones for pre-. This, despite the flag being basically treated as a
boolean in pre-thinlink.
2024-08-02 20:51:27 -04:00
Wei Wang
3a9ef4e69a
[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#100205)
This is a re-land of #90310 after making asan skip pre-split coroutines in
#99415.

Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
2024-07-29 17:42:01 -07:00
Joseph Huber
8758091a70
[LLVM] Add 'ExpandVariadicsPass' to LTO default pipeline (#100479)
Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX and AMDGPU targets.

This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.
2024-07-25 09:21:05 -05:00
Tianqing Wang
3d494bfc7f
[SimplifyCFG] Increase budget for FoldTwoEntryPHINode() if the branch is unpredictable. (#98495)
The `!unpredictable` metadata has been present for a long time, but
its usage in optimizations is still limited. This patch teaches
`FoldTwoEntryPHINode()` to be more aggressive with an unpredictable
branch to reduce mispredictions.

A TTI interface `getBranchMispredictPenalty()` is added to distinguish
between different hardware to ensure we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.
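
The budgeting idea can be sketched as follows. This is a simplified model under assumptions: the function names are hypothetical, the numbers are invented, and adding the penalty to the base budget is one plausible reading of "increase budget", not a statement of what the pass computes.

```python
def fold_budget(base_budget, is_unpredictable, mispredict_penalty):
    """Allow more speculated work when the branch is marked unpredictable."""
    if is_unpredictable:
        return base_budget + mispredict_penalty
    return base_budget


def should_fold(speculation_cost, base_budget, is_unpredictable, mispredict_penalty):
    # Fold the two-entry PHI only if speculating both sides fits the budget.
    budget = fold_budget(base_budget, is_unpredictable, mispredict_penalty)
    return speculation_cost <= budget
```

A target with a cheap misprediction would report a small penalty, which keeps the transformation conservative on simpler cores.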
2024-07-23 07:47:21 +08:00
xur-llvm
b1ca2a9546
[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535)
In comparison to non-instrumented binaries, PGO instrumentation binaries
can be significantly slower. For highly threaded programs, this slowdown
can reach 10x due to data races or false sharing within counters.

This patch incorporates sampling into the PGO instrumentation process to
enhance the speed of instrumentation binaries. The fundamental concept
is similar to the one proposed in https://reviews.llvm.org/D63949.

Three sampling modes are introduced:
1. Simple Sampling: When '-sampled-instr-burst-duration' is set to 1.
2. Fast Burst Sampling: When not using simple sampling, and
'-sampled-instr-period' is set to 65535. This is the default mode of
sampling.
3. Full Burst Sampling: When neither simple nor fast burst sampling is
used.
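
The effect of burst sampling on how many counter updates actually happen can be illustrated with a small simulation. This is a conceptual sketch only: `simulate_sampling`, `burst_duration`, and `period` are names chosen for this example, the values used are arbitrary, and the way the real pass lowers the sampling check is not modeled.

```python
def simulate_sampling(num_events, burst_duration, period):
    """Count how many of num_events profile-counter updates are recorded.

    Updates are recorded only during the first burst_duration events of
    every window of `period` events.
    """
    recorded = 0
    sample = 0  # position inside the current sampling window
    for _ in range(num_events):
        if sample < burst_duration:
            recorded += 1
        sample += 1
        if sample >= period:
            sample = 0  # start the next window
    return recorded
```

For example, with a burst of 3 and a period of 10, roughly 30% of the updates are executed, which is where the reduction in counter contention would come from.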

Utilizing this sampled instrumentation significantly improves the
binary's execution speed. Measurements show up to 5x speedup with default
settings. Fast burst sampling now results in only around 20% to 30%
slowdown (compared to 8 to 10x slowdown without sampling).

Our tests show that profile quality remains good with sampling,
with edge counts typically showing more than 90% overlap.
For applications whose behavior changes due to binary speed,
sampling instrumentation can enhance performance.
Observations have shown some apps experiencing up to
a ~2% improvement in PGO.

A potential drawback of this patch is the increased binary size
and compilation time. The Sampling method in this patch does
not improve single threaded program instrumentation binary
speed.
2024-07-22 09:19:17 -07:00
YAMAMOTO Takashi
5d79110959
[Pipelines] Perform mergefunc after constmerge (#92498)
Constmerge can fold switch jump tables, possibly making functions
identical again. It can help mergefunc.
On the other hand, the opposite seems unlikely.

Fixes https://github.com/llvm/llvm-project/issues/92201.
2024-07-05 12:28:03 +02:00
Egor Pasko
cab81dd038
[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171)
Move EntryExitInstrumenter(PostInlining=true) to as late as possible and
EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage
(but skip for ThinLTO post-link).

This should fix the issues reported in
https://github.com/rust-lang/rust/issues/92109 and
https://github.com/llvm/llvm-project/issues/52853. These are caused
by https://reviews.llvm.org/D97608.
2024-05-31 12:48:45 -07:00
Mircea Trofin
d311a62e2f
[ctx_profile] Decouple ctx instrumentation from PGOOpt (#92445)
We currently don't support passing files and don't need frontend involvement either.
2024-05-16 13:41:36 -07:00
Mircea Trofin
174cdeced0
[nfc] Clarify when the various PGO instrumentation passes run (#92330)
The code seems easier to read if it's centered on what the user wants rather than combinations of whatever internal variables.
2024-05-16 12:17:22 -07:00
Reid Kleckner
aa0776de46
Revert "[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310)" and related patches
This change is incorrect when thinlto and asan are enabled, and this can
be observed by adding `-fsanitize=address` to the provided
coro-elide-thinlto.cpp test. It results in the error "Coroutines cannot
handle non static allocas yet", and ASan introduces a dynamic alloca.

In other words, we must preserve the invariant that CoroSplit runs
before ASan. If we move CoroSplit to the post-link compile stage,
ASan has to be moved to the post-link compile stage first. It would
also be correct to make CoroSplit handle dynamic allocas so the pass
ordering doesn't matter, but sanitizer instrumentation really ought to
be last, after coroutine splitting.

This reverts commit bafc5f42c0132171287d7cba7f5c14459be1f7b7.
This reverts commit b1b1bfa7bea0ce489b5ea9134e17a43c695df5ec.
This reverts commit 0232b77e145577ab78e3ed1fdbb7eacc5a7381ab.
This reverts commit fb2d3056618e3d03ba9a695627c7b002458e59f0.
This reverts commit 1cb33713910501c6352d0eb2a15b7a15e6e18695.
This reverts commit cd68d7b3c0ebf6da5e235cfabd5e6381737eb7fe.
2024-05-10 21:28:13 +00:00
Mircea Trofin
96568f3539
[llvm][ctx_profile] Add instrumentation lowering (#90821)
This adds the instrumentation lowering pass.

(Tracking Issue: #89287, RFC referenced there)
2024-05-08 16:49:08 -07:00
Wei Wang
bafc5f42c0
[Pipelines][Coroutines] Tune coroutine passes only for ThinLTO pre-link pipeline (#90690)
Follow-up to #90310: limit the tuning to the ThinLTO pre-link pipeline only, as
coroutine passes are not in the MonoLTO backend.
2024-04-30 21:40:04 -07:00
Wei Wang
cd68d7b3c0
[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310)
Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
2024-04-29 10:24:53 -07:00
Arthur Eubanks
947b656add
[PGO] Check that PGOOpt exists before using PGOOpt->ColdOptType (#89139)
This means that the pass is unusable without some sort of profile. We
can revisit this decision later if we want to support running this pass
without a profile.
2024-04-18 11:22:10 -07:00
Florian Hahn
0f82469314
[Passes] Run SimpleLoopUnswitch after introducing invariant branches. (#81271)
IndVars may be able to replace a loop dependent condition with a loop
invariant one, but loop-unswitch runs before IndVars, so the invariant
check remains in the loop.

For an example, consider a read-only loop with a bounds check:
https://godbolt.org/z/8cdj4qhbG

This patch uses an approach similar to the way extra cleanup passes are
run on demand after vectorization (added in acea6e9cfa4c4a0e8678c7).

It introduces a new ShouldRunExtraSimpleLoopUnswitch analysis marker,
which IndVars can use to indicate that extra unswitching is beneficial.

ExtraSimpleLoopUnswitchPassManager uses this analysis to determine
whether to run its passes on a loop.
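
The marker mechanism can be modeled in a few lines. This is a toy sketch, not LLVM's pass manager: loops are plain dictionaries, the analysis cache is a set of names, and the functions below are hypothetical stand-ins for IndVars and the extra pass manager.

```python
MARKER = "ShouldRunExtraSimpleLoopUnswitch"


def indvars(loop, cached_analyses):
    # Toy IndVars: if it turns a loop-dependent condition into an
    # invariant one, it leaves a marker for later passes.
    if loop["condition"] == "loop-dependent" and loop["can_simplify"]:
        loop["condition"] = "loop-invariant"
        cached_analyses.add(MARKER)


def extra_unswitch(loop, cached_analyses, executed):
    # Toy extra pass manager: run its passes only when the marker is cached.
    if MARKER not in cached_analyses:
        return
    executed.append("simple-loop-unswitch")


def run_loop_pipeline(loop):
    cached_analyses, executed = set(), []
    indvars(loop, cached_analyses)
    executed.append("indvars")
    extra_unswitch(loop, cached_analyses, executed)
    return executed
```

Loops for which IndVars finds nothing pay no cost for the extra unswitching, which fits the small compile-time impact reported for the on-demand variant.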

Compile-time impact (geomean) ranges from +0.0% to +0.02%
https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=19e6e99eeb280d426907ea73a21b139ba7225627&stat=instructions%3Au

Compile-time impact (geomean) of unconditionally running
SimpleLoopUnswitch ranges from +0.05% - +0.16%

https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=2930dfd5accdce2e6f8d5146ae4d626add2065a2&stat=instructions:u

Unconditionally running SimpleLoopUnswitch seems to indicate that there
are multiple other scenarios where we fail to run unswitching when
opportunities remain.


Fixes https://github.com/llvm/llvm-project/issues/85551.

PR: https://github.com/llvm/llvm-project/pull/81271
2024-04-12 22:07:29 +01:00
lifengxiang1025
e40cabfea4
[MemProf] Match function's summary and definition strictly (#83665)
Problem description:
https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520
Solution:
https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548
(choose plan2)
2024-03-12 11:00:02 +08:00
Paul Kirth
2fef685363
[llvm][loop-rotate] Allow forcing loop-rotation (#82828)
Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue #50308.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.
2024-02-29 13:46:13 -08:00
Paul Kirth
777ac46ddb
[llvm] Remove pipeline checks for optsize for DFAJumpThreadingPass
The pass itself checks whether to apply the optimization based on the
minsize attribute, so there isn't much functional benefit to preventing
the pass from being added. Gating whether the pass gets added to the pass
pipeline also complicates the interaction with -enable-dfa-jump-thread.

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/83318
2024-02-28 11:12:13 -08:00