721 Commits

Author SHA1 Message Date
Teresa Johnson
546ec641b4 Restore "[MemProf] Use new option/pass for profile feedback and matching"
This restores commit b4a82b62258c5f650a1cccf5b179933e6bae4867, reverted
in 3ab7ef28eebf9019eb3d3c4efd7ebfd160106bb1 because it was thought to
cause a bot failure, which ended up being unrelated to this patch set.

Differential Revision: https://reviews.llvm.org/D154856
2023-07-11 13:16:20 -07:00
JP Lehr
3ab7ef28ee Revert "[MemProf] Use new option/pass for profile feedback and matching"
This reverts commit b4a82b62258c5f650a1cccf5b179933e6bae4867.

Broke AMDGPU OpenMP Offload buildbot
2023-07-11 05:44:42 -04:00
Teresa Johnson
b4a82b6225 [MemProf] Use new option/pass for profile feedback and matching
Previously the MemProf profile was expected to be in the same profile
file as a normal PGO profile, passed via the usual -fprofile-use=
option, and was matched in the same pass. To simplify profile
preparation, since the raw MemProf profile requires the binary for
symbolization and may be simpler to index separately from the raw PGO
profile, and also to enable providing a MemProf profile for a SamplePGO
build, separate out the MemProf feedback option and matching pass.

This patch adds the -fmemory-profile-use=${file} option, and the
provided file is passed down to LLVM and ultimately used in a new
MemProfUsePass which performs the matching of just the memory profile
contents of that file.

Note that a single profile file containing both normal PGO and MemProf
profile data is still supported, and the relevant profile data is
matched by the appropriate matching pass(es) based on which option(s)
the profile is provided with (the same profile file can be supplied to
both feedback options).

Differential Revision: https://reviews.llvm.org/D154856
2023-07-10 16:42:56 -07:00
Arthur Eubanks
72e7e5851f [MemorySSA] Always perform MemoryUses liveOnEntry optimization on MSSA construction
Fixes invariant memory regressions in future DSE patches.

Also add a flag to print<memoryssa> to not ensure optimized uses to test this.

Noticeable compile time regression [1], but a future DSE change that depends on this more than makes up for it.

[1] https://llvm-compile-time-tracker.com/compare.php?from=9d5466849a770eeab222d5a5890376d3596e8ad6&to=95682dbe11d76a3342870437377216e96b167504&stat=instructions:u

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D152859
2023-07-06 14:09:47 -07:00
Matthew Voss
a1ca3af31e [llvm] A Unified LTO Bitcode Frontend
Here's a high level summary of the changes in this patch. For more
information on rational, see the RFC.
(https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774).

  - Add config parameter to LTO backend, specifying which LTO mode is
    desired when using unified LTO.
  - Add unified LTO flag to the summary index for efficiency. Unified
    LTO modules can be detected without parsing the module.
  - Make sure that the ModuleID is generated by incorporating more types
    of symbols.

Differential Revision: https://reviews.llvm.org/D123803
2023-07-05 14:53:14 -07:00
Arthur Eubanks
ff79eb3af6 [PassBuilder] Add textual representation for function simplification pipeline
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153784
2023-06-29 09:39:04 -07:00
Paul Kirth
75a1797044 Reland [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.

The EmbedbBitcodePass also returned PreservedAnalyses::all when adding a
metadata section, which failed expensive checks, since it modified the
module. This is now corrected.

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-28 21:37:50 +00:00
Alex Brachet
6085eb3084 Revert "Reland [llvm] Preliminary fat-lto-objects support"
This reverts commit 44265dc3554ef40920b587eeb787a400663af6c7.
2023-06-24 01:15:50 +00:00
Teresa Johnson
200cc952a2 [LTO][GlobalDCE] Use pass parameter instead of module flag for LTO phase
D63932 added a module flag to indicate that we are executing the regular
LTO post merge pipeline, so that GlobalDCE could perform more aggressive
optimization for Dead Virtual Function Elimination. This caused issues
trying to reuse bitcode that had already been through the LTO pipeline
(see context in D139816).

Instead support this by passing down a parameter flag to the
GlobalDCEPass constructor, which is the more usual way for indicating
this information.

Most test changes are to remove incidental uses of this flag. Of the 2
real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and
removed in this patch, and the virtual-functions-visibility-post-lto.ll
test is updated to use the regular LTO default pipeline where this
parameter is set to true.

Differential Revision: https://reviews.llvm.org/D153655
2023-06-23 17:05:07 -07:00
Paul Kirth
44265dc355 Reland [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-23 23:23:58 +00:00
Paul Kirth
a3800ad9d8 Revert "[llvm] Preliminary fat-lto-objects support"
There seems to be a problem on arm buildbots. Reverting until I can
investigate.  https://lab.llvm.org/buildbot#builders/245/builds/10184

This reverts commit a67208e1c697649ce432e6497f56a93675273dd8
and dependent commit e54a3112cee5ae0a9117359ecbea878e1388f51e.
2023-06-23 18:43:41 +00:00
Paul Kirth
a67208e1c6 [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-23 17:51:30 +00:00
Yann Girsberger
1d5651060e [opt] Exposing the parameters of LoopRotate to the -passes interface
There is a gap between running opt -Oz and running opt -passes="OZ_PASSES" where OZ_PASSES is taken from running opt -Oz -print-pipeline-passes.

One of the reasons causing this is that -Oz uses non-default setting for LoopRotate but LoopRotate does not expose its settings when printing the pipeline.

This commit fixes this by exposing LoopRotates parameters.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D153437
2023-06-22 11:09:23 -07:00
Arthur Eubanks
d49984fa4f [SimplifyCFG] Add option to not speculate blocks
Required for phase ordering changes to not regress Rust code with D145265.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153391
2023-06-22 08:51:40 -07:00
Arthur Eubanks
278d65b2cf [SimplifyCFG] Add textual pass params for FoldTwoEntryPHINode and SimplifyCondBranch 2023-06-15 14:21:24 -07:00
serge-sans-paille
afa13ba18d
Reapply Move "auto-init" instructions to the dominator of their users
Original patch (50b2a113db197a97f60ad2aace8b7382dc9b8c31) ignored the
fact that -ftrivial-auto-var-init could affect function parameters with
the sret attribute.
Just do not move instruction that don't affect alloca.
Also add missing test case for volatile instruction.

Differential Revision: https://reviews.llvm.org/D148507
2023-04-24 18:10:10 +02:00
pvanhout
ae77aceba5 [Analysis] Remove DA & LegacyDA
UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538

- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
  - Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.

Reviewed By: foad, sameerds

Differential Revision: https://reviews.llvm.org/D148116
2023-04-17 09:01:22 +02:00
Hans Wennborg
a6d9730f40 Revert "Move "auto-init" instructions to the dominator of their users"
This could also move initialization of sret args, causing actually
initialized parts of such return values to be uninitialized. See
discussion on the code review.

> As a result of -ftrivial-auto-var-init, clang generates instructions to
> set alloca'd memory to a given pattern, right after the allocation site.
> In some cases, this (somehow costly) operation could be delayed, leading
> to conditional execution in some cases.
>
> This is not an uncommon situation: it happens ~500 times on the cPython
> code base, and much more on the LLVM codebase. The benefit greatly
> varies on the execution path, but it should not regress on performance.
>
> This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
> MemorySSA update fixes.
>
> Differential Revision: https://reviews.llvm.org/D137707

This reverts commit 50b2a113db197a97f60ad2aace8b7382dc9b8c31
and follow-up commit ad9ad3735c4821ff4651fab7537a75b8f0bb60f8.
2023-04-12 13:37:21 +02:00
serge-sans-paille
50b2a113db
Move "auto-init" instructions to the dominator of their users
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somehow costly) operation could be delayed, leading
to conditional execution in some cases.

This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit greatly
varies on the execution path, but it should not regress on performance.

This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
MemorySSA update fixes.

Differential Revision: https://reviews.llvm.org/D137707
2023-04-04 07:30:03 +02:00
serge-sans-paille
11ae47dfc6
Revert "Move "auto-init" instructions to the dominator of their users"
This reverts commit cca01008cc31a891d0ec70aff2201b25d05d8f1b.

This change breaks memory ssa checks, see https://lab.llvm.org/buildbot#builders/109/builds/60970
2023-04-03 15:46:18 +02:00
serge-sans-paille
cca01008cc
Move "auto-init" instructions to the dominator of their users
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somehow costly) operation could be delayed, leading
to conditional execution in some cases.

This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit greatly
varies on the execution path, but it should not regress on performance.

Differential Revision: https://reviews.llvm.org/D137707
2023-04-03 15:27:27 +02:00
Teresa Johnson
700cd99061 Restore "[MemProf] Context disambiguation cloning pass [patch 1a/3]"
This restores commit d6ad4f01c3dafcab335bca66dac6e36d9eac8421, which was
reverted in commit 883dbb9c86be87593a58ef10b070b3a0564c7fee, along with
a fix for gcc 12.2 build errors in the original commit.

Support for building, printing, and displaying CallsiteContextGraph
which represents the MemProf metadata contexts. Uses CRTP to enable
support for both IR (regular LTO) and summary (ThinLTO). This patch
includes the support for building it in regular LTO mode (from
memprof and callsite metadata), and the next patch will add the
handling for building it from ThinLTO summaries.

Also includes support for dumping the graph to text and to dot files.

Follow-on patches will contain the support for cloning on the graph and
in the IR.

The graph represents the call contexts in all memprof metadata on
allocation calls, with nodes for the allocations themselves, as well as
for the calls in each context. The graph is initially built from the
allocation memprof metadata (or summary) MIBs. It is then updated to
match calls with callsite metadata onto the nodes, updating it to
reflect any inlining performed on those calls.

Each MIB (representing an allocation's call context with allocation
behavior) is assigned a unique context id during the graph build. The
edges and nodes in the graph are decorated with the context ids they
carry. This is used to correctly update the graph when cloning is
performed so that we can uniquify the context for a single (possibly
cloned) allocation.

Differential Revision: https://reviews.llvm.org/D140908
2023-03-22 10:16:06 -07:00
Nikita Popov
883dbb9c86 Revert "[MemProf] Context disambiguation cloning pass [patch 1a/3]"
This reverts commit d6ad4f01c3dafcab335bca66dac6e36d9eac8421.

Fails to build on at least gcc 12.2:

/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:482:1: error: no declaration matches ‘ContextNode<DerivedCCG, FuncTy, CallTy>* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
  482 | CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:393:16: note: candidate is: ‘CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
  393 |   ContextNode *getNodeForInst(const CallInfo &C);
      |                ^~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:99:7: note: ‘class CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>’ defined here
   99 | class CallsiteContextGraph {
      |       ^~~~~~~~~~~~~~~~~~~~
2023-03-22 15:43:46 +01:00
Teresa Johnson
d6ad4f01c3 [MemProf] Context disambiguation cloning pass [patch 1a/3]
Support for building, printing, and displaying CallsiteContextGraph
which represents the MemProf metadata contexts. Uses CRTP to enable
support for both IR (regular LTO) and summary (ThinLTO). This patch
includes the support for building it in regular LTO mode (from
memprof and callsite metadata), and the next patch will add the
handling for building it from ThinLTO summaries.

Also includes support for dumping the graph to text and to dot files.

Follow-on patches will contain the support for cloning on the graph and
in the IR.

The graph represents the call contexts in all memprof metadata on
allocation calls, with nodes for the allocations themselves, as well as
for the calls in each context. The graph is initially built from the
allocation memprof metadata (or summary) MIBs. It is then updated to
match calls with callsite metadata onto the nodes, updating it to
reflect any inlining performed on those calls.

Each MIB (representing an allocation's call context with allocation
behavior) is assigned a unique context id during the graph build. The
edges and nodes in the graph are decorated with the context ids they
carry. This is used to correctly update the graph when cloning is
performed so that we can uniquify the context for a single (possibly
cloned) allocation.

Depends on D140786.

Differential Revision: https://reviews.llvm.org/D140908
2023-03-22 07:05:27 -07:00
Arthur Eubanks
5558346c2b [CGSCC] Allow creation of no-rerun CGSCC->function adaptor via textual pipeline
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D145196
2023-03-19 16:26:27 -07:00
Nikita Popov
a8f6b5763e [PassBuilder] Support O0 in default pipelines
The default and pre-link pipeline builders currently require you to
call a separate method for optimization level O0, even though they
have perfectly well-defined O0 optimization pipelines.

Accept O0 optimization level and call buildO0DefaultPipeline()
internally, so all consumers don't need to repeat this.

Differential Revision: https://reviews.llvm.org/D146200
2023-03-17 10:00:05 +01:00
Arthur Eubanks
0d4a709bb8 [Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline
We can infer more attribute information once functions are fully
simplified, so move the PostOrderFunctionAttrs pass after the function
simplification pipeline. However, just doing this can impact
simplification of recursive functions since function simplification
takes advantage of function attributes of callees (some LLVM tests are
actually impacted by this), so keep a copy of PostOrderFunctionAttrs
before the function simplification pipeline that only runs on recursive
functions.

For example, this fixes the small regression noticed in https://reviews.llvm.org/D128830.

This requires some restructuring of the CGSCC NoRerun feature. We need
to cache the ShouldNotRunFunctionPassesAnalysis analysis after the
simplification is done, which now is after the second
PostOrderFunctionAttrs run, rather than after the function
simplification pipeline.

Compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=33cf40122279342b50f92a3a53f5c185390b6018&to=1bb2a07875634e508a6bdf2ca1b130f55510f060&stat=instructions:u

Compile time increase from unconditionally running the first PostOrderFunctionAttrs:
https://llvm-compile-time-tracker.com/compare.php?from=1bb2a07875634e508a6bdf2ca1b130f55510f060&to=f4f87e89cc7a35c64e3a103a8036192a84ae002b&stat=instructions:u

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145210
2023-03-06 09:01:45 -08:00
Jan Dupej
1ceb79e2e0 Port PlaceSafepoints pass to the new pass manager
This patch ports the PlaceSafepoints pass to the new pass manager as it is used by .NET/Mono. Compatibility with the legacy pass manager is maintained by adding PlaceSafepointsLegacyPass. This pass also depends on PlaceBackedgeSafepointsLegacyPass, which has been kept in the legacy-only variant, since it is apparently used only from PlaceSafepointsPass. It has been renamed, though, to indicate its legacy interface.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D136163
2023-02-17 09:17:49 -08:00
Sanjay Patel
4ecc6af813 [InstCombine] create a pass options container and add "use-loop-info" argument
This is a cleanup/modernization patch requested in D144045 to make loop
analysis a proper optional parameter to the pass rather than a
semi-arbitrary value inherited via the pass pipeline.

It's a bit more complicated than the recent patch I started copying from
(D143980) because InstCombine already has an option for MaxIterations
(added with D71145).

I debated just deleting that option, but it was used by a pair of existing
tests, so I put it into a struct (code largely copied from SimplifyCFG's
implementation) to make the code more flexible for future options
enhancements.

I didn't alter the pass manager invocations of InstCombine in this patch
because the patch was already getting big, but that will be a small
follow-up as noted with the TODO comment.

Differential Revision: https://reviews.llvm.org/D144199
2023-02-17 10:30:15 -05:00
Liren Peng
a52432f633 [NFC][SeparateConstOffsetFromGEP] Added flag lower-gep
We need such a flag to check whether the transformation is correct if
LowerGEP was enabled.

Reviewed By: nikic, arsenm, spatel

Differential Revision: https://reviews.llvm.org/D143980
2023-02-15 02:04:30 +00:00
Samuel Parker
2a58be4239 [HardwareLoops] NewPM support.
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.

Differential Revision: https://reviews.llvm.org/D140982
2023-02-13 09:46:31 +00:00
Arthur Eubanks
4ce34bb2a9 [CGSCC] Add pass which counts the max number of times we visit a function
This will help with finding potential pathological CGSCC cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D142853
2023-01-30 10:06:53 -08:00
Nikita Popov
edc1bcfc24 [PassBuilder] Detect loop-mssa for licm with parameters (PR60149)
When auto-detecting loop-mssa for licm/lnicm, also handle the case
where there are pass parameters.

Fixes https://github.com/llvm/llvm-project/issues/60149.
2023-01-23 12:11:33 +01:00
Anshil Gandhi
c52f9485b0 [LegacyDivergenceAnalysis] Add NewPM support
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D142161
2023-01-20 15:53:28 -07:00
Matt Arsenault
e7cd42f8e4 Utils: Add utility pass to lower ifuncs
Create a global constructor which will initialize a global table of
function pointers. For now, this is only used as a reduction technique
for llvm-reduce.

In the future this may be useful to support ifunc on systems where the
program loader doesn't natively support it.
2023-01-17 22:33:56 -05:00
Samuel Parker
615333bc09 [TypePromotion] NewPM support.
Differential Revision: https://reviews.llvm.org/D140893
2023-01-03 15:09:29 +00:00
Alexandros Lamprineas
f952bc05fd [IPSCCP] Create a Pass parameter to control specialization of functions.
Required for D140210 in order to disable FuncSpec at {Os, Oz}
optimization levels.

Differential Revision: https://reviews.llvm.org/D140564
2022-12-23 16:54:45 +00:00
Sameer Sahasrabuddhe
475ce4c200 RFC: Uniformity Analysis for Irreducible Control Flow
Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:

  1. The proposed spec presents a notion of "maximal convergence" that
     captures the existing convention of converging threads at the
     headers of natual loops.

  2. Maximal convergence is then extended to irreducible cycles. The
     identity of irreducible cycles is determined by the choices made
     in a depth-first traversal of the control flow graph. Uniformity
     analysis uses criteria that depend only on closed paths and not
     cycles, to determine maximal convergence. This makes it a
     conservative analysis that is independent of the effect of DFS on
     CycleInfo.

  3. The analysis is implemented as a template that can be
     instantiated for both LLVM IR and Machine IR.

Validation:
  - passes existing tests for divergence analysis
  - passes new tests with irreducible control flow
  - passes equivalent tests in MIR and GMIR

Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>

With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.

Support for GMIR and lit tests for GMIR/MIR added by
Yashwant Singh <yashwant.singh@amd.com>.

Differential Revision: https://reviews.llvm.org/D130746
2022-12-20 07:22:24 +05:30
Nikita Popov
8005332835 [AA] Remove CFL AA passes
The CFL Steens/Anders alias analysis passes are not enabled by
default, and to the best of my knowledge have no pathway towards
ever being enabled by default. The last significant interest in
these passes seems to date back to 2016. Given the little
maintenance these have seen in recent times, I also have very
little confidence in the correctness of these passes. I don't
think we should keep these in-tree.

Differential Revision: https://reviews.llvm.org/D139703
2022-12-12 09:34:20 +01:00
Roman Lebedev
4f7e5d2206
[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node, take 2
Currently, SROA is CFG-preserving.
Not doing so does not affect any pipeline test. (???)
Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call.

By design, we can't really SROA alloca if their address escapes somehow,
but we have logic to deal with `load` of `select`/`PHI`,
where at least one of the possible addresses prevents promotion,
by speculating the `load`s and `select`ing between loaded values.

As one would expect, that requires ensuring that the speculation is actually legal.
Even ignoring complexity bailouts, that logic does not deal with everything,
e.g. `isSafeToLoadUnconditionally()` does not recurse into hands of `select`.
There can also be cases where the load is genuinely non-speculate.

So if we can't prove that the load can be speculated,
unfold the select, produce two-entry phi node, and perform predicated load.

Now, that transformation must obviously update Dominator Tree,
since we require it later on. Doing so is trivial.
Additionally, we don't want to do this for the final SROA invocation (D136806).

In the end, this ends up having negative (!) compile-time cost:
https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u

Though indeed, this only deals with `select`s, `PHI`s are still using speculation.

Should we update some more analysis?

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138238

This reverts commit 739611870d3b06605afe25cc07833f6a62de9545,
and recommits 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5
with a fixed assertion - we should check that DTU is there,
not just assert false...
2022-12-08 20:19:55 +03:00
Roman Lebedev
739611870d
Revert "[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node"
The assertion about not modifying the CFG seems to not hold,
will recommit in a bit.

https://lab.llvm.org/buildbot#builders/139/builds/32412

This reverts commit 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5.
This reverts commit 4f90f4ada33718f9025d0870a4fe3fe88276b3da.
2022-12-08 19:51:15 +03:00
Roman Lebedev
03e6d9d9d1
[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node
Currently, SROA is CFG-preserving.
Not doing so does not affect any pipeline test. (???)
Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call.

By design, we can't really SROA alloca if their address escapes somehow,
but we have logic to deal with `load` of `select`/`PHI`,
where at least one of the possible addresses prevents promotion,
by speculating the `load`s and `select`ing between loaded values.

As one would expect, that requires ensuring that the speculation is actually legal.
Even ignoring complexity bailouts, that logic does not deal with everything,
e.g. `isSafeToLoadUnconditionally()` does not recurse into hands of `select`.
There can also be cases where the load is genuinely non-speculate.

So if we can't prove that the load can be speculated,
unfold the select, produce two-entry phi node, and perform predicated load.

Now, that transformation must obviously update Dominator Tree,
since we require it later on. Doing so is trivial.
Additionally, we don't want to do this for the final SROA invocation (D136806).

In the end, this ends up having negative (!) compile-time cost:
https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u

Though indeed, this only deals with `select`s, `PHI`s are still using speculation.

Should we update some more analysis?

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138238
2022-12-08 16:51:32 +03:00
Fangrui Song
4e62072ca1 [Passes] llvm::Optional => std::optional 2022-12-04 20:44:52 +00:00
Kazu Hirata
aadaaface2 [llvm] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:44 -08:00
Kazu Hirata
410c1f6269 [Passes] Use std::optional in PassBuilder.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 12:47:46 -08:00
Sami Tolvanen
cacd3e73d7 Add generic KCFI operand bundle lowering
The KCFI sanitizer emits "kcfi" operand bundles to indirect
call instructions, which the LLVM back-end lowers into an
architecture-specific type check with a known machine instruction
sequence. Currently, KCFI operand bundle lowering is supported only
on 64-bit X86 and AArch64 architectures.

As a lightweight forward-edge CFI implementation that doesn't
require LTO is also useful for non-Linux low-level targets on
other machine architectures, add a generic KCFI operand bundle
lowering pass that's only used when back-end lowering support is not
available and allows -fsanitize=kcfi to be enabled in Clang on all
architectures.

This relands commit eb2a57ebc7aaad551af30462097a9e06c96db925 with
fixes.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D135411
2022-11-22 23:01:18 +00:00
Alexander Shaposhnikov
7059a6c32c [IR] Split out IR printing passes into IRPrinter
This diff splits out (from LLVMCore) IR printing passes into IRPrinter.
This structure is similar to what we already have for IRReader and
enables us to avoid circular dependencies between LLVMCore and Analysis
(this is a preparation for https://reviews.llvm.org/D137768).
The legacy interface is left unchanged, once the legacy pass manager
is removed (in the future) we will be able to clean it up further.
The bazel build configuration has been updated as well.

Test plan:
1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc
2/ bazel build --config=generic_clang @llvm-project//...

Differential revision: https://reviews.llvm.org/D138081
2022-11-18 01:47:56 +00:00
Fangrui Song
fc91c70593 Revert D135411 "Add generic KCFI operand bundle lowering"
This reverts commit eb2a57ebc7aaad551af30462097a9e06c96db925.

llvm/include/llvm/Transforms/Instrumentation/KCFI.h including
llvm/CodeGen is a layering violation. We should use an approach where
Instrumementation/ doesn't need to include CodeGen/.
Sorry for not spotting this in the review.
2022-11-17 22:45:30 +00:00
Sami Tolvanen
eb2a57ebc7 Add generic KCFI operand bundle lowering
The KCFI sanitizer emits "kcfi" operand bundles to indirect
call instructions, which the LLVM back-end lowers into an
architecture-specific type check with a known machine instruction
sequence. Currently, KCFI operand bundle lowering is supported only
on 64-bit X86 and AArch64 architectures.

As a lightweight forward-edge CFI implementation that doesn't
require LTO is also useful for non-Linux low-level targets on
other machine architectures, add a generic KCFI operand bundle
lowering pass that's only used when back-end lowering support is not
available and allows -fsanitize=kcfi to be enabled in Clang on all
architectures.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D135411
2022-11-17 21:55:00 +00:00
OCHyams
913b561c0a [Assignment Tracking][6/*] Add trackAssignments function
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Add trackAssignments which adds assignment tracking metadata to a function for
a specified set of variables. The intended callers are the inliner and the
front end - those calls will be added in separate patches.

I've added a pass called declare-to-assign (AssignmentTrackingPass) that
converts dbg.declare intrinsics to dbg.assigns using trackAssignments so that
the function can be easily tested (see
llvm/test/DebugInfo/Generic/track-assignments.ll). The pass could also be used
by front ends to easily test out enabling assignment tracking.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D132225
2022-11-08 16:52:11 +00:00