6118 Commits

Author SHA1 Message Date
Nikita Popov
658b260dbf [Attributor] Don't construct pretty GEPs
Bring this in line with other transforms like ArgPromotion/SROA/
SCEVExpander and always produce canonical i8 GEPs.
2023-12-22 16:48:13 +01:00
Ivan R. Ivanov
39f09ec245
Invalidate analyses after running Attributor in OpenMPOpt (#74908)
Using the LoopInfo from OMPInfoCache after the Attributor ran resulted
in a crash due to it being in an invalid state.

---------

Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>
2023-12-20 15:01:21 -08:00
Paul Walker
dea16ebd26
[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217)
The specialisation will not be valid when ConstantInt gains native
support for vector types.

This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.

Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18 11:58:42 +00:00
Mircea Trofin
ed10fba1b2
[ThinLTO] Allow importing based on a workload definition (#74545)
An example of a "workload definition" would be "the transitive closure of functions actually called to satisfy a RPC request", i.e. a (typically significantly) smaller subset of the transitive closure (static + possible indirect call targets) of callees. This means this workload definition is a type of flat dynamic profile.

Producing one is not in scope - it can be produced offline from traces, or from sample-based profiles, etc.

This patch adds awareness to ThinLTO of such a concept. A workload is defined as a root and a list of functions. All function references are by-name (more readable than GUIDs). In the case of aliases, the expectation is the list contains all the alternative names.

The workload definitions are presented to the linker as a json file, containing a dictionary. The keys are the roots, the values are the list of functions.

The import list for a module defining a root will be the functions listed for it in the profile.

Using names this way assumes unique names for internal functions, i.e. clang's `-funique-internal-linkage-names`.

Note that the behavior affects the entire module where a root is defined (i.e. different workloads best be defined in different modules), and does not affect modules that don't define roots.
2023-12-14 15:10:48 -08:00
Nikita Popov
696cc20d4e [LVI] Make UndefAllowed argument of getConstantRange() required
For the two remaining uses that did not explicitly specify it,
set UndefAllowed=false. In both cases, I believe that treating
undef as a full range is the correct behavior.
2023-12-12 11:43:52 +01:00
Youngsuk Kim
9071a15d4b
[llvm][Attributor] Strip AddressSpaceCast from 'constructPointer' (#74742)
* Remove pointer AddressSpaceCast in function `constructPointer`
* Remove 1st parameter (`ResTy`) of function `constructPointer`

1st input argument to function `constructPointer` in all 4 call-sites is
`ptr addrspace(0)`. Function `constructPointer` performs a pointer
AddressSpaceCast to `ResTy`, making the returned pointer have type `ptr
addrspace(0)` in all 4 call-sites.

Unless there's a clear reason to discard the addrspace info of input
parameter `Ptr`, I think we should keep and forward that info to the
returned pointer of `constructPointer`.

Opaque ptr cleanup effort.
2023-12-11 09:29:01 -05:00
Oskar Wirga
81360ec582
[CFI] Fix Direct Call Issues in CFI Dispatch Table (#69663)
I discovered two issues for when a CFI dispatch table entry is used as a
direct call.
# Inlining
There is the possibility that the dispatch table entry contains only a
single function pointer:
```
; Function Attrs: naked nocf_check
define private void @.cfi.jumptable() #6 align 8 { entry:
  call void asm sideeffect "jmp ${0:c}@plt\0Aint3\0Aint3\0Aint3\0A", "s"(ptr @_Z7throw_ei)
  unreachable
}
```
If this function is inlined, the unreachable follows and ruins the
containing function.
# Exception Handling
The dispatch table is always marked NoUnwind. This is fine if the
entries are never used directly, but if a direct call is used which the
containing function expects to throw, it will no longer throw and the
exception handling code will be lost.
2023-12-06 12:56:59 -08:00
Teresa Johnson
88fbc4d3df
[ThinLTO] Add tail call flag to call edges in summary (#74043)
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.

The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.

I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
2023-12-06 08:41:44 -08:00
Kazu Hirata
92c2529ccd [llvm] Stop including vector (NFC)
Identified with clangd.
2023-12-03 22:32:21 -08:00
Paul Kirth
cfe1ece833
[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it make FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
2023-11-30 17:09:34 -08:00
Youngsuk Kim
c57ef2c698
[llvm][OpenMPOpt] Remove no-op ptr-to-ptr bitcast (NFC) (#73869)
* Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds
a no-op ptr-to-ptr bitcast.

* Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts
in relevant LIT tests
2023-11-29 20:47:37 -05:00
Davide Italiano
61ab43ae00
[IROutliner] Skip dbg values during the candidate search. (#72945)
dbg value don't really have a value number associated as they have no
semantic value associated, i.e. they don't change the code being
generated. Use the correct API to go over them.

Fixes https://github.com/llvm/llvm-project/issues/62876
2023-11-22 11:26:36 -08:00
Mats Petersson
2fb51fba8c
[FuncSpec] Update function specialization to handle phi-chains (#72903)
When using the LLVM flang compiler with alias analysis (AA) enabled,
SPEC2017:548.exchange2_r was running significantly slower than wihtout
the AA.

This was caused by the GVN pass replacing many of the loads in the
pre-AA code with phi-nodes that form a long chain of dependencies, which
the function specialization was unable to follow.

This adds a function to discover phi-nodes in a transitive set, with
some limitations to avoid spending ages analysing phi-nodes.

The minimum latency savings also had to be lowered - fewer load
instructions means less saving.

Adding some more prints to help debugging the isProfitable decision.

No significant change in compile time or generated code-size.

(A previous attempt to fix this was abandoned: https://github.com/llvm/llvm-project/pull/71442)

---------

Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2023-11-22 10:41:01 +00:00
Jeremy Morse
f42482def2
[DebugInfo][RemoveDIs] Don't convert debug-intrinsics to Unreachable (#72380)
It might seem obvious, but it's not a good idea to convert a
debug-intrinsic instruction into an UnreachableInst, as this means
things operate differently with and without the -g option. However this
can happen due to the "mutate the next instruction" API calls we make.
With RemoveDIs eliminating debug intrinsics, this behaviour is at risk
of changing, hence this patch ensures we only ever mutate the next _non_
debuginfo instruction into an Unreachable.

The tests instrumented with the --try... flag all exercise this, I've
added some metadata to a SCCP test to ensure it's exercised.
2023-11-20 20:53:24 +00:00
Davide Italiano
615ebfc3e5
[SampleProfileProbe] Downgrade probes too large from error to warning. (#72574) 2023-11-16 15:57:51 -08:00
Matthias Braun
cb4627d150
Add setBranchWeigths convenience function. NFC (#72446)
Add `setBranchWeights` convenience function to ProfDataUtils.h and use
it where appropriate.
2023-11-16 10:55:19 -08:00
Teresa Johnson
24a618f69e
[MemProf] Look through alias when applying cloning in ThinLTO backend (#72156)
Mirror the handling in ModuleSummaryAnalysis to look through aliases
when handling call instructions in the ThinLTO backend MemProf handling.

Fixes #72094
2023-11-15 13:14:19 -08:00
Kazu Hirata
d4360e428f [llvm] Stop including llvm/ADT/DenseMap.h (NFC)
Ientified with clangd.
2023-11-11 10:07:19 -08:00
Kazu Hirata
84a48ee9fb [llvm] Stop including llvm/ADT/SetVector.h (NFC)
Identified with clangd.
2023-11-10 23:50:23 -08:00
Vidhush Singhal
754b93e466
[Attributor] New attribute to identify what byte ranges are alive for an allocation (#66148)
Changes the size of allocations automatically.
For now, implements the case when a single range from start of the
allocation is alive and the allocation can be reduced.
2023-11-10 16:26:37 -08:00
William Junda Huang
683f2df6e5
[SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479)
Normally SampleContext does not allow using an empty StirngRef to
construct an object, this is to prevent bugs reading the profile.
However empty names may be emitted by a function which its name is
intentionally set to empty, or a bug in the remapper that returns an
empty string. Regardless, converting it to FunctionId first will prevent
the assert, and that assert check is unnecessary, which will be
addressed in another patch
2023-11-10 21:38:13 +00:00
Jeremy Morse
f1b0a54451 Reapply 7d77bbef4ad92, adding new debug-info classes
This reverts commit 957efa4ce4f0391147cec62746e997226ee2b836.

Original commit message below -- in this follow up, I've shifted
un-necessary inclusions of DebugProgramInstruction.h into being forward
declarations (fixes clang-compile time I hope), and a memory leak in the
DebugInfoTest.cpp IR unittests.

I also tracked a compile-time regression in D154080, more explanation
there, but the result of which is hiding some of the changes behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.

[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info

This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues is plumbed into the metadata hierachy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-08 16:42:35 +00:00
Nikita Popov
e360a16fee
[GlobalOpt] Cache whether CC is changeable (#71381)
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic
complexity if there are many cold calls to a function: We're going to
visit each call of the function, and then for each of them iterate all
the users of the function.

We've recently encountered a case where GlobalOpt spends more than an
hour in these hasAddressTaken() checks when full LTO is used.

Avoid this by moving the hasAddressTaken() check into hasChangeableCC()
and caching its result, so it is only computed once per function.
2023-11-07 10:36:45 +01:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Nikita Popov
c4c0ac10f1 [IPO] Remove unnecessary bitcasts (NFC) 2023-11-06 16:49:45 +01:00
Nikita Popov
16a595e398 [Attributor] Avoid use of ConstantExpr::getFPTrunc() (NFC)
Use the constant folding API instead. For simplificity I'm using
the DL-independent API here.
2023-11-06 15:27:01 +01:00
Dominik Adamski
2cce0f6c57
[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000)
Added support for LLVM IR code generation which is used for handling omp
target parallel code. The call for __kmpc_parallel_51 is generated and
the parallel region is outlined to separate function.

The proper setup of kmpc_target_init mode is not included in the commit.
It is assumed that the SPMD mode for target initialization is properly
set by other codegen functions.
2023-11-06 11:44:00 +01:00
Nikita Popov
a682a9cfd0 Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)"
This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f.

PR landed without approval, with severe quality issues.
2023-11-03 21:15:46 +01:00
Manman Ren
19b5495b65
Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)
See RFC for details:
https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778

We will need to refactor extension to FunctionComparator/FunctionHash to
StructuralHash. This patch adds a new pass which is ported from Swift,
and will need to discuss on how to migrate Swift’s pass over after we
land this in llvm.

Create this PR to get some early review on the patch.

---------

Co-authored-by: Manman Ren <mren@meta.com>
2023-11-03 11:13:58 -07:00
Johannes Doerfert
d3e7a48cbd [OpenMP][NFC] Remove a no-op function 2023-11-03 10:28:36 -07:00
Jeremy Morse
957efa4ce4 Revert "[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info"
And some intervening fixups. There are two remaining problems:
 * A memory leak via https://lab.llvm.org/buildbot/#/builders/236/builds/7120/steps/10/logs/stdio
 * A performance slowdown with -g where I'm not completely sure what the cause it

These might be fairly straightforwards to fix, but it's the end of the day
hear, so I figure I'll clear the buildbots til tomorrow.

This reverts commit 7d77bbef4ad9230f6f427649373fe46a668aa909.
This reverts commit 9026f35afe6ffdc5e55b6615efcbd36f25b11558.
This reverts commit d97b2b389a0e511c65af6845119eb08b8a2cb473.
2023-11-02 17:41:36 +00:00
Jeremy Morse
7d77bbef4a [DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues is plumbed into the metadata hierachy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-02 12:44:53 +00:00
Johannes Doerfert
a8152086ff [Attributor][FIX] Ensure new BBs are registered 2023-11-01 12:12:14 -07:00
Nikita Popov
6b8ed78719 [IR] Add writable attribute
This adds a writable attribute, which in conjunction with
dereferenceable(N) states that a spurious store of N bytes is
introduced on function entry. This implies that this many bytes
are writable without trapping or introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the second point is important.

This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).

In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.

Followup to the discussion on D157499.

Differential Revision: https://reviews.llvm.org/D158081
2023-11-01 10:46:31 +01:00
Davide Italiano
954af75ceb
[PGO] Skip optimizing probes that don't fit. (#70875)
The discriminator can only pack 16 bits, so anything exceeding that
value will cause the packing code to crash. Emit a diagnostic and skip
the optimization instead.
2023-10-31 21:30:36 -07:00
Joseph Huber
e8c0ae60d7
[OpenMP] Add optimization to remove the RPC client (#70683)
Summary:
Part of the work done in the `libc` project is to provide host services
for things like `printf` or `malloc`, or generally any syscall-like
behaviour. This scheme works by emitting an externally visible global
called `__llvm_libc_rpc_client` that the host runtime can pick up to get
a handle to the global memory associated with the client. We use the
presence of this symbol to indicate whether or not we need to run an RPC
server. Normally, this symbol is only present if something requiring an
RPC server was linked in, such as `printf`. However, if this call to
`printf` was subsequently optimizated out, the symbol would remain and
cannot be removed (rightfully so) because of its linkage. This patch
adds a special-case optimization to remove this symbol so we can
indicate that an RPC server is no longer needed.

This patch puts this logic in `OpenMPOpt` as the most readily available
place for it. In the future, we should think how to move this somewhere
more generic. Furthermore, we use a hard-coded runtime name (which isn't
uncommon given all the other magic symbol names). But it might be nice
to abstract that part away.
2023-10-31 17:23:24 -05:00
Sander de Smalen
00a831421f
[AArch64][SME] Extend Inliner cost-model with custom penalty for calls. (#68416)
This is a stacked PR following on from #68415 

This patch has two purposes:
(1) It tries to make inlining more likely when it can avoid a
streaming-mode change.
(2) It avoids inlining when inlining causes more streaming-mode changes.

An example of (1) is:
```
  void streaming_compatible_bar(void);

  void foo(void) __arm_streaming {
    /* other code */
    streaming_compatible_bar();
    /* other code */
  }

  void f(void) {
    foo();            // expensive streaming mode change
  }

  ->

  void f(void) {
    /* other code */
    streaming_compatible_bar();
    /* other code */
  }
```
where it wouldn't have inlined the function when foo would be a
non-streaming function.

An example of (2) is:
```
  void streaming_bar(void) __arm_streaming;

  void foo(void) __arm_streaming {
    streaming_bar();
    streaming_bar();
  }

  void f(void) {
    foo();            // expensive streaming mode change
  }

  -> (do not inline into)

  void f(void) {
    streaming_bar();  // these are now two expensive streaming mode changes
    streaming_bar();
  }```
2023-10-31 10:28:40 +00:00
Johannes Doerfert
31b91213bd [OpenMP] Unify the min/max thread/teams pathways
We used to pass the min/max threads/teams values through different paths
from the frontend to the middle end. This simplifies the situation by
passing the values once, only when we will create the KernelEnvironment,
which contains the values. At that point we also manifest the metadata,
as appropriate. Some footguns have also been removed, e.g., our target
check is now triple-based, not calling convention-based, as the latter
is dependent on the ordering of operations. The types of the values have
been unified to int32_t.
2023-10-29 10:53:20 -07:00
Mehdi Amini
f390a76b7e Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)""
This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab.

Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.
2023-10-26 17:30:01 -07:00
Mehdi Amini
ddbaa11e9f Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)"
This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec.

The MLIR bots are broken with an omp test failure.
2023-10-26 17:25:20 -07:00
Johannes Doerfert
c2a1249a82
[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle- or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
2023-10-26 14:46:55 -07:00
Harvin Iriawan
211dc4ad40
[Analysis] Add Scalable field in MemoryLocation.h (#69716)
This is the first of a series of patch to improve Alias Analysis on
  Scalable quantities.
  Keep Scalable information from TypeSize which
  will be used in Alias Analysis.
2023-10-24 18:18:51 +01:00
Krzysztof Drewniak
93f8e52d1f
[FunctionAttrs] Improve handling of alias-preserving intrinsic calls (#68453)
Fixes #68270

The function attribute analysis handles many instructions, like
addrspacecast, which do not themselves read or write memory but which
transform pointers into other values in the same alias set.

There are intrinsic functions, such as ptrmask or the AMDGPU-specific
make.buffer.rsrc, which also preserve membership in alias sets without
capturing. This commit adds the addrspacecast-like behavior to these
calls.
2023-10-24 11:16:54 -05:00
Carlos Alberto Enciso
f3b20cb16a
[IPSCCP] Variable not visible at Og. (#66745)
https://bugs.llvm.org/show_bug.cgi?id=51559
https://github.com/llvm/llvm-project/issues/50901

IPSCCP pass removes the global variable and does not create a constant
expression for the initializer value.
2023-10-24 06:22:18 +01:00
Igor Kudrin
42a3a3b3f0
[ThinLTOBitcodeWriter] Do not crash on a typed declaration (#69564)
This fixes a crash when `splitAndWriteThinLTOBitcode()` hits a
declaration with type metadata. For example, such declarations can be
generated by the `EliminateAvailableExternally` pass.
2023-10-24 04:29:53 +07:00
Kazu Hirata
a5dca533bd Use llvm::count (NFC) 2023-10-22 21:18:23 -07:00
Kazu Hirata
9c5a5a421d [llvm] Stop including llvm/ADT/iterator_range.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 15:41:18 -07:00
Kazu Hirata
935d8e12e0 [llvm] Stop including llvm/ADT/StringMap.h (NFC)
Identified misc-include-cleaner.
2023-10-22 11:53:56 -07:00
Johannes Doerfert
ba87fba806 [Attributor] Ignore different kernels for kernel lifetime objects
If a potential interfering access is in a different kernel and the
underlying object has kernel lifetime we can straight out ignore the
interfering access.
TODO: This should be made much stronger via "reaching kernels", which we
already track in AAKernelInfo.
2023-10-21 12:31:06 -07:00
Johannes Doerfert
499fb1b8d8 [Attributor][FIX] Interposable constants cannot be propagated 2023-10-20 19:28:09 -07:00