1017 Commits

Author SHA1 Message Date
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Tom Tromey
e298fc2da9
Add DISubrangeType (#126772)
An Ada program can have types that are subranges of other types. This
patch adds a new DIType node, DISubrangeType, to represent this concept.
    
I considered extending the existing DISubrange to do this, but as
DISubrange does not derive from DIType, that approach seemed more
disruptive.
    
A DISubrangeType can be used both as an ordinary type, but also as the
type of an array index. This is also important for Ada.

Ada subrange types can also be stored using a bias. Representing this in
the DWARF required the use of an extension. GCC has been emitting this
extension for years, so I've reused it here.
2025-02-24 10:11:53 -08:00
Antonio Frighetto
ff585feacf [IR][ModRef] Introduce errno memory location
Model C/C++ `errno` macro by adding a corresponding `errno`
memory location kind to the IR. Preliminary work to separate
`errno` writes from other memory accesses, to the benefit of
alias analyses and optimization correctness.

Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
2025-02-13 12:13:39 +01:00
Michael Buch
eb8901bda1
[llvm][DebugInfo] Add new DW_AT_APPLE_enum_kind to encode enum_extensibility (#124752)
When creating `EnumDecl`s from DWARF for Objective-C `NS_ENUM`s, the
Swift compiler tries to figure out if it should perform "swiftification"
of that enum (which involves renaming the enumerator cases, etc.). The
heuristics by which it determines whether we want to swiftify an enum is
by checking the `enum_extensibility` attribute (because that's what
`NS_ENUM` pretty much are). Currently LLDB fails to attach the
`EnumExtensibilityAttr` to `EnumDecl`s it creates (because there's not
enough info in DWARF to derive it), which means we have to fall back to
re-building Swift modules on-the-fly, slowing down expression evaluation
substantially. This happens around
4b3931c8ce/lib/ClangImporter/ImportEnumInfo.cpp (L37-L59)

To speed up Swift exression evaluation, this patch proposes encoding the
C/C++/Objective-C `enum_extensibility` attribute in DWARF via a new
`DW_AT_APPLE_ENUM_KIND`. This would currently be only used from the LLDB
Swift plugin. But may be of interest to other language plugins as well
(though I haven't come up with a concrete use-case for it outside of
Swift).

I'm open to naming suggestions of the various new attributes/attribute
constants proposed here. I tried to be as generic as possible if we
wanted to extend it to other kinds of enum properties (e.g., flag
enums).

The new attribute would look as follows:
```
DW_TAG_enumeration_type
  DW_AT_type      (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind   (DW_APPLE_ENUM_KIND_Closed)
  DW_AT_name      ("ClosedEnum")
  DW_AT_byte_size (0x04)
  DW_AT_decl_file ("enum.c")
  DW_AT_decl_line (23)

DW_TAG_enumeration_type
  DW_AT_type      (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind   (DW_APPLE_ENUM_KIND_Open)
  DW_AT_name      ("OpenEnum")
  DW_AT_byte_size (0x04)
  DW_AT_decl_file ("enum.c")
  DW_AT_decl_line (27)
```
Absence of the attribute means the extensibility of the enum is unknown
and abides by whatever the language rules of that CU dictate.

This does feel like a big hammer for quite a specific use-case, so I'm
happy to discuss alternatives.

Alternatives considered:
* Re-using an existing DWARF attribute to express extensibility. E.g., a
`DW_TAG_enumeration_type` could have a `DW_AT_count` or
`DW_AT_upper_bound` indicating the number of enumerators, which could
imply closed-ness. I felt like a dedicated attribute (which could be
generalized further) seemed more applicable. But I'm open to re-using
existing attributes.
* Encoding the entire attribute string (i.e., `DW_TAG_LLVM_annotation
("enum_extensibility((open))")`) on the `DW_TAG_enumeration_type`. Then
in LLDB somehow parse that out into a `EnumExtensibilityAttr`. I haven't
found a great API in Clang to parse arbitrary strings into AST nodes
(the ones I've found required fully formed C++ constructs). Though if
someone knows of a good way to do this, happy to consider that too.
2025-02-06 08:58:35 +00:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Teresa Johnson
6a214ec1ee
[MemProf] Fix an assertion when writing distributed index for aliasee (#122946)
The ThinLTO index bitcode writer uses a helper forEachSummary to manage
preparation and writing of summaries needed for each distributed index
file. For alias summaries, it invokes the provided callback for the
aliasee as well, as we at least need to produce a value id for the
alias's summary. However, all summary generation for the aliasee itself
should be skipped on calls when IsAliasee is true. We invoke the
callback again if that value's summary is to be written as well.

We were asserting in debug mode when invoking collectMemProfCallStacks,
because a given stack id index was not in the StackIdIndicesToIndex
map. It was not added because the forEachSummary invocation that records
these ids in the map (invoked from the IndexBitcodeWriter constructor)
was correctly skipping this handling when invoked for aliasees. We need
the same guard in the invocation that calls collectMemProfCallStacks.

Note that this doesn't cause any real problems in a non-asserts build
as the missing map lookup will return the default 0 value from the map,
which isn't used since we don't actually write the corresponding
summary.
2025-01-14 13:46:34 -08:00
Nikita Popov
22e9024c9f
[IR] Introduce captures attribute (#116990)
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420

This initial patch only introduces the IR/bitcode support for the
attribute and its in-memory representation as `CaptureInfo`. This will
be followed by a patch to upgrade and remove the `nocapture` attribute,
and then by actual inference/analysis support.

Based on the RFC feedback, I've used a syntax similar to the `memory`
attribute, though the only "location" that can be specified is `ret`.

I've added some pretty extensive documentation to LangRef on the
semantics. One non-obvious bit here is that using ptrtoint will not
result in a "return-only" capture, even if the ptrtoint result is only
used in the return value. Without this requirement we wouldn't be able
to continue ordinary capture analysis on the return value.
2025-01-13 14:40:25 +01:00
Florian Hahn
a487b792e2
[TySan] Add initial Type Sanitizer (LLVM) (#76259)
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.

When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)

PR: https://github.com/llvm/llvm-project/pull/76259
2024-12-17 13:57:34 +00:00
Mingming Liu
6faf17b762
[ThinLTO]Supports declaration import for global variables in distributed ThinLTO (#117616)
When `-import-declaration` option is enabled, declaration import is
supported for functions. https://github.com/llvm/llvm-project/pull/88024
has the context for this option.

This patch supports declaration import for global variables in
distributed ThinLTO. The motivating use case is to propagate `dso_local`
attribute of global variables across modules, to optimize global
variable access when a binary is built with
`-fno-direct-access-external-data`.
* With `-fdirect-access-external-data`, non thread-local global
variables will [have `dso_local`
attributes](fe3c23b439/clang/lib/CodeGen/CodeGenModule.cpp (L1730-L1746)).
This optimizes the global variable access as shown by
https://gcc.godbolt.org/z/vMzWcKdh3
2024-12-02 16:15:52 -08:00
Kazu Hirata
ff7b42c194
[memprof] Speed up llvm-profdata (#117446)
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:

  std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>>
    MemProfFrameIndexes

Then "build" makes another copy of MemProfFrameIndexe and passes it to
encodeCallStack for every call stack, which is painfully slow.

This patch changes the type to a pointer so that we don't have to make
a copy every time we pass the argument.

Without this patch, it takes 553 seconds to run "llvm-profdata merge"
on a large MemProf raw profile.  This patch shortenes that down to 67
seconds.
2024-11-24 21:08:54 -08:00
Teresa Johnson
776476c282
Reapply "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395) (#117404)
This reverts commit fdb050a5024320ec29d2edf3f2bc686c3a84abaa, and
restores ccb4702038900d82d1041ff610788740f5cef723, with a fix for build
bot failures.

Specifically, add ProfileData to the dependences of the BitWriter
library, which was causing shared library builds of LLVM to fail.
Reproduced the failure with a shared library build and confirmed this
change fixes that build failure.
2024-11-22 16:18:30 -08:00
Teresa Johnson
fdb050a502
Revert "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395)
Reverts llvm/llvm-project#117066

This is causing some build bot failures that need investigation.
2024-11-22 14:57:58 -08:00
Teresa Johnson
ccb4702038
[MemProf] Use radix tree for alloc contexts in bitcode summaries (#117066)
Leverage the support added to represent allocation contexts in a more
compact way via a radix tree in the indexed profile to similarly reduce
sizes of the bitcode summaries.

For a large target, this reduced the size of the per-module summaries by
about 18% and in the distributed combined index files by 28%.
2024-11-22 14:49:55 -08:00
Mingming Liu
97b2903455
[NFCI][WPD]Use unique string saver to store type id (#106932)
Currently, both
[TypeIdMap](67a1fdb014/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1356))
and
[TypeIdCompatibleVtableMap](67a1fdb014/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1363))
keep type-id as `std::string` in the combined index for LTO indexing
analysis.

With this change, index uses a unique-string-saver to own the string
copies and two maps above can use string references to save some memory.

This shows a 3% memory reduction (from 8.2GiB to 7.9GiB) in an internal
binary with high indexing memory usage.
2024-11-20 23:44:18 -08:00
Alex Voicu
53a6a11e0d
[LLVM][NFC] Use used's element type if available (#116804)
When embedding, if `compiler.used` exists, we should re-use it's element
type instead of blindly assuming it's an unqualified pointer.
2024-11-20 23:57:55 +00:00
Teresa Johnson
b35f40688e
[MemProf] Change the STACK_ID record to fixed width values (#116448)
The stack ids are hashes that are close to 64 bits in size, so emitting
as a pair of 32-bit fixed-width values is more efficient than a VBR.
This reduced the summary bitcode size for a large target by about 1%.

Bump the index version and ensure we can read the old format.
2024-11-18 15:16:48 -08:00
Teresa Johnson
9513f2fdf2
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.

Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.

Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use this information, which reduces unnecessary bloat in
distributed index files.
2024-11-15 08:24:44 -08:00
Augusto Noronha
67fb2686fb
[DebugInfo] Add a specification attribute to LLVM DebugInfo (#115362)
Add a specification attribute to LLVM DebugInfo, which is analogous
to DWARF's DW_AT_specification. According to the DWARF spec:
"A debugging information entry that represents a declaration that
completes another (earlier) non-defining declaration may have a
DW_AT_specification attribute whose value is a reference to the
debugging information entry representing the non-defining declaration."

This patch allows types to be specifications of other types. This is
used by Swift to represent generic types. For example, given this Swift
program:

```
struct MyStruct<T> {
    let t: T
}

let variable = MyStruct<Int>(t: 43)
```

The Swift compiler emits (roughly) an unsubtituted type for MyStruct<T>:
```
DW_TAG_structure_type
    DW_AT_name	("MyStruct")
    // "$s1w8MyStructVyxGD" is a Swift mangled name roughly equivalent to 
    // MyStruct<T>
    DW_AT_linkage_name	("$s1w8MyStructVyxGD")
    // other attributes here
```
And a specification for MyStruct<Int>:
```
DW_TAG_structure_type
    DW_AT_specification	(<link to "MyStruct">)
    // "$s1w8MyStructVySiGD" is a Swift mangled name equivalent to
    // MyStruct<Int>
    DW_AT_linkage_name	("$s1w8MyStructVySiGD")
    DW_AT_byte_size	(0x08)
    // other attributes here
```
2024-11-13 09:55:37 -08:00
Augusto Noronha
f6617d65e4
[DebugInfo] Add num_extra_inhabitants to debug info (#112590)
An extra inhabitant is a bit pattern that does not represent a valid
value for instances of a given type. The number of extra inhabitants is
the number of those bit configurations.

This is used by Swift to save space when composing types. For example,
because Bool only needs 2 bit patterns to represent all of its values
(true and false), an Optional<Bool> only occupies 1 byte in memory by
using a bit configuration that is unused by Bool. Which bit patterns are
unused are part of the ABI of the language.

Since Swift generics are not monomorphized, by using dynamic libraries
you can have generic types whose size, alignment, etc, are known only
at runtime (which is why this feature is needed).

This patch adds num_extra_inhabitants to LLVM-IR debug info and in DWARF
as an Apple extension.
2024-11-06 15:48:04 -08:00
davidtrevelyan
4102625380
[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155)
# What

This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.


# Why?

- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.

We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.

# Concerns

- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
2024-10-26 13:06:11 +01:00
elhewaty
9efb07f261
[IR] Add samesign flag to icmp instruction (#111419)
Inspired by
https://discourse.llvm.org/t/rfc-signedness-independent-icmps/81423
2024-10-15 17:11:25 +08:00
Tim Renouf
76007138f4
[LLVM] New NoDivergenceSource function attribute (#111832)
A call to a function that has this attribute is not a source of
divergence, as used by UniformityAnalysis. That allows a front-end to
use known-name calls as an instruction extension mechanism (e.g.
https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call
being a source of divergence.
2024-10-12 09:34:45 +01:00
Serge Pavlov
15de239406
[IR] Allow MDString in operand bundles (#110805)
This change implements support of metadata strings in operand bundle
values. It makes possible calls like:

    call void @some_func(i32 %x) [ "foo"(i32 42, metadata !"abc") ]

It requires some extension of the bitcode serialization. As SSA values
and metadata are stored in different tables, there must be a way to
distinguish them during deserialization. It is implemented by putting a
special marker before the metadata index. The marker cannot be treated
as a reference to any SSA value, so it unambiguously identifies
metadata. It allows extending the bitcode serialization without breaking
compatibility.

Metadata as operand bundle values are intended to be used in
floating-point function calls. They would represent the same information
as now is passed by the constrained intrinsic arguments.
2024-10-11 12:09:10 +07:00
davidtrevelyan
0f488a0b7d
[LLVM][rtsan] Add sanitize_realtime_unsafe attribute (#106754) 2024-09-19 16:45:25 -06:00
Jonas Paulsson
14120227a3
Target ABI: improve call parameters extensions handling (#100757)
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.

As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.

Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
2024-09-19 16:59:31 +02:00
Yuxuan Chen
e17a39bc31
[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
  co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.

`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.

The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.

When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.

From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.

The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
2024-09-08 23:08:58 -07:00
Mingming Liu
d4ddf06b0c
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).

While I'm at it, this PR clean up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest on them.

With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
2024-09-06 16:38:17 -07:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
Chris Apple
fef3426ad3
Revert "[LLVM][rtsan] Add LLVM nosanitize_realtime attribute (#105447)" (#106743)
This reverts commit 178fc4779ece31392a2cd01472b0279e50b3a199.

This attribute was not needed now that we are using the lsan style
ScopedDisabler for disabling this sanitizer

See #106736 
#106125 

For more discussion
2024-08-30 07:48:31 -07:00
Jan Voung
fa4fbaefde
Reapply: Use an abbrev to reduce size of VALUE_GUID records in ThinLTO summaries (#106165)
This retries #90692 which was reverted previously due to issues with
lld-available being set, even if the copy of lld is not built from
source.

This does not change any code compared to #90692 to address the
lld-available issue.
The main change w.r.t, lld-available is xfailing tests in PR #99056
(until a longer term fix is available).
2024-08-27 13:53:25 -04:00
Chris Apple
178fc4779e
[LLVM][rtsan] Add LLVM nosanitize_realtime attribute (#105447) 2024-08-26 12:49:27 -07:00
Kazu Hirata
dbd7ce0ccd
[IR] Inroduce ModuleToSummariesForIndexTy (NFC) (#105906)
This patch introduces type alias ModuleToSummariesForIndexTy.

I'm planning to change the type slightly to allow heterogeneous lookup
(that is, std::map<K, V, std::less<>>) in a subsequent patch.  The
problem is that changing the type affects many places.  Using a type
alias reduces the impact.
2024-08-23 17:32:52 -07:00
Kazu Hirata
ca53611c90
[llvm] Use range-based for loops (NFC) (#105861) 2024-08-23 16:56:27 -07:00
Kazu Hirata
3b703d479f
[Bitcode] Use DenseSet instead of std::set (NFC) (#105851)
DefOrUseGUIDs is used only for membership checking purposes.  We don't
need std::set's strengths like iterators staying valid or the ability
to traverse in a sorted order.

While I am at it, this patch replaces count with contains for slightly
increased readability.
2024-08-23 14:19:48 -07:00
Kazu Hirata
5f01fda4ac
[Bitcode] Use range-based for loops (NFC) (#104534) 2024-08-15 20:17:40 -07:00
Kazu Hirata
74c5fa12a8
[Bitcode] Use range-based for loops (NFC) (#103628) 2024-08-13 23:31:40 -07:00
Chris Apple
b143b2483f
[LLVM][rtsan] Add sanitize_realtime attribute for the realtime sanitizer (#100596)
Add a new "sanitize_realtime" attribute, which will correspond to the
nonblocking function effect in clang. This is used in the realtime
sanitizer transform.

Please see the [reviewer support
document](https://github.com/realtime-sanitizer/radsan/blob/doc/review-support/doc/review.md)
for what our next steps are. The original discourse thread can be found
[here](https://discourse.llvm.org/t/rfc-nolock-and-noalloc-attributes/76837)
2024-08-08 15:41:06 +02:00
James Y Knight
dfeb3991fb
Remove the x86_mmx IR type. (#98505)
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.

This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.

This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.

Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)

Works towards issue #98272.
2024-07-25 09:19:22 -04:00
Jacek Caban
6cc8774228
[CodeGen][ARM64EC] Add support for hybrid_patchable attribute. (#92965) 2024-07-19 11:43:25 +02:00
Teresa Johnson
9f8205d9d8
[MemProf] Track and report profiled sizes through cloning (#98382)
If requested, via the -memprof-report-hinted-sizes option, track the
total profiled size of each MIB through the thin link, then report on
the corresponding allocation coldness after all cloning is complete.

To save size, a different bitcode record type is used for the allocation
info when the option is specified, and the sizes are kept separate from
the MIBs in the index.
2024-07-11 16:10:30 -07:00
Mingming Liu
50fea9943f
Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253)
https://github.com/llvm/llvm-project/pull/87600 was reverted in order to
revert
6262763341.
Now https://github.com/llvm/llvm-project/pull/95482 is fix forward for
6262763341.
This patch is a reland for
https://github.com/llvm/llvm-project/pull/87600

**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of
`GVSummaryPtrSet` an `unordered_set` which is more memory efficient when
the number of elements is smaller than 128 [1]

**Original commit message**

For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.

[1]
393eff4e02/llvm/lib/Support/SmallPtrSet.cpp (L43)
2024-07-08 22:20:33 -07:00
Alex Voicu
9acb533c38
[clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (#95061)
This patch augments the HIPAMD driver to allow it to target AMDGCN
flavoured SPIR-V compilation. It's mostly straightforward, as we re-use
some of the existing SPIRV infra, however there are a few notable
additions:

- we introduce an `amdgcnspirv` offload arch, rather than relying on
using `generic` (this is already fairly overloaded) or simply using
`spirv` or `spirv64` (we'll want to use these to denote unflavoured
SPIRV, once we bring up that capability)
- initially it is won't be possible to mix-in SPIR-V and concrete AMDGPU
targets, as it would require some relatively intrusive surgery in the
HIPAMD Toolchain and the Driver to deal with two triples
(`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user provided compiler flags and have them
available at JIT time, we rely on embedding the command line via
`-fembed-bitcode=marker`, which the bitcode writer had previously not
implemented for SPIRV; we only allow it conditionally for AMDGCN
flavoured SPIRV, and it is handled correctly by the Translator (it ends
up as a string literal)

Once the SPIRV BE is no longer experimental we'll switch to using that
rather than the translator. There's some additional work that'll come
via a separate PR around correctly piping through AMDGCN's
implementation of `printf`, for now we merely handle its flags
correctly.
2024-06-25 12:19:28 +01:00
Haopeng Liu
5ece35df85
Add the 'initializes' attribute langref and support (#84803)
We propose adding a new LLVM attribute,
`initializes((Lo1,Hi1),(Lo2,Hi2),...)`, which expresses the notion of
memory space (i.e., intervals, in bytes) that the argument pointing to
is initialized in the function.

Will commit the attribute inferring in the follow-up PRs.


https://discourse.llvm.org/t/rfc-llvm-new-initialized-parameter-attribute-for-improved-interprocedural-dse/77337
2024-06-21 12:09:00 -07:00
eddyz87
01ce74fe14
Revert "[DebugInfo][BPF] Add 'annotations' field for DIBasicType & DI… (#96172)
…SubroutineType (#91422)"

This reverts commit 3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c.

As reported in [1,2] the commit above causes CI failure for powerpc-aix
target.
There is also a performance regression reported in [3]. Reverting to
comply with the developer policy.

[1]
https://github.com/llvm/llvm-project/pull/91422#issuecomment-2179425473
[2] https://lab.llvm.org/buildbot/#/builders/64/builds/62
[3]
https://github.com/llvm/llvm-project/pull/91422#issuecomment-2175631443
2024-06-20 21:28:02 +03:00
eddyz87
3ca17443ef
[DebugInfo][BPF] Add 'annotations' field for DIBasicType & DISubroutineType (#91422)
Extend `DIBasicType` and `DISubroutineType` with additional field
`annotations`, e.g. as below:

```
  !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed, annotations: !6)
  !6 = !{!7}
  !7 = !{!"btf:type_tag", !"tag1"}
```

The field would be used by BPF backend to generate DWARF attributes
corresponding to `btf_type_tag` type attributes, e.g.:

```
  0x00000029:   DW_TAG_base_type
                  DW_AT_name	("int")
                  DW_AT_encoding	(DW_ATE_signed)
                  DW_AT_byte_size	(0x04)

  0x0000002d:     DW_TAG_LLVM_annotation
                    DW_AT_name	("btf:type_tag")
                    DW_AT_const_value	("tag1")
```

Such DWARF entries would be used to generate BTF definitions by tools
like [pahole](https://github.com/acmel/dwarves).

Note: similar fields with similar purposes are already present in
DIDerivedType and DICompositeType.

Currently "btf_type_tag" attributes are represented in debug information
as 'annotations' fields in DIDerivedType with DW_TAG_pointer_type tag.
The annotation on a pointer corresponds to pointee having the attributes
in the final BTF.

The discussion in
[thread](https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/) came to
conclusion, that such annotations should apply to the annotated type
itself. Hence the necessity to extend `DIBasicType` & `DISubroutineType`
types with 'annotations' field to represent cases like below:

```
  int __attribute__((btf_type_tag("foo"))) bar;
```

This was previously tracked as differential revision:
https://reviews.llvm.org/D143966
2024-06-18 10:23:25 +03:00
Alexander Shaposhnikov
c4f8ae6f32
[LLVM][IR][Sanitizers] Add sanitize_numerical_stability attribute (#95051)
Add sanitize_numerical_stability attribute.
2024-06-10 17:53:22 -07:00
Mingming Liu
53061eecdb
Revert "[ThinLTO][Bitcode] Generate import type in bitcode (#87600)" (#94502)
This reverts commit 6262763341fcd71a2b0708cf7485f9abd1d26ba8, to prepare
for the revert of https://github.com/llvm/llvm-project/pull/92718.


https://github.com/llvm/llvm-project/pull/92718 causes LTO indexing OOM
in some applications.
2024-06-05 09:59:46 -07:00
Nikita Popov
deab451e7a
[IR] Remove support for icmp and fcmp constant expressions (#93038)
Remove support for the icmp and fcmp constant expressions.

This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179

As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
2024-06-04 08:31:03 +02:00
Teresa Johnson
f4d705871a
[MemProf] Determine stack id references in BitcodeWriter without sorting (#94285)
A cycle profile of a thin link showed a lot of time spent in sort called
from the BitcodeWriter, which was being used to compute the unique
references to stack ids in the summaries emitted for each backend in a
distributed thinlto build. We were also frequently invoking lower_bound
to locate stack id indices in the resulting vector when writing out the
referencing memprof records.

Change this to use a map to uniquify the references, and to hold the
index of the corresponding stack id in the StackIds vector, which is
now populated at the same time.

This reduced the time of a large thin link by about 10%.
2024-06-03 20:32:20 -07:00
Mircea Trofin
c49bc1a3b7 BitcodeWriter: ensure Buffer is heap allocated
PR #92983 accidentally changed the buffer allocation in
`llvm::WriteBitcodeToFile` to be allocated on the stack, which is
problematic given it's a large-ish buffer (256K)
2024-06-03 12:48:23 -07:00