This patch changes the `DIArgList` class's inheritance from `MDNode` to
`Metadata, ReplaceableMetadataImpl`, and ensures that it is always
unique, i.e. a distinct DIArgList should never be produced.
This should not result in any changes to IR or bitcode parsing and
printing, as the format for DIArgList is unchanged, and the order in which it
appears should also be identical. As a minor note, this patch also fixes
a gap in the verifier, where the ValueAsMetadata operands to a DIArgList
would not be visited.
Close https://github.com/llvm/llvm-project/issues/56980.
This patch tries to introduce a light-weight optimization attribute for
coroutines which are guaranteed to only be destroyed after it reached
the final suspend.
The rationale behind the patch is simple. See the example:
```C++
A foo() {
dtor d;
co_await something();
dtor d1;
co_await something();
dtor d2;
co_return 43;
}
```
Generally the generated .destroy function may be:
```C++
void foo.destroy(foo.Frame *frame) {
switch(frame->suspend_index()) {
case 1:
frame->d.~dtor();
break;
case 2:
frame->d.~dtor();
frame->d1.~dtor();
break;
case 3:
frame->d.~dtor();
frame->d1.~dtor();
frame->d2.~dtor();
break;
default: // coroutine completed or haven't started
break;
}
frame->promise.~promise_type();
delete frame;
}
```
Since the compiler need to be ready for all the cases that the coroutine
may be destroyed in a valid state.
However, from the user's perspective, we can understand that certain
coroutine types may only be destroyed after it reached to the final
suspend point. And we need a method to teach the compiler about this.
Then this is the patch. After the compiler recognized that the
coroutines can only be destroyed after complete, it can optimize the
above example to:
```C++
void foo.destroy(foo.Frame *frame) {
frame->promise.~promise_type();
delete frame;
}
```
I spent a lot of time experimenting and experiencing this in the
downstream. The numbers are really good. In a real-world coroutine-heavy
workload, the size of the build dir (including .o files) reduces 14%.
And the size of final libraries (excluding the .o files) reduces 8% in
Debug mode and 1% in Release mode.
This adds a writable attribute, which in conjunction with
dereferenceable(N) states that a spurious store of N bytes is
introduced on function entry. This implies that this many bytes
are writable without trapping or introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the second point is important.
This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).
In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.
Followup to the discussion on D157499.
Differential Revision: https://reviews.llvm.org/D158081
Add an nneg flag to the zext instruction, which specifies that the
argument is non-negative. Otherwise, the result is a poison value.
The primary use-case for the flag is to preserve information when sext
gets replaced with zext due to range-based canonicalization. The nneg
flag allows us to convert the zext back into an sext later. This is
useful for some optimizations (e.g. a signed icmp can fold with sext but
not zext), as well as some targets (e.g. RISCV prefers sext over zext).
Discourse thread: https://discourse.llvm.org/t/rfc-add-zext-nneg-flag/73914
This patch is based on https://reviews.llvm.org/D156444 by
@Panagiotis156, with some implementation simplifications and additional
tests.
---------
Co-authored-by: Panagiotis K <karouzakispan@gmail.com>
This patch adds a new fn attribute, `optdebug`, that specifies that
optimizations should make decisions that prioritize debug info quality,
potentially at the cost of runtime performance.
This patch does not add any functional changes triggered by this
attribute, only the attribute itself. A subsequent patch will use this
flag to disable the post-RA scheduler.
* Remove if its sole use is to support an unnecessary ptr-to-ptr bitcast
(remove the bitcast as well)
* Replace with use of other APIs.
NFC opaque pointer cleanup effort.
The module paths string table mapped to both an id sequentially assigned
during LTO linking, and the module hash. The former is leftover from
before the module hash was added for caching and subsequently replaced
use of the module id when renaming promoted symbols (to avoid affects
due to link order changes). The sequentially assigned module id was not
removed, however, as it was still a convenience when serializing to/from
bitcode and assembly.
This patch removes the module id from this table, since it isn't
strictly needed and can lead to confusion on when it is appropriate to
use (e.g. see fix in D156525). It also takes a (likely not significant)
amount of overhead. Where an integer module id is needed (e.g. bitcode
writing), one is assigned on the fly.
There are a couple of test changes since the paths are now sorted
alphanumerically when assigning ids on the fly during assembly writing,
in order to ensure deterministic behavior.
Differential Revision: https://reviews.llvm.org/D156730
Here's a high level summary of the changes in this patch. For more
information on rational, see the RFC.
(https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774).
- Add config parameter to LTO backend, specifying which LTO mode is
desired when using unified LTO.
- Add unified LTO flag to the summary index for efficiency. Unified
LTO modules can be detected without parsing the module.
- Make sure that the ModuleID is generated by incorporating more types
of symbols.
Differential Revision: https://reviews.llvm.org/D123803
As pointed out in
https://discourse.llvm.org/t/undeterministic-thin-index-file/69985, the
block count added to distributed ThinLTO index files breaks incremental
builds on ThinLTO - if any linked file has a different number of BBs,
then the accumulated sum placed in the index files will change, causing
all ThinLTO backend compiles to be redone.
The block count is only used for scaling of partial sample profiles, and
was added in D80403 for D79831.
This patch simply removes this field from the index files of non partial
sample profile compiles, which is NFC on the output of the compiler.
We subsequently need to see if this can be removed for partial sample
profiles without signficant performance loss, or redesigned in a way
that does not destroy caching.
Differential Revision: https://reviews.llvm.org/D148746
This carries a bitmask indicating forbidden floating-point value kinds
in the argument or return value. This will enable interprocedural
-ffinite-math-only optimizations. This is primarily to cover the
no-nans and no-infinities cases, but also covers the other floating
point classes for free. Textually, this provides a number of names
corresponding to bits in FPClassTest, e.g.
call nofpclass(nan inf) @must_be_finite()
call nofpclass(snan) @cannot_be_snan()
This is more expressive than the existing nnan and ninf fast math
flags. As an added bonus, you can represent fun things like nanf:
declare nofpclass(inf zero sub norm) float @only_nans()
Compared to nnan/ninf:
- Can be applied to individual call operands as well as the return value
- Can distinguish signaling and quiet nans
- Distinguishes the sign of infinities
- Can be safely propagated since it doesn't imply anything about
other operands.
- Does not apply to FP instructions; it's not a flag
This is one step closer to being able to retire "no-nans-fp-math" and
"no-infs-fp-math". The one remaining situation where we have no way to
represent no-nans/infs is for loads (if we wanted to solve this we
could introduce !nofpclass metadata, following along with
noundef/!noundef).
This is to help simplify the GPU builtin math library
distribution. Currently the library code has explicit finite math only
checks, read from global constants the compiler driver needs to set
based on the compiler flags during linking. We end up having to
internalize the library into each translation unit in case different
linked modules have different math flags. By propagating known-not-nan
and known-not-infinity information, we can automatically prune the
edge case handling in most functions if the function is only reached
from fast math uses.
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D135202
getCanonicalMDString() also returns a nullptr for empty strings, which
tripped over the getSource() method. Solve the ambiguity of no source
versus an optional containing a nullptr by simply storing a pointer.
Differential Revision: https://reviews.llvm.org/D138658
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This restores commit 98ed423361de2f9dc0113a31be2aa04524489ca9 and
follow on fix 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b, which were
reverted in 5d938eb6f79b16f55266dd23d5df831f552ea082 due to an
MSVC bot failure. I've included a fix for that failure.
Differential Revision: https://reviews.llvm.org/D135714
This reverts commit 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b.
This reverts commit 98ed423361de2f9dc0113a31be2aa04524489ca9.
Seemingly MSVC has some kind of issue with this patch, in terms of linking:
https://lab.llvm.org/buildbot/#/builders/123/builds/14137
I'll post more detail on D135714 momentarily.
This restores 47459455009db4790ffc3765a2ec0f8b4934c2a4, which was
reverted in commit 452a14efc84edf808d1e2953dad2c694972b312f, along with
fixes for a couple of bot failures.
Implements the ThinLTO summary support for memprof related metadata.
This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.
To reduce space in both the bitcode format and the in memory index,
we do 2 things:
1. We keep a single vector of all uniq stack id hashes, and record the
index into this vector in the callsite and allocation memprof
summaries.
2. When building the combined index during the LTO link, the callsite
and allocation memprof summaries are only kept on the FunctionSummary
of the prevailing copy.
Differential Revision: https://reviews.llvm.org/D135714
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add the DIAssignID metadata attachment boilerplate. Includes a textual-bitcode
roundtrip test and tests that the verifier and parser catch badly formed IR.
This piece of metadata links together stores (used as an attachment) and the
yet-to-be-added llvm.dbg.assign debug intrinsic (used as an operand).
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D132222
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
Differential Revision: https://reviews.llvm.org/D135780
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add the DIAssignID metadata attachment boilerplate. Includes a textual-bitcode
roundtrip test and tests that the verifier and parser catch badly formed IR.
This piece of metadata links together stores (used as an attachment) and the
yet-to-be-added llvm.dbg.assign debug intrinsic (used as an operand).
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D132222
This implements IR and bitcode support for the memory attribute,
as specified in https://reviews.llvm.org/D135597.
The new attribute is not used for anything yet (and as such, the
old memory attributes are unaffected).
Differential Revision: https://reviews.llvm.org/D135592
This patch renames FuncletPadInst::getNumArgOperands to arg_size for
consistency with CallBase, where getNumArgOperands was removed in
favor of arg_size in commit 3e1c787b3160bed4146d3b2b5f922aeed3caafd7
Differential Revision: https://reviews.llvm.org/D136048
As discussed in [0], this diff adds the `skipprofile` attribute to
prevent the function from being profiled while allowing profiled
functions to be inlined into it. The `noprofile` attribute remains
unchanged.
The `noprofile` attribute is used for functions where it is
dangerous to add instrumentation to while the `skipprofile` attribute is
used to reduce code size or performance overhead.
[0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D130807
This allows the construct to be shared between different backends. However, it
still remains illegal to use TypedPointerType in LLVM IR--the type is intended
to remain an auxiliary type, not a real LLVM type. So no support is provided for
LLVM-C, nor bitcode, nor LLVM assembly (besides the bare minimum needed to make
Type->dump() work properly).
Reviewed By: beanz, nikic, aeubanks
Differential Revision: https://reviews.llvm.org/D130592
For MTE globals, we should have clang emit the attribute for all GV's
that it creates, and then use that in the upcoming AArch64 global
tagging IR pass. We need a positive attribute for this sanitizer (rather
than implicit sanitization of all globals) because it needs to interact
with other parts of LLVM, including:
1. Suppressing certain global optimisations (like merging),
2. Emitting extra directives by the ASM writer, and
3. Putting extra information in the symbol table entries.
While this does technically make the LLVM IR / bitcode format
non-backwards-compatible, nobody should have used this attribute yet,
because it's a no-op.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D128950
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127041
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.
ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).
The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.
Differential Revision: https://reviews.llvm.org/D128719
Plan is the migrate the global variable metadata for sanitizers, that's
currently carried around generally in the 'llvm.asan.globals' section,
onto the global variable itself.
This patch adds the attribute and plumbs it through the LLVM IR and
bitcode formats, but is a no-op other than that so far.
Reviewed By: vitalybuka, kstoimenov
Differential Revision: https://reviews.llvm.org/D126100
I chose to encode the allockind information in a string constant because
otherwise we would get a bit of an explosion of keywords to deal with
the possible permutations of allocation function types.
I'm not sure that CodeGen.h is the correct place for this enum, but it
seemed to kind of match the UWTableKind enum so I put it in the same
place. Constructive suggestions on a better location most certainly
encouraged.
Differential Revision: https://reviews.llvm.org/D123088
The original fix (commit 23ec5782c3cc) of
https://github.com/llvm/llvm-project/issues/52787 only adds `Function`s
that have `Instruction`s that directly use `BlockAddress`es into the
bitcode (`FUNC_CODE_BLOCKADDR_USERS`).
However, in either @rickyz's original reproducing code:
```
void f(long);
__attribute__((noinline)) static void fun(long x) {
f(x + 1);
}
void repro(void) {
fun(({
label:
(long)&&label;
}));
}
```
```
...
define dso_local void @repro() #0 {
entry:
br label %label
label: ; preds = %entry
tail call fastcc void @fun()
ret void
}
define internal fastcc void @fun() unnamed_addr #1 {
entry:
tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
ret void
}
...
```
or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g.,
`i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s
(e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and
then used by `Instruction`s. This case is not handled by the original
fix.
This patch adds *indirect* users of `BlockAddress`es, i.e., the
`Instruction`s using some `Constant`s which further use the
`BlockAddress`es, into the bitcode as well, by doing depth-first
searches.
Fixes: https://github.com/llvm/llvm-project/issues/52787
Fixes: 23ec5782c3cc ("[Bitcode] materialize Functions early when BlockAddress taken")
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D124878
This continues the push away from hard-coded knowledge about functions
towards attributes. We'll use this to annotate free(), realloc() and
cousins and obviate the hard-coded list of free functions.
Differential Revision: https://reviews.llvm.org/D123083