(Reland of #190092 with verifier change to look through GlobalAliases)
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:uFixes#186926.
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:uFixes#186926.
This adds a new method Verifier::visitDIType, and then changes method
for subclasses of DIType to call it. The new method just dispatches to
DIScope and adds a file/line check inspired by
Verifier::visitDISubprogram.
This instruction is an alternative for the `alloca` instruction when
targeting logical targets like DXIL/SPIR-V.
This instruction allocates some memory, but the exact size of the
allocation is not known at the IR level. Only some equivalence can be
determined.
Commit adds docs, instruction declaration, and IR verifier testing.
Related to:
https://discourse.llvm.org/t/rfc-adding-logical-structured-alloca/
Windows EH requires exception objects allocated on stack. But there is
no reliable way to identify them. CoroSplit employs a best-effort
algorithm to determine whether allocas persist on the stack or the
frame, which may result in miscompilation when Windows exceptions are
used.
This patch proposes that we treat allocas used by catchpad as exception
objects and never place them on the frame. A verifier check is added to
enforce that operands of catchpad are either constants or allocas.
Close#143235Close#153949Close#182584
Since https://reviews.llvm.org/D144004, DwarfDebug asserts if
function-local imported entities are present in the imports field of
DICompileUnit.
This patch adds a Verifier check to detect such invalid IR earlier.
Incorrect occurrences of imported entities in DICompileUnit's imports
field in llvm/test/Bitcode/DIImportedEntity_elements.ll,
llvm/test/Bitcode/DIModule-fortran-external-module.ll are fixed.
This change is extracted from https://reviews.llvm.org/D144008.
This patch relands https://github.com/llvm/llvm-project/pull/178666. The
original version caused CI failures due to the missing target triple in
`llvm/test/CodeGen/X86/byte-constants.ll`. CI should be green now.
There's no point constructing a dominator tree or similar on
known-broken IR. Generally, functions should be able to assume that IR
is valid (i.e., passes the verifier). Users of this "feature" were:
- Verifier, fixed by verifying existence of terminators first.
- FuzzMutate, worked around by temporarily inserting terminators.
- OpenMP to run analyses while building the IR, worked around by
temporarily inserting terminators.
- Polly to work with an empty dominator tree, fixed by temporarily
adding an unreachable inst.
- MergeBlockIntoPredecessor, inadvertently, fixed by adding terminator
before updating MemorySSA.
- Some sloppily written unit tests.
DwarfFile asserts if two arguments of the same subprogram with the same
index are present in a DISubprogram scope:
5d7a502a9d/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp (L110)
This patch adds a check to the Verifier to detect such invalid IR
earlier. It can be helpful for finding reproducers for bugs like
https://issues.chromium.org/issues/40288032.
The incorrect args field of DILocalVariable in
llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir is fixed.
Now that CondBrInst and UncondBrInst are explicit subclasses, use them
instead.
HotColdSplitting was trying to inspect prof metadata also on
unconditional branches, fix this.
Also introduce C API cast functions and deprecate LLVMIsConditional in
favor of LLVMIsACondBrInst.
This patch covers all LLVM uses outside of Transforms, Analysis,
CodeGen/Target, SandboxIR, Frontend/OpenMP, tools, examples.
Since #165032, DwarfDebug asserts if function-local enums are present in
the enums field of DICompileUnit.
This patch adds a check to the Verifier to detect such invalid IR
earlier.
Incorrect occurence of a local enum in DICompileUnit's enums field in
`llvm/test/DebugInfo/COFF/enum-co.ll` is fixed.
This change is extracted from https://reviews.llvm.org/D144008.
Add `__arm_atomic_store_with_stshh` implementation as defined in the
ACLE. Validate arguments passed are correct, and lower to the `stshh`
intrinsic plus an atomic store using a pseudo-instruction with the
allowed orderings:
* memory orderings: relaxed, release, seq_cst
* retention policies: keep, strm
The `STSHH` instruction (Store with Store Hint for Hardware) is part
of the `FEAT_PCDPHINT` extension.
When a global variable has a size that exceeds the size of the address
space it resides in, the verifier should fail as the variable can
neither be materialized nor fully accessed. This patch adds a check to
the verifier to enforce it.
---------
Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).
This has a behavior change when using the command flag debug
options to set the denormal mode. The behavior of the flag
ignored functions with an explicit attribute set, per
the default and f32 version. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.
This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.
I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
This is an attempt to merge https://reviews.llvm.org/D144006 with LTO
fix.
The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
but have the same ODR identifier.
If the same (in the sense of ODR identifier/ODR uniquing rules) local
type is present in two modules, and these modules are linked together,
the type gets uniqued. A DIType, that happens to be loaded first,
survives linking, and the references on other types with the same ODR
identifier from the modules loaded later are replaced with the
references on the DIType loaded first. Since defintion subprograms, in
scope of which these types are located, are not deduplicated, the linker
output may contain multiple DISubprogram's having the same (uniqued)
type in their retainedNodes lists.
Further compilation of such modules causes crashes.
To tackle that,
* previous solution to handle LTO linking with local types in
retainedNodes is removed (cloneLocalTypes() function),
* for each loaded distinct (definition) DISubprogram, its retainedNodes
list is scanned after loading, and DITypes with a scope of another
subprogram are removed. If something from a Function corresponding to
the DISubprogram references uniqued type, we rely on cross-CU links.
Additionally:
* a check is added to Verifier to report about local types located in a
wrong retainedNodes list,
Original commit message follows.
---------
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and provided a support of function-local
types scoped within a lexical block.
The patch assumes that DICompileUnit's 'enums field' no longer tracks local
types and DwarfDebug would assert if any locally-scoped types get placed there.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
In some of our use cases, the GPU runtime stores some data at the top of
the stack. It figures out where it's safe to store it by using the PAL
metadata generated by the backend, which includes the total stack size.
However, the metadata does not include the space reserved at the bottom
of the stack for the trap handler when CWSR is enabled in dynamic VGPR
mode. This space is reserved dynamically based on whether or not the
code is running on the compute queue. Therefore, the runtime needs a way
to take that into account.
Add support for `llvm.sponentry`, which should return the base of the
stack,
skipping over any reserved areas. This allows us to keep this
computation in
one place rather than duplicate it between the backend and the runtime.
The implementation for functions that set up their own stack uses a
pseudo
that is expanded to the same code sequence as that used in the prolog to
set up the stack in the first place.
In callable functions, we generate a fixed stack object and use that
instead,
similar to the Arm/AArch64 approach. This wastes some stack space but
that's
not a problem for now because we're not planning to use this in callable
functions yet.
This adds the analogous metadata to the nofpclass attribute
to assert values are not a certain set of floating-point classes.
This allows the same information to be expressed if a function
argument is passed indirectly. This matches the bitmask encoding
of nofpclass.
I also think this should be allowed for stores to symmetrically handle
sret, but leave that for later.
Alternatively we could add a more expressive !fprange metadata,
but that would be much more complex. It's useful to match the attribute,
and more annotations can always be added.
Fixes#133560
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.
Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.
@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.
In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.
The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offets isn't through getShuffleCost because they
can't handle variable masks, but instead just through
getIntrinsicInstCost.
Add a verifier check to reject GEP instructions that index into vectors
with non-byte-addressable element types (e.g., <2 x i1>). Such GEPs
cannot have their offset computed in bytes, causing assertions in passes
like SROA that try to compute byte offsets.
Fixes#176628.
The patch adds two intrinsics: llvm.convert.to.arbitrary.fp and
llvm.convert.from.arbitrary.fp.
The intrinsics perform conversions between values whose interpretation
differs from their representation in LLVM IR. The intrinsics are
overloaded on both its return type and first argument. Metadata operands
describe how the raw bits should be interpreted before and after the
conversion.
Typical use case is to convert IEEE-754 floating point types to FP8/FP4
and backwards for ML applications.
Addresses
https://discourse.llvm.org/t/rfc-spir-v-way-to-represent-float8-in-llvm-ir/87758/10
This patch adds support in Clang for the RPRFM instruction, by adding
the following intrinsics:
```
void __pldx_range(unsigned int *access_kind*, unsigned int retention_policy,
signed int length*, unsigned int count, signed int stride,
size_t reuse distance, void const *addr);
void __pld_range(unsigned int access_kind*, unsigned int retention_policy,
uint64_t metadata, void const *addr);
```
The `__ARM_PREFETCH_RANGE` macro can be used to test whether these
intrinsics are implemented. If the RPRFM instruction is not available, this
instruction is a NOP.
This implements the following ACLE proposal:
https://github.com/ARM-software/acle/pull/423
Add a new metadata node `!implicit.ref` to represent an implicit
dependency between 2 symbols. The metadata is unique to AIX and gets
lowered to a relocation that adds an explicit link between the section
the global that the metadata is placed on is allocated in, to the
asscoiated symbol. This relocation will cause the associated symbol to
remain live if the section is not garbage collected. This is used mainly
for compiler features where there is some hidden runtime dependency
between the symbols that isn't otherwise obvious to the linker.
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel
In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.
The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
inline with the definition of `llvm.fshr.*`/`llvm.fshl.*`.
This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.
Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
Fixes https://github.com/llvm/llvm-project/issues/172647
Currently, MC assumes that all `target-feature` flag attributes are well
formed and will crash otherwise. This change handles those cases more
gracefully.
The memprof metadata verifier supported multiple string tags, but in
reality, the other code (e.g. addCallStack) only supports a single such
tag. Update the verifier to reflect that limitation, and the associated
tests.
Fixes#157217
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Add new C API functions so that switch case values remain accessible
from bindings for other languages.
While this could also be achieved by merely changing the order of
operands (i.e., first all successors, then all constants), doing so
would increase the asymptotic runtime of addCase from O(1) to O(n)
(i.e., adding n cases would be O(n^2)), because it would need to shift
all constants by one slot. Having null/invalid operands is also a bad
idea and would cause much more breakage.
Pull Request: https://github.com/llvm/llvm-project/pull/170984
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Currently we multiply the known minimum number of elements by vscale
even if the vector in question is fixed, so sometimes we miss some fixed
vectors with out of bounds indices.
This commit adds support for using intrinsics with callbr. The uses of
this will most of the time look like this example:
```llvm
callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
unreachable
cont:
...
```
Consider this Ada type:
```
type Array_Type is array (Natural range <>) of Integer;
type Record_Type (L1, L2 : Natural) is record
I1 : Integer;
A1 : Array_Type (1 .. L1);
I2 : Integer;
A2 : Array_Type (1 .. L2);
I3 : Integer;
end record;
```
Here, the array fields have lengths that depend on the discriminants of
the record type. However, in this case the array lengths cannot be
expressed as DWARF location expressions, with the issue being that "A2"
has a non-constant offset, but an expression involving
DW_OP_push_object_address will push the address of the field -- with no
way to find the location of "L2".
In a case like this, I believe the correct DWARF is to emit the array
ranges using a direct reference to the discriminant, like:
```
<3><1156>: Abbrev Number: 1 (DW_TAG_member)
<1157> DW_AT_name : l1
...
<3><1177>: Abbrev Number: 6 (DW_TAG_array_type)
<1178> DW_AT_name : (indirect string, offset: 0x1a0b): vla__record_type__T4b
<117c> DW_AT_type : <0x1287>
<1180> DW_AT_sibling : <0x118e>
<4><1184>: Abbrev Number: 7 (DW_TAG_subrange_type)
<1185> DW_AT_type : <0x1280>
<1189> DW_AT_upper_bound : <0x1156>
```
(FWIW this is what GCC has done for years.)
This patch makes this possible in LLVM, by letting a DISubrangeType
refer to a DIDerivedType. gnat-llvm can then arrange for the DIE
reference to be correct by setting the array type's scope to be the
record.
Deactivation symbol operands are supported in the code generator by
building on the previously added support for IRELATIVE relocations.
Reviewers: ojhunt, fmayer, ahmedbougacha, nikic, efriedma-quic
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/133537
For swift async code, we need to use a debug intrinsic that behaves like
an llvm.dbg.declare but can take any location type rather than just a
pointer or integer.
To solve this, a new debug instrinsic called llvm.dbg.declare_value has
been created, which behaves exactly like an llvm.dbg.declare but can
take non pointer and integer location types.
More information here:
https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269
This is the first patch as part of a stack of patches, with the one
succeeding it being: https://github.com/llvm/llvm-project/pull/168134
A new InstCombine transform uses this attribute to rewrite calls to a
modular version of the implementation along with llvm.reloc.none
relocations against aspects of the implementation needed by the call.
This change only adds support for the 'float' aspect, but it also builds
the structure needed for others.
See issue #146159
This patch adds a new `FramePointerKind::NonLeafNoReserve` and makes it
the default for `-momit-leaf-frame-pointer`.
It also adds a new commandline option `-m[no-]reserve-frame-pointer-reg`.
This should fix#154379, the main impact of this patch can be found in
`clang/lib/Driver/ToolChains/CommonArgs.cpp`.
These checks ensure that retained nodes of a DISubprogram belong to the
subprogram.
Tests with incorrect IR are fixed. We should not have variables of one subprogram present in retained nodes of other subprograms.
Also, interface for accessing DISubprogram's retained nodes is slightly
refactored. `DISubprogram::visitRetainedNodes` and
`DISubprogram::forEachRetainedNode` are added to avoid repeating checks
like
```
if (const auto *LV = dyn_cast<DILocalVariable>(N))
...
else if (const auto *L = dyn_cast<DILabel>(N))
...
else if (const auto *IE = dyn_cast<DIImportedEntity>(N))
...
```
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.
See issue #146159 for context.