When a global variable has a size that exceeds the size of the address
space it resides in, the verifier should fail as the variable can
neither be materialized nor fully accessed. This patch adds a check to
the verifier to enforce it.
---------
Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).
This has a behavior change when using the command flag debug
options to set the denormal mode. The behavior of the flag
ignored functions with an explicit attribute set, per
the default and f32 version. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.
This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.
I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
This is an attempt to merge https://reviews.llvm.org/D144006 with LTO
fix.
The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
but have the same ODR identifier.
If the same (in the sense of ODR identifier/ODR uniquing rules) local
type is present in two modules, and these modules are linked together,
the type gets uniqued. A DIType, that happens to be loaded first,
survives linking, and the references on other types with the same ODR
identifier from the modules loaded later are replaced with the
references on the DIType loaded first. Since defintion subprograms, in
scope of which these types are located, are not deduplicated, the linker
output may contain multiple DISubprogram's having the same (uniqued)
type in their retainedNodes lists.
Further compilation of such modules causes crashes.
To tackle that,
* previous solution to handle LTO linking with local types in
retainedNodes is removed (cloneLocalTypes() function),
* for each loaded distinct (definition) DISubprogram, its retainedNodes
list is scanned after loading, and DITypes with a scope of another
subprogram are removed. If something from a Function corresponding to
the DISubprogram references uniqued type, we rely on cross-CU links.
Additionally:
* a check is added to Verifier to report about local types located in a
wrong retainedNodes list,
Original commit message follows.
---------
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and provided a support of function-local
types scoped within a lexical block.
The patch assumes that DICompileUnit's 'enums field' no longer tracks local
types and DwarfDebug would assert if any locally-scoped types get placed there.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
In some of our use cases, the GPU runtime stores some data at the top of
the stack. It figures out where it's safe to store it by using the PAL
metadata generated by the backend, which includes the total stack size.
However, the metadata does not include the space reserved at the bottom
of the stack for the trap handler when CWSR is enabled in dynamic VGPR
mode. This space is reserved dynamically based on whether or not the
code is running on the compute queue. Therefore, the runtime needs a way
to take that into account.
Add support for `llvm.sponentry`, which should return the base of the
stack,
skipping over any reserved areas. This allows us to keep this
computation in
one place rather than duplicate it between the backend and the runtime.
The implementation for functions that set up their own stack uses a
pseudo
that is expanded to the same code sequence as that used in the prolog to
set up the stack in the first place.
In callable functions, we generate a fixed stack object and use that
instead,
similar to the Arm/AArch64 approach. This wastes some stack space but
that's
not a problem for now because we're not planning to use this in callable
functions yet.
This adds the analogous metadata to the nofpclass attribute
to assert values are not a certain set of floating-point classes.
This allows the same information to be expressed if a function
argument is passed indirectly. This matches the bitmask encoding
of nofpclass.
I also think this should be allowed for stores to symmetrically handle
sret, but leave that for later.
Alternatively we could add a more expressive !fprange metadata,
but that would be much more complex. It's useful to match the attribute,
and more annotations can always be added.
Fixes#133560
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.
Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.
@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.
In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.
The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offets isn't through getShuffleCost because they
can't handle variable masks, but instead just through
getIntrinsicInstCost.
Add a verifier check to reject GEP instructions that index into vectors
with non-byte-addressable element types (e.g., <2 x i1>). Such GEPs
cannot have their offset computed in bytes, causing assertions in passes
like SROA that try to compute byte offsets.
Fixes#176628.
The patch adds two intrinsics: llvm.convert.to.arbitrary.fp and
llvm.convert.from.arbitrary.fp.
The intrinsics perform conversions between values whose interpretation
differs from their representation in LLVM IR. The intrinsics are
overloaded on both its return type and first argument. Metadata operands
describe how the raw bits should be interpreted before and after the
conversion.
Typical use case is to convert IEEE-754 floating point types to FP8/FP4
and backwards for ML applications.
Addresses
https://discourse.llvm.org/t/rfc-spir-v-way-to-represent-float8-in-llvm-ir/87758/10
This patch adds support in Clang for the RPRFM instruction, by adding
the following intrinsics:
```
void __pldx_range(unsigned int *access_kind*, unsigned int retention_policy,
signed int length*, unsigned int count, signed int stride,
size_t reuse distance, void const *addr);
void __pld_range(unsigned int access_kind*, unsigned int retention_policy,
uint64_t metadata, void const *addr);
```
The `__ARM_PREFETCH_RANGE` macro can be used to test whether these
intrinsics are implemented. If the RPRFM instruction is not available, this
instruction is a NOP.
This implements the following ACLE proposal:
https://github.com/ARM-software/acle/pull/423
Add a new metadata node `!implicit.ref` to represent an implicit
dependency between 2 symbols. The metadata is unique to AIX and gets
lowered to a relocation that adds an explicit link between the section
the global that the metadata is placed on is allocated in, to the
asscoiated symbol. This relocation will cause the associated symbol to
remain live if the section is not garbage collected. This is used mainly
for compiler features where there is some hidden runtime dependency
between the symbols that isn't otherwise obvious to the linker.
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel
In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.
The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
inline with the definition of `llvm.fshr.*`/`llvm.fshl.*`.
This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.
Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
Fixes https://github.com/llvm/llvm-project/issues/172647
Currently, MC assumes that all `target-feature` flag attributes are well
formed and will crash otherwise. This change handles those cases more
gracefully.
The memprof metadata verifier supported multiple string tags, but in
reality, the other code (e.g. addCallStack) only supports a single such
tag. Update the verifier to reflect that limitation, and the associated
tests.
Fixes#157217
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Add new C API functions so that switch case values remain accessible
from bindings for other languages.
While this could also be achieved by merely changing the order of
operands (i.e., first all successors, then all constants), doing so
would increase the asymptotic runtime of addCase from O(1) to O(n)
(i.e., adding n cases would be O(n^2)), because it would need to shift
all constants by one slot. Having null/invalid operands is also a bad
idea and would cause much more breakage.
Pull Request: https://github.com/llvm/llvm-project/pull/170984
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Currently we multiply the known minimum number of elements by vscale
even if the vector in question is fixed, so sometimes we miss some fixed
vectors with out of bounds indices.
This commit adds support for using intrinsics with callbr. The uses of
this will most of the time look like this example:
```llvm
callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
unreachable
cont:
...
```
Consider this Ada type:
```
type Array_Type is array (Natural range <>) of Integer;
type Record_Type (L1, L2 : Natural) is record
I1 : Integer;
A1 : Array_Type (1 .. L1);
I2 : Integer;
A2 : Array_Type (1 .. L2);
I3 : Integer;
end record;
```
Here, the array fields have lengths that depend on the discriminants of
the record type. However, in this case the array lengths cannot be
expressed as DWARF location expressions, with the issue being that "A2"
has a non-constant offset, but an expression involving
DW_OP_push_object_address will push the address of the field -- with no
way to find the location of "L2".
In a case like this, I believe the correct DWARF is to emit the array
ranges using a direct reference to the discriminant, like:
```
<3><1156>: Abbrev Number: 1 (DW_TAG_member)
<1157> DW_AT_name : l1
...
<3><1177>: Abbrev Number: 6 (DW_TAG_array_type)
<1178> DW_AT_name : (indirect string, offset: 0x1a0b): vla__record_type__T4b
<117c> DW_AT_type : <0x1287>
<1180> DW_AT_sibling : <0x118e>
<4><1184>: Abbrev Number: 7 (DW_TAG_subrange_type)
<1185> DW_AT_type : <0x1280>
<1189> DW_AT_upper_bound : <0x1156>
```
(FWIW this is what GCC has done for years.)
This patch makes this possible in LLVM, by letting a DISubrangeType
refer to a DIDerivedType. gnat-llvm can then arrange for the DIE
reference to be correct by setting the array type's scope to be the
record.
Deactivation symbol operands are supported in the code generator by
building on the previously added support for IRELATIVE relocations.
Reviewers: ojhunt, fmayer, ahmedbougacha, nikic, efriedma-quic
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/133537
For swift async code, we need to use a debug intrinsic that behaves like
an llvm.dbg.declare but can take any location type rather than just a
pointer or integer.
To solve this, a new debug instrinsic called llvm.dbg.declare_value has
been created, which behaves exactly like an llvm.dbg.declare but can
take non pointer and integer location types.
More information here:
https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269
This is the first patch as part of a stack of patches, with the one
succeeding it being: https://github.com/llvm/llvm-project/pull/168134
A new InstCombine transform uses this attribute to rewrite calls to a
modular version of the implementation along with llvm.reloc.none
relocations against aspects of the implementation needed by the call.
This change only adds support for the 'float' aspect, but it also builds
the structure needed for others.
See issue #146159
This patch adds a new `FramePointerKind::NonLeafNoReserve` and makes it
the default for `-momit-leaf-frame-pointer`.
It also adds a new commandline option `-m[no-]reserve-frame-pointer-reg`.
This should fix#154379, the main impact of this patch can be found in
`clang/lib/Driver/ToolChains/CommonArgs.cpp`.
These checks ensure that retained nodes of a DISubprogram belong to the
subprogram.
Tests with incorrect IR are fixed. We should not have variables of one subprogram present in retained nodes of other subprograms.
Also, interface for accessing DISubprogram's retained nodes is slightly
refactored. `DISubprogram::visitRetainedNodes` and
`DISubprogram::forEachRetainedNode` are added to avoid repeating checks
like
```
if (const auto *LV = dyn_cast<DILocalVariable>(N))
...
else if (const auto *L = dyn_cast<DILabel>(N))
...
else if (const auto *IE = dyn_cast<DIImportedEntity>(N))
...
```
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.
See issue #146159 for context.
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
a1ef81d added overloads for `llvm.matrix.column.major.store` and
`llvm.matrix.column.major.load` that allow strides to occupy an
arbitrary bitwidth. This change wasn't reflected in the verifier,
causing an assertion to trip when given strides overflowing 64-bit. This
patch explicitly caps the bitwidth at 64, repairing the crash and
avoiding future complexity dealing with strides that overflow 64 bits.
PR: https://github.com/llvm/llvm-project/pull/163729
This introduces `!captures` metadata on stores, which looks like this:
```
store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
```
The semantics are the same as replacing the store with a call like this:
```
call void @llvm.store(ptr captures(address, read_provenance) %x, ptr %y)
```
This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this point, as it would require
analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's `println!()` machinery, which
involves storing a reference to the value inside a structure. This means
that printing code (including conditional debugging code), can inhibit
optimizations because the pointer escapes. With the new metadata we can
annotate this as a read-only capture, which has less impact on
optimizations.
Add a new named module-level frontend-annotated metadata that
specifies the TBAA node for an integer access, for which, C/C++
`errno` accesses are guaranteed to use (under strict aliasing).
This should allow LLVM to prove the involved memory location/
accesses may not alias `errno`; thus, to perform optimizations
around errno-writing libcalls (store-to-load forwarding amongst
others).
Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
Assumes either have a boolean condition, or a number of attribute based
operand bundles. Currently, we also allow mixing both forms, though we
don't make use of this in practice. This adds additional complexity for
code dealing with assumes.
Forbid mixing both forms, by requiring that assumes with operand bundles
have an i1 true condition.
Teach the IR parser and writer to support metadata on ifuncs, and update
documentation.
In PR #153049, we have a use case of attaching the `!associated`
metadata to an ifunc.
Since an ifunc is similar to a function declaration, it seems natural to
allow metadata on ifuncs.
Currently, the metadata API allows adding Metadata to
llvm::GlobalObject, so the in-memory IR allows for metadata on ifuncs,
but the IR reader/writer is not aware of that.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Rather than passes using `!prof = !{!”unknown”}`for cases where don’t have enough information to emit profile values, this patch captures the pass (or some other information) that can help diagnostics - i.e. `!{!”unknown”, !”some-pass-name”}`.
For example, suppose we emitted a `select` with the unknown metadata, and, later, end up needing to lower that to a conditional branch. If we observe (via sample profiling, for example) that the branch is biased and would have benefitted from a valid profile, the extra information can help speed up debugging.
We can also (in a subsequent pass) generate optimization remarks about such lowered selects, with a similar aim - identify patterns lowering to `select` that may be worth some extra investment in extracting a more precise profile.