Depends on #87545
Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in
`.note.gnu.property` section depending on
`aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm
module flags.
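A minimal Python sketch of the byte layout such a note might have, assuming a little-endian ELF64 object. These are assumptions for illustration and are not taken from this patch: the constant values, the helper name `pauth_property_note`, and the mapping of the two module flag values onto two 64-bit fields. Check them against `llvm/BinaryFormat/ELF.h` and the AArch64 PAuth ABI extension to ELF.

```python
import struct

# Assumed constants (ELF gABI and AArch64 PAuth ABI extension); verify before use.
NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_AARCH64_FEATURE_PAUTH = 0xC0000001


def pauth_property_note(platform: int, version: int) -> bytes:
    """Build a little-endian ELF64 .note.gnu.property note holding a single
    GNU_PROPERTY_AARCH64_FEATURE_PAUTH property (hypothetical helper)."""
    # Payload: platform and version as two 64-bit values (16 bytes), which is
    # already a multiple of 8, so no trailing padding is needed.
    payload = struct.pack("<QQ", platform, version)
    prop = struct.pack("<II", GNU_PROPERTY_AARCH64_FEATURE_PAUTH, len(payload)) + payload
    # Owner name "GNU\0" is 4 bytes; 12-byte header + 4-byte name puts the
    # descriptor at offset 16, which is 8-byte aligned as ELF64 requires.
    name = b"GNU\x00"
    header = struct.pack("<III", len(name), len(prop), NT_GNU_PROPERTY_TYPE_0)
    return header + name + prop
```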
Intrinsics that do not throw, such as `@llvm.seh.scope.begin` and
`@llvm.seh.scope.end`, do not need funclets in catchpads or cleanuppads.
Fixes #69428
Co-authored-by: Robert Cox <robert.cox@intel.com>
Adds logic to the IR verifier that checks whether !tbaa.struct nodes are
well-formed. That is, it checks that the operands of !tbaa.struct nodes
are in groups of three, that each group of three operands consists of
two integers and a valid tbaa node, and that the regions described by
the offset and size operands are non-overlapping.
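The checks above can be modeled roughly as follows. This is a hypothetical Python sketch, not the verifier code. Integers stand for integer constant operands, strings stand for metadata nodes, and `valid_tbaa` is an assumed stand-in for "is a valid TBAA node". Regions are sorted before the overlap check, so the sketch does not rely on any particular operand order.

```python
from typing import List, Sequence, Set, Union

# Hypothetical operand model: ints are integer constants, strs are metadata nodes.
Operand = Union[int, str]


def check_tbaa_struct(operands: Sequence[Operand], valid_tbaa: Set[str]) -> List[str]:
    """Return error messages for a !tbaa.struct operand list (empty if OK)."""
    if len(operands) % 3 != 0:
        return ["tbaa.struct operands must come in groups of three"]
    errors: List[str] = []
    regions = []
    for group in range(len(operands) // 3):
        offset, size, node = operands[3 * group : 3 * group + 3]
        if not isinstance(offset, int) or not isinstance(size, int):
            errors.append(f"group {group}: offset and size must be integers")
            continue
        if node not in valid_tbaa:
            errors.append(f"group {group}: third operand must be a valid TBAA node")
            continue
        regions.append((offset, offset + size))
    # Sort by start offset and track the furthest end seen so far, so any
    # region that starts before an earlier region ends is reported.
    furthest_end = None
    for start, end in sorted(regions):
        if furthest_end is not None and start < furthest_end:
            errors.append(f"region at offset {start} overlaps an earlier region")
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return errors
```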
PR: https://github.com/llvm/llvm-project/pull/86709
These tests show invalid tbaa.struct metadata that is currently accepted
in preparation for a change to the IR Verifier that will then reject it.
PR: https://github.com/llvm/llvm-project/pull/86167
This patch adds support for parsing the proposed non-instruction debug
info ("RemoveDIs") from textual IR, and adds a test for the parser as well
as a set of verifier tests that are dependent on parsing to fire.
An important detail of this patch is that, although we can now parse both
the RemoveDIs (new) and Intrinsic (old) debug info formats, we always
convert back to the old format at the end of parsing. This is done for two
reasons. Firstly, it ensures that every tool is able to process IR printed
in the new format, regardless of whether that tool has had RemoveDIs
support added. Secondly, it maintains the effect of the existing flags:
for the tools where support for the new format has been added, we run LLVM
passes in the new format iff `--try-experimental-debuginfo-iterators=true`,
and we print in the new format iff
`--write-experimental-debuginfo-iterators=true`. The format of the textual
IR input should have no effect on either of these features.
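To illustrate that the input format has no effect, here is a hypothetical Python model. It shows which format is in use at each stage, for a tool that already supports the new format. The fields `try_new_format` and `write_new_format` stand in for the two command-line flags. Nothing here is LLVM API.

```python
from dataclasses import dataclass


@dataclass
class ToolFlags:
    # Stand-ins for --try-experimental-debuginfo-iterators and
    # --write-experimental-debuginfo-iterators (hypothetical model).
    try_new_format: bool = False
    write_new_format: bool = False


def debug_info_formats(input_format: str, flags: ToolFlags) -> dict:
    """Which debug-info format is used at each stage of a tool run."""
    if input_format not in ("old", "new"):
        raise ValueError(f"unknown input format: {input_format}")
    return {
        # The parser accepts both formats but always converts to the old one,
        # so input_format is deliberately not consulted below.
        "after_parse": "old",
        "passes": "new" if flags.try_new_format else "old",
        "printed": "new" if flags.write_new_format else "old",
    }
```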
With e8512786fedbfa6ddba70ceddc29d7122173ba5e the for loop that iterates
over MDNode operands was changed to a range-based for loop. This change
surfaces a bug where if the result of MD->operands() is an ArrayRef that
has a size of 0, then iterating over that ArrayRef leads to a
segmentation fault, due to accessing invalid addresses.
The change was reverted in 6ce03ff3fef8fb6fa9afe8eb22c6d98bced26d48, but
this test is added so that the code path stays covered in the future.
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or the ZT0 register, both of which are part of
PSTATE.ZA, with the caller).
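The distinction between sharing ZA and sharing PSTATE.ZA can be sketched as set membership over the attribute names listed above. This is an illustrative Python model with hypothetical helper names. It assumes that the `aarch64_new_*` attributes do not share state with the caller.

```python
from typing import AbstractSet

# Attributes that share the ZA matrix or the ZT0 register with the caller.
# The aarch64_new_* attributes are assumed not to share state.
ZA_SHARING = frozenset(
    {"aarch64_in_za", "aarch64_out_za", "aarch64_inout_za", "aarch64_preserves_za"}
)
ZT0_SHARING = frozenset(
    {"aarch64_in_zt0", "aarch64_out_zt0", "aarch64_inout_zt0", "aarch64_preserves_zt0"}
)


def shares_za(attrs: AbstractSet[str]) -> bool:
    """True if the function shares the ZA matrix register with its caller."""
    return not ZA_SHARING.isdisjoint(attrs)


def shares_zt0(attrs: AbstractSet[str]) -> bool:
    """True if the function shares the ZT0 register with its caller."""
    return not ZT0_SHARING.isdisjoint(attrs)


def shares_pstate_za(attrs: AbstractSet[str]) -> bool:
    """PSTATE.ZA covers both ZA and ZT0, so sharing either one shares it."""
    return shares_za(attrs) or shares_zt0(attrs)
```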
This patch extends SMEAttrs to interpret the following new attributes,
which are mutually exclusive and apply to SME2 only:
- aarch64_sme_zt0_in (ZT0_In)
- aarch64_sme_zt0_out (ZT0_Out)
- aarch64_sme_zt0_inout (ZT0_InOut)
- aarch64_sme_zt0_new (ZT0_New)
- aarch64_sme_zt0_preserved (ZT0_Preserved)
ZT0_In, ZT0_Out, ZT0_InOut & ZT0_Preserved are all considered to share
ZT0. These attributes will be required by later patches to determine
if ZT0 should be preserved around function calls, or cleared on entry
to the function.
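A rough Python model of how these attributes could be interpreted. At most one ZT0 attribute is accepted, and every state except ZT0_New counts as sharing ZT0. The enum and function names are hypothetical and do not mirror the real `SMEAttrs` interface.

```python
from enum import Flag, auto
from typing import Iterable


class ZT0State(Flag):
    NONE = 0
    IN = auto()
    OUT = auto()
    INOUT = auto()
    NEW = auto()
    PRESERVED = auto()


ATTRIBUTE_TO_STATE = {
    "aarch64_sme_zt0_in": ZT0State.IN,
    "aarch64_sme_zt0_out": ZT0State.OUT,
    "aarch64_sme_zt0_inout": ZT0State.INOUT,
    "aarch64_sme_zt0_new": ZT0State.NEW,
    "aarch64_sme_zt0_preserved": ZT0State.PRESERVED,
}

# ZT0_In, ZT0_Out, ZT0_InOut and ZT0_Preserved share ZT0; ZT0_New does not.
SHARING_STATES = ZT0State.IN | ZT0State.OUT | ZT0State.INOUT | ZT0State.PRESERVED


def parse_zt0_state(attributes: Iterable[str]) -> ZT0State:
    """Interpret a function's ZT0 attributes, rejecting combinations."""
    state = ZT0State.NONE
    for attr in set(attributes):
        flag = ATTRIBUTE_TO_STATE.get(attr)
        if flag is None:
            continue  # not a ZT0 attribute
        if state != ZT0State.NONE:
            raise ValueError("ZT0 attributes are mutually exclusive")
        state = flag
    return state


def shares_zt0(state: ZT0State) -> bool:
    return bool(state & SHARING_STATES)
```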
This patch adds an intrinsic for the setmaxnreg PTX instruction.
* PTX Doc link for this instruction:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
* The i32 argument, an immediate value, specifies the actual
absolute register count for the instruction.
* The `setmaxnreg` instruction is available in SM90a, so this patch
adds a 'hasSM90a' predicate for use in the NVPTX backend.
* lit tests are added to verify the lowering of the intrinsic.
* Verifier logic (and tests) are added to test the register
count range and divisibility-by-8 requirements.
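The range and divisibility checks can be sketched as below. The bounds 24 and 256 are an assumption based on the PTX ISA documentation for `setmaxnreg` linked above. The function name is hypothetical, and this is not the NVPTX verifier code.

```python
def verify_setmaxnreg_count(reg_count: int) -> None:
    """Raise ValueError if the immediate register count is not acceptable.

    Assumed constraints (PTX ISA, setmaxnreg): the count must lie in
    [24, 256] and must be a multiple of 8.
    """
    if not 24 <= reg_count <= 256:
        raise ValueError(f"register count {reg_count} is outside [24, 256]")
    if reg_count % 8 != 0:
        raise ValueError(f"register count {reg_count} is not a multiple of 8")
```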
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()

define void @src(ptr noalias dead_on_unwind %out) {
  store i32 0, ptr %out
  call void @may_unwind()
  store i32 1, ptr %out
  ret void
}

define void @tgt(ptr noalias dead_on_unwind %out) {
  call void @may_unwind()
  store i32 1, ptr %out
  ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the
`i32 0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
Helps avoid a crash in the verifier when it tries to print the error.
A `none` token might be produced by llvm-reduce, since it is a default
value, so by extension this also fixes an llvm-reduce crash, allowing it
to simply discard invalid IR.
---------
Co-authored-by: arpilipe <apilipenko@azul.com>
Add a new intrinsic, similar to llvm.amdgcn.set.inactive, but used only
in functions with the `amdgpu_cs_chain` or `amdgpu_cs_chain_preserve`
calling conventions. It allows setting the inactive lanes to those of a
value received as a VGPR argument (whereas llvm.amdgcn.set.inactive
usually takes a constant as the value of the inactive lanes).
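A lane-wise Python model of the intended result. It assumes a wave is represented as a list of per-lane values plus an integer execution mask. The function name is hypothetical. The model only illustrates that the inactive lanes take their values from a per-lane argument rather than from a single constant.

```python
from typing import List, Sequence


def set_inactive_from_arg(active: Sequence[int], inactive: Sequence[int], exec_mask: int) -> List[int]:
    """Lane i keeps active[i] if bit i of exec_mask is set; otherwise it takes
    inactive[i], the value received in the VGPR argument."""
    if len(active) != len(inactive):
        raise ValueError("both operands must have one value per lane")
    return [
        a if (exec_mask >> lane) & 1 else b
        for lane, (a, b) in enumerate(zip(active, inactive))
    ]
```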
Differential Revision: https://reviews.llvm.org/D158604
This adds a writable attribute, which, in conjunction with
dereferenceable(N), states that a spurious store of N bytes is
introduced on function entry. This implies that these N bytes are
writable without trapping and without introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the latter point is important.
This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).
In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.
Followup to the discussion on D157499.
Differential Revision: https://reviews.llvm.org/D158081
Currently, we specify that the ptrmask intrinsic allows the mask to have
any size, which will be zero-extended or truncated to the pointer size.
However, what the semantics of the specified GEP expansion actually imply is
that the mask is only meaningful up to the pointer type *index* size --
any higher bits of the pointer will always be preserved. In other words,
the mask gets 1-extended from the index size to the pointer size. This
is also the behavior we want for CHERI architectures.
This PR makes two changes:
* It spells out the interaction with the pointer type index size more
explicitly.
* It requires that the mask matches the pointer type index size. The
intention here is to make handling of this intrinsic more robust, to
avoid accidental mix-ups of pointer size and index size in code
generating this intrinsic. If a zero-extend or truncate of the mask is
desired, it should just be done explicitly in IR. This also cuts down on
the amount of testing we have to do, and on the number of things
transforms need to check for.
As far as I can tell, we don't actually support pointers whose index type
size differs from the pointer size at the SDAG level, so I'm just asserting
that the sizes match there for now. Out-of-tree targets using different
index sizes may need to adjust that code.
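A small Python model of the semantics described above, assuming pointers and masks are plain unsigned integers. The mask must have the index width, and the bits above the index size are always preserved (1-extension). The function and its parameters are illustrative, not an LLVM API.

```python
def ptrmask(ptr: int, mask: int, ptr_bits: int, index_bits: int) -> int:
    """Model of llvm.ptrmask on a ptr_bits-wide pointer whose index type is
    index_bits wide: the mask covers only the low index_bits bits and is
    1-extended, so the high bits of the pointer are always preserved."""
    if not 0 < index_bits <= ptr_bits:
        raise ValueError("index size must be non-zero and at most the pointer size")
    if not 0 <= mask < (1 << index_bits):
        raise ValueError("the mask must fit in the index type width")
    if not 0 <= ptr < (1 << ptr_bits):
        raise ValueError("pointer value out of range")
    # All ones from bit index_bits up to bit ptr_bits - 1.
    high_ones = ((1 << ptr_bits) - 1) & ~((1 << index_bits) - 1)
    return ptr & (mask | high_ones)
```

For example, with a CHERI-like 128-bit pointer and a 64-bit index type, a mask that clears the low 4 bits leaves the upper 64 bits of the pointer untouched.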
llvm.ptrmask is currently limited to pointers only, and does not accept
vectors of pointers. This is an unnecessary limitation, especially as
the underlying instructions (getelementptr etc) do support vectors of
pointers.
We should relax this sooner rather than later, to avoid introducing code
that assumes non-vectors (#67166).
The entry and loop intrinsics for convergence control cannot be preceded
by convergent operations in their respective basic blocks. To check
that, the verifier needs to reset its state at the start of the block.
This was missed in the previous commit
fa6dd7a24af2b02f236ec3b980d9407e86c2c4aa.
The only real requirement is that entry and loop intrinsics should not
be preceded by convergent operations in the same basic block. They do
not need to be the first in the block.
Relaxing the constraint on the entry and loop intrinsics avoids having
to make changes in the construction of LLVM IR, such as
getFirstInsertionPt(). It also avoids added complexity in the lowering
to Machine IR, where COPY instructions may be added to the start of the
basic block.
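The relaxed rule, with verifier state reset at the start of each block, can be sketched as a per-block scan. In this hypothetical Python model, blocks are lists of instruction names, and the entry and loop intrinsics are assumed to count as convergent operations themselves. It is not the verifier implementation.

```python
from typing import Dict, List, Set

ENTRY_OR_LOOP = {
    "llvm.experimental.convergence.entry",
    "llvm.experimental.convergence.loop",
}


def check_convergence_positions(blocks: Dict[str, List[str]], convergent: Set[str]) -> List[str]:
    """Report entry/loop intrinsics preceded by a convergent operation in the
    same basic block. Non-convergent instructions before them are allowed."""
    errors: List[str] = []
    for block, insts in blocks.items():
        seen_convergent = False  # state is reset at the start of each block
        for inst in insts:
            if inst in ENTRY_OR_LOOP and seen_convergent:
                errors.append(f"{block}: {inst} is preceded by a convergent operation")
            if inst in convergent or inst in ENTRY_OR_LOOP:
                seen_convergent = True
    return errors
```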
After FCLASS_VL (D151176) there is still no vp.fpclass intrinsic; this patch adds support for vp.fpclass.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152993
This is in preparation for using the same convergence verifier for both LLVM IR
and Machine IR.
Reviewed By: yassingh
Differential Revision: https://reviews.llvm.org/D158394
This patch relaxes the verifier when it checks whether an OP_entry_value has a
valid Value associated with it. We now allow undef/poison values as well, since
those may be introduced naturally through optimization.
Differential Revision: https://reviews.llvm.org/D158101
The affected lit tests failed when they were run in a path that contained the
word "call". CHECK-NOT lines that were supposed to match only the IR ended up
matching the path printed in the output. Fixed this by checking for "call void"
instead.
The refactored template can now be used with MachineVerifier.
Resubmitted after fixing build errors:
- Shared libraries build failed with undefined references due to "extern
template" declarations.
- Modules build failed due to a cyclic dependency between llvm/ADT and llvm/IR.
The Generic*Impl.h files should be in llvm/IR to prevent this.
Differential Revision: https://reviews.llvm.org/D156522
This restores commit 93a3706711fd46d4d487640d91b16c2eec747c9e.
Originally reverted in 466bd9981150906552a1f2308e3c9065bfcb6741.
Resolves https://github.com/llvm/llvm-project/issues/61932 by adding the corresponding validation.
LLVM can't handle IR where subprogram definitions are nested within DICompositeType when doing LTO builds, because there's no good way to cross the CU boundary to insert a nested DISubprogram definition in one CU into a type defined in another CU.
The test cases `cross-cu-inlining-2.ll` and `cross-cu-inlining-ranges.ll` can be deleted. In the `cross-cu-inlining-2.ll`, the low pc and high pc of the CU are also incorrect.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D152095
According to the discussion on D154953, we need to make the LangRef change
before the optimization that relies on the new behaviour: vscale_range
implies that vscale is a power-of-two value, and parsing of the attribute
rejects values that are not a power of two.
Thanks to nikic for the excellent summary of the discussion on D154953:
To provide a bit more context here. We would like to have power of two vscale exposed in a target-independent way, so we can make use of this in places like ValueTracking, just like we currently do the vscale range. Some options that have been discussed are:
- Remove support for non-power-of-two vscales entirely. (This is my personal preference, but this is hard to undo if it turns out someone does need them.)
- Add an extra attribute vscale_pow2, or a data layout property.
- Make vscale_range imply power-of-two vscale, as a compromise solution (what this patch does). This would be relatively easy to turn into one of the two above at a later point.
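The parsing rule might look like the following Python sketch. It assumes, following the LangRef description of `vscale_range`, that an omitted maximum defaults to the minimum and that a maximum of 0 means unbounded; treat both as assumptions. The function names are hypothetical.

```python
from typing import Optional


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def check_vscale_range(min_vscale: int, max_vscale: Optional[int] = None) -> None:
    """Raise ValueError for a vscale_range(min[, max]) that should be rejected."""
    if max_vscale is None:
        max_vscale = min_vscale  # assumed default when max is omitted
    if not is_power_of_two(min_vscale):
        raise ValueError("minimum vscale must be a non-zero power of two")
    if max_vscale != 0:  # 0 is assumed to mean "unbounded"
        if not is_power_of_two(max_vscale):
            raise ValueError("maximum vscale must be a power of two")
        if max_vscale < min_vscale:
            raise ValueError("maximum vscale must not be smaller than the minimum")
```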
Reviewed By: paulwalker-arm, nikic, efriedma
Differential Revision: https://reviews.llvm.org/D155193
In D146869 @arsenm pointed out that the constrained intrinsics aren't
getting the strictfp attribute by default. They should be since they are
required to have it anyway.
TableGen did not know about this attribute until now. This patch adds
strictfp to TableGen, and it uses it on all of the constrained intrinsics.
Differential Revision: https://reviews.llvm.org/D154991
This is a reboot of the original design and implementation by
Nicolai Haehnle <nicolai.haehnle@amd.com>:
https://reviews.llvm.org/D85603
This change also obsoletes an earlier attempt at restarting the work on
convergence tokens:
https://reviews.llvm.org/D104504
Changes relative to D85603:
1. Clean up the definition of a "convergent operation", a convergent
call and convergent function.
2. Clean up the relationship between dynamic instances, sets of threads and
convergence tokens.
3. Redistribute the formal rules into the definitions of the convergence
intrinsics.
4. Expand on the semantics of entering a function from outside LLVM,
and the environment-defined outcome of the entry intrinsic.
5. Replace the term "cycle" with "closed path". The static rules are defined
in terms of closed paths, and then a relation is established with cycles.
6. Specify that if a function contains a controlled convergent operation, then
all convergent operations in that function must be controlled.
7. Describe an optional procedure to infer tokens for uncontrolled convergent
operations.
8. Introduce controlled maximal convergence-before and controlled m-converged
property as an update to the original properties in UniformityAnalysis.
9. Additional constraint that a cycle heart can only occur in the header of a
reducible cycle (natural loop).
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D147116
We only check a subset of the constraints in the verifier:
* that we only call the intrinsic from functions with a restricted set of
calling conventions
* that the 'flags' argument is an immediate
Other checks are (probably) more appropriate for codegen.
Differential Revision: https://reviews.llvm.org/D151995
Add the amdgpu_cs_chain and amdgpu_cs_chain_preserve keywords to
LLVM IR and make sure we can parse and print them. Also make sure we
perform some basic checks in the IR verifier - similar to what we check
for many of the other AMDGPU calling conventions, plus the additional
restriction that we can't have direct calls to functions with these
calling conventions.
Differential Revision: https://reviews.llvm.org/D151994
The generic implementation is umin(TC, VF * vscale).
Lowering to vsetvli for RISC-V will come in a future patch.
This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.
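A Python sketch of that generic expansion, plus a hypothetical strip-mining driver that shows how a loop could consume the result. It assumes TC is the number of elements still to be processed. `vector_length` and `strip_mine` are illustrative names only.

```python
from typing import List


def vector_length(trip_count: int, vf: int, vscale: int) -> int:
    """Generic expansion: umin(TC, VF * vscale), with unsigned operands."""
    return min(trip_count, vf * vscale)


def strip_mine(trip_count: int, vf: int, vscale: int) -> List[int]:
    """Hypothetical driver: the number of elements each iteration processes."""
    if vf * vscale <= 0:
        raise ValueError("VF * vscale must be positive")
    lengths = []
    remaining = trip_count
    while remaining > 0:
        n = vector_length(remaining, vf, vscale)
        lengths.append(n)
        remaining -= n
    return lengths
```

With VF = 4 and vscale = 1, ten elements are processed as 4, 4 and then 2, so the last iteration needs no separate scalar tail.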
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D149916
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then we adjust the
prologue of an LLVM function during lowering in Clang. Finally, we
redefine the RVV segment load/store intrinsics to use the tuple types.
The corresponding pull request against the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch originates from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
* Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
* Allow `StructType::isSized` to also return true for homogeneous
scalable vector types.
* Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors.
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
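To illustrate why `StructSize` needs to become a `TypeSize`, here is a simplified Python model in which a size has a known minimum and a scalable flag. Only non-empty structs whose fields are all the same scalable vector type are modeled, so no padding is involved. The class names are hypothetical and only loosely mirror LLVM's `TypeSize`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TypeSize:
    """A size in bits; if scalable, the real size is known_min * vscale."""
    known_min: int
    scalable: bool

    def __add__(self, other: "TypeSize") -> "TypeSize":
        if self.scalable != other.scalable:
            raise TypeError("cannot add fixed and scalable sizes")
        return TypeSize(self.known_min + other.known_min, self.scalable)


@dataclass(frozen=True)
class ScalableVectorType:
    """Models <vscale x min_elts x iN> with N == elt_bits."""
    min_elts: int
    elt_bits: int

    def size(self) -> TypeSize:
        return TypeSize(self.min_elts * self.elt_bits, scalable=True)


def struct_size(fields: Sequence[ScalableVectorType]) -> TypeSize:
    """Size of a struct whose fields are all the same scalable vector type.
    Identical fields share size and alignment, so no padding is added."""
    if not fields or any(f != fields[0] for f in fields):
        raise ValueError("only non-empty homogeneous structs are modeled")
    total = TypeSize(0, scalable=True)
    for field in fields:
        total = total + field.size()
    return total
```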
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to Discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
A follow-up patch will make the CoroSplit pass introduce such operations at the
IR level when it is safe to do so.
Depends on D149748
Differential Revision: https://reviews.llvm.org/D149778
Annotation metadata supports adding singular annotation strings to an annotation block. This patch adds the ability to insert a tuple of strings into the metadata array.
The idea here is that each tuple of strings represents a piece of information whose parts are all related. This makes it easier to parse related metadata, since it is contained in one tuple.
For example, in remarks, any pass that implements annotation remarks can have different types of remarks and pass additional information for each.
The original behaviour of annotation remarks is preserved, and we can mix tuple annotations and single annotations for the same instruction.
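A hypothetical Python model of mixing single and tuple annotations on one instruction. It also normalizes them so a remark consumer can handle both kinds uniformly. The names are illustrative; this is not the `Instruction` API.

```python
from typing import List, Tuple, Union

# An annotation is either a single string or a tuple of related strings.
Annotation = Union[str, Tuple[str, ...]]


def add_annotation(existing: List[Annotation], new: Annotation) -> List[Annotation]:
    """Append a single or tuple annotation to an instruction's annotation list."""
    if isinstance(new, tuple) and not all(isinstance(s, str) for s in new):
        raise TypeError("tuple annotations must contain only strings")
    return existing + [new]


def remark_groups(annotations: List[Annotation]) -> List[Tuple[str, ...]]:
    """Normalize so a remark consumer can treat every entry as a tuple."""
    return [a if isinstance(a, tuple) else (a,) for a in annotations]
```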
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D148328