Add support for specifying the names of address spaces when specifying
pointer properties for an address space. Update LLVM's AsmPrinter and
LLParser to print and read these symbolic address space names.
This commit adds finer-grained versions of isNonIntegralAddressSpace() and
isNonIntegralPointerType() where the current semantics prohibit
introduction of both ptrtoint and inttoptr instructions. The current
semantics are too strict for some targets (e.g. AMDGPU/CHERI) where
ptrtoint has a stable value, but the pointer has additional metadata.
Currently, marking a pointer address space as non-integral also marks it
as having an unstable bitwise representation (e.g. when pointers can be
changed by a copying GC). This property inhibits a lot of
optimizations that are perfectly legal for other non-integral pointers
such as fat pointers or CHERI capabilities that have a well-defined
bitwise representation but can't be created with only an address.
This change splits the properties of non-integral pointers and allows
for address spaces to be marked as unstable or non-integral (or both)
independently using the 'p' part of the DataLayout string.
A 'u' following the p marks the address space as unstable and specifying
an index width != representation width marks it as non-integral.
Finally, we also add an 'e' flag to mark pointers with external state
(such as the CHERI capability validity). These pointers require
special handling of loads and stores in addition to being non-integral.
This does not change the checks in any of the passes yet - we
currently keep the existing non-integral behaviour. In the future I plan
to audit calls to DL.isNonIntegral[PointerType]() and replace them with
the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks that allow for more
optimizations.
RFC: https://discourse.llvm.org/t/rfc-finer-grained-non-integral-pointer-properties/83176
Reviewed By: nikic, krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/105735
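The split of pointer properties described above can be pictured with a small
standalone model. This is only a sketch: `PointerProps` and its fields are
hypothetical, and the rules chosen for the two `mustNotIntroduce*` queries are
an assumption drawn from the description (ptrtoint is fine when the value is
stable; inttoptr is not when a pointer cannot be rebuilt from an address
alone). It is not the actual DataLayout implementation.

```cpp
#include <cassert>

// Hypothetical per-address-space properties; names are illustrative only.
struct PointerProps {
  unsigned ReprBits;   // width of the full pointer representation
  unsigned IndexBits;  // width of the address/index part
  bool Unstable;       // 'u' flag: bitwise value may change (e.g. copying GC)
  bool ExternalState;  // 'e' flag: some state lives outside the pointer bits
};

// Per the description: index width != representation width marks the
// address space as non-integral, and 'e' pointers are non-integral as well.
constexpr bool isNonIntegral(const PointerProps &P) {
  return P.IndexBits != P.ReprBits || P.ExternalState;
}

// Assumption: only an unstable representation makes ptrtoint unsafe.
constexpr bool mustNotIntroducePtrToInt(const PointerProps &P) {
  return P.Unstable;
}

// Assumption: inttoptr is unsafe for unstable or non-integral pointers.
constexpr bool mustNotIntroduceIntToPtr(const PointerProps &P) {
  return P.Unstable || isNonIntegral(P);
}

// Fat pointer with metadata: stable value, but not creatable from an address.
static_assert(!mustNotIntroducePtrToInt(PointerProps{128, 64, false, false}), "");
static_assert(mustNotIntroduceIntToPtr(PointerProps{128, 64, false, false}), "");
// GC-managed pointer: both directions are forbidden.
static_assert(mustNotIntroducePtrToInt(PointerProps{64, 64, true, false}), "");
static_assert(mustNotIntroduceIntToPtr(PointerProps{64, 64, true, false}), "");
```

Under this model an ordinary 64-bit integral pointer answers false to both
queries, which is the case where no optimization needs to be inhibited.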
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32", "n64", etc. is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
I don't think we need to explicitly specify i1 alignment, as this is
going to fall back to i8 alignment.
This may change behavior if a data layout explicitly sets i8 alignment
without also setting i1 layout, but I'd expect this to be a bug fix in
that case.
getTypeAllocSize() currently works by taking the type store size and
aligning it to the ABI alignment. However, this ends up doing redundant
work in various cases, for example arrays will unnecessarily repeat the
alignment step, and structs will fetch the StructLayout multiple times.
As this code is rather hot (it is called every time we need to calculate
GEP offsets for example), specialize the implementation. This repeats a
small amount of logic from getAlignment(), but I think that's
worthwhile.
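As a rough illustration of the generic rule mentioned above (alloc size is the
store size aligned to the ABI alignment), here is a standalone sketch. The
helper names are hypothetical and sizes are plain byte counts, not TypeSize.

```cpp
#include <cassert>
#include <cstdint>

// Round Size up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignTo(uint64_t Size, uint64_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// Store size: the bit width rounded up to whole bytes.
constexpr uint64_t storeSizeInBytes(uint64_t Bits) { return (Bits + 7) / 8; }

// Generic path: store size, then align to the ABI alignment.
constexpr uint64_t allocSizeInBytes(uint64_t Bits, uint64_t ABIAlign) {
  return alignTo(storeSizeInBytes(Bits), ABIAlign);
}

// Arrays: N times the element alloc size is already a multiple of the
// element alignment, so a second alignment step would be redundant.
constexpr uint64_t arrayAllocSizeInBytes(uint64_t N, uint64_t ElemBits,
                                         uint64_t ElemABIAlign) {
  return N * allocSizeInBytes(ElemBits, ElemABIAlign);
}

static_assert(allocSizeInBytes(24, 4) == 4, "i24 with 4-byte alignment");
static_assert(allocSizeInBytes(80, 16) == 16, "80-bit type, 16-byte alignment");
static_assert(alignTo(arrayAllocSizeInBytes(3, 24, 4), 4) ==
                  arrayAllocSizeInBytes(3, 24, 4),
              "re-aligning the array size changes nothing");
```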
The check for `isOSWindows() || isUEFI()` is used in several places
across the codebase. Introduce `isOSWindowsOrUEFI()` in Triple.h
to simplify these checks.
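A minimal sketch of the helper's shape, using a toy stand-in for the triple
class (the `OSType` enum and this `Triple` struct are assumptions made for
illustration, not the real Triple.h definitions):

```cpp
#include <cassert>

// Toy stand-in for the real triple class; only the OS field matters here.
enum class OSType { Linux, Win32, UEFI };

struct Triple {
  OSType OS;

  constexpr bool isOSWindows() const { return OS == OSType::Win32; }
  constexpr bool isUEFI() const { return OS == OSType::UEFI; }

  // Combined predicate replacing the repeated `isOSWindows() || isUEFI()`.
  constexpr bool isOSWindowsOrUEFI() const { return isOSWindows() || isUEFI(); }
};

static_assert(Triple{OSType::UEFI}.isOSWindowsOrUEFI(), "");
static_assert(!Triple{OSType::Linux}.isOSWindowsOrUEFI(), "");
```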
Because it was implemented in terms of getMaxIndexSize, it was always
rounding the values up to a multiple of 8. Additionally, it was using
the PointerSpec's BitWidth rather than its IndexBitWidth, which was
self-evidently incorrect.
Since getMaxIndexSize was only used by getMaxIndexSizeInBits, and its
name and function seem niche and somewhat confusing, go ahead and remove
it until a concrete need for it arises.
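The two problems described above can be reproduced with a small model. This is
a reconstruction from the description only; the `PointerSpec` struct here is a
simplified stand-in and both functions are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for a pointer specification.
struct PointerSpec {
  unsigned BitWidth;       // width of the pointer representation
  unsigned IndexBitWidth;  // width used for indexing
};

// Old behaviour as described: go through a byte count (rounding up to a
// multiple of 8 bits) and look at BitWidth rather than IndexBitWidth.
unsigned maxIndexSizeInBitsOld(const std::vector<PointerSpec> &Specs) {
  unsigned MaxBytes = 0;
  for (const PointerSpec &S : Specs)
    MaxBytes = std::max(MaxBytes, (S.BitWidth + 7) / 8);
  return MaxBytes * 8;
}

// Fixed behaviour: take the maximum of the index widths directly, in bits.
unsigned maxIndexSizeInBitsNew(const std::vector<PointerSpec> &Specs) {
  unsigned MaxBits = 0;
  for (const PointerSpec &S : Specs)
    MaxBits = std::max(MaxBits, S.IndexBitWidth);
  return MaxBits;
}
```

For a single spec with a 128-bit representation and a 57-bit index, the old
function yields 128 while the new one yields 57.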
Instead of storing this as a separate array of non-integral pointers,
add it to the PointerSpec class instead. This will allow for future
simplifications such as splitting the non-integral property into
multiple distinct ones: relocatable (i.e. non-stable representation) and
non-integral representation (i.e. pointers with metadata).
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/105734
Split off of #104545 to reduce patch size.
Similar to #104546, this introduces `parseSize` and `parseAlignment`,
which are improved versions of `getInt` tailored for specific needs.
I'm not a GTest guru, so the tests are not ideal.
Split off of #104545 to reduce patch size.
This introduces `parseAddrSpace` function, intended as a replacement for
`getAddrSpace`, which doesn't check for trailing characters after the
address space number. `getAddrSpace` will be removed after switching all
uses to `parseAddrSpace`.
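To illustrate the trailing-character check, here is a standalone sketch built
on `std::from_chars`. The function name `parseAddrSpaceSketch` and its
signature are hypothetical; the real `parseAddrSpace` may report errors
differently and may apply further range checks.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Parse Str as a decimal address space number. Unlike a parser that stops at
// the first non-digit, this rejects any trailing characters.
std::optional<uint32_t> parseAddrSpaceSketch(std::string_view Str) {
  uint32_t Value = 0;
  const char *Begin = Str.data();
  const char *End = Begin + Str.size();
  auto [Ptr, Ec] = std::from_chars(Begin, End, Value);
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;  // empty input, not a number, overflow, or junk
  return Value;
}
```

With this shape, "3" is accepted, while "3x" and the empty string are rejected.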
Pull Request: https://github.com/llvm/llvm-project/pull/104546
This makes `LayoutAlignElem` / `PointerAlignElem` and `AlignTypeEnum`
inner types of `DataLayout`. The types are also renamed to match their
meaning (LangRef refers to them as "specification" and "specifier").
Pull Request: https://github.com/llvm/llvm-project/pull/103723
`clear` was never necessary as it is always called on a fresh instance
of the class or just before freeing an instance's memory. `reset` is
effectively the same as the constructor.
Pull Request: https://github.com/llvm/llvm-project/pull/102993
The constructor initializes `*this` with `M->getDataLayout()`, which
is effectively the same as calling the copy constructor.
There does not seem to be a case where a copy would be necessary.
Pull Request: https://github.com/llvm/llvm-project/pull/102841
`DataLayout` isn't exactly cheap to copy (448 bytes on a 64-bit host).
Move `operator=` to cpp file to improve compilation time. Also move
`operator==` closer to `operator=` and add a couple of FIXMEs.
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
Works towards issue #98272.
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size differ. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
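The difference between array and vector element strides can be shown with
plain arithmetic. The sketch below is an assumption-laden model: the helper
names are hypothetical, sizes are in bytes, and the vector case only handles
element types whose bit size is a multiple of 8.

```cpp
#include <cassert>
#include <cstdint>

// Round Size up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignTo(uint64_t Size, uint64_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// Arrays: elements are laid out at their alloc size, which respects the
// element's ABI alignment.
constexpr uint64_t arrayElementStride(uint64_t ElemBits, uint64_t ABIAlign) {
  return alignTo((ElemBits + 7) / 8, ABIAlign);
}

// Vectors: elements are bit-packed, so the stride ignores alignment.
// Only meaningful here when ElemBits is a multiple of 8.
constexpr uint64_t vectorElementStride(uint64_t ElemBits) {
  return ElemBits / 8;
}

// i16 with 32-bit (4-byte) alignment: array stride 4, vector stride 2.
static_assert(arrayElementStride(16, 4) == 4, "");
static_assert(vectorElementStride(16) == 2, "");
// Byte offset of element 3: 12 in an array, 6 in a vector.
static_assert(3 * arrayElementStride(16, 4) == 12, "");
static_assert(3 * vectorElementStride(16) == 6, "");
```

Using the array stride for a vector GEP would therefore compute offset 12
where the packed layout places the element at offset 6.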
---------
Co-authored-by: Nikita Popov <github@npopov.com>
It seems TypeSize is currently broken in the sense that:
  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
  assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So this is
evaluating if the pointer to the function Scalable == the pointer to the
function Scalable, which is always true because LHS and RHS have the same
class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
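The name clash can be reproduced in a few lines. The classes below are
hypothetical stand-ins (`Quantity` for FixedOrScalableQuantity, `Size` for
TypeSize), reduced to what is needed to show the effect. Compilers may warn
about the self-comparison in the buggy version.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the base class that stores the actual flag.
struct Quantity {
  uint64_t Value;
  bool Scalable;  // data member
};

// Stand-in for the derived class whose static factory hides the member.
struct Size : Quantity {
  Size(uint64_t V, bool S) : Quantity{V, S} {}
  static Size Fixed(uint64_t V) { return Size(V, false); }
  static Size Scalable(uint64_t V) { return Size(V, true); }  // hides the bool
};

// Buggy check: `Scalable` resolves to the static member function, so this
// compares the function's address with itself and is always true.
bool sameKindBuggy(const Size &L, const Size &R) {
  return L.Scalable == R.Scalable;
}

// Intended check: explicitly name the base-class data member.
bool sameKindFixed(const Size &L, const Size &R) {
  return L.Quantity::Scalable == R.Quantity::Scalable;
}
```

Renaming the factories, as the patch does, removes the hiding so that the
unqualified `Scalable` refers to the data member again.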
I don't think there is a use case for having an index type that is wider
than the pointer type, and I'm not entirely clear what semantics this
would even have.
Also clarify the GEP semantics to explicitly say how they interact with
the index type width.
The last use of getABITypeAlignment was removed by:
commit 26bd6476c61f08fc8c01895caa02b938d6a37221
Author: Guillaume Chatelet <gchatelet@google.com>
Date: Fri Jan 13 15:05:24 2023 +0000
Differential Revision: https://reviews.llvm.org/D152670
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then we need to adjust
the prolog of an LLVM function during lowering in clang. Finally, we
re-define the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch originated from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
- Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
- Allow `StructType::isSized` to also return true for homogeneous
scalable vector types.
- Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors.
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
Many uses of getIntPtrType() were using that type to calculate the
needed type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.
Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPtrSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.
This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.
In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.
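The distinction between the two widths can be sketched as follows. The
`PointerSpec` struct and helper names are hypothetical, and the wrap-around
model of offset arithmetic is a simplifying assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a pointer specification.
struct PointerSpec {
  unsigned BitWidth;       // what an intptr-style type would use
  unsigned IndexBitWidth;  // what GEP offset arithmetic uses
};

constexpr unsigned intPtrBits(PointerSpec S) { return S.BitWidth; }
constexpr unsigned indexBits(PointerSpec S) { return S.IndexBitWidth; }

// Model GEP offset arithmetic as wrapping to the index width.
constexpr uint64_t wrapToIndexWidth(uint64_t Offset, unsigned IndexBits) {
  return IndexBits >= 64 ? Offset
                         : Offset & ((uint64_t{1} << IndexBits) - 1);
}

// A 64-bit pointer indexed with 32 bits: the two widths disagree, so code
// that sizes GEP offsets with the pointer width picks the wrong type.
static_assert(intPtrBits(PointerSpec{64, 32}) != indexBits(PointerSpec{64, 32}), "");
static_assert(wrapToIndexWidth(0x100000004ULL, 32) == 4, "");
static_assert(wrapToIndexWidth(0x100000004ULL, 64) == 0x100000004ULL, "");
```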
This commit does not impact:
- C library-related rewriting (memcpy()), which operates under
the assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.
- Code generation and below. Note that the use of getIntPtrType() in
CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend
Depends on D143435
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D143437
- The current implementation checks them for 24-bit integers, but the
document effectively specifies 23-bit ones by listing the range as [1,2^23).
- Minor error message correction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144685
Instead of storing alignment for integers, floats, vectors and
structs in a single vector with a type tag, store them in
separate vectors instead. This makes the alignment lookup faster,
as we don't have to scan over irrelevant alignment entries.
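A minimal sketch of the idea, assuming each per-kind vector is kept sorted by
bit width. The struct and function names are hypothetical, and the fallback
rule (use the next larger spec, else the largest one) is an assumption of
this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct PrimitiveSpec {
  uint32_t BitWidth;
  uint32_t ABIAlignBytes;
};

// One vector per kind instead of a single tagged vector, so an integer
// lookup never walks over float or vector entries.
struct AlignmentTable {
  std::vector<PrimitiveSpec> IntSpecs;    // sorted by BitWidth
  std::vector<PrimitiveSpec> FloatSpecs;  // sorted by BitWidth
  std::vector<PrimitiveSpec> VectorSpecs; // sorted by BitWidth
};

// Find the first integer spec at least as wide as BitWidth; if there is
// none, fall back to the widest spec. Returns 1 for an empty table.
uint32_t lookupIntAlign(const AlignmentTable &T, uint32_t BitWidth) {
  if (T.IntSpecs.empty())
    return 1;
  auto It = std::lower_bound(
      T.IntSpecs.begin(), T.IntSpecs.end(), BitWidth,
      [](const PrimitiveSpec &S, uint32_t W) { return S.BitWidth < W; });
  if (It == T.IntSpecs.end())
    --It;
  return It->ABIAlignBytes;
}
```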
The method DataLayout::getGEPIndexForOffset(Type *&ElemTy, APInt &Offset)
generates GEP indices for a given byte-based offset.
This makes it possible to generate "natural" GEPs using the given type
structure if the byte offset happens to match a nested element object.
With opaque pointers and a general move towards byte-based GEPs [1],
this function may be questionable in the future.
This patch avoids creation of GEPs into vectors in routines that use
DataLayout::getGEPIndexForOffset by not returning indices in that case.
The reason is that A) GEPs into vectors have been discouraged for a long
time [2], and B) that GEPs into vectors are currently broken if the element
type is overaligned [1]. This is also demonstrated by a lit test where
previously InstCombine replaced valid loads by poison. Note that
the result of InstCombine on that test is *still* invalid, because
padding bytes are assumed.
Moreover, GEPs into vectors may be outright forbidden in the future [1].
[1]: https://discourse.llvm.org/t/67497
[2]: https://llvm.org/docs/GetElementPtr.html
The test case is new. It will be precommitted if this patch is accepted.
Differential Revision: https://reviews.llvm.org/D142146
It is widely assumed that i8 is naturally aligned (i8:8),
and that hence i8s can be used to access arbitrary bytes.
As discussed in https://discourse.llvm.org/t/status-of-overaligned-i8,
this patch makes this assumption explicit, by documenting it in
the LangRef, and enforcing it when parsing a data layout string.
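The kind of check this implies can be sketched with a toy validator. It only
understands specs of the form `i<size>:<abi>` with both numbers in bits; the
function names are hypothetical and the real parser handles many more forms
and reports errors differently.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Parse the whole string as an unsigned decimal number; reject trailing junk.
static bool parseUnsigned(std::string_view Str, unsigned &Out) {
  const char *End = Str.data() + Str.size();
  auto [Ptr, Ec] = std::from_chars(Str.data(), End, Out);
  return Ec == std::errc() && Ptr == End;
}

// Accept "i<size>:<abi>" unless it overaligns or underaligns i8.
bool acceptsIntegerSpec(std::string_view Spec) {
  if (Spec.empty() || Spec.front() != 'i')
    return false;
  Spec.remove_prefix(1);
  const auto Colon = Spec.find(':');
  if (Colon == std::string_view::npos)
    return false;
  unsigned Bits = 0, ABIAlignBits = 0;
  if (!parseUnsigned(Spec.substr(0, Colon), Bits) ||
      !parseUnsigned(Spec.substr(Colon + 1), ABIAlignBits))
    return false;
  // i8 must be naturally aligned so that it can address arbitrary bytes.
  if (Bits == 8 && ABIAlignBits != 8)
    return false;
  return true;
}
```

Here "i8:8" and "i32:32" are accepted, while "i8:32" is rejected.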
Historically, there have been data layouts that violate this requirement,
notably the old DXIL data layout that aligns i8 to 32 bits.
A previous patch (df1a74a) enabled importing modules with invalid data layouts
using override callbacks.
Users who wish to continue importing modules with overaligned i8s (e.g. DXIL)
thus need to provide a data layout override callback that fixes the
data layout, at minimum by setting natural alignment for i8.
Any further adjustments to the module (e.g. adding padding bytes if necessary)
need to be done after module import. In the case of DXIL, this should not be
necessary, because i8 usage in DXIL is very limited and its alignment actually
does not matter, see
https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#primitive-types
Differential Revision: https://reviews.llvm.org/D142211
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D135202
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
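For readers unfamiliar with the target spelling, a small standalone example of
the `std::nullopt` form follows. The function `findFirstNegative` is a
hypothetical example, not code touched by this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Return the index of the first negative value, or an empty optional.
std::optional<std::size_t> findFirstNegative(const std::vector<int> &Values) {
  for (std::size_t I = 0; I < Values.size(); ++I)
    if (Values[I] < 0)
      return I;
  return std::nullopt;  // the pre-migration spelling would be `return None;`
}
```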