- Address space cast of nullptr in local_space into a generic_space for
the CUDA backend. The reason for this cast was having invalid local
memory base address for the associated variable.
- In the context of AMD GPU, assigns a NULL value as ~0 for the address
spaces of sycl_local and sycl_private to match the ones for opencl_local
and opencl_private.
Summary:
The PTX target supports the f16 type natively and we alreaqdy have a few
LLVM backend tests that support the LLVM-IR. We should be able to enable
this for generic use. This is done prior the f16 math functions being
written in the GPU libc case.
This reverts commit 89c1bf1230e011f2f0e43554c278205fa1819de5.
This has been unimplemenented for a while, and GCC does not implement
it, therefore we need to consider whether we should just deprecate it
in the ACLE instead.
This adds support for the AArch64 soft-float ABI. The specification for
this ABI was added by https://github.com/ARM-software/abi-aa/pull/232.
Because all existing AArch64 hardware has floating-point hardware, we
expect this to be a niche option, only used for embedded systems on
R-profile systems. We are going to document that SysV-like systems
should only ever use the base (hard-float) PCS variant:
https://github.com/ARM-software/abi-aa/pull/233. For that reason, I've
not added an option to select the ABI independently of the FPU hardware,
instead the new ABI is enabled iff the target architecture does not have
an FPU.
For testing, I have run this through an ABI fuzzer, but since this is
the first implementation it can only test for internal consistency
(callers and callees agree on the PCS), not for conformance to the ABI
spec.
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.
After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
This patch changes how the macro __ARM_ARCH is defined to match its
defintion in the ACLE. In ACLE 5.4.1, __ARM_ARCH is defined as equal to
the major architecture version for ISAs up to and including v8. From
v8.1 onwards, its definition is changed to include minor versions, such
that for an architecture vX.Y, __ARM_ARCH = X*100 + Y. Before this
patch, LLVM defined __ARM_ARCH using only the major architecture version
for all architecture versions. This patch adds functionality to define
__ARM_ARCH correctly for architectures greater than or equal to v8.1.
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.
Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.
This contains the documentation changes for both this change and #76954
as well.
This mechanism is introduced by #68324.
This refactor makes the prototype and attributes clear.
Reviewers: asb, kito-cheng, philnik777, topperc, preames
Reviewed By: topperc
Pull Request: https://github.com/llvm/llvm-project/pull/80280
Add a SPIR-V target-specific intrinsic for creating handles, which is
used for lowering HLSL resources types like RWBuffer.
`llvm/lib/TargetParser/Triple.cpp`: SPIR-V intrinsics use "spv" as the
target prefix, not "spirv". As far as I can tell, this is the first one
that is used via the `CGBuiltin` codepath, which relies on
`getArchTypePrefix`, so I've corrected it here.
`clang/lib/Basic/Targets/SPIR.h`: When records are laid out in the
lowering from AST to IR, they were incorrectly offset because these
Pointer attributes were defaulting to 32.
Related to #81036
The new experimental calling convention preserve_none is the opposite
side of existing preserve_all. It tries to preserve as few general
registers as possible. So all general registers are caller saved
registers. It can also uses more general registers to pass arguments.
This attribute doesn't impact floating-point registers. Floating-point
registers still follow the c calling convention.
Currently preserve_none is supported on X86-64 only. It changes the c
calling convention in following fields:
* RSP and RBP are the only preserved general registers, all other
general registers are caller saved registers.
* We can use [RDI, RSI, RDX, RCX, R8, R9, R11, R12, R13, R14, R15, RAX]
to pass arguments.
It can improve the performance of hot tailcall chain, because many
callee saved registers' save/restore instructions can be removed if the
tail functions are using preserve_none. In my experiment in protocol
buffer, the parsing functions are improved by 3% to 10%.
__ARM_STATE_ZA and __ARM_STATE_ZT0 are set when the compiler can parse
the "za" and "zt0" strings in the SME attributes.
__ARM_FEATURE_SME and __ARM_FEATURE_SME2 are set when the compiler can
generate code for attributes with "za" and "zt0" state, respectively.
__ARM_FEATURE_LOCALLY_STREAMING is set when the compiler supports the
__arm_locally_streaming attribute.
This enables specifing "za" or "zt0" to the clobber list for inline asm.
This complies with the acle SME addition to the asm extension here:
https://github.com/ARM-software/acle/pull/276
GCC has supported a generic constraint "s" for a long time (since at
least 1992), which references a symbol or label with an optional
constant offset. "i" is a superset that also supports a constant
integer.
GCC's RISC-V port also supports a machine-specific constraint "S",
which cannot be used with a preemptible symbol. (We don't bother to
check preemptibility.) In PIC code, an external symbol is preemptible by
default, making "S" less useful if you want to create an artificial
reference for linker garbage collection, or define sections to hold
symbol addresses:
```
void fun();
// error: impossible constraint in ‘asm’ for riscv64-linux-gnu-gcc -fpie/-fpic
void foo() { asm(".reloc ., BFD_RELOC_NONE, %0" :: "S"(fun)); }
// good even if -fpie/-fpic
void foo() { asm(".reloc ., BFD_RELOC_NONE, %0" :: "s"(fun)); }
```
This patch adds support for "s". Modify https://reviews.llvm.org/D105254
("S") to handle multi-depth GEPs (https://reviews.llvm.org/D61560).
Add AEK_PAUTH to ARMV8_3A in TargetParser and let it propagate to
ARMV8R, as it aligns with GCC defaults.
After adding AEK_PAUTH, several tests from TargetParserTest.cpp crashed
when trying to format an error message, thus update a format string in
AssertSameExtensionFlags to account for bitmask being pre-formatted as
std::string.
The CHECK-PAUTH* lines in aarch64-target-features.c are updated to
account for the fact that FEAT_PAUTH support and pac-ret can be enabled
independently and all four combinations are possible.
This updates clang's target defines to include the ACLE changes covering
the FEAT_PAuth_LR architecture extension.
The changes include:
* The new `__ARM_FEATURE_PAUTH_LR` feature macro, which is set to 1 when
FEAT_PAuth_LR is available in the target.
* A new bit field for the existing `__ARM_FEATURE_PAC_DEFAULT` macro,
indicating the use of PC as a diversifier for Pointer Authentication
(from -mbranch-protection=pac-ret+pc).
The approved changes to the ACLE spec can be found here:
https://github.com/ARM-software/acle/pull/292
Summary:
The NVPTX tools require an architecture to be used, however if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on the
functions so it can be changed when linked into code. This allows the
standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR
simmilar to how CUDA's deviceRTL looks from C/C++
Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via `-mlink-builtin-bitcode` to inherit its attributes.
However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, which is actually
variable. We should not expose these macros unless it is known.
This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81.
This breaks HIP code that incorrectly depended on GPU-specific macros to
be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the
host is absolutely meaningless, but it seems this entire corner of the
toolchain is fundmentally broken. Reverting for now to avoid breakages.
Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via `-mlink-builtin-bitcode` to inherit its attributes.
However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, which is actually
variable. We should not expose these macros unless it is known.
When this option is passed to clang, external (and/or weak) symbols
are not assumed to have the minimum ABI alignment normally required.
Symbols defined locally that are not weak are however still given the
minimum alignment.
This is implemented by passing a new parameter to getMinGlobalAlign()
named HasNonWeakDef that is used to return the right alignment value.
This is needed when external symbols created from a linker script may
not get the ABI minimum alignment and must therefore be treated as
unaligned by the compiler.
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang, and also disallows the use of
the aix-small-local-exec-tls attribute with the -data-sections=false
option in llc.
This is because having data sections off when using the
aix-small-local-exec-tls feature is not ideal for performance. As the
small-local-exec-tls region is a limited resource, this space should not
used for variables that may be replaced.
Note, that on AIX, data sections is turned on by default, so this patch
makes it so that a diagnostic is emitted when users explicitly turn off
data sections while using the aix-small-local-exec-tls feature.
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
Moved from https://reviews.llvm.org/D126302
The current behaviour with these three options is quite undesirable:
-mno-altivec -mvsx allows VSX to override no Altivec, thereby turning on
both
-msoft-float -maltivec causes a crash if an actual Altivec instruction
is required because soft float turns of Altivec
-msoft-float -mvsx is also accepted with both Altivec and VSX turned off
(potentially causing crashes as above)
This patch diagnoses these impossible combinations in the driver so the
user does not end up with surprises in terms of their options being
ignored or silently overridden.
Fixes https://github.com/llvm/llvm-project/issues/55556
---------
Co-authored-by: Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
Add support for specifying the logical SPIR-V target environment in the
triple as Vulkan. When compiling HLSL, this replaces the DirectX Shader
Model with a Vulkan environment instead.
Currently, the only supported combinations of SPIR-V version and Vulkan
environment are:
- Vulkan 1.2 and SPIR-V 1.5
- Vulkan 1.3 and SPIR-V 1.6
Fixes#70051
Currently there are several bits of code in the AArch64 driver which
attempt to enforce dependencies between optional features in the -march=
and -mcpu= options. However, these are based on the list of feature
names being enabled/disabled, so they have a lot of logic to consider
the order in which features were turned on and off, which doesn't scale
well as dependency chains get longer.
This patch moves the code handling these dependencies to TargetParser,
and changes them to use a Bitset of enabled features. This makes it easy
to check which features are enabled, and is converted back to a list of
LLVM feature names once all of the command-line options are parsed.
The motivating example for this was the -mcpu=cortex-r82+nofp option.
Previously, the code handling the dependency between the fp16 and
fp16fml extensions did not consider the nofp modifier, so it added
+fullfp16 to the feature list. This should have been disabled by the
+nofp modifier, and also the backend did follow the dependency between
fullfp16 and fp, resulting in fp being turned back on in the backend.
Most of the dependencies added to AArch64TargetParser.h weren't known
about by clang before, I built that list by checking what the backend
thinks the dependencies between SubtargetFeatures are.
Printing the raw symbol is useful in inline asm (e.g. getting the C++
mangled name, referencing a symbol in a custom way while ensuring it is
not optimized out even if internal). Similar constraints are available
in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k).
```
namespace ns { extern int var, a[4]; }
void foo() {
asm(".pushsection .xxx,\"aw\"; .dc.a %p0; .popsection" :: "Ws"(&ns::var));
asm(".reloc ., BFD_RELOC_NONE, %p0" :: "Ws"(&ns::a[3]));
}
```
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576
This patch reworks RISCVTargetInfo::initFeatureMap to fix the issue
described
in
https://github.com/llvm/llvm-project/pull/74889#pullrequestreview-1773445559
(and is an alternative to #75804)
When a full arch string is specified, a "full" list of extensions is now
passed
after the __RISCV_TargetAttrNeedOverride marker feature, which includes
any
negative features that disable ISA extensions.
In initFeatureMap, there are now two code paths:
1. If the arch string was overriden, use the "full" list of override
features,
only adding back any non-isa features that were specified.
Using the full list of positive and negative features will mean that the
target-cpu will have no effect on the final arch, e.g.
__attribute__((target("arch=rv64i"))) with -mcpu=sifive-x280 will have
the
features for rv64i, not a mix of both.
2. Otherwise, parse and *append* the list of implied features. By
appending, we
turn back on any features that might have been disabled by a negative
extension, i.e. this handles the case fixed in #74889.
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.
If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.
Differential Revision: https://reviews.llvm.org/D70401
-mbranch-protection=gcs (enabled by -mbranch-protection=standard) causes
generated objects to be marked with the gcs feature. This is done via
the guarded-control-stack module flag, in a similar way to
branch-target-enforcement and sign-return-address.
Enabling GCS causes the GNU_PROPERTY_AARCH64_FEATURE_1_GCS bit to be set
on generated objects. No code generation changes are required, as GCS
just requires that functions are called using BL and returned from using
RET (or other similar variant instructions), which is already the case.
We have two structs for representing the version of an extension in
RISCVISAInfo, RISCVExtensionInfo and RISCVExtensionVersion, both
with the exact same fields. This patch deduplicates them.
Since Knight Landing and Knight Mill microarchitectures are EOL, we
would like to remove intrinsic supports for its specific ISA in LLVM 19.
In LLVM 18, we will first emit a warning for the usage.
toFeatures and toFeatureVector both output a list of target feature
flags, just with a slightly different interface. toFeatures keeps any
unsupported extensions, and also provides a way to append negative
extensions (AddAllExtensions=true).
This patch combines them into one function, so that a later patch will
be be able to get a std::vector of features that includes all the
negative extensions, which was previously only possible through the
StrAlloc interface.
collectNonISAExtFeature was returning any negative extension features,
e.g.
given an input of
+zifencei,+m,+a,+save-restore,-zbb,-relax,-zfa
It would return
+save-restore,-zbb,-relax,-zfa
Because negative extensions aren't emitted when calling
toFeatureVector(), and
so were considered missing. Hence why we still see "-zfa" and "-zfb" in
the tests for
the full arch string attributes, even though with a full arch string we
should be overriding the extensions.
This fixes it by using RISCVISAInfo::isSupportedExtensionFeature instead
to
check if a feature is an ISA extension.