Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:
"Among any two versions, the higher priority version is determined by
identifying the highest priority feature that is specified in exactly
one of the versions, and selecting that version."
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.
This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.
Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.
The largest pattern I identified were the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.
This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:
```
FILE SIZE VM SIZE
-------------- --------------
+1.4% +653Ki +1.4% +653Ki .rodata
+0.0% +960 +0.0% +960 .text
+0.0% +197 +0.0% +197 .dynstr
+0.0% +184 +0.0% +184 .eh_frame
+0.0% +96 +0.0% +96 .dynsym
+0.0% +40 +0.0% +40 .eh_frame_hdr
+114% +32 [ = ] 0 [Unmapped]
+0.0% +20 +0.0% +20 .gnu.hash
+0.0% +8 +0.0% +8 .gnu.version
+0.9% +7 +0.9% +7 [LOAD #2 [R]]
[ = ] 0 -75.4% -3.00Ki .relro_padding
-16.1% -802Ki -16.1% -802Ki .data.rel.ro
-27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn
-1.6% -2.66Mi -1.6% -2.66Mi TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those reloctaions are stored.
This is also visible in my benchmarking of binary start-up overhead at
least:
```
Benchmark 1: ./old_clang --version
Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms]
Range (min … max): 14.2 ms … 22.8 ms 162 runs
Benchmark 2: ./new_clang --version
Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms]
Range (min … max): 12.4 ms … 20.3 ms 216 runs
Summary
'./new_clang --version' ran
1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.
----
This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.
I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into set of consistent patterns that avoids a
significant regression in overall boilerplate.
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.
I am moving the FMV priority logic in TargetInfo, so that it can be
implemented by the TargetParser which then makes it possible to query it
from llvm. Here is an example why this is handy:
https://github.com/llvm/llvm-project/pull/87939
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.
We can support `__builtin_cpu_is` via comparing values in compiler's
CPU definitions and `__riscv_cpu_model`.
This depends on #116202.
Reviewers: lenary, BeMg, kito-cheng, preames, lukel97
Reviewed By: lenary
Pull Request: https://github.com/llvm/llvm-project/pull/116231
This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8.
This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de.
This reverts commit 775148f2367600f90d28684549865ee9ea2f11be.
multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.
We can support `__builtin_cpu_is` via comparing values in compiler's
CPU definitions and `__riscv_cpu_model`.
This depends on #116202.
Reviewers: lenary, BeMg, kito-cheng, preames, lukel97
Reviewed By: lenary
Pull Request: https://github.com/llvm/llvm-project/pull/116231
This patch adds support for getting even-odd general purpose register
pairs into and out of inline assembly using the `R` constraint as
proposed in riscv-non-isa/riscv-c-api-doc#92
There are a few different pieces to this patch, each of which need their
own explanation.
- Renames the Register Class used for f64 values on rv32i_zdinx from
`GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly
unmodified, as their primary value type is used for type inference
over selection patterns. This rename affects quite a lot of files.
- Adds new `GPRPair*` register classes which will be used for `R`
constraints and for instructions that need an even-odd GPR pair. This
new type is used for `amocas.d.*`(rv32) and `amocas.q.*`(rv64) in
Zacas, instead of the `GPRF64Pair` class being used before.
- Marks the new `GPRPair` class legal as for holding a `MVT::Untyped`.
Two new RISCVISD node types are added for creating and destructing a
pair - `BuildGPRPair` and `SplitGPRPair`, and are introduced when
bitcasting to/from the pair type and `untyped`.
- Adds functionality to `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into
`untyped` pairs.
- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>`
values, when going to/from inline assembly, only allocate one (pair)
register (they would otherwise allocate two). This is due to a bug in
SelectionDAGBuilder.cpp which other backends also work around.
- Ensures that Clang understands that `R` is a valid inline assembly
constraint.
- This also allows `R` to be used for `f64` types on `rv32_zdinx`
architectures, where doubles are stored in a GPR pair.
This change implements support for the `cr` and `cf` register
constraints (which allocate a RVC GPR or RVC FPR respectively), and the
`N` modifier (which prints the raw encoding of a register rather than
the name).
The intention behind these additions is to make it easier to use inline
assembly when assembling raw instructions that are not supported by the
compiler, for instance when experimenting with new instructions or when
supporting proprietary extensions outside the toolchain.
These implement part of my proposal in riscv-non-isa/riscv-c-api-doc#92
As part of the implementation, I felt there was not enough coverage of
inline assembly and the "in X" floating-point extensions, so I have
added more regression tests around these configurations.
Only allow GPR registers and verify the size is the same as XLen.
This fixes the crash seen in #109588 by making it a frontend error.
gcc does accept the code so we may need to consider if we can fix the
backend. Some other targets I tried appear to have similar issues so it
might not be straightforward to fix.
This patch enable the function multiversion(FMV) and `target_clones`
attribute for RISC-V target.
The proposal of `target_clones` syntax can be found at the
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 (which has
landed), as modified by the proposed
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 (which adds the
priority syntax).
It supports the `target_clones` function attribute and function
multiversioning feature for RISC-V target. It will generate the ifunc
resolver function for the function that declared with target_clones
attribute.
The resolver function will check the version support by runtime object
`__riscv_feature_bits`.
For example:
```
__attribute__((target_clones("default", "arch=+ver1", "arch=+ver2"))) int bar() {
return 1;
}
```
the corresponding resolver will be like:
```
bar.resolver() {
__init_riscv_feature_bits();
// Check arch=+ver1
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION1) == BITMASK_OF_VERSION1) {
return bar.arch=+ver1;
} else {
// Check arch=+ver2
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION2) == BITMASK_OF_VERSION2) {
return bar.arch=+ver2;
} else {
// Default
return bar.default;
}
}
}
```
This patch makes unsupported target attributes emit a warning and ignore
the target attribute during semantic checks. The changes include:
1. Adding the RISCVTargetInfo::isValidFeatureName function.
2. Rejecting non-full-arch strings in the handleFullArchString function.
3. Adding test cases to demonstrate the warning behavior.
This patch aims to replace the target attribute override mechanism based
on `__RISCV_TargetAttrNeedOverride` with the insertion of several
negative target features
When the target attribute uses the full architecture string
("arch=rv64gc") or specifies the CPU ("cpu=rocket-rv64") as the version,
it will override the module-level target feature. Currently, this
mechanism is implemented by inserting `__RISCV_TargetAttrNeedOverride`
as a dummy target feature immediately before the target attribute's
feature.
```
module target features + __RISCV_TargetAttrNeedOverride + target attribute's feature
```
The RISCVTargetInfo::initFeatureMap function will remove the "module
target features" and use only the "target attribute's features".
This patch changes the process as follows:
```
module target features + negative target feature for all supported extension + target attribute's feature
```
The `module target features` will be disable by `negative target feature
for all supported extension` in `TargetInfo::initFeatureMap`
The spec can be found at
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74.
1. Add the new extension GroupID/Bitmask with latest hwprobe key.
2. Update the `initRISCVFeature `
3. Update `EmitRISCVCpuSupports` due to not only group0 now.
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
https://github.com/llvm/llvm-project/pull/85790.
This is inspired by https://github.com/llvm/llvm-project/pull/85786.
Major changes are a) a restriction in scope to only the builtins (which
have a much narrower user interface), and the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.
I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.
Before this can make it into a release, we need a change such as
https://github.com/llvm/llvm-project/pull/99958. The gcc docs claim that
cpu_support can be called by "normal" code without calling the cpu_init
routine because the init routine will have been called by a high
priority constructor. Our current compiler-rt mechanism does not do
this.
This is largely a revert of commit
e81796671890b59c110f8e41adc7ca26f8484d20.
As #88029 shows, there exists hardware that only supports unaligned
scalar.
I'm leaving how this gets exposed to the clang interface to a future
patch.
[RISCV] RISCV vector calling convention (1/2)
This is the vector calling convention based on
https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
the idea is to split between "scalar" callee-saved registers
and "vector" callee-saved registers. "scalar" ones remain the
original strategy, however, "vector" ones are handled together
with RVV objects.
The stack layout would be:
|--------------------------| <-- FP
| callee-allocated save |
| area for register varargs|
|--------------------------|
| callee-saved registers | <-- scalar callee-saved
| (scalar) |
|--------------------------|
| RVV alignment padding |
|--------------------------|
| callee-saved registers | <-- vector callee-saved
| (vector) |
|--------------------------|
| RVV objects |
|--------------------------|
| padding before RVV |
|--------------------------|
| scalar local variables |
|--------------------------| <-- BP
| variable size objects |
|--------------------------| <-- SP
Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).
Differential Revision: https://reviews.llvm.org/D154576
This mechanism is introduced by #68324.
This refactor makes the prototype and attributes clear.
Reviewers: asb, kito-cheng, philnik777, topperc, preames
Reviewed By: topperc
Pull Request: https://github.com/llvm/llvm-project/pull/80280
GCC has supported a generic constraint "s" for a long time (since at
least 1992), which references a symbol or label with an optional
constant offset. "i" is a superset that also supports a constant
integer.
GCC's RISC-V port also supports a machine-specific constraint "S",
which cannot be used with a preemptible symbol. (We don't bother to
check preemptibility.) In PIC code, an external symbol is preemptible by
default, making "S" less useful if you want to create an artificial
reference for linker garbage collection, or define sections to hold
symbol addresses:
```
void fun();
// error: impossible constraint in ‘asm’ for riscv64-linux-gnu-gcc -fpie/-fpic
void foo() { asm(".reloc ., BFD_RELOC_NONE, %0" :: "S"(fun)); }
// good even if -fpie/-fpic
void foo() { asm(".reloc ., BFD_RELOC_NONE, %0" :: "s"(fun)); }
```
This patch adds support for "s". Modify https://reviews.llvm.org/D105254
("S") to handle multi-depth GEPs (https://reviews.llvm.org/D61560).
This patch reworks RISCVTargetInfo::initFeatureMap to fix the issue
described
in
https://github.com/llvm/llvm-project/pull/74889#pullrequestreview-1773445559
(and is an alternative to #75804)
When a full arch string is specified, a "full" list of extensions is now
passed
after the __RISCV_TargetAttrNeedOverride marker feature, which includes
any
negative features that disable ISA extensions.
In initFeatureMap, there are now two code paths:
1. If the arch string was overriden, use the "full" list of override
features,
only adding back any non-isa features that were specified.
Using the full list of positive and negative features will mean that the
target-cpu will have no effect on the final arch, e.g.
__attribute__((target("arch=rv64i"))) with -mcpu=sifive-x280 will have
the
features for rv64i, not a mix of both.
2. Otherwise, parse and *append* the list of implied features. By
appending, we
turn back on any features that might have been disabled by a negative
extension, i.e. this handles the case fixed in #74889.
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.
If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.
Differential Revision: https://reviews.llvm.org/D70401
We have two structs for representing the version of an extension in
RISCVISAInfo, RISCVExtensionInfo and RISCVExtensionVersion, both
with the exact same fields. This patch deduplicates them.
toFeatures and toFeatureVector both output a list of target feature
flags, just with a slightly different interface. toFeatures keeps any
unsupported extensions, and also provides a way to append negative
extensions (AddAllExtensions=true).
This patch combines them into one function, so that a later patch will
be be able to get a std::vector of features that includes all the
negative extensions, which was previously only possible through the
StrAlloc interface.
collectNonISAExtFeature was returning any negative extension features,
e.g.
given an input of
+zifencei,+m,+a,+save-restore,-zbb,-relax,-zfa
It would return
+save-restore,-zbb,-relax,-zfa
Because negative extensions aren't emitted when calling
toFeatureVector(), and
so were considered missing. Hence why we still see "-zfa" and "-zfb" in
the tests for
the full arch string attributes, even though with a full arch string we
should be overriding the extensions.
This fixes it by using RISCVISAInfo::isSupportedExtensionFeature instead
to
check if a feature is an ISA extension.
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
d80e46d added support for the target function attribute. However, it
turns out that commit has a nasty bug/oversight. As the tests in that
revision show, everything works if clang -cc1 is directly invoked. I was
suprised to learn this morning that compiling with clang (i.e. the
typical user workflow) did not work.
The bug is that if a set of explicit negative extensions is passed to
cc1 at the command line (as the clang driver always does), we were
copying these negative extensions to the end of the rewritten extension
list. When this was later parsed, this had the effect of turning back
off any extension that the target attribute had enabled.
This patch updates the logic to only propagate the features from the
input which don't appear in the rewritten form in either positive or
negative form.
Note that this code structure is still highly suspect. In particular I'm
fairly sure that mixing extension versions with this code will result in
odd results. However, I figure its better to have something which mostly
works than something which doesn't work at all.
When we'd originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under the
theory that some processor might implement one, but not the other. At
the moment, we don't have evidence of such a processor. The C/C++ level
interface, and the clang driver command lines have settled on a single
unaligned flag which indicates both scalar and vector support unaligned.
Given that, let's remove the test matrix complexity for a set of
configurations which don't appear useful.
Given these are internal feature names, I don't think we need to provide
any forward compatibility. Anyone disagree?
Note: The immediate trigger for this patch was finding another case
where the unaligned-vector-mem wasn't being properly serialized to IR
from clang which resulted in problems reproducing assembly from clang's
-emit-llvm feature. Instead of fixing this, I decided getting rid of the
complexity was the better approach.
RISC-V C API introduced predefined macro to achieve hints about
unaligned accesses ([pr]). This patch defines __riscv_misaligned_fast
when using -mno-strict-align, otherwise, defines
__riscv_misaligned_avoid.
Note: This ignores __riscv_misaligned_slow which is also defined by
spec.
[pr]: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/40
Fix several instances of macros being defined multiple times
in several targets. Most of these are just simple duplication in a
TargetInfo or OSTargetInfo of things already defined in
InitializePredefinedMacros or InitializeStandardPredefinedMacros,
but there are a few that aren't:
* AArch64 defines a couple of feature macros for armv8.1a that are
handled generically by getTargetDefines.
* CSKY needs to take care when CPUName and ArchName are the same.
* Many os/target combinations result in __ELF__ being defined twice.
Instead define __ELF__ just once in InitPreprocessor based on
the Triple, which already knows what the object format is based
on os and target.
These changes shouldn't change the final result of which macros are
defined, with the exception of the changes to __ELF__ where if you
explicitly specify the object type in the triple then this affects
if __ELF__ is defined, e.g. --target=i686-windows-elf results in it
being defined where it wasn't before, but this is more accurate as an
ELF file is in fact generated.
Differential Revision: https://reviews.llvm.org/D150966
Now that codegen support for zhinx in landed (D149811), we should set
HasLegalHalfType=true for zhinx (see D145071 for the patch doing this
for zfh).
Differential Revision: https://reviews.llvm.org/D150777
The desired semantics for HasLegalHalfType are slightly unclear in that
the comment for HasLegalHalfType says "True if the backend supports
operations on the half LLVM IR type." Which operations? We get very
limited scalar operations with zfhmin, more with zfh, and vector support
with zvfh. While the comment for hasLegalHalfType() says "Determine
whether _Float16 is supported on this target."
This patch sets HasLegalHalfType to true for zfh.
Differential Revision: https://reviews.llvm.org/D145071
Similar for RISCV::parseTuneCPU and RISCV::checkTuneCPUKind.
This makes the CPUKind enum no longer part of the API. It wasn't
providing much value. It was only used to pass between the two
functions.
By removing it, we can remove a dependency on a tablegen generated
file from the RISCVTargetParser.h file. Then we can remove a
dependency from several CMakeLists.txt.
This allows the user to set the size of the scalable vector so they
can be used in structs and as the type of global variables. This works
by representing the type as a fixed vector instead of a scalable vector
in IR. Conversions to and from scalable vectors are made where necessary
like function arguments/returns and intrinsics.
This features has been requested here
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/176
I know arm_sve_vector_bits is used by the Eigen library so this
could be used to port Eigen to RVV.
This patch adds a new preprocessor define `__riscv_v_fixed_vlen` that
is set when -mrvv_vector_bits is passed on the command line.
The code is largely based on the AArch64 code. A lot of code was
copy/pasted and then modiied to RVV. There may be some opportunities
for sharing.
This first patch only supports the LMUL=1 types. Additional changes
will be needed to support other LMULs. I have also not supported
mask vectors.
Differential Revision: https://reviews.llvm.org/D145088