Clang has some code which is doing a direct arch name
string compare which should really be recognizing anything
usable as a triple architecture. It makes more sense to
directly parse the architecture than to construct a temporary
triple just to see what the parsed arch is.
For some reason the existing public parsing method is
getArchTypeForLLVMName. I'm not fully sure what the difference
between the two is supposed to be. My current guess is
getArchTypeForLLVMName is only supposed to handle the
canonical architecture name.
Summary:
This PR simply changes the behavior of the `wchar_size` flag. Currently,
we emit this in all cases for all targets. This causes problems during
LLVM-IR linking, specifically because this would vary between Linux and
Windows in unintuitive ways. Now we have an llvm::Triple helper to
determine the size from the known values. The module flag will only be
emitted if these do not match (indicating a non-standard environment).
In addition to fixing AMDGCN bitcode linking, this also means we don't
need to bloat *every* IR module compiled by clang with this flag. The
changed tests reflect this: one less unnecessary piece of metadata.
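The emission rule can be sketched roughly as below. This is a minimal Python illustration, not the Clang code: `default_wchar_size` and `maybe_wchar_flag` are hypothetical names, and the defaults used here (2 bytes on Windows, 4 bytes elsewhere) are a simplifying assumption about what the llvm::Triple helper returns.

```python
def default_wchar_size(os_name: str) -> int:
    """Assumed default wchar_t size in bytes for an OS (simplified)."""
    return 2 if os_name == "windows" else 4


def maybe_wchar_flag(os_name: str, actual_size: int):
    """Return a (name, value) module flag only when the size is non-standard."""
    if actual_size == default_wchar_size(os_name):
        return None  # matches the triple's default: nothing to emit
    return ("wchar_size", actual_size)
```

Under these assumptions, a Linux module with a 4-byte `wchar_t` carries no flag, while one built with a 2-byte `wchar_t` still does.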
Introduce two new subtarget features:
- WMMA256bInsts for GFX11 WMMA instructions and
- WMMA128bInsts for GFX1170 and GFX12 WMMA and SWMMAC instructions
Some WMMA instructions have changed from GFX 11.0 to GFX 11.7 so new
Real versions were added with "_gfx1170" suffix. For consistency all
WMMA and SWMMAC GFX11.7 instructions use this suffix.
To resolve decoding issues between different formats for some WMMA
instructions between GFX 11 and GFX 11.7, new decoding tables were
added.
This pushes some of our simplifications to extension dependencies into
other parts of RISCVISAInfo and into the tablegen predicates.
The key affected pieces are:
- Error messages around Zcd incompatibilities now reference only `zcd`.
- We now have a big list of extensions that are rv32-only.
There are no instructions in the Xqci extension itself, it is just an
alias of a group. If we have all the items in the group, then we should
add `xqci` to the list of extensions we have.
This helps with multilib matching.
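A rough sketch of the alias rule, in Python for brevity. The member list below is a hypothetical, abbreviated stand-in for the real group, and `add_group_alias` is an illustrative name rather than anything in RISCVISAInfo.

```python
# Hypothetical, abbreviated member list; the real Xqci group is larger.
XQCI_GROUP = frozenset({"xqcia", "xqcibm", "xqcics"})


def add_group_alias(extensions: set) -> set:
    """Return a copy of `extensions`, adding `xqci` when the whole group is present."""
    result = set(extensions)
    if XQCI_GROUP <= result:
        result.add("xqci")
    return result
```

A partial group leaves the set unchanged, so `xqci` only appears when every member is already enabled.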
Reland #174731, resolve cyclic dependency issue.
The use of LLVM_Object in LLVM_Util would cause cyclic dependency.
Fix the cyclic dependency by reimplementing `getFeatureSetFromEFlag()`.
Original description:
---
This PR updates llvm-objdump to detect the specific AVR architecture
from the ELF header flags when no specific CPU is provided.
Fixes: https://github.com/llvm/llvm-project/issues/146451
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
This patch adds initial support for AMD Zen 6 architecture (znver6):
- Added znver6 CPU target recognition in Clang and LLVM
- Updated compiler-rt CPU model detection for znver6
- Added znver6 to target parser and host CPU detection
- Added znver6 to various optimizer tests
znver6 features: FP16, AVXVNNIINT8, AVXNECONVERT, AVXIFMA (without BMM).
Reverts llvm/llvm-project#174731 due to introducing a cyclic dependency
when building LLVM with modules enabled: LLVM_Utils -> LLVM_Object ->
LLVM_Utils
This PR updates llvm-objdump to detect the specific AVR architecture
from the ELF header flags when no specific CPU is provided.
Fixes: #146451
---------
Signed-off-by: RuoyuQiu <cabbaken@outlook.com>
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Co-authored-by: qiuruoyu <qiuruoyu@hygon.cn>
This patch proposes a new tuning feature string format that helps users
to build a performance model by "configuring" an existing tune CPU,
along with its scheduling model. For example, this string
```
"sifive-x280:single-element-vec-fp64"
```
takes ``sifive-x280`` as the "base" tune CPU and configures it with
``single-element-vec-fp64``. This gives us a performance model that
looks exactly like that of ``sifive-x280``, except some of the 64-bit
vector floating point instructions now produce only a single element per
cycle due to ``single-element-vec-fp64``.
This string could eventually be used in places like ``-mtune`` at the
frontend. Right now, this patch only implements the parser part, which
is put under the TargetParser library.
The grammar for this string is:
```
tune-cpu ::= 'tuning CPU name in lower case'
directive ::= "[a-zA-Z0-9_-]+"
tune-features ::= directive ["," directive]*
```
A *directive* can only _enable_ or _disable_ a certain tuning
feature from the tuning CPU. A **positive directive**, like the
``single-element-vec-fp64`` we just saw, enables an additional tuning
feature in the associated tuning model.
A **negative directive**, on the other hand, removes a certain tuning
feature. For example, ``sifive-x390`` already has the
``single-element-vec-fp64`` feature, and we can use
"sifive-x390:no-single-element-vec-fp64" to create a new performance
model that looks nearly the same as ``sifive-x390`` except that
``single-element-vec-fp64`` is removed. In this case,
``no-single-element-vec-fp64`` is a negative directive.
There are additional restrictions on what we can put in the list of
directives; please refer to the documentation for more details.
Right now, this string only accepts directives that are explicitly
supported by the tune CPU. For example, "sifive-x280:prefer-w-inst" is
not a valid string as ``prefer-w-inst`` is not supported by
``sifive-x280`` at this moment. Vendors of these processors are expected
to maintain the compatibility of their supported directives across
different versions.
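To make the format concrete, here is a minimal Python sketch of a parser for this string. It is not the TargetParser implementation: the `SUPPORTED` table is hypothetical, and treating a leading `no-` as the marker of a negative directive is an assumption drawn from the examples above.

```python
import re

DIRECTIVE_RE = re.compile(r"[a-zA-Z0-9_-]+")

# Hypothetical table of directives each tune CPU supports.
SUPPORTED = {
    "sifive-x280": {"single-element-vec-fp64"},
    "sifive-x390": {"single-element-vec-fp64"},
}


def parse_tune_string(text: str):
    """Split 'cpu:dir1,dir2' into (cpu, [(feature, enabled), ...])."""
    cpu, sep, features = text.partition(":")
    if not sep or not cpu or cpu != cpu.lower():
        raise ValueError(f"malformed tune string: {text!r}")
    parsed = []
    for directive in features.split(","):
        if not DIRECTIVE_RE.fullmatch(directive):
            raise ValueError(f"malformed directive: {directive!r}")
        # Assumption: a leading 'no-' marks a negative directive.
        enabled = not directive.startswith("no-")
        name = directive if enabled else directive[len("no-"):]
        if name not in SUPPORTED.get(cpu, set()):
            raise ValueError(f"{name!r} is not supported by {cpu!r}")
        parsed.append((name, enabled))
    return cpu, parsed
```

With this sketch, "sifive-x390:no-single-element-vec-fp64" parses to the base CPU plus one disabled feature, while "sifive-x280:prefer-w-inst" is rejected because the directive is not in the CPU's supported set.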
---------
Co-authored-by: Sam Elliott <aelliott@qti.qualcomm.com>
Make a Triple::OSType to support a generic "firmware" OS that isn't bare
metal, but isn't tied to a specific hardware platform like macOS or iOS.
Hook up support for the new OSType in the Darwin toolchain.
The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.
This PR disables this support by default for code robustness;
workloads that choose to explicitly enable this support can change the
default behavior by explicitly specifying the flag options that enable
this support e.g. for experimentation or code paths that do not need
unwinder support.
Currently, most extensions controlled through -march and -mcpu options
are handled in a bitset of AArch64::ExtensionSet. However, extensions
detected at runtime for native compilation are handled in a separate
list of CPU features; once most of the parsing logic has run, the bitset
is converted to a feature list, added after the features detected at
runtime, and the resulting list is used from there on out.
This has the downside that runtime-detected features are unable to
override default CPU extensions. For example, if a CPU enables +aes in
its processor definition, but aes support is not detected at runtime,
the feature currently remains enabled---even though
unsupported---because default features are enabled after the runtime
logic attempts to disable them.
This patch inserts runtime-detected features directly into the extension
set such that these options can take precedence over extensions enabled
by default. The general parsing order for mcpu=native becomes:
1. CPU defaults;
2. Runtime detection;
3. +featureA+nofeatureB options;
4. Other parsing decisions.
This allows features that are found to be unsupported at runtime to be
removed from the list of features supported by targets that enable them
by default.
While at it, this also disables rng if not detected at runtime.
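The ordering above can be illustrated with a small Python sketch. It models the extension set as a plain set of names; `resolve_native_features` and its parameters are hypothetical, stage 4 is omitted, and the `no` prefix handling is a simplification of the real option syntax.

```python
def resolve_native_features(cpu_defaults, detected, undetected, user_options):
    """Apply the stages in order; later stages override earlier ones."""
    features = set(cpu_defaults)      # 1. CPU defaults
    features |= set(detected)         # 2. runtime detection: features found...
    features -= set(undetected)       #    ...and features found to be missing
    for option in user_options:       # 3. +featureA / +nofeatureB options
        if option.startswith("no"):
            features.discard(option[2:])
        else:
            features.add(option)
    return features
```

Because runtime detection runs after the CPU defaults, a default such as `aes` is dropped when it is not detected, and an explicit user option can still turn it back on.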
We should not allow `-wavefrontsize32` and `-wavefrontsize64` to be
specified at the same time. We should also not allow `-wavefrontsize32`
on a target that only supports `wavefrontsize32`, and vice versa.
This adds code to recognize "wasm32-wasip1", "wasm32-wasip2", and
"wasm32-wasip3" as explicit targets, and adds a deprecation warning when
the "wasm32-wasi" target is used, pointing users to the "wasm32-wasip1"
target.
Fixes #165344.
I'm filing this as a draft PR for now, as I've only just now proposed to
make this change in #165344.
This change implements the conditional "Zca implies C" rule to match
GCC's behavior (PR119122) and the RISC-V specification for MISA.C.
The rule is:
- For RV32:
  - No F and no D: Zca alone implies C
  - F but no D: Zca + Zcf implies C
  - F and D: Zca + Zcf + Zcd implies C
- For RV64:
  - No D: Zca alone implies C
  - D: Zca + Zcd implies C
This fixes multilib matching issues where LLVM-generated march strings
didn't include the C extension when GCC's multilib configurations
expected it.
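The rule can be written down compactly. The following Python sketch is only an illustration of the table above, not the RISCVISAInfo code; `zca_implies_c` is a hypothetical name, and it assumes the extension set has already been expanded so that D is always accompanied by F.

```python
def zca_implies_c(xlen: int, exts: set) -> bool:
    """Return True when the enabled Zc* extensions together imply C."""
    if "zca" not in exts:
        return False
    required = {"zca"}
    if "d" in exts:
        required.add("zcd")   # D needs the compressed double load/stores
    if xlen == 32 and "f" in exts:
        required.add("zcf")   # Zcf only exists on RV32
    return required <= exts
```

For example, on RV64 with F but without D, Zca alone is enough, whereas on RV32 the same configuration also needs Zcf.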
Reference:
- GCC PR119122: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119122
- RISC-V Zc spec:
https://github.com/riscv/riscv-isa-manual/blob/main/src/zc.adoc
Signed-off-by: Jerry Zhang Jian <jerry.zhangjian@sifive.com>
This PR is the first step towards introducing LFI into LLVM as a new
sub-architecture backend of AArch64. For details, please see the
[RFC](https://discourse.llvm.org/t/rfc-lightweight-fault-isolation-lfi-efficient-native-code-sandboxing-upstream-lfi-target-and-compiler-changes/88380),
which has been approved for AArch64.
This patch creates the `aarch64_lfi` architecture, and marks the
appropriate registers as reserved when it is targeted (`x25`, `x26`,
`x27`, `x28`). It also adds a Clang driver toolchain for targeting LFI,
and updates the compiler-rt CMake to allow builds for the `aarch64_lfi`
target. The patch also includes documentation for LFI and the rewrites
that will be implemented in future patches.
I am planning to split the relevant modifications for LFI into a series
of patches, organized as described below (after this one). Please let me
know if you'd like me to split the changes in a different way, or
provide one big patch.
1. The next patch will introduce the `MCLFIExpander` mechanism for
applying the MC-level rewrites needed by LFI, along with the
`.lfi_expand` and `.lfi_no_expand` assembly directives when targeting
LFI. A preview can be seen on the `lfi-project`
[fork](https://github.com/llvm/llvm-project/compare/main...lfi-project:llvm-project:lfi-patchset/aarch64-pr-2).
2. The following patch will create an `MCLFIExpander` for the AArch64
backend that performs LFI expansions. This patch will contain the
majority of the LFI-specific logic.
3. The final patch will add an optimization to the rewriter that can
eliminate redundant guard instructions that occur within the same basic
block.
We plan to introduce x86-64 support after further discussion and once
the `MCLFIExpander` infrastructure is in place.
Please let me know your feedback, and thank you very much for your help
and guidance in the review process.
This corrects a wrong condition for avx10 (AVX10Ver is always set to
0/1) and corrects how CPUID for avx10 is queried: per ISE table 1-3 we
should query with EAX = 0x24 and ECX = 0x0 -- previously we omitted the
latter.
Issue reported by user Seraphimt here:
https://discourse.llvm.org/t/test-for-sys-gethostcpufeatures/89130
Mostly adding feature flags from the newest SDK.
(Note that in addition to the obvious, this also affects the compiler-rt
SME ABI routines, which rely on FEAT_SME and FEAT_SME2.)
Add `f64:32:64` to the data layout for AIX, to indicate that doubles
have a 32-bit ABI alignment and 64-bit preferred alignment.
Clang was already taking this into account, but it was not reflected in
LLVM's data layout.
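For readers unfamiliar with the notation, a floating-point entry in the data layout string has the shape `f<size>:<abi>[:<pref>]`, with all values in bits. The Python sketch below decodes such an entry; `parse_float_alignment` is a hypothetical helper written for illustration and is not part of LLVM.

```python
def parse_float_alignment(spec: str):
    """Parse 'f<size>:<abi>[:<pref>]' into (size, abi, pref), all in bits."""
    fields = spec.split(":")
    if not fields[0].startswith("f") or len(fields) not in (2, 3):
        raise ValueError(f"not a float alignment spec: {spec!r}")
    size = int(fields[0][1:])
    abi = int(fields[1])
    # When the preferred alignment is omitted it defaults to the ABI alignment.
    pref = int(fields[2]) if len(fields) == 3 else abi
    return size, abi, pref
```

Applied to `f64:32:64`, this yields a 64-bit type with a 32-bit (4 byte) ABI alignment and a 64-bit (8 byte) preferred alignment.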
A notable effect of this change is that `double` loads/stores with 4
byte alignment are no longer considered "unaligned" and avoid the
corresponding unaligned access legalization. I assume that this is
correct/desired for AIX. (The codegen previously already relied on this
in some places related to the call ABI simply by dint of assuming
certain stack locations were 8 byte aligned, even though they were only
actually 4 byte aligned.)
Fixes https://github.com/llvm/llvm-project/issues/133599.
Refactor some logic in transferBefore to handle hasSEWLMULRatioOnly()
before calling getSEW/getLMUL.
Update adjustIncoming to use getSEWLMULRatio(). Update the interface of
RISCVVType::getSameRatioLMUL to take the ratio instead of SEW and LMUL.
Update the few other callers to call RISCVVType::getSEWLMULRatio first.
Implements https://github.com/ARM-software/acle/pull/404
This allows the user to specify "featA+featB;priority=[1-255]" where
priority=255 means highest priority. If the explicit priority string is
omitted then the priority of "featA+featB" is implied, which is lower
than priority=1.
Internally this gets expanded using special FMV features P0 ... P7 which
can encode up to 256-1 priority levels (excluding all zeros). Those do
not have a corresponding detection bit at pos FEAT_#enum, so I made this
field optional in FMVInfo. Also they don't affect the codegen or name
mangling of versioned functions.
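As a rough illustration of the syntax and the expansion, here is a Python sketch. Both function names are hypothetical, and the mapping from priority to P0 ... P7 (bit i of the priority selects P<i>) is an assumption; the patch only states that eight features encode 255 levels.

```python
def parse_version_string(text: str):
    """Split 'featA+featB;priority=N' into (features, priority or None)."""
    feature_part, sep, priority_part = text.partition(";")
    features = feature_part.split("+")
    if not sep:
        return features, None  # implied priority, lower than priority=1
    key, _, value = priority_part.partition("=")
    priority = int(value)
    if key != "priority" or not 1 <= priority <= 255:
        raise ValueError(f"bad priority in {text!r}")
    return features, priority


def priority_bits(priority: int):
    """Assumed encoding: bit i of the priority selects feature P<i>."""
    return [f"P{i}" for i in range(8) if priority & (1 << i)]
```

Under this assumed encoding, priority=255 sets all of P0 ... P7, and the all-zeros pattern is never produced because priorities start at 1.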