This allows formatting large integers in a human friendly way. Example:
"5321584" -> "5.32M".
Use it where such human numbers are generated manually today.
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
Currently we maintain a hand written list of subtarget features which we
are implied for a given FMV feature. It is more robust to expand such
dependencies using ExtensionDependency from TargetParser, since that is
generated by tablegen. For this to work each FMV feature must have a
corresponding SubtargetFeature in place. FMV features which didn't
satisfy this criteria have been removed from the ACLE specification
(https://github.com/ARM-software/acle/pull/315). However, I deliberately
marked the ArchExtKind in FMVInfo structure as std::optional in case we
decide to break this rule in the future.
I have also added the missing dependencies:
* FEAT_DPB2 -> FEAT_DPB
* FEAT_FlagM2 -> FEAT_FlagM
This implements
https://discourse.llvm.org/t/rfc-add-support-for-controlling-diagnostics-severities-at-file-level-granularity-through-command-line/81292.
Users now can suppress warnings for certain headers by providing a
mapping with globs, a sample file looks like:
```
[unused]
src:*
src:*clang/*=emit
```
This will suppress warnings from `-Wunused` group in all files that
aren't under `clang/` directory. This mapping file can be passed to
clang via `--warning-suppression-mappings=foo.txt`.
At a high level, mapping file is stored in DiagnosticOptions and then
processed with rest of the warning flags when creating a
DiagnosticsEngine. This is a functor that uses SpecialCaseLists
underneath to match against globs coming from the mappings file.
This implies processing warning options now performs IO, relevant
interfaces are updated to take in a VFS, falling back to RealFileSystem
when one is not available.
So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros, to
replace them with a more resilient interface to access wavefront size
information where it is available.
Reapplies #112849 with a fix for the non-hermetic clang test that failed
on Mac after the revert in #115499.
For SWDEV-491529.
So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros, to
replace them with a more resilient interface to access wavefront size
information where it is available.
For SWDEV-491529.
`EmitClangAttrSpellingListIndex()` performs a lot of unnecessary string
comparisons which is wasteful in time and stack space. This commit
attempts to refactor this method to be more performant.
Summary:
The cache line size on AMDGPU varies between 64 and 128 (The lowest L2
cache also goes to 256 on some architectures.) This macro is intended to
present a size that will not cause destructive interference, so we
choose the larger of those values.
The 'allocator' modifier is now accepted in the 'allocate' clause. Added
LIT tests covering codegen, PCH, template handling, and serialization
for 'allocator' modifier.
Added support for allocator-modifier to release notes.
Testing
- New allocate modifier LIT tests.
- OpenMP LIT tests.
- check-all
- relevant sollve_vv test cases
tests/5.2/scope/test_scope_allocate_construct.c
SPIR-V doesn't currently encode "native" integer bit-widths in its
datalayout(s). This is problematic as it leads to optimisation passes,
such as InstCombine, getting ideas and e.g. shrinking to non
byte-multiple integer types, which is not desirable and can lead to
breakage further down in the toolchain. This patch addresses that by
encoding `i8`, `i16`, `i32` and `i64` as native types for vanilla SPIR-V
(the spec natively supports them), and `i32` and `i64` for AMDGCNSPIRV
(where the hardware targets are known). We also set the stack alignment
on the latter, as it is overaligned (32-bit vs 8-bit).
This starts moving `X86Builtins.def` to be a tablegen file. It's quite
large, so I think it'd be good to move things in multiple steps to avoid
a bunch of merge conflicts due to the amount of time this takes to
complete.
Removes sve-bf16, sve-ebf16, and sve-i8mm since they are obsolete. One
could write target_version("sve+bf16") instead of sve-bf16 for instance.
Approved in ACLE as https://github.com/ARM-software/acle/pull/353
Enables the support of `-fcf-protection=return` on RISC-V, which
requires Zicfiss. It also adds a string attribute "hw-shadow-stack"
to every function if the option is set on RISC-V
This patch avoids eagerly populating the submodule index on `Module`
construction. The `StringMap` allocation shows up in my profiles of
`clang-scan-deps`, while the index is not necessary most of the time. We
still construct it on-demand.
Moreover, this patch avoids performing qualified submodule lookup in
`ASTReader` whenever we're serializing a module graph whose top-level
module is unknown. This is pointless, since that's guaranteed to never
find any existing submodules anyway.
This speeds up `clang-scan-deps` by ~0.5% on my workload.
This patch shrinks the size of the `Module` class from 2112B to 1624B. I
wasn't able to get a good data on the actual impact on memory usage, but
given my `clang-scan-deps` workload at hand (with tens of thousands of
instances), I think there should be some win here. This also speeds up
my benchmark by under 0.1%.
This commit implements the [wide-arithmetic] proposal which has recently
reached phase 2 in the WebAssembly proposals process. The goal here is
to implement support in LLVM for emitting these instructions which are
gated behind a new feature flag by default. A new `wide-arithmetic`
feature flag is introduced which gates these four new instructions from
being emitted.
Emission of each instruction itself is relatively simple given LLVM's
preexisting lowering rules and infrastructure. The main gotcha is that
due to the multi-result nature of all of these instructions it needed
the lowerings to be implemented in C++ rather than in TableGen.
[wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic
Two options for clang: -mlam-bh & -mno-lam-bh.
Enable or disable amswap[__db].{b/h} and amadd[__db].{b/h} instructions.
The default is -mno-lam-bh.
Only works on LoongArch64.
In `clang-scan-deps`, we're creating lots of `Module` instances.
Allocating them all in a bump-pointer allocator reduces the number of
retired instructions by 1-1.5% on my workload.
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
If we split these features in the compiler (see relevant pull request
https://github.com/llvm/llvm-project/pull/109299), we would only be able
to hand-write a 'memtag2' version using inline assembly since the
compiler cannot generate the instructions that become available with
FEAT_MTE2. However these instructions only work at Exception Level 1, so
they would be unusable since FMV is a user space facility. I am
therefore unifying them.
Approved in ACLE as https://github.com/ARM-software/acle/pull/351
This patch adds an IsText parameter to the following getBufferForFile,
getBufferForFileImpl. We introduce a new virtual function
openFileForReadBinary which defaults to openFileForRead except in
RealFileSystem which uses the OF_None flag instead of OF_Text.
The default is set to OF_Text instead of OF_None, this change in value
does not affect any other platforms other than z/OS. Setting this
parameter correctly is required to open files on z/OS in the correct
encoding. The IsText parameter is based on the context of where we open
files, for example, in the ASTReader, HeaderMap requires that files
always be opened in binary even though they might be tagged as text.
This change implements support for the `cr` and `cf` register
constraints (which allocate a RVC GPR or RVC FPR respectively), and the
`N` modifier (which prints the raw encoding of a register rather than
the name).
The intention behind these additions is to make it easier to use inline
assembly when assembling raw instructions that are not supported by the
compiler, for instance when experimenting with new instructions or when
supporting proprietary extensions outside the toolchain.
These implement part of my proposal in riscv-non-isa/riscv-c-api-doc#92
As part of the implementation, I felt there was not enough coverage of
inline assembly and the "in X" floating-point extensions, so I have
added more regression tests around these configurations.
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned
pointers when building 64-bit targets. This is useful for WoW code
(i.e., the part of Windows that handles running 32-bit application on a
64-bit OS). Currently this is supported on x64 using the 270, 271 and
272 address spaces, but does not work for AArch64 at all.
This change adds the same 270, 271 and 272 address spaces to AArch64 and
adjusts the data layout string accordingly. Clang will generate the
correct address space casts, but these will currently be ignored until
the AArch64 backend is updated to handle them.
Partially fixes#62536
This is a resurrected version of <https://reviews.llvm.org/D158857>
(originally created by @a_vorobev) - I've cleaned it up a little, fixed
the rest of the tests and added to auto-upgrade for the data layout.