This commit adds a new pass gate that allows selective disabling
of one or more passes via the clang command line using the
`-opt-disable` option. Passes to be disabled should be specified as a
comma-separated list of their names.
The implementation resides in the same file as the bisection tool. The
`getGlobalPassGate()` function returns the currently enabled gate.
Example: `-opt-disable="PassA,PassB"`
Pass names are matched using case-insensitive comparisons. However, note
that special characters, including spaces, must be included exactly as
they appear in the pass names.
Additionally, a `-opt-disable-enable-verbosity` flag has been introduced to
enable verbose output when this functionality is in use. When enabled,
it prints the status of all passes (either running or NOT running),
similar to the default behavior of `-opt-bisect-limit`. This flag is
disabled by default, which is the opposite of the `-opt-bisect-verbose`
flag (which defaults to enabled).
To validate this functionality, a test file has also been provided. It reuses
the same infrastructure as the opt-bisect test, but disables three
specific passes and checks the output to ensure the expected behavior.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
PPCSubtarget is not always initialized, depending on which passes are
running, and in our downstream fork, -enable-matrix is the default
configuration (regardless of whether matrix intrinsics are present in
the IR), which triggers a fatal error in builtins-ppc-fpconstrained.c.
XAndesBFHCvt provides two builtins functions for converting between
float and bf16. Users can use them to convert bf16 values loaded from
memory to float, perform arithmetic operations, then convert them back
to bf16 and store them to memory.
The load/store and move operations for bf16 will be handled in a later
patch.
Clang can chose which sort of source-file hash is attached to a DIFile
metadata node. However, whenever hashing is possible, we /always/ attach
a hash. This patch permits users who want DWARF5 but don't want the file
hashes to opt out, by adding a "none" option to the -gsrc-hash option
that skips hash computation.
Feature flags protect instructions not datatypes. This means only
builtins associated with +bf16 protected instructions must be guarded.
Those that treat the data as opaque 16-bit values (e.g. loads, store and
shuffles) should be freely available with the underlying SVE feature.
The underlying type of an enumeration is the non-atomic, unqualified
version of the specified type. Clang was rejecting such enumerations,
with a hard error, but now has the ability to downgrade the error into a
warning. Additionally, we diagnose (as a warning) dropping other
qualifiers. _Atomic is special given that an atomic type need not have
the same size as its non-atomic counterpart, and that the C++ version
of <stdatomic.h> defines _Atomic to std::atomic for easing cross-
language atomic use and std::atomic is an invalid enum base in C++.
(Note: we expose _Atomic in C++ even without including
<stdatomic,h>.)
Fixes#147736
NFC patch to add testcase for locking down the support of Zero vector
comparisons using the `vcmpgtuh (vector compare greater than unsigned
halfword)` instruction.
Currently `vcmpequh (vector compare equal unsigned halfword)` is in use.
---------
Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
The vwcvt intrinsic produces a vwadd with a scalar 0 for the RHS. We
should be using the element type of the source so that the 0 needs to be
widened.
The i32->i64 vwcvt previously failed on RV32 because the legalization
code doesn't expect to see an i64 type.
The quadword vector instructions introduced by SVE2p1/SME2p1 but the
builtins were not available to streaming mode.
RAX1 is available in streaming mode when SME2p1 is available.
Builtins that are enabled via +sve2p1 in non-streaming mode and +sme{2}
in streaming mode should also be enabled via +sve+sme{2} in
non-streaming mode and +sme+sve2p1 in streaming mode.
To improve organization and maintainability, I'd like to split out the
intrinsic tests for zvfh, zvfhmin, zvfbfmin and zvfbfwma into separate
directories.
RFC on discourse:
https://discourse.llvm.org/t/rfc-debug-info-for-coroutine-suspension-locations-take-2/86606
With this commit, we add `DILabel` debug infos to the resume points of a
coroutine. Those labels can be used by debugging scripts to figure out
the exact line and column at which a coroutine was suspended by looking
up current `__coro_index` value inside the coroutines frame, and then
searching for the corresponding label inside the coroutine's resume
function.
The DWARF information generated for such a label looks like:
```
0x00000f71: DW_TAG_label
DW_AT_name ("__coro_resume_1")
DW_AT_decl_file ("generator-example.cpp")
DW_AT_decl_line (5)
DW_AT_decl_column (3)
DW_AT_artificial (true)
DW_AT_LLVM_coro_suspend_idx (0x01)
DW_AT_low_pc (0x00000000000019be)
```
The labels can be mapped to their corresponding `__coro_idx` values
either via their naming convention `__coro_resume_<N>` or using the new
`DW_AT_LLVM_coro_suspend_idx` attribute. In gdb, those line numebrs can
be looked up using `info line -function my_coroutine -label
__coro_resume_1`. LLDB unfortunately does not understand DW_TAG_label
debug information, yet.
Given this is an artificial compiler-generated label, I did apply the
DW_AT_artificial tag to it. The DWARFv5 standard only allows that tag on
type and variable definitions, but this is a natural extension and was
also blessed in the RFC on discourse.
Also, this commit adds `DW_AT_decl_column` to labels, not only for
coroutines but also for normal C and C++ labels. While not strictly
necessary, I am doing so now because it would be harder to do so later
without breaking the binary LLVM-IR format
Drive-by fixes: While reading the existing test cases to understand how
to write my own test case, I did a couple of small typo fixes and
comment improvements
llvm.aarch64.set.fpmr only writes to inaccessible memory. Tag it with
the IntrWriteMem and IntrInaccessibleMemOnly properties so the optimiser
can treat it as a pure write.
The original patch did not add this property, causing the intrinsic to
be conservatively treated as readwrite. This commit fixes that.
XAndesVPackFPH can actually be used independently without requiring
Zvfhmin. Therefore, we remove the implicitly required Zvfhmin extension
from XAndesVPackFPH and imply that the f extension is sufficient.
This is similar to -msve-vector-bits, but for streaming mode: it
constrains the legal values of "vscale", allowing optimizations based on
that constraint.
This also fixes conversions between SVE vectors and fixed-width vectors
in streaming functions with -msve-vector-bits and
-msve-streaming-vector-bits.
This rejects any use of arm_sve_vector_bits types in streaming
functions; if it becomes relevant, we could add
arm_sve_streaming_vector_bits types in the future.
This doesn't touch the __ARM_FEATURE_SVE_BITS define.
Adds support for __sys Clang builtin for AArch64
__sys is a long existing MSVC intrinsic used to manage caches, tlbs, etc
by writing to system registers:
* It takes a macro-generated constant and uses it to form the AArch64 SYS instruction which is MSR with op0=1. The macro drops op0 and expects the implementation to hardcode it to 1 in the encoding.
* Volume use is in systems code (kernels, hypervisors, boot environments, firmware)
* Has an unused return value due to MSVC cut/paste error
Implementation:
* Clang builtin, sharing code with Read/WriteStatusReg
* Hardcodes the op0=1
* Explicitly returns 0
* Code-format change from clang-format
* Unittests included
* Not limited to MSVC-environment as its generally useful and neutral
This will enable removal of a hack from the wasm backend
in a future change.
This feels unnecessarily clunky. I would assume something was
automatically parsing this and propagating it in the C++ case,
but I can't seem to find it. In particular it feels wrong that
I need to parse out the individual values, given they are listed
in the options.td file. We should also be parsing and forwarding
every flag that corresponds to something else in TargetOptions,
which requires auditing.
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.
The long long alignment is also incorrect, but that would require a
change to the data layout as well.
Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf At least based on some spot checks,
the changes to bit field layout also make use match GCC now.
This was found by https://github.com/llvm/llvm-project/pull/144720.
This patch makes determining alignment and width of BitInt to be target
ABI specific and makes it consistent with [Procedure Call Standard for
the LoongArch™ Architecture] for LoongArch target
(https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc).
This marks ffloor as legal providing that armv8 and neon is present (or
fullfp16 for the fp16 instructions). The existing arm_neon_vrintm
intrinsics are auto-upgraded to llvm.floor.
If this is OK I will update the other vrint intrinsics.
There're four possible formats to refer a register in inline assembly,
1. Numeric name without dollar sign ("f0")
2. Numeric name with dollar sign ("$f0")
3. ABI name without dollar sign ("fa0")
4. ABI name with dollar sign ("$fa0")
LoongArch GCC accepts 1 and 2 for FPRs before r15-8284[1] and all these
formats after the chagne. But Clang supports only 2 and 4 for FPRs. The
inconsistency has caused compatibility issues, such as QEMU's case[2].
This patch follows 0bbf3ddf5fea ("[Clang][LoongArch] Add GPR alias
handling without `$` prefix") and accepts FPRs without dollar sign
prefixes as well to keep aligned with GCC, avoiding future compatibility
problems.
Link:
https://gcc.gnu.org/cgit/gcc/commit/?id=d0110185eb78f14a8e485f410bee237c9c71548d
[1]
Link:
https://lore.kernel.org/qemu-devel/20250314033150.53268-3-ziyao@disroot.org/
[2]