See #90452. The old parse tree errors exploded to thousands of unhelpful
lines when there were multiple missing end directives.
Instead, allow a missing end directive in the parse tree then validate
that it is present during semantics (where the error messages are a lot
easier to control).
The old parse tree errors quickly exploded to thousands of unhelpful
lines when there were multiple missing end directives (e.g. #90452).
Instead I've added a flag to the parse tree indicating when a missing
end directive needs to be diagnosed, and moved the error messages to
semantics (where they are a lot easier to control).
This has the disadvantage of not displaying the error if there were
other parse errors, but there is a precedent for this approach (e.g.
parsing atomic constructs).
`replaceAllUsesWith` is not safe to use in a dialect conversion and will
be deactivated soon (#154112). This commit fixes some API violations and
makes some general improvements.
When an instruction that the disassembler does not recognize is in an IT
block, we should still advance the IT state; otherwise the IT state
spills over into the next recognized instruction, which is incorrect.
We want to avoid disassembly like:
it eq
<unknown> // Often because disassembler has insufficient target info.
addeq r0,r0,r0 // eq spills over into add.
Fixes #150569
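
The spill-over described above can be reproduced with a small model. This is only a sketch, not the LLVM disassembler: the `disassemble` function, the tuple encoding of instructions, and the `advance_on_unknown` switch are all made up for illustration.

```python
from collections import deque

def disassemble(stream, advance_on_unknown=True):
    """Toy model of IT-block tracking (hypothetical, not LLVM code).

    Each stream entry is (mnemonic, operands). ("it", [conds]) opens an IT
    block; (None, None) stands for an instruction we cannot decode.
    """
    pending = deque()  # conditions still owed to upcoming instructions
    lines = []
    for mnemonic, operands in stream:
        if mnemonic == "it":
            pending = deque(operands)
            # Build it/itt/ite... from the per-slot conditions.
            suffix = "".join("t" if c == operands[0] else "e" for c in operands[1:])
            lines.append(f"it{suffix} {operands[0]}")
            continue
        if mnemonic is None:
            # The unknown instruction still occupies one slot of the IT block.
            if pending and advance_on_unknown:
                pending.popleft()
            lines.append("<unknown>")
            continue
        cond = pending.popleft() if pending else ""
        lines.append(f"{mnemonic}{cond} {operands}")
    return lines

STREAM = [("it", ["eq"]), (None, None), ("add", "r0,r0,r0")]
```

With `advance_on_unknown=False` the model prints `addeq r0,r0,r0` for the last instruction, which is the incorrect output shown above; with the default it prints a plain `add`.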
It was previously failing because of a warning marking a C++20 feature
as an extension.
This is a follow-up to 85043c1c146fd5658ad4c5b5138e58994333e645 that
introduced the test.
The vector granule (AArch64 DWARF register 46) is a pseudo-register that
contains the available size in bits of SVE vector registers in the
current call frame, divided by 64. The vector granule can be used in
DWARF expressions to describe SVE/SME stack frame layouts (e.g., the
location of SVE callee-saves).
The first time VG is evaluated (if not already set), it is initialized
to the result of evaluating a "CNTD" instruction (this assumes SVE is
available).
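
As a worked example of the definition above, the following sketch computes VG from a vector length in bits. The function names are made up for illustration; the only facts assumed are that VG is the vector length in bits divided by 64 and that SVE vector lengths are multiples of 128 bits between 128 and 2048.

```python
def vector_granule(vl_bits: int) -> int:
    """Return VG for an SVE vector length given in bits (hypothetical helper)."""
    if not (128 <= vl_bits <= 2048) or vl_bits % 128 != 0:
        raise ValueError(f"not a valid SVE vector length: {vl_bits}")
    # VG counts 64-bit granules, which is also what CNTD returns.
    return vl_bits // 64

def bytes_per_vector(vg: int) -> int:
    """Size in bytes of one SVE vector register for a given VG."""
    # vl_bits / 8 == (vg * 64) / 8 == vg * 8
    return vg * 8
```

For a 256-bit implementation, `vector_granule(256)` is 4 and each vector register occupies `bytes_per_vector(4)`, i.e. 32 bytes, which is the kind of scaling a DWARF expression for an SVE callee-save slot relies on.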
To support SME, the value of VG can change per call frame; this is
currently handled like any other callee-save and is intended to support
the unwind information implemented in #152283. This limits how VG is
used in the CFI information of functions with "streaming-mode changes"
(mode changes that change the SVE vector length), to make the unwinder's
job easier.
The orc-rt extensible RTTI mechanism is used to provide simple dynamic RTTI
checks for orc-rt types that do not depend on standard C++ RTTI (meaning that
they will work equally well for programs compiled with -fno-rtti).
ORC_RT_MARK_AS_BITMASK_ENUM and ORC_RT_DECLARE_ENUM_AS_BITMASK can be used to
easily add support for bitmask operators (&, |, ^, ~) to enum types.
This code was derived from LLVM's include/llvm/ADT/BitmaskEnum.h header.
It is generally better to allow the target independent combines before
creating AArch64 specific nodes (providing they don't mess it up). This
moves the generation of BSL nodes to lowering, not a combine, so that
intermediate nodes are more likely to be optimized. There is a small
change in the constant handling to detect legalized buildvector
arguments correctly.
Fixes #149380, but not directly. #151856 contained a direct fix for
expanding the pseudos.
The simplest way is:
1. Save `vtype` to a scalar register.
2. Insert a `vsetvli`.
3. Use segment load/store.
4. Restore `vtype` via `vsetvl`.
But `vsetvl` is usually slow, so this PR does not take that approach.
Instead, we use wider whole load/store instructions if the register
encoding is aligned. We have done the same optimization for COPY in
https://github.com/llvm/llvm-project/pull/84455.
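
One way to picture the alignment rule is the greedy split below. This is a sketch under assumptions, not the code in this PR: `split_whole_register_ops` is a made-up name, and the rule "pick the widest of 8/4/2/1 registers that fits and whose first register number is a multiple of that width" is our reading of "aligned register encoding".

```python
def split_whole_register_ops(first_reg: int, num_regs: int):
    """Split a run of vector registers into whole-register accesses.

    Returns a list of (width, start_reg) pairs, where width is the number of
    registers moved by one whole-register load/store (1, 2, 4 or 8).
    Hypothetical helper for illustration only.
    """
    ops = []
    reg, remaining = first_reg, num_regs
    while remaining > 0:
        # Widest access that fits and starts on an aligned register number.
        # Width 1 always qualifies, so next() cannot fail.
        width = next(w for w in (8, 4, 2, 1) if w <= remaining and reg % w == 0)
        ops.append((width, reg))
        reg += width
        remaining -= width
    return ops
```

For example, a 3-field segment of LMUL=2 registers starting at v8 covers v8 to v13; the model splits it into one 4-register access at v8 and one 2-register access at v12, instead of three 2-register accesses.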
We found this suboptimal implementation when porting some video codec
kernels via RVV intrinsics.
The order in which entries in the tablegen API files are iterated is not
the order in which they appear in the file. To avoid any issues with the
order changing in future, we now generate all definitions of a certain
class before any class that can use them.
This is NFC; the definitions don't actually change, just the order in
which they appear in the OffloadAPI.h header.
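
The ordering idea can be sketched as a stable sort by record class. The class names in `KIND_ORDER` and the `(kind, name)` record shape are placeholders chosen for illustration; they are not taken from the actual generator.

```python
# Hypothetical dependency order: earlier kinds may be used by later ones.
KIND_ORDER = ["Macro", "Handle", "Enum", "Struct", "Function"]

def emission_order(records):
    """Order (kind, name) records so every kind is emitted before its users.

    sorted() is stable, so records of the same kind keep their relative
    order no matter how the input happened to be iterated across kinds.
    """
    rank = {kind: i for i, kind in enumerate(KIND_ORDER)}
    return sorted(records, key=lambda record: rank[record[0]])
```

Given `[("Function", "f"), ("Enum", "e"), ("Struct", "s")]`, the result lists the enum, then the struct, then the function, regardless of the input order.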
This keeps it closer to the other legality checks like the FP exceptions
check.
It also means that isSupportedInstr only needs to check the opcode,
which allows it to be replaced with a TSFlags based check in a later
patch.
This patch works towards consolidating all Clang debug-info tests into the
`clang/test/DebugInfo` directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).
Here we move only the `clang/test/CodeGenCXX` tests. I created a `CXX`
subdirectory for now because many of the tests I checked actually did
seem C++-specific. There is probably overlap between the `Generic` and
`CXX` subdirectory, but I haven't gone through and audited them all.
The list of files I came up with is:
1. searched for anything with `*debug-info*` in the filename
2. searched for occurrences of `debug-info-kind` in the tests
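
The two searches above could be scripted roughly as follows. This is a sketch, not the script used for this patch: `find_debug_info_tests` is a made-up name, and taking the union of the two searches is an assumption.

```python
from pathlib import Path

def find_debug_info_tests(test_dir):
    """Return files matched by either search, sorted by path."""
    root = Path(test_dir)
    # 1. Anything with "debug-info" in the filename.
    by_name = {p for p in root.rglob("*debug-info*") if p.is_file()}
    # 2. Any test whose contents mention "debug-info-kind".
    by_flag = {
        p
        for p in root.rglob("*")
        if p.is_file() and "debug-info-kind" in p.read_text(errors="ignore")
    }
    return sorted(by_name | by_flag)
```

Calling `find_debug_info_tests("clang/test/CodeGenCXX")` from the repository root would list the candidates to review by hand.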
There are a couple of tests in `clang/test/CodeGenCXX` that still set
`-debug-info-kind`. They probably don't need to do that, but I'm not
changing that as part of this PR.
This patch adds a feature module registry, as discussed with @kadircet on
[Discourse](https://discourse.llvm.org/t/rfc-registry-for-feature-modules/87733).
Feature modules that are added to the feature module set from registry
entries can't expose a public API, but can still be used via the
`FeatureModule` interface.
MSVC's STL marks `std::make_shared`, `std::allocate_shared`,
`std::bitset::to_ulong`, and `std::bitset::to_ullong` as
`[[nodiscard]]`, which causes these libcxx tests to emit righteous
warnings. They should use the traditional `(void)` cast technique to
ignore the return values.
Currently, Flang can generate no-loop kernels for all OpenMP target
kernels in the program if the flags
-fopenmp-assume-teams-oversubscription or
-fopenmp-assume-threads-oversubscription are set.
If we add an additional parameter, we can choose in the future which
OpenMP kernels should be generated as no-loop kernels.
This PR doesn't modify the current behavior of the oversubscription flags.
RFC for no-loop kernels:
https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517
The change in PR #154268 introduced a dependency on the `__GLIBC_PREREQ`
macro, which is not defined in musl libc. This caused the build to fail
in environments using musl.
This patch fixes the build by including
`sanitizer_common/sanitizer_glibc_version.h`. This header provides a
fallback definition for `__GLIBC_PREREQ` when LLVM is built against
non-glibc C libraries, resolving the compilation error.
It is going to grow, so it makes sense to move its definition
out of class. Instead, inline `populateInstruction()` into it.
Also, rename a couple of methods to better convey their meaning.
That is, on all targets except ARM and AArch64.
This field used to be required due to a bug; the bug was fixed long ago
by 23423c0ea8d414e56081cb6a13bd8b2cc91513a9.
This patch contains a list of tests that are currently failing in the
LLVM_ENABLE_PROFCHECK=ON build. This enables passing them to lit through
the LIT_XFAIL env variable. This is necessary for getting a buildbot
spun up to catch regressions while work is being done to fix the
existing issues.
We need to keep this in the LLVM tree so that tests can be removed from
the list at the same time the passes causing issues are fixed.
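
A minimal sketch of how such a list can be handed to lit, assuming the list is a text file with one test path per line. The helper name and the file layout are assumptions; the only lit behaviour relied on is that `LIT_XFAIL` takes a semicolon-separated list of tests.

```python
import os

def build_lit_xfail(list_text: str) -> str:
    """Turn a one-test-per-line list into a LIT_XFAIL value (hypothetical helper)."""
    tests = [line.strip() for line in list_text.splitlines() if line.strip()]
    # lit expects the entries separated by semicolons.
    return ";".join(tests)

def env_with_xfails(list_text: str) -> dict:
    """Copy the current environment and add LIT_XFAIL for a lit invocation."""
    return dict(os.environ, LIT_XFAIL=build_lit_xfail(list_text))
```

The returned environment can be passed to whatever launches lit (for example `subprocess.run(..., env=env_with_xfails(text))`); removing a line from the list re-enables that test as soon as the corresponding pass is fixed.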
Issue #147390
LSan was recently refactored to call GetMaxUserVirtualAddress for
diagnostic purposes. This leads to failures for some of our downstream
tests which only run with lsan. This occurs because
GetMaxUserVirtualAddress depends on setting up shadow via a call to
__sanitizer_shadow_bounds, but shadow bounds aren't set for standalone
lsan because it doesn't use shadow. This updates the function to invoke
the same syscall used by __sanitizer_shadow_bounds calls for getting the
memory limit. Ideally this function would only be called once since we
only need to get the bounds once.
More context in https://fxbug.dev/437346226.
Operate directly on the existing Ops vector instead of copying to
a new vector. This is similar to what the autogenerated codegen
does for other intrinsics.
Currently only cases rooted at a full copy of an MFMA result are
handled.
Prepare to relax that by testing more intricate subregister usage.
Currently only full copies are handled; add some tests to help work
towards handling subregisters.