This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
Several Clang tests were failing on Cygwin, and were already marked as
requiring !system-windows, unsupported on system-windows, or xfail on
system-windows. Add system-cygwin to lit's llvm.config, and use it in
such tests in addition to system-windows.
The following intrinsics were replaced by `__builtin_elementwise_fma`:
- `__builtin_ia32_vfmaddps(256)`
- `__builtin_ia32_vfmaddpd(256)`
- `__builtin_ia32_vfmaddph(256)`
- `__builtin_ia32_vfmaddbf16(128 | 256 | 512)`
All the aforementioned `__builtin_ia32_vfmadd*` intrinsics are
equivalent to a `__builtin_elementwise_fma`, so keeping them is an
unnecessary indirection.
Fixes [#152461](https://github.com/llvm/llvm-project/issues/152461)
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
We can derive and upgrade alignment for loads/stores using other
well-aligned loads/stores. This optimization does a single forward pass through
each basic block and uses loads/stores (the alignment and the offset) to
derive the best possible alignment for a base pointer, caching the
result. If it encounters another load/store based on that pointer, it
tries to upgrade the alignment. The optimization must be a forward pass within a basic
block because control flow and exception throwing can impact alignment guarantees.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
This patch updates the pmulhw/pmulhuw builtins to support constant
expression handling - extending the VectorExprEvaluator::VisitCallExpr
handling code that handles elementwise integer binop builtins.
Hopefully this can be used as reference patch to show how to add future
target specific constexpr handling with minimal code impact.
I've also enabled pmullw constexpr handling (which are tagged on
#152490) as they all use very similar tests.
I've also had to tweak the MMX -> SSE2 wrapper as undefs are not
permitted in constexpr shuffle masks
Fixes#152524
Still missing the "extend to 256-bit" casts - _mm256_castpd128_pd256 / _mm256_castps128_ps256 / _mm256_castsi128_si256 - due to constexpr not liking undefined/poison etc.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).
I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.
It will also crash in the backend if passed an int128 or float128 type.
Adds missing C++ run lines to test files containing `constexpr` tests.
Also adds missing 32/64-bit test coverage to the following tests:
- `clang/test/CodeGen/X86/avx512-reduceIntrin.c`
- `clang/test/CodeGen/X86/avx512-reduceMinMaxIntrin.c`
- `clang/test/CodeGen/X86/avx512vpopcntdq-builtins.c`
- `clang/test/CodeGen/X86/avx512vpopcntdqvl-builtins.c`
Additionally, fixes a `_mm512_popcnt_epi64` `constexpr` test that
incorrectly assumed 32-bit integers, leading to incorrect bit counts.
This change updates the test result to assume 64-bit integers.
Update the easy add/sub/mul/logic/cmp/scalar_to_vector intrinsics to be
constexpr compatible.
I'm not expecting anyone to be very interested in using MMX intrinsics,
but they're smaller than the other types and are useful to test the
constexpr handling and test methods before we start applying them to
SSE/AVX2/AVX512 intrinsics.
NFC patch to clean up extra lines of code in the file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll` as the current one has
loop unrolled.
Also removing the file `clang/test/CodeGen/PowerPC/check-zero-vector.c`
as the patch affects only the backend.
Co-authored-by: himadhith <himadhith.v@ibm.com>
In commit d5985905ae8e5b2 I introduced a Sema check that prohibits
`__builtin_arm_ldrex` and `__builtin_arm_strex` for data sizes not
supported by the target architecture version. However, `arm_acle.h`
unconditionally uses those builtins with a 32-bit data size. So now
including that header will cause a build failure on Armv6-M, or historic
architectures like Armv5.
To fix it, `arm_acle.h` now queries the compiler-defined
`__ARM_FEATURE_LDREX` macro (also fixed recently in commit
34f59d79209268e so that it matches the target architecture). If 32-bit
LDREX isn't available it will fall back to the older SWP instruction, or
failing that (on Armv6-M), a libcall.
While I was modifying the header anyway, I also renamed the local
variable `v` inside `__swp` so that it starts with `__`, avoiding any
risk of user code having #defined `v`.
Writing something like this:
__builtin_dynamic_object_size((0, p->array), 0)
is equivalent to writing this:
__builtin_dynamic_object_size(p->array, 0)
though the former will give a warning about the first value being
unused.
2f497ec3a0056f15727ee6008211aeb2c4a8f455 updated the backend's rules for
when lock-free atomics are available, but we never made a corresponding
change to the frontend. Fix it to be consistent. This only affects
targets older than v7.
This patch removes %T from clang lit tests. %T has been deprecated for
about seven years and is not reccomended as it is not unique to each
test, which can lead to races. This patch is intended to remove usage in
tree with the end goal of removing support for %T within lit.
Consider this declaration:
`int foo();`
This function is described in LLVM with `clang::FunctionNoProtoType`
class. ([See
description](https://clang.llvm.org/doxygen/classclang_1_1FunctionNoProtoType.html))
Judging by [this
comment](a1bf0d1394/clang/lib/CodeGen/CGCall.cpp (L159C11-L159C12))
all such functions are treated like functions with variadic number of
parameters.
When we want to [emit debug
info](0a8ddd3965/clang/lib/CodeGen/CGDebugInfo.cpp (L4808))
we have to know function that we calling.
In method
[getCalledFunction()](0a8ddd3965/llvm/include/llvm/IR/InstrTypes.h (L1348))
we compare two types of function:
1. Function that we deduce from calling operand, and
2. Function that we store locally
If they differ we get `nullptr` and can't emit appropriate debug info.
The only thing they differ is: lhs function is variadic, but rhs
function isn't
Reason of this difference is that under RISC-V there is no overridden
function that tells us about treating functions with no parameters.
[Default
function](0a8ddd3965/clang/lib/CodeGen/TargetInfo.cpp (L87))
always return `false`.
This patch overrides this function for RISC-V
CWD is queried in clang driver and passed to clang cc1 via flags when
needed. Respect the cc1 flags and do not repeated checking current
working directory in CodeGen.
Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
The Cygwin target is generally very similar to the MinGW target. The
default auto-import behavior, the default calling convention, the
`.dll.a` import library extension, the `__GXX_TYPEINFO_EQUALITY_INLINE`
pre-define by `g++`, and the long double configuration.
Co-authored-by: Mateusz Mikuła <oss@mateuszmikula.dev>
This patch adds a human readable trap category and message to UBSan
traps. The category and message are encoded in a fake frame in the debug
info where the function is a fake inline function where the name encodes
the trap category and message. This is the same mechanism used by
Clang’s `__builtin_verbose_trap()`.
This change allows consumers of binaries built with trapping UBSan to
more easily identify the reason for trapping. In particular LLDB already
has a frame recognizer that recognizes the fake function names emitted
in debug info by this patch. A patch testing this behavior in LLDB will
be added in a separately.
The human readable trap messages are based on the messages currently
emitted by the userspace runtime for UBSan in compiler-rt. Note the
wording is not identical because the userspace UBSan runtime has access
to dynamic information that is not available during Clang’s codegen.
Test cases for each UBSan trap kind are included.
This complements the [`-fsanitize-annotate-debug-info`
feature](https://github.com/llvm/llvm-project/pull/141997). While
`-fsanitize-annotate-debug-info` attempts to annotate all UBSan-added
instructions, this feature (`-fsanitize-debug-trap-reasons`) only
annotates the final trap instruction using SanitizerHandler information.
This work is part of a GSoc 2025 project.
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.
Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
Previously, the unsigned NEON intrinsic variants of 'vqshrun_high_n' and
'vqrshrun_high_n' were using signed integer types for their first
argument and return values.
These should be unsigned according to developer.arm.com, however.
Adjust the test cases accordingly.
Let Clang emit `dead_on_return` attribute on pointer arguments
that are passed indirectly, namely, large aggregates that the
ABI mandates be passed by value; thus, the parameter is destroyed
within the callee. Writes to such arguments are not observable by
the caller after the callee returns.
This should desirably enable further MemCpyOpt/DSE optimizations.
Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
Currently, clang coerces (u)int128_t to two i64 IR parameters when they
are passed in registers. This leads to broken debug info for them after
applying SROA+InstCombine. SROA generates IR like this
([godbolt](https://godbolt.org/z/YrTa4chfc)):
```llvm
define dso_local { i64, i64 } @add(i64 noundef %a.coerce0, i64 noundef %a.coerce1) {
entry:
%a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128
%a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64
%a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128
%a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext
#dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18)
// ...
!17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14)
// ...
```
and InstCombine then removes the `or`, moving it into the
`DIExpression`, and the `shl` at which point the debug info salvaging in
`Transforms/Local` replaces the arguments with `poison` as it does not
allow constants larger than 64 bit in `DIExpression`s.
I'm working under the assumption that there is interest in fixing this.
If not, please tell me.
By not coercing `int128_t`s into `{i64, i64}` but keeping them as
`i128`, the debug info stays intact and SelectionDAG then generates two
`DW_OP_LLVM_fragment` expressions for the two corresponding argument
registers.
Given that the ABI code for x64 seems to not coerce the argument when it
is passed on the stack, it should not lead to any problems keeping it as
an `i128` when it is passed in registers.
Alternatively, this could be fixed by checking if a constant value fits
in 64 bits in the debug info salvaging code and then extending the value
on the expression stack to the necessary width. This fixes InstCombine
breaking the debug info but then SelectionDAG removes the expression and
that seems significantly more complex to debug.
Another fix may be to generate `DW_OP_LLVM_fragment` expressions when
removing the `or` as it gets marked as disjoint by InstCombine. However,
I don't know if the KnownBits information is still available at the time
the `or` gets removed and it would probably require refactoring of the
debug info salvaging code as that currently only seems to replace single
expressions and is not designed to support generating new debug records.
Converting `(u)int128_t` arguments to `i128` in the IR seems like the
simpler solution, if it doesn't cause any ABI issues.