10051 Commits

Author SHA1 Message Date
Florian Mayer
3f6c0e62d5
[clang][KCFI] Respect -fsanitize-cfi-icall-generalize-pointers (#152400)
This flag was previously ignored by KCFI.
2025-08-11 17:21:13 -07:00
Simon Pilgrim
a10a8bb612
[Headers][X86] Enable constexpr handling for MMX/SSE sitofp/uitofp helper cvt intrinsics (#153017) 2025-08-11 17:19:14 +01:00
Bhasawut Singhaphan
26a0962ce2
[Headers][X86] Allow AVX512 _mm512_set* intrinsics to be used in constexpr (#152910)
This PR adds constexpr support for the following AVX512F/BW/FP16 set
intrinsics:

  - _mm512_set4_pd
  - _mm512_set4_ps
  - _mm512_set4_epi32
  - _mm512_set4_epi64

  - _mm512_set_pd
  - _mm512_set_ps
  - _mm512_set_epi8
  - _mm512_set_epi16
  - _mm512_set_epi32
  - _mm512_set_epi64

  - _mm512_setr_pd
  - _mm512_setr_ps
  - _mm512_setr_epi32
  - _mm512_setr_epi64

  - _mm512_set1_ph
  - _mm512_set_ph
  - _mm512_setr_ph

  - _mm_setzero_ph
  - _mm256_setzero_ph
  - _mm512_setzero_ph

Closes https://github.com/llvm/llvm-project/issues/152288.
Part of https://github.com/llvm/llvm-project/issues/30794.
2025-08-11 15:47:50 +01:00
Simon Pilgrim
880d15e28c
[X86] avx512fp16-builtins.c - add C/C++ and 32/64-bit test coverage (#152997)
Helps testing for #152910
2025-08-11 12:49:26 +01:00
woruyu
50bd897fde
[Headers][X86] Allow SSE41/AVX2/AVX512F/AVX512BW integer extension intrinsics to be used in constexpr (#152971)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/152315
2025-08-11 18:38:17 +08:00
Matheus Izvekov
91cdd35008
[clang] Improve nested name specifier AST representation (#147835)
This is a major change on how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed; all type nodes to which it could
previously apply can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimizations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscellaneous drive-by fixes.

About the review: Yes, the patch is huge, sorry about that. Part of the
reason is that I started with the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
to make rebasing easier. I plan to rename it back after this lands.

Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
2025-08-09 05:06:53 -03:00
jeremyd2019
ff616b4806
[Tests] Add system-cygwin feature, and use it. (#152611)
Several Clang tests were failing on Cygwin, and were already marked as
requiring !system-windows, unsupported on system-windows, or xfail on
system-windows. Add system-cygwin to lit's llvm.config, and use it in
such tests in addition to system-windows.
2025-08-08 13:29:00 -07:00
moorabbit
989c0d2526
[Clang][X86] Replace unnecessary vfmadd* builtins with element_wise_fma (#152545)
The following intrinsics were replaced by `__builtin_elementwise_fma`:
- `__builtin_ia32_vfmaddps(256)`
- `__builtin_ia32_vfmaddpd(256)`
- `__builtin_ia32_vfmaddph(256)`
- `__builtin_ia32_vfmaddbf16(128 | 256 | 512)`

All the aforementioned `__builtin_ia32_vfmadd*` intrinsics are
equivalent to a `__builtin_elementwise_fma`, so keeping them is an
unnecessary indirection.
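
As a rough illustration of what "element-wise fma" means, the sketch below applies a fused multiply-add to each lane of three small arrays. It is a portable scalar model written for this note, not the Clang builtin or the header code; the name `fma_lanes` is hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Scalar model of an element-wise fused multiply-add:
// r[i] = a[i] * b[i] + c[i], with a single rounding per lane.
// `fma_lanes` is a hypothetical helper, not a Clang or LLVM API.
template <std::size_t N>
std::array<float, N> fma_lanes(const std::array<float, N> &a,
                               const std::array<float, N> &b,
                               const std::array<float, N> &c) {
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = std::fma(a[i], b[i], c[i]);
  return r;
}
```

Because every lane is computed independently with the same operation, a target-specific builtin that does exactly this adds nothing over the generic one.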

Fixes [#152461](https://github.com/llvm/llvm-project/issues/152461)

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-08 20:51:15 +01:00
Simon Pilgrim
817133d7d9
[X86] avx512vl-builtins.c - add C/C++ test coverage (#152765) 2025-08-08 20:26:05 +01:00
Drew Kersnar
90e8c8e718
[InferAlignment] Propagate alignment between loads/stores of the same base pointer (#145733)
We can derive and upgrade alignment for loads/stores using other
well-aligned loads/stores. This optimization does a single forward pass through
each basic block and uses loads/stores (the alignment and the offset) to
derive the best possible alignment for a base pointer, caching the
result. If it encounters another load/store based on that pointer, it
tries to upgrade the alignment. The optimization must be a forward pass within a basic
block because control flow and exception throwing can impact alignment guarantees.
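
The forward pass described above can be sketched as a toy model. This is not the LLVM implementation: the `Access` struct, the string base names, and both function names are assumptions made for illustration, and the model only covers constant offsets inside a single basic block.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One load/store in a basic block (hypothetical model, not an LLVM type).
struct Access {
  std::string base;      // name of the base pointer
  std::uint64_t offset;  // constant byte offset from the base
  std::uint64_t align;   // alignment recorded on the access, a power of two
};

// Largest power of two that divides both `align` and `offset`.
// An offset of 0 leaves `align` unchanged.
std::uint64_t commonAlign(std::uint64_t align, std::uint64_t offset) {
  std::uint64_t bits = align | offset;
  return bits & (~bits + 1);  // isolate the lowest set bit
}

// Single forward pass: cache the best alignment known for each base pointer
// and use it to upgrade later accesses through the same base.
void propagateAlignment(std::vector<Access> &block) {
  std::map<std::string, std::uint64_t> bestBaseAlign;
  for (Access &a : block) {
    // What this access proves about the base pointer.
    std::uint64_t derived = commonAlign(a.align, a.offset);
    std::uint64_t &best = bestBaseAlign[a.base];  // 0 when first seen
    if (derived > best)
      best = derived;
    // What the cached base alignment proves about this access.
    std::uint64_t implied = commonAlign(best, a.offset);
    if (implied > a.align)
      a.align = implied;
  }
}
```

For example, a 16-byte-aligned access at offset 0 lets a later access at offset 8 through the same base be upgraded from 4 to 8 bytes. Information only flows forward, which matches the restriction described in the message.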

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-08-08 12:05:29 -05:00
Simon Pilgrim
45b4f1b438
[Headers][X86] Allow _mm512_set1_epi8/16/pd/ps intrinsics to be used in constexpr (#152746)
Pulled out of #152288 as I need this to proceed with several other patches
2025-08-08 17:04:08 +01:00
Simon Pilgrim
c8312bdd16
[Headers][X86] Enable constexpr handling for pmulhw/pmulhuw intrinsics (#152540)
This patch updates the pmulhw/pmulhuw builtins to support constant
expression handling - extending the VectorExprEvaluator::VisitCallExpr
handling code that handles elementwise integer binop builtins.
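
For readers unfamiliar with these instructions, here is a scalar sketch of the per-lane arithmetic, assuming the usual definition of pmulhw/pmulhuw as "high 16 bits of the 16x16 product". The helper names are hypothetical and this is not the evaluator code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Signed variant (pmulhw-like): high 16 bits of the signed 32-bit product.
// Relies on arithmetic right shift of negative values, which GCC and Clang
// provide (and which C++20 guarantees).
constexpr std::int16_t mulhi_i16(std::int16_t a, std::int16_t b) {
  return static_cast<std::int16_t>(
      (static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b)) >> 16);
}

// Unsigned variant (pmulhuw-like): high 16 bits of the unsigned product.
constexpr std::uint16_t mulhi_u16(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(
      (static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b)) >> 16);
}

// Evaluated at compile time, which is the property the patch extends to the
// real builtins.
static_assert(mulhi_i16(0x4000, 4) == 1, "16384 * 4 = 65536, high half is 1");
static_assert(mulhi_i16(-1, 1) == -1, "sign is preserved in the high half");
static_assert(mulhi_u16(0xFFFF, 0xFFFF) == 0xFFFE, "0xFFFE0001 >> 16");
```

The same bit pattern (0xFFFF) gives different results in the two variants, which is why signed and unsigned forms need separate handling.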

Hopefully this can be used as a reference patch to show how to add future
target-specific constexpr handling with minimal code impact.

I've also enabled pmullw constexpr handling (which is tagged on
#152490), as they all use very similar tests.

I've also had to tweak the MMX -> SSE2 wrapper, as undefs are not
permitted in constexpr shuffle masks.

Fixes #152524
2025-08-08 17:02:50 +01:00
Simon Pilgrim
f169893cbf
[Headers][X86] Allow BITALG vpopcntw/vpopcntb intrinsics to be used in constexpr (#152701)
Matches VPOPCNTDQ handling
2025-08-08 16:09:26 +01:00
Simon Pilgrim
e64224a224
[Headers][X86] Allow AVX cast intrinsics to be used in constexpr (#152730)
Still missing the "extend to 256-bit" casts - _mm256_castpd128_pd256 / _mm256_castps128_ps256 / _mm256_castsi128_si256 - due to constexpr not liking undefined/poison etc.
2025-08-08 15:39:39 +01:00
Simon Pilgrim
1e9ed918dd
[X86][AVX512BITALG] add C/C++ and 32/64-bit builtins test coverage (#152693) 2025-08-08 13:12:06 +01:00
Simon Pilgrim
691ede2830
[Headers][X86] Allow _mm512_set1_epi32/64 intrinsics to be used in constexpr (#152674)
Pulled out of #152288 as I need this to proceed with several other patches
2025-08-08 11:02:08 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Hood Chatham
b9c328480c
[clang][WebAssembly] Support reftypes & varargs in test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).

I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.

It will also crash in the backend if passed an int128 or float128 type.
2025-08-07 13:07:04 -07:00
Pedro Lobo
f3bf8e0166
[clang][x86] Add C/C++ and 32/64-bit test coverage to constexpr tests (#152478)
Adds missing C++ run lines to test files containing `constexpr` tests.
Also adds missing 32/64-bit test coverage to the following tests:
- `clang/test/CodeGen/X86/avx512-reduceIntrin.c`
- `clang/test/CodeGen/X86/avx512-reduceMinMaxIntrin.c`
- `clang/test/CodeGen/X86/avx512vpopcntdq-builtins.c`
- `clang/test/CodeGen/X86/avx512vpopcntdqvl-builtins.c`

Additionally, fixes a `_mm512_popcnt_epi64` `constexpr` test that
incorrectly assumed 32-bit integers, leading to incorrect bit counts.
This change updates the test result to assume 64-bit integers.
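
To see why the lane width matters, the sketch below counts bits in the same value viewed as a 64-bit lane and as a truncated 32-bit lane. The helper names and sample value are illustrative only and are not taken from the test file.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Bit count over the whole 64-bit lane.
inline int popcount64(std::uint64_t v) {
  return static_cast<int>(std::bitset<64>(v).count());
}

// Bit count if the lane is (wrongly) treated as a 32-bit integer:
// the upper 32 bits are dropped before counting.
inline int popcount32(std::uint64_t v) {
  return static_cast<int>(
      std::bitset<32>(static_cast<std::uint32_t>(v)).count());
}
```

With `0xFFFFFFFF00000001`, the 64-bit view has 33 set bits while the 32-bit view sees only 1, so an expected value computed under the wrong width does not match what a 64-bit popcount produces.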
2025-08-07 13:50:52 +01:00
Simon Pilgrim
f24c50a635 [X86] avx512dq-builtins.c - add C/C++ and 32/64-bit test coverage
Inspired by #152478
2025-08-07 12:08:48 +01:00
Yi-Chi Lee
e1d6753006
[Headers][X86] Update AVX/AVX512 float/double add/sub/mul/div/unpck intrinsics to be used in constexpr (#152435)
Fixed #152313

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-07 11:21:58 +01:00
Pedro Lobo
5805e88745
[Headers][X86] Allow AVX512 reduction intrinsics to be used in constexpr (#152363)
Closes #152324.
Part of #30794.

This PR adds `constexpr` support for the following AVX512 integer
reduction intrinsics:

- `_mm512_reduce_add_epi32`
- `_mm512_reduce_add_epi64`
- `_mm512_reduce_mul_epi32`
- `_mm512_reduce_mul_epi64`
- `_mm512_reduce_and_epi32`
- `_mm512_reduce_and_epi64`
- `_mm512_reduce_or_epi32`
- `_mm512_reduce_or_epi64`
- `_mm512_reduce_max_epi32`
- `_mm512_reduce_max_epi64`
- `_mm512_reduce_min_epi32`
- `_mm512_reduce_min_epi64`
- `_mm512_reduce_max_epu32`
- `_mm512_reduce_max_epu64`
- `_mm512_reduce_min_epu32`
- `_mm512_reduce_min_epu64`
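
The following is a portable scalar model of what a 16-lane 32-bit reduction computes, written so that it can be checked with `static_assert`. It does not use the real intrinsics or `__m512i`; `Lanes16` and the `reduce_*` helpers are hypothetical names, and C++14 or later is assumed for loops in `constexpr` functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 512-bit vector of sixteen 32-bit lanes.
struct Lanes16 {
  std::int32_t v[16];
};

// Horizontal add; accumulates in unsigned arithmetic so overflow wraps.
constexpr std::int32_t reduce_add(const Lanes16 &x) {
  std::uint32_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += static_cast<std::uint32_t>(x.v[i]);
  return static_cast<std::int32_t>(sum);
}

// Horizontal max with lanes compared as signed integers.
constexpr std::int32_t reduce_max_i32(const Lanes16 &x) {
  std::int32_t m = x.v[0];
  for (int i = 1; i < 16; ++i)
    if (x.v[i] > m)
      m = x.v[i];
  return m;
}

// Horizontal max with the same bits compared as unsigned integers.
constexpr std::uint32_t reduce_max_u32(const Lanes16 &x) {
  std::uint32_t m = static_cast<std::uint32_t>(x.v[0]);
  for (int i = 1; i < 16; ++i)
    if (static_cast<std::uint32_t>(x.v[i]) > m)
      m = static_cast<std::uint32_t>(x.v[i]);
  return m;
}

constexpr Lanes16 kExample = {
    {-1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}};

static_assert(reduce_add(kExample) == 134, "sum of the sixteen lanes");
static_assert(reduce_max_i32(kExample) == 16, "signed max ignores -1");
static_assert(reduce_max_u32(kExample) == 0xFFFFFFFFu, "-1 is the unsigned max");
```

The signed and unsigned maxima differ for the same lanes, which is why the list above has separate `epi` and `epu` variants.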

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-07 10:46:22 +01:00
Simon Pilgrim
6abf4f376e
[Headers][X86] Allow AVX movddup/movsldup/movshdup intrinsics to be used in constexpr (#152340)
Matches SSE3 handling
2025-08-07 08:17:31 +01:00
Simon Pilgrim
b83f7f195c
[Headers][X86] Update SSE/AVX and/andnot/or/xor intrinsics to be used in constexpr (#152305) 2025-08-07 08:16:26 +01:00
Simon Pilgrim
edad89e4e0
[Headers][X86] Update MMX arithmetic intrinsics to be used in constexpr (#152296)
Update the easy add/sub/mul/logic/cmp/scalar_to_vector intrinsics to be
constexpr compatible.

I'm not expecting anyone to be very interested in using MMX intrinsics,
but they're smaller than the other types and are useful to test the
constexpr handling and test methods before we start applying them to
SSE/AVX2/AVX512 intrinsics.
2025-08-07 08:05:05 +01:00
Florian Mayer
a7f1702f2c
[NFC] [CFI] correct comment in test (#152399)
It incorrectly stated that `const char*` gets normalized to ptr, while
it should say that `char*` does.
2025-08-06 16:07:40 -07:00
Himadhith
1f1b903a64
[NFC][PowerPC] Cleaning up test file and removing redundant front-end test (#151971)
NFC patch to clean up extra lines of code in the file
`llvm/test/CodeGen/PowerPC/check-zero-vector.ll`, as the current one has
the loop unrolled.
Also removing the file `clang/test/CodeGen/PowerPC/check-zero-vector.c`
as the patch affects only the backend.

Co-authored-by: himadhith <himadhith.v@ibm.com>
2025-08-06 15:59:47 +05:30
Simon Pilgrim
a5d85a6ab5
[Headers][X86] Allow AVX _mm256_set* intrinsics to be used in constexpr (#152173) 2025-08-06 11:14:22 +01:00
zGoldthorpe
d7074b63ed
[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)
This patch exposes builtins for atomic `add`, `max`, and `min` operations that
operate over buffer resource pointers.
2025-08-05 11:04:15 -06:00
Simon Tatham
87283db548
[clang][ARM] Fix build failure in <arm_acle.h> for __swp (#151354)
In commit d5985905ae8e5b2 I introduced a Sema check that prohibits
`__builtin_arm_ldrex` and `__builtin_arm_strex` for data sizes not
supported by the target architecture version. However, `arm_acle.h`
unconditionally uses those builtins with a 32-bit data size. So now
including that header will cause a build failure on Armv6-M, or historic
architectures like Armv5.

To fix it, `arm_acle.h` now queries the compiler-defined
`__ARM_FEATURE_LDREX` macro (also fixed recently in commit
34f59d79209268e so that it matches the target architecture). If 32-bit
LDREX isn't available it will fall back to the older SWP instruction, or
failing that (on Armv6-M), a libcall.

While I was modifying the header anyway, I also renamed the local
variable `v` inside `__swp` so that it starts with `__`, avoiding any
risk of user code having #defined `v`.
2025-08-05 08:45:54 +01:00
Aiden Grossman
b7b501e54c Reapply "[clang] Remove %T from tests (#151614)"
This reverts commit 4c80193a58a5c24e2bbebe291feb406191c4e2ab.

This relands the commit. The issues have theoretically been fixed.
2025-08-02 20:08:53 +00:00
Bill Wendling
49a24b3116
[CodeGen][counted_by] Support use of the comma operator (#151776)
Writing something like this:

  __builtin_dynamic_object_size((0, p->array), 0)

is equivalent to writing this:

  __builtin_dynamic_object_size(p->array, 0)

though the former will give a warning about the first value being
unused.
2025-08-01 17:28:08 -07:00
Eli Friedman
558277ae4d
[clang][ARM] Fix setting of MaxAtomicInlineWidth. (#151404)
2f497ec3a0056f15727ee6008211aeb2c4a8f455 updated the backend's rules for
when lock-free atomics are available, but we never made a corresponding
change to the frontend. Fix it to be consistent. This only affects
targets older than v7.
2025-08-01 11:03:21 -07:00
Aiden Grossman
4c80193a58 Revert "[clang] Remove %T from tests (#151614)"
This reverts commit 5a586375aa3a128dadc9473cfa196bf8588c2a82.

This breaks two buildbots with failures in
implicit-module-header-maps.cpp. No idea why these failures are
occurring.

https://lab.llvm.org/buildbot/#/builders/64/builds/5166
https://lab.llvm.org/buildbot/#/builders/13/builds/8725
2025-08-01 17:30:24 +00:00
Aiden Grossman
5a586375aa
[clang] Remove %T from tests (#151614)
This patch removes %T from clang lit tests. %T has been deprecated for
about seven years and is not recommended as it is not unique to each
test, which can lead to races. This patch is intended to remove usage in
tree with the end goal of removing support for %T within lit.
2025-08-01 08:25:14 -07:00
Georgiy Samoylov
bcbbb2c986
[clang] Fix clang debug info generation for unprototyped function (#150022)
Consider this declaration:

`int foo();`

This function is described in LLVM with `clang::FunctionNoProtoType`
class. ([See
description](https://clang.llvm.org/doxygen/classclang_1_1FunctionNoProtoType.html))

Judging by [this
comment](a1bf0d1394/clang/lib/CodeGen/CGCall.cpp (L159C11-L159C12))
all such functions are treated like functions with variadic number of
parameters.

When we want to [emit debug
info](0a8ddd3965/clang/lib/CodeGen/CGDebugInfo.cpp (L4808))
we have to know the function that we are calling.

In the method
[getCalledFunction()](0a8ddd3965/llvm/include/llvm/IR/InstrTypes.h (L1348))
we compare two function types:

1. The function type that we deduce from the called operand, and
2. The function type that we store locally

If they differ, we get `nullptr` and can't emit the appropriate debug info.

The only difference between them is that the lhs function is variadic, but
the rhs function isn't.

The reason for this difference is that under RISC-V there is no overridden
function that tells us how to treat functions with no parameters. The
[default
function](0a8ddd3965/clang/lib/CodeGen/TargetInfo.cpp (L87))
always returns `false`.

This patch overrides this function for RISC-V
2025-08-01 12:04:39 +03:00
Steven Wu
3c08498fe2
[clang][CodeGen] Remove CWD fallback in compilation directory (#150130)
CWD is queried in the clang driver and passed to clang cc1 via flags when
needed. Respect the cc1 flags and do not check the current working
directory again in CodeGen.
2025-07-31 16:32:44 -07:00
Bill Wendling
254b90fa95
[CodeGen][counted_by] See past parentheses and no-op casts (#151266)
Parentheses and no-op casts don't change the value. Skip past them to
get to a MemberExpr.

Fixes #151236
2025-07-30 14:37:05 -07:00
Amina Chabane
62744f3681
[AArch64][NEON] NEON intrinsic compilation error with -fno-lax-vector-conversion flag fix (#149329)
Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
2025-07-30 10:56:14 +01:00
jeremyd2019
a3228b6bf9
[Clang][Cygwin] Enable few conditions that are shared with MinGW (#149637)
The Cygwin target is generally very similar to the MinGW target. The two
share the default auto-import behavior, the default calling convention, the
`.dll.a` import library extension, the `__GXX_TYPEINFO_EQUALITY_INLINE`
pre-define by `g++`, and the long double configuration.

Co-authored-by: Mateusz Mikuła <oss@mateuszmikula.dev>
2025-07-29 10:01:43 -07:00
Phoebe Wang
3ea3e334cc
[X86][AVX10.2] Fix VNNIINT16 maskz intrinsics arguments order (#151077)
For maskz intrinsics, the first argument is always the mask.
2025-07-29 14:52:14 +08:00
Anthony Tran
29992cfd62
[Clang][CodeGen] Emit “trap reasons” on UBSan traps (#145967)
This patch adds a human-readable trap category and message to UBSan
traps. The category and message are encoded in a fake frame in the debug
info: the frame's function is a fake inline function whose name encodes
the trap category and message. This is the same mechanism used by
Clang’s `__builtin_verbose_trap()`.

This change allows consumers of binaries built with trapping UBSan to
more easily identify the reason for trapping. In particular, LLDB already
has a frame recognizer that recognizes the fake function names emitted
in debug info by this patch. A patch testing this behavior in LLDB will
be added separately.

The human readable trap messages are based on the messages currently
emitted by the userspace runtime for UBSan in compiler-rt. Note the
wording is not identical because the userspace UBSan runtime has access
to dynamic information that is not available during Clang’s codegen.

Test cases for each UBSan trap kind are included.

This complements the [`-fsanitize-annotate-debug-info`
feature](https://github.com/llvm/llvm-project/pull/141997). While
`-fsanitize-annotate-debug-info` attempts to annotate all UBSan-added
instructions, this feature (`-fsanitize-debug-trap-reasons`) only
annotates the final trap instruction using SanitizerHandler information.

This work is part of a GSoC 2025 project.
2025-07-26 08:50:25 -07:00
Hood Chatham
15b03687ff
[WebAssembly,clang] Add __builtin_wasm_test_function_pointer_signature (#150201)
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.

Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
2025-07-25 16:52:39 -07:00
Martin Wehking
933ba27306
Fix implicit vector conversion (#149970)
Previously, the unsigned NEON intrinsic variants of 'vqshrun_high_n' and
'vqrshrun_high_n' were using signed integer types for their first
argument and return values.
These should be unsigned according to developer.arm.com, however.

Adjust the test cases accordingly.
2025-07-23 15:44:46 +01:00
Wenju He
e0dd22fab1
[Clang] Add elementwise maximumnum/minimumnum builtin functions (#149775)
Addresses https://github.com/llvm/llvm-project/issues/112164. minimumnum
and maximumnum intrinsics were added in 5bf81e53dbea.

The new built-ins can be used for implementing OpenCL math function fmax
and fmin in #128506.
2025-07-23 08:34:35 +08:00
Timothy Herchen
e644f5fd9e
[clang] [Sema] Check argument range for prefetchi* intrinsics (#149745)
Fixes https://github.com/llvm/llvm-project/issues/144857 . I can create
a test if desired, but I think the fix is trivial enough.

<img width="805" height="105" alt="image"
src="https://github.com/user-attachments/assets/aaee8e5f-6e65-4f04-b8b9-e4ae1434d958"
/>
2025-07-22 15:25:17 +08:00
Lei Huang
53f4abc603
[PowerPC][NFC] Combine the 2 dmf neg test files (#149875)
Combining since these test the same error message, the only
difference being the target CPU.
2025-07-21 15:12:18 -04:00
Jim Lin
6150034578
[RISCV] Add missing vcompress and vrgather intrinsic tests for zvfbfmin (#148129)
The permutation intrinsics for zvfbfmin are documented by
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/408.
2025-07-21 16:17:17 +08:00
Antonio Frighetto
9e0c06d708 [clang][CodeGen] Set dead_on_return when passing arguments indirectly
Let Clang emit `dead_on_return` attribute on pointer arguments
that are passed indirectly, namely, large aggregates that the
ABI mandates be passed by value; thus, the parameter is destroyed
within the callee. Writes to such arguments are not observable by
the caller after the callee returns.
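
A small source-level sketch of the situation this attribute describes. The struct size and names are hypothetical, and whether a given aggregate is actually passed indirectly depends on the target ABI, so treat the "passed in memory" part as an assumption.

```cpp
#include <cassert>
#include <cstring>

// Assumed to be large enough that common ABIs pass it in memory via a
// hidden pointer rather than in registers.
struct Big {
  char bytes[256];
};

// Takes Big by value: the callee works on its own copy.
int consume(Big b) {
  // This write lands in the callee's copy only; the caller cannot observe
  // it after the call returns, so an optimizer may treat it as dead if the
  // value is not used afterwards.
  std::memset(b.bytes, 0x7f, sizeof b.bytes);
  return b.bytes[0];
}
```

Calling `consume(x)` leaves `x` in the caller untouched, which is the source-level guarantee that lets the pointer argument in the IR be marked as dead on return.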

This should desirably enable further MemCpyOpt/DSE optimizations.

Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
2025-07-18 11:50:18 +02:00
T0b1-iOS
d35931c49e
[Clang][CodeGen][X86] don't coerce int128 into {i64,i64} for SysV-like ABIs (#135230)
Currently, clang coerces (u)int128_t to two i64 IR parameters when they
are passed in registers. This leads to broken debug info for them after
applying SROA+InstCombine. SROA generates IR like this
([godbolt](https://godbolt.org/z/YrTa4chfc)):
```llvm
define dso_local { i64, i64 } @add(i64 noundef %a.coerce0, i64 noundef %a.coerce1)  {
entry:
  %a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128
  %a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64
  %a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128
  %a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext
    #dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18)
// ...
!17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14)
// ...
```
  
and InstCombine then removes the `or`, moving it into the
`DIExpression`, and the `shl` at which point the debug info salvaging in
`Transforms/Local` replaces the arguments with `poison` as it does not
allow constants larger than 64 bit in `DIExpression`s.
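
The IR above rebuilds one 128-bit value from two 64-bit halves with `zext`, `shl`, and `or`. The same arithmetic can be sketched in C++ as follows, assuming a compiler and target that provide the `unsigned __int128` extension (GCC and Clang on 64-bit targets); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compiler extension, assumed to be available on the target.
using u128 = unsigned __int128;

// Mirrors the IR: zero-extend both halves, shift the high half left by 64,
// then OR the two together. The halves occupy disjoint bits.
constexpr u128 combine(std::uint64_t lo, std::uint64_t hi) {
  return (static_cast<u128>(hi) << 64) | static_cast<u128>(lo);
}

// Recover the two halves again.
constexpr std::uint64_t low_half(u128 v) {
  return static_cast<std::uint64_t>(v);
}
constexpr std::uint64_t high_half(u128 v) {
  return static_cast<std::uint64_t>(v >> 64);
}

static_assert(low_half(combine(1, 2)) == 1, "low 64 bits round-trip");
static_assert(high_half(combine(1, 2)) == 2, "high 64 bits round-trip");
```

Once the high half is shifted by 64, any constant describing it no longer fits in 64 bits, which is the case the message says the debug info salvaging does not allow.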
  
I'm working under the assumption that there is interest in fixing this.
If not, please tell me.
By not coercing `int128_t`s into `{i64, i64}` but keeping them as
`i128`, the debug info stays intact and SelectionDAG then generates two
`DW_OP_LLVM_fragment` expressions for the two corresponding argument
registers.

Given that the ABI code for x64 seems to not coerce the argument when it
is passed on the stack, it should not lead to any problems keeping it as
an `i128` when it is passed in registers.

Alternatively, this could be fixed by checking if a constant value fits
in 64 bits in the debug info salvaging code and then extending the value
on the expression stack to the necessary width. This fixes InstCombine
breaking the debug info but then SelectionDAG removes the expression and
that seems significantly more complex to debug.

Another fix may be to generate `DW_OP_LLVM_fragment` expressions when
removing the `or` as it gets marked as disjoint by InstCombine. However,
I don't know if the KnownBits information is still available at the time
the `or` gets removed and it would probably require refactoring of the
debug info salvaging code as that currently only seems to replace single
expressions and is not designed to support generating new debug records.

Converting `(u)int128_t` arguments to `i128` in the IR seems like the
simpler solution, if it doesn't cause any ABI issues.
2025-07-17 09:57:32 -07:00