221 Commits

Author SHA1 Message Date
Jie Fu
80bc38bc92 [RISCV] Silent a warning (NFC)
/llvm-project/clang/lib/CodeGen/Targets/RISCV.cpp:865:9:
 error: unused variable 'FixedSrcTy' [-Werror,-Wunused-variable]
  auto *FixedSrcTy = cast<llvm::FixedVectorType>(SrcTy);
        ^
1 error generated.
2025-08-20 16:59:12 +08:00
Brandon Wu
52a2e68fda
[clang][RISCV] Fix crash on VLS calling convention (#145489)
This patch handles structs of fixed vectors and structs of arrays of
fixed vectors correctly for the VLS calling convention in
EmitFunctionProlog, EmitFunctionEpilog and EmitCall.

Stacked on: https://github.com/llvm/llvm-project/pull/147173
2025-08-20 16:39:02 +08:00
Nikita Popov
246a64a12e
[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163)
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.
2025-08-18 09:23:48 +02:00
Matheus Izvekov
91cdd35008
[clang] Improve nested name specifier AST representation (#147835)
This is a major change on how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed; all type nodes to which it could
previously apply can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimizations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscellaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started with the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There are also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: `TagType::getDecl` is renamed to `getOriginalDecl` in this patch, just
to make rebasing easier. I plan to rename it back after this lands.

Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
2025-08-09 05:06:53 -03:00
Gergely Futo
1454db130a
[RISCV] Support resumable non-maskable interrupt handlers (#148134)
The `rnmi` interrupt attribute value has been added for the `Smrnmi`
extension.

---------

Co-authored-by: Sam Elliott <sam@lenary.co.uk>
2025-08-04 10:54:50 +02:00
T0b1-iOS
d35931c49e
[Clang][CodeGen][X86] don't coerce int128 into {i64,i64} for SysV-like ABIs (#135230)
Currently, clang coerces (u)int128_t to two i64 IR parameters when they
are passed in registers. This leads to broken debug info for them after
applying SROA+InstCombine. SROA generates IR like this
([godbolt](https://godbolt.org/z/YrTa4chfc)):
```llvm
define dso_local { i64, i64 } @add(i64 noundef %a.coerce0, i64 noundef %a.coerce1)  {
entry:
  %a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128
  %a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64
  %a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128
  %a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext
    #dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18)
// ...
!17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14)
// ...
```
  
and InstCombine then removes the `or`, moving it into the
`DIExpression`, and the `shl` at which point the debug info salvaging in
`Transforms/Local` replaces the arguments with `poison` as it does not
allow constants larger than 64 bit in `DIExpression`s.
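
The `zext`/`shl`/`or` sequence in the IR above rebuilds the 128-bit argument from its two coerced halves. A minimal C++ sketch of the same arithmetic, assuming a compiler that provides the `unsigned __int128` extension (GCC and Clang do on 64-bit targets); the helper name `recombine` is hypothetical and not part of the patch:

```cpp
#include <cstdint>

// `unsigned __int128` is a GCC/Clang extension, assumed to be available here.
using u128 = unsigned __int128;

// Hypothetical helper mirroring the IR: zero-extend both halves,
// shift the high half left by 64 bits, then OR the two together.
inline u128 recombine(std::uint64_t lo, std::uint64_t hi) {
    u128 high = static_cast<u128>(hi) << 64; // zext + shl
    u128 low = static_cast<u128>(lo);        // zext
    return high | low;                       // or (the halves never overlap)
}
```

Because the two halves occupy disjoint bits, the `or` behaves like an addition, which is why InstCombine can fold it into the debug expression.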
  
I'm working under the assumption that there is interest in fixing this.
If not, please tell me.
By not coercing `int128_t`s into `{i64, i64}` but keeping them as
`i128`, the debug info stays intact and SelectionDAG then generates two
`DW_OP_LLVM_fragment` expressions for the two corresponding argument
registers.

Given that the ABI code for x64 seems to not coerce the argument when it
is passed on the stack, it should not lead to any problems keeping it as
an `i128` when it is passed in registers.

Alternatively, this could be fixed by checking if a constant value fits
in 64 bits in the debug info salvaging code and then extending the value
on the expression stack to the necessary width. This fixes InstCombine
breaking the debug info but then SelectionDAG removes the expression and
that seems significantly more complex to debug.

Another fix may be to generate `DW_OP_LLVM_fragment` expressions when
removing the `or` as it gets marked as disjoint by InstCombine. However,
I don't know if the KnownBits information is still available at the time
the `or` gets removed and it would probably require refactoring of the
debug info salvaging code as that currently only seems to replace single
expressions and is not designed to support generating new debug records.

Converting `(u)int128_t` arguments to `i128` in the IR seems like the
simpler solution, if it doesn't cause any ABI issues.
2025-07-17 09:57:32 -07:00
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
Sven van Haastregt
d45d20e871
[OpenCL] Remove image dimensionality comments; NFC (#147312)
The code is correct as it aligns with the SPIR-V Specification, but the
comment was incorrect.
2025-07-09 10:27:30 +02:00
Brandon Wu
6ee375147b
[RISCV] Correct type lowering of struct of fixed-vector array in VLS (#147173)
Currently, a struct of a fixed-vector array is flattened and lowered to
a scalable vector. However, only a struct of a 1-element fixed-vector
array should be lowered that way; a struct of a fixed-vector array of
length >1 should be lowered to a vector tuple type.

https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418/files#diff-3a934f00cffdb3e509722753126a2cf6082a7648ab3b9ca8cbb0e84f8a6a12edR555-R558
2025-07-08 21:14:40 -07:00
Shafik Yaghmour
6efa366b43
[Clang][NFC] Avoid copies by using std::move (#146960)
Static analysis flagged this code as using copies when we could use move
instead. I used a temporary in some cases instead of an explicit move.
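
A minimal sketch of the two patterns mentioned, an explicit `std::move` and a temporary that is already an rvalue. The `Diagnostic` type and both helper functions are hypothetical and are not taken from the patch:

```cpp
#include <string>
#include <utility>

// Hypothetical type, used only to illustrate the pattern.
struct Diagnostic {
    std::string Message;
    explicit Diagnostic(std::string Msg) : Message(std::move(Msg)) {}
};

// Explicit move: `Text` is not needed after this call, so its buffer is
// handed over instead of being copied.
inline Diagnostic makeWithMove(std::string Text) {
    return Diagnostic(std::move(Text));
}

// Temporary instead of a named local: the argument is already an rvalue,
// so no std::move is required.
inline Diagnostic makeWithTemporary(const std::string &Prefix) {
    return Diagnostic(Prefix + ": unused variable");
}
```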
2025-07-07 17:53:45 -07:00
Eli Friedman
2aa0f0a3bd
[AArch64] Add option -msve-streaming-vector-bits= . (#144611)
This is similar to -msve-vector-bits, but for streaming mode: it
constrains the legal values of "vscale", allowing optimizations based on
that constraint.

This also fixes conversions between SVE vectors and fixed-width vectors
in streaming functions with -msve-vector-bits and
-msve-streaming-vector-bits.

This rejects any use of arm_sve_vector_bits types in streaming
functions; if it becomes relevant, we could add
arm_sve_streaming_vector_bits types in the future.

This doesn't touch the __ARM_FEATURE_SVE_BITS define.
2025-07-03 13:44:38 -07:00
Steven Perron
68173c8091
[HLSL][SPRIV] Handle signed RWBuffer correctly (#144774)
In Vulkan, the signedness of the accesses to images has to match the
signedness of the backing image.
    
See
https://docs.vulkan.org/spec/latest/chapters/textures.html#textures-input,
where it says the behaviour is undefined if
    
> the signedness of any read or sample operation does not match the
signedness of the image’s format.
    
Users who define, say, an `RWBuffer<int>` will create a Vulkan image with
a signed integer format, so the HLSL that is generated must match that
expectation.
    
The solution we use is to generate a `spirv.SignedImage` target type for
signed integer instead of `spirv.Image`. The two types are otherwise the
same.
    
The backend will add the `signExtend` image operand to accesses to the
image to ensure the image is accessed as a signed image.
    
Fixes #144580
2025-07-02 12:09:47 -04:00
Sarah Spall
23be14b222
[HLSL][SPIRV] Boolean in a RawBuffer should be i32 and Boolean vector in a RawBuffer should be <N x i32> (#144929)
Instead of converting the type in a RawBuffer to its HLSL type using
'ConvertType', use 'ConvertTypeForMem'.
ConvertTypeForMem handles booleans being i32 and boolean vectors being <
N x i32 >.
Add tests to show booleans and boolean vectors in RawBuffers now have
the correct type of i32, and <N x i32> respectively.
Closes #141089
2025-06-27 13:43:03 -07:00
Alex Voicu
992f0d1225
[Clang][SPIRV][AMDGPU] Override supportsLibCall for AMDGCNSPIRV (#143814)
The `supportsLibCall` predicate is used to select whether some math builtins get expanded in the FE or they get lowered into libcalls. The default implementation unconditionally returns true, which is problematic for AMDGCN-flavoured SPIRV, as AMDGPU does not support any libcalls at the moment. This change overrides the predicate in order to reflect this and correctly do the expected FE expansion when targeting AMDGCN-flavoured SPIRV.
2025-06-25 11:22:59 +01:00
Kazu Hirata
ae372bfca8
[CodeGen] Use range-based for loops (NFC) (#145142)
2025-06-21 08:20:57 -07:00
Nick Sarnie
86d1d6b2c0
[clang] Use TargetInfo to determine device kernel calling convention (#144728)
We should abstract this logic away to `TargetInfo`. See
https://github.com/llvm/llvm-project/pull/137882 for more information.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-06-18 20:50:12 +00:00
David Green
030a471753 [AArch64][Clang] Exclude address spaces from pointer-only coercion types.
As reported on #135064, the generic pointer coercion code in
CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast
the pointers). This bails out if an address space qualifier is found on the
pointer.
2025-06-12 20:51:58 +01:00
David Green
5f648c370e
[AArch64] Change the coercion type of structs with pointer members. (#135064)
The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct which
is <= 128bits in size, which can only contain either 1 or 2 pointers, we
convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or
[2 x i64]. This helps alias analysis produce more accurate results.
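
As a rough illustration of the kind of struct this affects, here is a sketch with a hypothetical `Range` type. It assumes a 64-bit target where each pointer is 8 bytes; the IR types named in the comments are taken from the description above, not verified against compiler output:

```cpp
#include <cstddef>

// Hypothetical struct: two pointer members and nothing else.
struct Range {
    const int *Begin;
    const int *End;
};

// Assumption: pointers are at most 8 bytes, so the struct fits in 128 bits.
static_assert(sizeof(Range) <= 16, "struct must fit in 128 bits");

// Passed by value. Per the description above, a struct like this would be
// coerced to [2 x ptr] rather than [2 x i64], so the pointers never pass
// through an integer type on the way into the callee.
inline std::ptrdiff_t length(Range R) { return R.End - R.Begin; }
```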
2025-06-10 07:04:54 +01:00
Nick Sarnie
3b9ebe9201
[clang] Simplify device kernel attributes (#137882)
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-06-05 14:15:38 +00:00
Ami-zhang
8c65f68330
[clang][LoongArch] Add support for the _Float16 type (#141703)
Enable _Float16 for LoongArch target. Additionally, this change fixes
incorrect ABI lowering of _Float16 in the case of structs containing
fp16 that are eligible for passing via GPR+FPR or FPR+FPR. Finally, it
also fixes int16 -> __fp16 conversion code gen, which uses generic LLVM
IR rather than llvm.convert.to.fp16 intrinsics.
2025-06-03 14:26:11 +08:00
Nikita Popov
e2b536431d
[CodeGen] Move CodeGenPGO behind unique_ptr (NFC) (#142155)
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unique_ptr.

This reduces clang build time by 0.8%.
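
The general technique can be sketched as follows. The class names are hypothetical stand-ins (not the real CodeGen classes), and everything is placed in one file only so the sketch is self-contained:

```cpp
#include <memory>

// In a real header only this forward declaration would be visible, so users
// of the header would not have to include the expensive definition.
class ExpensivePGOState; // hypothetical stand-in for CodeGenPGO

class FunctionEmitter { // hypothetical stand-in for the owning class
public:
    FunctionEmitter();
    ~FunctionEmitter(); // defined below, where ExpensivePGOState is complete
    int counterCount() const;

private:
    std::unique_ptr<ExpensivePGOState> PGO;
};

// In a real project everything below would live in the .cpp file, after
// including the expensive header.
class ExpensivePGOState {
public:
    int NumCounters = 3;
};

FunctionEmitter::FunctionEmitter()
    : PGO(std::make_unique<ExpensivePGOState>()) {}
FunctionEmitter::~FunctionEmitter() = default;
int FunctionEmitter::counterCount() const { return PGO->NumCounters; }
```

The destructor is declared in the class and defined out of line because `std::unique_ptr` needs the complete type at the point where it deletes the object.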
2025-06-02 09:51:54 +02:00
Steven Perron
5584020d8a
[HLSL][SPIRV] Implement the SPIR-V target type for cbuffers. (#140061)
This change implements the type used to represent cbuffer for SPIR-V.

Fixes https://github.com/llvm/llvm-project/issues/138274.
2025-05-28 07:51:03 -04:00
David Green
3a42cbd47d [AArch64] Rename AArch64SVEACLETypes.def and add base SVE_TYPE.
2025-05-28 12:26:54 +01:00
Cassandra Beckley
5a4571133a
[HLSL] Implement SpirvType and SpirvOpaqueType (#134034)
This implements the design proposed by [Representing SpirvType in
Clang's Type System](https://github.com/llvm/wg-hlsl/pull/181). It
creates `HLSLInlineSpirvType` as a new `Type` subclass, and
`__hlsl_spirv_type` as a new builtin type template to create such a
type.

This new type is lowered to the `spirv.Type` target extension type, as
described in [Target Extension Types for Inline SPIR-V and Decorated
Types](https://github.com/llvm/wg-hlsl/blob/main/proposals/0017-inline-spirv-and-decorated-types.md).
2025-05-27 11:40:54 -04:00
Kazu Hirata
8075c15f54
[CodeGen] Remove unused includes (NFC) (#141418)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-25 10:55:28 -07:00
Anatoly Trosinenko
f10a90587f
[clang][AArch64] Move initialization of ptrauth-* function attrs (#140277)
Move the initialization of ptrauth-* function attributes near the
initialization of branch protection attributes. The semantics of these
groups of attributes partially overlaps, so handle both groups in
getDefaultFunctionAttributes() and setTargetAttributes() functions to
prevent getting them out of sync. This fixes C++ TLS wrappers.
2025-05-20 12:50:58 +03:00
choikwa
77de8a0c0a
[AMDGPU][clang] provide device implementation for __builtin_logb and … (#129347)
…__builtin_scalbn

Clang generates library calls for __builtin_* functions which can be a
problem for GPUs that cannot handle them. This patch generates a call to
the device implementation for __builtin_logb and the ldexp intrinsic for
__builtin_scalbn.
2025-05-19 14:11:31 -04:00
Sam Elliott
cfc5baf6e6
[RISCV] SiFive CLIC Support (#132481)
This Change adds support for two SiFive vendor attributes in clang:
- "SiFive-CLIC-preemptible"
- "SiFive-CLIC-stack-swap"

These can be given together, and can be combined with "machine", but
cannot be combined with any other interrupt attribute values.

These are handled primarily in RISCVFrameLowering:
- "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw`
  at function entry and exit, which holds the trap stack pointer.
- "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before
  re-enabling interrupts using `mstatus`. To save these, `s0` and `s1`
  are first spilled to the stack, and then the values are read into
  these registers. If these registers are used in the function, their
  values will be spilled a second time onto the stack with the generic
  callee-saved-register handling. At the end of the function interrupts
  are disabled again before `mepc` and `mcause` are restored.

This Change also adds support for the following two experimental
extensions, which only contain CSRs:
- XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs
- XSfmclic - for SiFive's CLIC Machine-Mode CSRs

The latter is needed for interrupt support.

The CFI information for this implementation is not correct, but I'd
prefer to correct this in a follow-up. While it's unlikely anyone wants
to unwind through a handler, the CFI information is also used by
debuggers so it would be good to get it right.

Co-authored-by: Ana Pazos <apazos@quicinc.com>
2025-04-25 17:12:27 -07:00
Victor Campos
6738cfe0a4
Mark CXX module initializer with PACBTI attributes (#133716)
The CXX module initializer function, which is called at program startup,
needs to be tagged with Pointer Authentication and Branch Target
Identification marks whenever relevant.

Before this patch, in CPUs set up for PACBTI execution, the function
wasn't protected with return address signing and no BTI instruction was
inserted at the start of it, thus leading to an execution fault.

This patch fixes the issue by marking the function with the function
attributes related to PAC and BTI if relevant.
2025-04-25 11:04:34 +01:00
Benson Chu
50320504c8 [ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.

DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its own alignment,
then all subsequent saves will also be correctly aligned.
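
The alignment concern can be sketched with a small calculation. The helper `paddingFor` is hypothetical, and the byte counts are illustrative assumptions (4 bytes per status register, 8-byte alignment) rather than values taken from the patch:

```cpp
#include <cstdint>

// Hypothetical helper: bytes of padding needed so that `BytesPushed` becomes
// a multiple of `Align`.
constexpr std::uint32_t paddingFor(std::uint32_t BytesPushed,
                                   std::uint32_t Align) {
    return (Align - (BytesPushed % Align)) % Align;
}

// Illustrative sizes only: two 4-byte status registers versus one.
static_assert(paddingFor(8, 8) == 0, "two registers keep 8-byte alignment");
static_assert(paddingFor(4, 8) == 4, "one register needs 4 bytes of padding");
```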

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
2025-04-22 14:31:29 -05:00
Sarah Spall
7810d84844
[HLSL] Boolean in a RawBuffer should be i32 and Boolean vector in a RawBuffer should be <N x i32> (#135848)
Instead of converting the type in a RawBuffer to its HLSL type using
'ConvertType', use 'ConvertTypeForMem'.
ConvertTypeForMem handles booleans being i32 and boolean vectors being <
N x i32 >.
Add tests to show booleans and boolean vectors in RawBuffers now have
the correct type of i32, and <N x i32> respectively.
Closes #135635
2025-04-21 15:11:39 -07:00
Kazu Hirata
f4c76bba59
[clang] Use llvm::append_range (NFC) (#136256)
This patch replaces:

  llvm::copy(Src, std::back_inserter(Dst));

with:

  llvm::append_range(Dst, Src);

for brevity.

One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
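
A minimal before/after sketch of this replacement. To stay free of LLVM headers it uses `appendRange`, a hypothetical std-only stand-in for `llvm::append_range`, and `std::vector` instead of `llvm::SmallVector`:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical std-only stand-in for llvm::append_range, written so the
// sketch compiles without LLVM headers.
template <typename Container, typename Range>
void appendRange(Container &Dst, const Range &Src) {
    Dst.insert(Dst.end(), std::begin(Src), std::end(Src));
}

// Before: element-by-element copy through a back_inserter.
inline std::vector<int> copyWithBackInserter(const std::vector<int> &Src) {
    std::vector<int> Dst = {1, 2};
    std::copy(Src.begin(), Src.end(), std::back_inserter(Dst));
    return Dst;
}

// After: one call that appends the whole range.
inline std::vector<int> copyWithAppendRange(const std::vector<int> &Src) {
    std::vector<int> Dst = {1, 2};
    appendRange(Dst, Src);
    return Dst;
}
```

Both functions produce the same contents; the range insert lets the container see the full length up front, which is the basis for the reserve behaviour described above.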
2025-04-18 00:15:13 -07:00
Tom Honermann
0348ff5158
[SYCL] Basic code generation for SYCL kernel caller offload entry point functions. (#133030)
A function declared with the `sycl_kernel_entry_point` attribute,
sometimes called a SYCL kernel entry point function, specifies a pattern
from which the parameters and body of an offload entry point function,
sometimes called a SYCL kernel caller function, are derived.

SYCL kernel caller functions are emitted during SYCL device compilation.
Their parameters and body are derived from the `SYCLKernelCallStmt`
statement and `OutlinedFunctionDecl` declaration associated with their
corresponding SYCL kernel entry point function. A distinct SYCL kernel
caller function is generated for each SYCL kernel entry point function
defined as a non-inline function or ODR-used in the translation unit.

The name of each SYCL kernel caller function is parameterized by the
SYCL kernel name type specified by the `sycl_kernel_entry_point`
attribute attached to the corresponding SYCL kernel entry point
function. For the moment, the Itanium ABI mangled name for typeinfo data
(`_ZTS<type>`) is used to name these functions; a future change will
switch to a more appropriate naming scheme.
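
To illustrate the shape of such a name, here is a sketch that builds a `_ZTS<type>` string for a kernel name type. It assumes an Itanium C++ ABI implementation (for example GCC or Clang on Linux), where `std::type_info::name()` returns the mangled type encoding. `MyKernelName` and `kernelCallerName` are hypothetical and do not reflect how Clang computes the name internally:

```cpp
#include <string>
#include <typeinfo>

// Hypothetical kernel name type; any complete type would do.
struct MyKernelName {};

// Assumption: on an Itanium C++ ABI implementation, type_info::name()
// returns the mangled type encoding, and prefixing it with "_ZTS" gives the
// typeinfo-name symbol for that type.
template <typename KernelName>
std::string kernelCallerName() {
    return std::string("_ZTS") + typeid(KernelName).name();
}
```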

The calling convention used for a SYCL kernel caller function is target
dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently
provided. These functions are required to observe the language
restrictions for SYCL devices as specified by the SYCL 2020
specification; this includes a forward progress guarantee and prohibits
recursion.

Only SYCL kernel caller functions, functions declared as
`SYCL_EXTERNAL`, and functions directly or indirectly referenced from
those functions should be emitted during device compilation. Pruning of
other declarations has not yet been implemented.

---------

Co-authored-by: Elizabeth Andrews <elizabeth.andrews@intel.com>
2025-04-17 09:14:45 -04:00
Jonas Paulsson
6d03f51f0c
[SystemZ] Add support for 16-bit floating point. (#109164)
- _Float16 is now accepted by Clang.

- The half IR type is fully handled by the backend.

- These values are passed in FP registers and converted to/from float around
  each operation.

- Compiler-rt conversion functions are now built for s390x including the missing
  extendhfdf2 which was added.

Fixes #50374
2025-04-16 20:02:56 +02:00
Shilei Tian
ce01e4e2f6
[Clang][OpenCL][AMDGPU] Use byref for aggregate OpenCL kernel arguments (#134892)
Due to a previous workaround allowing kernels to be called from other
functions, Clang currently doesn't use the `byref` attribute for
aggregate kernel arguments. The issue was recently resolved in
https://github.com/llvm/llvm-project/pull/115821. With that fix, we can
now enable the use of `byref` consistently across all languages.

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>

Fixes SWDEV-247226.
2025-04-13 10:17:55 -04:00
Aniket Lal
642481a428
[Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel (#115821)
This feature is currently not supported in the compiler.
To facilitate this, we emit a stub version of each kernel
function body with a different name mangling scheme, and
replace the respective kernel call sites appropriately.
    
Fixes https://github.com/llvm/llvm-project/issues/60313
    
D120566 was an earlier attempt made to upstream a solution
for this issue.

---------

Co-authored-by: anikelal <anikelal@amd.com>
2025-04-08 10:29:30 +05:30
Farzon Lotfi
82103dfae9
Revert "Reland [Clang][Cmake] fix libtool duplicate member name warnings" (#134656)
Reverts llvm/llvm-project#133850
2025-04-07 10:00:53 -04:00
Farzon Lotfi
0d71d9ab28
Reland [Clang][Cmake] fix libtool duplicate member name warnings (#133850)
fixes https://github.com/llvm/llvm-project/issues/133199

As of the third commit, the missing linker references in
`Targets/DirectX.cpp` found in
https://github.com/llvm/llvm-project/pull/133776 were fixed by moving
`HLSLBufferLayoutBuilder.cpp` to `clang/lib/CodeGen/Targets/`.

It fixes the circular reference issue found in
https://github.com/llvm/llvm-project/pull/133619 for all
`-DBUILD_SHARED_LIBS=ON` builds by removing `target_link_libraries` from
the sub directory cmake files.

testing for amdgpu offload was done via
`cmake -B ../llvm_amdgpu -S llvm -GNinja -C
offload/cmake/caches/Offload.cmake -DCMAKE_BUILD_TYPE=Release`

PR https://github.com/llvm/llvm-project/pull/132252 Created a second
file that shared <TargetName>.cpp in clang/lib/CodeGen/CMakeLists.txt

For example, there were two AMDGPU.cpp files, one in TargetBuiltins and
the other in Targets. Even though these were in different directories,
libtool warns that it might not distinguish them because they share the
same base name.

There are two potential fixes. The easy fix is to rename one of them and
keep one cmake file. That solution though doesn't future proof this
problem in the event of a third <TargetName>.cpp and it seems teams want
to just use the target name

https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483.

The alternative fix that this PR went with is to separate the cmake
files into their own sub directories as static libs.
2025-04-07 09:53:07 -04:00
Steven Perron
16603d838c
[HLSL] Add SPIR-V target type for RWStructuredBuffers (#133468)
This PR adds the target type for main storage for HLSL raw buffer types.
It does not handle the counter variables that are associated with those
buffers.

This is implementing part of
https://github.com/llvm/wg-hlsl/blob/main/proposals/0018-spirv-resource-representation.md.
We do not handle other HLSL raw buffer types.
2025-04-01 16:59:46 -04:00
Farzon Lotfi
bdae91b08b
Revert "[Clang][Cmake] fix libtool duplicate member name warnings" (#133795)
Reverts llvm/llvm-project#133619
2025-03-31 17:00:38 -04:00
Farzon Lotfi
cc2b432614
[Clang][Cmake] fix libtool duplicate member name warnings (#133619)
fixes #133199
 
PR #132252 Created a second file that shared `<TargetName>.cpp` in
`clang/lib/CodeGen/CMakeLists.txt`

For example, there were two `AMDGPU.cpp` files, one in `TargetBuiltins`
and the other in `Targets`. Even though these were in different
directories, `libtool` warns that it might not distinguish them because
they share the same base name.

There are two potential fixes. The easy fix is to rename one of them and
keep one cmake file. That solution though doesn't future proof this
problem in the event of a third `<TargetName>.cpp` and it seems teams
want to just use the target name

https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483.

The alternative fix is to separate the cmake files into their own sub
directories. I chose to create static libraries. It might have been
possible to build an OBJECT, but I only saw examples of this in
compiler-rt and test directories so assumed there was a reason it wasn't
used.
2025-03-31 14:21:22 -04:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this led to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Ben Shi
597accfea6
[clang][CodeGen][AVR] Fix a crash in AVRABIInfo (#131976)
fixes https://github.com/llvm/llvm-project/issues/131967
2025-03-22 13:22:32 +08:00
Alexander Shaposhnikov
297f0b3f4c
[CudaSPIRV] Allow using integral non-type template parameters as attribute args (#131546)
Allow using integral non-type template parameters as attribute arguments
of reqd_work_group_size and work_group_size_hint.

Test plan:
ninja check-all
2025-03-19 10:11:18 -07:00
Helena Kotas
cb64a363ca
[HLSL] Make sure isSigned flag is set on target type for TypedBuffer resources with signed int vectors (#130223)
Fixes #130191
2025-03-14 13:09:21 -07:00
Helena Kotas
73e12de062
[HLSL] Implement explicit layout for default constant buffer ($Globals) (#128991)
Processes `HLSLResourceBindingAttr` attributes that represent
`register(c#)` annotations on default constant buffer declarations and
applies its value to the buffer layout. Any default buffer declarations
without an explicit `register(c#)` annotation are placed after the
elements with explicit layout.

This PR also adds a test case for a `cbuffer` that does not have
`packoffset` on all declarations. Same layout rules apply here as well.

Fixes #126791
2025-03-12 22:35:07 -07:00
Benson Chu
3b3356043c Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute"
This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.
2025-03-10 10:11:23 -05:00
Benson Chu
1f05703176 [ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.

DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its own alignment,
then all subsequent saves will also be correctly aligned.

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
2025-03-10 10:05:15 -05:00
Matt Arsenault
0d2c55cb96
AMDGPU: Move enqueued block handling into clang (#128519)
The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this, so leave avoiding this for a later step. Use a
special section name to enable this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.

https://reviews.llvm.org/D141700
2025-03-10 19:54:04 +07:00
Kito Cheng
55f86cf023
[RISCV][clang] Fix wrong VLS CC detection (#130107)
RISCVABIInfo::detectVLSCCEligibleStruct should exit early if the VLS
calling convention is not used. However, the sentinel value was not set
correctly; it should be zero instead of one.
2025-03-07 11:15:20 +08:00