17463 Commits

Author SHA1 Message Date
Daniel Paoliello
35c7df1a21
[aarch64][arm] Add support for the _Interlocked[Compare]ExchangePointer_{acq|nf|rel} MS intrinsics (#117645)
Adds support for the following MSVC intrinsics:
* `_InterlockedCompareExchangePointer_acq`
* `_InterlockedCompareExchangePointer_rel`
* `_InterlockedExchangePointer_acq`
* `_InterlockedExchangePointer_nf`
* `_InterlockedExchangePointer_rel`

These are documented at:
<https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#interlocked-intrinsics>

NOTE: `_InterlockedCompareExchangePointer_nf` is not being added since
it already exists, although it was incorrectly added for all
architectures instead of being Arm & AArch64 specific.

This change also unifies how the pointer and non-pointer interlocked
compare-exchange intrinsics are being handled.
2024-12-04 13:41:26 -08:00
Daniel Paoliello
ee9e786717
[aarch64] Add support for the __{inc|add}x18{byte|word|dword|qword intrinsics (#117752)
Adds support for the following MSVC intrinsics:
* `__addx18byte`
* `__addx18word`
* `__addx18dword`
* `__addx18qword`
* `__incx18byte`
* `__incx18word`
* `__incx18dword`
* `__incx18qword`

These are documented at:
<https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170>
2024-12-04 10:29:58 -08:00
Balazs Benics
2e85138c0d
[clang][NFC] Generalize getSpecificAttr for const attributes (#116606)
This patch allows using `getSpecificAttr` for getting `const`
attributes. Previously, if users of this API would want to get a const
Attribute pointer, they had to pass `getSpecificAttr<const XYZ>()`, to
get it compile. It feels like an arbitrary limitation as the constness
was already encoded in the Attribute container's value type.
2024-12-04 15:17:47 +01:00
Sarah Spall
46de3a7064
[HLSL] get inout/out ABI for array parameters working (#111047)
Get inout/out parameters working for HLSL Arrays.
Utilizes the fix from #109323, and corrects the assignment behavior
slightly to allow for Non-LValues on the RHS.
Closes #106917

---------

Co-authored-by: Chris B <beanz@abolishcrlf.org>
2024-12-03 17:43:36 -08:00
Adam Yang
dd2b2b8bbb
[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic (#111883)
partially fixes #70103 

### Changes
* Implemented `GroupMemoryBarrierWithGroupSync` clang builtin
* Linked `GroupMemoryBarrierWithGroupSync` clang builtin with
`hlsl_intrinsics.h`
* Added sema checks for `GroupMemoryBarrierWithGroupSync` to
`CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp`
* Add codegen for `GroupMemoryBarrierWithGroupSync` to
`EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
* Add codegen tests to
`clang/test/CodeGenHLSL/builtins/GroupMemoryBarrierWithGroupSync.hlsl`
* Add sema tests to
`clang/test/SemaHLSL/BuiltIns/GroupMemoryBarrierWithGroupSync-errors.hlsl`

### Related PRs
* [[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111884](https://github.com/llvm/llvm-project/pull/111884)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
2024-12-03 01:16:49 -08:00
Justin Bogner
bd92e46204
[HLSL] Implement RWBuffer::operator[] via __builtin_hlsl_resource_getpointer (#117017)
This introduces `__builtin_hlsl_resource_getpointer`, which lowers to
`llvm.dx.resource.getpointer` and is used to implement indexing into
resources.

This will only work through the backend for typed buffers at this point,
but the changes to structured buffers should be correct as far as the
frontend is concerned.

Note: We probably want this to return a reference in the HLSL device
address space, but for now we're just using address space 0. Creating a
device address space and updating this code can be done later as
necessary.

Fixes #95956
2024-12-02 14:03:31 -08:00
Matt Arsenault
a796f597cd
AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-12-02 14:40:36 -05:00
Oleksandr T.
071da9261b
[Clang] ensure mangled names are valid identifiers before being suggested in ifunc/alias attributes notes (#118170)
Fixes #112205

--- 

Commit that introduced this feature -
9306ef9750
2024-12-02 18:16:47 +02:00
Pedro Lobo
98e747ba56
[clang] Use poison instead of undef as the placeholder when creating a new vector [NFC] (#117064)
Call `@llvm.vector.insert` with a `poison` vector when coercing a fixed
vector to a scalable vector with the same element type.
2024-12-02 09:00:39 +00:00
SpencerAbson
e4ee970c4b
[AArch64] Implement intrinsics for F1CVTL/F2CVTL and BF1CVTL/BF2CVTL (#116959)
This patch implements the following intrinsics:

8-bit floating-point convert to deinterleaved half-precision or
BFloat16.
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

Defined in https://github.com/ARM-software/acle/pull/323

Co-authored-by: Caroline Concatto caroline.concatto@arm.com
Co-authored-by: Marian Lukac marian.lukac@arm.com
2024-11-28 12:37:02 +00:00
Alexandros Lamprineas
88c2af80fa
[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257)
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.

I am moving the FMV priority logic in TargetInfo, so that it can be
implemented by the TargetParser which then makes it possible to query it
from llvm. Here is an example why this is handy:
https://github.com/llvm/llvm-project/pull/87939
2024-11-28 09:22:05 +00:00
CHANDRA GHALE
76e6c8d3fc
Codegen changes for strict modifier with grainsize/num_tasks of taskloop construct (#117196)
Initial parsing/sema for 'strict' modifier with 'num_tasks' and
‘grainsize’ clause is present in these commits
[grainsize_parsing](ab9eac762c)
and
[num_tasks_parsing](56c1660170 (diff-4184486638e85284c3a2c961a81e7752231022daf97e411007c13a6732b50db9R6545))
. However, this implementation appears incomplete as it lacks code
generation support. A runtime patch was introduced in this runtime
commit
[runtime_patch](540007b427 (diff-5e95f9319910d6965d09c301359dbe6b23f3eef5ce4d262ef2c2d2137875b5c4R374))
, which adds a new API, _kmpc_taskloop_5, to accommodate the strict
modifier. 
In this patch I have added codegen support. When the strict modifier is
present alongside the grainsize or num_tasks clauses of taskloop
construct, the code now emits a call to _kmpc_taskloop_5, which includes
an additional parameter of type i32 with the value 1 to indicate the
strict modifier. If the strict modifier is not present, it falls back to
the existing _kmpc_taskloop API call.

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2024-11-28 14:18:59 +05:30
Thurston Dang
0d15d46362
[ubsan] Change ubsan-unique-traps to use nomerge instead of counter (#117651)
https://github.com/llvm/llvm-project/pull/65972 (continuation of
https://reviews.llvm.org/D148654) had considered adding nomerge to
ubsantrap, but did not proceed with that because of
https://github.com/llvm/llvm-project/issues/53011. Instead, it added a
counter (based on TrapBB->getParent()->size()) to each ubsantrap call.
However, this counter is not guaranteed to be unique after inlining, as
shown by https://github.com/llvm/llvm-project/pull/83470, which can
result in ubsantraps being merged by the backend.

https://github.com/llvm/llvm-project/pull/101549 has since fixed the
nomerge limitation ("It sets nomerge flag for the node if the
instruction has nomerge arrtibute."). This patch therefore takes
advantage of nomerge instead of using the counter, guaranteeing that the
ubsantraps are not merged.

This patch is equivalent to
https://github.com/llvm/llvm-project/pull/83470 but also adds nomerge
and updates tests (https://github.com/llvm/llvm-project/pull/117649:
ubsan-trap-merge.c; https://github.com/llvm/llvm-project/pull/117657:
ubsan-trap-merge.ll, ubsan-trap-nomerge.ll; catch-undef-behavior.c).
2024-11-26 21:13:00 -08:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Zhengxing li
5fd4f32f98
[HLSL] Implement SV_GroupID semantic (#115911)
Support SV_GroupID attribute.
Translate it into dx.group.id in clang codeGen.

Fixes: #70120
2024-11-26 10:45:31 -08:00
Benjamin Maxwell
db6f627f3f
[clang][SME] Ignore flatten/clang::always_inline statements for callees with mismatched streaming attributes (#116391)
If `__attribute__((flatten))` is used on a function, or
`[[clang::always_inline]]` on a statement, don't inline any callees with
incompatible streaming attributes. Without this check, clang may produce
incorrect code when these attributes are used in code with streaming
functions.

Note: The docs for flatten say it can be ignored when inlining is
impossible: "causes calls within the attributed function to be inlined
unless it is impossible to do so".

Similarly, the (clang-only) `[[clang::always_inline]]` statement
attribute is more relaxed than the GNU `__attribute__((always_inline))`
(which says it should error it if it can't inline), saying only "If a
statement is marked [[clang::always_inline]] and contains calls, the
compiler attempts to inline those calls.". The docs also go on to show
an example of where `[[clang::always_inline]]` has no effect.
2024-11-26 14:26:34 +00:00
Alexandros Lamprineas
56eb559b1d
[clang][FMV] Fix crash with cpu_specific attribute. (#115762)
When dealing with cpu_specific GlobalDecl,
GetOrCreateMultiVersionResolver should immediately return the already
created llvm function if it exists.

Fixes https://github.com/llvm/llvm-project/issues/115299.
2024-11-26 07:45:15 +00:00
Helena Kotas
cac978331f
[HLSL] Add Increment/DecrementCounter methods to structured buffers (#117608)
Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).

The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.

Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
   BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
       .addParam("param_name", Type, InOutModifier)
       .callBuiltin("buildin_name", { BuiltinParams })
       .finalizeMethod();
```

Fixes #113513

[First version](llvm/llvm-project#114148) of this PR was reverted
because of build break.
2024-11-25 16:10:48 -08:00
Matt Arsenault
e97fb2207e
AMDGPU: Add support for load transpose instructions for gfx950 (#117378)
This patch support for intrinsics in clang, as well as assembly
instructions in the backend.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 09:39:04 -08:00
Viktoriia Bakalova
3de21477c4
[clang][codegen] Mention the invariant that LLVM demangler should be … (#117346)
…able to handle mangled names generated by clang.


https://discourse.llvm.org/t/rfc-clang-diagnostic-for-demangling-failures/82835/8

Since we're putting the work on the above RFC on hold, let's leave a
comment in the source code pointing to prior efforts and the suggestion
of further steps.
2024-11-25 17:23:10 +01:00
Congcong Cai
cbdd14ee9d
[clang][NFC]add static for internal linkage function (#117482)
Detected by misc-use-internal-linkage
2024-11-25 06:48:33 +08:00
Helena Kotas
dc4c8de179
Revert "[HLSL] Add Increment/DecrementCounter methods to structured buffers (#114148)" (#117448)
This reverts commit 94bde8cdc39ff7e9c59ee0cd5edda882955242aa.
2024-11-23 12:02:07 -08:00
Helena Kotas
94bde8cdc3
[HLSL] Add Increment/DecrementCounter methods to structured buffers (#114148)
Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).

The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.

Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
   BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
       .addParam("param_name", Type, InOutModifier)
       .callBuiltin("buildin_name", { BuiltinParams })
       .finalizeMethod();
```

Fixes #113513
2024-11-23 09:33:38 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Finn Plummer
a5f501e347
[HLSL][DXIL] Implement asdouble intrinsic (#114847)
- define intrinsic as builtin in Builtins.td
- link intrinsic in hlsl_intrinsics.h
- add semantic analysis to SemaHLSL.cpp
- lower to `llvm` or a `dx` intrinsic when applicable in CGBuiltin.cpp
- define DXIL intrinsic in IntrinsicsDirectX.td
- add DXIL op and mapping in DXIL.td
- enable scalarization of intrinsic

- add basic sema checking to asdouble-errors.hlsl
    
 Resolves #99081
2024-11-22 10:23:30 -08:00
Pengcheng Wang
875b10f7d0 [RISCV] Support __builtin_cpu_is
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.

We can support `__builtin_cpu_is` via comparing values in compiler's
CPU definitions and `__riscv_cpu_model`.

This depends on #116202.

Reviewers: lenary, BeMg, kito-cheng, preames, lukel97

Reviewed By: lenary

Pull Request: https://github.com/llvm/llvm-project/pull/116231
2024-11-22 22:58:54 +08:00
Mikhail Goncharov
d1dae1e861 Revert "[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202)" chain
This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8.
This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de.
This reverts commit 775148f2367600f90d28684549865ee9ea2f11be.

multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076
2024-11-22 14:09:13 +01:00
Wang Pengcheng
b36fcf4f49 [RISCV] Rename variable CPUModel to Model
The variable name can't be the same as the struct name or we will
have "error: declaration of ‘llvm::RISCV::CPUModel llvm::RISCV::CPUInfo::CPUModel’
changes meaning of ‘CPUModel’ [-fpermissive]".
2024-11-22 20:12:28 +08:00
Pengcheng Wang
c11b6b1b8a
[RISCV] Support __builtin_cpu_is
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.

We can support `__builtin_cpu_is` via comparing values in compiler's
CPU definitions and `__riscv_cpu_model`.

This depends on #116202.

Reviewers: lenary, BeMg, kito-cheng, preames, lukel97

Reviewed By: lenary

Pull Request: https://github.com/llvm/llvm-project/pull/116231
2024-11-22 20:04:57 +08:00
Florian Hahn
41a0c66f43
[TBAA] Don't emit pointer tbaa for unnamed structs or unions. (#116596)
For unnamed structs or unions, C's compatible types rule applies. Two
compatible types in different compilation units can have different
mangled names, meaning the metadata emitted below would incorrectly mark
them as no-alias. Use AnyPtr for such types in both C and C++, as C and
C++ types may be visible when doing LTO.

PR: https://github.com/llvm/llvm-project/pull/116596
2024-11-21 22:20:27 +00:00
Florian Hahn
decb87452d
[TBAA] Only emit pointer tbaa metedata for record types. (#116991)
Be conservative if the type isn't a record type. Handling other types
may
require stripping const-qualifiers inside the type, e.g.
MemberPointerType.

Also look through array types same as through pointer types, to not
pessimize
arrays of pointers.

Without this, we assign different tags to the accesses for p an q in the
second test in cwg158.

PR: https://github.com/llvm/llvm-project/pull/116991
2024-11-21 20:01:06 +00:00
Kazu Hirata
f881a3815a [CodeGen] Fix a warning
This patch fixes:

  clang/lib/CodeGen/CGBuiltin.cpp:19287:17: error: unused variable
  'Ty' [-Werror,-Wunused-variable]
2024-11-21 10:27:05 -08:00
Ashley Coleman
6735c5ebd4
[HLSL] Implement WaveActiveAnyTrue intrinsic (#115902)
Resolves https://github.com/llvm/llvm-project/issues/99160

- [x]  Implement `WaveActiveAnyTrue` clang builtin,
- [x]  Link `WaveActiveAnyTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAnyTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAnyTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAnyTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAnyTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAnyTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAnyTrue` to `113`
in `DXIL.td`
- [x] Create the `WaveActiveAnyTrue.ll` and
`WaveActiveAnyTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAnyTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAnyTrue`
lowering and map it to `int_spv_WaveActiveAnyTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAnyTrue.ll`

---------

Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Greg Roth <grroth@microsoft.com>
2024-11-21 09:44:58 -08:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
smanna12
7b61ff2c26
[Clang] Prevent null dereferences (#115502)
This commit addresses several Static Analyzer issues related to
potential null dereference by replacing dyn_cast<> with cast<> and
getAs<> with castAs<> in various parts of the codes.

The cast function asserts that the cast is valid, ensuring that the
pointer is not null and preventing null dereference errors.

The changes are made in the following files:
CGBuiltin.cpp: Ensure vector types have exactly 3 elements.
CGExpr.cpp: Ensure member declarations are field declarations.
AnalysisBasedWarnings.cpp: Ensure operations are member expressions.
SemaExprMember.cpp: Ensure base types are extended vector types.

These changes ensure that the types are correctly cast and prevent
potential null dereference issues, improving the robustness and safety
of the code.
2024-11-21 09:15:02 -06:00
Daniel Paoliello
c86899d2d2
[clang] Add support for __declspec(no_init_all) (#116847)
In MSVC, when `/d1initall` is enabled, `__declspec(no_init_all)` can be
applied to a type to suppress auto-initialization for all instances of
that type or to a function to suppress auto-initialization for all
locals within that function.

This change does the same for Clang, except that it applies to the
`-ftrivial-auto-var-init` flag instead.

NOTE: I did not add a Clang-specific spelling for this but would be
happy to make a followup PR if folks are interested in that.
2024-11-20 16:48:30 -08:00
Alexander Shaposhnikov
df13acf344
[CudaSPIRV] Add support for optional spir-v attributes (#116589)
Add support for optional spir-v attributes.

Test plan:
ninja check-all
2024-11-19 13:14:45 -08:00
Nuno Lopes
b0afa6bab9 [clang] Change some placeholders from undef to poison [NFC] 2024-11-19 15:18:40 +00:00
Daniil Kovalev
3b162f73d8
[PAC][clang] Add signed GOT cc1 flag (#96160)
Add `-fptrauth-elf-got` clang cc1 flag and set `ptrauth_elf_got`
preprocessor feature and `PointerAuthELFGOT` LangOption correspondingly.
No additional checks like ensuring OS binary format is ELF are
performed: it should be done on clang driver level when a pauth-enabled
environment implying signed GOT enabled is requested.

If the cc1 flag is passed, "ptrauth-elf-got" IR module flag is set.
2024-11-19 10:20:15 +03:00
Joseph Huber
1ced565400
[Clang] Add support for scoped atomic thread fence (#115545)
Summary:
Previously we added support for all of the atomic GNU extensions with
optional memory scoped except for `__atomic_thread_fence`. This patch
adds support for that. This should ideally allow us to generically emit
these LLVM scopes.
2024-11-18 16:43:33 -06:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Maurice Heumann
71b3b32c6e
[Clang] [MS] Add /Zc:tlsGuards option to control tls guard emission (#113830)
This adds an option to control whether guards for on-demand TLS
initialization in combination with Microsoft's CXX ABI are emitted or
not.
The behaviour should match with Microsoft:
https://learn.microsoft.com/en-us/cpp/build/reference/zc-tlsguards?view=msvc-170

This fixes #103484
2024-11-16 17:15:47 +01:00
Kazu Hirata
e8a6624325
[CodeGen] Remove unused includes (NFC) (#116459)
Identified with misc-include-cleaner.
2024-11-16 07:37:13 -08:00
Shilei Tian
4b50ec43d0
[Clang] Avoid Using byval for ndrange_t when emitting __enqueue_kernel_basic (#116435)
AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still
adds the
`byval` attribute by default. Emitting the `byval` attribute by default
in this
context doesn’t seem like a good idea, as argument-passing conventions
are
highly target-dependent, and assumptions here could lead to issues. This
PR
removes the addition of the `byval` attribute, aligning the behavior
with other
`__enqueue_kernel_*` functions.
2024-11-15 16:54:29 -05:00
joaosaffran
bc6c068127
[HLSL] Adding HLSL clip function. (#114588)
Adding HLSL `clip` function.
 - adding llvm intrinsic
 - adding sema checks
 - adding dxil lowering
 - ading spirv lowering
 - adding sema tests
 - adding codegen tests
 - adding lowering tests

Closes #99093

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2024-11-14 23:34:07 -08:00
Eli Friedman
6bd3f2e898
[clang codegen] Add CreateRuntimeFunction overload that takes a clang type. (#113506)
Correctly computing the LLVM types/attributes is complicated in general,
so add a variant which does that for you.
2024-11-14 14:35:40 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Tex Riddell
5c2a133b13
Emit constrained atan2 intrinsic for clang builtin (#113636)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constraint atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.

Last part of Implement the atan2 HLSL Function. Fixes #70096.
2024-11-12 13:34:29 -08:00
erichkeane
39351f8e46 [OpenACC] Implement AST/Sema for combined constructs
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.

This patch adds the semantic analysis for the constructs (but no
    clauses), as well as the AST nodes.
2024-11-12 09:26:25 -08:00
Malay Sanghi
f77101ea79
[X86][AMX] Support AMX-MOVRS (#115151)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-12 15:05:43 +08:00