2079 Commits

Author SHA1 Message Date
Oliver Stannard
9fc2fadbfc
[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449)
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code for to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.

This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.

Fixes #111293.
2024-12-19 09:12:19 +00:00
Jie Fu
a1766699c6 [clang] Fix -Wunused-variable in CGBuiltin.cpp (NFC)
/llvm-project/clang/lib/CodeGen/CGBuiltin.cpp:19441:17:
 error: unused variable 'Ty' [-Werror,-Wunused-variable]
    llvm::Type *Ty = Op->getType();
                ^
1 error generated.
2024-12-17 09:07:44 +08:00
Ashley Coleman
41a6e9cfd6
[HLSL] Implement WaveActiveAllTrue Intrinsic (#117245)
Resolves https://github.com/llvm/llvm-project/issues/99161

- [x]  Implement `WaveActiveAllTrue` clang builtin,
- [x]  Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114`
in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and
`WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue`
lowering and map it to `int_spv_WaveActiveAllTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`
2024-12-16 16:13:35 -08:00
Momchil Velikov
c2172431c7
[AArch64] Implements FP8 SVE intrinsics for dot-product (#118125)
This patch adds the following intrinsics:

* 8-bit floating-point dot product to single-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) ||
__ARM_FEATURE_SSVE_FP8DOT4
svfloat32_t svdot[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat32_t svdot[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point indexed dot product to single-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) ||
__ARM_FEATURE_SSVE_FP8DOT4
svfloat32_t svdot_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm,
                                       uint64_t imm0_3, fpm_t fpm);

* 8-bit floating-point dot product to half-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) ||
__ARM_FEATURE_SSVE_FP8DOT2
svfloat16_t svdot[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat16_t svdot[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point indexed dot product to half-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) ||
__ARM_FEATURE_SSVE_FP8DOT2
svfloat16_t svdot_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
svmfloat8_t zm,
                                       uint64_t imm0_7, fpm_t fpm);
2024-12-13 14:06:54 +00:00
anoopkg6
dc04d414df
SystemZ: Add support for __builtin_setjmp and __builtin_longjmp. (#119257)
This pr includes fixes for original pr##116642.
Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ..
2024-12-10 19:50:51 +01:00
SpencerAbson
99f6ca9b7b
[AArch64] Implement intrinsics for SME FP8 FMOPA (#118115)
This patch implements the following intrinsics:

8-bit floating-point sum of outer products and accumulate.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
    void svmopa_za16[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");

  // Only if __ARM_FEATURE_SME_F8F32 != 0
    void svmopa_za32[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");
```

In accordance with: https://github.com/ARM-software/acle/pull/323/

Co-authored-by: Momchil Velikov momchil.velikov@arm.com
Co-authored-by: Marian Lukac marian.lukac@arm.com
2024-12-09 11:13:08 +00:00
Ulrich Weigand
8787bc72a6 Revert "[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642)"
This reverts commit 030bbc92a705758f1131fb29cab5be6d6a27dd1f.
2024-12-07 00:55:54 +01:00
anoopkg6
030bbc92a7
[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642)
Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ.
2024-12-06 23:33:33 +01:00
wwwatermiao
409edc64d1
[AArch64][SME] Fix bug on SMELd1St1 (#118109)
Patch[1] has update intrinsic interface for ld1/st1, while based on
ARM's document, "If the intrinsic also has a vnum argument, the ZA slice
number is calculated by adding vnum to slice.". But the "vnum" did not
work for our realization now, this patch fix this point.


[1]ee31ba0dd9
2024-12-05 14:39:02 -03:00
Daniel Paoliello
35c7df1a21
[aarch64][arm] Add support for the _Interlocked[Compare]ExchangePointer_{acq|nf|rel} MS intrinsics (#117645)
Adds support for the following MSVC intrinsics:
* `_InterlockedCompareExchangePointer_acq`
* `_InterlockedCompareExchangePointer_rel`
* `_InterlockedExchangePointer_acq`
* `_InterlockedExchangePointer_nf`
* `_InterlockedExchangePointer_rel`

These are documented at:
<https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#interlocked-intrinsics>

NOTE: `_InterlockedCompareExchangePointer_nf` is not being added since
it already exists, although it was incorrectly added for all
architectures instead of being Arm & AArch64 specific.

This change also unifies how the pointer and non-pointer interlocked
compare-exchange intrinsics are being handled.
2024-12-04 13:41:26 -08:00
Daniel Paoliello
ee9e786717
[aarch64] Add support for the __{inc|add}x18{byte|word|dword|qword intrinsics (#117752)
Adds support for the following MSVC intrinsics:
* `__addx18byte`
* `__addx18word`
* `__addx18dword`
* `__addx18qword`
* `__incx18byte`
* `__incx18word`
* `__incx18dword`
* `__incx18qword`

These are documented at:
<https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170>
2024-12-04 10:29:58 -08:00
Adam Yang
dd2b2b8bbb
[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic (#111883)
partially fixes #70103 

### Changes
* Implemented `GroupMemoryBarrierWithGroupSync` clang builtin
* Linked `GroupMemoryBarrierWithGroupSync` clang builtin with
`hlsl_intrinsics.h`
* Added sema checks for `GroupMemoryBarrierWithGroupSync` to
`CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp`
* Add codegen for `GroupMemoryBarrierWithGroupSync` to
`EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
* Add codegen tests to
`clang/test/CodeGenHLSL/builtins/GroupMemoryBarrierWithGroupSync.hlsl`
* Add sema tests to
`clang/test/SemaHLSL/BuiltIns/GroupMemoryBarrierWithGroupSync-errors.hlsl`

### Related PRs
* [[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111884](https://github.com/llvm/llvm-project/pull/111884)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
2024-12-03 01:16:49 -08:00
Justin Bogner
bd92e46204
[HLSL] Implement RWBuffer::operator[] via __builtin_hlsl_resource_getpointer (#117017)
This introduces `__builtin_hlsl_resource_getpointer`, which lowers to
`llvm.dx.resource.getpointer` and is used to implement indexing into
resources.

This will only work through the backend for typed buffers at this point,
but the changes to structured buffers should be correct as far as the
frontend is concerned.

Note: We probably want this to return a reference in the HLSL device
address space, but for now we're just using address space 0. Creating a
device address space and updating this code can be done later as
necessary.

Fixes #95956
2024-12-02 14:03:31 -08:00
Matt Arsenault
a796f597cd
AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-12-02 14:40:36 -05:00
SpencerAbson
e4ee970c4b
[AArch64] Implement intrinsics for F1CVTL/F2CVTL and BF1CVTL/BF2CVTL (#116959)
This patch implements the following intrinsics:

8-bit floating-point convert to deinterleaved half-precision or
BFloat16.
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

Defined in https://github.com/ARM-software/acle/pull/323

Co-authored-by: Caroline Concatto caroline.concatto@arm.com
Co-authored-by: Marian Lukac marian.lukac@arm.com
2024-11-28 12:37:02 +00:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Helena Kotas
cac978331f
[HLSL] Add Increment/DecrementCounter methods to structured buffers (#117608)
Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).

The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.

Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
   BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
       .addParam("param_name", Type, InOutModifier)
       .callBuiltin("buildin_name", { BuiltinParams })
       .finalizeMethod();
```

Fixes #113513

[First version](llvm/llvm-project#114148) of this PR was reverted
because of build break.
2024-11-25 16:10:48 -08:00
Matt Arsenault
e97fb2207e
AMDGPU: Add support for load transpose instructions for gfx950 (#117378)
This patch support for intrinsics in clang, as well as assembly
instructions in the backend.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 09:39:04 -08:00
Congcong Cai
cbdd14ee9d
[clang][NFC]add static for internal linkage function (#117482)
Detected by misc-use-internal-linkage
2024-11-25 06:48:33 +08:00
Helena Kotas
dc4c8de179
Revert "[HLSL] Add Increment/DecrementCounter methods to structured buffers (#114148)" (#117448)
This reverts commit 94bde8cdc39ff7e9c59ee0cd5edda882955242aa.
2024-11-23 12:02:07 -08:00
Helena Kotas
94bde8cdc3
[HLSL] Add Increment/DecrementCounter methods to structured buffers (#114148)
Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is
used to implement the `IncrementCounter` and `DecrementCounter` methods
on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see
Note).

The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter`
or `llvm.spv.bufferUpdateCounter`.

Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource`
that enables adding methods to builtin types using builder pattern like
this:
```
   BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType)
       .addParam("param_name", Type, InOutModifier)
       .callBuiltin("buildin_name", { BuiltinParams })
       .finalizeMethod();
```

Fixes #113513
2024-11-23 09:33:38 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Finn Plummer
a5f501e347
[HLSL][DXIL] Implement asdouble intrinsic (#114847)
- define intrinsic as builtin in Builtins.td
- link intrinsic in hlsl_intrinsics.h
- add semantic analysis to SemaHLSL.cpp
- lower to `llvm` or a `dx` intrinsic when applicable in CGBuiltin.cpp
- define DXIL intrinsic in IntrinsicsDirectX.td
- add DXIL op and mapping in DXIL.td
- enable scalarization of intrinsic

- add basic sema checking to asdouble-errors.hlsl
    
 Resolves #99081
2024-11-22 10:23:30 -08:00
Pengcheng Wang
875b10f7d0 [RISCV] Support __builtin_cpu_is
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.

We can support `__builtin_cpu_is` via comparing values in compiler's
CPU definitions and `__riscv_cpu_model`.

This depends on #116202.

Reviewers: lenary, BeMg, kito-cheng, preames, lukel97

Reviewed By: lenary

Pull Request: https://github.com/llvm/llvm-project/pull/116231
2024-11-22 22:58:54 +08:00
Mikhail Goncharov
d1dae1e861 Revert "[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202)" chain
This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8.
This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de.
This reverts commit 775148f2367600f90d28684549865ee9ea2f11be.

multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076
2024-11-22 14:09:13 +01:00
Wang Pengcheng
b36fcf4f49 [RISCV] Rename variable CPUModel to Model
The variable name can't be the same as the struct name or we will
have "error: declaration of ‘llvm::RISCV::CPUModel llvm::RISCV::CPUInfo::CPUModel’
changes meaning of ‘CPUModel’ [-fpermissive]".
2024-11-22 20:12:28 +08:00
Pengcheng Wang
c11b6b1b8a
[RISCV] Support __builtin_cpu_is
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.

We can support `__builtin_cpu_is` via comparing values in compiler's
CPU definitions and `__riscv_cpu_model`.

This depends on #116202.

Reviewers: lenary, BeMg, kito-cheng, preames, lukel97

Reviewed By: lenary

Pull Request: https://github.com/llvm/llvm-project/pull/116231
2024-11-22 20:04:57 +08:00
Kazu Hirata
f881a3815a [CodeGen] Fix a warning
This patch fixes:

  clang/lib/CodeGen/CGBuiltin.cpp:19287:17: error: unused variable
  'Ty' [-Werror,-Wunused-variable]
2024-11-21 10:27:05 -08:00
Ashley Coleman
6735c5ebd4
[HLSL] Implement WaveActiveAnyTrue intrinsic (#115902)
Resolves https://github.com/llvm/llvm-project/issues/99160

- [x]  Implement `WaveActiveAnyTrue` clang builtin,
- [x]  Link `WaveActiveAnyTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAnyTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAnyTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAnyTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAnyTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAnyTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAnyTrue` to `113`
in `DXIL.td`
- [x] Create the `WaveActiveAnyTrue.ll` and
`WaveActiveAnyTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAnyTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAnyTrue`
lowering and map it to `int_spv_WaveActiveAnyTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAnyTrue.ll`

---------

Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Greg Roth <grroth@microsoft.com>
2024-11-21 09:44:58 -08:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
smanna12
7b61ff2c26
[Clang] Prevent null dereferences (#115502)
This commit addresses several Static Analyzer issues related to
potential null dereference by replacing dyn_cast<> with cast<> and
getAs<> with castAs<> in various parts of the codes.

The cast function asserts that the cast is valid, ensuring that the
pointer is not null and preventing null dereference errors.

The changes are made in the following files:
CGBuiltin.cpp: Ensure vector types have exactly 3 elements.
CGExpr.cpp: Ensure member declarations are field declarations.
AnalysisBasedWarnings.cpp: Ensure operations are member expressions.
SemaExprMember.cpp: Ensure base types are extended vector types.

These changes ensure that the types are correctly cast and prevent
potential null dereference issues, improving the robustness and safety
of the code.
2024-11-21 09:15:02 -06:00
Joseph Huber
1ced565400
[Clang] Add support for scoped atomic thread fence (#115545)
Summary:
Previously we added support for all of the atomic GNU extensions with
optional memory scoped except for `__atomic_thread_fence`. This patch
adds support for that. This should ideally allow us to generically emit
these LLVM scopes.
2024-11-18 16:43:33 -06:00
Kazu Hirata
e8a6624325
[CodeGen] Remove unused includes (NFC) (#116459)
Identified with misc-include-cleaner.
2024-11-16 07:37:13 -08:00
Shilei Tian
4b50ec43d0
[Clang] Avoid Using byval for ndrange_t when emitting __enqueue_kernel_basic (#116435)
AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still
adds the
`byval` attribute by default. Emitting the `byval` attribute by default
in this
context doesn’t seem like a good idea, as argument-passing conventions
are
highly target-dependent, and assumptions here could lead to issues. This
PR
removes the addition of the `byval` attribute, aligning the behavior
with other
`__enqueue_kernel_*` functions.
2024-11-15 16:54:29 -05:00
joaosaffran
bc6c068127
[HLSL] Adding HLSL clip function. (#114588)
Adding HLSL `clip` function.
 - adding llvm intrinsic
 - adding sema checks
 - adding dxil lowering
 - ading spirv lowering
 - adding sema tests
 - adding codegen tests
 - adding lowering tests

Closes #99093

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2024-11-14 23:34:07 -08:00
Tex Riddell
5c2a133b13
Emit constrained atan2 intrinsic for clang builtin (#113636)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constraint atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.

Last part of Implement the atan2 HLSL Function. Fixes #70096.
2024-11-12 13:34:29 -08:00
Malay Sanghi
f77101ea79
[X86][AMX] Support AMX-MOVRS (#115151)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-12 15:05:43 +08:00
Finn Plummer
e520b28397
[DXIL][SPIRV] Lower WaveActiveCountBits intrinsic (#113382)
```
  - add codegen for llvm builtin to spirv/directx intrinsic in CGBuiltin.cpp
  - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp
  - add lowering of directx intrinsic to dxil op in DXIL.td

  - add test cases to illustrate passes
  - add test case for semantic analysis
```
  
Resolves #80176
2024-11-07 19:06:37 -08:00
Adam Yang
36d757f840
[HLSL][SPIRV] Added clamp intrinsic (#113394)
Fixes #88052

- Added the following intrinsics:
  - `int_spv_uclamp`
  - `int_spv_sclamp`
  - `int_spv_fclamp`
- Updated DirectX counterparts to have the same three clamp intrinsics.
- Update the clamp.hlsl unit tests to include SPIRV
- Added the SPIRV specific tests
2024-11-07 17:47:53 -08:00
Bill Wendling
7475156d49
[Clang] Add __builtin_counted_by_ref builtin (#114495)
The __builtin_counted_by_ref builtin is used on a flexible array
pointer and returns a pointer to the "counted_by" attribute's COUNT
argument, which is a field in the same non-anonymous struct as the
flexible array member. This is useful for automatically setting the
count field without needing the programmer's intervention. Otherwise
it's possible to get this anti-pattern:
    
      ptr = alloc(<ty>, ..., COUNT);
      ptr->FAM[9] = 42; /* <<< Sanitizer will complain */
      ptr->count = COUNT;
    
To prevent this anti-pattern, the user can create an allocator that
automatically performs the assignment:
    
      #define alloc(TY, FAM, COUNT) ({ \
          TY __p = alloc(get_size(TY, COUNT));             \
          if (__builtin_counted_by_ref(__p->FAM))          \
              *__builtin_counted_by_ref(__p->FAM) = COUNT; \
          __p;                                             \
      })

The builtin's behavior is heavily dependent upon the "counted_by"
attribute existing. It's main utility is during allocation to avoid
the above anti-pattern. If the flexible array member doesn't have that
attribute, the builtin becomes a no-op. Therefore, if the flexible
array member has a "count" field not referenced by "counted_by", it
must be set explicitly after the allocation as this builtin will
return a "nullptr" and the assignment will most likely be elided.

---------

Co-authored-by: Bill Wendling <isanbard@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
2024-11-07 22:03:55 +00:00
Finn Plummer
bf30b6c33c
[HLSL][SPIRV][DXIL] Implement dot4add_u8packed intrinsic (#115068)
```- create a clang built-in in Builtins.td
- link dot4add_u8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OpUDot is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpUDot if applicable SPIRV version or SPV_KHR_integer_dot_product is enabled
- add dot4add_u8packed intrinsic to IntrinsicsDirectX.td and mapping to DXIL.td op Dot4AddU8Packed

- add tests for HLSL intrinsic lowering to dx/spv intrinsic in dot4add_u8packed.hlsl
- add tests for sema checks in dot4add_u8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_u8packed.ll
- add test to dxil lowering in DirectX/dot4add_u8packed.ll
```

Resolves #99219
2024-11-07 10:19:41 -08:00
Sarah Spall
fb90733e19
[HLSL] implement elementwise firstbithigh hlsl builtin (#111082)
Implements elementwise firstbithigh hlsl builtin.
Implements firstbituhigh intrinsic for spirv and directx, which handles
unsigned integers
Implements firstbitshigh intrinsic for spirv and directx, which handles
signed integers.
Fixes #113486
Closes #99115
2024-11-06 07:31:39 -08:00
Matt Arsenault
0c60573d1c
clang/AMDGPU: Emit grid size builtins with range metadata (#113038)
These cannot be 0.
2024-11-05 12:47:04 -08:00
Finn Plummer
3cdac06708
[HLSL][SPIRV][DXIL] Implement dot4add_i8packed intrinsic (#113623)
- create a clang built-in in Builtins.td
- link dot4add_i8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OPSDot
is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpSDot in applicable SPIRV version
or if SPV_KHR_integer_dot_product is enabled
- add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to
DXIL.td op Dot4AddI8Packed

- add tests for HLSL intrinsic lowering to dx/spv intrinsic in
dot4add_i8packed.hlsl
- add tests for sema checks in dot4add_i8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_i8packed.ll
- add test to dxil lowering in DirectX/dot4add_i8packed.ll
    
 Resolves #99220
2024-11-05 10:29:08 -08:00
Phoebe Wang
c72a751dab
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Craig Topper
cd8d507b07 [RISCV] Pull __builtin_riscv_clz/ctz out of a nested switch. NFC
The nested switch exists to share setting IntrinsicsTypes to {ResultType}.
clz/ctz return before we reach that so they can just be in the top
level switch.
2024-10-31 11:01:58 -07:00
Simon Pilgrim
fcaa8c6e22 Fix MSVC "signed/unsigned mismatch" warning. NFC. 2024-10-31 11:50:19 +00:00
Stanislav Mekhanoshin
ba1a09da8d
[AMDGPU] Allow overload of __builtin_amdgcn_mov_dpp8 (#113610)
The same handling as for __builtin_amdgcn_mov_dpp.
2024-10-31 02:19:20 -07:00
joaosaffran
481bce018e
Adding splitdouble HLSL function (#109331)
- Adding hlsl `splitdouble` intrinsics
- Adding DXIL lowering
- Adding SPIRV lowering
- Adding test

Fixes: #108901

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2024-10-28 13:26:59 -07:00
Simon Pilgrim
d6d4569dd9 Fix MSVC "signed/unsigned mismatch" warnings. NFC. 2024-10-28 11:45:36 +00:00