2168 Commits

Author SHA1 Message Date
Piyou Chen
82f52d9c42
[RISCV] Support new groupid/bitmask for cpu_model (#101632)
The spec can be found at
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74.

1. Add the new extension GroupID/Bitmask with latest hwprobe key.
2. Update `initRISCVFeature`.
3. Update `EmitRISCVCpuSupports`, since extensions are no longer limited to
group 0.
2024-08-08 14:42:41 +08:00
Phoebe Wang
0dba5381d8
[X86][AVX10.2] Support YMM rounding new instructions (#101825)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-04 21:05:45 +08:00
Joshua Batista
ed5b0e1e69
Add length builtins and length HLSL function to DirectX Backend (#101256)
This PR adds the length intrinsic and an HLSL function that uses it.
The SPIRV implementation is left for a future PR.
This PR addresses #99134, though some SPIR-V changes still need to be
made to complete the task. Below is how this PR addresses #99134.
- "Implement `length` clang builtin" was done by defining `HLSLLength`
in Builtins.td
- "Link `length` clang builtin with hlsl_intrinsics.h" was done by using
the alias attribute to make `length` an alias of
`__builtin_hlsl_elementwise_length` in hlsl_intrinsics.h
- "Add sema checks for `length` to `CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp`" was done, but in this case not in SemaChecking.cpp,
rather SemaHLSL.cpp. A case was added to the builtin to check for
semantic failures, and set `TheCall` up to have the right return type.
- "Add codegen for `length` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`"
was done. For scalars, fabs is emitted, otherwise, length is emitted.
- "Add codegen tests to `clang/test/CodeGenHLSL/builtins/length.hlsl`"
was done to test that `length` in HLSL emits the right intrinsic.
- "Add sema tests to `clang/test/SemaHLSL/BuiltIns/length-errors.hlsl`"
was done to test for diagnostics emitted in SemaHLSL.cpp
- "Create the `int_dx_length` intrinsic in `IntrinsicsDirectX.td`" was
done. Specifying return types and parameter types was difficult, but
`idot` was used for reference, and `llvm\include\llvm\IR\Intrinsics.td`
contains all the ways to express return / parameter types.
- "Create an intrinsic expansion of `int_dx_length` in
`llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp`" was done, and was
mostly derived by looking at `TranslateLength` in `HLOperationLower.cpp`
in the DXC codebase.
- "Create the `length.ll` and `length_errors.ll` tests in
`llvm/test/CodeGen/DirectX/`" was done by taking the DXIL output of
`clang/test/CodeGenHLSL/builtins/length.hlsl` and running `opt -S
-dxil-intrinsic-expansion` and `opt -S -dxil-op-lower` on it, checking
for how the length intrinsic was either expanded or lowered.
- "Create the `int_spv_length` intrinsic in `IntrinsicsSPIRV.td`" was
done by copying `IntrinsicsDirectX.td`.
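
The scalar/vector split mentioned in the codegen bullet above can be mirrored on the host. This is a minimal sketch in plain C++, not HLSL and not the DirectX expansion itself; `length_scalar` and `length_vec` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Scalar case: the length of a single value is its absolute value (fabs).
inline float length_scalar(float x) { return std::fabs(x); }

// Vector case: Euclidean length, i.e. the square root of the sum of the
// squared components.
template <std::size_t N>
float length_vec(const std::array<float, N> &v) {
  float sum = 0.0f;
  for (float c : v)
    sum += c * c;
  return std::sqrt(sum);
}
```

For example, `length_vec` applied to the two-component vector (3, 4) yields 5.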

---------

Co-authored-by: Justin Bogner <mail@justinbogner.com>
2024-08-02 21:16:24 -07:00
Farzon Lotfi
96e6255e8b
[HLSL] cleanup builtin names elementwise usage (#101543)
Remove elementwise description for builtins that don't perform
elementwise operations.
2024-08-02 00:10:28 -04:00
Bill Wendling
160fb1121c
[Clang][NFC] Improve generation of GEP and RecordDecl loop (#101434)
As with other loops, we need only look at a RecordDecl's FieldDecls.
Convert to using them. In the meantime, we can improve the generation of
the 'counted_by' FieldDecl's GEP by creating one GEP instead of a series
of GEPs.
2024-08-01 19:46:57 +00:00
Allen
9589c128ae
[clang codegen] Emit int TBAA metadata on more FP math libcalls (#100302)
Follow-up to #96025: besides expf, more FP math libcalls in libm should
also be supported.

Fix https://github.com/llvm/llvm-project/issues/86635
2024-07-31 09:01:20 +08:00
Joseph Huber
dbb8b7a0f4 Reapply "[OpenMP][libc] Remove special handling for OpenMP printf (#98940)"
This reverts commit fea5914c926e2f013a8b5e27eaa74c7047fb2c71.
2024-07-26 17:21:56 -05:00
Joseph Huber
fea5914c92 Revert "[OpenMP][libc] Remove special handling for OpenMP printf (#98940)"
This reverts commit 069e8bcd82c4420239f95c7e6a09e1f756317cfc.

Summary:
Some tests failing, revert this for now.
2024-07-26 16:39:12 -05:00
Joseph Huber
069e8bcd82
[OpenMP][libc] Remove special handling for OpenMP printf (#98940)
Summary:
Currently there are several layers to handle `printf`. Since we now have
varargs and an implementation of `printf` this can be heavily
simplified.

1. The frontend renames `printf` into `omp_vprintf` and gives it an
   argument buffer.

Removing 1. triggered some code in the AMDGPU backend meant for HIP /
OpenCL, so I added an exception to it.

2. Forward this to CUDA vprintf or ignore it.

We no longer need special handling for it since we have varargs. So now
we just forward this to CUDA vprintf if we have libc, otherwise just
leave `printf` as an external function and expect that `libc` will be
linked in.
2024-07-26 16:03:36 -05:00
James Y Knight
0431d6dab4
Clang: convert __m64 intrinsics to unconditionally use SSE2 instead of MMX. (#96540)
The MMX instruction set is legacy, and the SSE2 variants are in every
way superior, when they are available -- and they have been available
since the Pentium 4 was released, 20 years ago.

Therefore, we are switching the "MMX" intrinsics to depend on SSE2,
unconditionally. This change entirely drops the ability to generate
vectorized code using compiler intrinsics for chips with MMX but without
SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released
1997-1999), as well as AMD K6 and K7 series chips of around the same
timeframe. Targeting these older CPUs remains supported -- simply
without the ability to use MMX compiler intrinsics.

Migrating away from the use of MMX registers also fixes a rather
non-obvious requirement. The long-standing programming model for these
MMX intrinsics requires that the programmer be aware of the x87/MMX
mode-switching semantics, and manually call `_mm_empty()` between using
any MMX instruction and any x87 FPU instruction. If you neglect to, then
every future x87 operation will return a NaN result. This requirement is
not at all obvious to users of these intrinsic functions, and
causes bugs that are very difficult to detect.

Worse, even if the user did write code that correctly calls
`_mm_empty()` in the right places, LLVM may sometimes reorder x87 and
MMX operations around each other, unaware of this mode-switching issue.

Eliminating the use of MMX registers eliminates this problem.
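
The `_mm_empty()` requirement described above can be sketched as follows. This is an illustrative example under assumptions, not code from this change: `add4x16` is a hypothetical helper, the intrinsic path is only taken on x86-64 with SSE2, and other targets fall back to scalar code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) && defined(__SSE2__)
#include <mmintrin.h>
#endif

// Adds four 16-bit lanes with wraparound (hypothetical helper).
std::array<std::int16_t, 4> add4x16(const std::array<std::int16_t, 4> &a,
                                    const std::array<std::int16_t, 4> &b) {
  std::array<std::int16_t, 4> out{};
#if defined(__x86_64__) && defined(__SSE2__)
  // _mm_set_pi16 takes its arguments from the highest lane to the lowest.
  __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);
  __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
  long long bits = _mm_cvtm64_si64(_mm_add_pi16(va, vb));
  // Under the legacy MMX programming model this call is required before any
  // later x87 code runs; it is kept here so the code is correct either way.
  _mm_empty();
  std::memcpy(out.data(), &bits, sizeof bits);
#else
  // Portable fallback: add in int, then wrap to 16 bits.
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = static_cast<std::int16_t>(
        static_cast<std::uint16_t>(a[i] + b[i]));
#endif
  return out;
}
```

With compilers that implement these intrinsics on SSE2 registers, the `_mm_empty()` call should be harmless, but keeping it lets the same source work with compilers that still use MMX registers.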

This change also deletes the now-unnecessary MMX `__builtin_ia32_*`
functions from Clang. Only 3 MMX-related builtins remain in use --
`__builtin_ia32_emms`, used by `_mm_empty`, and
`__builtin_ia32_vec_{ext,set}_v4si`, used by `_mm_insert_pi16` and
`_mm_extract_pi16`. Note particularly that the latter two lower to
generic, non-MMX, IR. Support for the LLVM intrinsics underlying these
removed builtins still remains, for the moment.

The file `clang/www/builtins.py` has been updated with mappings from the
newly-removed `__builtin_ia32` functions to the still-supported
equivalents in `mmintrin.h`.

(Originally uploaded at https://reviews.llvm.org/D86855 and
https://reviews.llvm.org/D94252)

Fixes issue #41665
Works towards #98272
2024-07-24 17:00:12 -04:00
Brendan Dahl
0dbd72d6ab
[WebAssembly] Implement f16x8.replace_lane instruction. (#99388)
Use a builtin and intrinsic until half types are better supported for
instruction selection.
2024-07-24 11:55:36 -07:00
Andrii Levytskyi
c92d9b06d4
[SPIRV][HLSL] Add lowering of frac to SPIR-V (#97111)
Implements frac lowering to SPIR-V.

Closes #88059
2024-07-23 14:03:39 -04:00
Philip Reames
d1e28e2a7b
[RISCV] Support __builtin_cpu_init and __builtin_cpu_supports (#99700)
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
https://github.com/llvm/llvm-project/pull/85790.

This is inspired by https://github.com/llvm/llvm-project/pull/85786.
Major changes are a) a restriction in scope to only the builtins (which
have a much narrower user interface), and the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.

I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.

Before this can make it into a release, we need a change such as
https://github.com/llvm/llvm-project/pull/99958. The gcc docs claim that
cpu_supports can be called by "normal" code without calling the cpu_init
routine because the init routine will have been called by a high
priority constructor. Our current compiler-rt mechanism does not do
this.
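
Until the runtime calls the init routine on its own, a conservative calling pattern is to call it explicitly first. The sketch below shows that pattern only; it is an assumption-laden illustration, not the RISC-V usage. `host_has_sse2` is a hypothetical helper, and the `"sse2"` feature string is an x86 example chosen because the RISC-V feature names are not listed in this message.

```cpp
// Returns 1 if the feature is present, 0 if it is absent, and -1 if this
// sketch does not know how to query the current target.
int host_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  // Explicit init call: safe even if a constructor has already run it.
  __builtin_cpu_init();
  return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
  return -1;
#endif
}
```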
2024-07-23 08:48:28 -07:00
Farzon Lotfi
a14baec0f3
[clang] Emit constraint intrinsics for arc and hyperbolic trig clang builtins (#98949)
## Change(s)
- `Builtins.td` - Add f16 support for libm arc and hyperbolic trig
functions
- `CGBuiltin.cpp` - Emit constrained intrinsics for trig clang builtins

## History
This change is part of an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics, which was discussed in
this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.

https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966

## Precursor PR(s)

Note: this PR needs to be merged after:
- #98937
- #98755
2024-07-19 10:19:41 -04:00
Allen
1df2e0c344
[clang codegen] Emit int TBAA metadata on FP math libcall expf (#96025)
Based on the discussion
https://discourse.llvm.org/t/fp-can-we-add-pure-attribute-for-math-library-functions-default/79459,
math libcalls set errno, so it should emit "int" TBAA metadata on FP
libcalls to solve the alias issue.
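
The errno side effect that motivates the "int" TBAA tag can be observed from user code. This is a minimal sketch, assuming a C library that reports math errors through errno (not all do, hence the `math_errhandling` check in the usage below); `ExpfProbe` and `probe_expf_overflow` are hypothetical names.

```cpp
#include <cerrno>
#include <cmath>

struct ExpfProbe {
  float value; // result of the overflowing call
  int err;     // errno as observed right after the call
};

// Calls the float exp overload (expf) with an argument that overflows and
// records what the call did to errno.
ExpfProbe probe_expf_overflow() {
  volatile float big = 1000.0f; // volatile keeps the call from being folded
  errno = 0;
  float r = std::exp(static_cast<float>(big));
  return ExpfProbe{r, errno};
}
```

On a library where `math_errhandling & MATH_ERRNO` is nonzero, `err` is expected to be `ERANGE`. A load of an `int` before the call therefore cannot be assumed to still hold after it, which is the aliasing concern described above.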

Note: this PR only adds support for expf.

Fix https://github.com/llvm/llvm-project/issues/86635
2024-07-19 11:19:21 +08:00
Shilei Tian
892c58cf74
[Clang][AMDGPU] Add builtins for intrinsic llvm.amdgcn.raw.ptr.buffer.load (#99258) 2024-07-18 15:33:03 -04:00
Changpeng Fang
280d90d0fd
AMDGPU: Add back half and bfloat support for global_load_tr16 pats (#99540)
half and bfloat are common types for 16-bit elements. Support for them
was originally present but was dropped for some reason. This work adds
support for these float types back.
2024-07-18 11:23:35 -07:00
James Y Knight
f0eb5587ce
Remove support for 3DNow!, both intrinsics and builtins. (#96246)
This set of instructions was only supported by AMD chips starting in
the K6-2 (introduced 1998), and before the "Bulldozer" family
(2011). They were never much used, as they were effectively superseded
by the more-widely-implemented SSE (first implemented on the AMD side
in Athlon XP in 2001).

This is being done as a predecessor towards general removal of MMX
register usage. Since there is almost no usage of the 3DNow!
intrinsics, and no modern hardware even implements them, simple
removal seems like the best option.

(Clang half originally uploaded in https://reviews.llvm.org/D94213)

Works towards issue #41665 and issue #98272.
2024-07-16 12:08:48 -04:00
Mike Rice
945440033f
[NFC][clang] Replace unchecked dyn_cast with cast (#98948)
BI__builtin_hlsl_elementwise_rcp is only invoked with a FixedVectorType
so use cast to make this clear and satisfy the static verifier.
2024-07-16 08:05:43 -07:00
Zahira Ammarguellat
0bfdc4d492
Add __builtin_fmaf16. (#97424) 2024-07-15 08:29:16 -04:00
Amy Huang
ae7ab043f2
Add __hlt intrinsic for Windows ARM. (#96578)
Add __hlt, which is an MSVC ARM64 intrinsic.

This intrinsic is just the HLT instruction. MSVC's version seems to
return something undefined; in this patch it will just return zero.

MSVC intrinsics are defined here:
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics.
I used unsigned int as the return type, because that is what the MSVC
intrin.h header uses, even though it conflicts with the documentation.
2024-07-08 12:59:02 -07:00
Alex Voicu
d4216b5d0b
[clang][CodeGen][AMDGPU] Enable AMDGPU printf for spirv64-amd-amdhsa (#97132)
This enables the AMDGPU specific implementation of `printf` when
compiling for AMDGCN flavoured SPIR-V, the consequence being that the
expansion into ROCDL calls & friends happens before "lowering" to
SPIR-V and gets carried through. The only relatively "novel" aspect is
that the `callAppendStringN` is simplified to take the type of the
passed in arguments, as opposed to querying them from the module. This
is a neutral change since the arguments were passed directly to the
call, without any attempt to cast them, hence the assumption that the
actual types match the formal ones was already baked in.
2024-07-05 14:08:07 +01:00
Chen Zheng
6a992bc89f [PowerPC] refactor CPU info in PPCTargetParser.def, NFC
CPU features will be done in follow up patches.
2024-07-03 00:20:14 -04:00
smanna12
05d8ea77c9
[Clang] Prevent null pointer dereferences in SVE tuple functions (#94267)
This patch addresses a null pointer dereference issue reported by a static
analyzer tool in the `EmitSVETupleSetOrGet()` and `EmitSVETupleCreate()`
functions. Previously, the functions assumed that the result of
`dyn_cast<>` to `ScalableVectorType` would always be non-null, which is
not guaranteed.

The fix introduces a null check after the `dyn_cast<>` operation. If the
cast fails and `SingleVecTy` is null, the function now returns `nullptr`
to indicate an error. This prevents the dereference of a null pointer,
which could lead to undefined behavior.

Additionally, the assert message has been corrected to accurately reflect
the expected conditions.

These changes collectively enhance the robustness of the code by
ensuring type safety and preventing runtime errors due to improper type
casting.
2024-07-01 10:51:28 -05:00
Matt Arsenault
8f63d154ec
clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (#96738) 2024-06-27 15:32:08 +02:00
Vikram Hegde
35f7b60aa6
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217, with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
2024-06-26 09:24:09 +05:30
Akira Hatanaka
2604830aac
Add support for __builtin_verbose_trap (#79230)
The builtin causes the program to stop its execution abnormally and
shows a human-readable description of the reason for the termination
when a debugger is attached or in a symbolicated crash log.

The motivation for the builtin is explained in the following RFC:

https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845

clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging
information that represents an artificial inline frame whose name
encodes the category and reason strings passed to the builtin.
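
A typical use is a check macro that traps with a category and a reason. This is a minimal sketch with hypothetical names (`TRAP_WITH_REASON`, `checked_div`); it assumes the builtin takes two string literals, category then reason, and falls back to `__builtin_trap()` on compilers that do not provide it.

```cpp
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_verbose_trap)
// Category and reason must be string literals.
#define TRAP_WITH_REASON(category, reason) \
  __builtin_verbose_trap(category, reason)
#else
// Fallback: a plain trap with no description attached.
#define TRAP_WITH_REASON(category, reason) __builtin_trap()
#endif

// Divides two integers, stopping the program on a zero divisor.
inline int checked_div(int num, int den) {
  if (den == 0)
    TRAP_WITH_REASON("Arithmetic", "division by zero");
  return num / den;
}
```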
2024-06-25 08:33:05 -07:00
Shilei Tian
c9f083a994
[Clang][AMDGPU] Add builtins for intrinsic llvm.amdgcn.raw.ptr.buffer.store (#94576)
Depends on https://github.com/llvm/llvm-project/pull/96313.
2024-06-25 09:55:37 -04:00
Vikram Hegde
5feb32ba92
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with the end goal of
adapting the atomic optimizer pass to support i64 and f64 operations
(along with removing all unnecessary bitcasts). This legalizes 64-bit
readlane, writelane and readfirstlane ops pre-ISel.

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-25 14:35:19 +05:30
Matt Arsenault
70c8b9c24a
AMDGPU: Remove ds atomic fadd intrinsics (#95396)
These have been replaced with atomicrmw fadd
2024-06-23 10:30:20 +02:00
Farzon Lotfi
f73ac218a6
[HLSL][clang] Add elementwise builtins for trig intrinsics (#95999)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This is part 3 of 4 PRs. It sets the ground work for using the
intrinsics in HLSL.

Add HLSL frontend apis for `acos`, `asin`, `atan`, `cosh`, `sinh`, and
`tanh`
https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966
2024-06-22 17:17:34 -07:00
Shilei Tian
e52016a236
[Clang] Replace emitXXXBuiltin with a unified interface (#96313) 2024-06-21 16:35:53 -04:00
Ahmed Bougacha
e23250ecb7
[clang] Implement function pointer signing and authenticated function calls (#93906)
The functions are currently always signed/authenticated with zero
discriminator.

Co-Authored-By: John McCall <rjmccall@apple.com>
2024-06-21 10:20:15 -07:00
Ahmed Bougacha
7c814c13d0
[clang] Define ptrauth_sign_constant builtin. (#93904)
This is a constant-expression equivalent to
ptrauth_sign_unauthenticated.  Its constant nature lets us guarantee
a non-attackable sequence is generated, unlike
ptrauth_sign_unauthenticated which we generally discourage using.

It being a constant also allows its usage in global initializers, though
requiring constant pointers and discriminators.

The value must be a constant expression of pointer type which evaluates
to a non-null pointer.

The key must be a constant expression of type ptrauth_key.
The extra data must be a constant expression of pointer or integer type;
if an integer, it will be coerced to ptrauth_extra_data_t.
The result will have the same type as the original value.

This can be used in constant expressions.

Co-authored-by: John McCall <rjmccall@apple.com>
2024-06-20 12:09:54 -07:00
Shilei Tian
e3eb12cce9
[Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (#95276)
Depends on https://github.com/llvm/llvm-project/pull/94830.
2024-06-20 11:01:54 -04:00
Tomas Matheson
fa6d38d61a
[AArch64][TargetParser] Split FMV and extensions (#92882)
FMV extensions are really just mappings from FMV feature names to lists
of backend features for codegen. Split them out into their own separate
file.
2024-06-20 15:33:21 +01:00
Andreas Jonson
01ba3fa37b
[Clang] Swap range and noundef metadata to attribute for intrinsics. (#94851) 2024-06-19 17:23:53 +02:00
Matt Arsenault
76894c5e6e
clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (#95395)
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.

This also upgrades the behavior to respect a volatile qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
2024-06-18 20:51:14 +02:00
Helena Kotas
35a2b60973
[SPIRV][HLSL] Add lowering of rsqrt to SPIRV (#95849)
Add lowering of `rsqrt` to SPIRV.

Fixes #88949
2024-06-18 10:35:38 -07:00
Brendan Dahl
3ab6d12625
[WebAssembly] Implement f16x8 madd and nmadd instructions. (#95151)
Implemented with intrinsics and builtins.

Specified at:

https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
2024-06-11 16:10:00 -07:00
Farzon Lotfi
189d471191
[clang] Reland Add tanf16 builtin and support for tan constrained intrinsic (#94559)
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. With `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63),
there is now an llvm tan intrinsic 32\64\128 Expand case for all llvm
backends.

In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
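
From the user's side, the constrained forms only matter when strict floating-point semantics are requested. This is a minimal sketch, assuming clang honours `#pragma STDC FENV_ACCESS ON` on the target; `tan_strict` is a hypothetical name, and whether the call ends up as the constrained intrinsic or as a plain libcall may depend on the target and on flags such as errno handling.

```cpp
#include <cmath>

#if defined(__clang__)
// Request strict floating-point semantics for the code that follows, so the
// compiler may not assume the default rounding mode or ignore FP exceptions.
#pragma STDC FENV_ACCESS ON
#endif

// Computes tan(x) under the semantics selected above.
double tan_strict(double x) { return std::tan(x); }
```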

- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` - map tan and constrained_tan to an ISDOpcode.

resolves #91421

---------

Co-authored-by: Farzon Lotfi <farzon@farzon.com>
2024-06-10 20:46:26 -04:00
Alex Voicu
88e2bb4092
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
2024-06-07 11:50:23 +01:00
Nikita Popov
cd9a02e2c7 [CodeGen] Remove useless zero-index constant GEPs (NFCI)
Remove zero-index constant expression GEPs, which are not needed
with opaque pointers and will get folded away.
2024-05-30 10:24:57 +02:00
Farzon Lotfi
7348bb23ab
Revert "[clang] Add tanf16 builtin and support for tan constrained intrinsic (#93314)" (#93721)
This reverts commit b15a0a37404f36bcd9c7995de8cd16f9cb5ac8af.

This should undo PR https://github.com/llvm/llvm-project/pull/93314.
We will need to re-open https://github.com/llvm/llvm-project/issues/91421
and wait for https://github.com/llvm/llvm-project/pull/90503 to land.
2024-05-29 15:32:38 -04:00
Farzon Lotfi
b15a0a3740
[clang] Add tanf16 builtin and support for tan constrained intrinsic (#93314)
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.

- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` - map tan and constrained_tan to an ISDOpcode.
- `ISDOpcodes.h` - define tan and strict tan opcodes

resolves #91421
2024-05-29 11:16:18 -04:00
Nikita Popov
975477e7f7 [CGBuiltin] Explicitly use inbounds GEP (NFCI)
All of these are inbounds as they access known offsets in fixed
globals. NFCI because constant expression construction currently
already infers this, this patch just makes it explicit.
2024-05-29 16:39:21 +02:00
Brendan Dahl
60bce6eab4
[WebAssembly] Implement all f16x8 binary instructions. (#93360)
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.

add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins
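
The difference between the two min/max flavours shows up with NaN inputs and signed zeros. The scalar sketch below is an illustration under assumptions: it uses `float` instead of f16, the function names are hypothetical, and the semantics follow my understanding of the WebAssembly SIMD definitions (NaN-propagating min versus the "pseudo" forms `b < a ? b : a` and `a < b ? b : a`), which this message does not spell out.

```cpp
#include <cmath>
#include <limits>

// NaN-propagating minimum that also orders -0.0 before +0.0.
float min_nan_propagating(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == b) // only differs for +0.0 vs -0.0
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// Pseudo-minimum: a single comparison, so a NaN in either operand makes the
// comparison false and the first operand is returned.
float pmin(float a, float b) { return b < a ? b : a; }

// Pseudo-maximum, defined the same way with the comparison reversed.
float pmax(float a, float b) { return a < b ? b : a; }
```

With these definitions, `pmin(1.0f, NaN)` returns `1.0f` while `min_nan_propagating(1.0f, NaN)` returns NaN.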

Specified at:

29a9b9462c/proposals/half-precision/Overview.md
2024-05-28 16:33:20 -07:00
Pierre van Houtryve
c1ac6d2dd4
[AMDGPU] Add amdgpu-as MMRA for fences (#78572)
Using MMRAs, allow `builtin_amdgcn_fence` to emit fences that only
target one or more address spaces, instead of fencing all address spaces
at once.

This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL
fences, but can very easily support more AS names and codegen on more
than just fences.
2024-05-27 12:17:04 +02:00
Brendan Dahl
4ebe9bba59
[WebAssembly] Implement prototype f16x8.extract_lane instruction. (#93272)
Specified at:

29a9b9462c/proposals/half-precision/Overview.md

Note: the current spec has f16x8.extract_lane as opcode 0x124, but this
is incorrect and will be changed to 0x121 soon.
2024-05-24 08:31:07 -07:00
Brendan Dahl
09c5525610
[WebAssembly] Implement prototype f16x8.splat instruction. (#93228)
Adds a builtin and intrinsic for the f16x8.splat instruction.

Specified at:

29a9b9462c/proposals/half-precision/Overview.md

Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
2024-05-23 20:05:22 -07:00