The spec can be found at
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74.
1. Add the new extension GroupID/Bitmask with the latest hwprobe key.
2. Update `initRISCVFeature`.
3. Update `EmitRISCVCpuSupports`, since the feature bits are no longer
limited to group 0 (see the sketch below).
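For illustration, a rough sketch (all names here are hypothetical, not the
actual runtime interface from the spec) of what a feature query looks like
once the bits span more than one group:

```cpp
// Hypothetical sketch: each group carries its own 64-bit feature mask,
// so a query must select the group before testing the bitmask.
struct FeatureBits {
  unsigned long long Groups[2]; // group 0 plus the newly added group
};

static bool hasFeature(const FeatureBits &FB, unsigned GroupID,
                       unsigned long long Bitmask) {
  return (FB.Groups[GroupID] & Bitmask) == Bitmask;
}
```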
This PR adds the length intrinsic and an HLSL function that uses it.
The SPIRV implementation is left for a future PR.
This PR addresses #99134, though some SPIR-V changes still need to be
made to complete the task. Below is how this PR addresses #99134.
- "Implement `length` clang builtin" was done by defining `HLSLL ength`
in Builtins.td
- "Link `length` clang builtin with hlsl_intrinsics.h" was done by using
the alias attribute to make `length` an alias of
`__builtin_hlsl_elementwise_length` in hlsl_intrinsics.h
- "Add sema checks for `length` to `CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp` " was done, but in this case not in SemaChecking.cpp,
rather SemaHLSL.cpp. A case was added to the builtin to check for
semantic failures, and set `TheCall` up to have the right return type.
- "Add codegen for `length` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`"
was done. For scalars, fabs is emitted, otherwise, length is emitted.
- "Add codegen tests to `clang/test/CodeGenHLSL/builtins/length.hlsl`
was done to test that `length` in HLSL emits the right intrinsic.
- "Add sema tests to `clang/test/SemaHLSL/BuiltIns/length-errors.hlsl`"
was done to test for the diagnostics emitted in SemaHLSL.cpp.
- "Create the `int_dx_length` intrinsic in `IntrinsicsDirectX.td`" was
done. Specifying return types and parameter types was difficult, but
`idot` was used for reference, and `llvm/include/llvm/IR/Intrinsics.td`
contains all the ways to express return / parameter types.
- "Create an intrinsic expansion of `int_dx_length` in
`llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp`" was done, and was
mostly derived by looking at `TranslateLength` in `HLOperationLower.cpp`
in the DXC codebase.
- "Create the `length.ll` and `length_errors.ll` tests in
`llvm/test/CodeGen/DirectX/`" was done by taking the DXIL output of
`clang/test/CodeGenHLSL/builtins/length.hlsl` and running `opt -S
-dxil-intrinsic-expansion` and `opt -S -dxil-op-lower` on it, checking
for how the length intrinsic was either expanded or lowered.
- "Create the `int_spv_length` intrinsic in `IntrinsicsSPIRV.td`" was
done by copying `IntrinsicsDirectX.td`.
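As a rough illustration of the codegen split mentioned in the
`EmitHLSLBuiltinExpr` bullet above (not the actual CGBuiltin.cpp code), the
emitted operation behaves like:

```cpp
// Illustrative only: scalars reduce to fabs, vectors to sqrt(dot(v, v)).
#include <cmath>
#include <cstddef>

float lengthScalar(float X) { return std::fabs(X); }

float lengthVector(const float *V, size_t N) {
  float Dot = 0.0f;
  for (size_t I = 0; I != N; ++I)
    Dot += V[I] * V[I]; // dot(v, v)
  return std::sqrt(Dot);
}
```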
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
As with other loops, we need only look at a RecordDecl's FieldDecls.
Convert to using them. While we're at it, we can improve the generation
of the 'counted_by' FieldDecl's GEP by creating one GEP instead of a
series of GEPs, as sketched below.
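A minimal sketch of the single-GEP idea (the helper name and surrounding
context are hypothetical, not the actual CodeGen code):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Collect one index per nested FieldDecl and emit a single GEP,
// instead of emitting one struct GEP per nesting level.
static Value *emitCountedByGEP(IRBuilder<> &B, Type *OuterTy, Value *Base,
                               ArrayRef<unsigned> FieldIndices) {
  SmallVector<Value *, 4> Idxs;
  Idxs.push_back(B.getInt32(0)); // step through the base pointer
  for (unsigned Idx : FieldIndices)
    Idxs.push_back(B.getInt32(Idx));
  return B.CreateInBoundsGEP(OuterTy, Base, Idxs, "counted_by.gep");
}
```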
Summary:
Currently there are several layers to handle `printf`. Since we now have
varargs and an implementation of `printf`, this can be heavily
simplified.
1. The frontend renames `printf` into `omp_vprintf` and gives it an
argument buffer.
2. This is forwarded to CUDA vprintf or ignored.
We no longer need this special handling since we have varargs: we now
forward `printf` to CUDA vprintf if we have libc, and otherwise leave
`printf` as an external function and expect that `libc` will be linked
in. Removing step 1 triggered some code in the AMDGPU backend meant for
HIP / OpenCL, so I added an exception to it.
The MMX instruction set is legacy, and the SSE2 variants are in every
way superior, when they are available -- and they have been available
since the Pentium 4 was released, 20 years ago.
Therefore, we are switching the "MMX" intrinsics to depend on SSE2,
unconditionally. This change entirely drops the ability to generate
vectorized code using compiler intrinsics for chips with MMX but without
SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released
1997-1999), as well as AMD K6 and K7 series chips of around the same
timeframe. Targeting these older CPUs remains supported -- simply
without the ability to use MMX compiler intrinsics.
Migrating away from the use of MMX registers also fixes a rather
non-obvious requirement. The long-standing programming model for these
MMX intrinsics requires that the programmer be aware of the x87/MMX
mode-switching semantics, and manually call `_mm_empty()` between using
any MMX instruction and any x87 FPU instruction. If you neglect to, then
every future x87 operation will return a NaN result. This requirement is
not at all obvious to users of these intrinsic functions, and causes
very difficult-to-detect bugs.
Worse, even if the user did write code that correctly calls
`_mm_empty()` in the right places, LLVM may sometimes reorder x87 and
MMX operations around each other, unaware of this mode-switching issue.
Eliminating the use of MMX registers eliminates this problem.
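To make the old requirement concrete, here is a sketch of code that was
only correct under the legacy model when compiled for 32-bit x86 (where
`double` math uses the x87 FPU):

```cpp
#include <mmintrin.h>

double scaled(double X) {
  __m64 A = _mm_set1_pi16(1);
  __m64 B = _mm_add_pi16(A, A); // MMX op: switches the FPU into MMX mode
  (void)B;
  _mm_empty();                  // mandatory before any subsequent x87 math
  return X * 2.0;               // x87 op: would see NaN without _mm_empty()
}
```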
This change also deletes the now-unnecessary MMX `__builtin_ia32_*`
functions from Clang. Only 3 MMX-related builtins remain in use --
`__builtin_ia32_emms`, used by `_mm_empty`, and
`__builtin_ia32_vec_{ext,set}_v4si`, used by `_mm_insert_pi16` and
`_mm_extract_pi16`. Note particularly that the latter two lower to
generic, non-MMX, IR. Support for the LLVM intrinsics underlying these
removed builtins still remains, for the moment.
The file `clang/www/builtins.py` has been updated with mappings from the
newly-removed `__builtin_ia32` functions to the still-supported
equivalents in `mmintrin.h`.
(Originally uploaded at https://reviews.llvm.org/D86855 and
https://reviews.llvm.org/D94252)
Fixes issue #41665
Works towards #98272
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
https://github.com/llvm/llvm-project/pull/85790.
This is inspired by https://github.com/llvm/llvm-project/pull/85786.
Major changes are (a) a restriction in scope to only the builtins (which
have a much narrower user interface) and (b) the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.
I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.
Before this can make it into a release, we need a change such as
https://github.com/llvm/llvm-project/pull/99958. The gcc docs claim that
`__builtin_cpu_supports` can be called by "normal" code without calling
the `__builtin_cpu_init` routine, because the init routine will have been
called by a high-priority constructor. Our current compiler-rt mechanism
does not do this.
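A hedged usage sketch (the extension name string is illustrative):

```cpp
// Until the constructor mechanism mentioned above is in place, call
// __builtin_cpu_init() explicitly before querying features.
bool haveZba() {
  __builtin_cpu_init();
  return __builtin_cpu_supports("zba");
}
```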
half and bfloat are common types for 16-bit elements. Support for them
was originally there but was dropped for various reasons. This work adds
support for these float types back.
This set of instructions was only supported by AMD chips, starting with
the K6-2 (introduced 1998) and ending before the "Bulldozer" family
(2011). They were never much used, as they were effectively superseded
by the more-widely-implemented SSE (first implemented on the AMD side
in Athlon XP in 2001).
This is being done as a precursor to the general removal of MMX
register usage. Since there is almost no usage of the 3DNow!
intrinsics, and no modern hardware even implements them, simple
removal seems like the best option.
(Clang half originally uploaded in https://reviews.llvm.org/D94213)
Works towards issue #41665 and issue #98272.
Add __hlt, which is an MSVC ARM64 intrinsic.
This intrinsic is just the HLT instruction. MSVC's version seems to
return something undefined; in this patch it will just return zero.
MSVC intrinsics are defined at
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics.
I used unsigned int as the return type, because that is what the MSVC
intrin.h header uses, even though it conflicts with the documentation.
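A hedged usage sketch (the immediate-operand form is an assumption based on
the HLT instruction's encoding; exact argument handling may differ):

```cpp
#include <intrin.h>

unsigned int stop() {
  return __hlt(0); // emits HLT #0; returns zero per this patch
}
```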
This enables the AMDGPU-specific implementation of `printf` when
compiling for AMDGCN-flavoured SPIR-V, the consequence being that the
expansion into ROCDL calls & friends happens before "lowering" to
SPIR-V and gets carried through. The only relatively "novel" aspect is
that `callAppendStringN` is simplified to take the types of the
passed-in arguments, as opposed to querying them from the module. This
is a neutral change, since the arguments were passed directly to the
call without any attempt to cast them, hence the assumption that the
actual types match the formal ones was already baked in.
This patch addresses a null pointer dereference issue, reported by a
static analyzer, in the `EmitSVETupleSetOrGet()` and
`EmitSVETupleCreate()` functions. Previously, the functions assumed that
the result of `dyn_cast<>` to `ScalableVectorType` would always be
non-null, which is not guaranteed.
The fix introduces a null check after the `dyn_cast<>` operation. If the
cast fails and `SingleVecTy` is null, the function now returns `nullptr`
to indicate an error. This prevents the dereference of a null pointer,
which could lead to undefined behavior. Additionally, the assert message
has been corrected to accurately reflect the expected conditions.
These changes collectively enhance the robustness of the code by
ensuring type safety and preventing runtime errors due to improper type
casting.
These are incremental changes over #89217, with the core logic being the
same. This patch, along with #89217 and #91190, should get us ready to
enable 64-bit optimizations in the atomic optimizer.
The builtin causes the program to stop its execution abnormally and
shows a human-readable description of the reason for the termination
when a debugger is attached or in a symbolicated crash log.
The motivation for the builtin is explained in the following RFC:
https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845
Clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging
information that represents an artificial inline frame whose name
encodes the category and reason strings passed to the builtin.
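A hedged usage sketch; the two string literals become the category and
reason encoded into the artificial inline frame's name:

```cpp
int checkedDeref(int *P) {
  if (!P)
    __builtin_verbose_trap("check failure", "null pointer dereference");
  return *P;
}
```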
This patch is intended to be the first of a series whose end goal is to
adapt the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64-bit readlane,
writelane, and readfirstlane ops pre-ISel.
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
This is a constant-expression equivalent to
ptrauth_sign_unauthenticated. Its constant nature lets us guarantee
a non-attackable sequence is generated, unlike
ptrauth_sign_unauthenticated, which we generally discourage using.
Being a constant also allows its use in global initializers, though it
requires constant pointers and discriminators.
The value must be a constant expression of pointer type which evaluates
to a non-null pointer.
The key must be a constant expression of type ptrauth_key.
The extra data must be a constant expression of pointer or integer type;
if an integer, it will be coerced to ptrauth_extra_data_t.
The result will have the same type as the original value.
This can be used in constant expressions.
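A hedged sketch of a signed global initializer (the key choice and the
zero discriminator are illustrative):

```cpp
#include <ptrauth.h>

static void handler(void) {}

// Signed entirely at compile time, so it is usable as a constant
// initializer, unlike ptrauth_sign_unauthenticated.
static void (*const SignedHandler)(void) =
    ptrauth_sign_constant(&handler, ptrauth_key_function_pointer, 0);
```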
Co-authored-by: John McCall <rjmccall@apple.com>
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.
This also upgrades the behavior to respect a volatile-qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. With `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63),
there is now an LLVM tan intrinsic 32/64/128 Expand case for all LLVM
backends.
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` - map `tan` and `constrained_tan` to an ISD
opcode.
resolves #91421
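A hedged sketch of source that should exercise the constrained path (for
example when built with `-ffp-model=strict`; exact flags may vary):

```cpp
#include <cmath>

// Expected to lower to llvm.experimental.constrained.tan rather than
// the plain tan intrinsic under a strict FP model.
double strictTan(double X) {
  return std::tan(X);
}
```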
---------
Co-authored-by: Farzon Lotfi <farzon@farzon.com>
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, in which case the extant SPIRV datalayout
string would lead to pointers to functions pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` - map `tan` and `constrained_tan` to an ISD
opcode.
- `ISDOpcodes.h` - define tan and strict tan opcodes
resolves #91421
All of these are inbounds, as they access known offsets in fixed
globals. NFCI, because constant expression construction currently
already infers this; this patch just makes it explicit.
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.
add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
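A hedged usage sketch (the builtin name is assumed to follow the existing
f32x4 naming pattern):

```cpp
typedef _Float16 f16x8 __attribute__((__vector_size__(16)));

// Lane-wise minimum of two f16x8 values via the min builtin.
f16x8 laneMin(f16x8 A, f16x8 B) {
  return __builtin_wasm_min_f16x8(A, B);
}
```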
Using MMRAs, allow `__builtin_amdgcn_fence` to emit fences that only
target one or more address spaces, instead of fencing all address spaces
at once.
This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL
fences, but can very easily support more AS names and codegen on more
than just fences.
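A hedged usage sketch (the exact argument spelling is an assumption): the
trailing strings name the address spaces the fence applies to:

```cpp
// Fence only the "local" (LDS) address space at workgroup scope.
void releaseLocal() {
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup", "local");
}
```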
Adds a builtin and intrinsic for the f16x8.splat instruction.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
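A hedged usage sketch (builtin name assumed): broadcast one half-precision
value to all eight lanes:

```cpp
typedef _Float16 f16x8 __attribute__((__vector_size__(16)));

f16x8 splat(_Float16 X) {
  return __builtin_wasm_splat_f16x8(X);
}
```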