llvm-project

Author	SHA1	Message	Date
Qiu Chaofan	a4558a4a53	[PowerPC] Implement 32-bit expansion for rldimi (#86783 ) rldimi is a 64-bit instruction; for backward compatibility, it needs to be expanded into a series of rotate and mask operations in 32-bit environments. In the future, we may improve the bit permutation selector and remove such direct codegen.	2024-04-09 16:43:49 +08:00
Farzon Lotfi	1cb64d75b2	[HLSL][DXIL][SPIRV] Implementation of an abstraction for intrinsic selection of HLSL backends (#87171 ) Start of #83882 - `Builtins.td` - add the `hlsl` `all` elementwise builtin. - `CGBuiltin.cpp` - Show a use case for CGHLSLUtils via an `all` intrinsic codegen. - `CGHLSLRuntime.cpp` - move `thread_id` to use CGHLSLUtils. - `CGHLSLRuntime.h` - Create a macro to help pick the right intrinsic for the backend. - `hlsl_intrinsics.h` - Add the `all` api. - `SemaChecking.cpp` - Add `all` builtin type checking - `IntrinsicsDirectX.td` - Add the `all` `dx` intrinsic - `IntrinsicsSPIRV.td` - Add the `all` `spv` intrinsic Work still needed: - `SPIRVInstructionSelector.cpp` - Add an implementation of `OpAll` for `spv_all` intrinsic	2024-04-04 21:41:55 -04:00
David Green	42c7bc04c3	[AArch64][ARM] Make neon fp16 generic intrinsics always available. (#87467 ) By generic intrinsics this means things like dup, ext, zip and bsl that can always be executed with integer s16 operations and do not require fullfp16. This makes them always available, and brings them in line with GCC. https://godbolt.org/z/azs8eMv54 The relevant test cases have been moved into their own files, to allow them to be tested with armv8-a and armv8.2-a+fp16.	2024-04-03 19:10:14 +01:00
Sven van Haastregt	e47a81c1d2	[OpenCL] Fix BIenqueue_kernel fallthrough (#83238 ) Handling of the `BIenqueue_kernel` builtin must not fallthrough to the `BIget_kernel_work_group_size` builtin, as these builtins have no common functionality.	2024-04-02 09:31:38 +02:00
Marc Auberer	3c8ede9f45	[HLSL][clang] Move hlsl_wave_get_lane_index to EmitHLSLBuiltinExpr (#87131 ) Resolves #87109	2024-03-30 21:33:56 +01:00
Nathan Gauër	0f61051f54	[clang][HLSL][SPRI-V] Add convergence intrinsics (#80680 ) HLSL has wave operations and other kinds of functions which require the control flow to either be converged, or to respect certain constraints as to where and how to re-converge. At the HLSL level, the convergence is mostly obvious: the control flow is expected to re-converge at the end of a scope. Once translated to IR, HLSL scopes disappear. This means we need a way to communicate convergence restrictions down to the backend. For this, the SPIR-V backend uses convergence intrinsics. So this commit adds some code to generate convergence intrinsics when required. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2024-03-28 17:18:05 +01:00
Akira Hatanaka	84780af4b0	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was reverted because it broke ubsan bots. There seems to be a bug in coroutine code-gen, which is causing EmitTypeCheck to use the wrong alignment. For now, pass alignment zero to EmitTypeCheck so that it can compute the correct alignment based on the passed type (see function EmitCXXMemberOrOperatorMemberCallExpr).	2024-03-28 06:54:36 -07:00
Akira Hatanaka	f75eebab88	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 )" (#86898 ) This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c. The commit broke ubsan bots.	2024-03-27 18:14:04 -07:00
Akira Hatanaka	d9a685a9dd	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit broke msan bots because LValue::IsKnownNonNull was uninitialized.	2024-03-27 12:24:49 -07:00
Alex Voicu	ab7dba233a	[CodeGen][LLVM] Make the `va_list` related intrinsics generic. (#85460 ) Currently, the builtins used for implementing `va_list` handling unconditionally take their arguments as unqualified `ptr`s i.e. pointers to AS 0. This does not work for targets where the default AS is not 0 or AS 0 is not a viable AS (for example, a target might choose 0 to represent the constant address space). This patch changes the builtins' signature to take generic `anyptr` args, which corrects this issue. It is noisy due to the number of tests affected. A test for an upstream target which does not use 0 as its default AS (SPIRV for HIP device compilations) is added as well.	2024-03-27 11:41:34 +00:00
Changpeng Fang	d023995ae2	AMDGPU: Simplify EmitAMDGPUBuiltinExpr for load transposes, NFC (#86707 ) We should not manually get the types of the loading data. Instead, we can get the types from the intrinsics directly.	2024-03-26 17:51:03 -07:00
Akira Hatanaka	b311756450	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 )" (#86674 ) This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6. It appears that the commit broke msan bots.	2024-03-26 07:37:57 -07:00
Akira Hatanaka	8bd1f9116a	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects.	2024-03-25 18:05:42 -07:00
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313 ) Rename the intrinsics to be close to the instruction mnemonic names: Use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes f16/bf16 versions of builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types in implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Farzon Lotfi	060df78cdb	[DXIL] Add Float `Dot` Intrinsic Lowering (#86071 ) Completes #83626 - `CGBuiltin.cpp` - modify `getDotProductIntrinsic` to be able to emit `dot2`, `dot3`, and `dot4` intrinsics based on element count - `IntrinsicsDirectX.td` - for floating point add `dot2`, `dot3`, and `dot4` intrinsics - `DXIL.td` - add dxilop intrinsic lowering for `dot2`, `dot3`, & `dot4`. - `DXILOpLowering.cpp` - add vector arg flattening for dot product. - `DXILOpBuilder.h` - modify `createDXILOpCall` to take a smallVector instead of an iterator - `DXILOpBuilder.cpp` - modify `createDXILOpCall` by moving the small vector up to the calling function in `DXILOpLowering.cpp`. - Moving one function up gives us access to the `CallInst` and `Function` which were needed to distinguish the dot product intrinsics and get the operands without using the iterator.	2024-03-25 18:01:46 -04:00
Changpeng Fang	3054d0dae7	AMDGPU: Rename and add bf16 support for global_load_tr builtins (#86202 ) Make the name of a clang builtin as close to the mnemonic instruction name as possible. The data type suffix may not be enough to tell what instruction the builtin is going to produce. This patch also adds bf16 support for global_load_tr_b128 builtins.	2024-03-22 08:51:53 -07:00
OverMighty	c1c2551a28	[clang] Implement __builtin_{clzg,ctzg} (#83431 ) Fixes #83075, fixes #83076.	2024-03-21 09:33:16 -07:00
Yeoul Na	3eb9ff3095	Turn 'counted_by' into a type attribute and parse it into 'CountAttributedType' (#78000 ) In `-fbounds-safety`, bounds annotations are considered type attributes rather than declaration attributes. Constructing them as type attributes allows us to extend the attribute to apply to nested pointers, which is essential to annotate functions that involve out parameters: `void foo(int *__counted_by(*out_count) *out_buf, int *out_count)`. We introduce a new sugar type to support bounds annotated types, `CountAttributedType`. In order to maintain extra data (the bounds expression and the dependent declaration information) that is not trackable in `AttributedType` we create a new type dedicated to this functionality. This patch also extends the parsing logic to parse the `counted_by` argument as an expression, which will allow us to extend the model to support arguments beyond an identifier, e.g., `__counted_by(n + m)` in the future as specified by `-fbounds-safety`. This also adjusts `__bdos` and array-bounds sanitizer code that already uses `CountedByAttr` to check `CountAttributedType` instead to get the field referred to by the attribute.	2024-03-20 13:36:56 +09:00
Farzon Lotfi	081a66ffac	[DXIL] implement dot intrinsic lowering for integers (#85662 ) this implements part 1 of 2 for #83626 - `CGBuiltin.cpp` - modified to have separate cases for signed and unsigned integers. - `SemaChecking.cpp` - modified to prevent the generation of a double dot product intrinsic if the builtin were to be called directly. - `IntrinsicsDirectX.td` - creation of the signed and unsigned dot intrinsics needed for instruction expansion. - `DXILIntrinsicExpansion.cpp` - handle instruction expansion cases for integer dot product.	2024-03-19 12:03:43 -04:00
Farzon Lotfi	8386a388bd	[HLSL] implement `clamp` intrinsic (#85424 ) closes #70071 - `CGBuiltin.cpp` - Add the unsigned\generic clamp intrinsic emitter. - `IntrinsicsDirectX.td` - add the `dx.clamp` & `dx.uclamp` intrinsics - `DXILIntrinsicExpansion.cpp` - add the `clamp` instruction expansion while maintaining vector form. - `SemaChecking.cpp` - Add `clamp` builtin Sema Checks. - `Builtins.td` - add a `clamp` builtin - `hlsl_intrinsics.h` - add the `clamp` api Why `clamp` as instruction expansion for DXIL? 1. SPIR-V has a GLSL `clamp` extension via: - [FClamp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#FClamp) - [UClamp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#UClamp) - [SClamp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#SClamp) 2. Further, Clamp lowers to `min(max( x, min_range ), max_range)`, for which we have float, signed, & unsigned DXIL ops.	2024-03-15 20:57:08 -04:00
Ahmed Bougacha	0481f049c3	[AArch64][PAC] Support ptrauth builtins and -fptrauth-intrinsics. (#65996 ) This defines the basic set of pointer authentication clang builtins (provided in a new header, ptrauth.h), with diagnostics and IRGen support. The availability of the builtins is gated on a new flag, `-fptrauth-intrinsics`. Note that this only includes the basic intrinsics, and notably excludes `ptrauth_sign_constant`, `ptrauth_type_discriminator`, and `ptrauth_string_discriminator`, which need extra logic to be fully supported. This also introduces clang/docs/PointerAuthentication.rst, which describes the ptrauth model in general, in addition to these builtins. Co-Authored-By: Akira Hatanaka <ahatanaka@apple.com> Co-Authored-By: John McCall <rjmccall@apple.com>	2024-03-15 14:17:21 -07:00
Farzon Lotfi	de1a97db39	[DXIL] `exp`, `any`, `lerp`, & `rcp` Intrinsic Lowering (#84526 ) This change implements lowering for #70076, #70100, #70072, & #70102 `CGBuiltin.cpp` - simplify `lerp` intrinsic `IntrinsicsDirectX.td` - simplify `lerp` intrinsic `SemaChecking.cpp` - remove unnecessary check `DXILIntrinsicExpansion.*` - add intrinsic to instruction expansion cases `DXILOpLowering.cpp` - make sure `DXILIntrinsicExpansion` happens first `DirectX.h` - changes to support new pass `DirectXTargetMachine.cpp` - changes to support new pass Why `any`, and `lerp` as instruction expansion just for DXIL? - In SPIR-V there is an [OpAny](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpAny) - SPIR-V has a GLSL lerp extension via [FMix](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#FMix) Why `exp` instruction expansion? - We have an `exp2` opcode and `exp` reuses that opcode. So instruction expansion is a convenient way to do preprocessing. - Further SPIR-V has a GLSL exp extension via [Exp](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#Exp) and [Exp2](https://registry.khronos.org/SPIR-V/specs/1.0/GLSL.std.450.html#Exp2) Why `rcp` as instruction expansion? This one is a bit of the odd man out and might have to move to `cgbuiltins` when we better understand SPIRV requirements. However I included it because it seems like [fast math mode has an AllowRecip flag](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_fp_fast_math_mode) which lets you compute the reciprocal without performing the division. We don't have that in DXIL so thought to include it.	2024-03-14 20:25:57 -04:00
Farzon Lotfi	d192b64370	[HLSL] implement the `isinf` intrinsic (#84927 ) This change implements part 1 of 2 for #70095 - `hlsl_intrinsics.h` - add the `isinf` api - `Builtins.td` - add an hlsl builtin for `isinf`. - `CGBuiltin.cpp` - add the ir generation for `isinf` intrinsic. - `SemaChecking.cpp` - add non-math elementwise checks because this is a bool return. - `IntrinsicsDirectX.td` - add an `isinf` intrinsic. `DXIL.td` lowering is left, but changes need to be made there before we can support this case.	2024-03-14 18:07:48 -04:00
Farzon Lotfi	8f9ee39c58	[HLSL] Implement `rsqrt` intrinsic (#84820 ) This change implements #70074 - `hlsl_intrinsics.h` - add the `rsqrt` api - `DXIL.td` add the llvm intrinsic to DXIL op lowering map. - `Builtins.td` - add an hlsl builtin for rsqrt. - `CGBuiltin.cpp` add the ir generation for the rsqrt intrinsic. - `SemaChecking.cpp` - reuse the one arg float only checks. - `IntrinsicsDirectX.td` -add an `rsqrt` intrinsic.	2024-03-14 16:49:33 -04:00
Tim Northover	4299c727e4	AArch64: add __builtin_arm_trap It's useful to provide an indicator code with the trap, which the generic __builtin_trap can't do. asm("brk #N") is an option, but following that with a __builtin_unreachable() leads to two traps when the compiler doesn't know the block can't return. So compiler support like this is useful.	2024-03-14 11:32:44 +00:00
Sven van Haastregt	c7f1a987a6	[OpenCL] Elaborate about BIenqueue_kernel expansion; NFC	2024-03-12 12:53:22 +00:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Farzon Lotfi	5a5266248d	[HLSL] implement the rcp intrinsic (#83857 ) This PR implements the frontend for llvm#70100 This PR is part 1 of 2. Part 2 requires an intrinsic to instructions lowering. - `Builtins.td` - add an `rcp` builtin - `CGBuiltin.cpp` - add the builtin to intrinsic lowering - `hlsl_intrinsics.h` - add the `rcp` api - `SemaChecking.cpp` - reuse frac's sema checks - `IntrinsicsDirectX.td` - add the llvm intrinsic	2024-03-05 16:11:13 -05:00
Farzon Lotfi	2807ea6b80	[HLSL] implement the any intrinsic (#83903 ) This PR implements the frontend for #70076 This PR is part 1 of 2. Part 2 requires an intrinsic to instructions lowering. - `Builtins.td` - add an `any` builtin - `CGBuiltin.cpp` - add the builtin to intrinsic lowering - `hlsl_basic_types.h` - add the `bool` vectors since that is an input for any - `hlsl_intrinsics.h` - add the `any` api - `SemaChecking.cpp` - add `any` builtin checking - `IntrinsicsDirectX.td` - add the llvm intrinsic	2024-03-05 12:46:01 -05:00
Farzon Lotfi	643b31dbe8	[HLSL] implement `mad` intrinsic (#83826 ) This change implements #83736 The dot product lowering needs a ternary multiply add operation. DXIL has three mad opcodes for `fmad`(46), `imad`(48), and `umad`(49). Dot product in DXIL only uses `imad`\ `umad`, but for completeness and because the hlsl `mad` intrinsic requires it `fmad` was also included. Two new intrinsics needed to be created to complete this change; the `fmad` case is already supported by llvm via the `fmuladd` intrinsic. - `hlsl_intrinsics.h` - exposed mad api call. - `Builtins.td` - exposed a `mad` builtin. - `Sema.h` - make `tertiary` calls check for float types optional. - `CGBuiltin.cpp` - pick the intrinsic for signed\unsigned & float also reuse `int_fmuladd`. - `SemaChecking.cpp` - type checks for `__builtin_hlsl_mad`. - `IntrinsicsDirectX.td` - create the two new intrinsics for `imad`\`umad` - `DXIL.td` - create the llvm intrinsic to `DXIL` opcode mapping. --------- Co-authored-by: Farzon Lotfi <farzon@farzon.com>	2024-03-05 12:23:26 -05:00
Qiu Chaofan	906580bad3	[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968 ) These builtins are already there in Clang, however current codegen may produce suboptimal results due to their complex behavior. Implement them as intrinsics to ensure expected instructions are emitted.	2024-03-04 21:13:59 +08:00
Pavel Iliin	185b1df1b1	[X86][AArch64][PowerPC] __builtin_cpu_supports accepts unknown options. (#83515 ) The patch fixes https://github.com/llvm/llvm-project/issues/83407 by modifying __builtin_cpu_supports behaviour so that it returns false and issues a warning if unsupported feature names are provided in the parameter. __builtin_cpu_supports is target independent, but currently supported by X86, AArch64 and PowerPC only.	2024-03-01 10:12:19 +00:00
Farzon Lotfi	489eadd142	[HLSL] Implementation of the frac intrinsic (#83315 ) This change implements the frontend for #70099 Builtins.td - add the frac builtin CGBuiltin.cpp - add the builtin to DirectX intrinsic mapping hlsl_intrinsics.h - add the frac api SemaChecking.cpp - add type checks for builtin IntrinsicsDirectX.td - add the frac intrinsic The backend changes for this are going to be very simple: `f309a0eb55` They were not included because llvm/lib/Target/DirectX/DXIL.td is going through a major refactor.	2024-02-29 10:40:38 -08:00
Farzon Lotfi	e60ebbd000	[HLSL] implementation of lerp intrinsic (#83077 ) This is the start of implementing the lerp intrinsic https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-lerp Builtins.td - defines the builtin hlsl_intrinsics.h - defines the lerp api DiagnosticSemaKinds.td - needed a new error to be inclusive for more than two operands. CGBuiltin.cpp - add the lerp intrinsic lowering SemaChecking.cpp - type checks for lerp builtin IntrinsicsDirectX.td - define the lerp intrinsic this change implements the first half of #70102 Co-authored-by: Xiang Li <python3kgae@outlook.com>	2024-02-29 07:01:36 -08:00
OverMighty	21d83324fb	[clang] Implement __builtin_popcountg (#82359 ) Fixes #82058.	2024-02-26 13:59:42 -08:00
Farzon Lotfi	82acec15af	[HLSL] Implementation of dot intrinsic (#81190 ) This change implements https://github.com/llvm/llvm-project/issues/70073 HLSL has a dot intrinsic defined here: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-dot The intrinsic itself is defined as a HLSL_LANG LangBuiltin in Builtins.td. This is used to associate all the dot product typedefs defined in hlsl_intrinsics.h with a single intrinsic check in CGBuiltin.cpp & SemaChecking.cpp. In IntrinsicsDirectX.td we define the llvmIR for the dot product. A few goals were in mind for this IR. First it should operate on only vectors. Second the return type should be the vector element type. Third the second parameter vector should be of the same size as the first parameter. Finally `a dot b` should be the same as `b dot a`. In CGBuiltin.cpp hlsl has built on top of existing clang intrinsics via EmitBuiltinExpr. Dot product though is a language specific intrinsic and so is guarded behind getLangOpts().HLSL. The call chain looks like this: EmitBuiltinExpr -> EmitHLSLBuiltinExpr. In EmitHLSLBuiltinExpr, the dot product intrinsic makes a distinction between vectors and scalars. This is because HLSL supports dot product on scalars which simplifies down to multiply. Sema.h & SemaChecking.cpp saw the addition of CheckHLSLBuiltinFunctionCall, a language specific semantic validation that can be expanded for other hlsl specific intrinsics. Fixes #70073	2024-02-26 10:08:59 -06:00
Pavel Iliin	568babab7e	[AArch64] Implement __builtin_cpu_supports, compiler-rt tests. (#82378 ) The patch complements https://github.com/llvm/llvm-project/pull/68919 and adds AArch64 support for builtin `__builtin_cpu_supports("feature1+...+featureN")` which returns true if all specified CPU features in the argument are detected. Also, compiler-rt aarch64 native run tests for the feature detection mechanism were added and the 'cpu_model' check was fixed after its refactor was merged https://github.com/llvm/llvm-project/pull/75635 Original RFC was https://reviews.llvm.org/D153153	2024-02-22 23:33:54 +00:00
zhijian lin	5b8e5604c2	[AIX] Lower intrinsic __builtin_cpu_is into AIX platform-specific code. (#80069 ) On AIX OS, __builtin_cpu_is() references the runtime external variable _system_configuration from /usr/include/sys/systemcfg.h. ref issue: https://github.com/llvm/llvm-project/issues/80042	2024-02-22 08:46:08 -05:00
Pierrick Bouvier	0ea64ad88a	[COFF][Aarch64] Add _InterlockedAdd64 intrinsic (#81849 ) Found when compiling openssl master branch using clang-cl. This commit introduces usage of InterlockedAdd64: `d0e1a0ae70` https://learn.microsoft.com/en-us/cpp/intrinsics/interlockedadd-intrinsic-functions	2024-02-16 13:20:08 +02:00
Shilei Tian	630f82ec0c	[Clang][CodeGen] Loose the cast check when emitting builtins (#81669 ) This patch loosens the cast check (`canLosslesslyBitCastTo`) and leaves it to the one inside `CreateBitCast`. It seems too conservative for the use case here.	2024-02-14 12:59:59 -05:00
Joseph Huber	11fcae69db	[LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331 ) Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have which returns a fixed frequency clock that can be used to determine elapsed time, this is different compared to the cycle counter which often has variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.	2024-02-13 10:06:25 -06:00
Shilei Tian	c4b0dfcc99	[Clang] Fix a non-effective assertion (#81083 ) `PTy` here is literally `FTy->getParamType(i)`, which makes this assertion not work as expected.	2024-02-08 09:44:42 -05:00
Mészáros Gergely	5942868a21	[clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (#68515 ) Previously `__builtin_printf` would result in emitting a call to `printf`, even though a direct call to `printf` was translated. Ref: #68478	2024-02-05 23:23:13 +05:30
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954 ) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Sander de Smalen	d313614b60	[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166 ) Since https://github.com/ARM-software/acle/pull/276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).	2024-02-01 13:37:37 +00:00
Nemanja Ivanovic	67c1c1dbb6	[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919 ) Make __builtin_cpu_{init\|supports\|is} target independent and provide an opt-in query for targets that want to support it. Each target is still responsible for their specific lowering/code-gen. Also provide code-gen for PowerPC. I originally proposed this in https://reviews.llvm.org/D152914 and this addresses the comments I received there. --------- Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn> Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2024-01-26 11:24:50 -05:00
Vojislav Tomasevic	2a77d92e2e	[clang] Incorrect IR involving the use of bcopy (#79298 ) This patch addresses the issue regarding the call of bcopy function in a conditional expression. It is analogous to the already accepted patch which deals with the same problem, just regarding the bzero function [0]. Here is the testcase which illustrates the issue: ``` void bcopy(const void *, void *, unsigned long); void foo(void); void test_bcopy() { char dst[20]; char src[20]; int _sz = 20, len = 20; return (_sz ? ((_sz >= len) ? bcopy(src, dst, len) : foo()) : bcopy(src, dst, len)); } ``` When processing it with clang, the following issue occurs: Instruction does not dominate all uses! %arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0, !dbg !38 %cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5, %cond.false3 ], !dbg !33 fatal error: error in backend: Broken module found, compilation aborted! This happens because an incorrect phi node is created. It is created because the bcopy function call is lowered to a call of the llvm.memmove intrinsic and function memmove returns void *. Since llvm.memmove is called in two places in the same return statement, clang creates a phi node in the final basic block for the return value and that phi node is incorrect. However, the bcopy function should return void in the first place, so this phi node is unnecessary. This is what this patch addresses. An appropriate test is also added and no existing tests fail when applying this patch. Also, this crash only happens when LLVM is configured with the -DLLVM_ENABLE_ASSERTIONS=On option. [0] https://reviews.llvm.org/D39746	2024-01-24 09:39:36 -08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Matthew Devereau	6ba62f4f25	[AArch64][SME2] Refine fcvtu/fcvts/scvtf/ucvtf (#77947 ) Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs. Use llvm_anyvector_ty for both multi vector returns and operands, therefore the return and operands can be specified in the intrinsic call, e.g. @llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32	2024-01-22 15:11:49 +00:00
Piotr Sobczak	57f6a3f7ea	[AMDGPU] Add global_load_tr for GFX12 (#77772 ) Support new amdgcn_global_load_tr instructions for load with transpose. * MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128 * Intrinsic int_amdgcn_global_load_tr * Clang builtins amdgcn_global_load_tr*	2024-01-18 15:14:42 +01:00
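
Commit 8386a388bd above notes that `clamp` lowers to `min(max(x, min_range), max_range)`. The following is only a minimal C++ sketch of that identity, not the DXIL expansion itself. The names `clamp_via_min_max`, `clamp_elementwise`, and `clamp_self_check` are hypothetical and chosen for illustration; the element-wise helper merely mimics "maintaining vector form" on a `std::array`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Scalar clamp written exactly as the min/max composition quoted in the log.
// Assumes min_range <= max_range, as a clamp normally does.
template <typename T>
constexpr T clamp_via_min_max(T x, T min_range, T max_range) {
  return std::min(std::max(x, min_range), max_range);
}

// Hypothetical element-wise variant: applies the scalar form to each lane.
template <typename T, std::size_t N>
std::array<T, N> clamp_elementwise(std::array<T, N> v, T min_range, T max_range) {
  for (T &e : v)
    e = clamp_via_min_max(e, min_range, max_range);
  return v;
}

// Small usage example covering signed, unsigned, and float inputs.
inline void clamp_self_check() {
  assert(clamp_via_min_max(5, 0, 3) == 3);      // above the range
  assert(clamp_via_min_max(-2, 0, 3) == 0);     // below the range
  assert(clamp_via_min_max(2u, 1u, 7u) == 2u);  // already inside
  assert(clamp_via_min_max(1.5f, 0.0f, 1.0f) == 1.0f);
}
```

The same composition works for float, signed, and unsigned operands as long as matching `min` and `max` operations exist, which is the point the commit message makes about the available DXIL ops.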
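
Commit a4558a4a53 above describes expanding the 64-bit `rldimi` into a series of rotates and masks on 32-bit targets. The sketch below is not the LLVM codegen. It assumes the builtin behaves as "rotate `rs` left by `shift`, then insert the result into `is` under `mask`", and it shows how that can be computed using only 32-bit halves. `Halves`, `rotl64_halves`, `rotate_insert_halves`, and `rotate_insert_self_check` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value held as two 32-bit words, as on a 32-bit target.
struct Halves {
  std::uint32_t hi;
  std::uint32_t lo;
};

constexpr Halves split(std::uint64_t v) {
  return Halves{static_cast<std::uint32_t>(v >> 32),
                static_cast<std::uint32_t>(v)};
}

constexpr std::uint64_t join(Halves h) {
  return (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
}

// Rotate a 64-bit value left using only 32-bit shifts and ORs.
constexpr Halves rotl64_halves(Halves v, unsigned shift) {
  shift &= 63;
  if (shift >= 32) {  // rotating by 32 just swaps the two words
    v = Halves{v.lo, v.hi};
    shift -= 32;
  }
  if (shift == 0)     // avoid an undefined 32-bit shift by 32 below
    return v;
  return Halves{(v.hi << shift) | (v.lo >> (32 - shift)),
                (v.lo << shift) | (v.hi >> (32 - shift))};
}

// Assumed semantics: (rotl(rs, shift) & mask) | (is & ~mask), per word.
constexpr std::uint64_t rotate_insert_halves(std::uint64_t rs, std::uint64_t is,
                                             unsigned shift, std::uint64_t mask) {
  const Halves r = rotl64_halves(split(rs), shift);
  const Halves i = split(is);
  const Halves m = split(mask);
  return join(Halves{(r.hi & m.hi) | (i.hi & ~m.hi),
                     (r.lo & m.lo) | (i.lo & ~m.lo)});
}

// Usage example: insert bits 16..31 of the rotated value into an all-ones word.
inline void rotate_insert_self_check() {
  assert(join(rotl64_halves(split(0x0123456789ABCDEFULL), 8)) ==
         0x23456789ABCDEF01ULL);
  assert(rotate_insert_halves(0x0123456789ABCDEFULL, 0xFFFFFFFFFFFFFFFFULL, 8,
                              0x00000000FFFF0000ULL) == 0xFFFFFFFFABCDFFFFULL);
}
```

Splitting the rotate at 32 keeps every individual shift amount in the range 1..31, which is why the word swap is handled as a separate step.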


1900 Commits