llvm-project

Author	SHA1	Message	Date
Sirraide	2b5f68a5f6	[Clang][C++23] Implement P1774R8: Portable assumptions (#81014 ) This implements the C++23 `[[assume]]` attribute. Assumption information is lowered to a call to `@llvm.assume`, unless the expression has side-effects, in which case it is discarded and a warning is issued to tell the user that the assumption doesn’t do anything. A failed assumption at compile time is an error (unless we are in `MSVCCompat` mode, in which case we don’t check assumptions at compile time). Due to performance regressions in LLVM, assumptions can be disabled with the `-fno-assumptions` flag. With it, assumptions will still be parsed and checked, but no calls to `@llvm.assume` will be emitted and assumptions will not be checked at compile time.	2024-03-09 12:07:16 +01:00
Jie Fu	27b297bf21	[clang] Fix -Wunused-variable in CGCall.cpp (NFC) llvm-project/clang/lib/CodeGen/CGCall.cpp:3226:24: error: unused variable 'StructSize' [-Werror,-Wunused-variable] llvm::TypeSize StructSize = CGM.getDataLayout().getTypeAllocSize(STy); ^ llvm-project/clang/lib/CodeGen/CGCall.cpp:3227:24: error: unused variable 'PtrElementSize' [-Werror,-Wunused-variable] llvm::TypeSize PtrElementSize = ^ llvm-project/clang/lib/CodeGen/CGCall.cpp:5313:24: error: unused variable 'SrcTypeSize' [-Werror,-Wunused-variable] llvm::TypeSize SrcTypeSize = ^ llvm-project/clang/lib/CodeGen/CGCall.cpp:5315:24: error: unused variable 'DstTypeSize' [-Werror,-Wunused-variable] llvm::TypeSize DstTypeSize = CGM.getDataLayout().getTypeAllocSize(STy); ^ 4 errors generated.	2024-02-28 20:58:20 +08:00
Paul Walker	3d454d2895	[LLVM][TypeSize] Remove default constructor. (#82810 )	2024-02-28 11:48:53 +00:00
Craig Topper	9be7b0a539	[IRGen][AArch64][RISCV] Generalize bitcast between i1 predicate vector and i8 fixed vector. (#76548 ) Instead of only handling vscale x 16 x i1 predicate vectors, handle any scalable i1 vector where the known minimum is divisible by 8. This is used on RISC-V where we have multiple sizes of predicate types.	2024-02-13 09:46:50 -08:00
weiguozhi	c166a43c6e	New calling convention preserve_none (#76868) The new experimental calling convention preserve_none is the opposite of the existing preserve_all. It tries to preserve as few general registers as possible, so all general registers are caller-saved. It can also use more general registers to pass arguments. This attribute doesn't affect floating-point registers, which still follow the C calling convention. Currently preserve_none is supported on X86-64 only. It changes the C calling convention in the following respects: * RSP and RBP are the only preserved general registers; all other general registers are caller-saved. * We can use [RDI, RSI, RDX, RCX, R8, R9, R11, R12, R13, R14, R15, RAX] to pass arguments. It can improve the performance of hot tail-call chains, because many save/restore instructions for callee-saved registers can be removed if the tail functions use preserve_none. In my experiment with protocol buffers, the parsing functions improved by 3% to 10%.	2024-02-05 13:28:43 -08:00
Brandon Wu	f5154b9c98	[clang][RISCV] Enable struct of homogeneous scalable vector as function argument (#78550) LLVM IR supports structs as function inputs, so RISC-V tuple types can use a struct of homogeneous scalable vectors instead of being flattened.	2024-02-03 17:57:15 +08:00
Sander de Smalen	d313614b60	[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166 ) Since https://github.com/ARM-software/acle/pull/276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).	2024-02-01 13:37:37 +00:00
Sander de Smalen	1652d44d8d	[Clang] Amend SME attributes with support for ZT0. (#77941 ) This patch builds on top of #76971 and implements support for: * __arm_new("zt0") * __arm_in("zt0") * __arm_out("zt0") * __arm_inout("zt0") * __arm_preserves("zt0")	2024-01-23 12:35:16 +01:00
Sander de Smalen	8e7f073eb4	[Clang][AArch64] Change SME attributes for shared/new/preserved state. (#76971 ) This patch replaces the `__arm_new_za`, `__arm_shared_za` and `__arm_preserves_za` attributes in favour of: * `__arm_new("za")` * `__arm_in("za")` * `__arm_out("za")` * `__arm_inout("za")` * `__arm_preserves("za")` As described in https://github.com/ARM-software/acle/pull/276. One change is that `__arm_in/out/inout/preserves(S)` are all mutually exclusive, whereas previously it was fine to write `__arm_shared_za __arm_preserves_za`. This case is now represented with `__arm_in("za")`. The current implementation uses the same LLVM attributes under the hood, since `__arm_in/out/inout` are all variations of "shared ZA", so can use the existing `aarch64_pstate_za_shared` attribute in LLVM. #77941 will add support for the new "zt0" state as introduced with SME2.	2024-01-15 09:41:32 +00:00
Nikita Popov	158d72d728	[Clang] Set writable and dead_on_unwind attributes on sret arguments (#77116 ) Set the writable and dead_on_unwind attributes for sret arguments. These indicate that the argument points to writable memory (and it's legal to introduce spurious writes to it on entry to the function) and that the argument memory will not be used if the call unwinds. This enables additional MemCpyOpt/DSE/LICM optimizations.	2024-01-11 09:46:54 +01:00
Kazu Hirata	f3dcc2351c	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 08:54:13 -08:00
Romaric Jodin	d56e0d07cc	clang/OpenCL: set sqrt fp accuracy on call to Z4sqrt (#66651 ) This is reverting the previous implementation to avoid adding inline function in opencl headers. This was breaking clspv flow google/clspv#1231, while https://reviews.llvm.org/D156743 mentioned that just decorating the call node with `!pfmath` was enough. This PR is implementing this idea. The test has been updated with this implementation.	2023-12-01 16:34:44 +09:00
Youngsuk Kim	5c91b2886f	[clang] Replace uses of CreatePointerBitCastOrAddrSpaceCast (NFC) (#68277 ) With opaque pointers, `CreatePointerBitCastOrAddrSpaceCast` can be replaced with `CreateAddrSpaceCast`. Replace or remove uses of `CreatePointerBitCastOrAddrSpaceCast`. Opaque pointer cleanup effort.	2023-11-11 10:57:44 -05:00
jyu2-git	c79b544d2b	[SEH] Fix assertin when return scalar value from __try block. (#71488) The compiler currently fails the assertion `!SI->isAtomic() && !SI->isVolatile()`. This is due to the following rules: First, no exception can move in or out of a _try region, i.e., no potentially faulting instruction can be moved across a _try boundary. Second, the order of exceptions for instructions 'directly' under a _try must be preserved (this does not apply to those in callees). Finally, global state (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in a register) before the subsequent exception occurs. All memory instructions inside a _try are considered 'volatile' to ensure the 2nd and 3rd rules for C code above. This is slightly suboptimal, but acceptable because the amount of code directly under _try is very small. However, during findDominatingStoreToReturnValue such stores are not allowed. To fix this, skip the assertion when the current function has an SEH try.	2023-11-07 20:43:40 -08:00
Jon Roelofs	fa71f9e87a	Reland "[Intrinsics][ObjC] Mark objc_retain and friends as thisreturn." This reverts commit cb62f67088aaf79493350547f74870318b71acc5. Fixes: https://github.com/llvm/llvm-project/issues/69658	2023-11-06 11:10:59 -08:00
Jon Roelofs	d9ccacee13	Revert "Reland "[Intrinsics][ObjC] Mark objc_retain and friends as thisreturn."" This reverts commit 30414fc614d80a45bad4c89763a353f50d3e04d6. Broke some buildbots.	2023-11-06 10:04:22 -08:00
Jon Roelofs	30414fc614	Reland "[Intrinsics][ObjC] Mark objc_retain and friends as thisreturn." This reverts commit cb62f67088aaf79493350547f74870318b71acc5. Fixes: https://github.com/llvm/llvm-project/issues/69658	2023-11-06 08:47:05 -08:00
Vlad Serebrennikov	a8ead56068	[clang][NFC] Rename ArgPassingKind to RecordArgPassingKind (#70955 ) During the recent refactoring (b120fe8d3288c4dca1b5427ca34839ce8833f71c) this enum was moved out of `RecordDecl`. During post-commit review it was found out that its association with `RecordDecl` should be expressed in the name.	2023-11-01 20:38:28 +04:00
Vlad Serebrennikov	b120fe8d32	[clang][NFC] Refactor `ArgPassingKind` This patch moves `RecordDecl::ArgPassingKind` to DeclBase.h to namespace scope, so that it's complete at the time bit-field is declared.	2023-11-01 11:49:59 +03:00
Vlad Serebrennikov	49fd28d960	[clang][NFC] Refactor `ArrayType::ArraySizeModifier` This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.	2023-10-31 18:06:34 +03:00
Jon Roelofs	cb62f67088	Revert "[Intrinsics][ObjC] Mark objc_retain and friends as thisreturn." This reverts commit ed83797f3cbfc8fb2a1af63542f97d7ec1d5505a. Reverting pending the investigation of https://github.com/llvm/llvm-project/issues/69658	2023-10-20 09:22:12 -07:00
Min-Yih Hsu	fd4f96290a	[Clang][M68k] Add Clang support for the new M68k_RTD CC This patch adds `CC_M68kRTD`, which will be used on function if either `__attribute__((m68k_rtd))` is presented or `-mrtd` flag is given. Differential Revision: https://reviews.llvm.org/D149867	2023-10-15 16:13:43 -07:00
Corentin Jabot	af4751738d	[C++] Implement "Deducing this" (P0847R7) This patch implements P0847R7 (partially), CWG2561 and CWG2653. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D140828	2023-10-02 14:33:02 +02:00
Juan Manuel Martinez Caamaño	69183f8eb9	[NFC][Clang] Address reviews about overrideFunctionFeaturesWithTargetFeatures (#65938) Addressing remarks after merge of D159257 * Add comment * Remove irrelevant CHECKs from test * Simplify function * Use llvm::sort before setting target-features as it is done in CodeGenModule	2023-09-20 13:37:13 +02:00
Zahira Ammarguellat	2c93e3c1c8	Take math-errno into account with '#pragma float_control(precise,on)' and 'attribute__((optnone)). Differential Revision: https://reviews.llvm.org/D151834	2023-09-08 09:48:53 -04:00
Juan Manuel MARTINEZ CAAMAÑO	d60c47476d	[Clang] Propagate target-features if compatible when using mlink-builtin-bitcode Builtins from AMD's device-libs are compiled without specifying a target-cpu, which results in builtins without the target-features attribute set. Before this patch, when linking these builtins with -mlink-builtin-bitcode, the target-features were not propagated to the incoming builtins. With this patch, the default target features are propagated if they are compatible with the target-features in the incoming builtin. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159206	2023-09-08 11:20:16 +02:00
Chris Bieneman	400d3261a0	[HLSL] Cleanup support for `this` as an l-value The goal of this change is to clean up some of the code surrounding HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up a bunch of assumptions and inconsistencies around how the type of `this` is handled through the AST and code generation. This change is mostly NFC for HLSL, and completely NFC for other language modes. This change introduces a new member to query for the this object's type and seeks to clarify the normal usages of the this type. With the introduction of HLSL to clang, CXXThisExpr may now be an l-value and behave like a reference type rather than C++'s normal method of it being an r-value of pointer type. With this change there are now three ways in which a caller might need to query the type of `this`: * The type of the `CXXThisExpr` * The type of the object `this` refers to * The type of the implicit (or explicit) `this` argument This change codifies those three ways you may need to query respectively as: * CXXMethodDecl::getThisType() * CXXMethodDecl::getThisObjectType() * CXXMethodDecl::getThisArgType() This change then revisits all uses of `getThisType()`, and in cases where the only use was to resolve the pointee type, it replaces the call with `getThisObjectType()`. In other cases it evaluates whether the desired returned type is the type of the `this` expr, or the type of the `this` function argument. The `this` expr type is used for creating additional expr AST nodes and for member lookup, while the argument type is used mostly for code generation. Additionally some cases that used `getThisType` in simple queries could be substituted for `getThisObjectType`. Since `getThisType` is implemented in terms of `getThisObjectType`, calling the latter should be more efficient if the former isn't needed. Reviewed By: aaron.ballman, bogner Differential Revision: https://reviews.llvm.org/D159247	2023-09-05 19:38:50 -05:00
Alexander Kornienko	b7f4915644	Revert "Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas" This reverts commit e698695fbbf62e6676f8907665187f2d2c4d814b. The commit caused invalid AddressSanitizer: stack-use-after-scope errors. See https://reviews.llvm.org/D74094#4633785 for details. Differential Revision: https://reviews.llvm.org/D159346	2023-09-01 12:53:24 +02:00
Juan Manuel MARTINEZ CAAMAÑO	19550e79b5	[NFC][Clang] Remove redundant function definitions There were 3 definitions of the mergeDefaultFunctionDefinitionAttributes function: A private implementation, a version exposed in CodeGen, a version exposed in CodeGenModule. This patch removes the private and the CodeGenModule versions and keeps a single definition in CodeGen. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D159256	2023-08-31 14:47:42 +02:00
Juan Manuel MARTINEZ CAAMAÑO	9b35254018	[NFC][Clang] Remove unused function `CodeGenModule::addDefaultFunctionDefinitionAttributes` This patch deletes the unused `addDefaultFunctionDefinitionAttributes(llvm::Function);` function, while it still keeps `void addDefaultFunctionDefinitionAttributes(llvm::AttrBuilder &attrs);` which is being used. Differential Revision: https://reviews.llvm.org/D158990	2023-08-30 10:32:51 +02:00
Juan Manuel MARTINEZ CAAMAÑO	b63c6e585d	[NFC][Clang] Add missing & to function argument Differential Revision: https://reviews.llvm.org/D158991	2023-08-29 13:59:17 +02:00
Nikita Popov	14cc7a0772	[Clang] Allow __declspec(noalias) to access inaccessible memory MSVC defines __declspec(noalias) as follows (https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/k649tyc7(v=vs.110)?redirectedfrom=MSDN): > noalias means that a function call does not modify or reference > visible global state and only modifies the memory pointed to > directly by pointer parameters (first-level indirections). > If a function is annotated as noalias, the optimizer can assume > that, in addition to the parameters themselves, only first-level > indirections of pointer parameters are referenced or modified > inside the function. The visible global state is the set of all > data that is not defined or referenced outside of the compilation > scope, and their address is not taken. The compilation scope is > all source files (/LTCG (Link-time Code Generation) builds) or a > single source file (non-/LTCG build). The wording is not super clear to me, but I believe this is saying that __declspec(noalias) functions may access inaccessible memory (i.e. non-visible global state in their words). Indeed, the Windows CRT applies this attribute to malloc, which does access inaccessible memory under LLVM's memory model. As such, change the attribute to emit memory(argmem: readwrite, inaccessiblemem: readwrite) instead of memory(argmem: readwrite). Fixes https://github.com/llvm/llvm-project/issues/64827. Differential Revision: https://reviews.llvm.org/D158984	2023-08-29 11:43:57 +02:00
Chuanqi Xu	572cc8d38f	Revert "[C++20] [Coroutines] Mark await_suspend as noinline if the awaiter is not empty" This reverts commit 9d9c25f81456aace2bec4b58498a420e650007d9. This reverts commit 19ab2664ad3182ffa8fe3a95bb19765e4ae84653. This reverts commit c4672454743e942f148a1aff1e809dae73e464f6. As the issue https://github.com/llvm/llvm-project/issues/65018 shows, the previous fix actually introduced a regression, so this commit reverts the fix in line with our policies.	2023-08-28 13:21:17 +08:00
Chuanqi Xu	9d9c25f814	[C++20] [Coroutines] Don't mark await_suspend as noinline if it is specified as always_inline already Address https://github.com/llvm/llvm-project/issues/64933 and partially https://github.com/llvm/llvm-project/issues/64945. After c467245, we will add a noinline attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions. Obviously, this decision will bring some performance regressions. And people may complain about this while the long term solution may not be available soon. In such cases, it is better to provide a solution for the users who met the regression surprisingly. Also it is natural to not prevent the inlining if the function is marked as always_inline by the users already.	2023-08-28 11:43:33 +08:00
eopXD	39a41c8905	[CGCall][RISCV] Handle function calls with parameter of RVV tuple type This was an oversight in D146872, where function calls with tuple type was not covered. This commit fixes this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157953	2023-08-22 23:41:23 -07:00
Chuanqi Xu	c467245474	[C++20] [Coroutines] Mark await_suspend as noinline if the awaiter is not empty Close https://github.com/llvm/llvm-project/issues/56301 Close https://github.com/llvm/llvm-project/issues/64151 See the summary and the discussion of https://reviews.llvm.org/D157070 to get the full context. As @rjmccall pointed out, the key point of the root cause is that currently we didn't implement the semantics for '@llvm.coro.save' well ("after the await-ready returns false, the coroutine is considered to be suspended ") well. Since the semantics implies that we (the compiler) shouldn't write the spills into the coroutine frame in the await_suspend. But now it is possible due to some combinations of the optimizations so the semantics are broken. And the inlining is the root optimization of such optimizations. So in this patch, we tried to add the `noinline` attribute to the await_suspend call. Also as an optimization, we don't add the `noinline` attribute to the await_suspend call if the awaiter is an empty class. This should be correct since the programmers can't access the local variables in await_suspend if the awaiter is empty. I think this is necessary for the performance since it is pretty common. Another potential optimization is: call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle, ptr @awaitSuspendFn) Then it is much easier to perform the safety analysis in the middle end. If it is safe to inline the call to awaitSuspend, we can replace it in the CoroEarly pass. Otherwise we could replace it in the CoroSplit pass. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D157833	2023-08-22 09:56:44 +08:00
Erik Pilkington	e698695fbb	Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas This reverts commit e26c24b849211f35a988d001753e0cd15e4a9d7b. These temporaries are only used in the callee, and their memory can be reused after the call is complete. rdar://58552124 Link: https://github.com/llvm/llvm-project/issues/38157 Link: https://github.com/llvm/llvm-project/issues/41896 Link: https://github.com/llvm/llvm-project/issues/43598 Link: https://github.com/ClangBuiltLinux/linux/issues/39 Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402 Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D74094	2023-08-16 15:21:46 -07:00
Martin Storsjö	d60c3d08e7	[clang] Skip stores in init for fields that are empty structs An empty struct is handled as a struct with a dummy i8, on all targets. Most targets treat an empty struct return value as essentially void - but some don't. (Currently, at least x86_64-windows-* and powerpc64le-* don't treat it as void.) When initializing a struct with such a no_unique_address member, make sure we don't write the dummy i8 into the struct where there's no space allocated for it. Previously it would clobber the actual valid data of the struct. Fixes https://github.com/llvm/llvm-project/issues/64253, and possibly https://github.com/llvm/llvm-project/issues/64077 and https://github.com/llvm/llvm-project/issues/64427 as well. We should omit the store for any empty record (not only ones declared with no_unique_address); we can have a situation where a class doesn't have the no_unique_address attribute, but is embedded in an outer struct with the no_unique_address attribute - like this: struct S {}; S f(); struct S2 : public S { S2();}; S2::S2() : S(f()) {} struct S3 { int x; [[no_unique_address]] S2 y; S3(); }; S3::S3() : x(1), y() {} Here, the problematic store (which this patch omits) is in the constructor of S2. In the case of S3, S2 has no valid storage and aliases x - thus the constructor of S2 should omit the dummy store. Differential Revision: https://reviews.llvm.org/D157332	2023-08-15 10:59:23 +03:00
Changpeng Fang	d77c62053c	[clang][AMDGPU]: Don't use byval for struct arguments in function ABI Summary: Byval requires allocating additional stack space, and always requires an implicit copy to be inserted in codegen, where it can be difficult to optimize. In this work, we use byref/IndirectAliased promotion method instead of byval with the implicit copy semantics. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D155986	2023-08-11 16:37:42 -07:00
Matt Arsenault	25bc999d1f	Intrinsics: Add type overload to stacksave and stackstore This allows use with non-0 address space stacks. llvm_ptr_ty should never be used. This could use some more percolation up through mlir, but this is enough to fix existing tests. https://reviews.llvm.org/D156666	2023-08-09 18:33:11 -04:00
Sander de Smalen	28b5f3087a	[Clang][AArch64] Add/implement ACLE keywords for SME. This patch adds all the language-level function keywords defined in: https://github.com/ARM-software/acle/pull/188 (merged) https://github.com/ARM-software/acle/pull/261 (update after D148700 landed) The keywords are used to control PSTATE.ZA and PSTATE.SM, which are respectively used for enabling the use of the ZA matrix array and Streaming mode. This information needs to be available on call sites, since the use of ZA or streaming mode may have to be enabled or disabled around the call-site (depending on the IR attributes set on the caller and the callee). For calls to functions from a function pointer, there is no IR declaration available, so the IR attributes must be added explicitly to the call-site. With the exception of '__arm_locally_streaming' and '__arm_new_za' the information is part of the function's interface, not just the function definition, and thus needs to be propagated through the FunctionProtoType::ExtProtoInfo. This patch adds the definitions of these keywords, as well as codegen and semantic analysis to ensure conversions between function pointers are valid and that no conflicting keywords are set. For example, '__arm_streaming' and '__arm_streaming_compatible' are mutually exclusive. Differential Revision: https://reviews.llvm.org/D127762	2023-08-08 07:00:59 +00:00
Jon Roelofs	ed83797f3c	[Intrinsics][ObjC] Mark objc_retain and friends as thisreturn. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retain rdar://79869679 Differential revision: https://reviews.llvm.org/D105671	2023-08-01 18:02:00 -07:00
Yaxun (Sam) Liu	ac72531043	[Driver] Add `-f[no-]offload-uniform-block` By default, clang assumes HIP kernels are launched with uniform block size, which is the case for kernels launched through triple chevron or hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute to HIP kernels to allow the backend to do optimizations on that. However, in some rare cases, HIP kernels can be launched through hipExtModuleLaunchKernel where global work size is specified, which may result in non-uniform block size. To be able to support non-uniform block size for HIP kernels, an option `-f[no-]offload-uniform-block` is added. This option is generic for offloading languages. Its default value is on for CUDA/HIP and off otherwise. Make -cl-uniform-work-group-size an alias to -foffload-uniform-block. Reviewed by: Siu Chi Chan, Matt Arsenault, Fangrui Song, Johannes Doerfert Differential Revision: https://reviews.llvm.org/D155213 Fixes: SWDEV-406592	2023-07-27 16:36:02 -04:00
Amy Huang	27dab4d305	Reland "Try to implement lambdas with inalloca parameters by forwarding without use of inallocas."t This reverts commit 8ed7aa59f489715d39d32e72a787b8e75cfda151. Differential Revision: https://reviews.llvm.org/D154007	2023-07-26 16:13:36 -07:00
Craig Topper	d53d842d12	[RISCV][AArch64][IRGen] Add a special case to CodeGenFunction::EmitCall for scalable vector return being coerced to fixed vector. Before falling back to CreateCoercedStore, detect a scalable vector return being coerced to fixed vector. Handle it using a vector.extract intrinsic without going through memory. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D155495	2023-07-18 10:04:33 -07:00
Craig Topper	e8dc9dcd7d	[IRGen] Remove 'Sve' from the name of some IR names that are shared with RISC-V now. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D155220	2023-07-17 08:43:43 -07:00
Youngsuk Kim	6f986bffc5	[clang] Remove CGBuilderTy::CreateElementBitCast `CGBuilderTy::CreateElementBitCast()` no longer does what its name suggests. Remove remaining in-tree uses by one of the following methods. * drop the call entirely * fold it to an `Address` construction * replace it with `Address::withElementType()` This is a NFC cleanup effort. Reviewed By: barannikov88, nikic, jrtc27 Differential Revision: https://reviews.llvm.org/D154285	2023-07-02 10:40:16 -04:00
Elliot Goodrich	f0fa2d7c29	[llvm] Move AttributeMask to a separate header Move `AttributeMask` out of `llvm/IR/Attributes.h` to a new file `llvm/IR/AttributeMask.h`. After doing this we can remove the `#include <bitset>` and `#include <set>` directives from `Attributes.h`. Since there are many headers including `Attributes.h`, but not needing the definition of `AttributeMask`, this causes unnecessary bloating of the translation units and slows down compilation. This commit adds in the include directive for `llvm/IR/AttributeMask.h` to the handful of source files that need to see the definition. This reduces the total number of preprocessing tokens across the LLVM source files in lib from (roughly) 1,917,509,187 to 1,902,982,273 - a reduction of ~0.76%. This should result in a small improvement in compilation time. Differential Revision: https://reviews.llvm.org/D153728	2023-06-27 15:26:17 +01:00
Eduard Zingerman	06eee734c1	[clang] Allow 'nomerge' attribute for function pointers Allow specifying the 'nomerge' attribute for function pointers, e.g. as in the following C code: extern void (*foo)(void) __attribute__((nomerge)); void bar(long i) { if (i) foo(); else foo(); } With the goal to attach 'nomerge' to both calls done through 'foo': @foo = external local_unnamed_addr global ptr, align 8 define dso_local void @bar(i64 noundef %i) local_unnamed_addr #0 { ; ... %0 = load ptr, ptr @foo, align 8, !tbaa !5 ; ... if.then: tail call void %0() #1 br label %if.end if.else: tail call void %0() #1 br label %if.end if.end: ret void } ; ... attributes #1 = { nomerge ... } Report a warning if 'nomerge' is specified for a variable that is not a function pointer, e.g.: t.c:2:22: warning: 'nomerge' attribute is ignored because 'j' is not a function pointer [-Wignored-attributes] 2 | int j __attribute__((nomerge)); | ^ The intended use-case is for the BPF backend. BPF provides a sort of "standard library" of functions that are called helpers. BPF also verifies usage of these helpers before program execution. Because of limitations of the verification / runtime model it is important to keep calls to some of these helpers from merging. An example can be found at link [1], where the input C code: if (data_end - data > 1024) { bpf_for_each_map_elem(&map1, cb, &cb_data, 0); } else { bpf_for_each_map_elem(&map2, cb, &cb_data, 0); } Is converted to bytecode equivalent to: if (data_end - data > 1024) tmp = &map1; else tmp = &map2; bpf_for_each_map_elem(tmp, cb, &cb_data, 0); However, BPF verification/runtime requires using the same map address for each particular `bpf_for_each_map_elem()` call. The 'nomerge' attribute is a perfect match for this situation, but unfortunately BPF helpers are declared as pointers to functions: static long (*bpf_for_each_map_elem)(void *map, ...) = (void *) 164; Hence this commit, which allows 'nomerge' to be used for function pointers. [1] https://lore.kernel.org/bpf/03bdf90f-f374-1e67-69d6-76dd9c8318a4@meta.com/ Differential Revision: https://reviews.llvm.org/D152986	2023-06-27 01:15:45 +03:00
Amy Huang	8ed7aa59f4	Revert "Try to implement lambdas with inalloca parameters by forwarding without use of inallocas." Causes a clang crash (see crbug.com/1457256). This reverts commit 015049338d7e8e0e81f2ad2f94e5a43e2e3f5220.	2023-06-22 11:42:33 -07:00


1147 Commits