llvm-project

Author	SHA1	Message	Date
Matt Arsenault	51b4ada458	clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (#102462 )	2024-10-17 17:10:45 +04:00
Matt Arsenault	d50302f31c	clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (#111579 )	2024-10-09 08:52:32 +04:00
Matt Arsenault	e108853ac8	clang: Allow targets to set custom metadata on atomics (#96906 ) Use this to replace the emission of the amdgpu-unsafe-fp-atomics attribute in favor of per-instruction metadata. In the future new fine grained controls should be introduced that also cover the integer cases. Add a wrapper around CreateAtomicRMW that appends the metadata, and update a few use contexts to use it.	2024-07-26 09:57:28 +04:00
Mariya Podchishchaeva	6d973b4548	[clang][CodeGen] Return RValue from `EmitVAArg` (#94635 ) This should simplify handling of resulting value by the callers.	2024-06-17 13:29:20 +02:00
Jon Chesterfield	8516f54e6a	[AMDGPU] Implement variadic functions by IR lowering (#93362 ) This is a mostly-target-independent variadic function optimisation and lowering pass. It is only enabled for AMDGPU in this initial commit. The purpose is to make C style variadic functions a zero cost abstraction. They are lowered to equivalent IR which is then amenable to other optimisations. This is inherently slightly target specific but much less so than one might expect - the C varargs interface heavily constrains the ABI design divergence. The pass is primarily tested from webassembly. This is because wasm has a straightforward variadic lowering strategy which coincides exactly with what this pass transforms code into and a struct passing convention with few cases to check. Adding further targets conventions is straightforward and elided from this patch primarily to simplify the review. Implemented in other branches are Linux X86, AMD64, AArch64 and NVPTX. Testing for targets that have existing lowering for va_arg from clang is most efficiently done by checking that clang \| opt completely elides the variadic syntax from test cases. The lowering produces a struct for each call site which can be inspected to check the various alignment and indirections are correct. AMDGPU presently has no variadic support other than some ad hoc printf handling. Combined with the pass being inactive on all other targets landing this represents strict increase in capability with zero risk. Testing and refining will continue post commit. In addition to the compiler tests included here, a self contained x64 clang/musl toolchain was constructed using the "lowering" instead of the systemv ABI and used to build various C programs like lua and libxml2.	2024-06-06 10:44:53 +01:00
Jon Chesterfield	794457f6f9	[amdgpu] Pass variadic arguments without splitting (#94083 ) Pass variadic arguments without changing their type, unlike the fixed ones. Fixed arguments are modified to better fit into registers. This patch leaves those unchanged. Splitting struct types into individual fields and packing small structs into integers works well for passing via registers. Variadic arguments are currently unimplemented in the backend. They're likely to be implemented as a pointer to stack memory in which case register-themed optimisations are inapplicable. Splitting the struct into fields makes it difficult to implement va_arg robustly. The rules around padding and alignment to inverse the struct splitting could be constructed, but at high complexity and no particular advantage. Passing types as-is means there is a 1:1 correspondence with the type information va_arg has to work with and the parameter type at the call site. This is an ABI change, but as the only functions affected are variadic ones which are presently a compilation error, not a functional break. Factored out of the larger #93362 and can land independently.	2024-06-04 13:10:10 +01:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Joseph Huber	4e80bc7d71	[Clang] Introduce scoped variants of GNU atomic functions (#72280 ) Summary: The standard GNU atomic operations are a very common way to target hardware atomics on the device. With more heterogenous devices being introduced, the concept of memory scopes has been in the LLVM language for awhile via the `syncscope` modifier. For targets, such as the GPU, this can change code generation depending on whether or not we only need to be consistent with the memory ordering with the entire system, the single GPU device, or lower. Previously these scopes were only exported via the `opencl` and `hip` variants of these functions. However, this made it difficult to use outside of those languages and the semantics were different from the standard GNU versions. This patch introduces a `__scoped_atomic` variant for the common functions. There was some discussion over whether or not these should be overloads of the existing ones, or simply new variants. I leant towards new variants to be less disruptive. The scope here can be one of the following ``` __MEMORY_SCOPE_SYSTEM // All devices and systems __MEMORY_SCOPE_DEVICE // Just this device __MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block __MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp __MEMORY_SCOPE_SINGLE // A single thread. ``` Naming consistency was attempted, but it is difficult to capture to full spectrum with no many names. Suggestions appreciated.	2023-12-07 13:40:25 -06:00
Dominik Adamski	95943d2fab	[Flang] Add code-object-version option (#72638 ) Information about code object version can be configured by the user for AMD GPU target and it needs to be placed in LLVM IR generated by Flang. Information about code object version in MLIR generated by the parser can be reused by other tools. There is no need to specify extra flags if we want to invoke MLIR tools (like fir-opt) separately. Changes in comparison to a8ac93: * added information about required targets for test flang/test/Driver/driver-help.f90	2023-11-29 03:01:01 -06:00
Dominik Adamski	f00ffcdb58	Revert "[Flang] Add code-object-version option (#72638 )" This commit causes test errors on buildbots. This reverts commit a8ac930b99d93b2a539ada7e566993d148899144.	2023-11-28 13:18:46 -06:00
Dominik Adamski	a8ac930b99	[Flang] Add code-object-version option (#72638 ) Information about code object version can be configured by the user for AMD GPU target and it needs to be placed in LLVM IR generated by Flang. Information about code object version in MLIR generated by the parser can be reused by other tools. There is no need to specify extra flags if we want to invoke MLIR tools (like fir-opt) separately.	2023-11-28 19:57:36 +01:00
Saiyedul Islam	21861991e7	[OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (#71234 ) Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses already available global variable "oclc_ABI_version" instead of "llvm.amdgcn.abi.verion". It also adds some minor fields in ImplicitArg structure.	2023-11-09 10:34:35 +05:30
Johannes Doerfert	0ba57c8bba	[OpenMP] Pass min/max thread and team count to the OMPIRBuilder (#70247 ) We now provide the information about the min/max thread and team count from to the OMPIRBuilder, no matter what the source was. That means we unify `thread_limit`, `num_teams`, `num_threads` handling with the target specific attriutes (`__launch_bounds__` and `amdgpu_flat_work_group_size`). This is in preparation to pass the values to the runtime, and to allow the middle-end (OpenMP-opt) to tighten the values if it seems appropriate. There is no "real" change after this commit.	2023-10-26 14:45:07 -07:00
Joseph Huber	1d959f9327	[OpenMP] Prevent AMDGPU from overriding visibility on DT_nohost variables (#68264 ) Summary: There's some logic in the AMDGPU target that manually resets the requested visibility of certain variables. This was triggering when we set a constant variable in OpenMP. However, we shouldn't do this for OpenMP when the variable has the `nohost` type. That implies that the variable is not visible to the host and therefore does not need to be visible, so we should respect the original value of it.	2023-10-05 17:10:03 -05:00
Joseph Huber	1b7a095e27	[Clang][AMDGPU] Permit language address spaces for AMDGPU globals (#66205 ) Summary: Currently, there is an assertion that prevents us from emitting an AMDGPU global with a non-target specific address space (i.e. numerical attribute). I'm unsure what the original intentions of this assertion were, but we should be able to use OpenCL address spaces when compiling directly to AMDGPU from C++. This is permitted on NVPTX so I'm unsure what this assertion is guarding. The patch simply removes the assertion and adds a test to ensure that these emit the expected address spaces. Fixes https://github.com/llvm/llvm-project/issues/65069	2023-09-13 08:43:01 -05:00
Joseph Huber	49ff6a96a7	[Clang] Define AMDGPU ABI when referenced in CodeGen for ABI "none" (#66162 ) Summary: We use the `llvm.amgcn.abi.version` varaible to control code generation. This is emitted in every module now to indicate what should be used when compiling. Previously, the logic caused us to emit an external reference to this variable when creating the code for the `none` type. This would then cause us not to emit the actual definition. This patch refines the logic to create the external reference, and then update it if it is found unset by the time we emit the global. I had to remove the reference to `GetOrCreateLLVmGlobal` because it did not accept the proper address space.	2023-09-13 08:31:31 -05:00
Saiyedul Islam	f616c3eeb4	[OpenMP][DeviceRTL][AMDGPU] Support code object version 5 Update DeviceRTL and the AMDGPU plugin to support code object version 5. Default is code object version 4. CodeGen for __builtin_amdgpu_workgroup_size generates code for cov4 as well as cov5 if -mcode-object-version=none is specified. DeviceRTL compilation passes this argument via Xclang option to generate abi-agnostic code. Generated code for the above builtin uses a clang control constant "llvm.amdgcn.abi.version" to branch on the abi version, which is available during linking of user's OpenMP code. Load of this constant gets eliminated during linking. AMDGPU plugin queries the ELF for code object version and then prepares various implicitargs accordingly. Differential Revision: https://reviews.llvm.org/D139730 Reviewed By: jhuber6, yaxunl	2023-08-29 06:35:44 -05:00
David Blaikie	19f2b68095	Make globals with mutable members non-constant, even in custom sections Turned out we were making overly simple assumptions about which sections (& section flags) would be used when emitting a global into a custom section. This lead to sections with read-only flags being used for globals of struct types with mutable members. Fixed by porting the codegen function with the more nuanced handling/checking for mutable members out of codegen for use in the sema code that does this initial checking/mapping to section flags. Differential Revision: https://reviews.llvm.org/D156726	2023-08-14 22:25:42 +00:00
Changpeng Fang	d77c62053c	[clang][AMDGPU]: Don't use byval for struct arguments in function ABI Summary: Byval requires allocating additional stack space, and always requires an implicit copy to be inserted in codegen, where it can be difficult to optimize. In this work, we use byref/IndirectAliased promotion method instead of byval with the implicit copy semantics. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D155986	2023-08-11 16:37:42 -07:00
Yaxun (Sam) Liu	ac72531043	[Driver] Add `-f[no-]offload-uniform-block` By default, clang assumes HIP kernels are launched with uniform block size, which is the case for kernels launched through triple chevron or hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute to HIP kernels to allow the backend to do optimizations on that. However, in some rare cases, HIP kernels can be launched through hipExtModuleLaunchKernel where global work size is specified, which may result in non-uniform block size. To be able to support non-uniform block size for HIP kernels, an option `-f[no-]offload-uniform-block is added. This option is generic for offloading languages. Its default value is on for CUDA/HIP and off otherwise. Make -cl-uniform-work-group-size an alias to -foffload-uniform-block. Reviewed by: Siu Chi Chan, Matt Arsenault, Fangrui Song, Johannes Doerfert Differential Revision: https://reviews.llvm.org/D155213 Fixes: SWDEV-406592	2023-07-27 16:36:02 -04:00
Johannes Doerfert	08a220764b	Reapply "[OpenMP] Add the `ompx_attribute` clause for target directives" This reverts commit 0d12683046ca75fb08e285f4622f2af5c82609dc and reapplies ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2 with an extension to fix the Flang build. Differential Revision: https://reviews.llvm.org/D156184	2023-07-25 10:40:35 -07:00
Aaron Ballman	0d12683046	Revert "[OpenMP] Add the `ompx_attribute` clause for target directives" This reverts commit ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2. The changes broke several bots: https://lab.llvm.org/buildbot/#/builders/176/builds/3408 https://lab.llvm.org/buildbot/#/builders/198/builds/4028 https://lab.llvm.org/buildbot/#/builders/197/builds/8491 https://lab.llvm.org/buildbot/#/builders/197/builds/8491	2023-07-25 07:57:36 -04:00
Johannes Doerfert	ef9ec4bbcc	[OpenMP] Add the `ompx_attribute` clause for target directives CUDA and HIP have kernel attributes to tune the code generation (in the backend). To reuse this functionality for OpenMP target regions we introduce the `ompx_attribute` clause that takes these kernel attributes and emits code as if they had been attached to the kernel fuction (which is implicitly generated). To limit the impact, we only support three kernel attributes: `amdgpu_waves_per_eu`, for AMDGPU `amdgpu_flat_work_group_size`, for AMDGPU `launch_bounds`, for NVPTX The existing implementations of those attributes are used for error checking and code generation. `ompx_attribute` can be attached to any executable target region and it can hold more than one kernel attribute. Differential Revision: https://reviews.llvm.org/D156184	2023-07-24 22:04:45 -07:00
Sergei Barannikov	992cb98462	[clang][CodeGen] Break up TargetInfo.cpp [8/8] This commit breaks up CodeGen/TargetInfo.cpp into a set of .cpp files, one file per target. There are no functional changes, mostly just code moving. Non-code-moving changes are: A virtual destructor has been added to DefaultABIInfo to pin the vtable to a cpp file. * A few methods of ABIInfo and DefaultABIInfo were split into declaration + definition in order to reduce the number of transitive includes. * Several functions that used to be static have been placed in clang::CodeGen namespace so that they can be accessed from other cpp files. RFC: https://discourse.llvm.org/t/rfc-splitting-clangs-targetinfo-cpp/69883 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D148094	2023-06-17 07:14:50 +03:00

24 Commits