llvm-project

Author	SHA1	Message	Date
Alexander Richardson	07e2ba445d	[AMDGPU] Set AS8 address width to 48 bits Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logic conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419	2025-05-19 17:26:05 -07:00
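The mask extension mentioned above can be sketched in C++. This is a minimal model, not SelectionDAG code. It assumes the LangRef rule that `llvm.ptrmask` takes a mask as wide as the pointer's index width and leaves bits above that width untouched. It represents the 128-bit address space 8 value as two 64-bit halves. `Resource128`, `extendMask` and `ptrmask` are hypothetical names.

```cpp
#include <cstdint>

// Illustrative model only (not SelectionDAG code): a 128-bit address space 8
// value split into two 64-bit halves. Per the commit, only the low 48 bits
// are address bits.
struct Resource128 {
  std::uint64_t lo;  // bits 0..63
  std::uint64_t hi;  // bits 64..127
};

constexpr unsigned kIndexWidth = 48;
constexpr std::uint64_t kLowBits = (std::uint64_t{1} << kIndexWidth) - 1;

// Widen an i48 mask to the full pointer width. Bits above the index width are
// set to one so that the AND below leaves them untouched.
constexpr Resource128 extendMask(std::uint64_t mask48) {
  return Resource128{(mask48 & kLowBits) | ~kLowBits, ~std::uint64_t{0}};
}

// Model of llvm.ptrmask applied with the widened mask: a plain bitwise AND.
constexpr Resource128 ptrmask(Resource128 p, std::uint64_t mask48) {
  const Resource128 m = extendMask(mask48);
  return Resource128{p.lo & m.lo, p.hi & m.hi};
}

// Example: align the 48-bit address to 16 bytes; the upper 80 bits survive.
static_assert(ptrmask(Resource128{0xFFFF123456789ABFull, 0xDEADBEEF00000001ull},
                      0xFFFFFFFFFFF0ull).lo == 0xFFFF123456789AB0ull,
              "low 48 bits masked, bits 48..63 preserved");
```

The key design point is filling the upper bits with ones rather than zeros. A zero-extended i48 mask would wrongly clear the 80 non-address bits of the descriptor.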
Kazu Hirata	9adcb4fe12	[clang] Use llvm::replace (NFC) (#140264 )	2025-05-16 09:06:31 -07:00
Steven Perron	c073c22865	[HLSL] Use hlsl_device address space for getpointer. (#127675 ) We add the hlsl_device address space to represent the device memory space as defined in section 1.7.1.3 of the [HLSL spec](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf). Fixes https://github.com/llvm/llvm-project/issues/127075	2025-04-22 13:26:32 -04:00
Nathan Gauër	a625bc60e2	[HLSL][SPIR-V] Add hlsl_private address space for SPIR-V (#133464 ) This is an alternative to https://github.com/llvm/llvm-project/pull/122103 In SPIR-V, private global variables have the Private storage class. This PR adds a new address space which allows frontend to emit variable with this storage class when targeting this backend. This is covered in this proposal: llvm/wg-hlsl@4c9e11a This PR will cause addrspacecast to show up in several cases, like class member functions or assignment. Those will have to be handled in the backend later on, particularly to fixup pointer storage classes in some functions. Before this change, global variable were emitted with the 'Function' storage class, which was wrong.	2025-04-10 10:55:10 +02:00
Juan Manuel Martinez Caamaño	beae0e9f1a	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055 ) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.	2025-04-02 20:00:09 +02:00
Yaxun (Sam) Liu	240f2269ff	Add clang atomic control options and attribute (#114841 ) Add option and statement attribute for controlling emitting of target-specific metadata to atomicrmw instructions in IR. The RFC for this attribute and option is https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641, Originally a pragma was proposed, then it was changed to clang attribute. This attribute allows users to specify one, two, or all three options and must be applied to a compound statement. The attribute can also be nested, with inner attributes overriding the options specified by outer attributes or the target's default options. These options will then determine the target-specific metadata added to atomic instructions in the IR. In addition to the attribute, three new compiler options are introduced: `-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, `-f[no-]atomic-ignore-denormal-mode`. These compiler options allow users to override the default options through the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to `-f[no-]ignore-denormal-mode`. In terms of implementation, the atomic attribute is represented in the AST by the existing AttributedStmt, with minimal changes to AST and Sema. During code generation in Clang, the CodeGenModule maintains the current atomic options, which are used to emit the relevant metadata for atomic instructions. RAII is used to manage the saving and restoring of atomic options when entering and exiting nested AttributedStmt.	2025-02-27 10:41:04 -05:00
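The RAII save/restore scheme described above can be sketched as follows. All names (`AtomicOptions`, `AtomicOverrides`, `CodeGenState`, `AtomicOptionsScope`) are hypothetical stand-ins, not Clang's actual classes. The sketch only shows two behaviors: an inner attribute overrides just the options it names, and the enclosing options return when the statement ends.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the three atomic options controlled above.
struct AtomicOptions {
  bool remoteMemory = false;
  bool fineGrainedMemory = false;
  bool ignoreDenormalMode = false;
};

// Hypothetical overrides from one statement attribute; unset means "inherit".
struct AtomicOverrides {
  std::optional<bool> remoteMemory;
  std::optional<bool> fineGrainedMemory;
  std::optional<bool> ignoreDenormalMode;
};

// Hypothetical code-generator state: the options currently in effect.
struct CodeGenState {
  AtomicOptions current;  // would start from target defaults and -f[no-]atomic-* flags
};

// RAII guard: apply overrides on entry, restore the enclosing options on exit.
class AtomicOptionsScope {
 public:
  AtomicOptionsScope(CodeGenState &cg, const AtomicOverrides &o)
      : cg_(cg), saved_(cg.current) {
    if (o.remoteMemory.has_value()) cg.current.remoteMemory = *o.remoteMemory;
    if (o.fineGrainedMemory.has_value()) cg.current.fineGrainedMemory = *o.fineGrainedMemory;
    if (o.ignoreDenormalMode.has_value()) cg.current.ignoreDenormalMode = *o.ignoreDenormalMode;
  }
  ~AtomicOptionsScope() { cg_.current = saved_; }
  AtomicOptionsScope(const AtomicOptionsScope &) = delete;
  AtomicOptionsScope &operator=(const AtomicOptionsScope &) = delete;

 private:
  CodeGenState &cg_;
  AtomicOptions saved_;
};

// Usage: nested scopes, with the inner one overriding only what it names.
inline void demoNestedScopes() {
  CodeGenState cg;
  {
    AtomicOptionsScope outer(cg, AtomicOverrides{true, std::nullopt, std::nullopt});
    assert(cg.current.remoteMemory);
    {
      AtomicOptionsScope inner(cg, AtomicOverrides{std::nullopt, true, std::nullopt});
      assert(cg.current.remoteMemory && cg.current.fineGrainedMemory);
    }
    assert(cg.current.remoteMemory && !cg.current.fineGrainedMemory);
  }
  assert(!cg.current.remoteMemory);
}
```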
Chandler Carruth	cd269fee05	[StrTable] Switch Clang builtins to use string tables This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes two improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor two. The second point is important both to allow different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc files), it also simply reduces the sizes of these tables which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.	2025-02-04 18:04:57 +00:00
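The prefix factoring in point 3 can be sketched as follows. The sketch assumes a shard stores its shared prefix once and keeps only each entry's suffix in a NUL-separated table. `BuiltinShard`, `suffixAt` and `hasName` are hypothetical names, not the `StringTable` API the commit uses. The two X86-style entries are illustrative.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical shard layout: the prefix shared by every builtin in the shard
// is stored once; the table holds only NUL-terminated suffixes.
struct BuiltinShard {
  std::string_view prefix;
  std::string_view strings;  // "suffix0\0suffix1\0..."
  const unsigned *offsets;   // start of each suffix within `strings`
  std::size_t count;
};

// Illustrative X86-style entries: "__builtin_ia32_" is stored once.
constexpr char kX86Strings[] = "rdtsc\0pause\0";
constexpr unsigned kX86Offsets[] = {0, 6};
constexpr BuiltinShard kX86Shard{
    "__builtin_ia32_",
    std::string_view(kX86Strings, sizeof(kX86Strings) - 1), kX86Offsets, 2};

// Suffix of entry `i`, read up to its NUL terminator.
constexpr std::string_view suffixAt(const BuiltinShard &s, std::size_t i) {
  std::string_view rest = s.strings.substr(s.offsets[i]);
  return rest.substr(0, rest.find('\0'));
}

// Match a full builtin name as prefix + suffix without building a new string.
constexpr bool hasName(const BuiltinShard &s, std::size_t i, std::string_view full) {
  const std::string_view suffix = suffixAt(s, i);
  return full.size() == s.prefix.size() + suffix.size() &&
         full.substr(0, s.prefix.size()) == s.prefix &&
         full.substr(s.prefix.size()) == suffix;
}
```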
Helena Kotas	d92bac8a3e	[HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (#123411 ) Introduces a new address space `hlsl_constant(2)` for constant buffer declarations. This address space is applied to declarations inside `cbuffer` block. Later on, it will also be applied to `ConstantBuffer<T>` syntax and the default `$Globals` constant buffer. Clang codegen translates constant buffer declarations to global variables and loads from `hlsl_constant(2)` address space. More work coming soon will include addition of metadata that will map these globals to individual constant buffers and enable their transformation to appropriate constant buffer load intrinsics later on in an LLVM pass. Fixes #123406	2025-01-24 16:48:35 -08:00
Chandler Carruth	ca79ff07d8	Revert "Switch builtin strings to use string tables" (#119638 ) Reverts llvm/llvm-project#118734 There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.	2024-12-13 23:58:48 -08:00
Chandler Carruth	be2df95e92	Switch builtin strings to use string tables (#118734 ) The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations. The largest pattern I identified were the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme. This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one:
```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki   +1.4%  +653Ki    .rodata
  +0.0%    +960   +0.0%    +960    .text
  +0.0%    +197   +0.0%    +197    .dynstr
  +0.0%    +184   +0.0%    +184    .eh_frame
  +0.0%     +96   +0.0%     +96    .dynsym
  +0.0%     +40   +0.0%     +40    .eh_frame_hdr
  +114%     +32   [ = ]       0    [Unmapped]
  +0.0%     +20   +0.0%     +20    .gnu.hash
  +0.0%      +8   +0.0%      +8    .gnu.version
  +0.9%      +7   +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0  -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki  -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi  -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi   -1.6% -2.66Mi    TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those reloctaions are stored. 
This is also visible in my benchmarking of binary start-up overhead at least:
```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):     17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):   14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):     15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):   12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler. ---- This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion. I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution. 
I was also able to factor the macros into set of consistent patterns that avoids a significant regression in overall boilerplate.	2024-12-08 19:00:14 -08:00
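The relocation argument above can be illustrated with a small C++17 sketch. It does not reproduce Clang's actual tables: `kNameTable`, `computeOffsets` and `nameAt` are hypothetical, and the names are placeholders. In a PIE binary, an array of `const char *` needs one load-time relocation per element. A single character array indexed by integer offsets needs none.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Before (sketch): an array of pointers to separate literals. In a PIE binary
// each element needs a dynamic relocation when the image is loaded:
//   constexpr const char *kNames[] = {"abs", "clz", "popcount"};

// After (sketch): one NUL-separated character table plus integer offsets.
// The offsets are plain integers, so they need no relocations.
constexpr char kNameTable[] = "abs\0clz\0popcount\0";
constexpr std::size_t kNumNames = 3;

// Compute each entry's starting offset at compile time by scanning for NULs.
constexpr std::array<unsigned, kNumNames> computeOffsets(std::string_view table) {
  std::array<unsigned, kNumNames> offsets{};
  std::size_t start = 0;
  for (std::size_t i = 0; i < kNumNames; ++i) {
    offsets[i] = static_cast<unsigned>(start);
    start = table.find('\0', start) + 1;
  }
  return offsets;
}

constexpr std::array<unsigned, kNumNames> kNameOffsets =
    computeOffsets(std::string_view(kNameTable, sizeof(kNameTable) - 1));

// Look up entry `i`: the string_view constructor stops at the NUL terminator.
constexpr std::string_view nameAt(std::size_t i) {
  return std::string_view(kNameTable + kNameOffsets[i]);
}
```

Computing the offsets in `constexpr` code keeps the table and its offsets consistent at compile time. This is in the spirit of the commit's `constexpr`-plus-macros approach, though the real implementation is organized differently.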
Nathan Gauër	f8b4182f07	Revert "[SPIR-V] Fixup storage class for global private (#116636 )" (#118312 ) This reverts commit aa7fe1c10e5d6d0d3aacdb345fed995de413e142.	2024-12-02 17:32:54 +01:00
Nathan Gauër	aa7fe1c10e	[SPIR-V] Fixup storage class for global private (#116636 ) Adds a new address spaces: `hlsl_private`. Variables with such address space will be emitted with a `Private` storage class. This is useful for variables global to a SPIR-V module, since up to now, they were still emitted with a `Function` storage class, which is wrong. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2024-12-02 16:17:44 +01:00
Fabian Ritter	d893c5ad35	[Clang][HIP] Reapply: Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#115507 ) So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, to replace them with a more resilient interface to access wavefront size information where it is available. Reapplies #112849 with a fix for the non-hermetic clang test that failed on Mac after the revert in #115499. For SWDEV-491529.	2024-11-11 08:55:46 +01:00
Fabian Ritter	e734de1f5a	Revert "[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros" (#115499 ) Reverts llvm/llvm-project#112849 due to test failure on Mac, reported by @nico	2024-11-08 16:25:18 +01:00
Fabian Ritter	e5c6d1f4e6	[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#112849 ) So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, to replace them with a more resilient interface to access wavefront size information where it is available. For SWDEV-491529.	2024-11-08 14:24:34 +01:00
Alex Voicu	6e0b0038cd	[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442 ) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0.	2024-10-22 12:05:48 +01:00
Stanislav Mekhanoshin	f363e30f15	[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633 )	2024-07-09 14:25:58 -07:00
luolent	a98a6e95be	Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391 ) Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868) Parenthesis are added as requested on ternary operators with non trivial conditions. I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change.	2024-05-04 18:38:45 +01:00
Pierre van Houtryve	43c7eb5d7b	[AMDGPU] Replace '.' with '-' in generic target names (#81718 ) The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.	2024-02-14 15:19:04 +01:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Joseph Huber	6fecfbc7b6	[AMDGPU] Correctly exclude the HIP host from arch macros Summary: This logic was wrong and accidentally appling to OpenCL.	2024-01-30 13:45:01 -06:00
Joseph Huber	f2a78e68ee	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035 ) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.	2024-01-30 13:05:29 -06:00
Joseph Huber	72d4fc1b4d	Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660 )" This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81. This breaks HIP code that incorrectly depended on GPU-specific macros to be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the host is absolutely meaningless, but it seems this entire corner of the toolchain is fundmentally broken. Reverting for now to avoid breakages.	2024-01-29 11:11:25 -06:00
Joseph Huber	c9a6e993f7	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660 ) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.	2024-01-29 08:46:14 -06:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
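The pointer layout and indexed addressing described above can be modeled as follows. This is a simplified sketch: `StridedBufferPtr` and `effectiveOffset` are hypothetical names. The stride is passed in explicitly rather than read from the descriptor. Swizzling and bounds checking are ignored.

```cpp
#include <cstdint>

// Hypothetical model of an address space 9 strided buffer pointer as described
// above: 128-bit descriptor + 32-bit offset + 32-bit index = 192 bits.
struct StridedBufferPtr {
  std::uint32_t descriptor[4];  // 128-bit buffer descriptor, opaque here
  std::uint32_t offset;         // byte offset inside the selected element
  std::uint32_t index;          // element index, scaled by the stride
};

static_assert(sizeof(StridedBufferPtr) * 8 == 192, "192 bits of payload");

// Simplified byte offset from the buffer base, ignoring swizzling and bounds
// checks: element `index` starts at index * stride, then add `offset`. The
// stride is passed in here rather than decoded from the descriptor.
constexpr std::uint64_t effectiveOffset(const StridedBufferPtr &p, std::uint32_t stride) {
  return std::uint64_t{p.index} * stride + p.offset;
}
```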
Kazu Hirata	f3dcc2351c	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 08:54:13 -08:00
Dominik Adamski	276a024b49	[NFC][AMDGPU] Unify AMDGPU address space enum (#73944 ) Types of AMDGPU address space were defined not only in Clang-specific class but also in LLVM header. If we unify the AMD GPU address space enumeration, then we can reuse it in Clang, Flang and LLVM.	2023-12-11 10:45:21 +01:00
Yaxun (Sam) Liu	b8a9c50f22	[AMDGPU] Add target feature gws to clang Reviewed by: Matt Arsenault Differential Revision: https://reviews.llvm.org/D158367	2023-08-25 11:50:47 -04:00
Yaxun (Sam) Liu	7f12dcac79	[HIP] Fix regression about `__fp16` args and return value HIP allows __fp16 as function arguments and return value by passing -fallow-half-arguments-and-returns to clang through hipcc. https://reviews.llvm.org/D133885 removed -fallow-half-arguments-and-returns and add a TargetInfo member to control it. This caused regressions in some HIP apps (https://github.com/ROCm-Developer-Tools/HIP/issues/3178). Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D145345 Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3178	2023-08-01 11:29:19 -04:00
Yaxun (Sam) Liu	ad96f25b93	[AMDGPU] Rename predefined macro __AMDGCN_WAVEFRONT_SIZE rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency. __AMDGCN_WAVEFRONT_SIZE will be deprecated in the future. Reviewed by: Matt Arsenault, Johannes Doerfert Differential Revision: https://reviews.llvm.org/D154207	2023-07-02 11:10:06 -04:00
Yaxun (Sam) Liu	c0f0d50653	[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT` HIP texture/image support is optional as some devices do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT is defined for device not supporting images (`d0448aa4c4/docs/reference/kernel_language.md (L426)` ) Currently the macro is defined by HIP header based on predefined macros for GPU, e.g __gfx*__ , which is error prone. This patch let clang emit the predefined macro. Reviewed by: Matt Arsenault, Artem Belevich Differential Revision: https://reviews.llvm.org/D151349	2023-06-14 22:53:41 -04:00
Yaxun (Sam) Liu	6adb9a0602	[AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__` Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes. If WGP mode is not supported, ignore -mno-cumode and emit a warning. This is needed for implementing device functions like __smid (`312dff7b79/include/hip/amd_detail/amd_device_functions.h (L957)`) Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner Differential Revision: https://reviews.llvm.org/D145343	2023-05-12 18:50:52 -04:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers"" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
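Below is a minimal model of the two address spaces described above, assuming only what the commit states. An address space 8 resource is an opaque 128-bit value. An address space 7 pointer pairs that resource with a 32-bit offset. `BufferResource`, `BufferFatPtr` and `gep` are hypothetical names. The sketch shows why pointer arithmetic on address space 7 only ever touches the offset.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of an address space 8 buffer resource: an opaque 128-bit
// value. It deliberately offers no arithmetic, mirroring the rule that these
// pointers must not be used in address computations.
struct BufferResource {
  std::array<std::uint32_t, 4> words;
};

// Hypothetical model of an address space 7 buffer fat pointer: the resource
// plus a 32-bit offset, 160 bits of payload in total.
struct BufferFatPtr {
  BufferResource rsrc;
  std::uint32_t offset;
};

// GEP-like arithmetic on a fat pointer only moves the 32-bit offset (wrapping
// modulo 2^32 here); the resource part is carried through unchanged.
constexpr BufferFatPtr gep(BufferFatPtr p, std::int32_t byteDelta) {
  p.offset += static_cast<std::uint32_t>(byteDelta);
  return p;
}
```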
Dominik Adamski	e43247dd32	[Clang][Flang][AMDGPU] Add support for AMDGPU to Flang driver Scope of changes: 1) Extract common code between Clang and Flang for parsing AMDGPU features 2) Add function which adds implicit target features for AMDGPU as Clang does 3) Add AMDGPU target as one of valid targets for Flang Differential Revision: https://reviews.llvm.org/D145579 Reviewed By: yaxunl, awarzynski	2023-03-29 02:23:37 -05:00
Anshil Gandhi	a955a31896	[AMDGPU] Replace target feature for global fadd32 Change target feature of __builtin_amdgcn_global_atomic_fadd_f32 to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a, gfx940 and gfx1100 as they all support the return variant of `global_atomic_add_f32`. Fixes https://github.com/llvm/llvm-project/issues/61331. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D146840	2023-03-28 15:58:30 -06:00
Mariusz Sikora	ea064ee2a3	[AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions Introducing Subtarget Features for instructions: - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 Differential Revision: https://reviews.llvm.org/D146701	2023-03-24 13:10:40 +01:00
Stanislav Mekhanoshin	df0488369d	[AMDGPU] Split dot7 feature Differential Revision: https://reviews.llvm.org/D142507	2023-01-26 10:34:36 -08:00
Stanislav Mekhanoshin	870b92977e	[AMDGPU] Split dot8 feature Differential Revision: https://reviews.llvm.org/D142407	2023-01-24 11:16:07 -08:00
Stanislav Mekhanoshin	4ab2246d48	[AMDGPU] Remove dot1 and dot6 features from clang for gfx11 These are unsupported. Differential Revision: https://reviews.llvm.org/D142493	2023-01-24 10:52:42 -08:00
serge-sans-paille	5a7f47cc02	[clang] Optimize clang::Builtin::Info density Reorganize clang::Builtin::Info to have them naturally align on 4 bytes boundaries. Instead of storing builtin headers as a straight char pointer, enumerate them and store the enum. It allows to use a small enum instead of a pointer to reference them. On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56 down to 48 bytes. On a release build on my Linux 64 bit machine, it shrinks the size of libclang-cpp.so by 193kB. The impact on performance is negligible in terms of instruction count, but the wall time seems better, see https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock Differential Revision: https://reviews.llvm.org/D142024	2023-01-23 14:27:44 +01:00
serge-sans-paille	a3c248db87	Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141139	2023-01-09 12:15:24 +01:00
Matt Arsenault	81849497b4	clang/AMDGPU: Remove flat-address-space from feature map This was only used for checking if is_shared/is_private were legal, which we're not bothering to do anymore. This is apparently visible to more than the target attribute (which seems to silently ignore unrecognized features), so this has the potential to break something (i.e. see the OpenMP test change)	2023-01-05 16:35:04 -05:00
Matt Arsenault	f4bcd7f598	AMDGPU/clang: Add builtins for llvm.amdgcn.ballot Use explicit _w32/_w64 suffixes for the wave size to be consistent with the existing other wave dependent intrinsics. Also start diagnosing trying to use both wave32 and wave64. I would have preferred to avoid the +wavefrontsize64 spam on targets where that's the only option, but avoiding this seems to be more work than I expected.	2022-12-29 17:58:55 -05:00
serge-sans-paille	d9ab3e82f3	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470. The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881	2022-12-27 09:55:19 +01:00
Pierre van Houtryve	678d8946ba	[AMDGPU] Add bf16 storage support - [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn - Add Sema & CodeGen tests cases. - Also add cases that D138651 would have covered as this patch replaces it. - [AMDGPU] Add BF16 storage-only support - Support legalization/dealing with bf16 operations in DAGIsel. - bf16 as a type remains illegal and is represented as i16 for storage purposes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139398	2022-12-13 10:34:26 -05:00
Alex Richardson	a602f76a24	[clang][TargetInfo] Use LangAS for getPointer{Width,Align}() Mixing LLVM and Clang address spaces can result in subtle bugs, and there is no need for this hook to use the LLVM IR level address spaces. Most of this change is just replacing zero with LangAS::Default, but it also allows us to remove a few calls to getTargetAddressSpace(). This also removes a stale comment+workaround in CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does return the expected size for ReferenceType (and handles address spaces). Differential Revision: https://reviews.llvm.org/D138295	2022-11-30 20:24:01 +00:00
Xiang Li	7e04c0ad63	[HLSL] Add groupshare address space. Added keyword, LangAS and TypeAttrbute for groupshared. Tanslate it to LangAS with asHLSLLangAS. Make sure it translated into address space 3 for DirectX target. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D135060	2022-10-20 09:29:09 -07:00
Kazu Hirata	981cbfb592	[clang] Don't include StringSwitch.h (NFC) These files don't seem to use StringSwitch.	2022-09-18 22:21:32 -07:00


143 Commits