llvm-project

Author	SHA1	Message	Date
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
jofrn	15d36aa4ce	[clang][CodeGen] Preserve addrspace of enqueue_kernel builtin. (#148062 ) __enqueue_kernel_varargs' last parameter is in addrspace(5), but CodeGen currently misses this qualifier. This commit fixes the code to preserve the qualifier by referencing Alloca, which has its casts removed, rather than TmpPtr.	2025-07-11 17:00:28 -04:00
Matt Arsenault	a11d86461e	clang: Fix broken implicit cast to generic address space (#138863 ) This fixes emitting undefined behavior where a 64-bit generic pointer is written to a 32-bit slot allocated for a private pointer. This can be seen in test/CodeGenOpenCL/amdgcn-automatic-variable.cl's wrong_pointer_alloca.	2025-05-08 07:51:57 +02:00
Aniket Lal	c3ce5684a8	[Clang][OpenCL][AMDGPU] OpenCL Kernel stubs should be assigned alwaysinline attribute (#137769 ) OpenCL Kernels body is emitted as stubs and the kernel is emitted as call to respective stub. (https://github.com/llvm/llvm-project/pull/115821). The stub function should be alwaysinlined, since call to stub can cause performance drop. Co-authored-by: anikelal <anikelal@amd.com>	2025-05-07 15:42:23 +05:30
Aniket Lal	642481a428	[Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel (#115821 ) This feature is currently not supported in the compiler. To facilitate this we emit a stub version of each kernel function body with different name mangling scheme, and replaces the respective kernel call-sites appropriately. Fixes https://github.com/llvm/llvm-project/issues/60313 D120566 was an earlier attempt made to upstream a solution for this issue. --------- Co-authored-by: anikelal <anikelal@amd.com>	2025-04-08 10:29:30 +05:30
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this lead to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
Shilei Tian	f1ac2afe21	Reapply "[AMDGPU] Use COV6 by default (#118515 )" (#130963 ) This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.	2025-03-21 15:26:45 -04:00
Matt Arsenault	0d2c55cb96	AMDGPU: Move enqueued block handling into clang (#128519 ) The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct where the first field is a pointer to the kernel descriptor, not the kernel itself. We were also relying on passing around a reference to a global using a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission. This now avoids using a function attribute on kernels and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, associate the runtime handle reference to the kernel with the !associated global metadata. We can then get a final, correctly mangled name at the end. I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so leave avoiding this for a later step. Use a special section name to enable this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support. We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special case handling that requires the device_enqueue_symbol metadata field. https://reviews.llvm.org/D141700	2025-03-10 19:54:04 +07:00
Matt Arsenault	bfea84946d	clang: Hack around opencl enqueue_block using wrong ABI for aggregrate (#130011 ) EmitAggExprToLValue started wrapping the temporary alloca in an addrspacecast at some point. We take the direct type from this as the pointer argument for the runtime function type, but this isn't correct. Technically, we should be querying the target's ABI for what IR to produce for this sequence. The assumption seems to always have been that this will be indirectly passed with byval (or byref). I started working on a patch to go through the ABI handling, but it seems to require more time and/or clang expertise than I have at the moment.	2025-03-06 23:13:28 +07:00
Florian Hahn	77d3f8a925	[TBAA] Don't emit pointer-tbaa for void pointers. (#122116 ) While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers: int count_elements(void * values) { void **seq = values; int count; for (count = 0; seq && seq[count]; count++); return count; } https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from https://github.com/llvm/llvm-project/issues/119099 This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now. Fixes https://github.com/llvm/llvm-project/issues/119099. Fixes https://github.com/llvm/llvm-project/issues/122537. PR: https://github.com/llvm/llvm-project/pull/122116	2025-01-31 11:38:14 +00:00
Florian Hahn	7954a0514b	[Clang] Enable -fpointer-tbaa by default. (#117244 ) Support for more precise TBAA metadata has been added a while ago (behind the -fpointer-tbaa flag). The more precise TBAA metadata allows treating accesses of different pointer types as no-alias. This helps to remove more redundant loads and stores in a number of workloads. Some highlights on the impact across llvm-test-suite's MultiSource, SPEC2006 & SPEC2017 include: * +2% more NoAlias results for memory accesses * +3% more stores removed by DSE, * +4% more loops vectorized. This closes a relatively big gap to GCC, which has been supporting disambiguating based on pointer types for a long time. (https://clang.godbolt.org/z/K7Wbhrz4q) Pointer-TBAA support for pointers to builtin types has been added in https://github.com/llvm/llvm-project/pull/76612. Support for user-defined types has been added in https://github.com/llvm/llvm-project/pull/110569. There are 2 recent PRs with bug fixes for special cases uncovered during testing: * https://github.com/llvm/llvm-project/pull/116991 * https://github.com/llvm/llvm-project/pull/116596 PR: https://github.com/llvm/llvm-project/pull/117244	2024-12-04 20:55:18 +00:00
Shilei Tian	68bcba6d7a	Revert "[AMDGPU] Use COV6 by default (#118515 )" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.	2024-12-03 20:17:06 -05:00
Shilei Tian	410cbe3cf2	[AMDGPU] Use COV6 by default (#118515 )	2024-12-03 19:38:35 -05:00
Shilei Tian	4b50ec43d0	[Clang] Avoid Using `byval` for `ndrange_t` when emitting `__enqueue_kernel_basic` (#116435 ) AMDGPU disabled the use of `byval` for struct argument passing in commit d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still adds the `byval` attribute by default. Emitting the `byval` attribute by default in this context doesn’t seem like a good idea, as argument-passing conventions are highly target-dependent, and assumptions here could lead to issues. This PR removes the addition of the `byval` attribute, aligning the behavior with other `__enqueue_kernel_*` functions.	2024-11-15 16:54:29 -05:00
Alex Voicu	6e0b0038cd	[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442 ) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0.	2024-10-22 12:05:48 +01:00
Hari Limaye	94473f4db6	[IRBuilder] Generate nuw GEPs for struct member accesses (#99538 ) Generate nuw GEPs for struct member accesses, as inbounds + non-negative implies nuw. Regression tests are updated using update scripts where possible, and by find + replace where not.	2024-08-09 13:25:04 +01:00
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
Saiyedul Islam	082f87c9d4	[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Corresponding llvm-objdump AMDGPU lit tests are updated in a follow-up PR.	2024-01-23 17:08:18 +05:30
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Matt Arsenault	647925648a	clang/OpenCL: Apply default attributes to enqueued blocks This was missing important environment context, like denormal-fp-math and target-features. Curiously this seems to be losing nounwind. Note this only fixes the actual invoke kernel. The invoke function is already setting the default attribute set for internal functions. However that is still buggy since it's not applying any use function attributes (it's also missing uniform-work-group-size). There seem to be too many different functions for setting attributes with inconsistent behavior. The Function overload of addDefaultFunctionAttributes seems to miss the target-cpu and target-features. The AttrBuilder one seems to miss optnone (but that seems to be disallowed on blocks anyway). Neither one calls setTargetAttributes, when it probably should. uniform-work-group-size is also set through AMDGPU code when it should be emitting generically as a language property. I also noticed update_cc_test_checks for attributes seem to not connect the captured attribute variables to the attributes at the end (although I think the numbers happen to work out correctly).	2023-01-30 15:03:15 -04:00
Matt Arsenault	d12ee4bf7c	clang/OpenCL: Extend tests for enqueued block attributes Baseline tests showing that enqueued blocks are not getting the correct attributes applied.	2023-01-30 15:03:15 -04:00
Matt Arsenault	00f6a7f02f	clang/OpenCL: Fix not setting convergent on block invoke kernels Yet another example how convergent not being the default is dangerous and backwards.	2023-01-30 15:03:14 -04:00
Sven van Haastregt	1495210914	[OpenCL] Always add nounwind attribute for OpenCL Neither OpenCL nor C++ for OpenCL support exceptions, so add the `nounwind` attribute unconditionally for those languages. Differential Revision: https://reviews.llvm.org/D142033	2023-01-20 12:01:22 +00:00
Matt Arsenault	f9559b1e30	clang: Convert test to generated checks and opaque pointers	2023-01-10 20:35:49 -05:00
Nikita Popov	532dc62b90	[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9. The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change. Differential Revision: https://reviews.llvm.org/D123115	2022-04-07 12:09:47 +02:00
Fangrui Song	fd739804e0	[test] Add {{.}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences For a default visibility external linkage definition, dso_local is set for ELF -fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences. To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition, which sets dso_local for default visibility external linkage definitions. To make this flip smooth and enable future (dso_local as definition default), this patch replaces (function) `define ` with `define{{.}} `, (variable/constant/alias) `= ` with `={{.}} `, or inserts appropriate `{{.}} `.	2020-12-31 00:27:11 -08:00
Tim Northover	a009a60a91	IR: print value numbers for unnamed function arguments For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755	2019-08-03 14:28:34 +00:00
Sven van Haastregt	da3b632057	Revert r326937 "[OpenCL] Remove block invoke function from emitted block literal struct" This reverts r326937 as it broke block argument handling in OpenCL. See the discussion on https://reviews.llvm.org/D43783 . The next commit will add a test case that revealed the issue. llvm-svn: 343582	2018-10-02 13:02:24 +00:00
Yaxun Liu	cb35e9fa94	[OpenCL] Remove block invoke function from emitted block literal struct OpenCL runtime tracks the invoke function emitted for any block expression. Due to restrictions on blocks in OpenCL (v2.0 s6.12.5), it is always possible to know the block invoke function when emitting call of block expression or __enqueue_kernel builtin functions. Since __enqueu_kernel already has an argument for the invoke function, it is redundant to have invoke function member in the llvm block literal structure. This patch removes invoke function from the llvm block literal structure. It also removes the bitcast of block invoke function to the generic block literal type which is useless for OpenCL. This will save some space for the kernel argument, and also eliminate some store instructions. Differential Revision: https://reviews.llvm.org/D43783 llvm-svn: 326937	2018-03-07 19:32:58 +00:00
Yaxun Liu	fa13d015a3	[OpenCL] Fix __enqueue_block for block with captures The following test case causes issue with codegen of __enqueue_block void (^block)(void) = ^{ callee(id, out); }; enqueue_kernel(queue, 0, ndrange, block); Clang first does codegen for block expression in the first line and deletes its block info. Clang then tries to do codegen for the same block expression again for the second line, and fails because the block info is gone. The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to record llvm block invoke function and llvm block literal emitted for each AST block expression, and use the recorded information for generating the wrapper kernel. The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen of blocks. Another minor issue is that some clean up AST expression is generated for block with captures, which can be stripped by IgnoreImplicit. Differential Revision: https://reviews.llvm.org/D43240 llvm-svn: 325264	2018-02-15 16:39:19 +00:00
Yaxun Liu	f5f45e5e63	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding llvm change. Differential Revision: https://reviews.llvm.org/D40956 llvm-svn: 324102	2018-02-02 16:08:24 +00:00
Yaxun Liu	f2127d1728	[AMDGPU] Fix bug in enqueued block codegen due to an extra line llvm-svn: 316165	2017-10-19 15:56:13 +00:00
Yaxun Liu	c2a87a05f1	[OpenCL] Emit enqueued block as kernel In OpenCL the kernel function and non-kernel function has different calling conventions. For certain targets they have different argument ABIs. Also kernels have special function attributes and metadata for runtime to launch them. The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such, the block invoke function should be emitted as kernel with proper calling convention and argument ABI. This patch emits enqueued block as kernel. If a block is both called directly and passed to enqueue_kernel, separate functions will be generated. Differential Revision: https://reviews.llvm.org/D38134 llvm-svn: 315804	2017-10-14 12:23:50 +00:00

34 Commits