llvm-project

Author	SHA1	Message	Date
Matt Arsenault	f59f116bd5	AMDGPU: Add __builtin_amdgcn_permlane64	2022-10-13 21:12:11 -07:00
Petar Avramovic	dcc756d03e	[AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl test_flat_add_local_f64 caused by D130579 Revert a3becb333d7faae695e18728e9b8fa3a3579a240. Differential Revision: https://reviews.llvm.org/D134568	2022-09-25 13:25:41 +02:00
Petar Avramovic	a3becb333d	[clang][AMDGPU] Temporarily disable clang atomic fadd test for gfx90a Test is broken by D130579. Temporarily disable to silence builbot failures.	2022-09-23 21:49:16 +02:00
Stanislav Mekhanoshin	e540965915	[AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn Differential Revision: https://reviews.llvm.org/D133966	2022-09-16 02:42:09 -07:00
Fangrui Song	74742147ee	[test] Change cc1 -fvisibility to -fvisibility=	2022-09-02 12:36:44 -07:00
Muhammad Omair Javaid	18de7c6a3b	Revert "[InstCombine] Treat passing undef to noundef params as UB" This reverts commit c911befaec494c52a63e3b957e28d449262656fb. It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand the underlying reason. Reverting for now make buildbot green. https://reviews.llvm.org/D133036	2022-09-02 16:09:50 +05:00
Arthur Eubanks	c911befaec	[InstCombine] Treat passing undef to noundef params as UB Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D133036	2022-09-01 15:16:45 -07:00
Anastasia Stulova	6b1a04529c	[OpenCL][SPIR-V] Test extern functions with a pointer arg. Added a test case that enhances coverage of opaque pointers particularly for the problematic case with extern functions for which there is no solution found for type recovery. Differential Revision: https://reviews.llvm.org/D130768	2022-09-01 10:22:47 +01:00
Yaxun (Sam) Liu	9f6cb3e9fd	[AMDGPU] Add builtin s_sendmsg_rtn Reviewed by: Brian Sumner, Artem Belevich Differential Revision: https://reviews.llvm.org/D132140 Fixes: SWDEV-352017	2022-08-22 18:29:23 -04:00
Austin Kerbow	b0f4678b90	[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a. Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D132079	2022-08-19 15:38:36 -07:00
Austin Kerbow	f5b21680d1	[AMDGPU] Add amdgcn_sched_group_barrier builtin This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior. The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them. As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions... // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 1 VALU __builtin_amdgcn_sched_group_barrier(2, 1, 0) // 5 MFMA __builtin_amdgcn_sched_group_barrier(8, 5, 0) // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 3 VALU __builtin_amdgcn_sched_group_barrier(2, 3, 0) // 2 VMEM_WRITE __builtin_amdgcn_sched_group_barrier(64, 2, 0) Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D128158	2022-07-28 10:43:14 -07:00
Aaron Ballman	7068aa9841	Strengthen -Wint-conversion to default to an error Clang has traditionally allowed C programs to implicitly convert integers to pointers and pointers to integers, despite it not being valid to do so except under special circumstances (like converting the integer 0, which is the null pointer constant, to a pointer). In C89, this would result in undefined behavior per 3.3.4, and in C99 this rule was strengthened to be a constraint violation instead. Constraint violations are most often handled as an error. This patch changes the warning to default to an error in all C modes (it is already an error in C++). This gives us better security posture by calling out potential programmer mistakes in code but still allows users who need this behavior to use -Wno-error=int-conversion to retain the warning behavior, or -Wno-int-conversion to silence the diagnostic entirely. Differential Revision: https://reviews.llvm.org/D129881	2022-07-22 15:24:54 -04:00
Stanislav Mekhanoshin	523a99c0eb	[AMDGPU] Support for gfx940 fp8 smfmac Differential Revision: https://reviews.llvm.org/D129908	2022-07-18 12:12:41 -07:00
Stanislav Mekhanoshin	2695f0a688	[AMDGPU] Support for gfx940 fp8 mfma Differential Revision: https://reviews.llvm.org/D129906	2022-07-18 11:49:56 -07:00
Stanislav Mekhanoshin	9fa5a6b7e8	[AMDGPU] Support for gfx940 fp8 conversions Differential Revision: https://reviews.llvm.org/D129902	2022-07-18 11:48:43 -07:00
Piotr Sobczak	4a78225212	[AMDGPU] Add WMMA clang builtins Add WMMA clang builtins and tests. Extra changes in code are needed to handle function overloads. WavefrontSize 32: __builtin_amdgcn_wmma_f32_16x16x16_f16_w32 __builtin_amdgcn_wmma_f32_16x16x16_bf16_w32 __builtin_amdgcn_wmma_f16_16x16x16_f16_w32 __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32 __builtin_amdgcn_wmma_i32_16x16x16_iu8_w32 __builtin_amdgcn_wmma_i32_16x16x16_iu4_w32 WavefrontSize 64: __builtin_amdgcn_wmma_f32_16x16x16_f16_w64 __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64 __builtin_amdgcn_wmma_f16_16x16x16_f16_w64 __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64 __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64 __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128952	2022-07-01 08:55:25 +02:00
Joe Nash	2d43de13df	[AMDGPU] gfx11 new dot instruction codegen support Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127904	2022-06-16 14:19:34 -04:00
Austin Kerbow	2db700215a	[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers have knowledge of inter-wave runtime behavior which is unknown to the compiler this builtin can be used to tune scheduling. This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier. Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124700	2022-05-11 13:22:51 -07:00
Joe Nash	8bdfc73f63	[AMDGPU][clang] Definition of gfx11 subtarget Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 2/N for upstreaming of AMDGPU gfx11 architecture Depends on D124536 Reviewed By: foad, kzhuravl, #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D124537	2022-04-29 13:55:56 -04:00
Nikita Popov	b16a3b4f3b	[Clang] Add -no-opaque-pointers to more tests (NFC) This adds the flag to more tests that were not caught by the mass-migration in 532dc62b907554b3f07f17205674aa71e76fc863.	2022-04-07 12:53:29 +02:00
Nikita Popov	532dc62b90	[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9. The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change. Differential Revision: https://reviews.llvm.org/D123115	2022-04-07 12:09:47 +02:00
Scott Linder	09f33a430b	[AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic The diagnostic is unreliable, and triggers even for dead uses of hostcall that may exist when linking the device-libs at lower optimization levels. Eliminate the diagnostic, and directly document the limitation for OpenCL before code object V5. Make some NFC changes to clarify the related code in the MetadataStreamer. Add a clang test to tie OCL sources containing printf to the backend IR tests for this situation. Reviewed By: sameerds, arsenm, yaxunl Differential Revision: https://reviews.llvm.org/D121951	2022-04-05 19:10:23 +00:00
Nikita Popov	f348ca51c7	[Tests] Use %clang_cc1 instead of %clang -cc1 in codegen tests (NFC)	2022-04-05 13:21:44 +02:00
Stanislav Mekhanoshin	6e3e14f600	[AMDGPU] Support gfx940 smfmac instructions Differential Revision: https://reviews.llvm.org/D122191	2022-03-24 12:40:42 -07:00
Stanislav Mekhanoshin	27439a7642	[AMDGPU] New gfx940 mfma instructions Differential Revision: https://reviews.llvm.org/D122044	2022-03-24 12:12:52 -07:00
Austin Kerbow	62bcfcb5a5	[AMDGPU] Add llvm.amdgcn.s.setprio intrinsic Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D120976	2022-03-12 22:15:42 -08:00
Stanislav Mekhanoshin	932f628121	[AMDGPU] new gfx940 fp atomics Differential Revision: https://reviews.llvm.org/D121028	2022-03-07 12:32:02 -08:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
hyeongyukim	b529744c29	[Clang] Rename `disable-noundef-analysis` flag to `-[no-]enable-noundef-analysis` This flag was previously renamed `enable_noundef_analysis` to `disable-noundef-analysis,` which is not a conventional name. (Driver and CC1's boolean options are using [no-] prefix) As discussed at https://reviews.llvm.org/D105169, this patch reverts its name to `[no-]enable_noundef_analysis` and enables noundef-analysis as default. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D119998	2022-02-18 17:02:41 +09:00
Aaron Ballman	1ea584377e	A significant number of our tests in C accidentally use functions without prototypes. This patch converts the function signatures to have a prototype for the situations where the test is not specific to K&R C declarations. e.g., void func(); becomes void func(void); This is the ninth batch of tests being updated (there are a significant number of other tests left to be updated).	2022-02-13 08:03:40 -05:00
Anton Zabaznov	bee4bd70f7	[OpenCL] Add support of language builtins for OpenCL C 3.0 OpenCL C 3.0 introduces optionality to some builtins, in particularly to those which are conditionally supported with pipe, device enqueue and generic address space features. The idea is to conditionally support such builtins depending on the language options being set for a certain feature. This allows users to define functions with names of those optional builtins in OpenCL (as such names are not reserved). Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D118605	2022-02-11 15:53:44 +03:00
Nikita Popov	18834dca2d	[OpenCL] Mark kernel arguments as ABI aligned Following the discussion on D118229, this marks all pointer-typed kernel arguments as having ABI alignment, per section 6.3.5 of the OpenCL spec: > For arguments to a __kernel function declared to be a pointer to > a data type, the OpenCL compiler can assume that the pointee is > always appropriately aligned as required by the data type. Differential Revision: https://reviews.llvm.org/D118894	2022-02-08 16:12:51 +01:00
Sven van Haastregt	8e6099291d	[OpenCL] Make generic addrspace optional for -fdeclare-opencl-builtins Currently, -fdeclare-opencl-builtins always adds the generic address space overloads of e.g. the vload builtin functions in OpenCL 3.0 mode, even when the generic address space feature is disabled. Guard the generic address space overloads by the `__opencl_c_generic_address_space` feature instead of by OpenCL version. Guard the private, global, and local overloads using the internal `__opencl_c_named_address_space_builtins` feature. Differential Revision: https://reviews.llvm.org/D107769	2022-01-31 10:21:05 +00:00
Anton Zabaznov	a5de66c4c5	[OpenCL] Add support of __opencl_c_device_enqueue feature macro. This feature requires support of __opencl_c_generic_address_space and __opencl_c_program_scope_global_variables so diagnostics for that is provided as well. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D115640	2022-01-27 14:25:59 +03:00
hyeongyu kim	1b1c8d83d3	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2022-01-16 18:54:17 +09:00
Matt Arsenault	33315ef321	clang/AMDGPU: Don't set implicit arg attribute to default size Since 2959e082e1427647e107af0b82770682eaa58fe1, we conservatively assume all inputs are enabled by default. This isn't the best interface for controlling these anyway, since it's not granular and only allows trimming the last fields.	2022-01-14 18:43:30 -05:00
Sven van Haastregt	4b85800bfd	[OpenCL] Set external linkage for block enqueue kernels All kernels can be called from the host as per the SPIR_KERNEL calling convention. As such, all kernels should have external linkage, but block enqueue kernels were created with internal linkage. Reported-by: Pedro Olsen Ferreira Differential Revision: https://reviews.llvm.org/D115523	2022-01-12 13:30:09 +00:00
Philip Reames	0b09313cd5	[funcattrs] Infer writeonly argument attribute [part 2] This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward. One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred. Differential Revision: https://reviews.llvm.org/D115003	2022-01-04 09:07:54 -08:00
Aaron Ballman	6c75ab5f66	Introduce _BitInt, deprecate _ExtInt WG14 adopted the _ExtInt feature from Clang for C23, but renamed the type to be _BitInt. This patch does the vast majority of the work to rename _ExtInt to _BitInt, which accounts for most of its size. The new type is exposed in older C modes and all C++ modes as a conforming extension. However, there are functional changes worth calling out: * Deprecates _ExtInt with a fix-it to help users migrate to _BitInt. * Updates the mangling for the type. * Updates the documentation and adds a release note to warn users what is going on. * Adds new diagnostics for use of _BitInt to call out when it's used as a Clang extension or as a pre-C23 compatibility concern. * Adds new tests for the new diagnostic behaviors. I want to call out the ABI break specifically. We do not believe that this break will cause a significant imposition for early adopters of the feature, and so this is being done as a full break. If it turns out there are critical uses where recompilation is not an option for some reason, we can consider using ABI tags to ease the transition.	2021-12-06 12:52:01 -05:00
Jay Foad	2774bad112	[AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to match how the hardware instruction works. Don't change the API of the corresponding OpenCL builtins. Differential Revision: https://reviews.llvm.org/D115032	2021-12-04 10:32:11 +00:00
Philip Reames	740057d185	[funcattrs] Infer writeonly argument attribute This change extends the current logic for inferring readonly and readnone argument attributes to also infer writeonly. This change is deliberately minimal; there's a couple of areas for follow up. * I left out all call handling and thus any benefit from the SCC walk. When examining the test changes, I realized the existing code is imprecise, and am going to fix that in it's own revision before adding in the writeonly handling. (Mostly because updating the tests is hard when I, the human, can't figure out whether the result is correct.) * I left out handling for storing a value (as opposed to storing to a pointer). This should benefit readonly/readnone as well, and applies to a bunch of other instructions. Seemed worth having as a separate review. Differential Revision: https://reviews.llvm.org/D114963	2021-12-02 13:04:09 -08:00
skc7	16b781e6d1	[AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU Reviewed By: yaxunl, sameerds Differential Revision: https://reviews.llvm.org/D114849	2021-12-02 05:53:25 +00:00
hyeongyu kim	fd9b099906	Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default" This reverts commit aacfbb953eb705af2ecfeb95a6262818fa85dd92. Revert "Fix lit test failures in CodeGenCoroutines" This reverts commit 63fff0f5bffe20fa2c84a45a41161afa0043cb34.	2021-11-09 02:15:55 +09:00
Anastasia Stulova	a10a69fe9c	[SPIR-V] Add SPIR-V triple and clang target info. Add new triple and target info for ‘spirv32’ and ‘spirv64’ and, thus, enabling clang (LLVM IR) code emission to SPIR-V target. The target for SPIR-V is mostly reused from SPIR by derivation from a common base class since IR output for SPIR-V is mostly the same as SPIR. Some refactoring are made accordingly. Added and updated tests for parts that are different between SPIR and SPIR-V. Patch by linjamaki (Henry Linjamäki)! Differential Revision: https://reviews.llvm.org/D109144	2021-11-08 13:34:10 +00:00
hyeongyukim	aacfbb953e	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2) This patch updates test files after D105169. Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows: (1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached. (2) The remaining tests are updated manually. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108453 Resolve lit failures in clang after 8ca4b3e's land Fix lit test failures in clang-ppc* and clang-x64-windows-msvc Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc Fix internal_clone(aarch64) inline assembly	2021-11-06 19:19:22 +09:00
Juneyoung Lee	89ad2822af	Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default" This reverts commit 7584ef766a7219b6ee5a400637206d26e0fa98ac.	2021-11-06 15:39:19 +09:00
Juneyoung Lee	7584ef766a	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2021-11-06 15:36:42 +09:00
Arthur Eubanks	cb5a10199b	[test] Remove tests pinned to the legacy PM Now that the legacy PM is deprecated for the optimization pipeline, we can start deleting legacy PM tests. For tests that test both PMs, merge the RUN lines. Delete tests specific to the legacy PM.	2021-10-18 16:40:46 -07:00
Juneyoung Lee	f193bcc701	Revert D105169 due to the two-stage failure in ASAN This reverts the following commits: 37ca7a795b277c20c02a218bf44052278c03344b 9aa6c72b92b6c89cc6d23b693257df9af7de2d15 705387c5074bcca36d626882462ebbc2bcc3bed4 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515 80dba72a669b5416e97a42fd2c2a7bc5a6d3f44a	2021-10-18 23:52:46 +09:00

1 2 3 4 5 ...

602 Commits