llvm-project

Author	SHA1	Message	Date
Saiyedul Islam	777b6de7a4	[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339) Regenerate a few llc tests to test for COV5 instead of the default ABI version.	2023-12-12 14:35:13 +05:30
Matt Arsenault	d34a10a47d	AMDGPU: Port AMDGPUAttributor to new pass manager (#71349)	2023-11-07 15:40:40 +09:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is currently not possible is to differentiate a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Matt Arsenault	b9c6d9e6c3	AMDGPU: Propagate amdgpu-waves-per-eu with attributor This will do a value range merging down the callgraph, unlike the current pass which can only propagate values to undecorated functions from a kernel. This one is a bit weird due to the interaction with the implied range from amdgpu-flat-workgroup-size. At the default group range of 1,1024, the minimum implied bounds is 4 so this ends up introducing the attribute on undecorated functions. We could probably simplify this by ignoring it and propagating the raw values. The subtarget interaction and the interaction with amdgpu-flat-workgroup-size only really clamp invalid values (plus the lower bound doesn't seem to do anything as far as I can tell anyway).	2023-06-16 15:04:08 -04:00
Matt Arsenault	4d4894ab92	Partially reapply "AMDGPU: Invert handling of enqueued block detection" This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.	2023-01-12 15:02:16 -05:00
Matt Arsenault	270e96f435	Revert "AMDGPU: Invert handling of enqueued block detection" This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e. The runtime is having trouble with this at -O0 when the inputs are always enabled.	2023-01-07 21:48:07 -05:00
Matt Arsenault	47288cc977	AMDGPU: Invert handling of enqueued block detection Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).	2023-01-06 21:16:08 -05:00
Matt Arsenault	262c2c0fd2	AMDGPU: Update some tests to use opaque pointers vectorize-buffer-fat-pointer.ll required a manual check line fix. vector-alloca-addrspacecast.ll required a manual fixup of a check line. partial-regcopy-and-spill-missed-at-regalloc.ll required re-running update_mir_test_checks. The HSA metadata tests required avoiding the script touching the type name in the metadata. annotate-noclobber.ll ran into one update script bug. It deleted a check line with a 0 offset GEP, moving the following -NEXT check logically up one line.	2022-12-19 09:28:58 -05:00
Johannes Doerfert	f6e3a89cc0	[AMDGPU] Annotate the intrinsics to be default and nocallback Differential Revision: https://reviews.llvm.org/D135155	2022-12-07 14:25:25 -08:00
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.	2022-11-29 15:21:09 -06:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Nikita Popov	304f1d59ca	[IR] Switch everything to use memory attribute This switches everything to use the memory attribute proposed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly attributes are dropped. The readnone, readonly and writeonly attributes are restricted to parameters only. The old attributes are auto-upgraded both in bitcode and IR. The bitcode upgrade is a policy requirement that has to be retained indefinitely. The IR upgrade is mainly there so it's not necessary to update all tests using memory attributes in this patch, which is already large enough. We could drop that part after migrating tests, or retain it longer term, to make it easier to import IR from older LLVM versions. High-level Function/CallBase APIs like doesNotAccessMemory() or setDoesNotAccessMemory() are mapped transparently to the memory attribute. Code that directly manipulates attributes (e.g. via AttributeList) on the other hand needs to switch to working with the memory attribute instead. Differential Revision: https://reviews.llvm.org/D135780	2022-11-04 10:21:38 +01:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Changpeng Fang	8edaf25986	AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function. Reviewers: arsenm, sameerds, b-sumner and foad Differential Revision: https://reviews.llvm.org/D123548	2022-04-12 12:36:30 -07:00
Changpeng Fang	ca62b1db9f	[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg Summary: Emit metadata for hidden_heap_v1 kernarg Reviewers: sameerds, b-sumner Fixes: SWDEV-307188 Differential Revision: https://reviews.llvm.org/D119027	2022-02-25 10:45:35 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Matt Arsenault	0eebe2e36c	AMDGPU: Sanitized functions require implicit arguments Do not infer amdgpu-no-implicitarg-ptr for sanitized functions. If a function is explicitly marked amdgpu-no-implicitarg-ptr and sanitize_address, infer that it is required.	2021-12-02 17:55:43 -05:00
Matt Arsenault	db4963d080	AMDGPU: Use attributor to propagate uniform-work-group-size Drop the legacy version in AMDGPUAnnotateKernelFeatures. This has the side effect of now respecting the linkage, and not changing externally visible functions.	2021-09-09 18:24:28 -04:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit inputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
Matt Arsenault	088cc63640	AMDGPU: Invert AMDGPUAttributor Switch to using BitIntegerState for each of the inputs, and invert their meanings. This now diverges more from the old AMDGPUAnnotateKernelFeatures, but this isn't used yet anyway.	2021-08-26 21:32:13 -04:00
Matt Arsenault	46d82e7357	AMDGPU: Restrict attributor transforms We only really want this to add the custom attributes. Theoretically the regular transforms were already run at this point. Touching undefined behavior breaks a lot of tests when this is enabled by default, many of which are expecting to test handling of undef operations.	2021-08-26 21:08:51 -04:00
Matt Arsenault	cf32d61a05	AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor amdgpu-calls and amdgpu-stack-objects don't really belong as attributes, and are currently a hacky way of passing an analysis into the DAG. These don't really belong in the IR, and don't really fit in with the other attributes. Remove these to facilitate inverting the pass. I don't exactly understand the indirect call test changes. These tests are using calls which are trivially replaceable with a direct call, so I'm not sure what the point is.	2021-08-26 20:31:14 -04:00
Matt Arsenault	98d7aa435f	AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr We no longer use this intrinsic outside of the backend and no longer support using it outside of kernels.	2021-08-26 20:30:03 -04:00
Matt Arsenault	a77ae4aa6a	AMDGPU: Stop attributor adding attributes to intrinsic declarations	2021-08-13 20:51:48 -04:00
Matt Arsenault	152ceec1ae	AMDGPU: Add indirect and extern calls to attributor test	2021-08-13 20:45:53 -04:00
Johannes Doerfert	a420f80bf1	[Attributor] Do not delete volatile stores to null/undef See D106309. Differential Revision: https://reviews.llvm.org/D107906	2021-08-12 10:39:52 -05:00
Johannes Doerfert	cdb4cfe8b3	[Attributor][FIX] Update AMDGPU attributor test The test contains UB and should be improved; for now we update the check lines to pass it.	2021-07-27 00:23:47 -05:00
Kuter Dinel	96709823ec	[AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997	2021-07-24 06:07:15 +03:00
Kuter Dinel	a7749c3f79	[AMDGPU] Use update_test_checks.py script for annotate kernel features tests. This patch makes the annotate kernel features tests use the update_tests_checks.py script. Which makes it easy to update the tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D105864	2021-07-15 03:13:37 +03:00
madhur13490	6a4d9cb7e0	[AMDGPU] Remove error check for indirect calls and add missing queue-ptr This patch removes -fixed-abi check for indirect calls and also adds queue-ptr which is required for indirect calls to work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100633	2021-04-20 00:35:17 +05:30
madhur13490	5682ae2fc6	[AMDGPU] Set implicit arg attributes for indirect calls This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. its address is taken. Once such attributes are set, the rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect function calls are not used. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99347	2021-04-13 13:15:13 +00:00
Matt Arsenault	f0abefaf50	AMDGPU: Add IntrWillReturn to intrinsic definitions This should probably be implied for all the speculatable ones. I think the only ones where this plausibly doesn't apply is s_sendmsghalt and maybe kill.	2020-06-18 15:38:10 -04:00
Matt Arsenault	6bfe28e92f	AMDGPU: Fix annotate kernel features through casted calls I thought I was testing this before, but the workitem id x case isn't great since it's mandatory in the parent kernel.	2020-04-04 20:44:44 -04:00
Matt Arsenault	bb8622094d	AMDGPU: Don't handle kernarg.segment.ptr in functions Just lower this to null. Pass implicitarg.ptr in its place in the argument list.	2020-03-13 12:51:12 -07:00
Matt Arsenault	ccc6e780c8	AMDGPU: Directly annotate functions if they have calls Currently we infer whether the flat-scratch-init kernel input should be enabled based on calls. Move this handling, so we can decide if the full set of ABI inputs is needed in kernels. Ideally we would have an analysis of some sort, rather than the function attributes.	2020-03-12 19:10:59 -04:00
Aakanksha Patil	c56d2afc63	AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV) A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 llvm-svn: 355574	2019-03-07 00:54:04 +00:00
Aakanksha Patil	bc568766b2	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084	2018-12-13 21:23:12 +00:00
Aakanksha Patil	729309cc89	[AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 llvm-svn: 348971	2018-12-12 20:49:17 +00:00
Matt Arsenault	72d27f5525	AMDGPU: Fix tests using old number for constant address space llvm-svn: 341770	2018-09-10 02:54:25 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Matt Arsenault	9166ce86e8	AMDGPU: Annotate implicitarg.ptr usage We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398	2017-07-28 15:52:08 +00:00
Matt Arsenault	254ad3de5c	AMDGPU: Annotate necessity of flat-scratch-init As an approximation of the existing handling to avoid regressions. Fixes using too many registers with calls on subtargets with the SGPR allocation bug. llvm-svn: 308326	2017-07-18 16:44:58 +00:00
Matt Arsenault	e15855d9e3	AMDGPU: Annotate features from x work item/group IDs. This wasn't necessary before since they are always enabled for kernels, but this is necessary if they need to be forwarded to a callable function. llvm-svn: 308226	2017-07-17 22:35:50 +00:00
Matt Arsenault	23e4df6a59	AMDGPU: Detect kernarg segment pointer This is necessary to pass the kernarg segment pointer to callee functions. Also don't unconditionally enable for kernels. llvm-svn: 307978	2017-07-14 00:11:13 +00:00
Matt Arsenault	6b93046f29	AMDGPU: Annotate call graph with used features Previously this wouldn't detect used features indirectly used in callee functions. llvm-svn: 307967	2017-07-13 21:43:42 +00:00

47 Commits