llvm-project

Author	SHA1	Message	Date
Shilei Tian	7dbd6cd294	[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357 ) If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, it uses the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" to the assumed range such that the assumed range can't get widened properly in `updateImpl` if the known range is not set properly for whatever reasons. Another benefit of not touching the known range is, if we indicate pessimistic state, it also invalidates the AA such that `manifest` will not be called. Since we honor the attribute, we don't want and will not add any half-baked attribute added to a function.	2024-12-11 16:47:51 -05:00
Jun Wang	41ed16c3b3	Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647 )" (#118907 ) This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5. This fixes the test file attributor-flatscratchinit-globalisel.ll.	2024-12-09 16:44:48 -08:00
Philip Reames	1ef9410a96	Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647 )" This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36. Commit breaks "ninja check-llvm" on x86 host.	2024-12-04 15:37:25 -08:00
Jun Wang	e6aec2c120	[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647 ) The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and "amdgpu-stack-objects" attributes, which are used to infer whether we need to initialize flat scratch. This is, however, not precise. Instead, we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .	2024-12-04 14:10:15 -08:00
Matt Arsenault	b6b703b2df	AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948 ) SIMachineFunctionInfo has a scan of the function body for inline asm which may use AGPRs, or callees in SIMachineFunctionInfo. Move this into the attributor, so it actually works interprocedurally. Could probably avoid most of the test churn if this bothered to avoid adding this on subtargets without AGPRs. We should also probably try to delete the MIR scan in usesAGPRs but it seems to be trickier to eliminate.	2024-03-21 14:24:06 +05:30
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954 ) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Matt Arsenault	d34a10a47d	AMDGPU: Port AMDGPUAttributor to new pass manager (#71349 )	2023-11-07 15:40:40 +09:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible to differentiate from a missing datalayout `target datalayout = ""` in the IR file since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Matt Arsenault	b9c6d9e6c3	AMDGPU: Propagate amdgpu-waves-per-eu with attributor This will do a value range merging down the callgraph, unlike the current pass which can only propagate values to undecorated functions from a kernel. This one is a bit weird due to the interaction with the implied range from amdgpu-flat-workgroup-size. At the default group range of 1,1024, the minimum implied bounds is 4 so this ends up introducing the attribute on undecorated functions. We could probably simplify this by ignoring it and propagating the raw values. The subtarget interaction and the interaction with amdgpu-flat-workgroup-size only really clamp invalid values (plus the lower bound doesn't seem to do anything as far as I can tell anyway).	2023-06-16 15:04:08 -04:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Matt Arsenault	4d4894ab92	Partially reapply "AMDGPU: Invert handling of enqueued block detection" This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.	2023-01-12 15:02:16 -05:00
Matt Arsenault	270e96f435	Revert "AMDGPU: Invert handling of enqueued block detection" This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e. The runtime is having trouble with this at -O0 when the inputs are always enabled.	2023-01-07 21:48:07 -05:00
Matt Arsenault	47288cc977	AMDGPU: Invert handling of enqueued block detection Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).	2023-01-06 21:16:08 -05:00

14 Commits