llvm-project

Author	SHA1	Message	Date
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715 ) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Jay Foad	ceb68eea8c	[AMDGPU] Remove repeated -mtriple options from RUN lines (#66486 )	2023-09-15 11:29:24 +01:00
Fangrui Song	806761a762	[test] Change llc -march= to -mtriple= The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly.	2023-09-11 14:42:37 -07:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Pierre van Houtryve	a88deb4b65	[AMDGPU] Use aperture registers instead of S_GETREG Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register). Patch based on D31874 by @kzhuravl Depends on D137767 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137542	2022-11-30 12:25:10 +00:00
Matt Arsenault	6a91a5e826	AMDGPU: Convert some cast tests to opaque pointers	2022-11-29 19:04:58 -05:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Jay Foad	f510045d82	[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC. Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ]. Differential Revision: https://reviews.llvm.org/D117298	2022-02-18 16:10:56 +00:00
Matt Arsenault	33b45ee44b	AMDGPU: Handle addrspacecast of constant 32-bit to flat I accidentally made this work on the GlobalISel path, and there's no real reason not to handle this.	2022-01-27 11:01:44 -05:00
David Salinas	c0581f7df6	Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30. That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D116686 Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2	2022-01-11 21:14:09 +00:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62. The change contained many unrelated changes and e.g. restored unit test failes for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30. That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Matt Arsenault	729bf9b26b	AMDGPU: Enable fixed function ABI by default Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway. Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.	2021-12-04 10:49:18 -05:00
Michael Liao	640beb38e7	[amdgpu] Enable selection of `s_cselect_b64`. Differential Revision: https://reviews.llvm.org/D109159	2021-09-07 10:45:07 -04:00
alex-t	ed0f4415f0	[AMDGPU] Divergence-driven compare operations instruction selection Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D106079	2021-08-25 18:30:49 +03:00
Stanislav Mekhanoshin	4a3b055653	[AMDGPU] Fix flags of V_MOV_B64_PSEUDO In particular it was not rematerializable. Differential Revision: https://reviews.llvm.org/D105724	2021-07-09 12:49:28 -07:00
Sebastian Neubauer	96e1fcb1e0	[AMDGPU] Use s_add_i32 for address additions This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction. Differential Revision: https://reviews.llvm.org/D103322	2021-06-07 16:09:48 +02:00
madhur13490	5682ae2fc6	[AMDGPU] Set implicit arg attributes for indirect calls This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken. Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect functions calls are not used. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99347	2021-04-13 13:15:13 +00:00
madhur13490	3c297a2564	Make fixed-abi default for AMD HSA OS fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96340	2021-02-19 15:05:25 +00:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Stanislav Mekhanoshin	5ab1702129	[AMDGPU] Remove scratch rsrc from spill pseudos Differential Revision: https://reviews.llvm.org/D91110	2020-11-12 15:23:37 -08:00
Konstantin Zhuravlyov	3fdf3b1539	AMDGPU: Update AMDHSA code object version handling Differential Revision: https://reviews.llvm.org/D89076	2020-10-14 13:04:27 -04:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (0045786f146e78afee49eee053dc29ebc842fee1), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
Piotr Sobczak	0045786f14	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
Matt Arsenault	778351df77	Revert "[AMDGPU] Enable compare operations to be selected by divergence" This reverts commit 521ac0b5cea02f629d035f807460affbb65ae7ad. Reported to break thousands of piglit tests.	2020-06-24 11:21:30 -04:00
alex-t	521ac0b5ce	[AMDGPU] Enable compare operations to be selected by divergence Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82194	2020-06-24 11:50:40 +03:00
Piotr Sobczak	6d9565d6d5	Revert "[AMDGPU] Select s_cselect" This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.	2020-06-19 16:41:04 +02:00
Piotr Sobczak	4067de569f	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
Matt Arsenault	bb10fa3a53	AMDGPU: Fix wrong null value for private address space I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before. Also don't allow offset folding for -1 pointers, since it looks weird to partially fold this.	2020-05-26 16:35:13 -04:00
Matt Arsenault	074b802654	AMDGPU: Fix DAG divergence for implicit function arguments This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answer for the queue ptr argument in callable functions, since it's not an explicit IR argument and is always uniform. Fixes not using scalar loads for the aperture in addrspacecast lowering, and any other places that use implicit SGPR arguments.	2020-05-19 18:11:34 -04:00
Matt Arsenault	fda0c8df28	AMDGPU: Lower addrspacecast to 32-bit constant Somehow this was missing from the DAG path, but not global isel.	2020-05-08 10:46:00 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Konstantin Zhuravlyov	a25e0524c0	AMDGPU: Enable code object v3 for AMDHSA only Differential Revision: https://reviews.llvm.org/D54186 llvm-svn: 346923	2018-11-15 02:32:43 +00:00
Konstantin Zhuravlyov	2d22d24ac4	Revert r345542: AMDGPU: Enable code object v3 by default It breaks mesa. llvm-svn: 345662	2018-10-30 22:02:40 +00:00
Konstantin Zhuravlyov	5cb950200c	AMDGPU: Enable code object v3 by default Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542	2018-10-29 21:07:27 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Stanislav Mekhanoshin	62875fcd6c	[AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32 Differential Revision: https://reviews.llvm.org/D41617 llvm-svn: 322500	2018-01-15 18:49:15 +00:00
Matt Arsenault	8728c5f2db	AMDGPU: Cleanup subtarget features Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258	2017-08-07 14:58:04 +00:00
Matt Arsenault	4e309b0861	AMDGPU: Start selecting global instructions llvm-svn: 309470	2017-07-29 01:03:53 +00:00
Matt Arsenault	9aa45f047f	AMDGPU: Add macro fusion schedule DAG mutation Try to increase opportunities to shrink vcc uses. llvm-svn: 307313	2017-07-06 20:57:05 +00:00
Matt Arsenault	0774ea267a	AMDGPU: Select scratch mubuf offsets when pointer is a constant In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230	2017-04-24 19:40:59 +00:00
Konstantin Zhuravlyov	4b3847e865	AMDGPU/GFX9: Fix shared and private aperture queries Differential Revision: https://reviews.llvm.org/D31786 llvm-svn: 299727	2017-04-06 23:02:33 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Matt Arsenault	971c85ebb4	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering llvm-svn: 297658	2017-03-13 19:47:31 +00:00
Matt Arsenault	e823d92f7f	AMDGPU: Merge initial gfx9 support llvm-svn: 295554	2017-02-18 18:29:53 +00:00
Matt Arsenault	417e0072d6	AMDGPU: Enable InferAddressSpaces llvm-svn: 294408	2017-02-08 06:16:04 +00:00
Matt Arsenault	5d8eb25e78	AMDGPU: Use unsigned compare for eq/ne For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832	2016-09-30 01:50:20 +00:00
Matt Arsenault	bbb47da8a1	AMDGPU: Support commuting with immediate in src0 llvm-svn: 280970	2016-09-08 17:19:29 +00:00

1 2

60 Commits