llvm-project

Author	SHA1	Message	Date
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
Diana Picus	bc6955f18c	[AMDGPU] Don't fix the scavenge slot at offset 0 (#79136 ) At the moment, the emergency spill slot is a fixed object for entry functions and chain functions, and a regular stack object otherwise. This patch adopts the latter behaviour for entry/chain functions too. It seems this was always the intention [1] and it will also save us a bit of stack space in cases where the first stack object has a large alignment. [1] `34c8b835b1`	2024-02-09 09:20:25 +01:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715 ) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Jay Foad	ceb68eea8c	[AMDGPU] Remove repeated -mtriple options from RUN lines (#66486 )	2023-09-15 11:29:24 +01:00
Fangrui Song	806761a762	[test] Change llc -march= to -mtriple= The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly.	2023-09-11 14:42:37 -07:00
Nikita Popov	9d73a8bdc6	[KnownBits] Make shl/lshr/ashr implementations optimal The implementations for shifts were suboptimal in the case where the max shift amount was >= bitwidth. In that case we should still use the usual code clamped to BitWidth-1 rather than just giving up entirely. Additionally, there was an implementation bug where the known zero bits for the individual shift amounts were not set in the shl/lshr implementations. I think after these changes, we'll be able to drop some of the code in ValueTracking which also evaluates all possible shift amounts and has been papering over this issue. For the "all poison" case I've opted to return an unknown value for now. It would be better to return zero, but this has fairly substantial test fallout, so I figured it's best to not mix it into this change. (The "correct" return value would be a conflict, but given that a lot of our APIs assert conflict-freedom, that's probably not the best idea to actually return.) Differential Revision: https://reviews.llvm.org/D150587	2023-05-16 09:44:26 +02:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Matt Arsenault	75a3f22a90	AMDGPU: Convert some stack tests to opaque pointers	2022-12-01 21:32:20 -05:00
Matt Arsenault	1310aa1688	AMDGPU: Use -passes for amdgpu-promote-alloca tests	2022-11-16 17:14:48 -08:00
Matt Arsenault	3830e4e58c	AMDGPU: Create poison values instead of undef These placeholders don't care about the finer points on the difference between the two.	2022-11-16 14:47:24 -08:00
Carl Ritson	4c4db81630	[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130742	2022-07-30 11:38:39 +09:00
Sebastian Neubauer	96e1fcb1e0	[AMDGPU] Use s_add_i32 for address additions This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction. Differential Revision: https://reviews.llvm.org/D103322	2021-06-07 16:09:48 +02:00
Thomas Preud'homme	04419628e0	[AMDGPU, test] Fix use of undef FileCheck var Test CodeGen/AMDGPU/amdgpu.private-memory.ll and CodeGen/AMDGPU/private-memory-r600.ll have a block of CHECK directives whose prefix is inconsistent: R600-CHECK Vs R600. This leads to a R600-NOT directive using an undefined CHAN variable due to R600-CHECK directives never being considered by FileCheck. Fixing the prefix leads to the testcase failing. As per https://reviews.llvm.org/D99865#2675235 this commit removes the directives instead since it is not possible to write a reliable check. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99865	2021-04-08 09:42:59 +01:00
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit 8b08fa0103c8d8e624b19fad5a5006e7a783ecb7. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Konstantin Zhuravlyov	3fdf3b1539	AMDGPU: Update AMDHSA code object version handling Differential Revision: https://reviews.llvm.org/D89076	2020-10-14 13:04:27 -04:00
Jay Foad	16778b19f2	[AMDGPU] Make bfe patterns divergence-aware This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr. Differential Revision: https://reviews.llvm.org/D88580	2020-10-05 09:55:10 +01:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Matt Arsenault	4b47213951	AMDGPU: Switch backend default max workgroup size to 1024 Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.	2019-11-13 07:11:02 +05:30
Konstantin Zhuravlyov	a25e0524c0	AMDGPU: Enable code object v3 for AMDHSA only Differential Revision: https://reviews.llvm.org/D54186 llvm-svn: 346923	2018-11-15 02:32:43 +00:00
Konstantin Zhuravlyov	2d22d24ac4	Revert r345542: AMDGPU: Enable code object v3 by default It breaks mesa. llvm-svn: 345662	2018-10-30 22:02:40 +00:00
Konstantin Zhuravlyov	5cb950200c	AMDGPU: Enable code object v3 by default Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542	2018-10-29 21:07:27 +00:00
Alexander Timofeev	db7ee7660a	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928	2018-09-11 11:56:50 +00:00
Matt Arsenault	9224c00d2b	AMDGPU: Use more custom insert/extract_vector_elt lowering Apply to i8 vectors. llvm-svn: 334044	2018-06-05 19:52:46 +00:00
Matt Arsenault	869cbedc81	AMDGPU: Fix broken dynamic vector indexing for packed types The intention of this was to multiply by 16, not shift by 16. llvm-svn: 331793	2018-05-08 18:43:25 +00:00
Changpeng Fang	ba92059ca9	AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372	2018-02-16 19:14:17 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Mark Searles	e4f067ebe2	[AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed. Differential Revision: https://reviews.llvm.org/D41377 llvm-svn: 321100	2017-12-19 19:26:23 +00:00
Nirav Dave	db77e57ea8	[DAG] Do MergeConsecutiveStores again before Instruction Selection Summary: Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well. Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33675 llvm-svn: 319036	2017-11-27 15:28:15 +00:00
Dmitry Preobrazhensky	a0342dc9eb	[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765 Reviewers: tamazov, SamWot, arsenm, vpykhtin Differential Revision: https://reviews.llvm.org/D40088 llvm-svn: 318675	2017-11-20 18:24:21 +00:00
Matt Arsenault	45b98189bd	AMDGPU: Don't use MUBUF vaddr if address may overflow Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive. llvm-svn: 318240	2017-11-15 00:45:43 +00:00
Stanislav Mekhanoshin	c90347d760	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102	2017-04-12 20:48:56 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Stanislav Mekhanoshin	8e45acfc38	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172	2017-03-17 23:56:58 +00:00
Matt Arsenault	707780b420	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	3aef809384	AMDGPU: Custom lower more vector operations This avoids stack usage. llvm-svn: 292846	2017-01-23 23:09:58 +00:00
Nicolai Haehnle	2857dc3893	AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register. With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places. Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D27344 llvm-svn: 289048	2016-12-08 14:08:02 +00:00
Matt Arsenault	39787bdcbb	Reapply "AMDGPU: Don't use offen if it is 0" This reverts r283003 llvm-svn: 285203	2016-10-26 15:08:16 +00:00
Bjorn Pettersson	12559441bd	[DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT. Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple look-through of EXTRACT_VECTOR_ELT. It will compute the result based on the known bits (or known sign bits) for the vector that the element is extracted from. Reviewers: bogner, tstellarAMD, mkuper Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25007 llvm-svn: 283347	2016-10-05 17:40:27 +00:00
Mehdi Amini	86eeda8e20	Revert "AMDGPU: Don't use offen if it is 0" This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038 llvm-svn: 283003	2016-10-01 02:35:24 +00:00
Matt Arsenault	3070fdf798	AMDGPU: Don't use offen if it is 0 This removes many re-initializations of a base register to 0. llvm-svn: 282999	2016-10-01 01:37:15 +00:00
Matt Arsenault	0efdd06b22	AMDGPU: Run LoadStoreVectorizer pass by default llvm-svn: 281112	2016-09-09 22:29:28 +00:00
Konstantin Zhuravlyov	1d65026ca6	[AMDGPU] Wave and register controls - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747	2016-09-06 20:22:28 +00:00
Matt Arsenault	efb24540b1	AMDGPU: Remove dead check in AMDGPUPromoteAlloca This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case. Also fix crashes on 0 x and 1 x arrays. llvm-svn: 275869	2016-07-18 18:34:53 +00:00
Matt Arsenault	7f681ac7a9	AMDGPU: Add feature for unaligned access llvm-svn: 274398	2016-07-01 23:03:44 +00:00
Matt Arsenault	c438ef574d	AMDGPU: Fix promote alloca for pointer loads If the load has a pointer type, we don't want to change its type. llvm-svn: 270000	2016-05-18 23:20:24 +00:00
Jan Vesely	687ca8df18	AMDGPU/R600: Use correct number of vector elements when lowering private loads Reviewer: tstellardAMD, arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D20032 llvm-svn: 269725	2016-05-16 23:56:32 +00:00

1 2

52 Commits