llvm-project

Author	SHA1	Message	Date
Jeffrey Byrnes	7180c23cf6	[SeparateConstOffsetFromGEP] Reland: Reorder trivial GEP chains to separate constants (#81671 ) Actually update tests w.r.t `9e5a77f252` and reland https://github.com/llvm/llvm-project/pull/73056	2024-02-13 17:10:23 -08:00
Philip Reames	99c5a66c62	Revert "[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056 )" and follow ups "ninja check-llvm" is failing on tip of tree. This reverts commit ec0aa1646e9953d1a8d0d15dc381d3250c854572. This reverts commit 1b65742f8c71f576381fe85d5e34579b24f2d874.	2024-02-13 13:29:23 -08:00
Jeffrey Byrnes	1b65742f8c	[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056 ) In this case, a trivial GEP chain has the form: ``` %ptr = getelementptr sameType, %base, constant %val = getelementptr sameType, %ptr, %variable ``` That is, a one-index GEP consumes another (of the same basis and result type) one-index GEP, where the inner GEP uses a constant index and the outer GEP uses a variable index. For chains of this type, it is trivial to reorder them (by simply swapping the indexes). The result of doing so is better AddrMode matching for users of the ultimate ptr produced by GEP chain. Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).	2024-02-13 11:22:49 -08:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Stanislav Mekhanoshin	fe8335babb	[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395 ) This allows folding of 64-bit operands if fit into 32-bit. Fixes https://github.com/llvm/llvm-project/issues/67781	2023-10-30 08:12:28 -07:00
Jay Foad	e4284a7c70	[AMDGPU] 4-align SGPR triples Previously SGPR triples like s[3:5] were aligned on a 3-SGPR boundary which has no basis in hardware. Aligning them on a 4-SGPR boundary is at least justified by the architecture reference guide which says: "Quad-alignment of SGPRs is required for operation on more than 64-bits". Currently there are no instructions that take SGPR triples as operands so the issue is latent. Differential Revision: https://reviews.llvm.org/D151463	2023-05-26 08:06:25 +01:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers"" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Jay Foad	2be3189601	[AMDGPU] Don't select _SGPR forms of SMEM instructions on GFX9+ On GFX9+, SMEM instructions have an _SGPR_IMM form which is strictly more powerful than the _SGPR form. It simplifies codegen if we always select the _SGPR_IMM form with an immediate offset of 0 instead of the _SGPR form. Note that this patch just makes minimal changes to the selection patterns to prove the concept. Further simplifications are possible to reduced the number of selection patterns. On GFX9 the _SGPR form of the Real instruction is still required for assembly/disassembly but on GFX10+ it can be removed completely. Differential Revision: https://reviews.llvm.org/D147334	2023-04-17 16:23:30 +01:00
skc7	b434051dc8	[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147168	2023-04-10 11:34:14 +05:30
Matt Arsenault	ce096b2207	AMDGPU: Convert some tests to opaque pointers These required update_mir_test_checks.	2022-12-19 09:04:17 -05:00
Nicolai Hähnle	b7f44f7cf9	AMDGPU: Remove ImagePSV and move images to addrspace 7 Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU: Remove BufferPseudoSourceValue") It is unclear what exactly the right address space for images should be. They seem morally closest to buffers, so that's what I went with. In practical terms, address space 7 is better than address space 0 because it can't alias with LDS. Differential Revision: https://reviews.llvm.org/D138949	2022-11-30 11:32:34 +01:00
Nicolai Hähnle	43b86bf992	AMDGPU: Remove BufferPseudoSourceValue The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is clearly not true. Instead, build MachineMemOperands without a pointer value but with an address space, so that address space-based alias analysis can still work. There is a lot of test churn because previously address space 4 (constant address space) was used as an address space for buffer intrinsics. This doesn't make much sense and seems to have been an accident -- see the change in AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind. Differential Revision: https://reviews.llvm.org/D138711	2022-11-29 22:15:11 +01:00
Ivan Kosarev	1b560e6ab7	[AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores. Reviewed By: dp, arsenm Differential Revision: https://reviews.llvm.org/D137783	2022-11-14 15:36:18 +00:00
Jay Foad	2e8863b6a1	[AMDGPU] Don't shrink VOP3 instructions pre-RA on GFX10+ In GFX10, there is no advantage to shrinking these instructions pre-RA, so this just saves a bit of work. In GFX11 there is an advantage to not shrinking them pre-RA, because the register classes for 16-bit operands are less restrictive in the VOP3 form than in the shrunk form. This patch is a prerequisite for actually setting up those register classes correctly for 16-bit vs non-16-bit operands. Differential Revision: https://reviews.llvm.org/D133769	2022-09-13 20:26:08 +01:00
Matt Arsenault	bb70b5d406	CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer Previously this was assuming piontsToConstantMemory implies dereferenceable.	2022-09-12 08:38:35 -04:00
Alexander Timofeev	a321d95b59	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the 2e29b0138ca243 we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-08-02 18:37:57 +02:00
Alexander Timofeev	d7ae1a9097	Revert "[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs" This reverts commit 76d9ae924cc361578ecbb5688559f7cebc78ab87. because it causes several VK CTS tests to fail	2022-07-29 14:19:07 +02:00
Alexander Timofeev	76d9ae924c	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the 2e29b0138ca243 we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-07-28 14:30:29 +02:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Jay Foad	18ba57a53a	[AMDGPU] Regenerate test checks in splitkit-getsubrangeformask.ll	2021-11-18 16:03:28 +00:00
alex-t	ed0f4415f0	[AMDGPU] Divergence-driven compare operations instruction selection Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D106079	2021-08-25 18:30:49 +03:00
Ruiling Song	40e3df2a1b	[RegisterCoalescer] Resolve conflict based on liveness of subregister Currently we are resolving lane/subregister conflict by visiting instructions sequentially in current block to see whether there is any use of the tainted lanes. To save compile time, we are not doing further check in successor blocks. This sounds reasonable without subgregister liveness. But since we have added subregister liveness tracking capability to register coalescer, we can easily determine whether we have subregister liveness conflict by checking subranges. This would help coalescing more COPYs for target that enables subregister liveness tracking. Reviewed by: arsenm, qcolombet Differential Revision: https://reviews.llvm.org/D104509	2021-07-14 14:43:22 +08:00
Matt Arsenault	fae05692a3	CodeGen: Print/parse LLTs in MachineMemOperands This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted all of the tests already, but likely missed a few). Not sure what the exact syntax and policy should be. We can continue printing the number of bytes for non-generic instructions to avoid test churn and only allow non-scalar types for generic instructions. This will currently print the LLT in parentheses, but accept parsing the existing integers and implicitly converting to scalar. The parentheses are a bit ugly, but the parser logic seems unable to deal without either parentheses or some keyword to indicate the start of a type.	2021-06-30 16:54:13 -04:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Roman Lebedev	78b8ce40ef	Reland [SCEV] Improve modelling for (null) pointer constants This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595, and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c. This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-13 16:05:34 +03:00
Roman Lebedev	329aeb5db4	Temporairly evert "[SCEV] Improve modelling for (null) pointer constants" This appears to have broken ubsan bot: https://lab.llvm.org/buildbot/#/builders/85/builds/3062 https://reviews.llvm.org/D98147#2623549 It looks like LSR needs some kind of a change around insertion point handling. Reverting until i have a fix. This reverts commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.	2021-03-13 09:10:28 +03:00
Roman Lebedev	61f006ac65	[SCEV] Improve modelling for (null) pointer constants This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-12 22:11:58 +03:00
Matt Arsenault	81b2c23b77	AMDGPU: Use kill instruction to hint soft clause live ranges Previously we would use a bundle to hint the register allocator to not overwrite the pointers in a sequence of loads to avoid breaking soft clauses. This bundling was based on a fuzzy register pressure heuristic, so we could not guarantee using more registers than are really available. This would result in register allocator failing on unsatisfiable bundles. Use a kill to artificially extend the live ranges, so we can always succeed at register allocation even if it means extra spills in the worst case. This seems to capture most of the benefit of the bundle while avoiding most of the risk presented by the bundle. However the lit tests do show a handful of regressions. In some cases with sequences of volatile loads, unused load components end up getting reallocated to the next load which forces a wait between. There are also a few small scheduling regressions where a hazard used to be avoided, and one spill torture test which for some reason nearly doubles the stack usage. There is also a bit of noise from leftover kills (it may make sense for post-RA pseudos to strip all of these out).	2021-02-26 18:26:40 -05:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Matt Arsenault	e3c6fa3611	AMDGPU: Restrict soft clause bundling at half of the available regs Fixes a testcase that was overcommitting large register tuples to a bundle, which the register allocator could not possibly satisfy. This was producing a bundle which used nearly all of the available SGPRs with a series of 16-dword loads (not all of which are freely available to use). This is a quick hack for some deeper issues with how the clause bundler tracks register pressure. Overall the pressure tracking used here doesn't make sense and is too imprecise for what it needs to avoid the allocator failing. The pressure estimate does not account for the alignment requirements of large SGPR tuples, so this was really underestimating the pressure impact. This also ignores the impact of the extended live range of the use registers after the bundle is introduced. Additionally, it didn't account for some wide tuples not being available due to reserved registers. This regresses a few cases. These end up introducing more spilling. This is also a function of the global pressure being used in the decision to bundle, not the local pressure impact of the bundle itself.	2021-02-11 14:08:59 -05:00
Austin Kerbow	2291bd137d	[AMDGPU] Update subtarget features for new target ID support Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features. The possible settings are: - Unsupported: The GPU has no support for XNACK/SRAMECC. - Any: Preference is unspecified. Use conservative settings that can run anywhere. - Off: Request support for XNACK/SRAMECC Off - On: Request support for XNACK/SRAMECC On GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off". The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting. Differential Revision: https://reviews.llvm.org/D85882	2021-01-26 11:25:51 -08:00
Sebastian Neubauer	8214982b50	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Relands ba7dcd8542ab, which had memory leaks. Differential Revision: https://reviews.llvm.org/D95215	2021-01-22 11:24:08 +01:00
Sebastian Neubauer	4dbdff66fe	Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue" This reverts commit ba7dcd8542abfc784255efcb0767701dec42fe83. (caused memory leaks)	2021-01-21 18:11:48 +01:00
Sebastian Neubauer	ba7dcd8542	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Differential Revision: https://reviews.llvm.org/D94768	2021-01-21 16:32:17 +01:00
Jay Foad	d28624a209	[AMDGPU] Stop adding an implicit def of vcc_hi for wave32 This doesn't seem to be needed for anything. Differential Revision: https://reviews.llvm.org/D92400	2020-12-02 10:11:42 +00:00
Stanislav Mekhanoshin	5ab1702129	[AMDGPU] Remove scratch rsrc from spill pseudos Differential Revision: https://reviews.llvm.org/D91110	2020-11-12 15:23:37 -08:00
Jay Foad	b34ddfcc76	[SplitKit] In addDeadDef tolerate parent range that defines more lanes Following on from D87757 "[SplitKit] Only copy live lanes", in SplitEditor::addDeadDef, when we're checking whether the parent live interval has a subrange defining the same lanes, tolerate the case where the parent subrange defines a superset of the lanes. This can happen when the child subrange comes from SplitEditor::buildCopy decomposing a partial copy into a sequence of subreg copies that cover the required lanes. Differential Revision: https://reviews.llvm.org/D88020	2020-09-25 11:31:56 +01:00

40 Commits