llvm-project

Author	SHA1	Message	Date
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Georgi Mirazchiyski	5dc15ddf57	[AMDGPU] Default-initialize uninitialized class member variables (#108428 )	2024-09-24 14:48:21 +04:00
Jay Foad	7a30b9c0f0	[AMDGPU] Make more use of getWaveMaskRegClass. NFC. (#108186 )	2024-09-11 14:55:53 +01:00
Matt Arsenault	cf54cae26b	AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (#102614 ) This allows moving some tests relying on -stop-after=amdgpu-isel to move to checking -stop-after=finalize-isel instead, which will more reliably pass the verifier.	2024-08-09 17:52:41 +04:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Jay Foad	c7309dadbf	[AMDGPU] Use range-based for loops. NFC. (#99047 )	2024-07-17 10:18:03 +01:00
Jay Foad	6bba44e8dc	[AMDGPU] Use member initializers. NFC.	2024-07-16 15:29:10 +01:00
paperchalice	837dc542b1	[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571 ) Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.	2024-06-11 21:27:14 +08:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Justin Lebar	fab2bb8bfd	Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678 ) For some reason this was missing from STLExtras.	2024-03-10 20:00:13 -07:00
Jay Foad	a6dabed348	[AMDGPU] Fix nondeterminism in SIFixSGPRCopies (#70644 ) There are a couple of loops that iterate over V2SCopies. The iteration order needs to be deterministic, otherwise we can call moveToVALU in different orders, which causes temporary vregs to be allocated in different orders, which can affect register allocation heuristics.	2023-10-31 11:47:42 +00:00
Stanislav Mekhanoshin	fe8335babb	[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395 ) This allows folding of 64-bit operands if fit into 32-bit. Fixes https://github.com/llvm/llvm-project/issues/67781	2023-10-30 08:12:28 -07:00
Stanislav Mekhanoshin	945e943db7	[AMDGPU] Fix subreg check in the SIFixSGPRCopies (#70007 ) It checks for the copy of subregs, but it checks destination which may never happen in SSA. It misses the subreg check and happily produces S_MOV_B64 out of a subreg COPY. The affected test should have never been formed in the first place because the pass is running in SSA and copies into a subreg shall never happen.	2023-10-24 01:44:58 -07:00
Diana Picus	2e1718adc8	Reland "AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882 )" Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or INSERT_SUBREG where the result is an SGPR, but some of the inputs are constants materialized into VGPRs. This may happen in cases where for instance several instructions use an immediate zero and SelectionDAG chooses to put it in a VGPR to satisfy all of them. This however causes the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into an SGPR fixes the issue. This was originally reverted because it triggered an unrelated bug in PEI on one of the OpenMP buildbots. That bug has been fixed in #68299, so it should be ok to try again.	2023-10-06 10:03:50 +02:00
Diana Picus	327fdcf789	Revert "AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882 )" This reverts commit a04603993b43e5ebac1531293d288315f1885886 because it broke the OpenMP buildbot.	2023-09-25 13:40:38 +02:00
Diana	a04603993b	AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882 ) Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or INSERT_SUBREG where the result is an SGPR, but some of the inputs are constants materialized into VGPRs. This may happen in cases where for instance several instructions use an immediate zero and SelectionDAG chooses to put it in a VGPR to satisfy all of them. This however causes the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into an SGPR fixes the issue.	2023-09-25 13:20:08 +02:00
David Stuttard	ce031fc17a	[AMDGPU] Fix non-deterministic iteration order in SIFixSGPRCopies (#66617 ) Use of DenseSet was causing some non-deteminism in SIFixSGPRSopies. Changing to SetVector fixes the problem.	2023-09-18 10:08:53 +01:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
skc7	b434051dc8	[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147168	2023-04-10 11:34:14 +05:30
Jay Foad	a07584d57d	[CodeGen] Make more use of MachineOperand::getOperandNo. NFC. Differential Revision: https://reviews.llvm.org/D143252	2023-02-07 11:50:57 +00:00
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Alexander Timofeev	2877b87666	[AMDGPU] Lower VGPR to physical SGPR COPY to S_MOV_B32 if VGPR contains the compile time constant Sometimes we have a constant value loaded to VGPR. In case we further need to rematrerialize it in the physical scalar register we may avoid VGPR to SGPR copy replacing it with S_MOV_B32. Reviewed By: JonChesterfield, arsenm Differential Revision: https://reviews.llvm.org/D139874	2022-12-16 00:38:10 +01:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Alexander Timofeev	2e8817b90a	[AMDGPU] SIFixSGPRCopies reworking to use one pass over the MIR for analysis and lowering. This change finalizes the series of patches aiming to replace the old strategy of VGPR to SGPR copy lowering. # Following the https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367 code parts that are no longer used were removed. # The first pass over the MachineFunctoin collects all the necessary information. # Lowering is done in 3 phases: - VGPR to SGPR copies analysis lowering - REG_SEQUENCE, PHIs, and SGPR to VGPR copies lowering - SCC copies lowering is done in a separate pass over the Machine Function Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131246	2022-09-19 23:31:45 +02:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Kazu Hirata	fedc59734a	[llvm] Use range-based for loops (NFC)	2022-09-03 11:17:40 -07:00
Kazu Hirata	9861a68a7c	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 10:41:50 -07:00
Evgenii Stepanov	8ea1cf3111	Revert "[AMDGPU] SIFixSGPRCopies refactoring" Breaks ASan tests. This reverts commit 3f8ae7efa866e581a16e9ccc8e29744722f13fff.	2022-08-10 11:32:46 -07:00
alex-t	3f8ae7efa8	[AMDGPU] SIFixSGPRCopies refactoring This change finalizes the series of patches aiming to replace old strategy of VGPR to SGPR copies loweriong. Following the https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367 code parts that are no longer used were removed. Pass main loop is no longer used for the MIR changes but collect information for further analysis. Actual MIR lowering happens further according the analysys result in the set of separate functions. Another important change concerns the order of lowering: VGPR to SGPR copies lowering is done first to have priority on the rest of the MIR changes. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131246	2022-08-10 00:51:57 +02:00
Alexander Timofeev	a321d95b59	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the 2e29b0138ca243 we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-08-02 18:37:57 +02:00
Alexander Timofeev	d7ae1a9097	Revert "[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs" This reverts commit 76d9ae924cc361578ecbb5688559f7cebc78ab87. because it causes several VK CTS tests to fail	2022-07-29 14:19:07 +02:00
Alexander Timofeev	76d9ae924c	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the 2e29b0138ca243 we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-07-28 14:30:29 +02:00
Alexander Timofeev	2e29b0138c	[AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable. Since the divergence-driven instruction selection has been enabled for AMDGPU, all the uniform instructions are expected to be selected to SALU form, except those not having one. VGPR to SGPR copies appear in MIR to connect values producers and consumers. This change implements an algorithm that evolves a reasonable tradeoff between the profit achieved from keeping the uniform instructions in SALU form and overhead introduced by the data transfer between the VGPRs and SGPRs. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128252	2022-07-14 23:59:02 +02:00
Kazu Hirata	387927bbaf	[Target] Use range-based for loops (NFC)	2021-11-26 21:21:17 -08:00
Christudasan Devadasan	654c89d85a	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300	2021-11-26 00:42:12 -05:00
alex-t	1a33294652	[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa. VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bit in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit. To return back to the SALU context we need to do the opposite. To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D109900	2021-09-21 21:19:31 +03:00
alex-t	ed0f4415f0	[AMDGPU] Divergence-driven compare operations instruction selection Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D106079	2021-08-25 18:30:49 +03:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Matt Arsenault	29ed846d67	AMDGPU: Fix assert when checking for implicit operand legality	2020-12-22 20:56:24 -05:00
Sebastian Neubauer	31a0b2834f	[AMDGPU] Fix iterating in SIFixSGPRCopies The insertion of waterfall loops splits the current basic block into three blocks. So the basic block that we iterate over must be updated. This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for divergent calls in branches before. Differential Revision: https://reviews.llvm.org/D90596	2020-11-04 18:43:19 +01:00
Piotr Sobczak	8d7fd73c3a	[AMDGPU] Fix merging m0 inits Fix incorrect merges of m0 inits in loops. It was assumed that if a clobbering instruction appears in the same block as an init and the clobbering instruction does not dominate the init then it does not interfere with init. This does not work in the presence of loops, where in this scenario, the clobbering instruction does interfere with the init in another iteration. To fix this, do not check for block equality and defer the decision to the predecessor check. Differential Revision: https://reviews.llvm.org/D87882	2020-09-23 09:13:43 +02:00
Jay Foad	45eeb8c2a9	[AMDGPU] Remove unused variable introduced in r251860	2020-08-27 13:28:32 +01:00
Jay Foad	98de0d22f5	[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy	2020-08-21 10:14:35 +01:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
alex-t	b726d071b4	[AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate Summary: PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence. In some cases uniform PHI need to be converted to return VGPR to prevent the oddnumber of moves values from VGPR to SGPR and back. PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception. We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition od the source VGPR in this COPY is move immediate. bb.0: %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec %2:sreg_32 = ..... bb.1: %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1 S_BRANCH %bb.3 bb.3: %1:sreg_32 = COPY %0 S_BRANCH %bb.2 Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80434	2020-05-28 19:25:51 +03:00

1 2 3

115 Commits