llvm-project

Author	SHA1	Message	Date
paperchalice	159628cc22	[CodeGen] Port MachineUniformityAnalysis to new pass manager (#137578) - Add new pass manager version of `MachineUniformityAnalysis`. - Query `TargetTransformInfo` in new pass manager version. - Use `printAsOperand` when printing machine function name.	2025-04-30 10:44:06 +08:00
Petar Avramovic	c07e1e390c	AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (#124298) Record all uses outside cycle with divergent exit during propagateTemporalDivergence in Uniformity analysis. With this list of candidates for temporal divergence lowering, excluding known lane masks from control flow intrinsics, find sources from inside the cycle that are not i1 and uniform. Temporal divergence lowering (non i1): create copy(v_mov) to vgpr, with implicit exec (to stop other passes from moving this copy outside of the cycle) and use this vgpr outside of the cycle instead of original uniform source.	2025-03-12 12:09:37 +01:00
Petar Avramovic	88814969dd	MachineUniformityAnalysis: Pass is incorrectly initialized as CFGOnly (#125511) Set CFGOnly in MachineUniformityAnalysisPass to false. If there were new registers created, uniformity analysis needs to be updated. Previously, with CFGOnly set to true, pass would be skipped if CFG was preserved.	2025-02-04 09:25:25 +01:00
paperchalice	1562b70eaf	Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547) This relands commit #115111. Use traditional way to update post dominator tree, i.e. break critical edge splitting into insert, insert, delete sequence. When splitting critical edges, the post dominator tree may change its root node, and `setNewRoot` only works in normal dominator tree... See `6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)`	2024-12-13 11:43:09 +08:00
paperchalice	553058f825	Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512) Reverts llvm/llvm-project#115111 Causes #119511	2024-12-11 14:25:17 +08:00
paperchalice	79047fac65	[DomTreeUpdater] Move critical edge splitting code to updater (#115111) Support critical edge splitting in dominator tree updater. Continue the work in #100856. Compile time check: https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au	2024-12-11 11:31:42 +08:00
Nikita Popov	6a907699d8	Revert "[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)" This reverts commit c5e5088033fed170068d818c54af6862e449b545. Causes large compile-time regressions.	2024-07-11 09:13:37 +02:00
paperchalice	c5e5088033	[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055) Summary: - Remove wrappers in `MachineDominatorTree`. - Remove `MachineDominatorTree` update code in `MachineBasicBlock::SplitCriticalEdge`. - Use `MachineDomTreeUpdater` in passes which call `MachineBasicBlock::SplitCriticalEdge` and preserve `MachineDominatorTreeWrapperPass` or CFG analyses. Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related methods in 2014. Now we have SemiNCA based dominator tree in 2017 and dominator tree updater, the solution adopted here seems a bit outdated.	2024-07-11 11:08:05 +08:00
paperchalice	837dc542b1	[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571) Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.	2024-06-11 21:27:14 +08:00
Petar Avramovic	06f711a906	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#80003) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in same way SILowerI1Copies select VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA and all cases of uses outside of cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: General goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-02-05 14:07:01 +01:00
Petar Avramovic	c46109d0d7	Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274) Reverts llvm/llvm-project#78482	2024-01-24 12:18:34 +01:00
Petar Avramovic	91ddcba83a	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in same way SILowerI1Copies select VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA and all cases of uses outside of cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: General goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-01-24 11:58:32 +01:00
Petar Avramovic	90bdf76fdb	Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#78468) Reverts llvm/llvm-project#76145	2024-01-17 17:41:19 +01:00
Petar Avramovic	1fbf533286	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#76145) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in same way SILowerI1Copies select VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA and all cases of uses outside of cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: General goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-01-17 12:10:24 +01:00
Sameer Sahasrabuddhe	b14e30f10d	[LLVM] refactor GenericSSAContext and its specializations Fix the GenericSSAContext template so that it actually declares all the necessary typenames and the methods that must be implemented by its specializations SSAContext and MachineSSAContext. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156288	2023-07-27 09:54:50 +05:30
Matt Arsenault	d61cba6de2	UniformityAnalysis: Skip computation with no branch divergence Check TTI before bothering to run the computation. Everything will be assumed uniform by default.	2023-06-16 18:41:56 -04:00
Jay Foad	5022fc2ad3	[CodeGen] Make use of MachineInstr::all_defs and all_uses. NFCI. Differential Revision: https://reviews.llvm.org/D151424	2023-06-01 19:17:34 +01:00
Sameer Sahasrabuddhe	0a170eb786	[Uniformity] Propagate divergence only along divergent outputs. When an instruction is determined to be divergent, not all its outputs are divergent. The users of only divergent outputs should now be examined for divergence. Also, replaced a repeating pattern of "if new divergent instruction, then add to worklist" by combining it into a single function. This does not cause any change in functionality. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D150636	2023-05-17 07:47:43 +05:30
Sameer Sahasrabuddhe	fbe1c0616f	[LLVM][Uniformity] Improve detection of uniform registers The MachineUA now queries the target to determine if a given register holds a uniform value. This is determined using the corresponding register bank if available, or by a combination of the register class and value type. This assumes that the target is optimizing for performance by choosing registers, and the target is responsible for any mismatch with the inferred uniformity. For example, on AMDGPU, an SGPR is now treated as uniform, except if the register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit values) or equivalently if it has a value type of s1. - This does not always work with inline asm, where the register bank or the value type might not be present. We assume that the SGPR is uniform, because it is not expected to be s1 in the vast majority of cases. - The pseudo branch instruction SI_LOOP is now hard-coded to be always divergent, although its condition is an SGPR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D150438	2023-05-16 09:37:04 +05:30
Sameer Sahasrabuddhe	b0f0dd2554	[LLVM][Uniformity] Propagate temporal divergence explicitly At a cycle C with divergent exits, UA was using a naive traversal of the exiting edges to locate blocks that may use values defined inside C. But this traversal fails when it encounters a cycle. This is now replaced with a much simpler propagation that iterates over every instruction in C and checks any uses that are outside C. But such an iteration can be expensive when C is very large; the original strategy may need to be reconsidered if there is a regression in compilation times. Also fixed lit tests that should have originally caught the missed propagation of temporal divergence. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149646	2023-05-15 20:17:43 +05:30
pvanhout	f90849dfa3	[AMDGPU] Use UniformityAnalysis in AtomicOptimizer Adds & uses a new `isDivergentUse` API in UA. UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it. ----- Original patch that adds `isDivergentUse` by @sameerds The user of a temporally divergent value is marked as divergent in the uniformity analysis. But the same user may also have been marked divergent for other reasons, thus losing this information about temporal divergence. But some clients need to specifically check for temporal divergence. This change restores such an API, that already existed in DivergenceAnalysis. Reviewed By: sameerds, foad Differential Revision: https://reviews.llvm.org/D146018	2023-03-15 09:39:55 +01:00
Sameer Sahasrabuddhe	fd98416d37	[llvm][Uniformity] consistently handle always-uniform instructions An instruction that is "always uniform" is so even if it occurs in an irreducible cycle. The output produced by such an instruction may depend on the implementation defined cycle hierarchy, but that does not affect the uniformity of the output. In other words, an "always uniform" instruction is uniform even if it is not m-converged. Reviewed By: ruiling, ronlieb Differential Revision: https://reviews.llvm.org/D145572	2023-03-10 14:23:40 +05:30
Yashwant Singh	5230f6c1c2	[llvm][GenericUniformity] Prevent assert while calculating temporal divergence analyzeTemporalDivergence() was missing the check for always-uniform before evaluating whether an instruction depends on a value defined in the cycle. Fix for #60638 https://github.com/llvm/llvm-project/issues/60638 Reviewed By: sameerds, foad, #amdgpu Differential Revision: https://reviews.llvm.org/D144070	2023-03-02 12:42:35 +05:30
Krzysztof Drewniak	5d98dc7124	[llvm][GenericUniformity] Hack around strict is_invocable() checks With recent (> 15, as far as I can tell, possibly > 16) clang, c++17, and GNU's libstdc++ (versions 9 and 10 and maybe others), LLVM fails to compile due to an is_invocable() check in unique_ptr::reset(). To resolve this issue, add a template argument to ImplDeleter to make things work. Differential Revision: https://reviews.llvm.org/D141865	2023-01-18 19:56:42 +00:00
Haojian Wu	ad3996c1fc	Fix an unused-variable warning in release build, NFC	2022-12-20 09:38:03 +01:00
Sameer Sahasrabuddhe	475ce4c200	RFC: Uniformity Analysis for Irreducible Control Flow Uniformity analysis is a generalization of divergence analysis to include irreducible control flow: 1. The proposed spec presents a notion of "maximal convergence" that captures the existing convention of converging threads at the headers of natural loops. 2. Maximal convergence is then extended to irreducible cycles. The identity of irreducible cycles is determined by the choices made in a depth-first traversal of the control flow graph. Uniformity analysis uses criteria that depend only on closed paths and not cycles, to determine maximal convergence. This makes it a conservative analysis that is independent of the effect of DFS on CycleInfo. 3. The analysis is implemented as a template that can be instantiated for both LLVM IR and Machine IR. Validation: - passes existing tests for divergence analysis - passes new tests with irreducible control flow - passes equivalent tests in MIR and GMIR Based on concepts originally outlined by Nicolai Haehnle <nicolai.haehnle@amd.com> With contributions from Ruiling Song <ruiling.song@amd.com> and Jay Foad <jay.foad@amd.com>. Support for GMIR and lit tests for GMIR/MIR added by Yashwant Singh <yashwant.singh@amd.com>. Differential Revision: https://reviews.llvm.org/D130746	2022-12-20 07:22:24 +05:30

26 Commits