This patch replaces DivergentValues with UniformValues as the single
source of truth for tracking divergence in UniformityInfo.
Old model: DivergentValues starts empty; values are added as divergence
is propagated. isDivergent(V) returns DivergentValues.count(V).
New model: UniformValues starts fully populated (all
instructions/arguments for IR, all register defs for MIR) during
initialize(). Values are removed as divergence is propagated.
isDivergent(V) returns !UniformValues.contains(V), so any value not
present in the set (e.g., a newly created instruction that was not
present during analysis) is conservatively treated as divergent. This
avoids silent miscompilations when transformation passes introduce new
values and query their uniformity.
---------
Co-authored-by: padivedi <padivedi@amd.com>
See: https://github.com/llvm/llvm-project/issues/131779
Extends uniformity analysis to support instructions whose uniformity
depends on which specific operands are uniform. Introduces
`InstructionUniformity::Custom` and a target hook `TTI::isUniform(I,
UniformArgs)` that allows targets to define custom uniformity rules.
During propagation, custom candidates are checked via the target hook.
If we can prove they are uniform, we skip marking them divergent and let
iterative propagation re-evaluate as operands change.
Implements AMDGPU's `llvm.amdgcn.wave.shuffle` rules (uniform when
either operand is uniform, divergent only when both are divergent) as
the motivating example.
This inverted-logic approach is critical for correctness: proving
uniformity early during propagation would be unsafe, as operands can
transition from uniform to divergent during divergence propagation.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
- Remove pass initialization calls from pass constructors.
- For some passes, add the initialization to `initializeCodeGen` or
`initializeGlobalISel`.
- Remove redundant initializations from llc and X86 target for some
passes.
This patch introduces a new TargetTransformInfo hook
`getInstructionUniformity()`
that provides a unified interface for querying target-specific
uniformity
information about instructions and values.
The new hook returns an `InstructionUniformity` enum with three values:
- Default: Result is uniform if all operands are uniform (standard
propagation)
- AlwaysUniform: Result is always uniform regardless of operands
- NeverUniform: Result can never be assumed uniform
This API wraps the existing `isAlwaysUniform()` and
`isSourceOfDivergence()`
hooks, providing a single entry point for uniformity queries. Both LLVM
IR-level
(via TTI) and MIR-level (via TargetInstrInfo) uniformity analysis have
been
updated to use the new hook.
Target implementations:
- AMDGPU: Wraps existing `isAlwaysUniform()` and
`isSourceOfDivergence()` hooks
- NVPTX: Wraps existing `isSourceOfDivergence()` hook
This is an NFC change - all implementations return conservative defaults
or wrap
existing functionality.
Ref patch:https://github.com/llvm/llvm-project/pull/137639
- Add new pass manager version of `MachineUniformityAnalysis `.
- Query `TargetTransformInfo` in new pass manager version.
- Use `printAsOperand` when printing machine function name
Record all uses outside cycle with divergent exit during
propagateTemporalDivergence in Uniformity analysis.
With this list of candidates for temporal divergence lowering,
excluding known lane masks from control flow intrinsics,
find sources from inside the cycle that are not i1 and uniform.
Temporal divergence lowering (non i1):
create copy(v_mov) to vgpr, with implicit exec (to stop other
passes from moving this copy outside of the cycle) and use this
vgpr outside of the cycle instead of original uniform source.
Set CFGOnly in MachineUniformityAnalysisPass to false.
If there were new registers created, uniformity analysis needs to be
updated. Previously, with CFGOnly set to true, pass would be skipped
if CFG was preserved.
This relands commit #115111.
Use traditional way to update post dominator tree, i.e. break critical
edge splitting into insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works in normal dominator tree...
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
Summary:
- Remove wrappers in `MachineDominatorTree`.
- Remove `MachineDominatorTree` update code in
`MachineBasicBlock::SplitCriticalEdge`.
- Use `MachineDomTreeUpdater` in passes which call
`MachineBasicBlock::SplitCriticalEdge` and preserve
`MachineDominatorTreeWrapperPass` or CFG analyses.
Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related
methods in 2014. Now we have SemiNCA based dominator tree in 2017 and
dominator tree updater, the solution adopted here seems a bit outdated.
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
Fix the GenericSSAContext template so that it actually declares all the
necessary typenames and the methods that must be implemented by its
specializations SSAContext and MachineSSAContext.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156288
When an instruction is determined to be divergent, not all its outputs are
divergent. The users of only divergent outputs should now be examined for
divergence.
Also, replaced a repeating pattern of "if new divergent instruction, then add to
worklist" by combining it into a single function. This does not cause any change
in functionality.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D150636
The MachineUA now queries the target to determine if a given register holds a
uniform value. This is determined using the corresponding register bank if
available, or by a combination of the register class and value type. This
assumes that the target is optimizing for performance by choosing registers, and
the target is responsible for any mismatch with the inferred uniformity.
For example, on AMDGPU, an SGPR is now treated as uniform, except if the
register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit
values) or equivalently if it has a value type of s1.
- This does not always work with inline asm, where the register bank or the
value type might not be present. We assume that the SGPR is uniform, because
it is not expected to be s1 in the vast majority of cases.
- The pseudo branch instruction SI_LOOP is now hard-coded to be always
divergent, although its condition is an SGPR.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D150438
At a cycle C with divergent exits, UA was using a naive traversal of the exiting
edges to locate blocks that may use values defined inside C. But this traversal
fails when it encounters a cycle. This is now replaced with a much simpler
propagation that iterates over every instruction in C and checks any uses that
are outside C. But such an iteration can be expensive when C is very large; the
original strategy may need to be reconsidered if there is a regression in
compilation times.
Also fixed lit tests that should have originally caught the missed propagation
of temporal divergence.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D149646
Adds & uses a new `isDivergentUse` API in UA.
UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it.
-----
Original patch that adds `isDivergentUse` by @sameerds
The user of a temporally divergent value is marked as divergent in the
uniformity analysis. But the same user may also have been marked divergent for
other reasons, thus losing this information about temporal divergence. But some
clients need to specificly check for temporal divergence. This change restores
such an API, that already existed in DivergenceAnalysis.
Reviewed By: sameerds, foad
Differential Revision: https://reviews.llvm.org/D146018
An instruction that is "always uniform" is so even if it occurs in an
irreducible cycle. The output produced by such an instruction may depend on the
implementation defined cycle hierarchy, but that does not affect the uniformity
of the output. In other words, an "always uniform" instruction is uniform even
if it is not m-converged.
Reviewed By: ruiling, ronlieb
Differential Revision: https://reviews.llvm.org/D145572
With recent (> 15, as far as I can tell, possibly > 16) clang, c++17,
and GNU's libstdc++ (versions 9 and 10 and maybe others), LLVM fails
to compile due to an is_invocable() check in unique_ptr::reset().
To resolve this issue, add a template argument to ImplDeleter to make
things work.
Differential Revision: https://reviews.llvm.org/D141865
Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:
1. The proposed spec presents a notion of "maximal convergence" that
captures the existing convention of converging threads at the
headers of natual loops.
2. Maximal convergence is then extended to irreducible cycles. The
identity of irreducible cycles is determined by the choices made
in a depth-first traversal of the control flow graph. Uniformity
analysis uses criteria that depend only on closed paths and not
cycles, to determine maximal convergence. This makes it a
conservative analysis that is independent of the effect of DFS on
CycleInfo.
3. The analysis is implemented as a template that can be
instantiated for both LLVM IR and Machine IR.
Validation:
- passes existing tests for divergence analysis
- passes new tests with irreducible control flow
- passes equivalent tests in MIR and GMIR
Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>
With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.
Support for GMIR and lit tests for GMIR/MIR added by
Yashwant Singh <yashwant.singh@amd.com>.
Differential Revision: https://reviews.llvm.org/D130746