This patch relands https://github.com/llvm/llvm-project/pull/86409.
I mistakenly thought that `Known.makeNegative()` clears the sign bit of
`Known.Zero`. This patch fixes the assertion failure by explicitly
clearing the sign bit.
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.
Fixes#89200
Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to
check subtarget features for consistency and diagnose any
inconsistencies. To start with, the implementation just checks that
either wavefrontsize32 or wavefrontsize64 is selected.
checkSubtargetFeatures is called at the start of instruction selection.
This is pretty arbitrary. It is just a convenient point at which we have
access to the subtarget that we're going to use for codegenning a
particular function.
This is really a workaround to allow control flow lowering in the
presence of convergence control tokens. Control-flow intrinsics in LLVM
IR are convergent because they indirectly represent the wave CFG, i.e.,
sets of threads that are "converged" or "execute in lock-step". But they
exist during a small window in the lowering process, inserted after the
structurizer and then translated to equivalent MIR pseudos. So rather
than create convergence tokens for these builtins, we simply mark them
as not convergent.
The corresponding MIR pseudos are marked as having side effects, which
is sufficient to prevent optimizations without having to mark them as
convergent.
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots"
This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f.
MSVC does not like constexpr on the definition after an extern
declaration of a global.
Revert "AMDGPU: Try to fix build error with old gcc"
This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb.
Revert "AMDGPU: Use umin in set.rounding expansion"
This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456.
Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)"
This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd.
Revert "AMDGPU: Implement llvm.set.rounding (#88587)"
This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.
This PR will fix the si-mode-register pass which is inserting an extra
setreg instruction in case of constrained FP operations. This pass will
be ignored for strictfp functions.
In practice PrologEpilogSGPRSpills never has more than 3 entries so
DenseMap is overkill. In addition this means that iteration happens in
register number order, instead of DenseMap's hashed order, so it will
not be affected by future patches that define new physical registers.
This should reduce future test case churn.
We don't need to figure out the weird extended rounding modes or
handle offsets to keep the lookup table in 64-bits.
https://reviews.llvm.org/D153258
Depends #88587
Use a shift of a magic constant and some offseting to convert from
flt_rounds values.
I don't know why the enum defines Dynamic = 7. The standard suggests -1
is the cannot determine value. If we could start the extended values at
4 we wouldn't need the extra compare sub and select.
https://reviews.llvm.org/D153257
Previously each single use producer would be marked with a
"S_SINGLEUSE_VDST 1" instruction. This patch adds support for
larger immediates that encode multiple single use producers into
one S_SINGLEUSE_VDST instruction.
This was removed due to avoid circular dependency between `CodeGen` and
`Passes`, but now I realized this is no longer a problem since
`CodeGenPassBuilder` is moved into `Passes`.
change order of fp and sp in kernel prologue also related codegen tests
to make it easier to merge code into our downstream branches
Signed-off-by: gangc <gangc@amd.com>
https://github.com/llvm/llvm-project/pull/90201 made some fixes for
gfx12
image_msaa_load waitcnt insertion.
That fix might break in some situations for pre-gfx12 - this fixes that
by
explitly checking for VSAMPLE which always requires a s_wait_samplecnt
and
leaves the previous logic intact for non-gfx12.
Code to determine if a waitcnt is required before a barrier instruction
only
considered S_BARRIER.
gfx12 adds barrier_signal/wait so need to enhance the existing code to
look for
a barrier start (which is just an S_BARRIER for earlier architectures).
The autogenerated memory legalizer tests use -O0 so this allows us to
see the exact waitcnts that were inserted by the memory legalizer
without them being optimized away.
Previously, multiple uses of a register within the same instruction were
being counted as multiple uses. This has been corrected to
only count as a single use as per the specification allowing for
more optimisation candidates.
The support library contains helpers to parse and emit YAML documents.
In the textual YAML representation, some strings need to be quoted, e.g.
when containing unprintable characters.
We already have such quoting implemented for YAML values.
This patch applies the same quoting to YAML *keys*.
One affected case is output of control registers in AMDGPU Msgpack
metadata, which are printed in a format like this:
```
0x2cca (SPI_SHADER_PGM_RSRC1_ES): 42
```
With this patch, the key is quoted:
```
'0x2cca (SPI_SHADER_PGM_RSRC1_ES)': 42
```
Most test changes come from this pattern.
This reverts commit 16bd10a38730fed27a3bf111076b8ef7a7e7b3ee.
Re-applies:
b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)"
8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)"
73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)"
with a fix in DAGCombiner::visitFREEZE.
This reverts:
b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)"
(because it updates a test case that I don't know how to resolve the conflict for)
8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)"
73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)"
Due to a test suite failure on AArch64 when compiling for SVE.
https://lab.llvm.org/buildbot/#/builders/197/builds/13955
clang: ../llvm/llvm/include/llvm/CodeGen/ValueTypes.h:307: MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.
[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison
Handle SELECT_CC similarly as SETCC.
Handle these operations that only propagate poison/undef based on the
input operands:
SADDSAT, UADDSAT, SSUBSAT, USUBSAT, MULHU, MULHS,
SMIN, SMAX, UMIN, UMAX
These operations may create poison based on shift amount and exact
flag being violated:
SRL, SRA
One goal here is to allow pushing freeze through these operations
when allowed, as well as letting analyses such as
isGuaranteedNotToBeUndefOrPoison to not break on such operations.
Since some problems have been observed with pushing freeze through
SRA/SRL we block that explicitly in DAGCombiner::visitFreeze now.
That way we can still model SRA/SRL properly in
SelectionDAG::canCreateUndefOrPoison, e.g. when used by
isGuaranteedNotToBeUndefOrPoison, even if we do not want to push
freeze through those instructions.
It seems like this happened because #79460 moved this from
`FeatureISAVersion11_Common` to `FeatureISAVersion11_0_Common` while
#76955 was being reviewed.
This test will check the mode register in case of constrained floating
point operations.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>