This reverts commit e0526b0780f56eede09b05a859a93626ecdc6e4d.
The `v_minmax/maxmin_f16` (GFX11) instructions need to be updated to t16
together with `v_minmax/maxmin_num_f16` (GFX12), since they share the same
codegen pattern. Revert the old patch and resubmit.
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range-based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo(), which also prevents
using range-based for loops. I plan to move this into SDUse in a
follow-up patch.
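The following self-contained toy model gives a rough illustration of the two iteration styles. `ToyUse` and `ToyNode` are hypothetical stand-ins, not the real `SDUse`/`SDNode` classes. They also store uses in a `std::vector` rather than an intrusive list. Iterating uses yields each use record, so the operand number is at hand. Iterating users yields only the user nodes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SDUse/SDNode. The real classes thread uses
// through an intrusive linked list; this sketch uses a std::vector.
struct ToyNode;

struct ToyUse {
  ToyNode *User;       // node that reads this value as an operand
  unsigned OperandNo;  // which operand of User refers to the value
  ToyNode *getUser() const { return User; }
  unsigned getOperandNo() const { return OperandNo; }
};

struct ToyNode {
  std::vector<ToyUse> UseList;  // every place this node's value is used

  // uses(): the loop variable is a ToyUse&, so operand numbers are visible.
  std::vector<ToyUse> &uses() { return UseList; }

  // users(): only the user nodes; a node that uses this value through two
  // operands appears twice.
  std::vector<ToyNode *> users() const {
    std::vector<ToyNode *> Result;
    for (const ToyUse &U : UseList)
      Result.push_back(U.getUser());
    return Result;
  }
};

// Needs the use view: a bare user pointer cannot say which operand it is.
unsigned countOperandZeroUses(ToyNode &N) {
  unsigned Count = 0;
  for (ToyUse &U : N.uses())
    if (U.getOperandNo() == 0)
      ++Count;
  return Count;
}
```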
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there, I changed some use_size() == 1 checks to hasOneUse(),
which is more efficient since it does not count every use.
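A toy singly linked use list can sketch the efficiency difference. The types are hypothetical and only loosely modelled on how SDNode chains its uses. Counting every entry takes time linear in the number of uses. A single-use check only needs to look at the first entry and its successor.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical singly linked use list; only the complexity argument matters.
struct UseEntry {
  UseEntry *Next = nullptr;
};

struct UseListHead {
  UseEntry *First = nullptr;

  // O(n): walks the whole list to count the uses.
  std::size_t use_size() const {
    std::size_t N = 0;
    for (const UseEntry *U = First; U; U = U->Next)
      ++N;
    return N;
  }

  // O(1): inspects only the first entry and its successor.
  bool hasOneUse() const { return First != nullptr && First->Next == nullptr; }
};
```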
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range-based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * for the user rather than an SDUse *, so users() is a better name.
I've long been annoyed that we can't write a range-based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
Support true16 format for v_minmax/maxmin_f16 in MC.
Since we are replacing `v_minmax/maxmin_f16` with `v_minmax/maxmin_f16_t16`/
`v_minmax/maxmin_f16_fake16` in post-GFX11, we have to update the CodeGen
pattern for `v_minmax/maxmin_f16` to keep the CodeGen tests passing.
Support true16 format for v_pack_b32_f16 in MC.
Since we are replacing `v_pack_b32_f16` with
`v_pack_b32_f16_t16`/`v_pack_b32_f16_fake16` in post-GFX11, we have to update
the CodeGen pattern for `v_pack_b32_f16_fake16` to keep the CodeGen tests
passing. No pattern is modified or created; `v_pack_b32_f16` is simply
replaced with its fake16 format.
Some of the true16 CodeGen tests are impacted, since `v_pack_b32_f16`
selection is removed in post-GFX11 while `v_pack_b32_f16_t16` is not
yet supported. CodeGen support for `v_pack_b32_f16_t16` will be added
in a following patch.
This is an NFC change. Update the MC test for v_subrev_f16 in true16 format.
The MC source change was done by a previous patch and is automatically enabled
by the t16 pseudo.
This is an NFC change. Update the MC test for v_ldexp_f16 in true16 format.
The MC source change was done by a previous patch and is automatically enabled
by the t16 pseudo.
Support true16 and fake16 format for more VOP3 instructions in MC
This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/dasm tests:
v_mad_u16
v_mad_i16
v_med3_f16
v_med3_i16
v_med3_u16
v_max3_f16
v_max3_i16
v_max3_u16
v_min3_f16
v_min3_i16
v_min3_u16
v_med3_num_f16
This is an NFC change. Update the MC test for v_and/or/xor_b16 in true16
format.
The MC source change was done by a previous patch and is automatically enabled
by the t16 pseudo.
This is an NFC change. Update the MC test for v_mul_f16 in true16 format.
The MC source change was done by a previous patch and is automatically enabled
by the t16 pseudo.
This is an NFC change. Update the MC test for v_max/min_f16 in true16 format.
The MC source change was done by a previous patch and is automatically enabled
by the t16 pseudo.
Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an
unconditional branch to a block other than the one immediately following
it. This can cause unwanted behavior like infinite loops.
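The following is a minimal sketch of the retention rule. It assumes a toy layout-order model in which each block records the index of its unconditional branch target, or -1 for a plain fallthrough. `ToyBlock` and `canRemoveExecZSkip` are hypothetical names, not the actual pass code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layout-order model: UncondTarget is the index of the block
// this block unconditionally branches to, or -1 if it falls through.
struct ToyBlock {
  int UncondTarget = -1;
};

// Blocks [First, Last) are the ones the S_CBRANCH_EXECZ would skip. Removing
// the skip is treated as safe only if none of them branches unconditionally
// anywhere other than its layout successor.
bool canRemoveExecZSkip(const std::vector<ToyBlock> &Blocks, int First, int Last) {
  assert(0 <= First && First <= Last && Last <= static_cast<int>(Blocks.size()));
  for (int I = First; I < Last; ++I) {
    int T = Blocks[I].UncondTarget;
    if (T != -1 && T != I + 1)
      return false;  // e.g. a backedge that could loop with EXEC == 0
  }
  return true;
}
```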
AMDGPU: Delete spills of undef values
It would be a bit more logical to preserve the undef and do the normal
expansion, but this is less work. This avoids verifier errors in a
future patch which starts deleting liveness from registers after
allocation failures, which results in spills of undef values.
https://reviews.llvm.org/D122607
Move where undef sgpr spills are deleted
In one test, code generation diverged between GlobalISel and SelectionDAG.
For example, this intrinsic call
> `%ld = call i8 @llvm.amdgcn.s.buffer.load.u8(<4 x i32> %src, i32 %offset, i32 0)`
was lowered into these two different instructions:
* `buffer_load_u8 v2, v2, s[0:3], null offen`
* `buffer_load_b32 v2, v2, s[0:3], null offen`
This patch fixes this issue.
Add tests with get_image_width as a sample for all of the non-extension
image types. The transform doesn't do anything, but this runs through
all the mangled libfunc parsing and shows it does not crash. It would
probably be smarter to check for an exact match of the types, rather than
checking the prefix.
Previously this would recognize a call to a mangled ldexp(float, float)
as a candidate to replace with the intrinsic. We need to verify the second
parameter is in fact an integer.
Fixes: SWDEV-501389
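The kind of check described above can be sketched as follows. This hypothetical helper handles only scalar Itanium-mangled names. In that scheme `ldexp(float, int)` mangles to `_Z5ldexpfi`, and the rejected `ldexp(float, float)` mangles to `_Z5ldexpff`. Vector types and substitutions are deliberately out of scope, and the real libfunc parsing is more general.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: accept only scalar mangled ldexp(fp, int).
bool isMangledLdexpWithIntExponent(const std::string &Name) {
  const std::string Prefix = "_Z5ldexp";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::string Params = Name.substr(Prefix.size());
  // First parameter: half (Dh), float (f) or double (d).
  std::size_t Pos;
  if (Params.compare(0, 2, "Dh") == 0)
    Pos = 2;
  else if (!Params.empty() && (Params[0] == 'f' || Params[0] == 'd'))
    Pos = 1;
  else
    return false;
  // Second parameter must be int ('i'); a floating point exponent is rejected.
  return Params.substr(Pos) == "i";
}
```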
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
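The following toy model makes the "preserve the old behavior" choice concrete. It is not LLVM's `CmpPredicate`. The predicate carries a samesign flag, but comparing it against a bare predicate ignores that flag. As a result, existing matchers keep matching exactly what they matched before. `getMatching()` is not modelled.

```cpp
#include <cassert>

// Hypothetical, heavily simplified predicate with a samesign flag.
enum class Pred { EQ, NE, SLT, ULT, SGT, UGT };

struct ToyCmpPredicate {
  Pred P;
  bool SameSign = false;

  // Old-behavior comparison: only the predicate is compared; the samesign
  // flag is ignored.
  bool operator==(Pred Other) const { return P == Other; }
};

// A matcher written this way matches with or without samesign.
bool isUnsignedLessThan(const ToyCmpPredicate &C) { return C == Pred::ULT; }
```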
This relands commit #115111.
Use the traditional way to update the post dominator tree, i.e. break critical
edge splitting into an insert, insert, delete sequence.
When splitting critical edges, the post dominator tree may change its
root node, and `setNewRoot` only works for the normal dominator tree.
See
6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
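The update sequence can be sketched with toy types. These are hypothetical and not LLVM's `DomTreeUpdater` interface. Splitting the critical edge `From -> To` with a new block is described as two edge insertions followed by one deletion, rather than as a dedicated split operation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy update records.
enum class UpdateKind { Insert, Delete };

struct EdgeUpdate {
  UpdateKind Kind;
  std::string From, To;
};

// Critical edge split expressed as plain CFG updates: insert, insert, delete.
std::vector<EdgeUpdate> criticalEdgeSplitUpdates(const std::string &From,
                                                 const std::string &To,
                                                 const std::string &NewBB) {
  return {{UpdateKind::Insert, From, NewBB},
          {UpdateKind::Insert, NewBB, To},
          {UpdateKind::Delete, From, To}};
}

// Apply the updates, in order, to an explicit edge set.
void applyUpdates(std::set<std::pair<std::string, std::string>> &Edges,
                  const std::vector<EdgeUpdate> &Updates) {
  for (const EdgeUpdate &U : Updates) {
    if (U.Kind == UpdateKind::Insert)
      Edges.insert(std::make_pair(U.From, U.To));
    else
      Edges.erase(std::make_pair(U.From, U.To));
  }
}
```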
This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218.
+Added a fix in ExpandDivRem24 to disqualify the expansion if DivNumBits exceeds 24.
Original commit & msg:
ce6e955ac374f2b86cbbb73b2f32174dffd85f25.
Handle the signed and unsigned paths differently in getDivNumBits. Using
computeKnownBits, this rejects shrinking unsigned div/rem if the operands
exceed the signed max, since we know NumSignBits will always be 0.
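The following is a toy version of the two paths. It evaluates concrete 32-bit constants rather than the `computeKnownBits`/`ComputeNumSignBits` bounds the pass actually uses, and all names are hypothetical. The unsigned path counts leading zeros. The signed path counts redundant sign bits and keeps one bit for the sign. The 24-bit check mirrors the `ExpandDivRem24` fix mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Leading zero count of a 32-bit value (32 for zero).
unsigned countLeadingZeros32(std::uint32_t V) {
  unsigned N = 0;
  for (std::uint32_t Mask = 0x80000000u; Mask != 0 && (V & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Number of leading bits equal to the sign bit (always at least 1).
unsigned numSignBits32(std::int32_t V) {
  std::uint32_t U = static_cast<std::uint32_t>(V);
  std::uint32_t Sign = U >> 31;
  unsigned N = 1;
  for (int Bit = 30; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// Unsigned path: bits needed by the wider operand. An operand above the
// signed max has no leading zeros, so this yields 32.
unsigned unsignedDivNumBits(std::uint32_t Num, std::uint32_t Den) {
  return 32 - std::min(countLeadingZeros32(Num), countLeadingZeros32(Den));
}

// Signed path: redundant sign bits are dropped, one bit remains for the sign.
unsigned signedDivNumBits(std::int32_t Num, std::int32_t Den) {
  return 32 - std::min(numSignBits32(Num), numSignBits32(Den)) + 1;
}

// The 24-bit expansion is only attempted when the division fits in 24 bits.
bool fitsDivRem24(unsigned DivNumBits) { return DivNumBits <= 24; }
```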
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.
This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
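The following sketch shows why deriving from the original LLT matters. It uses a hypothetical LLT-like type with an extra floating point tag of the kind the RFC proposes; today's LLT has no such field. Rebuilding a scalar from the size alone drops the tag, while deriving it from the original type keeps it.

```cpp
#include <cassert>

// Hypothetical floating point tag, standing in for the RFC's proposal.
enum class FPKind { None, IEEEHalf, BFloat };

// Toy LLT: element count (1 for a scalar), element size and the tag.
struct ToyLLT {
  unsigned NumElements;
  unsigned ScalarBits;
  FPKind Kind;

  // Old style: rebuild a scalar from the size alone, losing the tag.
  static ToyLLT scalar(unsigned Bits) { return {1, Bits, FPKind::None}; }

  // New style: derive the scalar type from the original, keeping the tag.
  ToyLLT getScalarType() const { return {1, ScalarBits, Kind}; }
};
```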
When AGPRs are spilled to the stack through VGPRs, PEI only marks the
AGPR tuple as implicit-def. To preserve liveness, it should also
mark the tuple as implicit.
Fixes: SWDEV-462189
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, use the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" on the assumed range: the assumed
range can't be widened properly in `updateImpl` if the known range is not set
properly for whatever reason. Another benefit of not touching the known range
is that, if we indicate a pessimistic state, it also invalidates the AA so
that `manifest` will not be called. Since we honor the attribute, we don't want
to, and will not, add any half-baked attribute to a function.
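The following is a minimal sketch of the initialization choice only. It assumes the attribute value has the form `"min,max"` (for example `"1,256"`). The helper names and the default range passed in are illustrative, and the Attributor's known/assumed state tracking is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

using Range = std::pair<unsigned, unsigned>;  // [min, max] work-group size

// Parse a non-empty run of decimal digits (no overflow check in this sketch).
std::optional<unsigned> parseUnsigned(const std::string &S) {
  if (S.empty())
    return std::nullopt;
  unsigned V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  return V;
}

// Parse an attribute value of the form "min,max".
std::optional<Range> parseFlatWorkGroupSize(const std::string &Attr) {
  std::size_t Comma = Attr.find(',');
  if (Comma == std::string::npos)
    return std::nullopt;
  std::optional<unsigned> Min = parseUnsigned(Attr.substr(0, Comma));
  std::optional<unsigned> Max = parseUnsigned(Attr.substr(Comma + 1));
  if (!Min || !Max || *Min > *Max)
    return std::nullopt;
  return Range(*Min, *Max);
}

// Starting range: the attribute taken as-is when present and well formed,
// otherwise the caller-supplied default.
Range initialRange(const std::optional<std::string> &Attr, Range Default) {
  if (Attr) {
    if (std::optional<Range> R = parseFlatWorkGroupSize(*Attr))
      return *R;
  }
  return Default;
}
```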
These properties are only valid on ComplexPatterns. Having them as flags
is more convenient because one can now use "let ... = ... in" syntax to set
these flags on several patterns at a time. This is also less error-prone
as it makes it impossible to specify these properties on records derived
from SDPatternOperator.
Pull Request: https://github.com/llvm/llvm-project/pull/119599
Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when
partially writing to the high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop
for the above cases irrespective of the opsel value.
Note: We might need to add a few others to the same table.
Currently, LLVMContext::emitError emits any error as an "inline asm"
error, which does not make sense. InlineAsm appears to be special,
in that it uses a "LocCookie" from srcloc metadata, which looks like
a parallel mechanism to ordinary source line locations. This meant
that other types of failures reported degraded source information,
even when better location information was available.
Introduce some new generic error types, and only use inline asm
in the appropriate contexts. The DiagnosticInfo types are still
a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase
exists instead of just having an optional DiagnosticLocation in the
base class.
DK_Generic is for any error that derives from an IR level instruction,
and thus can pull debug locations directly from it. DK_GenericWithLoc
is functionally the generic codegen error, since it does not depend
on the IR and instead can construct a DiagnosticLocation from the
MI debug location.
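The split between the location sources can be sketched with hypothetical toy types; these are not the `DiagnosticInfo` classes. A generic error takes its location from an IR instruction. The with-location variant takes an explicit location, such as one built from an MI debug location. Only inline asm errors use a cookie.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy model of the three error flavours.
enum class ToyDiagKind { InlineAsm, Generic, GenericWithLoc };

struct ToyDebugLoc {
  std::string File;
  unsigned Line;
};

struct ToyIRInstruction {
  std::optional<ToyDebugLoc> DbgLoc;  // debug location, if any
};

struct ToyDiag {
  ToyDiagKind Kind;
  std::string Message;
  std::optional<ToyDebugLoc> Loc;  // from an IR instruction or explicit
  unsigned LocCookie = 0;          // srcloc cookie, inline asm only
};

// Generic: location pulled directly from the IR instruction.
ToyDiag makeGeneric(const ToyIRInstruction &I, std::string Msg) {
  return {ToyDiagKind::Generic, std::move(Msg), I.DbgLoc, 0};
}

// GenericWithLoc: caller supplies the location (e.g. from an MI).
ToyDiag makeGenericWithLoc(ToyDebugLoc Loc, std::string Msg) {
  return {ToyDiagKind::GenericWithLoc, std::move(Msg), Loc, 0};
}

// Inline asm: only a cookie, no ordinary source location.
ToyDiag makeInlineAsm(unsigned Cookie, std::string Msg) {
  return {ToyDiagKind::InlineAsm, std::move(Msg), std::nullopt, Cookie};
}

std::string render(const ToyDiag &D) {
  if (D.Kind == ToyDiagKind::InlineAsm)
    return "inline asm error (cookie " + std::to_string(D.LocCookie) + "): " + D.Message;
  if (D.Loc)
    return D.Loc->File + ":" + std::to_string(D.Loc->Line) + ": error: " + D.Message;
  return "error: " + D.Message;
}
```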