9997 Commits

Author SHA1 Message Date
Konstantina Mitropoulou
d3508ccd15
[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588)
- **[AMDGPU] Add new test.**
- **[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.**

---------

Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>
2024-12-19 11:20:43 -08:00
Brox Chen
4044886c7c
Revert "[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586)" (#120594)
This reverts commit e0526b0780f56eede09b05a859a93626ecdc6e4d.

The `v_minmax/maxmin_f16` (GFX11) instructions need to be updated to t16
together with `v_minmax/maxmin_num_f16` (GFX12), since they share the same
codegen pattern. Revert the old patch and resubmit.
2024-12-19 12:10:23 -05:00
Craig Topper
f139bde8d8
[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536)
This allows us to write more range based for loops because we no
longer need the iterator. It also matches IR's Use class.
2024-12-19 09:07:42 -08:00
Craig Topper
e6b2495545
[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531)
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.

We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.

Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
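A minimal, self-contained C++ sketch of the distinction described above. `Node`, `Use`, and `sumOperandNos` are hypothetical stand-ins, not LLVM's `SDNode`/`SDUse`: in this toy model, ranging over `uses()` yields each use edge, so its operand number is reachable inside a range-based for loop, while `users()` yields only the user nodes.

```cpp
#include <cassert>
#include <vector>

struct Node;

// Hypothetical toy "use" edge: records which node uses the value and at
// which operand index.
struct Use {
  Node *User;          // node whose operand list contains this use
  unsigned OperandNo;  // index of this use within User's operands
  Node *getUser() const { return User; }
  unsigned getOperandNo() const { return OperandNo; }
};

// Hypothetical toy node holding the list of uses of its value.
struct Node {
  std::vector<Use> UseList;

  // Analogue of uses(): iterating yields Use&, so getOperandNo() is
  // available without keeping an explicit iterator.
  std::vector<Use> &uses() { return UseList; }

  // Analogue of users(): iterating yields only the user nodes.
  std::vector<Node *> users() const {
    std::vector<Node *> Result;
    for (const Use &U : UseList)
      Result.push_back(U.getUser());
    return Result;
  }
};

// Needs the operand number, so it must walk uses(), not users().
unsigned sumOperandNos(Node &N) {
  unsigned Sum = 0;
  for (Use &U : N.uses())
    Sum += U.getOperandNo();
  return Sum;
}
```

If node A is used as operand 0 of B and operand 2 of C, `sumOperandNos(A)` is 2 and `A.users()` contains B and C in that order.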
2024-12-19 08:35:32 -08:00
Jay Foad
a161e73fcc [AMDGPU] Remove unnecessary casts to GCNSubtarget 2024-12-19 15:50:53 +00:00
Jay Foad
056e5eccaf
[AMDGPU] Remove unneeded use of !dag. NFC. (#120546) 2024-12-19 11:01:59 +00:00
Craig Topper
bd261ecc5a
[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509)
Most of these are just places that want the first user and aren't
iterating over the whole list.

While there, I changed some use_size() == 1 to hasOneUse(), which
is more efficient.

This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
2024-12-18 22:13:04 -08:00
Craig Topper
104ad9258a
[SelectionDAG] Rename SDNode::uses() to users(). (#120499)
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.

I've long been annoyed that we can't write a range-based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
2024-12-18 20:09:33 -08:00
Brox Chen
e0526b0780
[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586)
Support true16 format for v_minmax/maxmin_f16 in MC.

Since `v_minmax/maxmin_f16` is being replaced with `v_minmax/maxmin_f16_t16`
/ `v_minmax/maxmin_f16_fake16` on post-GFX11 targets, the CodeGen pattern for
`v_minmax/maxmin_f16` has to be updated to keep the CodeGen tests passing.
2024-12-18 18:04:50 -05:00
Brox Chen
e10b12e656
[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613)
Support true16 format for v_div_fixup_f16 in MC.
2024-12-18 18:01:13 -05:00
Brox Chen
dc0ea0f945
[AMDGPU][True16][MC] true16 for v_cvt_pknorm_i16/u16_f16 (#119605)
Support true16 format for v_cvt_pknorm_i16/u16_f16 in MC.
2024-12-18 17:56:34 -05:00
Jun Wang
d57230c72e
[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485)
In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g.,
v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be
allowed.
2024-12-18 10:50:47 -08:00
Brox Chen
c6f753b9a0
[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630)
Support true16 format for v_pack_b32_f16 in MC.

Since we are replacing v_alignbit_b32 with
`v_pack_b32_f16_t16/v_pack_b32_f16_fake16` on post-GFX11 targets, the
CodeGen pattern for `v_pack_b32_f16_fake16` has to be updated to keep the
CodeGen tests passing. No pattern is modified or created; `v_pack_b32_f16`
is simply replaced with its fake16 format.

Some of the true16 CodeGen tests are impacted, since `v_pack_b32_f16`
selection is removed on post-GFX11 targets while `v_pack_b32_f16_t16` is
not yet supported. The CodeGen patch for `v_pack_b32_f16_t16` will be done
in a following patch.
2024-12-18 13:28:42 -05:00
Brox Chen
c3241a9a4d
[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315)
This is an NFC change. Update the MC test for v_subrev_f16 in true16 format.

The MC source change was done by a previous patch and is automatically
enabled by the t16 pseudo.
2024-12-18 13:01:08 -05:00
Brox Chen
5270e63cdc
[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313)
This is an NFC change. Update the MC test for v_ldexp_f16 in true16 format.

The MC source change was done by a previous patch and is automatically
enabled by the t16 pseudo.
2024-12-18 13:00:07 -05:00
Ruiling, Song
67c55b1ffc
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workloads. Make it
configurable so we can experiment with different values.
2024-12-18 14:17:27 +08:00
Brox Chen
de2acda3df
[AMDGPU][True16][MC] support more VOP3 inst in true16/fake16 format (#113603)
Support true16 and fake16 format for more VOP3 instructions in MC.

This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/dasm tests:
v_mad_u16
v_mad_i16
v_med3_f16
v_med3_i16
v_med3_u16
v_max3_f16
v_max3_i16
v_max3_u16
v_min3_f16
v_min3_i16
v_min3_u16
v_med3_num_f16
2024-12-17 13:58:01 -05:00
Brox Chen
b26f534980
[AMDGPU][True16][MC] test update for v_and/or/xor_b16 in true16 (#119489)
This is an NFC change. Update the MC test for v_and/or/xor_b16 in true16
format.

The MC source change was done by a previous patch and is automatically
enabled by the t16 pseudo.
2024-12-17 13:26:59 -05:00
Brox Chen
f9a9173b6c
[AMDGPU][True16][MC] test update for v_mul_f16 in true16 (#119314)
This is an NFC change. Update the MC test for v_mul_f16 in true16 format.

The MC source change was done by a previous patch and is automatically
enabled by the t16 pseudo.
2024-12-17 13:24:32 -05:00
Brox Chen
8bbbcaddbb
[AMDGPU][True16][MC] test update for v_max_f16/v_min_f16 in true16 (#119291)
This is an NFC change. Update the MC test for v_max/min_f16 in true16 format.

The MC source change was done by a previous patch and is automatically
enabled by the t16 pseudo.
2024-12-17 13:12:39 -05:00
Mirko Brkušanin
f7988a338d
[AMDGPU][SIPreEmitPeephole] Fix mustRetainExeczBranch (#120121)
Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an
unconditional branch to a block other than the one immediately following
it. This can cause unwanted behavior like infinite loops.
2024-12-17 11:47:38 +01:00
Matt Arsenault
8387cbd0f9
AMDGPU: Delete spills of undef values (#119684)
It would be a bit more logical to preserve the undef and do the normal
expansion, but this is less work. This avoids verifier errors in a
future patch which starts deleting liveness from registers after
allocation failures, which result in spills of undef values.

https://reviews.llvm.org/D122607

Move where undef sgpr spills are deleted
2024-12-17 13:08:38 +07:00
Matt Arsenault
d866005f69
AMDGPU: Do not assert on unhandled types when demangling libcalls (#120068) 2024-12-16 20:27:06 +07:00
Sergei Barannikov
03847f19f2
[SelectionDAG] Add empty implementation of SelectionDAGInfo to some targets (#119968)
#119969 adds a couple of new methods to this class, which will need to
be overridden by these targets.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/119968
2024-12-16 15:13:46 +03:00
Juan Manuel Martinez Caamaño
ace87ec04c
[AMDGPU][AMDGPURegBankInfo] Map S_BUFFER_LOAD_XXX to its corresponding BUFFER_LOAD_XXX (#117574)
In one test, code generation diverged between GlobalISel and SelectionDAG.

For example, this intrinsic call

> `%ld = call i8 @llvm.amdgcn.s.buffer.load.u8(<4 x i32> %src, i32 %offset, i32 0)`

would be lowered into these two different instructions:
* `buffer_load_u8 v2, v2, s[0:3], null offen`
* `buffer_load_b32 v2, v2, s[0:3], null offen`
This patch fixes this issue.
2024-12-16 10:24:33 +01:00
Matt Arsenault
1100d6a995
AMDGPU: Fix libcall recognition of image array types (#119832)
Add tests with get_image_width as a sample for all of the non-extension
image types. The transform doesn't do anything, but this runs through
all the mangled libfunc parsing and shows it does not crash. It would
probably be smarter to check for an exact match of the types, rather than
checking the prefix.
2024-12-16 15:04:53 +09:00
Matt Arsenault
b446c208a5
AMDGPU: Verify function type matches when matching libcalls (#119043)
Previously this would recognize a call to a mangled ldexp(float, float)
as a candidate to replace with the intrinsic. We need to verify the second
parameter is in fact an integer.
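A minimal C++ sketch of the extra check, assuming plain Itanium mangling where `_Z5ldexpfi` is `ldexp(float, int)` and `_Z5ldexpff` is `ldexp(float, float)`. The helper names are hypothetical, and only three builtin type codes are modeled (`f` float, `d` double, `i` int); the real libcall parser handles far more (vectors, other integer widths).

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers: classify a single Itanium builtin type code.
bool isFloatingCode(char C) { return C == 'f' || C == 'd'; }
bool isIntegerCode(char C) { return C == 'i'; }

// Returns true only for ldexp(<float or double>, int). Matching on the
// name alone would also accept ldexp(float, float).
bool isLdexpIntrinsicCandidate(const std::string &Mangled) {
  const std::string Prefix = "_Z5ldexp";  // _Z + name length + name
  if (Mangled.size() != Prefix.size() + 2 ||
      Mangled.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  char First = Mangled[Prefix.size()];
  char Second = Mangled[Prefix.size() + 1];
  // The exponent (second parameter) must really be an integer.
  return isFloatingCode(First) && isIntegerCode(Second);
}
```

Under these assumptions, `_Z5ldexpfi` is accepted while `_Z5ldexpff` is rejected.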

Fixes: SWDEV-501389
2024-12-16 15:01:48 +09:00
Aaditya
0ae75eba67
[AMDGPU] Assert if stack grows downwards. (#119888) 2024-12-14 17:44:40 +05:30
Kirill Stoimenov
e821f642fd Revert "[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687)"
Causes bot failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4246/steps/11/logs/stdio

This reverts commit 7a648554f886fbc043c4f3f58ca88f6c4535f2cf.
2024-12-14 03:47:53 +00:00
Matt Arsenault
5f72f2c8fd
AMDGPU: Remove large, negative AddedComplexity from minimum/maximum patterns (#119795) 2024-12-14 06:17:00 +09:00
Ramkumar Ramachandra
4a0d53a0b0
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change in the Cmp matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
2024-12-13 14:18:33 +00:00
Akshat Oke
7a648554f8
[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687)
No need to generate a stack trace and a GitHub issue prompt on a wrongly
set regalloc option.
2024-12-13 11:58:53 +05:30
paperchalice
1562b70eaf
Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)
This relands #115111.
Use the traditional way to update the post-dominator tree, i.e. break
critical edge splitting into an insert, insert, delete sequence.
When splitting critical edges, the post-dominator tree may change its
root node, and `setNewRoot` only works in the normal dominator tree...
See

6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)
2024-12-13 11:43:09 +08:00
Matt Arsenault
5e53a8dadb
AMDGPU: Fix verifier assert with out of bounds subregister indexes (#119799)
The manual check for aligned VGPR classes would assert if a virtual
register used an index not supported by the register class.
2024-12-13 11:52:11 +09:00
Matt Arsenault
37cd7926b7
AMDGPU: Fix entry for mac in VGPR->AGPR MFMA table (#119693) 2024-12-13 07:53:05 +09:00
choikwa
463e93b95f
Reapply [AMDGPU] prevent shrinking udiv/urem if either operand exceeds signed max (#119325)
This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218.

Added a fix in ExpandDivRem24 to bail out if DivNumBits exceeds 24.

Original commit & msg:
ce6e955ac374f2b86cbbb73b2f32174dffd85f25.
Handle the signed and unsigned paths differently in getDivNumBits. Using
computeKnownBits, this rejects shrinking unsigned div/rem if the operands
may exceed the signed max, since NumSignBits will then always be 0.
2024-12-12 15:24:34 -05:00
Tim Gymnich
2db2dc8ab9
[GlobalISel][NFC] Fix LLT Propagation (#119587)
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.

This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
2024-12-12 09:47:46 -08:00
Pravin Jagtap
bb1961ed77
[AMDGPU] Stop using True16 profile for v_bitop3_b16 of gfx950. (#119706) 2024-12-12 20:12:08 +05:30
Pravin Jagtap
bdaa82a7bb
[AMDGPU] Mark AGPR tuple implicit in the first instr of AGPR spills. (#115285)
When AGPRs are spilled to the stack through VGPRs, PEI only marks the
AGPR tuple as implicit-def. To preserve liveness, it should also
mark the tuple as implicit.

Fixes: SWDEV-462189
2024-12-12 19:47:17 +05:30
Akshat Oke
0876c11cee
[AMDGPU] Parse wwm filter flag for regalloc fast (#119347) 2024-12-12 13:51:02 +05:30
Matt Arsenault
ea632e1b34
Reapply "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575) (#119634)
This reverts commit 40986feda8b1437ed475b144d5b9a208b008782a.

Reapply with fix to prevent temporary Twine from going out of scope.
2024-12-11 16:01:48 -08:00
Shilei Tian
f4037277bb
[AMDGPU][Attributor] Make AAAMDWavesPerEU honor existing attribute (#114438) 2024-12-11 16:50:06 -05:00
Shilei Tian
7dbd6cd294
[AMDGPU][Attributor] Make AAAMDFlatWorkGroupSize honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, it uses the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" to the assumed range such that the
assumed range can't get widened properly in `updateImpl` if the known range is
not set properly for whatever reason. Another benefit of not touching the known
range is that, if we indicate a pessimistic state, it also invalidates the AA so
that `manifest` will not be called. Since we honor the attribute, we neither want
nor add any half-baked attribute on a function.
2024-12-11 16:47:51 -05:00
Sergei Barannikov
6b2232606d
[TableGen] Replace WantRoot/WantParent SDNode properties with flags (#119599)
These properties are only valid on ComplexPatterns. Having them as flags
is more convenient because one can now use `let ... = ... in` syntax to set
these flags on several patterns at a time. This is also less error-prone
as it makes it impossible to specify these properties on records derived
from SDPatternOperator.

Pull Request: https://github.com/llvm/llvm-project/pull/119599
2024-12-12 00:41:44 +03:00
Vitaly Buka
40986feda8
Revert "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575)
Reverts llvm/llvm-project#119485

Breaks builders, details in llvm/llvm-project#119485
2024-12-11 07:51:36 -08:00
Pravin Jagtap
5e007afa9d
[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)
Presently, the compiler selectively adds a NOP only when opsel != 0, i.e. only
when partially writing to the high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that these conversions
need a NOP irrespective of the opsel value.

Note: a few other instructions may need to be added to the same table.
2024-12-11 18:38:10 +05:30
Jay Foad
8eb12f6775
[AMDGPU] Support s_endpgm_ordered_ps_done on GFX11 (#119230)
Support assembly/disassembly of this instruction for compatibility with
SP3, even though it has no use in GFX11. It is fully removed in GFX12.
2024-12-11 11:48:36 +00:00
Matt Arsenault
884f2ad6f9
DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm (#119485)
Currently LLVMContext::emitError emits any error as an "inline asm"
error which does not make any sense. InlineAsm appears to be special,
in that it uses a "LocCookie" from srcloc metadata, which looks like
a parallel mechanism to ordinary source line locations. This meant
that other types of failures had degraded source information reported
when available.

Introduce some new generic error types, and only use inline asm
in the appropriate contexts. The DiagnosticInfo types are still
a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase
exists instead of just having an optional DiagnosticLocation in the
base class.

DK_Generic is for any error that derives from an IR level instruction,
and thus can pull debug locations directly from it. DK_GenericWithLoc
is functionally the generic codegen error, since it does not depend
on the IR and instead can construct a DiagnosticLocation from the
MI debug location.
2024-12-11 17:16:07 +09:00
paperchalice
553058f825
Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)
Reverts llvm/llvm-project#115111. Causes #119511.
2024-12-11 14:25:17 +08:00
paperchalice
79047fac65
[DomTreeUpdater] Move critical edge splitting code to updater (#115111)
Support critical edge splitting in dominator tree updater. Continue the
work in #100856.

Compile time check:
https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au
2024-12-11 11:31:42 +08:00