8103 Commits

Author SHA1 Message Date
Vikram Hegde
f1fa292cd6
[AMDGPU] Pre-commit tests for "lshr + mad" fold (#119509) 2025-01-01 10:17:37 +05:30
Jay Foad
2d6d723a85
[AMDGPU] Add some more GFX12 test coverage (#120581) 2024-12-23 09:42:52 +00:00
Chaitanya
21996bd69c
[AMDGPU] Remove amdgpu-no-heap-ptr and amdgpu-no-lds-kernel-id attributes from lowered kernels in amdgpu-sw-lower-lds pass (#120887)
The 'amdgpu-sw-lower-lds' pass internally calls '__asan_malloc_impl' for
heap memory allocation.
The pass also uses 'amdgcn_lds_kernel_id' for lowering non-kernel LDS
accesses.

This patch removes 'amdgpu-no-heap-ptr' and 'amdgpu-no-lds-kernel-id'
from all kernels lowered by the pass.
2024-12-23 12:42:31 +05:30
Aaditya
c7606710f9
[AMDGPU] Update base addr of dyn alloca considering GrowingUp stack (#119822)
Currently, the compiler calculates the base address of a
dynamically sized stack object (alloca) as follows:
1. `NewSP = Align(CurrSP + Size)`
_where_ `Size = # of elements * wave size * alloca type`
2. `BaseAddr = NewSP`
3. The alignment is computed as: `AlignedAddr = Addr & ~(Alignment - 1)`
4. Return the `BaseAddr`
This makes sense when the stack grows downwards.

The AMDGPU stack grows upwards, so the base address
needs to be aligned first and the SP bumped by the required size afterwards:
1. `BaseAddr = Align(CurrSP)`
2. `NewSP = BaseAddr + Size`
3. `AlignedAddr = (Addr + (Alignment - 1)) & ~(Alignment - 1)`
4. Return the `BaseAddr`.
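
The upward-growing scheme above can be sketched in a few lines. This is only an illustration of the arithmetic, not the backend's lowering code: the function names are hypothetical, `size` is assumed to be already scaled by element count, wave size and type size, and `alignment` is assumed to be a power of two.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + (alignment - 1)) & ~(alignment - 1)


def alloca_growing_up(curr_sp: int, size: int, alignment: int) -> tuple[int, int]:
    """Return (base_addr, new_sp) for a stack that grows upwards."""
    base_addr = align_up(curr_sp, alignment)  # step 1: align the base first
    new_sp = base_addr + size                 # step 2: bump SP past the object
    return base_addr, new_sp


base, sp = alloca_growing_up(curr_sp=0x104, size=0x40, alignment=0x10)
print(hex(base), hex(sp))
```

Aligning before the bump guarantees that the returned base is aligned and that the object lies entirely between `base_addr` and `new_sp`.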
2024-12-20 10:27:27 +05:30
Brox Chen
08db696c87
[AMDGPU][True16][MC] V_MED3_I/U16_fake16 CodeGen pattern (#120600)
PR https://github.com/llvm/llvm-project/pull/113603 replaced
`V_MED3_I/U16` with `V_MED3_I/U16_fake16` for post-GFX11 targets, but did
not update the CodeGen pattern. This patch updates and corrects the CodeGen
pattern.
2024-12-20 10:53:58 +07:00
Matt Arsenault
44201679c6
AMDGPU: Fix mishandling of search for constantexpr addrspacecasts (#120346) 2024-12-20 07:37:19 +07:00
Konstantina Mitropoulou
d3508ccd15
[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588)
- **[AMDGPU] Add new test.**
- **[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.**

---------

Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>
2024-12-19 11:20:43 -08:00
Jun Wang
d57230c72e
[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485)
In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g.,
v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be
allowed.
2024-12-18 10:50:47 -08:00
Brox Chen
c6f753b9a0
[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630)
Support true16 format for v_pack_b32_f16 in MC.

Since we are replacing v_alignbit_b32 with
`v_pack_b32_f16_t16/v_pack_b32_f16_fake16` post-GFX11, we have to update
the CodeGen pattern for `v_pack_b32_f16_fake16` to get the CodeGen tests
passing. No pattern is modified or created; `v_pack_b32_f16` is simply
replaced with the fake16 format.

Some of the true16 CodeGen tests are impacted since `v_pack_b32_f16`
selection is removed post-GFX11 while `v_pack_b32_f16_t16` is not
yet supported. The CodeGen patch for `v_pack_b32_f16_t16` will be done
in the following patch.
2024-12-18 13:28:42 -05:00
Aaditya
0446990cc7
Reapply "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120410)
This reapplies commit https://github.com/llvm/llvm-project/pull/120063.

A machine-verifier bug was causing a crash in the previous commit. 
This has been addressed in
https://github.com/llvm/llvm-project/pull/120393.
2024-12-18 18:20:45 +05:30
Sergei Barannikov
1941f34172
[TableGen][GISel] Import more "multi-level" patterns (#120332)
Previously, if the destination DAG has an untyped leaf, we would import
the pattern only if that leaf is defined by the *top-level* source DAG.
This is an unnecessary restriction.

Here is an example of such a pattern:
```
def : Pat<(add (mul v8i16:$vA, v8i16:$vB), v8i16:$vC),
          (VMLADDUHM $vA, $vB, $vC)>;
```

Previously, it failed to import because `add` defines neither
`$vA` nor `$vB`.
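
The difference between the old and new rule can be modelled roughly as follows. This is a toy model in Python, not TableGen's actual data structures: a pattern is a nested tuple `(operator, operand, ...)`, named leaves are strings, and both helper functions are hypothetical.

```python
def top_level_leaves(dag):
    """Named leaves that are direct operands of the root node."""
    _, *operands = dag
    return {op for op in operands if isinstance(op, str)}


def all_leaves(dag):
    """Named leaves found anywhere in the DAG, searched recursively."""
    _, *operands = dag
    leaves = set()
    for op in operands:
        if isinstance(op, str):
            leaves.add(op)
        else:
            leaves |= all_leaves(op)
    return leaves


# Source pattern: (add (mul $vA, $vB), $vC)
src = ("add", ("mul", "$vA", "$vB"), "$vC")
# Untyped leaves used by the destination: (VMLADDUHM $vA, $vB, $vC)
dst_leaves = {"$vA", "$vB", "$vC"}

old_rule_ok = dst_leaves <= top_level_leaves(src)  # only the root defines leaves
new_rule_ok = dst_leaves <= all_leaves(src)        # nested nodes count as well
print(old_rule_ok, new_rule_ok)
```

In this model the old rule rejects the pattern because `$vA` and `$vB` are defined by the nested `mul`, while the relaxed rule accepts it.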

This change reduces the number of skipped patterns as follows:

```
AArch64: 8695 ->  8548 (-147)
AMDGPU: 11333 -> 11240 (-93)
ARM:     4297 ->  4278 (-1)
PowerPC: 3955 ->  3010 (-945)
```

Other GISel-enabled targets are unaffected.
2024-12-18 14:44:55 +03:00
Aaditya
414c462a83
[AMDGPU] Modify Dyn Alloca test to account for Machine-Verifier bug (#120393)
Machine-Verifier crashes in kernel functions, 
but fails gracefully in device functions.

This is due to the buffer resource descriptor selected
during G-ISEL, before the fallback path. 
Device functions use `$sgpr0_sgpr1_sgpr2_sgpr3`,
while kernel functions select `$private_rsrc_reg`,
where the machine verifier complains:
`$private_rsrc_reg is not a SReg_128 register.`

The test case is modified to capture both behaviors. This is related to
https://github.com/llvm/llvm-project/pull/120063
2024-12-18 16:08:17 +05:30
Aaditya
d6e8ab1fa6
Revert "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120369)
Reverts llvm/llvm-project#120063 due to build-bot failures
2024-12-18 14:06:49 +07:00
Aaditya
99c2e3b782
[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas (#120063)
For #119822
2024-12-18 12:14:37 +05:30
Ruiling, Song
67c55b1ffc
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workloads. Make it
configurable so we can experiment with a different value.
2024-12-18 14:17:27 +08:00
Mirko Brkušanin
f7988a338d
[AMDGPU][SIPreEmitPeephole] Fix mustRetainExeczBranch (#120121)
Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an
unconditional branch to a block other than the one immediately following
it. This can cause unwanted behavior like infinite loops.
2024-12-17 11:47:38 +01:00
Matt Arsenault
3508d8f6dd
RegAllocFast: Avoid using temporary DiagnosticInfo (#120184)
This reverts commit 1297933f35b4948b4d281259627a72094c407a75.
2024-12-17 16:19:26 +07:00
Matt Arsenault
8387cbd0f9
AMDGPU: Delete spills of undef values (#119684)
It would be a bit more logical to preserve the undef and do the normal
expansion, but this is less work. This avoids verifier errors in a
future patch which starts deleting liveness from registers after
allocation failures which results in spills of undef values.

https://reviews.llvm.org/D122607

Move where undef sgpr spills are deleted
2024-12-17 13:08:38 +07:00
Thurston Dang
1297933f35
[CodeGen] Disable ran-out-of-registers-error* tests (#120142)
Two tests are failing on the buildbot in stage2/asan with a stack
use-after-scope:
https://lab.llvm.org/buildbot/#/builders/52/builds/4533 (first failure
here; contains https://github.com/llvm/llvm-project/pull/119492 and
https://github.com/llvm/llvm-project/pull/119640)
    ...
    https://lab.llvm.org/buildbot/#/builders/52/builds/4550

This patch disables the tests for now, to allow the bots to return to
green (instead of reverting the patch series).
2024-12-16 12:39:03 -08:00
Matt Arsenault
d866005f69
AMDGPU: Do not assert on unhandled types when demangling libcalls (#120068) 2024-12-16 20:27:06 +07:00
Juan Manuel Martinez Caamaño
ace87ec04c
[AMDGPU][AMDGPURegBankInfo] Map S_BUFFER_LOAD_XXX to its corresponding BUFFER_LOAD_XXX (#117574)
In one test, code generation diverged between GISEL and DAG.

For example, this intrinsic

> %ld = call i8 @llvm.amdgcn.s.buffer.load.u8(<4 x i32> %src, i32 %offset, i32 0)

would be lowered into these two cases:
* `buffer_load_u8 v2, v2, s[0:3], null offen`
* `buffer_load_b32 v2, v2, s[0:3], null offen`

This patch fixes this issue.
2024-12-16 10:24:33 +01:00
Matt Arsenault
1100d6a995
AMDGPU: Fix libcall recognition of image array types (#119832)
Add tests with get_image_width as a sample for all of the non-extension
image types. The transform doesn't do anything, but this runs through
all the mangled libfunc parsing and shows it does not crash. It would
probably be smarter to check for exact match of the types, rather than
checking the prefix.
2024-12-16 15:04:53 +09:00
Matt Arsenault
b446c208a5
AMDGPU: Verify function type matches when matching libcalls (#119043)
Previously this would recognize a call to a mangled ldexp(float, float)
as a candidate to replace with the intrinsic. We need to verify the second
parameter is in fact an integer.

Fixes: SWDEV-501389
2024-12-16 15:01:48 +09:00
David Green
9ba7e2da00
[GlobalISel] Use replaceRegOrBuildCopy when legalizer-combining s/zext(undef) (#119850)
Similar to #119721, this helps remove some of the COPYs created by the
CSE builder.
2024-12-16 05:57:11 +00:00
Matt Arsenault
818bffcb1c
RegAlloc: Fix failure on undef use when all registers are reserved (#119647)
Greedy and fast would hit different assertions on undef uses if all
registers in a class were reserved.
2024-12-16 10:56:45 +09:00
Matt Arsenault
61f99a1c75
RegAlloc: Do not fatal error if there are no registers in the alloc order (#119640)
Try to use DiagnosticInfo if every register in the class is reserved
by forcing assignment to a reserved register. Also reduces the number
of redundant errors emitted, particularly with fast.

This is still broken in the case of undef uses. There are additional
complications in greedy and fast, so leave it for a separate fix.
2024-12-16 10:52:49 +09:00
Matt Arsenault
bb18e49edb
RegAlloc: Use DiagnosticInfo to report register allocation failures (#119492)
Improve the non-fatal cases to use DiagnosticInfo, which will now
provide a location. The allocators attempt to report different errors
if they happen to see that inline assembly is involved (this detection is
quite unreliable) using srcloc instead of dbgloc. For now, leave this
behavior unchanged. I think reporting the full location and context
function would be more useful.
2024-12-16 10:49:08 +09:00
Fangrui Song
9afaf9c6c8
[AMDGPU,test] Change llc -march= to -mtriple=
Follow-up to 806761a7629df268c8aed49657aeccffa6bca449
2024-12-15 10:54:21 -08:00
Kirill Stoimenov
e821f642fd
Revert "[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687)"
Causes bot failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4246/steps/11/logs/stdio

This reverts commit 7a648554f886fbc043c4f3f58ca88f6c4535f2cf.
2024-12-14 03:47:53 +00:00
Akshat Oke
7a648554f8
[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687)
No need to generate a stack trace and a GitHub issue prompt on a wrongly
set regalloc option.
2024-12-13 11:58:53 +05:30
Pedro Lobo
05137cc507
[AsmParser] Convert empty arrays to poison (#119754)
Empty arrays can be converted to `poison` instead of `undef`.
2024-12-12 22:44:10 +01:00
choikwa
463e93b95f
Reapply [AMDGPU] prevent shrinking udiv/urem if either operand exceeds signed max (#119325)
This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218.

+Added a fix in ExpandDivRem24 to disqualify if DivNumBits exceeds 24.

Original commit & msg:
ce6e955ac374f2b86cbbb73b2f32174dffd85f25.
Handle signed and unsigned path differently in getDivNumBits. Using
computeKnownBits, this rejects shrinking unsigned div/rem if operands
exceed signed max since we know NumSignBits will always be 0.
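
As a rough illustration of the bit-width check described above, the sketch below models the unsigned path with concrete 32-bit values instead of the known-bits analysis the pass uses. The helper names are hypothetical and are not the functions in the LLVM source.

```python
SIGNED_MAX_32 = (1 << 31) - 1


def unsigned_div_num_bits(lhs: int, rhs: int) -> int:
    """Bits needed to hold both unsigned operands (stand-in for known bits)."""
    return max(lhs.bit_length(), rhs.bit_length())


def can_shrink_to_24_bits(lhs: int, rhs: int) -> bool:
    """Allow the 24-bit expansion only if neither operand needs more than 24 bits."""
    return unsigned_div_num_bits(lhs, rhs) <= 24


print(can_shrink_to_24_bits(1000, 7))               # small operands qualify
print(can_shrink_to_24_bits(SIGNED_MAX_32 + 1, 7))  # above signed max: rejected
```

An operand above the signed maximum has its top bit set, so it needs all 32 bits and can never pass the 24-bit test in this model.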
2024-12-12 15:24:34 -05:00
Craig Topper
7ece560a50
[GISel] Support narrowing G_ICMP with more than 2 parts. (#119335)
This allows us to support i128 G_ICMP on RV32. I'm not sure how to test
the "left over" part of this as RISC-V always widens to a power of 2
before narrowing.
2024-12-12 09:50:26 -08:00
Pravin Jagtap
bdaa82a7bb
[AMDGPU] Mark AGPR tuple implicit in the first instr of AGPR spills. (#115285)
When AGPRs are spilled to stack through VGPRs, the pei only marks the
AGPR tuple as implicit-def. To preserve the liveness, it should also
mark the tuple implicit.

Fixes: SWDEV-462189
2024-12-12 19:47:17 +05:30
Matt Arsenault
ea632e1b34
Reapply "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575) (#119634)
This reverts commit 40986feda8b1437ed475b144d5b9a208b008782a.

Reapply with fix to prevent temporary Twine from going out of scope.
2024-12-11 16:01:48 -08:00
Shilei Tian
f4037277bb
[AMDGPU][Attributor] Make AAAMDWavesPerEU honor existing attribute (#114438) 2024-12-11 16:50:06 -05:00
Shilei Tian
7dbd6cd294
[AMDGPU][Attributor] Make AAAMDFlatWorkGroupSize honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, use the default range as a starting
point. We no longer manipulate the known range, which can cause issues
because the known range is a "throttle" on the assumed range: the
assumed range can't be widened properly in `updateImpl` if the known range is
not set properly for whatever reason. Another benefit of not touching the known
range is that, if we indicate a pessimistic state, it also invalidates the AA so
that `manifest` will not be called. Since we honor the attribute, we don't want
to, and will not, add any half-baked attribute to a function.
2024-12-11 16:47:51 -05:00
Vitaly Buka
40986feda8
Revert "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575)
Reverts llvm/llvm-project#119485

Breaks builders, details in llvm/llvm-project#119485
2024-12-11 07:51:36 -08:00
Pravin Jagtap
5e007afa9d
[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)
Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when
partially writing to high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop
for the above cases irrespective of opsel values.

Note: We might need to add a few others to the same table.
2024-12-11 18:38:10 +05:30
Matt Arsenault
884f2ad6f9
DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm (#119485)
Currently LLVMContext::emitError emits any error as an "inline asm"
error which does not make any sense. InlineAsm appears to be special,
in that it uses a "LocCookie" from srcloc metadata, which looks like
a parallel mechanism to ordinary source line locations. This meant
that other types of failures had degraded source information reported
when available.

Introduce some new generic error types, and only use inline asm
in the appropriate contexts. The DiagnosticInfo types are still
a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase
exists instead of just having an optional DiagnosticLocation in the
base class.

DK_Generic is for any error that derives from an IR level instruction,
and thus can pull debug locations directly from it. DK_GenericWithLoc
is functionally the generic codegen error, since it does not depend
on the IR and instead can construct a DiagnosticLocation from the
MI debug location.
2024-12-11 17:16:07 +09:00
Shilei Tian
15f87bc10c
[NFC][AMDGPU] Auto generate check lines for llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll 2024-12-10 12:40:28 -05:00
Dan Gohman
e665e781dc
[SelectionDAG] Use the nuw flag when expanding loads. (#119288)
When expanding a load into two loads, use nuw for the add that computes
the offset from the base of the second load, because the original load
doesn't straddle the address space.
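
The reasoning can be checked with a small model of the address arithmetic. This is a sketch under assumptions, not SelectionDAG code: it assumes a 32-bit address space and an even split, and the function name is hypothetical.

```python
ADDR_BITS = 32
ADDR_SPACE = 1 << ADDR_BITS  # one past the highest valid address


def split_load_addresses(base: int, size: int) -> tuple[int, int]:
    """Addresses of the two halves of a `size`-byte load starting at `base`."""
    # Precondition from the commit message: the original load does not
    # straddle the end of the address space.
    assert base + size <= ADDR_SPACE
    second = base + size // 2
    # The offset add cannot wrap as an unsigned value, which is what nuw states.
    assert second < ADDR_SPACE
    return base, second


# Even the highest possible 8-byte load splits without unsigned wrap-around.
print(split_load_addresses(ADDR_SPACE - 8, 8))
```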

It turns out there's already a dedicated helper function for doing this,
`getObjectPtrOffset`.

This is in target-independent code; however, in practice it only seems to
affect WebAssembly code, because WebAssembly load and store
instructions' constant offsets don't perform wrapping, so constant
folding often depends on the nuw flag being present.

This was noticed in the development of #119204.
2024-12-10 06:28:09 -08:00
Piotr Sobczak
a2d086af2c
[AMDGPU] Fix FMA combine (#119217)
Update the check in the FMA combine to check dot10-insts instead of
dot7-insts.

The target of the combine, v_dot2_f32_f16, is available only if
dot10-insts target feature is enabled.

The issue probably dates back to the change that split out dot10-insts
out of dot7-insts.

As far as I can see, this does not affect any current targets, but if a
future target has dot7-insts but not dot10-insts, that would cause a
crash ("cannot select") for the input IR in the test.
2024-12-10 10:11:19 +01:00
Jun Wang
41ed16c3b3
Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)" (#118907)
This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-09 16:44:48 -08:00
Brox Chen
85142f5b35
[AMDGPU][True16][CodeGen] support for true16 for vinterp 16bit instructions (#116702)
CodeGen support for vinterp 16-bit instructions in True16 format.

Currently only two tests are enabled; more will be enabled when more true16
instructions are supported.
2024-12-09 11:52:05 -05:00
Matt Arsenault
009368f130
AMDGPU: Mark grid size loads with range metadata (#113019)
Only handles the v5 case.
2024-12-09 11:01:55 -05:00
Matt Arsenault
664a226bf6
AMDGPU: Propagate amdgpu-max-num-workgroups attribute (#113018)
I'm not sure what the interpretation of 0 is supposed to be,
AMDGPUUsage doesn't say.
2024-12-09 09:57:27 -06:00
Joseph Huber
254d206ee2
Revert "Reapply "[AMDGPU] prevent shrinking udiv/urem if either operand is in… (#118928)"
This reverts commit 509893b58ff444a6f080946bd368e9bde7668f13.

This broke the libc build again https://lab.llvm.org/buildbot/#/builders/73/builds/9787.
2024-12-09 08:10:49 -06:00
David Green
9a415f6d6b
[GlobalISel] Fold ptrtoint(undef) and inttoptr(undef) to undef. (#119073)
This helps with shuffles a little, and one of the AMDGPU tests is now
equivalent to the SDAG version.
2024-12-09 08:52:22 +00:00
Vikash Gupta
0b0d9a3bee
[CodeGen] [AMDGPU] Attempt DAGCombine for fmul with select to ldexp (#111109)
Materializing a 32-bit non-inline constant for an fmul is relatively
expensive. Where possible, it is cheaper to fold the multiplication into an
ldexp instruction (for datatypes like f64, f32 and f16), which is
handled here.

The DAG combine applies to any pair of select values that are exact powers
of 2:

```
fmul x, select(y, A, B)       -> ldexp (x, select i32 (y, a, b))
fmul x, select(y, -A, -B)    -> ldexp ((fneg x), select i32 (y, a, b))

where, A=2^a & B=2^b ; a and b are integers.  
```

This DAG combine is handled separately in fmulCombine (newly defined in
SIISelLowering), which fuses an fmul with a select-type operand into
ldexp.
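
The two rewrite rules can be sanity-checked numerically. The following is a minimal sketch using Python floats and `math.ldexp`, not the DAG combine itself; the helper names and the sample values are assumptions chosen for illustration.

```python
import math


def fmul_select(x: float, cond: bool, A: float, B: float) -> float:
    """Original form: fmul x, select(cond, A, B)."""
    return x * (A if cond else B)


def ldexp_select(x: float, cond: bool, a: int, b: int, negate: bool = False) -> float:
    """Combined form: ldexp(x or fneg x, select i32 (cond, a, b))."""
    return math.ldexp(-x if negate else x, a if cond else b)


x = 3.5
a, b = 3, -2
A, B = 2.0 ** a, 2.0 ** b  # A = 2^a and B = 2^b

for cond in (True, False):
    # fmul x, select(y, A, B)   -> ldexp(x, select i32 (y, a, b))
    assert fmul_select(x, cond, A, B) == ldexp_select(x, cond, a, b)
    # fmul x, select(y, -A, -B) -> ldexp((fneg x), select i32 (y, a, b))
    assert fmul_select(x, cond, -A, -B) == ldexp_select(x, cond, a, b, negate=True)
```

Multiplying by a power of two only changes the exponent, so the two forms agree exactly whenever the result does not overflow or underflow.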

Thus, it fixes #104900.
2024-12-09 12:52:04 +05:30