7032 Commits

Author SHA1 Message Date
Matt Arsenault
f7c3627338
DAG: Implement promotion for strict_fpextend (#74310)
Test is a placeholder, will be merged into the existing test after
additional bug fixes for illegal f16 targets are fixed.
2023-12-22 17:15:52 +07:00
Matt Arsenault
c7952d8860 AMDGPU: Add a few more bfloat codegen tests 2023-12-22 12:31:42 +07:00
Matt Arsenault
50ed3b1ecc AMDGPU: Workaround a divergent return value bug in test 2023-12-22 12:31:42 +07:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Matt Arsenault
b01adc6bed AMDGPU: Strengthen some bfloat tests
Fix bitcast test, which was splitting apart phis intended to force
bitcasts that survive all the way to selection.

Disable the amdgpu-codegenprepare phi splitting, which defeats the technique
of using a phi to ensure a bitcast reaches all the way to selection. Also
add a variety of bfloat tests. These probably need revisiting to avoid the
cast folding into argument loads. Also round out set of bfloat bitcast and
ABI tests.

Add codegen tests for more bf16 operations The promotion of these works
contrary to the comment.
2023-12-20 19:33:45 +07:00
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
Mariusz Sikora
9a41a80e76
[AMDGPU] Handle object size and bail if assume-like intrinsic is used in PromoteAllocaToVector (#68744)
Attached test will cause crash without this change.

We should not remove isAssumeLikeIntrinsic instruction if it is used by
other instruction.
2023-12-20 07:47:49 +01:00
Jeffrey Byrnes
f1156fb622
[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416)
Makes constructing SchedGroups of this type easier, and provides ability
to create them with __builtin_amdgcn_sched_group_barrier
2023-12-19 16:54:18 -08:00
Matt Arsenault
1196975286 AMDGPU: Add gfx11 run line to bf16 test 2023-12-19 17:12:52 +07:00
Mariusz Sikora
a018c8cdbb
GFX12: Add LoopDataPrefetchPass (#75625)
It is currently disabled by default. It will need experiments on a real
HW to tune and decide on the profitability.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-19 08:32:16 +01:00
James Y Knight
137f785fa6
[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185)
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.

While AMDGPU currently disables the use of all libcalls, I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already be emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
2023-12-18 16:51:06 -05:00
Simon Pilgrim
7b1e4239b3
[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229)
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).

reduceLoadWidth can handle this for scalar loads, but not for vectors.

Noticed while triaging D152928
2023-12-18 16:21:11 +00:00
Carl Ritson
5139299618
[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)
Physical VGPRs used for SGPR spills need to be tracked independent of
WWM reserved registers. The WWM reserved set contains extra registers
allocated during WWM pre-allocation pass.

This causes SGPR spills allocated after WWM pre-allocation to overlap
with WWM register usage, e.g. if frame pointer is spilt during
prologue/epilog insertion.
2023-12-17 16:44:16 +09:00
Mariusz Sikora
414d27419f
[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-15 17:15:55 +01:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Mirko Brkušanin
07a6d73664
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493) 2023-12-15 15:01:40 +01:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Pierre van Houtryve
ef067f5204
[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-12-15 12:33:32 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
Mariusz Sikora
229273f538
[AMDGPU] Update permlane test for GFX12 (#75572) 2023-12-15 11:18:23 +01:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Carl Ritson
0ed0b7458a [AMDGPU] Pre-commit test for #75573. NFC
Shows spill allocation overlapping with WWM register use.
2023-12-15 18:29:08 +09:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Pierre van Houtryve
f1ea77f7be
[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436)
Split from #72830
2023-12-15 08:31:14 +01:00
Saiyedul Islam
e21b7e2143
[AMDGPU][NFC] Check more autogenerated llc tests for COV5 (#75219)
Regenerate a few more llc tests to check for COV5 instead of the default
ABI version.
2023-12-15 10:27:49 +05:30
Jay Foad
3e6da3252f
[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499) 2023-12-14 21:11:39 +00:00
Mirko Brkusanin
c6351b4cc9 [AMDGPU][NFC] Regenerate .mir test 2023-12-14 18:58:43 +01:00
Valery Pykhtin
dd051295bc
[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975)
Let's try once again after #69957 has landed.
2023-12-14 14:10:27 +01:00
Stanislav Mekhanoshin
c6ecbcb48b
[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245)
BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the
vmcnt. This is VMEM load and DS store, so it shall use vmcnt.
2023-12-13 10:49:36 -08:00
Petar Avramovic
6892c175c5
AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340)
Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will
implement
- selection of divergent i1 phis as lane mask phis, requires lane mask
merging in some cases
- lower uses of divergent i1 values outside of the cycle using lane mask
merging
- lowering of all cases of temporal divergence:
- lower uses of uniform i1 values outside of the cycle using lane mask
merging
- lower uses of uniform non-i1 values outside of the cycle using a copy
to vgpr inside of the cycle

Add very detailed set of regression tests for cases mentioned above.

patch 1 from: https://github.com/llvm/llvm-project/pull/73337
2023-12-13 16:42:56 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Stanislav Mekhanoshin
7f54070194
[AMDGPU] Precommit test for LDS DMA waitcounts. NFC. (#75240) 2023-12-12 12:29:09 -08:00
Jay Foad
8005ee6da3
[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070) 2023-12-12 17:41:40 +00:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Saiyedul Islam
777b6de7a4
[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339)
Regenerate a few llc tests to test for COV5 instead of the default ABI
version.
2023-12-12 14:35:13 +05:30
Jay Foad
35ebd92d3d
[GlobalISel] Add G_PREFETCH (#74863) 2023-12-11 11:06:50 +00:00
Pierre van Houtryve
dd32d26a37
[AMDGPU] Form V_MAD_U64_U32 from mul24 (#72393)
Fixes SWDEV-421067
2023-12-11 11:38:27 +01:00
Jay Foad
e38c29c2b7 [AMDGPU] Add GFX11 test coverage to integer-mad-patterns.ll 2023-12-08 13:06:03 +00:00
Saiyedul Islam
5c4c199fe3
[AMDGPU][NFC] Improve testing for AMDHSA ABI Version (#74300)
Add tests for COV4 as well as COV5 instead of only testing for the
default version.
2023-12-08 18:09:45 +05:30
Valery Pykhtin
901c5be524
[AMDGPU] Fix GCNUpwardRPTracker: max register pressure on defs. (#74422)
Treat a defined register as fully live "at" the instruction and update maximum pressure accordingly. Fixes #3786.
2023-12-08 11:27:08 +01:00
Pierre van Houtryve
ecd2f56a80
[AMDGPU] Warn if 'amdgpu-waves-per-eu' target occupancy was not met (#74055)
This should make it a bit harder to miss this type of issue. The warning
only shows if amdgpu-waves-per-eu is used.

See SWDEV-434482
2023-12-06 10:46:46 +01:00
Matt Arsenault
08e63dd8fe AMDGPU: Add a MIR test to catch infinite loop
This is derived from one of the regressions reported
after aed1a2217a1da0c9fb7d2c0856302dee25b1d4a1
2023-12-06 15:58:32 +07:00
Pranav Taneja
41507fe595
[GISel] Combine (Scalarize) vector load followed by an element extract. 2023-12-06 11:23:23 +05:30
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Valery Pykhtin
ecf8818380
[AMDGPU] Presubmit test: max register pressure on defs. (#74424)
Upcoming patch #74422.
2023-12-05 13:23:49 +01:00
Ruiling, Song
90681d3a41
AMDGPU: Return legal addressmode correctly for flat scratch (#71494) 2023-12-05 11:22:39 +08:00
Ruiling Song
e31a7581a5 AMDGPU: Pre-commit test to show diff 2023-12-05 11:19:50 +08:00