8582 Commits

Author SHA1 Message Date
Matt Arsenault
248fba0cd8 AMDGPU: Remove pointless setOperationAction for xint_to_fp
The legalize action for uint_to_fp/sint_to_fp uses the source integer
type, not the result FP type so setting an action on an FP type does
nothing.
2023-12-22 11:24:35 +07:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
Nikita Popov
9d60e95bcd
[AMDGPU] Use poison instead of undef for non-demanded elements (#75914)
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.

Also bail out of dmask is 0, as this case has special semantics:

> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.
2023-12-20 11:01:59 +01:00
Mariusz Sikora
9a41a80e76
[AMDGPU] Handle object size and bail if assume-like intrinsic is used in PromoteAllocaToVector (#68744)
Attached test will cause crash without this change.

We should not remove isAssumeLikeIntrinsic instruction if it is used by
other instruction.
2023-12-20 07:47:49 +01:00
Jeffrey Byrnes
f1156fb622
[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416)
Makes constructing SchedGroups of this type easier, and provides ability
to create them with __builtin_amdgcn_sched_group_barrier
2023-12-19 16:54:18 -08:00
Mariusz Sikora
a018c8cdbb
GFX12: Add LoopDataPrefetchPass (#75625)
It is currently disabled by default. It will need experiments on a real
HW to tune and decide on the profitability.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-19 08:32:16 +01:00
James Y Knight
137f785fa6
[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185)
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.

While AMDGPU currently disables the use of all libcalls, I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already be emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
2023-12-18 16:51:06 -05:00
Stanislav Mekhanoshin
e5c523e861
[AMDGPU] Produce better memoperand for LDS DMA (#75247)
1) It was marked as volatile. This is not needed and the only reason
   it was done is because it is both load and store and handled
   together with atomics. Global load to LDS was marked as volatile
   just because buffer load was done that way.
2) Preserve at least LDS (store) pointer which we always have with
   the intrinsics.
3) Use PoisonValue instead of nullptr for load memop as a Value.
2023-12-18 11:01:12 -08:00
Stanislav Mekhanoshin
94230ce548
[AMDGPU] Fix lack of LDS DMA check in the AA handling (#75249)
SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset checks,
but does not account for LDS DMA instructions. Added these checks.
Without it code falls through and returns true which is wrong. As a
result mayAlias would always return false for LDS DMA and a regular LDS
instruction or 2 LDS DMA instructions.

At the moment this is NFCI because we do not use this AA in a context
which may touch LDS DMA instructions. This is also unreacheable now
because of the ordered memory ref checks just above in the function and
LDS DMA is marked as volatile. This volatile marking is removed in PR
#75247, therefore I'd submit this check before #75247.
2023-12-18 10:58:50 -08:00
Jay Foad
7e5019e82b
[AMDGPU] Simplify WaitcntBrackets::getRegInterval with getPhysRegBaseClass (#74087)
This means that getRegInterval no longer depends on the MCInstrDesc, so
it could be simplified further to take just a MachineOperand or just a
physical register. NFCI.
2023-12-18 14:16:02 +00:00
Jakub Chlanda
a34db9bdef
[AMDGPU][NFC] Simplify needcopysign logic (#75176)
This was caught by coverity, reported as: `dead_error_condition`.
Since the conditional revolves around `CF`, it is guaranteed to be null
in the else clause, hence making the second part of the statement
redundant.
2023-12-18 12:07:22 +01:00
Carl Ritson
5139299618
[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)
Physical VGPRs used for SGPR spills need to be tracked independent of
WWM reserved registers. The WWM reserved set contains extra registers
allocated during WWM pre-allocation pass.

This causes SGPR spills allocated after WWM pre-allocation to overlap
with WWM register usage, e.g. if frame pointer is spilt during
prologue/epilog insertion.
2023-12-17 16:44:16 +09:00
Youngsuk Kim
67aec2f58b [llvm] Remove no-op ptr-to-ptr casts (NFC)
Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr
bitcasts.

Opaque ptr cleanup effort (NFC).
2023-12-15 11:04:48 -06:00
Mariusz Sikora
414d27419f
[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-15 17:15:55 +01:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Mirko Brkušanin
07a6d73664
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493) 2023-12-15 15:01:40 +01:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Jie Fu
f0b44ce28e [AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC)
llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1322:10:
 error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst =
         ^
llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1334:10:
 error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst = BuildMI(Block, It, DL, TII->get(AMDGPU::S_WAITCNT_VSCNT))
         ^
2 errors generated.
2023-12-15 20:00:18 +08:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Pierre van Houtryve
ef067f5204
[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-12-15 12:33:32 +01:00
Petar Avramovic
d37ced8880
AMDGPU: refactor phi lowering from SILowerI1Copies (NFCI) (#75349)
Make abstract class PhiLoweringHelper and expose it for use in
GlobalISel path.
SILowerI1Copies implements PhiLoweringHelper as Vreg1LoweringHelper and
it is equivalent to SILowerI1Copies.
Notable change that createLaneMaskReg now clones attributes from
register that has lane mask attributes instead of creating register with
lane mask register class. This is because lane masks have
different(more) attributes in GlobalISel.

patch 2 from: https://github.com/llvm/llvm-project/pull/73337
2023-12-15 12:20:07 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Mirko Brkušanin
c1a6974d6b
[AMDGPU][MC] Add GFX12 SMEM encoding (#75215) 2023-12-15 09:00:54 +01:00
Pierre van Houtryve
f1ea77f7be
[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436)
Split from #72830
2023-12-15 08:31:14 +01:00
Jay Foad
3e6da3252f
[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499) 2023-12-14 21:11:39 +00:00
Jay Foad
c5a068a196
[AMDGPU] Remove s_cmpk_* for GFX12 (#75497)
No GFX12 encoding was added for these. This patch adds tests that they
are not recognized by the assembler and defends against generating them
in codegen.
2023-12-14 21:10:53 +00:00
Mirko Brkušanin
47615ddc84
[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) 2023-12-14 14:22:04 +01:00
Valery Pykhtin
dd051295bc
[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975)
Let's try once again after #69957 has landed.
2023-12-14 14:10:27 +01:00
Mirko Brkušanin
ac406b4817
[AMDGPU][MC] Add GFX12 VBUFFER encoding (#75195) 2023-12-14 12:58:18 +01:00
Mirko Brkušanin
16c27bcdde
[AMDGPU][MC] Add GFX12 VDS encoding (#75316) 2023-12-14 11:04:21 +01:00
Stanislav Mekhanoshin
c6ecbcb48b
[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245)
BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the
vmcnt. This is VMEM load and DS store, so it shall use vmcnt.
2023-12-13 10:49:36 -08:00
Petar Avramovic
6892c175c5
AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340)
Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will
implement
- selection of divergent i1 phis as lane mask phis, requires lane mask
merging in some cases
- lower uses of divergent i1 values outside of the cycle using lane mask
merging
- lowering of all cases of temporal divergence:
- lower uses of uniform i1 values outside of the cycle using lane mask
merging
- lower uses of uniform non-i1 values outside of the cycle using a copy
to vgpr inside of the cycle

Add very detailed set of regression tests for cases mentioned above.

patch 1 from: https://github.com/llvm/llvm-project/pull/73337
2023-12-13 16:42:56 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Piotr Sobczak
62653573cb [AMDGPU] Extend clang-format directive in SIDefines.h 2023-12-13 11:18:16 +01:00
Jay Foad
8005ee6da3
[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070) 2023-12-12 17:41:40 +00:00
Jay Foad
220e095a2c [AMDGPU] Remove unused function splitScalar64BitAddSub 2023-12-12 15:01:57 +00:00
Piotr Sobczak
cd138fddf1
[AMDGPU] Turn off clang-format in moveToVALU (#75188) 2023-12-12 15:38:06 +01:00
Jay Foad
1c6b336395
[AMDGPU] Miscellaneous clang-format changes (#75186)
Reformat one table and protect a couple of tables that we don't want to
reformat.
2023-12-12 14:26:04 +00:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Saiyedul Islam
777b6de7a4
[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339)
Regenerate a few llc tests to test for COV5 instead of the default ABI
version.
2023-12-12 14:35:13 +05:30
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Wang Pengcheng
d11e54f3fe [MacroFusion] Support multiple predicators (#72219)
The user can provide multiple predicators to MacroFusion and the
DAG mutation will be applied if one of them is evalated to true.

`ShouldSchedulePredTy` is renamed to `MacroFusionPredTy`.
2023-12-12 12:11:13 +08:00
wangpc
59f3661bd2 Revert "[MacroFusion] Support multiple predicators (#72219)"
This reverts commit d3f6e82a6a562e3288a6fc0970d324073996c16d.

Some code can't be compiled.
2023-12-12 11:21:39 +08:00
Wang Pengcheng
d3f6e82a6a
[MacroFusion] Support multiple predicators (#72219)
The user can provide multiple predicators to MacroFusion and the
DAG mutation will be applied if one of them is evalated to true.

`ShouldSchedulePredTy` is renamed to `MacroFusionPredTy`.
2023-12-12 11:18:31 +08:00
Pierre van Houtryve
dd32d26a37
[AMDGPU] Form V_MAD_U64_U32 from mul24 (#72393)
Fixes SWDEV-421067
2023-12-11 11:38:27 +01:00