6 Commits

Author SHA1 Message Date
Nikita Popov
bdf2fbba9c [AMDGPU] Convert some tests to opaque pointers (NFC) 2022-12-19 12:41:13 +01:00
Pierre van Houtryve
a88deb4b65 [AMDGPU] Use aperture registers instead of S_GETREG
Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register).

Patch based on D31874 by @kzhuravl
Depends on D137767

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137542
2022-11-30 12:25:10 +00:00
Ruiling Song
0eaf6759ae [AMDGPU][InsertWaits] No wait for WAW for global/scratch_load
global/scratch_load will return in order they are issued. No
need to insert a s_waitcnt for WAW hazard.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D138476
2022-11-23 09:57:50 +08:00
Alexander Timofeev
fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Carl Ritson
4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Changpeng Fang
dd5895cc39 AMDGPU: Use the implicit kernargs for code object version 5
Summary:
  Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265
2022-03-17 14:12:36 -07:00