15 Commits

Author SHA1 Message Date
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
2024-03-06 17:40:13 +00:00
Krzysztof Drewniak
b497234146
[AMDGPU] Make maximum hard clause size a subtarget feature (#81287)
gfx11 chips may, in some conditions, behave incorrectly with S_CLAUSE
instructions (hard clauses) containing more than 32 operations (that is,
whose arguments exceed 0x1f). However, gfx10 targets will work
successfully with clauses of up to length 63.

Therefore, define the MaxHardClauseLength property on GCNSubtarget and
make it a subtarget feature via tablegen, thus allowing us to specify,
both now and in the future, the maximum viable size of clauses on
various hardware from the tablegen definition. If MaxHardClauseLength is
0, which is the default, the hardware does not support hard clauses.
2024-02-15 13:58:31 -06:00
Alex Bradbury
b717365216
[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps (#73778)
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).

This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.

As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
2023-12-06 15:30:48 +00:00
Jay Foad
ffe86e3bdd [AMDGPU] Update SIInsertHardClauses for GFX11
Changes for GFX11:
- Clauses may not mix instructions of different types, and there are
  more types. For example image instructions with and without a sampler
  are now different types.
- The max size of a clause is explicitly documented as 63 instructions.
  Previously it was implicitly assumed to be 64. This is such a tiny
  difference that it does not seem worth making it conditional on the
  subtarget.
- It can be beneficial to clause stores as well as loads.

Differential Revision: https://reviews.llvm.org/D127391
2022-06-09 21:29:56 +01:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Sebastian Neubauer
bf980930e5 [AMDGPU] Ignore KILLs when forming clauses
KILL instructions are sometimes present and prevented hard
clauses from being formed.

Fix this by ignoring all meta instructions in clauses.

Differential Revision: https://reviews.llvm.org/D106042
2021-09-27 16:33:52 +02:00
Carl Ritson
9cf6ff7aff [AMDGPU] Do not clause NSA instructions
To ensure correct behaviour NSA instructions should not be claused.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D102211
2021-05-14 12:54:56 +09:00
Jay Foad
9e026273b0 [AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC. 2021-05-06 14:47:54 +01:00
dfukalov
560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
hsmahesha
0ed2c04636 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Jay Foad
10c10f2419 [AMDGPU] Fix assertion failure in SIInsertHardClauses
This new pass failed an assertion whenever there were s_nops after the
end of clause.

Differential Revision: https://reviews.llvm.org/D80007
2020-05-15 15:49:52 +01:00
Jay Foad
42a5560503 [AMDGPU] New SIInsertHardClauses pass
Enable clausing of memory loads on gfx10 by adding a new pass to insert
the s_clause instructions that mark the start of each hard clause.

Differential Revision: https://reviews.llvm.org/D79792
2020-05-14 18:54:49 +01:00