13 Commits

Author SHA1 Message Date
Jacob Weightman
814a0abcce AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis
The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed
that a function's callees were always analyzed prior to their callees.
When it was refactored into a module pass, this assumption no longer
always holds. This results in calls being erroneously identified as
indirect, and reserving private segment space for them. This results in
significantly slower kernel launch latency.

This patch changes the order in which the module's functions are analyzed
from the order in which they occur in the module to a post-order traversal
of the call graph. Perhaps Clang always generates the module's functions
in such an order, but this is not the case for the Cray Fortran compiler.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D126025
2022-06-03 15:55:54 -05:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Jacob Lambert
0bad7cb565 Hoist getTotalNumVGPRs into AMDGPUBaseInfo for use in both codegen and MC
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D119912
2022-02-16 11:04:08 -08:00
Matt Arsenault
4622afa94c AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass
This is more precise in the face of indirect calls and aliases, still
assuming the call target is defined somewhere in the current module.

This sometimes changes the order the functions are printed, and also
changes the point where context errors are printed relative to
stdout. This also likely has negative consequences for compile time
and memory usage.
2022-02-04 15:56:04 -05:00
Matt Arsenault
935abab65c AMDGPU: Use module level register maximums for unknown callees
Compute the theoretical register budget based on the IR function
signature/attributes, and use the global maximum register budgets for
unknown callees.

This should fix the kernel reported register usage in the presence of
indirect calls. The previous fix in
2b08f6af62afbf32e89a6a392dbafa92c62f7bdf was incorrect becauset it was
only taking the maximum in the known call graph, and missing something
that was either outside of it or codegened later.

This fixes a second case I discovered where calls to aliases also did
not work as expected. CallGraphAnalysis misses these, so functions
called through aliases were not codegened ahead of callers as
expected. CallGraphAnalysis should probably be fixed to understand
this case, and there's likely a bug with IPRA here. This fixes
numerous failures in the conformance test at -O0.
2022-02-04 15:56:03 -05:00
Matt Arsenault
c7a0c2d0f7 AMDGPU: Report large stack usage for recursive calls
We were previously setting an ignored bit in the kernel headers. The
current behavior is to add the large amount on top of the statically
known size of a single stack frame. I'm not sure if we should just use
the large size as the entire reported size instead.
2021-11-10 20:02:01 -05:00
Anshil Gandhi
0567f03331 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-18 16:53:15 -06:00
Anshil Gandhi
1830ec94ac Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols"
This reverts commit 03375a3fb33b11e1249d9c88070b7f33cb97802a.
2021-10-15 16:16:18 -06:00
Anshil Gandhi
03375a3fb3 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-15 11:39:15 -06:00
David Stuttard
69f7d81d0a [AMDGPU] Set number vgprs used in PS shaders based on input registers actually used
For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR
registers

Calculate the number of VGPR registers used as input VGPRs based on these
registers rather than the arguments passed in (this conservatively always
allocates the maximum).

Differential Revision: https://reviews.llvm.org/D101633

Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6
2021-10-08 14:24:35 +01:00
Sebastian Neubauer
2b08f6af62 [AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839
2021-07-20 13:48:50 +02:00