7 Commits

Author SHA1 Message Date
pvanhout
f104eb6e15 [AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits
This is basically a partial revert of https://reviews.llvm.org/D145586 ( fd1d60873fdc )

D145586 was originally introduced to help with SWDEV-363662, and it did, but
it also caused a 25% drop in performance in
some MIOpen benchmarks where, it seems,
functions are inlined more conservatively.

This patch restores the pre-D145586 behavior
for PromoteAlloca: functions with a non-entry CC
have a 32 VGPRs threshold, but only if the function
is not marked with "alwaysinline".

A good number of AMDGPU code makes uses of
the AMDGPUAlwaysInline pass anyway, so in our
backend "alwaysinline" seems very common.

This change does not affect SWDEV-363662 (the motivating issue for introducing D145586).

Fixes SWDEV-399519

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D150551
2023-05-23 09:01:39 +02:00
pvanhout
fd1d60873f [AMDGPU] Remove CC exception for Promote Alloca Limits
Apparently it was used to work around some issue that has been fixed.
Removing it helps with high scratch usage observed in some cases due to failed alloca promotion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D145586
2023-04-13 08:48:34 +02:00
Roman Lebedev
7850ab2112
[NFC] Port an assortment of tests that invoke SROA to new pass manager 2022-12-01 21:17:18 +03:00
Matt Arsenault
50caf6936b AMDGPU: Convert promote alloca tests to opaque pointers 2022-11-28 10:36:38 -05:00
Matt Arsenault
1310aa1688 AMDGPU: Use -passes for amdgpu-promote-alloca tests 2022-11-16 17:14:48 -08:00
Stanislav Mekhanoshin
cf74ef134c [AMDGPU] Limit promote alloca max size in functions
Non-entry functions have 32 caller saved VGPRs available. If we
promote alloca to consume more registers we will have to spill
CSRs. There is no reason to eliminate scratch access to get
another scratch access instead.

Differential Revision: https://reviews.llvm.org/D110372
2021-09-24 13:38:39 -07:00
Stanislav Mekhanoshin
54e2dc7537 [AMDGPU] Limit promote alloca to vector with VGPR budget
Allow only up to 1/4 of available VGPRs for the vectorization
of any given alloca.

Differential Revision: https://reviews.llvm.org/D82990
2020-07-01 15:57:24 -07:00