Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary
results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
Description: This change enables the compare operations to be selected to SALU/VALU form
dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
It should be enabled only when the load alignment is at least 8-byte.
Fixes: SWDEV-256824
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D90404
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom
zone.
Differential Revision: https://reviews.llvm.org/D72392
This allows to avoid scratch use or indirect VGPR addressing for
small vectors.
Differential Revision: https://reviews.llvm.org/D54606
llvm-svn: 347231
An extractelement with non-constant index will be lowered either to
scratch or movrel loop in most cases. This patch converts such
instruction into a set of selects if vector size is not too big.
Differential Revision: https://reviews.llvm.org/D54351
llvm-svn: 346800
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.
Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.
llvm-svn: 310258
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444