Matt Arsenault
d8ed207a20
AMDGPU: Constant fold rcp node
...
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.
llvm-svn: 297248
2017-03-08 00:48:46 +00:00
Matt Arsenault
7aad8fd8f4
Enable FeatureFlatForGlobal on Volcanic Islands
...
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Tom Stellard
0d23ebe888
AMDGPU/SI: Implement a custom MachineSchedStrategy
...
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.
It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.
Shader DB stats:
32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23688
llvm-svn: 279995
2016-08-29 19:42:52 +00:00
Matt Arsenault
979902b3ff
AMDGPU: fdiv -1, x -> rcp -x
...
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Matt Arsenault
e3862cdc93
AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
...
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.
llvm-svn: 276823
2016-07-26 23:25:44 +00:00
Matt Arsenault
bef34e21c7
AMDGPU: Rename intrinsics to use amdgcn prefix
...
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.
llvm-svn: 258557
2016-01-22 21:30:34 +00:00