792 Commits

Author SHA1 Message Date
Roman Lebedev
7d91037fd2
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs
This one required quite a bit of assembly surgery, but the trend continues, so i think this is right.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/EKWdj8cKT - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`.

For store we have:
https://godbolt.org/z/zj4bb9P75 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111064
2021-10-05 16:58:57 +03:00
Roman Lebedev
4aee1e5b93
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/a6rxMG6ec - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=12.0`
So could pick cost of `16`.

For store we have:
https://godbolt.org/z/ced1bdqc9 - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So we could pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111063
2021-10-05 16:58:57 +03:00
Roman Lebedev
3c2e22b795
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/avq1oz98W - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So could pick cost of `8`.

For store we have:
https://godbolt.org/z/89PGMc1qs - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=6.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111061
2021-10-05 16:58:57 +03:00
Roman Lebedev
b6234c1edf
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 interleaving costs
Finally, we are getting to the heavy-hitter stuff!

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/7crGWoar6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So could pick cost of `4`.

For store we have:
https://godbolt.org/z/T8aq3MszM - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.0`
So we could pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111060
2021-10-05 16:58:56 +03:00
Roman Lebedev
cef0a693b6
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs
This required huge amount of assembly surgery, but i think this is about right.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So could pick cost of `25`.

For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `24`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111031
2021-10-04 14:35:17 +03:00
Roman Lebedev
ede0611e79
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs
This one required quite a bit of assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.

For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111027
2021-10-04 14:35:01 +03:00
Roman Lebedev
eb9a694c17
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs
This one required quite a bit of assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111026
2021-10-04 14:34:47 +03:00
Roman Lebedev
d3bbe781ea
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.

For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111025
2021-10-04 14:34:33 +03:00
Roman Lebedev
4ca5bc07af
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111022
2021-10-04 14:34:19 +03:00
Roman Lebedev
198aa84973
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111021
2021-10-04 14:34:05 +03:00
Roman Lebedev
a93411c3af
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111020
2021-10-04 14:34:03 +03:00
Roman Lebedev
3e93fcdfc8
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111019
2021-10-04 14:31:50 +03:00
Roman Lebedev
67f1ee2e38
[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018
2021-10-03 23:40:35 +03:00
Roman Lebedev
3cbc0a07f9
[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017
2021-10-03 23:40:21 +03:00
Roman Lebedev
72f8a9244a
[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016
2021-10-03 23:40:05 +03:00
Roman Lebedev
04f1469cb4
[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015
2021-10-03 23:39:51 +03:00
Roman Lebedev
8e8fb77aa4
[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014
2021-10-03 23:39:36 +03:00
Roman Lebedev
a5e5883ef5
[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013
2021-10-03 23:39:22 +03:00
Roman Lebedev
bd5ba437fd
[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012
2021-10-03 23:39:08 +03:00
Roman Lebedev
0b27f9c088
[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011
2021-10-03 23:38:54 +03:00
Roman Lebedev
6fe4cce558
[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010
2021-10-03 23:38:40 +03:00
Roman Lebedev
396b95e5c9
[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008
2021-10-03 23:38:10 +03:00
Simon Pilgrim
b85bf520dc [CostModel][X86] X86TTIImpl::getCmpSelInstrCost - try to use Predicate argument directly first (PR48337)
There's still a lot of cases where getCmpSelInstrCost fails to specify a predicate, once those are in place we should be able to remove the fallback to the Instruction argument entirely.
2021-10-03 17:16:51 +01:00
Roman Lebedev
acb459574a
[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs
While we already model this tuple, the load cost is divergent from reality, so fix it.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110971
2021-10-02 13:40:21 +03:00
Roman Lebedev
0e71ae6da8
[X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `24`.

For store we have:
https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110970
2021-10-02 13:40:21 +03:00
Roman Lebedev
74e4a0e327
[X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `12`.

For store we have:
https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110969
2021-10-02 13:40:20 +03:00
Roman Lebedev
ae08362cb8
[X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs
While we already model this tuple, the store cost is divergent from reality, so fix it.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110968
2021-10-02 13:40:20 +03:00
Roman Lebedev
935b9693ae
[X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110966
2021-10-02 13:40:20 +03:00
Roman Lebedev
448c939839
[X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs
For VF=16, costs are correct.
For VF=32, load cost is divergent.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5`
So pick cost of `13`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110961
2021-10-02 13:39:15 +03:00
Roman Lebedev
d1460c88a6
[X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110960
2021-10-02 13:39:15 +03:00
Roman Lebedev
f1df2d8eaf
[X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110958
2021-10-02 13:39:10 +03:00
Roman Lebedev
8a3c64c3a2
[X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110956
2021-10-02 13:39:05 +03:00
Roman Lebedev
3e260efdfc
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.

For store we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110840
2021-10-01 17:48:14 +03:00
Roman Lebedev
abd37de63e
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

For store we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110838
2021-10-01 17:48:14 +03:00
Roman Lebedev
71bc31b907
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110837
2021-10-01 17:48:14 +03:00
Roman Lebedev
612e5b05a2
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/8a1cfGeMn - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/jMdcM47bx - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110835
2021-10-01 17:48:14 +03:00
Roman Lebedev
ea76cb87ee
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Here for `store` pattern we are starting to have spilling,
so accurate modelling may be problematic,
although if i drop the spilling, the measurements don't change.

For load we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.

For store we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: =8.0`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110761
2021-10-01 17:48:14 +03:00
Roman Lebedev
80cd8da78d
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

For store we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110756
2021-10-01 17:48:14 +03:00
Roman Lebedev
3a0643e9c2
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110755
2021-10-01 17:48:13 +03:00
Roman Lebedev
b12aeaec9a
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110754
2021-10-01 17:48:13 +03:00
Roman Lebedev
f44d9009c2
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110753
2021-10-01 17:48:13 +03:00
Roman Lebedev
2d42a192e0
[X86][Costmodel] Load/store i8 Stride=2 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xz6x7c35P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/xz6x7c35P - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110709
2021-09-29 21:52:45 +03:00
Roman Lebedev
bac60c55e0
[X86][Costmodel] Load/store i8 Stride=2 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/a9hv4z47v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/6GfPn1b79 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `3`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110708
2021-09-29 21:52:45 +03:00
Roman Lebedev
1962185671
[X86][Costmodel] Load/store i8 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Identical to VF=2.

For load we have:
https://godbolt.org/z/4TEbdzbMM - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/MYfzGPf3Y - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110705
2021-09-29 21:52:45 +03:00
Roman Lebedev
08face1f9a
[X86][Costmodel] Load/store i8 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Identical to VF=2.

For load we have:
https://godbolt.org/z/sGE41GYo7 - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/ba5r3s9xa - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110704
2021-09-29 21:52:45 +03:00
Roman Lebedev
7d52628eb0
[X86][Costmodel] Load/store i8 Stride=2 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/caKqjr9hb - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/6TTn3eKj8 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110702
2021-09-29 21:52:44 +03:00
Roman Lebedev
b6b7860954
[X86][Costmodel] Load/store i16 Stride=6 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect worst-case estimate at all here.

For load we have:
https://godbolt.org/z/5qGb9odP6 - for intels `Block RThroughput: <=106.0`; for ryzens, `Block RThroughput: <=34.8`
So pick cost of `106`.

For store we have:
https://godbolt.org/z/KrWcv4Ph7 - for intels `Block RThroughput: =58.0`; for ryzens, `Block RThroughput: <=20.5`
So pick cost of `58`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110593
2021-09-28 19:15:08 +03:00
Roman Lebedev
24e42f7d28
[X86][Costmodel] Load/store i16 Stride=6 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/3Tc5s897j - for intels `Block RThroughput: =39.0`; for ryzens, `Block RThroughput: <=13.5`
So pick cost of `39`.

For store we have:
https://godbolt.org/z/fo1h9E67e - for intels `Block RThroughput: =21.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `21`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110592
2021-09-28 19:15:07 +03:00
Roman Lebedev
b3011bcc78
[X86][Costmodel] Load/store i16 Stride=6 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1Wcaf9c7T - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/1Wcaf9c7T - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `15`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110591
2021-09-28 19:15:01 +03:00
Roman Lebedev
aa93c55889
[X86][Costmodel] Load/store i16 Stride=6 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/bhscej4WM - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `13`.

For store we have:
https://godbolt.org/z/Yf4Pfnxbq - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=3.5`
So pick cost of `10`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110590
2021-09-28 19:14:56 +03:00