Nicolas Vasilache
b2729fda60
[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
...
This revision follows up on the conversation titled:
```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```
The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.
This results in roughly 20% fewer cycles as reported by llvm-mca:
After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations: 100
Instructions: 5900
Total Cycles: 2415
Total uOps: 7300
Dispatch Width: 6
uOps Per Cycle: 3.02
IPC: 2.44
Block RThroughput: 24.0
Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
Resource Pressure [ 89.65% ]
- SKXPort1 [ 0.04% ]
- SKXPort2 [ 12.42% ]
- SKXPort3 [ 12.42% ]
- SKXPort5 [ 89.52% ]
Data Dependencies: [ 37.06% ]
- Register Dependencies [ 37.06% ]
- Memory Dependencies [ 0.00% ]
```
After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations: 100
Instructions: 6300
Total Cycles: 2015
Total uOps: 7700
Dispatch Width: 6
uOps Per Cycle: 3.82
IPC: 3.13
Block RThroughput: 20.0
Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
Resource Pressure [ 83.18% ]
- SKXPort0 [ 14.49% ]
- SKXPort1 [ 14.54% ]
- SKXPort2 [ 19.70% ]
- SKXPort3 [ 19.70% ]
- SKXPort5 [ 83.03% ]
- SKXPort6 [ 14.49% ]
Data Dependencies: [ 39.75% ]
- Register Dependencies [ 39.75% ]
- Memory Dependencies [ 0.00% ]
```
An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0 ).
Differential Revision: https://reviews.llvm.org/D114393
2021-11-23 07:31:22 +00:00
Mehdi Amini
e0b7bee7cf
Revert "[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)"
...
This reverts commit a9e236bed835c58be381dadb973a1db0681e4795.
This broke the Windows build:
mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
2021-11-22 19:23:18 +00:00
Nicolas Vasilache
a9e236bed8
[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
...
This revision follows up on the conversation titled:
```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```
The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.
This results in roughly 20% fewer cycles as reported by llvm-mca:
After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations: 100
Instructions: 5900
Total Cycles: 2415
Total uOps: 7300
Dispatch Width: 6
uOps Per Cycle: 3.02
IPC: 2.44
Block RThroughput: 24.0
Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
Resource Pressure [ 89.65% ]
- SKXPort1 [ 0.04% ]
- SKXPort2 [ 12.42% ]
- SKXPort3 [ 12.42% ]
- SKXPort5 [ 89.52% ]
Data Dependencies: [ 37.06% ]
- Register Dependencies [ 37.06% ]
- Memory Dependencies [ 0.00% ]
```
After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations: 100
Instructions: 6300
Total Cycles: 2015
Total uOps: 7700
Dispatch Width: 6
uOps Per Cycle: 3.82
IPC: 3.13
Block RThroughput: 20.0
Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
Resource Pressure [ 83.18% ]
- SKXPort0 [ 14.49% ]
- SKXPort1 [ 14.54% ]
- SKXPort2 [ 19.70% ]
- SKXPort3 [ 19.70% ]
- SKXPort5 [ 83.03% ]
- SKXPort6 [ 14.49% ]
Data Dependencies: [ 39.75% ]
- Register Dependencies [ 39.75% ]
- Memory Dependencies [ 0.00% ]
```
An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0 ).
Reviewed By: ftynse, dcaballe
Differential Revision: https://reviews.llvm.org/D114335
2021-11-22 10:32:34 +00:00
Mehdi Amini
99b0032ce0
Move the MLIR integration tests as a subdirectory of test (NFC)
...
This does not change the behavior directly: the tests only run when
`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON` is configured. However running
`ninja check-mlir` will not run all the tests within a single
lit invocation. The previous behavior would wait for all the integration
tests to complete before starting to run the first regular test. The
test results were also reported separately. This change is unifying all
of this and allow concurrent execution of the integration tests with
regular non-regression and unit-tests.
Differential Revision: https://reviews.llvm.org/D97241
2021-02-23 05:55:47 +00:00