6 Commits

Author SHA1 Message Date
erman-gurses
87c0260f45
[AMDGPU] Add parameterization for optimized shared memory variables (#82508)
- This PR adds parameterization for shared memory variables that are
used for optimization: `sharedMemoryLineSizeBytes` and
`defaultVectorSizeBits.`
- The default values are set to 128 for both variables since it gives
zero bank conflicts.
2024-02-27 23:28:12 -05:00
erman-gurses
04381c106f
[MLIR][AMDGPU]Add refactoring for shared-mem optimization (#81791)
Addressing the issues in this PR:
https://github.com/llvm/llvm-project/pull/81550
2024-02-15 13:53:15 -05:00
erman-gurses
29d1aca05c
[AMDGPU][MLIR]Add shmem-optimization as an op using transform dialect (#81550)
This PR adds functionality to use shared memory optimization as an op
using transform dialect.
2024-02-13 17:42:04 -08:00
erman-gurses
3f37df5b71
[reland][mlir][amdgpu] Shared memory access optimization pass (#79164)
- Reland: https://github.com/llvm/llvm-project/pull/75627

- Reproduced then fixed the build issue
2024-01-25 07:44:45 -08:00
Mehdi Amini
e611a4cf80
Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822)
Reverts llvm/llvm-project#75627 ; it broke the bot:
https://lab.llvm.org/buildbot/#/builders/61/builds/53218
2024-01-19 16:41:43 -08:00
erman-gurses
b7360fbe8c
[mlir][amdgpu] Shared memory access optimization pass (#75627)
It implements transformation to optimize accesses to shared memory.

Reference: https://reviews.llvm.org/D127457

_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
2024-01-19 15:44:45 -08:00