llvm-project

Author	SHA1	Message	Date
erman-gurses	87c0260f45	[AMDGPU] Add parameterization for optimized shared memory variables (#82508) - This PR adds parameterization for the shared memory variables used by the optimization: `sharedMemoryLineSizeBytes` and `defaultVectorSizeBits`. - The default value of both variables is 128, since this gives zero bank conflicts.	2024-02-27 23:28:12 -05:00
erman-gurses	04381c106f	[MLIR][AMDGPU] Add refactoring for shared-mem optimization (#81791) Addresses the issues raised in this PR: https://github.com/llvm/llvm-project/pull/81550	2024-02-15 13:53:15 -05:00
erman-gurses	29d1aca05c	[AMDGPU][MLIR] Add shmem-optimization as an op using transform dialect (#81550) This PR adds the ability to apply the shared memory optimization as an op via the transform dialect.	2024-02-13 17:42:04 -08:00
erman-gurses	3f37df5b71	[reland][mlir][amdgpu] Shared memory access optimization pass (#79164) - Reland: https://github.com/llvm/llvm-project/pull/75627 - Reproduced then fixed the build issue	2024-01-25 07:44:45 -08:00
Mehdi Amini	e611a4cf80	Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822) Reverts llvm/llvm-project#75627; it broke the bot: https://lab.llvm.org/buildbot/#/builders/61/builds/53218	2024-01-19 16:41:43 -08:00
erman-gurses	b7360fbe8c	[mlir][amdgpu] Shared memory access optimization pass (#75627) It implements a transformation to optimize accesses to shared memory. Reference: https://reviews.llvm.org/D127457 _This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last-dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col / vecSize, row)`, where perm is a permutation function indexed by row and vecSize is the vector access size in elements (currently assumes 128-bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory._	2024-01-19 15:44:45 -08:00
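The column rewrite quoted in the #75627 message, combined with the two parameters introduced in #82508, can be illustrated with a short Python sketch. This is not the pass's code, only an illustration under stated assumptions. The message does not define perm, so the sketch assumes an XOR of the vector-slot index with `row % vecs_per_line` (a common swizzle choice) and reads perm's result as a vector slot that is scaled back to elements by vecSize. It also assumes each tile row spans exactly one shared-memory line and that `vecs_per_line` is a power of two. The helper names `swizzle_col` and `params` are hypothetical.

```python
"""Sketch of a row-dependent XOR swizzle for shared-memory column indices.

Assumptions (hypothetical, not taken from the MLIR pass):
  * perm(v, row) = v XOR (row % vecs_per_line), scaled back to elements;
  * one tile row spans exactly one shared-memory line;
  * vecs_per_line is a power of two, so the XOR stays in range.
"""


def swizzle_col(row: int, col: int, vec_size: int, vecs_per_line: int) -> int:
    """Map element column `col` of `row` to its swizzled column."""
    vec_idx = col // vec_size   # which vector slot the element falls in
    lane = col % vec_size       # position inside that vector (kept as-is)
    new_vec = vec_idx ^ (row % vecs_per_line)  # hypothetical perm choice
    return new_vec * vec_size + lane


def params(line_bytes: int = 128, vector_bits: int = 128, elem_bits: int = 16):
    """Derive element counts from a line size in bytes and a vector size in bits.

    The defaults mirror the 128/128 values described for
    sharedMemoryLineSizeBytes and defaultVectorSizeBits; 16-bit elements
    are an assumption for the example.
    """
    vec_size = vector_bits // elem_bits              # elements per vector
    vecs_per_line = (line_bytes * 8) // vector_bits  # vector slots per line
    return vec_size, vecs_per_line


if __name__ == "__main__":
    vec_size, vecs = params()
    # Rows 0..vecs-1 all reading element column 0 (a column-wise walk):
    slots = [swizzle_col(r, 0, vec_size, vecs) // vec_size for r in range(vecs)]
    print(vec_size, vecs, slots)
```

With the 128-byte and 128-bit defaults and 16-bit elements, a vector holds 8 elements and a line holds 8 vector slots. Eight rows that all read slot 0 land in slots 0 through 7 after the swizzle, so no two of them share a slot, while the offset inside each vector is preserved. Whether distinct slots also means distinct banks depends on the target's bank layout, which this sketch does not model.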

6 Commits