Commit Graph

5 Commits

Author SHA1 Message Date
aledudek
ed385de9b2 Ck tile batched gemm example (#1615)
* [CK Tile] Batched GEMM Example

* [CK Tile] Batched GEMM Example - minor refactor

* [CK Tile] Batched GEMM Example - README update

* [CK Tile] Batched Gemm Example - review changes

- Added tensor data layours as input parameters
- Changed structure of Host and Kernel args
- Removed bug with invalid vector read on non-contiguous memory

* [CK Tile] Batched Gemm Example - remove comment

* [CK Tile] Batched Gemm Example - Add GTests part1

* [CK Tile] Batched Gemm Example - GTests part2 + review changes

* [CK TILE] Batched GEMM post merge fixes

* [CK Tile] Batched GEMM Example - fix pad views

[ROCm/composable_kernel commit: 78f0fea08e]
2024-11-29 11:52:18 +01:00
jakpiase
d8d0168cef Add interwave scheduler for gemm mem pipeline (#1647)
* add interwave scheduler for gemm mem pipeline

* Fix merge artifacts.

* Refactor unit tests.

* Switch to interwave scheduler for mem example

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>

[ROCm/composable_kernel commit: e7b6286441]
2024-11-27 18:25:07 +01:00
Thomas Ning
6bafdd985c [CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature

* Modified the test file

* Test case update

* addresss comment

* Addressed the review comment

* Fixed the CI error

[ROCm/composable_kernel commit: 2b6458ddf2]
2024-11-11 18:08:25 -08:00
Adam Osewski
c0de51533d [CK-Tile] Universal gemm memory bound pipeline (#1558)
* CK-Tile GEMM with memory bound pipeline.

* Memory bound gemm pipeline.

* Fix not closed namespace.

* Block gemm mem pipeline draft.

* Do not use ck_tile:: within ck_tile namespace.

* Refactoring & Move Layout info to pipeline problem.

* Get hot loop and TailNum information before lunching kernel.

* Fixes in pipeline.

* Add comment to load_tile_raw and change variable naming style.

* Few small changes & formatting.

* Do not use macro.

* Add gtests.

* Use AccDataType for Output of MFMA instruction.

* Formatting.

* Refactor gemm examples.

* Switch over to current block gemm.

* Use currently available pipeline policy.

* Refactoring and review comment.s

* Fixes after merge.

* Add missing include.

* Add load tile overload which accepts output tensor as parameter.

* This give 8% perf boost at the cost of using more registers.

* Rename example.

* Small changes.

* Fix compilation err and lower K.

* Support different layouts for A/B

* Fix vector size for different layouts.

* Rename Alignment into VectorSize

* Unblock tests.

[ROCm/composable_kernel commit: 24d996aae1]
2024-10-30 10:05:15 +01:00
Bartłomiej Kocot
2c12ff4e9a [CK_TILE] Image to Column kernel (#1532)
* [CK_TILE] Image to Column kernel

* Fixes

* Vector loads and stores

* Fixes

* Fixes

* change test dir name

[ROCm/composable_kernel commit: de3e3b6424]
2024-09-27 22:57:38 +02:00