mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 17:19:12 +00:00

Adam Osewski 39dc25a9b8 [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835)

* Refactor universal gemm policy.

* Adapt example to refactor changes.

* Introduce static encoding pattern

* Adding shuffled encoding patterns.

* Fix err in reverse tuple.

* Add transpose_tile2d

* Small refactoring + doc

* Enable reading on contiguous dimension in all layouts.

* Transpose A/B register tile if needed for comp v3 pipeline.

* Take contiguous dim size when calculating dram vector load size.

* A/B smem pack size taken from WarpGemm attributes

* Update B LDS layout and setup tile distribution pattern at class level.

* Fix static assert.

* Fix errors in examples.

* Formatting & fix IsTranspose

* Fix VectorSize & refactor.

* Add error logging messages.

* Fix VecLoadSize and TransposeC for mem pipeline.

* Update unit-tests & disable mem pipeline.

* Clang format

* Update include/ck_tile/core/tensor/tile_window.hpp

Co-authored-by: jakpiase <jakub.piasecki@amd.com>

* Fix compilation and reviewers comments.

* Refactor unit-test. Fallback to non-universal gemm.

Need to use GemmPipelineAGmemBGmemCRegV1 for now,
since GemmKernel now also supports non-K-major vector reads.

---------

Co-authored-by: jakpiase <jakub.piasecki@amd.com>

2025-01-27 16:37:19 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| algorithm | [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835) | 2025-01-27 16:37:19 +01:00 |
| arch | CK-Tile Grouped GEMM refactor and post PR fixes (#1756) | 2025-01-21 21:06:10 +01:00 |
| container | [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835) | 2025-01-27 16:37:19 +01:00 |
| numeric | [CK_TILE] Add error threshold calculation for gemm examples (#1821) | 2025-01-18 01:01:52 +01:00 |
| tensor | [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835) | 2025-01-27 16:37:19 +01:00 |
| utility | add fp8 as dst (#1830) | 2025-01-22 17:34:27 +08:00 |
| config.hpp | [CK_TILE]Moe update index (#1672) | 2024-11-25 13:12:35 +08:00 |
| README.md | introducing ck_tile! (#1216) | 2024-04-15 19:27:12 -05:00 |

README.md

ck_tile/core

ck_tile/core contains all the basic functions and structures needed to write a GPU kernel with ck_tile. Users should include only the single header ck_tile/core.hpp to get all of this functionality. Everything lives in the ck_tile namespace. The coding style in this folder should resemble that of std (snake_case for structures and functions, CamelCase for template type parameters, and so on).
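The naming convention described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the `demo_tile` namespace, the `small_buffer` structure, and the `reduce_sum` function are invented for this example and are not part of ck_tile. It uses only the C++ standard library so that it compiles on the host without a GPU toolchain. In real code the only include would be ck_tile/core.hpp, and the names would come from the ck_tile namespace.

```cpp
#include <array>
#include <cstddef>

// Hypothetical namespace for illustration only; this is NOT ck_tile code.
namespace demo_tile {

// Structure name in snake_case, template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
struct small_buffer
{
    std::array<DataType, NumElements> data{};

    // Number of elements, fixed at compile time.
    constexpr std::size_t size() const { return NumElements; }

    constexpr DataType& operator[](std::size_t i) { return data[i]; }
    constexpr const DataType& operator[](std::size_t i) const { return data[i]; }
};

// Free function in snake_case, template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
constexpr DataType reduce_sum(const small_buffer<DataType, NumElements>& buf)
{
    DataType acc{};
    for(std::size_t i = 0; i < NumElements; ++i)
        acc += buf[i];
    return acc;
}

} // namespace demo_tile
```

A caller would write, for example, `demo_tile::small_buffer<float, 8> buf;` followed by `demo_tile::reduce_sum(buf)`, which reads much like code written against the standard library.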

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device