mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-08 15:30:23 +00:00

Christopher Millette b9c6cb1452 First look at mfma / wmma unification (#2704)

* First look at mfma / wmma unification

* Refactor

* Re-org file structure

* Restructure transform selection and WaveWiseMma class

* Update license files. Add missing gfx1151 support. Change wave size for HOST to 1. Update datatypes naming consistency

* Fixes default MmaSelector implementation

* Adds unit tests for amdgcn_mma and arch

* Consolidate common arch id checks to constexpr functions. Strongly type ids as amdgcn_target_arch_id object.

* Refactor is_any_value_of

* Fixes mma_selector logic

* Fix typo

* Add mma selector test for tile decomposition

* Fix compilation of mma.hpp

* Revert back to c++17 compatibility

* Fix compiler error by returning index_t from get_warp_size()

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixes compiler error for missing is_wave32() function

* Fixes compiler error for host wave_size() should be 64

* Fixes compiler errors where __cpp_concepts is not defined

* Fixes compiler errors where __cpp_concepts is not defined

* Fix test failure for host is wave64 by default

---------

Co-authored-by: Chris Millette <you@example.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

2025-11-24 09:39:59 -08:00

| Name | Last commit | Date |
| --- | --- | --- |
| algorithm | [CK_Tile] Merge multiple convolution groups into a single GEMM batch (#2986) | 2025-10-29 16:49:28 +02:00 |
| arch | First look at mfma / wmma unification (#2704) | 2025-11-24 09:39:59 -08:00 |
| container | [CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905) | 2025-11-12 10:26:14 +08:00 |
| numeric | [CK_TILE] Refine FP32 => FP16/BF16 Conversion (#3215) | 2025-11-20 10:50:26 -08:00 |
| tensor | [CK_TILE] Refine FP32 => FP16/BF16 Conversion (#3215) | 2025-11-20 10:50:26 -08:00 |
| utility | First look at mfma / wmma unification (#2704) | 2025-11-24 09:39:59 -08:00 |
| config.hpp | First look at mfma / wmma unification (#2704) | 2025-11-24 09:39:59 -08:00 |
| README.md | introducing ck_tile! (#1216) | 2024-04-15 19:27:12 -05:00 |

README.md

ck_tile/core

ck_tile/core contains all the basic functions and structures needed to create a GPU kernel using ck_tile. Users should only need to include the single header ck_tile/core.hpp to use all of this functionality. Everything lives in the ck_tile namespace. The coding style in this folder should be similar to that of std (snake_case for structures and functions, CamelCase for template types, ...).
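The naming convention described above can be illustrated with a small sketch. It is not taken from ck_tile: the namespace `style_demo`, the struct `static_buffer`, and the function `sum_elements` are hypothetical names chosen only to show snake_case for structures and functions and CamelCase for template parameters. It deliberately avoids including ck_tile/core.hpp so that it compiles on its own with any C++17 compiler.

```cpp
#include <array>
#include <cstddef>

// Hypothetical namespace used only for this illustration; real code
// would live in (and use entities from) the ck_tile namespace.
namespace style_demo {

// snake_case structure name, CamelCase template parameters.
template <typename DataType, std::size_t NumElements>
struct static_buffer
{
    std::array<DataType, NumElements> data{};

    // snake_case member function name.
    constexpr std::size_t size() const { return NumElements; }
};

// snake_case free function, CamelCase template parameters.
template <typename DataType, std::size_t NumElements>
constexpr DataType sum_elements(const static_buffer<DataType, NumElements>& buf)
{
    DataType acc{};
    for(std::size_t i = 0; i < NumElements; ++i)
    {
        acc += buf.data[i];
    }
    return acc;
}

} // namespace style_demo

// Compile-time check that the sketch behaves as described.
static_assert(style_demo::sum_elements(
                  style_demo::static_buffer<int, 3>{{{1, 2, 3}}}) == 6,
              "sum of {1, 2, 3} should be 6");
```

In an actual kernel, the only include needed would be the single ck_tile/core.hpp header mentioned above, with the library's own types taking the place of these stand-ins.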

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device