mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 18:17:44 +00:00
* First look at mfma / wmma unification
* Refactor
* Re-org file structure
* Restructure transform selection and WaveWiseMma class
* Update license files. Add missing gfx1151 support. Change wave size for HOST to 1. Update datatypes naming consistency
* Fixes default MmaSelector implentation
* Adds unit tests for amdgcn_mma and arch
* Consolidate common arch id checks to constexpr functions. Strongly type ids as amdgcn_target_arch_id object.
* Refactor is_any_value_of
* Fixes mma_selector logic
* Fix typo
* Add mma selector test for tile decomposition
* Fix compilation of mma.hpp
* Revert back to c++17 compatibility
* Fix compiler error by returning index_t from get_warp_size()
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fixes compiler error for missing is_wave32() function
* Fixes compiler error for host wave_size() should be 64
* Fixes compiler errors where __cpp_concepts is not defined
* Fixes compiler errors where __cpp_concepts is not defined
* Fix test failure for host is wave64 by default
---------
Co-authored-by: Chris Millette <you@example.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
[ROCm/composable_kernel commit: b9c6cb1452]
ck_tile/core
ck_tile/core contains every basic functions and structures to create a GPU kernel using ck_tile. User should only include ck_tile/core.hpp this single header to use all the functionality. Everything is under ck_tile namespace. The coding style under this folder should be similar to std (snake_case for structure/function, Camel for template types...)
algorithm/
coordinate transform and some other reusable algorithm
arch/
contains some basic device building block like mma, buffer addressing, etc...
container/
contains basic container data structure, array/sequence/tuple/...
numeric/
data type, and data type related math
tensor/
tensor descriptors and tile level API
utility/
other utility function for both host/device