mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
[ROCm/composable_kernel commit: c0ee71d735]
ck_tile/core
ck_tile/core contains the basic functions and structures needed to create a GPU kernel with ck_tile. Users should include only the single header ck_tile/core.hpp to get all of this functionality. Everything is under the ck_tile namespace. The coding style in this folder should be similar to std (snake_case for structures and functions, CamelCase for template types, and so on).
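To make the naming convention concrete, here is a minimal sketch in plain C++. It does not use ck_tile at all: the namespace `style_sketch`, the struct `fixed_array`, and the function `reduce_sum` are hypothetical names invented for illustration, not part of the library. It only shows the intended split: snake_case for structures and functions, CamelCase for template parameters.

```cpp
#include <cstddef>

// Hypothetical namespace for illustration only; NOT part of ck_tile.
namespace style_sketch {

// snake_case structure name, CamelCase template parameters.
template <typename DataType, std::size_t Size>
struct fixed_array
{
    DataType data[Size];

    constexpr std::size_t size() const { return Size; }
    constexpr const DataType& operator[](std::size_t i) const { return data[i]; }
};

// snake_case function name, CamelCase template parameters.
template <typename DataType, std::size_t Size>
constexpr DataType reduce_sum(const fixed_array<DataType, Size>& values)
{
    DataType total{};
    for(std::size_t i = 0; i < Size; ++i)
        total += values[i];
    return total;
}

// Evaluated at compile time: 1 + 2 + 3 == 6.
static_assert(reduce_sum(fixed_array<int, 3>{{1, 2, 3}}) == 6,
              "reduce_sum should add all elements");

} // namespace style_sketch
```

Code written against the real library would instead include ck_tile/core.hpp and use names from the ck_tile namespace; the sketch above deliberately avoids guessing at those names.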
algorithm/
coordinate transforms and other reusable algorithms
arch/
basic device building blocks such as mma, buffer addressing, etc.
container/
basic container data structures: array, sequence, tuple, ...
numeric/
data types and data-type-related math
tensor/
tensor descriptors and the tile-level API
utility/
other utility functions for both host and device