mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
* support bf16*mxfp4 gemm
* rebase bf16*fp4 example to develop branch
* Clean up commented debug code in GEMM kernel
* rename example folder
* support bf16*mxfp4 gemm
* rebase bf16*fp4 example to develop branch
* Clean up commented debug code in GEMM kernel
* rename example folder
* rebase to new develop
* fix clang format
* update code according to reviewer's comment
* Update README.md
* update code according to reviewer's comment
* update code according to reviewer's comment
* Update CMakeLists.txt
* Update README.md
* Update CMakeLists.txt
* Delete files
* Delete files
* Add unit tests
* Update test_gemm_quant_base.hpp
* merge bf16*fp4 example to develop branch
* fix clang format
* fix clang format
* Update CMakeLists.txt
* fix ci test
* fix clang format
* resolve conflicts
---------
Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 715671e419]
ck_tile/core
ck_tile/core contains every basic functions and structures to create a GPU kernel using ck_tile. User should only include ck_tile/core.hpp this single header to use all the functionality. Everything is under ck_tile namespace. The coding style under this folder should be similar to std (snake_case for structure/function, Camel for template types...)
algorithm/
coordinate transform and some other reusable algorithm
arch/
contains some basic device building block like mma, buffer addressing, etc...
container/
contains basic container data structure, array/sequence/tuple/...
numeric/
data type, and data type related math
tensor/
tensor descriptors and tile level API
utility/
other utility function for both host/device