mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-01 20:21:23 +00:00
* Initial commit. create batched_contraction_kernel file
* initial problem definition
* implement initial example to launch kernel
* add universal gemm to contraction. initial phase
* complete implementation for special case all Dims are 1 and no Ds
* clean code
* initial changes to support multi dimensional G
* more progress in implementing multiple G
* tmp commit
* manage dynamic NumDimG in kernel
* improving example for multi M,N,K,G handling. start generalizing kernel. it is a temporary commit
* implement the example for general Multi dimension G M N K and test different reference calculation algorithms
* 2 functions for reference using multi dimensional and flat indexing
* clean the code for multi dimensional G, M, N, K contraction and add some logs
* Add Make descriptor function in kernel for merging Ms, Ns, Ks for A, B, E
* some cleaning on kernel
* clean the code for calculating the offsets from flatten batch number
* Start adding MultiD support to kernel and example
* more changes to manage multi D in kernel and example
* manage passing multi d to kernel and testing.
* complete multi D support in kernel. modify example code to support it
* Correct algorithm to calc the correct offset values for D tensor batches and some code cleaning
* Minor fix
* Generalize example code for variable NumD tensors and apply cleanup based on review feedback
* Refactored code and addressed review feedback
* refactoring, cleaning, add documents, in kernel side and example codes
* Optimize batch offset calculation in kernel
* Inline CalculateBatchOffset in batched contraction kernel, update CHANGELOG.md

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
31 lines
917 B
CMake
include_directories(AFTER
    ${CMAKE_CURRENT_LIST_DIR}
)

add_subdirectory(01_fmha)
add_subdirectory(02_layernorm2d)
add_subdirectory(03_gemm)
add_subdirectory(04_img2col)
add_subdirectory(05_reduce)
add_subdirectory(06_permute)
add_subdirectory(09_topk_softmax)
add_subdirectory(10_rmsnorm2d)
add_subdirectory(11_add_rmsnorm2d_rdquant)
add_subdirectory(12_smoothquant)
add_subdirectory(13_moe_sorting)
add_subdirectory(14_moe_smoothquant)
add_subdirectory(15_fused_moe)
add_subdirectory(16_batched_gemm)
add_subdirectory(17_grouped_gemm)
add_subdirectory(18_flatmm)
add_subdirectory(19_gemm_multi_d)
add_subdirectory(20_grouped_convolution)
add_subdirectory(21_elementwise)
add_subdirectory(22_gemm_multi_abd)
add_subdirectory(35_batched_transpose)
add_subdirectory(36_pooling)
add_subdirectory(38_block_scale_gemm)
add_subdirectory(39_copy)
add_subdirectory(40_streamk_gemm)
add_subdirectory(41_batched_contraction)