Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 18:17:44 +00:00)
* start
* read for gridwise gemm
* add MakeBGridDescriptor_K0_N0_N1_N2_N3_K1
* add thread copy desc and register buffer
* add K0PerBlock dim
* add read global data
* finish gridwise gemm
* finish blockwise gemm
* add print data
* add smallest config
* add compare code for gridwise gemm
* fix NXdlPerWave
* fix k0perthread and gridwise gemm main loop
* remove b matrix lds alloc
* fix name
* add test code
* create b_grid_desc_k0_k1_k2_n0_n1_n2_n3_k3 from parameter
* add double register
* modify b_thread_desc_
* add float
* fp16 tag
* add tail for pipeline
* finish main loop
* optimize main loop
* start clear gridwise gemm
* clear code
* clear redundant code
* change file name
* change file name
* fix bug after merge develop
* fix input parameters
* using MultiK0 control b load data loop
* fix some config
* 4 buffer
* fix bug
* one can use
* change read order
* change buffer array to tuple
* change to 8 buffer
* interleave buffer load
* change to 16
* read 8 buffer
* add data buffer to template
* fix after merge develop (header file)
* format
* change to 4 buffer
* remove unnecessary lambda fun
[ROCm/composable_kernel commit: 10b3278b05]