* init of grouped_gemm
* 2 gemm test
* perf test
* clean
* wrap desc into a struct
* test cast static_arr to pointer
* add ptr to GemmDesc
* add grouped gemm profiler
* fixed mem issue with unique_ptr
* clean
* clean
* finished ckprofiler
* Update README.md
* readme
* fixed readme
* add example
* improve code
* fixed comments: reserve, seperate ptr and gemm_shapes
* merge group and non-group
* fixed comments: replace push_back with emplace_back to avoid copy constructor
* fixed comments: unified blk2ctile; add test
* ci fix
* fixed ci
* fixed ci
* fixed ci