* update grouped_gemm blockwise kernel
* update config
* update kernel
* update examples
* remove test code for now
* sync test files with origin/develop
* update example
* fix code lint
* fix code-lint
* update test code
* run clang format
* run pre-commit
* update api