* add example
* fix example
* add instance for gemm permute
* add to client example
* change configs
* change instance file name
* format
* change client example file name and remove example
We can use this template to eliminate duplicated iterator computation
logic. By passing the return type explicitly to ck::accumulate_n(), we
can also avoid type conversions at the call site.
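
A minimal sketch of the idea, for illustration only. It assumes the
helper is a thin wrapper over std::accumulate() that takes a start
iterator and a count; the `sketch` namespace, the exact signature, and
the element_count() helper are hypothetical and may not match the real
ck::accumulate_n().

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

namespace sketch {

// Hypothetical stand-in, NOT the actual ck::accumulate_n() definition.
// T is the first template parameter so the caller can name the result type.
template <typename T, typename ForwardIterator, typename Size, typename BinaryOperation>
T accumulate_n(ForwardIterator first, Size count, T init, BinaryOperation op)
{
    // The end iterator is computed once here instead of at every call site.
    return std::accumulate(first, std::next(first, count), init, op);
}

} // namespace sketch

// Hypothetical caller: product of the first n entries of lengths.
inline std::size_t element_count(const std::vector<int>& lengths, std::size_t n)
{
    // Requesting std::size_t up front makes the accumulator std::size_t,
    // so no cast is needed on the init value or on the result.
    return sketch::accumulate_n<std::size_t>(lengths.begin(), n, 1, std::multiplies<>{});
}
```

Without such a helper, each caller would spell out
std::accumulate(begin, std::next(begin, n), ...) and cast the init value
or the result to the desired type by hand.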