* spolit the static library into several
* update lib paths and fix client example
* do not use device_mha_operarions for client examples
* use appropriate libs to link to client examples
* remove the gpu/transpose path from the list
* try fixing clinet examples 3,4,9
* add necessary libs for client examples
* fix the layernorm client example
* fix the client examples 23 and 24
* fix typo
* add interface library and refresh clang format
* add example
* fix example
* add instance for gemm permute
* add to client example
* change configs
* change instance file name
* formate
* change client example file name and remove example
We can use this template to eliminate duplicated iterator computing
logics. By providing return type to ck::accumulate_n(), we can avoid
type conversion operations.