* spolit the static library into several
* update lib paths and fix client example
* do not use device_mha_operarions for client examples
* use appropriate libs to link to client examples
* remove the gpu/transpose path from the list
* try fixing clinet examples 3,4,9
* add necessary libs for client examples
* fix the layernorm client example
* fix the client examples 23 and 24
* fix typo
* add interface library and refresh clang format
* Add wei_strides to grouped conv3d wei to keep consistency
* Fix strides in client examples
* Unify backward weight api with forward
* Fix for example
* Fixes for examples
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
* Remove redundant CMake setting
* Extract common code from files
* Rename folder 'convnd' to 'conv'
* Use std::array<> to accept compile-time kwnown # of arguments
* Fix compilation error of tuning parameter
* In example, use same setting as unit-test
* Remove no-longer used include directive
* Add interface for grouped conv bwd weight
* Add group support for conv bwd weight
* Add grouped conv bwd weight example
* Use group parameter in example
* Rename example folder
* Remove non-grouped version example source files
* Rename device op template
* Add group support to convolution backward weight
* Remove debug messages
* Use smaller group size in example
* Use named variable as loop terminate condition
* Prettify example output message
* Enlarge used grid size
* Allow real grid size exceeds expected grid size
* Rename interface file
* Add client example for grouped conv2d bwd weight
* Fix wrong include directive
* Rename client example folder