* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
* Remove redundant CMake setting
* Extract common code from files
* Rename folder 'convnd' to 'conv'
* Use std::array<> to accept compile-time kwnown # of arguments
* Fix compilation error of tuning parameter
* In example, use same setting as unit-test
* Remove no-longer used include directive
* Add interface for grouped conv bwd weight
* Add group support for conv bwd weight
* Add grouped conv bwd weight example
* Use group parameter in example
* Rename example folder
* Remove non-grouped version example source files
* Rename device op template
* Add group support to convolution backward weight
* Remove debug messages
* Use smaller group size in example
* Use named variable as loop terminate condition
* Prettify example output message
* Enlarge used grid size
* Allow real grid size exceeds expected grid size
* Rename interface file
* Add client example for grouped conv2d bwd weight
* Fix wrong include directive
* Rename client example folder