* Wrap ck host utilities in CK namespace.
The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.
Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.
There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our two code bases (CK and CK-Tile) so that they can both coexist. Moreover, establishing this compatibility is essential if we are going to allow the builder to instantiate kernels from either template library.
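As a rough illustration of why the namespace wrapping matters, the sketch below shows two same-named helpers coexisting in one translation unit once each lives in its own namespace. It is a minimal sketch, not the actual CK code: `status_t` is a hypothetical stand-in for `hipError_t` so the snippet builds without HIP, the function bodies are invented, and it assumes CK-Tile exposes a same-named helper in a `ck_tile` namespace. `which_library_reported` is a hypothetical helper used only for demonstration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for hipError_t so the sketch builds without HIP.
using status_t = int;

namespace ck {
// Sketch of a host utility that now lives in the ck namespace.
inline void hip_check_error(status_t status)
{
    if(status != 0)
        throw std::runtime_error("ck: HIP error " + std::to_string(status));
}
} // namespace ck

namespace ck_tile {
// Assumed same-named helper on the CK-Tile side (illustrative body).
inline void hip_check_error(status_t status)
{
    if(status != 0)
        throw std::runtime_error("ck_tile: HIP error " + std::to_string(status));
}
} // namespace ck_tile

// Hypothetical helper: both headers are visible here, and the qualified
// call selects the ck version without any ambiguity.
inline std::string which_library_reported(status_t status)
{
    try
    {
        ck::hip_check_error(status);
    }
    catch(const std::runtime_error& e)
    {
        return e.what();
    }
    return "ok";
}
```

If either helper were left in the global namespace, an unqualified call made inside the other library's namespace could resolve to the wrong function or become ambiguous. Qualifying both avoids that.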
* Add using declarations to test code.
After moving some of the utils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compilation units for functions that were previously in the global namespace.
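The kind of change made to a test or example file can be sketched as follows. This is a hedged illustration under stated assumptions: the `int` parameter stands in for `hipError_t`, the body of `hip_check_error` is invented, and `run_example` is a hypothetical call site rather than code from the repository.

```cpp
#include <stdexcept>

namespace ck {
// Stand-in for the relocated host utility (int replaces hipError_t here).
inline void hip_check_error(int status)
{
    if(status != 0)
        throw std::runtime_error("HIP error");
}
} // namespace ck

// A using declaration at file scope lets existing unqualified calls in this
// file keep compiling after the function moved out of the global namespace.
using ck::hip_check_error;

// Hypothetical call site that was written before the move and is unchanged.
inline bool run_example(int status)
{
    try
    {
        hip_check_error(status); // resolves to ck::hip_check_error
    }
    catch(const std::runtime_error&)
    {
        return false;
    }
    return true;
}
```

A using declaration imports only the named function, so it is narrower than `using namespace ck;` and less likely to reintroduce the clashes the namespace was meant to prevent.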
* Add using declarations to client examples.
- Added a parameter to enable/disable kernel verification and timing in the examples that lacked it.
- Added a parameter to change the number of groups executed in grouped_gemm_examples.
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>
* added working example for 5D input using 1D kernel
* example with 5D input tensor and 2d kernel - not working: issues with arguments
* added updated version of 3d device op - changed descriptors/dims
* added example file to check kernel
* fixed descriptor and isSupportedArgument stride problem
* added and modified kernel for 3d - updated tids/loop
* adding some more 5d example files
* fixed some issues
* changes made for testing
* working version: fixed error in stride for A, still a bit inefficient
* cleaned up formatting/comments
* updating formatting
* more formatting fixes
* fixing cmake, adding back gpu targets in cmake script
* adding client example
* added instances for client example
* fixed errors in client example
* implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
* removed extra files
* minor formatting and naming fixes
* adding test files and profiler
* fixing minor error
* minor fix
* removed unnecessary comments, renamed files
* updated instance list for client example, added different layout example
* removing instances
* fixed error in instance generation
* remove comments
* update profiler and client example tensor layouts
* fixed errors in test/profiler
* updated vector dim access to enable vector load
* updated test/profiler files
* updated example with 1d kernel
* updating profiler
* renamed files
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
* File renaming and class renaming for device element-wise operation
* Add batchnorm-infer instances, external API and client example
* Add batchnorm-infer profiler module and gtests
* Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
* Remove the use of class aliasing for DeviceElementwiseForBatchNormInfer
* Rename class and file due to conflict from device_elementwise_2d.hpp
* Fix namespace in batchnorm_infer_nhwc client example