* Move kernel implementation files under impl directory.
* Update examples paths.
* Update device kernel impl include paths.
* Update tensor operation instances include paths.
* Update profiler and tests include paths.
* Clang-format
* Update include paths for batched gemm reduce
* Refactor UnitTest ConvNDBwdWeight.
* Refactor fwd and bwd data convND UT.
* Fix used test macro.
* Fix include path.
* Fix include paths.
* Fix include paths in profiler and tests.
* Fix include paths.
Co-authored-by: Adam Osewski <aosewski@amd.com>
* Add int8 specialization for elementwise Add and Subtract.
* CGEMM examples bf16, fp32, int8
* Add convert reference output to CDataType.
* Skip BF16 data type during testing.
* Lower K value to get rid of accumulation error.
* Fix merge artifact.
* Fix changed function name: GetElementSpaceSize()
* Fix merge artifact.
Co-authored-by: Adam Osewski <aosewski@amd.com>
* Reference CGEMM + test stub
* Format.
* Incomplete simple implementation
* Library instances
* Sketch of tests
* Test fixes.
* Example added
* Cosmetics
* Add elementwise operation kernel and example
* Add comment
* Add template argument of dim . Prepare to support multiple dimension
* Rename example
* Support 1 dimension
* Add static assert
* Add comment
* Second auxiliary buffer added
* Extract pad
* Remove redundant argument
* Support any dimension for elementwise operation
* Remove line
* Let it be the multiple number of CU
* Move thread per block to the parameter of constructor
* Consuming binary ops to do A+B / A-B
* Fix + cosmetics + bf16 test commented out temporarily
* Format
* Enabling bf16 test
* Revert "Enabling bf16 test"
This reverts commit f497e2ba44.
* Fix + test reenabled
* fix build
* Revert "fix build"
This reverts commit d73102384b.
* post PR #235 merge fix
* amend
* Single workspace for cgemm + helper
* Perf calc fix
* Review remarks: static_cast
* Review remarks: binary ops templated
* Cleaning
* Removal of instances and their tests
* Review remarks from aosew addressed
* Review remark: unnecessary attribute
* Post-merge fixes
* Restrict 4gemm to PassThrough + bug fix
* Review remarks
* update licence
* change cgemm example to fp16
Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>