Po Yen Chen
e418b29268
Introduce ck::accumulate_n() ( #439 )
...
We can use this template to eliminate duplicated iterator computing
logics. By providing return type to ck::accumulate_n(), we can avoid
type conversion operations.
[ROCm/composable_kernel commit: 730204eed0 ]
2022-11-14 19:53:39 -06:00
Po Yen Chen
a4776782a5
Rangify constructor of HostTensorDescriptor & Tensor<> ( #445 )
...
* Rangify STL algorithms
This commit adapts rangified std::copy(), std::fill() & std::transform()
* Rangify check_err()
By rangifying check_err(), we can not only compare values between
std::vector<>s, but also compare any ranges which have same value
type.
* Allow constructing Tensor<> like a HostTensorDescriptor
* Simplify Tensor<> object construction logics
* Remove more unnecessary 'HostTensorDescriptor' objects
* Re-format example code
* Re-write more HostTensorDescriptor ctor call
[ROCm/composable_kernel commit: 4a2a56c22f ]
2022-11-11 11:36:01 -06:00
Adam Osewski
c747be612f
Refactor device op implementations into impl subdirectory. ( #420 )
...
* Move kernel implementation files under impl directory.
* Update examples paths.
* Update device kernel impl include paths.
* Update tensor operation instances include paths.
* Update profiler and tests include paths.
* Clang-format
* Update include paths for batched gemm reduce
* Refactor UnitTest ConvNDBwdWeight.
* Refactor fwd and bwd data convND UT.
* Fix used test macro.
* Fix include path.
* Fix include paths.
* Fix include paths in profiler and tests.
* Fix include paths.
Co-authored-by: Adam Osewski <aosewski@amd.com >
[ROCm/composable_kernel commit: 3048028897 ]
2022-10-13 09:05:08 -05:00
zjing14
56de9aaa27
add g; fixed strides ( #355 )
...
[ROCm/composable_kernel commit: 35e49f2de6 ]
2022-08-12 15:22:39 -05:00
zjing14
ed44e71894
Add batched/grouped_gemm contraction deviceOps ( #349 )
...
* convnd_fwd fp16 example
* update example
* update example
* update instance
* updating refernce conv
* update reference conv
* update conv fwd profiler
* update conv 1d and 3d instance
* update include path
* clean
* update profiler for conv bwd data and weight
* update conv bwd weight
* clean
* update conv example
* update profiler for conv bwd weight
* update ckprofiler for conv bwd data
* fix reference conv bwd data bug; update conv bwd data test
* update examples
* fix initialization issue
* update test for conv fwd
* clean
* clean
* remove test case too sensitive to error threshhold
* fix test
* clean
* fix build
* adding conv multiple d
* adding conv multiple D
* add matrix padder
* add gemm padding to convnd
* adding group conv
* update gemm multi-d
* refactor
* refactor
* refactor
* clean
* clean
* refactor
* refactor
* reorg
* add ds
* add bias
* clean
* add G
* adding group
* adding group
* adding group
* update Tensor
* clean
* update example
* update DeviceGemmMultipleD_Xdl_CShuffle
* update conv bwd-data and bwd-weight
* upate contraction example
* update gemm and batch gemm with e permute
* fix example build
* instance for grouped conv1d
* update example
* adding group conv instance
* update gemm bilinear instance
* update gemm+add+add+fastgelu instance
* update profiler
* update profiler
* update test
* update test and client example
* clean
* add grouped conv into profiler
* update profiler
* clean
* add test grouped conv, update all conv test to gtest
* update test
* change gemm_c_permute with contraction
* add grouped_contraction
* add contraction in group_gemm
* add example of grouped_gemm with contraction
* add example of grouped_contraction_bias_e_permute
* clean
* fixed ds
* add m3n2 m2n3 examples into gemm_bias_e_permute
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: e08d68d25d ]
2022-08-10 12:20:29 -05:00
Chao Liu
236f946292
Clean up conv example, Instances, profiler and test ( #324 )
...
* convnd_fwd fp16 example
* update example
* update example
* update instance
* updating refernce conv
* update reference conv
* update conv fwd profiler
* update conv 1d and 3d instance
* update include path
* clean
* update profiler for conv bwd data and weight
* update conv bwd weight
* clean
* update conv example
* update profiler for conv bwd weight
* update ckprofiler for conv bwd data
* fix reference conv bwd data bug; update conv bwd data test
* update examples
* fix initialization issue
* update test for conv fwd
* clean
* clean
* remove test case too sensitive to error threshhold
* fix test
* clean
* fix build
* adding conv multiple d
* adding conv multiple D
* add matrix padder
* add gemm padding to convnd
* adding group conv
* update gemm multi-d
* refactor
* refactor
* refactor
* clean
* clean
* refactor
* refactor
* reorg
* add ds
* add bias
* clean
* add G
* adding group
* adding group
* adding group
* update Tensor
* clean
* update example
* update DeviceGemmMultipleD_Xdl_CShuffle
* update conv bwd-data and bwd-weight
* upate contraction example
* update gemm and batch gemm with e permute
* fix example build
* instance for grouped conv1d
* update example
* adding group conv instance
* update gemm bilinear instance
* update gemm+add+add+fastgelu instance
* update profiler
* update profiler
* update test
* update test and client example
* clean
* add grouped conv into profiler
* update profiler
* clean
* add test grouped conv, update all conv test to gtest
* update test
[ROCm/composable_kernel commit: 500fa99512 ]
2022-07-29 18:19:25 -05:00