composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-13 09:45:56 +00:00

Author	SHA1	Message	Date
Po Yen Chen	730204eed0	Introduce ck::accumulate_n() (#439) We can use this template to eliminate duplicated iterator computation logic. By providing the return type to ck::accumulate_n(), we can avoid type conversion operations.	2022-11-14 19:53:39 -06:00
Po Yen Chen	4a2a56c22f	Rangify constructor of HostTensorDescriptor & Tensor<> (#445) * Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Rangify check_err() By rangifying check_err(), we can compare values not only between std::vector<>s, but also between any ranges that have the same value type. * Allow constructing Tensor<> like a HostTensorDescriptor * Simplify Tensor<> object construction logic * Remove more unnecessary 'HostTensorDescriptor' objects * Re-format example code * Re-write more HostTensorDescriptor ctor calls	2022-11-11 11:36:01 -06:00
Adam Osewski	3048028897	Refactor device op implementations into `impl` subdirectory. (#420) * Move kernel implementation files under impl directory. * Update examples paths. * Update device kernel impl include paths. * Update tensor operation instances include paths. * Update profiler and tests include paths. * Clang-format * Update include paths for batched gemm reduce * Refactor UnitTest ConvNDBwdWeight. * Refactor fwd and bwd data convND UT. * Fix used test macro. * Fix include path. * Fix include paths. * Fix include paths in profiler and tests. * Fix include paths. Co-authored-by: Adam Osewski <aosewski@amd.com>	2022-10-13 09:05:08 -05:00
Chao Liu	204ef976ca	add more data types to gemm+gemm and conv+conv examples (#397) * refactor * refactor * adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm * adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm * clean	2022-09-01 09:31:17 -05:00
Chao Liu	4df6d93f60	conv+conv (1x1 only) example using gemm+gemm (#393) * refactor conv * add conv+conv example, 1x1 only	2022-08-31 11:27:11 -05:00

5 Commits
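
The message of commit 730204eed0 describes ck::accumulate_n() as a template that removes duplicated iterator computation and lets the caller name the return type to avoid conversions. The listing does not show the implementation, so the following is only a minimal sketch of that idea under assumptions: the `sketch` namespace, the parameter order, and the `element_count()` helper are hypothetical and may not match the library's actual code.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

namespace sketch {

// Hypothetical stand-in for ck::accumulate_n(); the real signature may differ.
// Folds `count` elements starting at `first` into `init` using `op`.
// T comes first in the template parameter list so the caller can name the
// result type explicitly; the accumulation then runs in T from the start
// instead of being converted afterwards.
template <typename T, typename ForwardIterator, typename Size, typename BinaryOperation>
T accumulate_n(ForwardIterator first, Size count, T init, BinaryOperation op)
{
    return std::accumulate(first, std::next(first, count), init, op);
}

} // namespace sketch

// Illustrative helper (not from the source): product of the first `ndim`
// tensor lengths, accumulated as std::size_t even though the lengths are int.
inline std::size_t element_count(const std::vector<int>& lengths, std::size_t ndim)
{
    return sketch::accumulate_n<std::size_t>(lengths.begin(), ndim, 1, std::multiplies<>{});
}
```

Without such a helper, each call site would repeat the `std::next(first, count)` computation and cast the result, which is the duplication the commit message refers to.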