composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-19 20:40:07 +00:00

Author	SHA1	Message	Date
Illia Silin	d40b8d5e2c	update copyright headers (#726) [ROCm/composable_kernel commit: `b94fd0b227`]	2023-05-31 18:46:57 -05:00
Illia Silin	2359d80980	Enable gemm_dl and other kernels on Navi3x. (#714) * enable dl kernels on navi3 * do not build xdl tests and examples on Navi * run tests before building everything on jenkins * disable gemm_bilinear on gfx1030 * add gpu targets to installer on Navi * put tests in the same order as before * reduce the number of navi targets in CI * build CI installed for gfx940 as well * only build for MI300 during QA runs [ROCm/composable_kernel commit: `d821d1e54f`]	2023-05-23 11:23:16 -05:00
Illia Silin	dda83a196e	Syncing up from internal repo to enable MI300. (#690) * enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `4feebedd41`]	2023-04-28 18:22:59 -05:00
Po Yen Chen	ff9f244625	Introduce ck::accumulate_n() (#439) We can use this template to eliminate duplicated iterator computing logics. By providing return type to ck::accumulate_n(), we can avoid type conversion operations. [ROCm/composable_kernel commit: `730204eed0`]	2022-11-14 19:53:39 -06:00
Po Yen Chen	f2dd2e5b09	Rangify constructor of HostTensorDescriptor & Tensor<> (#445) * Rangify STL algorithms This commit adapts rangified std::copy(), std::fill() & std::transform() * Rangify check_err() By rangifying check_err(), we can not only compare values between std::vector<>s, but also compare any ranges which have same value type. * Allow constructing Tensor<> like a HostTensorDescriptor * Simplify Tensor<> object construction logics * Remove more unnecessary 'HostTensorDescriptor' objects * Re-format example code * Re-write more HostTensorDescriptor ctor call [ROCm/composable_kernel commit: `4a2a56c22f`]	2022-11-11 11:36:01 -06:00
Adam Osewski	8a8f8521f9	Refactor device op implementations into `impl` subdirectory. (#420) * Move kernel implementation files under impl directory. * Update examples paths. * Update device kernel impl include paths. * Update tensor operation instances include paths. * Update profiler and tests include paths. * Clang-format * Update include paths for batched gemm reduce * Refactor UnitTest ConvNDBwdWeight. * Refactor fwd and bwd data convND UT. * Fix used test macro. * Fix include path. * Fix include paths. * Fix include paths in profiler and tests. * Fix include paths. Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `3048028897`]	2022-10-13 09:05:08 -05:00
Chao Liu	3ef500d96b	add more datatype to gemm+gemm and conv+conv example (#397) * refactor * refactor * adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm * adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm * clean [ROCm/composable_kernel commit: `204ef976ca`]	2022-09-01 09:31:17 -05:00
Chao Liu	feca6e57f9	conv+conv (1x1 only) example using gemm+gemm (#393) * refactor conv * add conv+conv example, 1x1 only [ROCm/composable_kernel commit: `4df6d93f60`]	2022-08-31 11:27:11 -05:00

8 Commits