composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-03 05:01:25 +00:00

Author	SHA1	Message	Date
Illia Silin	504b101da3	upgrade from clang-format-12 to clang-format-18 (#2568) * upgrade to clang-format-18 * update to clang-format-18 in pre-commit-config	2025-07-28 11:34:07 -07:00
Bartłomiej Kocot	c2e4898b4b	Grouped conv bwd data NGCHW (#1967) * Grouped conv bwd data NGCHW * fixes * fix * Improvements * Fix * Fix * add client example	2025-03-17 13:32:00 +01:00
Bartłomiej Kocot	85d6fcd30a	Add Grouped Convolution and GEMM documentation (#1719) * Add Grouped Convolution docs * Add gemm docs * Update docs * fix	2025-02-04 16:41:49 +01:00
Haocong WANG	3049b5467c	[GEMM] gemm_universal related optimization (#1453) * replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by: Jing Zhang <jizhan@fb.com> Co-authored-by: Jing Zhang <jizhan@meta.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-14 10:42:30 +08:00
amoskvic	a776978cbe	Style improvement: improving type alias usage consistency in gemm-related client examples. Also copyright year update for all client examples. (#1180) Co-authored-by: Arseny Moskvichev <amoskvic@amd.com>	2024-02-28 16:39:03 -08:00
Illia Silin	7965d66a81	Split the static library into several files. (#1044) * spolit the static library into several * update lib paths and fix client example * do not use device_mha_operarions for client examples * use appropriate libs to link to client examples * remove the gpu/transpose path from the list * try fixing clinet examples 3,4,9 * add necessary libs for client examples * fix the layernorm client example * fix the client examples 23 and 24 * fix typo * add interface library and refresh clang format	2023-11-28 11:17:37 -08:00
zjing14	04f93aadb8	Grouped conv bwd data with fp16 input and bf8fp8 comp (#962) * Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Uppdate naming to f8->fp8 * Update naming * Format * Update naming (#937) * Add a client example * Add computetypes to device and gridwise ops * Add instances, update instance factory * Format * Fix a flag * Add ckProfiler mode * Fix typos * Add an example * Add bf8 generator * add bf8 mfma; fixed type_convert for bf8 * move verfication ahead of timing * Update reference calculation * Fix reference * Narrow down float init range * Fix bf8 bf8 mfma * Add bf8 @ fp8 mfma * Update example * Update instances * Update profiler api * Update for compatibility * Format * Remove extra example * Clean up * workaround convert * added instance of f16_bf8f8, and client example * fixed mfma selector * format --------- Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Jing Zhang <jizha@amd.com>	2023-10-04 18:04:27 -05:00

7 Commits