Commit Graph

13 Commits

Illia Silin
504b101da3 upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18

* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
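The pre-commit change above pins the formatter version in `.pre-commit-config.yaml`. A minimal sketch of such a hook, assuming the commonly used `mirrors-clang-format` mirror (the repository's actual config may differ):

```yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v18.1.8   # any clang-format 18.x tag
    hooks:
      - id: clang-format
        types_or: [c++, c, cuda]
```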
Bartłomiej Kocot
2ccf914888 Add support for GKCYX grouped conv weight (#2023)
* Grouped conv bwd weight GKCYX support

* fix and changelog

* fix

* fix

* fixes

* comments

* fix
2025-04-02 23:59:49 +02:00
Bartłomiej Kocot
85d6fcd30a Add Grouped Convolution and GEMM documentation (#1719)
* Add Grouped Convolution docs

* Add gemm docs

* Update docs

* fix
2025-02-04 16:41:49 +01:00
Bartłomiej Kocot
742f5d6b55 Add Conv NGCHW client example (#1831) 2025-01-22 01:02:03 +01:00
Haocong WANG
3049b5467c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 examples if arch is not supported

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Rostyslav Geyyer
204da9c522 Move grouped conv fwd client examples (#1299)
* Move grouped conv fwd client examples

* Update existing examples

* Format
2024-05-21 09:52:41 -05:00
Illia Silin
ae57e5938e Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only link test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* simplify the main cmake file

* add conv_fwd_bf8 instance declaration
2024-04-02 09:42:17 -07:00
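Splitting instances by architecture amounts to selecting which instance sources (and which feature macros) are compiled for each GPU target. A rough CMake sketch of the pattern, with illustrative variable and macro names rather than the repository's actual ones:

```cmake
# Compile XDL-based instances only for gfx9 targets and WMMA-based
# instances only for gfx11; each source gets its own feature macro.
if(GPU_TARGETS MATCHES "gfx9")
    list(APPEND INSTANCE_SOURCES gemm_xdl_instances.cpp)
    set_source_files_properties(gemm_xdl_instances.cpp
        PROPERTIES COMPILE_DEFINITIONS "ENABLE_XDL")
elseif(GPU_TARGETS MATCHES "gfx11")
    list(APPEND INSTANCE_SOURCES gemm_wmma_instances.cpp)
    set_source_files_properties(gemm_wmma_instances.cpp
        PROPERTIES COMPILE_DEFINITIONS "ENABLE_WMMA")
endif()
```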
amoskvic
a776978cbe Style improvement: improve type alias usage consistency in gemm-related client examples; also update the copyright year in all client examples. (#1180)
Co-authored-by: Arseny Moskvichev <amoskvic@amd.com>
2024-02-28 16:39:03 -08:00
Illia Silin
7965d66a81 Split the static library into several files. (#1044)
* split the static library into several files

* update lib paths and fix client example

* do not use device_mha_operations for client examples

* use appropriate libs to link to client examples

* remove the gpu/transpose path from the list

* try fixing client examples 3, 4, 9

* add necessary libs for client examples

* fix the layernorm client example

* fix the client examples 23 and 24

* fix typo

* add interface library and refresh clang format
2023-11-28 11:17:37 -08:00
Bartłomiej Kocot
f2398f612d Introduce multiABD api and deprecate multiD (#1035)
* Introduce multiABD api and deprecate multiD

* Replace multiD with multiABD

* Mark structures as deprecated

* Change doxygen deprecated to note to avoid warnings
2023-11-14 17:00:40 +01:00
Illia Silin
b94fd0b227 update copyright headers (#726) 2023-05-31 18:46:57 -05:00
rocking
3eecbfb6ec Revise layout of group convolution (#675)
* [What] Remove pure conv int8 instance
[Why] Pure int8 conv is never used in AI workloads; int8 quantization is used instead

* Change layout

* Share the kernel parameter

* Support more types of NHWGC for group conv

* Revise client example of conv2d to use the NHWGC layout

* Add instance to cmake

* Revise layout of group conv quantization instance

* Revise layout of external api of group conv quantization

* Revise layout of group conv quantization client example

* Fix clang format

* Add comments describing the meaning of each parameter
2023-04-23 23:40:00 -05:00
ltqin
830d37a7d5 Grouped conv1d client example (#589)
* add conv1d fwd client example

* change 07_grouped_conv2d_fwd to 07_grouped_convnd_fwd

* add conv1d bwd weight

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-02-22 11:55:21 -06:00