composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 10:09:41 +00:00

Author	SHA1	Message	Date
Sam Wu	4bb12a1bd8	Documentation Updates (#710 ) * update documentation dependencies add version number to docs rename doc config directories enable more doc formats on rtd add license section in docs [ROCm/composable_kernel commit: `3cff340423`]	2023-05-18 11:08:38 -06:00
Bartłomiej Kocot	993c671395	Add contraction profiler and tests (#701 ) * Add contraction profiler and tests * Build and style fixes * Allow to use any elementwise operator for ref_contraction * Introduce profile_contraction_scale and profile_contraction_bilinear * Make ref_contraction generic and extend interface tests * Stylistic minor fixes * Extend test_contraction_interface [ROCm/composable_kernel commit: `642d5e9155`]	2023-05-15 09:46:52 -05:00
rocking	e57089f861	Normalization/split k (#615 ) [ROCm/composable_kernel commit: `a1e344b1ae`]	2023-05-11 07:15:02 -05:00
Rostyslav Geyyer	7d92b0fb64	Optimize bf16 conversion (#664 ) * Add TypeConvert class and start refactoring * Refactor TypeConvert as a struct * Get back to template functions type_convert * Add a type_convert_bf16_rtn, set rtz as default * Clean up * Add UnaryConvertPrecision struct for high-precision workloads * Format * Update type_convert to UnaryConvert on threadwise level * Update UnaryConvertPrecision * Format * Fix chmod * Add a flag to pick converion method * Format * Remove the added flag * Merge elementwise op with type conversion * Move type_convert to elemwise op, update the op * Update type_convert_precision -> bf16_convert_rtn * Clean up * Update comments * Update the CK_WORKAROUND_DENORM_FIX flag handling * Update the unneeded op to work but warn user * Remove the message * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference * Format * Add missing include [ROCm/composable_kernel commit: `b076a02ad2`]	2023-05-04 10:25:47 -05:00
Illia Silin	a2d3ef1536	Fix the group of quantization_int8 kernels on MI300. (#695 ) * replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes * fix the group of kernels from ticket 723 on MI300 --------- Co-authored-by: Jing Zhang <jizhan@amd.com> [ROCm/composable_kernel commit: `b8635a25b2`]	2023-05-03 18:27:04 -05:00
Illia Silin	5406c5254e	Fix grouped_gemm_splitk kernels on MI300. (#694 ) * replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes --------- Co-authored-by: Jing Zhang <jizhan@amd.com> [ROCm/composable_kernel commit: `4a51d2da9d`]	2023-05-03 08:25:25 -07:00
Illia Silin	358f58f14b	update daily build from rocm 5.4.3 to 5.5 (#693 ) [ROCm/composable_kernel commit: `86e0190ec9`]	2023-05-03 08:18:10 -07:00
zjing14	38cb16791b	fixed init range (#691 ) [ROCm/composable_kernel commit: `f53ede26e5`]	2023-05-02 08:30:23 -07:00
Illia Silin	da61da8b4a	Syncing up from internal repo to enable MI300. (#690 ) * enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `4feebedd41`]	2023-04-28 18:22:59 -05:00
Haocong WANG	1dc0de1c00	add vector load check (#680 ) Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `54c90aae13`]	2023-04-26 15:58:57 -05:00
Jun Liu	aea315a7c4	[CK] suppress unsafe buffer warn (#687 ) incomplete fix from https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/670 So it does not only happen in gtest but also in CK code: We need to fix them as a quality improvement, but for now suppressing this warning in immediate releases: http://compiler-ci.amd.com/blue/rest/organizations/jenkins/pipelines/compiler-psdb-amd-stg-open/runs/2540/nodes/282/steps/3202/log/?start=0 e.g. ``` [2023-04-26T17:26:31.524Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-9084a068fb4f5fe7d58cc80e08b9769da1f64556/include/ck/utility/generic_memory_space_atomic.hpp:52:19: error: unsafe pointer arithmetic [-Werror,-Wunsafe-buffer-usage] [2023-04-26T17:26:31.524Z] atomicAdd(c_style_pointer_cast<float>(p_dst) + 1, vx.template AsType<float>()[I1]); [2023-04-26T17:26:31.524Z] ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` ``` [2023-04-26T17:26:31.523Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-9084a068fb4f5fe7d58cc80e08b9769da1f64556/include/ck/utility/amd_inline_asm.hpp:62:20: error: 'p_a_half2' is an unsafe pointer used for buffer access [-Werror,-Wunsafe-buffer-usage] [2023-04-26T17:26:31.523Z] const half2_t p_a_half2 = c_style_pointer_cast<const half2_t*>(&a); [2023-04-26T17:26:31.523Z] ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` [ROCm/composable_kernel commit: `7613c1d9b9`]	2023-04-26 15:41:03 -05:00
Adam Osewski	d9fe87efbd	Grouped Gemm + SplitK + simplified Kernel Args (#669 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `8bb2bb4a05`]	2023-04-24 15:43:36 -05:00
zjing14	f28c43b544	reduce initial number for half_t splitk (#685 ) [ROCm/composable_kernel commit: `8b9cbba823`]	2023-04-24 08:07:39 -05:00
rocking	cff08cbc72	Revise layout of group convolution (#675 ) * [What] Remove pure conv int8 instance [Why] We will never use pure int8 conv in AI, use int8 quantization instead * Change layout * Share the kernel parameter * Support more type of NHWGC for group conv * Revise client example of conv 2d, use NHWGC layout * Add instance to cmake * Revise layout of group conv quantization instance * Revise layout of external api of group conv quantization * Revise layout of group conv quantization client example * Fix clang format * Add comment to describe meaning of each parameter [ROCm/composable_kernel commit: `3eecbfb6ec`]	2023-04-23 23:40:00 -05:00
Illia Silin	55d16b3400	Put back the split-k gemm code. (#684 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `903cd19ce3`]	2023-04-21 19:37:00 -05:00
Illia Silin	9ed5ad0f21	Switch to the new rocm5.6 compiler. (#681 ) * switch to the new rocm5.6 compiler and docker * fix syntax [ROCm/composable_kernel commit: `9afa44d40b`]	2023-04-21 07:59:26 -07:00
Sam Wu	11168111ba	Update dependabot config (#682 ) Co-authored-by: samjwu <samjwu@users.noreply.github.com> [ROCm/composable_kernel commit: `938a5e0e41`]	2023-04-20 21:55:56 -06:00
Illia Silin	ebc7fabbe5	Allow using ROCm release candidate compilers. (#679 ) * enable use of rocm5.5 release candidate 4 * upgrade to ROCM5.5 RC5 * try fix the PUB_KEY error, remove the cmake-data package * upgrade to latest cmake version * use private dockerhub repo for rocm5.5 rc5 * add missing bracket [ROCm/composable_kernel commit: `bb0b772da9`]	2023-04-18 09:22:49 -07:00
rocking5566	ee4b893928	Add (#677 ) [ROCm/composable_kernel commit: `fd11a4a12a`]	2023-04-17 10:12:10 -05:00
Haocong WANG	f0f697ae4a	Fix a typo (#676 ) [ROCm/composable_kernel commit: `fc26d42a2e`]	2023-04-15 21:57:34 -05:00
Rostyslav Geyyer	6e1df339c9	Add more macros to turn on/off denorm fix (#678 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> [ROCm/composable_kernel commit: `03eaee6ae6`]	2023-04-15 21:56:07 -05:00
Haocong WANG	000176b5fc	Add memory index guard in wmma device ops (#667 ) [ROCm/composable_kernel commit: `e85178b4ca`]	2023-04-11 15:42:47 -05:00
Jun Liu	b4df986264	[gtest] suppress unsafe buffer warn (#670 ) ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912 [ROCm/composable_kernel commit: `f532988713`]	2023-04-11 15:41:49 -05:00
Sam Wu	e5a82c403a	Add dependabot config and pin rocm-docs-core (#663 ) [ROCm/composable_kernel commit: `fd497f0e79`]	2023-04-11 09:18:38 -06:00
zjing14	53b28d2146	fixed quant example (#672 ) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `c203bf6711`]	2023-04-11 07:46:46 -05:00
zjing14	b18d739672	add a macro to turn on/off denorm fix (off by default) (#673 ) * add a macro to turn off denorm fix by default * expose the macro --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `c54f8bcc25`]	2023-04-11 07:44:43 -05:00
rocking5566	356c1cc17b	Groupnorm + swish external api (#668 ) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp [ROCm/composable_kernel commit: `ed3a2e5226`]	2023-04-10 08:02:17 -05:00
Jun Liu	89d6f8a65f	Issue #666 : Revert "simplify karg in device/grid of split-k op (#644 )" (#665 ) This reverts commit `1108f64591`. [ROCm/composable_kernel commit: `3248387bbb`]	2023-04-06 17:14:11 -07:00
zjing14	696991c923	add fp64 instances (#658 ) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `fde6d2742b`]	2023-03-30 13:30:43 -05:00
Haocong WANG	37f95442f9	fix 3rd dword of buffer source descriptor (#659 ) [ROCm/composable_kernel commit: `091570f594`]	2023-03-29 19:03:55 -05:00
carlushuang	1108f64591	simplify karg in device/grid of split-k op (#644 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout [ROCm/composable_kernel commit: `bb5530af91`]	2023-03-29 19:03:07 -05:00
Rostyslav Geyyer	15ac3fc064	Add a denorm test fix (#603 ) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `dbd8f94bef`]	2023-03-29 15:05:32 -05:00
rocking5566	cbce8b77da	Conv + quantization + tanh (#645 ) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `389e84a83b`]	2023-03-29 14:50:23 -05:00
Haocong WANG	8a984b4e3f	Add CMake Option "USE_OPT_NAVI3X" (#647 ) * Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script [ROCm/composable_kernel commit: `4e097ad283`]	2023-03-29 14:07:33 -05:00
Sam Wu	5a8db87383	Separate bibtex requirement from rocm-docs-core (#656 ) * separate bibtex requirement from rocm-docs-core * point requirements to source rocm-docs-core repo [ROCm/composable_kernel commit: `88d474323b`]	2023-03-27 17:14:36 -06:00
Sam Wu	2268a29786	standardize docs (#655 ) [ROCm/composable_kernel commit: `f80776d937`]	2023-03-23 20:58:59 -07:00
Haocong WANG	84f096c844	[Navi3x] Fix Gridwise_multiple_d operation (#649 ) * Add CMake Option "USE_OPT_NAVI3X" * fix bug [ROCm/composable_kernel commit: `e5376be4ac`]	2023-03-23 11:22:10 -05:00
Po Yen Chen	57c8d94bf7	Reduce group & batch of the tested convolutions (#648 ) [ROCm/composable_kernel commit: `fe96e8fbf2`]	2023-03-22 10:49:11 -07:00
Illia Silin	b3c1e83276	Get rid of XDL parameters in WMMA kernel string. (#646 ) * remove XDL parameters from WMMA kernel string * get rid of two more parameters [ROCm/composable_kernel commit: `36750a5763`]	2023-03-22 08:05:48 -07:00
Dan Yao	a84d2f5d81	rtn in ternary way (#632 ) * rtn in ternary way * Check both flags to preserve NaN * Format * Rearrange flag1 * Apply suggestions from code review Co-authored-by: Ronan Keryell <ronan@keryell.fr> --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Ronan Keryell <ronan@keryell.fr> [ROCm/composable_kernel commit: `8a659a2e4c`]	2023-03-20 14:30:24 -05:00
ltqin	fc10856d4b	workaround 637 (#640 ) * add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `6ae12434d2`]	2023-03-20 11:49:31 -05:00
Rostyslav Geyyer	5c8eb78a25	Update cmake-ck-dev.sh script (#641 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> [ROCm/composable_kernel commit: `fa998675fc`]	2023-03-15 18:38:11 -05:00
rocking5566	6a1403d82d	gemm/Conv xdlops + dlops quantization (#625 ) * Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move ot subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separete different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit `820869182f`. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `16dc18e0f9`]	2023-03-15 15:29:40 -05:00
Adam Osewski	512ec3ac4d	Device Op GroupedGemmMultipleD + example fp16 (#633 ) * Pass shared mem pointer as pointer to void. * Device Op GroupedGEMM Multiple D * Example for grouped gemm multiple d. * Add MI200 to supported archs. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `a2d5ca8e95`]	2023-03-15 11:22:59 -05:00
Rostyslav Geyyer	6e6482b9cd	Add layout check to IsSupportedArgument (#627 ) * Add layout check to IsSupportedArgument * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `c10a6e8293`]	2023-03-15 11:12:12 -05:00
Illia Silin	87113ad617	Update GetTypeString function to generate unique kernel IDs. (#638 ) * make conv_fwd_bias_activation kernel id unique * add more parameters to conv and gemm kernel names * update GetTypeString for conv and gemm kernels * fix two more kernel strings [ROCm/composable_kernel commit: `14b3504d95`]	2023-03-15 10:44:42 -05:00
Haocong WANG	459469f66a	Fix arch limitation bug (#639 ) [ROCm/composable_kernel commit: `ea028ac65a`]	2023-03-15 07:44:13 -07:00
Rostyslav Geyyer	b78f3ba805	Remove debug asserts (#629 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> [ROCm/composable_kernel commit: `5b57ab96a8`]	2023-03-10 17:34:44 -06:00
Haocong WANG	9687ad0b61	[Navi3x] Multiple issue fix (#612 ) * Change gridwise gemm mD blockwise gemm to naive * RRR Gemm fix * Fix RCR gemm bug * Isolate wmma instructions * Update amd_inline_asm.hpp * Update amd_wmma.hpp * Update amd_wmma.hpp * fix syntax and update Jenkinsfile --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `087e310589`]	2023-03-10 17:04:28 -06:00
carlushuang	ca7b3a4f58	fix a bug with non-dword-aligned offset when OOB, in case crash (#616 ) Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `76fcdc60e9`]	2023-03-09 08:07:24 -06:00

885 Commits