composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-12 18:17:58 +00:00

Author	SHA1	Message	Date
rocking	e57089f861	Normalization/split k (#615 ) [ROCm/composable_kernel commit: `a1e344b1ae`]	2023-05-11 07:15:02 -05:00
Rostyslav Geyyer	7d92b0fb64	Optimize bf16 conversion (#664 ) * Add TypeConvert class and start refactoring * Refactor TypeConvert as a struct * Get back to template functions type_convert * Add a type_convert_bf16_rtn, set rtz as default * Clean up * Add UnaryConvertPrecision struct for high-precision workloads * Format * Update type_convert to UnaryConvert on threadwise level * Update UnaryConvertPrecision * Format * Fix chmod * Add a flag to pick converion method * Format * Remove the added flag * Merge elementwise op with type conversion * Move type_convert to elemwise op, update the op * Update type_convert_precision -> bf16_convert_rtn * Clean up * Update comments * Update the CK_WORKAROUND_DENORM_FIX flag handling * Update the unneeded op to work but warn user * Remove the message * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference * Format * Add missing include [ROCm/composable_kernel commit: `b076a02ad2`]	2023-05-04 10:25:47 -05:00
Illia Silin	a2d3ef1536	Fix the group of quantization_int8 kernels on MI300. (#695 ) * replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes * fix the group of kernels from ticket 723 on MI300 --------- Co-authored-by: Jing Zhang <jizhan@amd.com> [ROCm/composable_kernel commit: `b8635a25b2`]	2023-05-03 18:27:04 -05:00
Illia Silin	5406c5254e	Fix grouped_gemm_splitk kernels on MI300. (#694 ) * replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes --------- Co-authored-by: Jing Zhang <jizhan@amd.com> [ROCm/composable_kernel commit: `4a51d2da9d`]	2023-05-03 08:25:25 -07:00
Illia Silin	da61da8b4a	Syncing up from internal repo to enable MI300. (#690 ) * enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `4feebedd41`]	2023-04-28 18:22:59 -05:00
Haocong WANG	1dc0de1c00	add vector load check (#680 ) Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `54c90aae13`]	2023-04-26 15:58:57 -05:00
Adam Osewski	d9fe87efbd	Grouped Gemm + SplitK + simplified Kernel Args (#669 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `8bb2bb4a05`]	2023-04-24 15:43:36 -05:00
Illia Silin	55d16b3400	Put back the split-k gemm code. (#684 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `903cd19ce3`]	2023-04-21 19:37:00 -05:00
Haocong WANG	f0f697ae4a	Fix a typo (#676 ) [ROCm/composable_kernel commit: `fc26d42a2e`]	2023-04-15 21:57:34 -05:00
Rostyslav Geyyer	6e1df339c9	Add more macros to turn on/off denorm fix (#678 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> [ROCm/composable_kernel commit: `03eaee6ae6`]	2023-04-15 21:56:07 -05:00
Haocong WANG	000176b5fc	Add memory index guard in wmma device ops (#667 ) [ROCm/composable_kernel commit: `e85178b4ca`]	2023-04-11 15:42:47 -05:00
zjing14	b18d739672	add a marco to turn on/off denorm fix (off by default) (#673 ) * add a marco to turn off denorm fix by default * expose the marco --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `c54f8bcc25`]	2023-04-11 07:44:43 -05:00
rocking5566	356c1cc17b	Groupnorm + swish external api (#668 ) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp [ROCm/composable_kernel commit: `ed3a2e5226`]	2023-04-10 08:02:17 -05:00
Jun Liu	89d6f8a65f	Issue #666 : Revert "simplify karg in device/grid of split-k op (#644 )" (#665 ) This reverts commit `1108f64591`. [ROCm/composable_kernel commit: `3248387bbb`]	2023-04-06 17:14:11 -07:00
Haocong WANG	37f95442f9	fix 3rd dword of buffer source descriptor (#659 ) [ROCm/composable_kernel commit: `091570f594`]	2023-03-29 19:03:55 -05:00
carlushuang	1108f64591	simplify karg in device/grid of split-k op (#644 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout [ROCm/composable_kernel commit: `bb5530af91`]	2023-03-29 19:03:07 -05:00
Rostyslav Geyyer	15ac3fc064	Add a denorm test fix (#603 ) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `dbd8f94bef`]	2023-03-29 15:05:32 -05:00
rocking5566	cbce8b77da	Conv + quantization + tanh (#645 ) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `389e84a83b`]	2023-03-29 14:50:23 -05:00
Haocong WANG	84f096c844	[Navi3x] Fix Gridwise_multiple_d operation (#649 ) * Add CMake Option "USE_OPT_NAVI3X" * fix bug [ROCm/composable_kernel commit: `e5376be4ac`]	2023-03-23 11:22:10 -05:00
Illia Silin	b3c1e83276	Get rid of XDL parameters in WMMA kernel string. (#646 ) * remove XDL parameters from WMMA kernel string * get rid f two more parameters [ROCm/composable_kernel commit: `36750a5763`]	2023-03-22 08:05:48 -07:00
Dan Yao	a84d2f5d81	rtn in ternary way (#632 ) * rtn in ternary way * Check both flags to preserve NaN * Format * Rearrange flag1 * Apply suggestions from code review Co-authored-by: Ronan Keryell <ronan@keryell.fr> --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Ronan Keryell <ronan@keryell.fr> [ROCm/composable_kernel commit: `8a659a2e4c`]	2023-03-20 14:30:24 -05:00
ltqin	fc10856d4b	workaround 637 (#640 ) * add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `6ae12434d2`]	2023-03-20 11:49:31 -05:00
rocking5566	6a1403d82d	gemm/Conv xdlops + dlops quantization (#625 ) * Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move ot subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separete different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit `820869182f`. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `16dc18e0f9`]	2023-03-15 15:29:40 -05:00
Adam Osewski	512ec3ac4d	Device Op GroupedGemmMultipleD + example fp16 (#633 ) * Pass shared mem pointer as pointer to void. * Device Op GroupedGEMM Multiple D * Example for grouped gemm multiple d. * Add MI200 to supported archs. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `a2d5ca8e95`]	2023-03-15 11:22:59 -05:00
Rostyslav Geyyer	6e6482b9cd	Add layout check to IsSupportedArgument (#627 ) * Add layout check to IsSupportedArgument * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `c10a6e8293`]	2023-03-15 11:12:12 -05:00
Illia Silin	87113ad617	Update GetTypeString function to generate unique kernel IDs. (#638 ) * make conv_fwd_bias_activation kernel id unique * add more parameters to conv and gemm kernel names * update GetTypeString for conv and gemm kernels * fix two more kernel strings [ROCm/composable_kernel commit: `14b3504d95`]	2023-03-15 10:44:42 -05:00
Haocong WANG	459469f66a	Fix arch limitation bug (#639 ) [ROCm/composable_kernel commit: `ea028ac65a`]	2023-03-15 07:44:13 -07:00
Rostyslav Geyyer	b78f3ba805	Remove debug asserts (#629 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> [ROCm/composable_kernel commit: `5b57ab96a8`]	2023-03-10 17:34:44 -06:00
Haocong WANG	9687ad0b61	[Navi3x] Multiple issue fix (#612 ) * Change gridwise gemm mD blockwise gemm to naive * RRR Gemm fix * Fix RCR gemm bug * Isolate wmma instructions * Update amd_inline_asm.hpp * Update amd_wmma.hpp * Update amd_wmma.hpp * fix syntax and update Jenkinsfile --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `087e310589`]	2023-03-10 17:04:28 -06:00
carlushuang	ca7b3a4f58	fix a bug with non-dword-aligned offset when OOB, in case crash (#616 ) Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `76fcdc60e9`]	2023-03-09 08:07:24 -06:00
Illia Silin	9ce65cae0e	[gfx110x] support Navi3x architectures. (#628 ) * enable building on Nav31 * fix syntax * replace GPU_TARGETS with offload-arch * add gfx1102 rachitecture * fix typo * update changelog [ROCm/composable_kernel commit: `0ccecc7c31`]	2023-03-09 07:56:40 -06:00
Adam Osewski	0d23b0d1c9	GroupedGEMM + Gelu client example/instances/profiler (#614 ) * Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `9096b1c7b2`]	2023-03-07 22:06:56 -06:00
Rostyslav Geyyer	2cf1f440a3	Add descriptions to avoid build issues (#619 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> [ROCm/composable_kernel commit: `1e59eb3be5`]	2023-03-06 13:11:58 -08:00
pmaybank	9080b984cb	Generate output using Doxygen / Breathe (#598 ) * Modify Doxygen config to pick up include directories recursively * Add DeviceMem struct to API Reference guide * Add classes that are used in Flash Attention kernel * Add a reference and config for generating bibliography Co-authored-by: Philip Maybank <Philip.Maybank@amd.com> [ROCm/composable_kernel commit: `e4bf6d422e`]	2023-03-06 11:39:16 -06:00
Haocong WANG	d33b8f9152	[Navi3x Bug Fix] fix typo to accept MNKPadding flag correctly. (#597 ) * fix a bug blocking wmma_gemm_multipleD * Utilize matrix padder in device_wmma_op * cosmetic change for gemmpadding format * clang format * Change gridwise gemm from FIFO to KMN loop fashion [ROCm/composable_kernel commit: `68dbf40a79`]	2023-03-01 12:07:42 -06:00
Chao Liu	c72e448b2a	Fast GeLU using built-in function (#587 ) * clean up * fast gelu using builtin function * clean * clean * clean * clean: * clean * fix compilation * clean * clean --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `8f455615a8`]	2023-02-26 23:19:11 -06:00
zjing14	0bede6cabd	disable tensor contraction f64 on MI100 (#602 ) [ROCm/composable_kernel commit: `209baee299`]	2023-02-23 16:59:37 -08:00
Rostyslav Geyyer	f52b71c693	Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505 ) * Add DeviceOp and examples * Format DeviceOp template arguments * Remove bf16 example * Format * Format * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N * Refactor argument preparation * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl * Rename device op file * Update include directive in the example file * Update descriptor preparation for grouped op * Update the argument * Update batch handling * Add gridwise gemm supporting batched input * Update blockwise indexing, working version * Update copyright year * Update check if argument is supported * Refactor and make consistent with xdl examples * Update check if argument is supported * Add changelog entry * Added comments on Dl op split_k>1 support --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `246ceee49e`]	2023-02-22 11:59:53 -06:00
Illia Silin	fdd525e21c	fix a bug when building for gfx1030 target. (#591 ) * fix a bug while building for gfx1030 and add gfx1030 to targets * fix syntax [ROCm/composable_kernel commit: `bef0cb20db`]	2023-02-16 13:54:08 -06:00
Illia Silin	c1efabf921	Clean up kernel launch output (#569 ) * clean up output from kernel_launch * set RUN_WARMUP to 0 by default * split the warm-up into a separate issue --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `19490ac4f7`]	2023-02-15 12:07:21 -06:00
zjing14	7335ebded0	Add contraction_fp64 example (#570 ) * add contraction_bilinear * add contraction_scale_xdl_fp64 * reduce tile size to avoid register spill --------- Co-authored-by: root <root@ctr-ubbsmc16.amd.com> [ROCm/composable_kernel commit: `24c9ee1d22`]	2023-02-15 12:00:58 -06:00
rocking5566	9d20a2b6b5	Improve normalization (#580 ) * Sync the order of type string with template parameter * Add more instances * Check the vector size and remove redundant var * Extract var to static, prepare to separate sweep once kernel * Separate sweeponce flow and optimize the flow * 1. Rename AccDatatype in normalization to computeData 2. Rename AccElementwiseOperation to YElementwiseOperation in normalization * Remove useless code * Update naive variance kernel * Refine string * Fix typo * Support naive variance for device_normalization * Check the blocksize * Share the VGPR of x and y * Share the VGPR of gamma and beta * Add more instances * Support fp16 sqrt for experiment * Add CHANGELOG * Fix typo * clang-format [ROCm/composable_kernel commit: `6a6163a3d1`]	2023-02-15 11:59:35 -06:00
Haocong WANG	789c15d703	[Navi3x] Add Device Operations (#567 ) * wmma_op + unit test * add arch limitation to wmma test * change arch limitation * Refactor + Add all type unit test(int4 compile failed) * Add f32_16x16x16_bf16 unit test * tempsave * tempsave * tempsave * runtime bug, cannot find symbol * workaround for incorrect HIP warpSize return value * debugging * tempsave * Correctness OK, waiting for optimization * Tidy up + format * temp save * temp save, reproduce the v_bfi_b32 issue * add inline asm for wmmaop test * tidy up * clean some debug purpose code * discard some codes * clang format * clang format * compiler issue fixed + increase tile size * navi3x_multipleD+example * temp save * workable * batchedgemm[OK], groupconv[debug] * groupconv: Sanity check[OK], Performance[Bad] * navi3x_groupconv_need_optimization * format * Add arch limitation to all wmma examples * fix bug: example30 input conv args [ROCm/composable_kernel commit: `0cfda84d05`]	2023-02-15 11:50:51 -06:00
Illia Silin	6a4cfe125b	Remove the workaround for bf16 attention tests. (#586 ) * remove workanround in bf16 attention test * clean up another workaround [ROCm/composable_kernel commit: `06f1fc864c`]	2023-02-14 18:06:24 -06:00
rocking5566	16c383aa2a	Gemm+layernorm instance, ckProfiler, client example (#568 ) * Add gemm + layernorm instance * Add ckProfiler * Add test * Add client example * Detect if user forger to set the workrspace * Use literal in the example * [What] use builtin function for sqrt [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt() * check gemm vaildity in IsSupportedArgument * Add more testcases * Merge duplicated folder in client example * Print more infomation * Use better kernel parameter for MS problem size * clang format * Add constexpr for if condition and remove redundant include * Remove cstdlib and add constexpr [ROCm/composable_kernel commit: `f7d28f3e4b`]	2023-02-09 15:02:55 -06:00
guangzlu	ebb0fa5b3b	Add instance for elementwise normlization (#573 ) * added instances for large N * add instance for elementwise normlization * added supported restrict in device_elementwise_normalization_impl.hpp [ROCm/composable_kernel commit: `76d144fa7c`]	2023-02-09 09:37:29 -08:00
ltqin	7cb7031d6a	Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576 ) * add instance for gemm bias softmax gemm * add client example * change CGridDesc_G_M_N to CGridDesc_G_M_O * add gridwise * change c grid name * device add d0s data * fix 08 client_example * add example 47_fused_attention * example output correct * add d0 to example * add d0 element op * rechange instance code * change Acc0ElementwiseOperation to C0DEElementwiseOperation * change example name * update instance for cdeelementwiseop * add bhalf_t ScaleAdd * add test * not surport geem1 bias * remove some ignore * fix test bug [ROCm/composable_kernel commit: `332ccc3367`]	2023-02-08 14:34:45 -06:00
who who who	2b3fd10f2b	remove unused variable (#564 ) * remove unused variable * format code [ROCm/composable_kernel commit: `ba40c2ce9d`]	2023-01-31 10:34:35 +08:00
Qianfeng	2c1a324b99	Batchnorm inference instances, external API, client examples and gtests (#531 ) * File renaming and class renaming for device element-wise operation * Add batchnorm-infer instances, external API and client example * Add batchnorm-infer profiler module and gtests * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer * Rename class and file due to conflict from device_elementwise_2d.hpp * Fix namespace in batcnnorm_infer_nhwc client example [ROCm/composable_kernel commit: `a1b2441f8d`]	2023-01-25 17:09:04 -06:00
Qianfeng	fc8fa0992f	Use double for all scaling values and float-point constant values at the Device Op API (#557 ) * Use double as alpha/beta values type in reduce device op api * Use double as alpha/beta values type in softmax device op api * Use double as alpha/beta values type in multiple-reduce device op api * Use double as epsilon value type in normalization/elementwise-normalization device op api [ROCm/composable_kernel commit: `52abc2f371`]	2023-01-18 12:02:50 -06:00

1 2 3 4 5

238 Commits