composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-12 01:10:17 +00:00

Author	SHA1	Message	Date
Illia Silin	86e0190ec9	update daily build from rocm 5.4.3 to 5.5 (#693 )	2023-05-03 08:18:10 -07:00
zjing14	f53ede26e5	fixed init range (#691 )	2023-05-02 08:30:23 -07:00
Illia Silin	4feebedd41	Syncing up from internal repo to enable MI300. (#690 ) * enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo --------- Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> rocm-5.7.1 rocm-5.7.0	2023-04-28 18:22:59 -05:00
Haocong WANG	54c90aae13	add vector load check (#680 ) Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-04-26 15:58:57 -05:00
Jun Liu	7613c1d9b9	[CK] suppress unsafe buffer warn (#687 ) incomplete fix from https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/670 So it does not only happen in gtest but also in CK code: We need to fix them as a quality improvement, but for now suppressing this warning in immediate releases: http://compiler-ci.amd.com/blue/rest/organizations/jenkins/pipelines/compiler-psdb-amd-stg-open/runs/2540/nodes/282/steps/3202/log/?start=0 e.g. ``` [2023-04-26T17:26:31.524Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-0f98035df1cc5ba3e90ab03187e672b426a25b00/include/ck/utility/generic_memory_space_atomic.hpp:52:19: error: unsafe pointer arithmetic [-Werror,-Wunsafe-buffer-usage] [2023-04-26T17:26:31.524Z] atomicAdd(c_style_pointer_cast<float>(p_dst) + 1, vx.template AsType<float>()[I1]); [2023-04-26T17:26:31.524Z] ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` ``` [2023-04-26T17:26:31.523Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-0f98035df1cc5ba3e90ab03187e672b426a25b00/include/ck/utility/amd_inline_asm.hpp:62:20: error: 'p_a_half2' is an unsafe pointer used for buffer access [-Werror,-Wunsafe-buffer-usage] [2023-04-26T17:26:31.523Z] const half2_t p_a_half2 = c_style_pointer_cast<const half2_t*>(&a); [2023-04-26T17:26:31.523Z] ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ```	2023-04-26 15:41:03 -05:00
Adam Osewski	8bb2bb4a05	Grouped Gemm + SplitK + simplified Kernel Args (#669 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com>	2023-04-24 15:43:36 -05:00
zjing14	8b9cbba823	reduce initial number for half_t splitk (#685 )	2023-04-24 08:07:39 -05:00
rocking	3eecbfb6ec	Revise layout of group convolution (#675 ) * [What] Remove pure conv int8 instance [Why] We will never use pure int8 conv in AI, use int8 quantization instead * Change layout * Share the kernel parameter * Support more type of NHWGC for group conv * Revise client example of conv 2d, use NHWGC layout * Add instance to cmake * Revise layout of group conv quantization instance * Revise layout of external api of group conv quantization * Revise layout of group conv quantization client example * Fix clang format * Add comment to describe meaning of each parameter	2023-04-23 23:40:00 -05:00
Illia Silin	903cd19ce3	Put back the split-k gemm code. (#684 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by: carlushuang <carlus.huang@amd.com>	2023-04-21 19:37:00 -05:00
Illia Silin	9afa44d40b	Switch to the new rocm5.6 compiler. (#681 ) * switch to the new rocm5.6 compiler and docker * fix syntax	2023-04-21 07:59:26 -07:00
Sam Wu	938a5e0e41	Update dependabot config (#682 ) Co-authored-by: samjwu <samjwu@users.noreply.github.com>	2023-04-20 21:55:56 -06:00
Illia Silin	bb0b772da9	Allow using ROCm release candidate compilers. (#679 ) * enable use of rocm5.5 release candidate 4 * upgrade to ROCM5.5 RC5 * try fix the PUB_KEY error, remove the cmake-data package * upgrade to latest cmake version * use private dockerhub repo for rocm5.5 rc5 * add missing bracket	2023-04-18 09:22:49 -07:00
rocking5566	fd11a4a12a	Add (#677 )	2023-04-17 10:12:10 -05:00
Haocong WANG	fc26d42a2e	Fix a typo (#676 )	2023-04-15 21:57:34 -05:00
Rostyslav Geyyer	03eaee6ae6	Add more macros to turn on/off denorm fix (#678 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-04-15 21:56:07 -05:00
Haocong WANG	e85178b4ca	Add memory index guard in wmma device ops (#667 )	2023-04-11 15:42:47 -05:00
Jun Liu	f532988713	[gtest] suppress unsafe buffer warn (#670 ) ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912	2023-04-11 15:41:49 -05:00
Sam Wu	fd497f0e79	Add dependabot config and pin rocm-docs-core (#663 )	2023-04-11 09:18:38 -06:00
zjing14	c203bf6711	fixed quant example (#672 ) Co-authored-by: root <root@ctr-ubbsmc15.amd.com>	2023-04-11 07:46:46 -05:00
zjing14	c54f8bcc25	add a macro to turn on/off denorm fix (off by default) (#673 ) * add a macro to turn off denorm fix by default * expose the macro --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com>	2023-04-11 07:44:43 -05:00
rocking5566	ed3a2e5226	Groupnorm + swish external api (#668 ) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Refactor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp	2023-04-10 08:02:17 -05:00
Jun Liu	3248387bbb	Issue #666 : Revert "simplify karg in device/grid of split-k op (#644 )" (#665 ) This reverts commit `bb5530af91`.	2023-04-06 17:14:11 -07:00
zjing14	fde6d2742b	add fp64 instances (#658 ) Co-authored-by: root <root@ctr-ubbsmc15.amd.com>	2023-03-30 13:30:43 -05:00
Haocong WANG	091570f594	fix 3rd dword of buffer source descriptor (#659 )	2023-03-29 19:03:55 -05:00
carlushuang	bb5530af91	simplify karg in device/grid of split-k op (#644 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout	2023-03-29 19:03:07 -05:00
Rostyslav Geyyer	dbd8f94bef	Add a denorm test fix (#603 ) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-29 15:05:32 -05:00
rocking5566	389e84a83b	Conv + quantization + tanh (#645 ) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-29 14:50:23 -05:00
Haocong WANG	4e097ad283	Add CMake Option "USE_OPT_NAVI3X" (#647 ) * Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script	2023-03-29 14:07:33 -05:00
Sam Wu	88d474323b	Separate bibtex requirement from rocm-docs-core (#656 ) * separate bibtex requirement from rocm-docs-core * point requirements to source rocm-docs-core repo	2023-03-27 17:14:36 -06:00
Sam Wu	f80776d937	standardize docs (#655 )	2023-03-23 20:58:59 -07:00
Haocong WANG	e5376be4ac	[Navi3x] Fix Gridwise_multiple_d operation (#649 ) * Add CMake Option "USE_OPT_NAVI3X" * fix bug	2023-03-23 11:22:10 -05:00
Po Yen Chen	fe96e8fbf2	Reduce group & batch of the tested convolutions (#648 )	2023-03-22 10:49:11 -07:00
Illia Silin	36750a5763	Get rid of XDL parameters in WMMA kernel string. (#646 ) * remove XDL parameters from WMMA kernel string * get rid of two more parameters	2023-03-22 08:05:48 -07:00
Dan Yao	8a659a2e4c	rtn in ternary way (#632 ) * rtn in ternary way * Check both flags to preserve NaN * Format * Rearrange flag1 * Apply suggestions from code review Co-authored-by: Ronan Keryell <ronan@keryell.fr> --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Ronan Keryell <ronan@keryell.fr>	2023-03-20 14:30:24 -05:00
ltqin	6ae12434d2	workaround 637 (#640 ) * add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-20 11:49:31 -05:00
Rostyslav Geyyer	fa998675fc	Update cmake-ck-dev.sh script (#641 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-03-15 18:38:11 -05:00
rocking5566	16dc18e0f9	gemm/Conv xdlops + dlops quantization (#625 ) * Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move ot subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separete different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit `820869182f`. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-15 15:29:40 -05:00
Adam Osewski	a2d5ca8e95	Device Op GroupedGemmMultipleD + example fp16 (#633 ) * Pass shared mem pointer as pointer to void. * Device Op GroupedGEMM Multiple D * Example for grouped gemm multiple d. * Add MI200 to supported archs. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-15 11:22:59 -05:00
Rostyslav Geyyer	c10a6e8293	Add layout check to IsSupportedArgument (#627 ) * Add layout check to IsSupportedArgument * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-15 11:12:12 -05:00
Illia Silin	14b3504d95	Update GetTypeString function to generate unique kernel IDs. (#638 ) * make conv_fwd_bias_activation kernel id unique * add more parameters to conv and gemm kernel names * update GetTypeString for conv and gemm kernels * fix two more kernel strings	2023-03-15 10:44:42 -05:00
Haocong WANG	ea028ac65a	Fix arch limitation bug (#639 )	2023-03-15 07:44:13 -07:00
Rostyslav Geyyer	5b57ab96a8	Remove debug asserts (#629 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-03-10 17:34:44 -06:00
Haocong WANG	087e310589	[Navi3x] Multiple issue fix (#612 ) * Change gridwise gemm mD blockwise gemm to naive * RRR Gemm fix * Fix RCR gemm bug * Isolate wmma instructions * Update amd_inline_asm.hpp * Update amd_wmma.hpp * Update amd_wmma.hpp * fix syntax and update Jenkinsfile --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2023-03-10 17:04:28 -06:00
carlushuang	76fcdc60e9	fix a bug with non-dword-aligned offset when OOB, in case crash (#616 ) Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-09 08:07:24 -06:00
Illia Silin	0ccecc7c31	[gfx110x] support Navi3x architectures. (#628 ) * enable building on Nav31 * fix syntax * replace GPU_TARGETS with offload-arch * add gfx1102 architecture * fix typo * update changelog	2023-03-09 07:56:40 -06:00
Adam Osewski	9096b1c7b2	GroupedGEMM + Gelu client example/instances/profiler (#614 ) * Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-07 22:06:56 -06:00
Rostyslav Geyyer	1e59eb3be5	Add descriptions to avoid build issues (#619 ) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-03-06 13:11:58 -08:00
pmaybank	e4bf6d422e	Generate output using Doxygen / Breathe (#598 ) * Modify Doxygen config to pick up include directories recursively * Add DeviceMem struct to API Reference guide * Add classes that are used in Flash Attention kernel * Add a reference and config for generating bibliography Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>	2023-03-06 11:39:16 -06:00
Illia Silin	e6cda9f8ff	Change the CI workflow. (#611 ) * add new parallel stage on navi node * dont run performance tests on navi, get rid of 9110 compiler * only run navi build when not doing QA * fix syntax * use navi21 label * dont stash profiler on navi nodes, scp deb package to ginger * disable tests on navi nodes * test posting a binary to ginger * add sshpass and use it to copy deb package * fix the scp example * fix syntax * debug the scp issues * add jenkins user to docker * dont try whoami * change jenkins uid and add user with uid=1002 * try scp from the last stage on micimaster * rename and stash the package, scp from micimaster	2023-03-02 11:24:31 -06:00
Illia Silin	59cbb20c7c	Suppress reserved-identifier warning and catch all warnings. (#608 ) * suppress the reserved-identifier warnings * keep BUILD_DEV=On and use -Werror by default	2023-03-01 12:08:13 -06:00

879 Commits