composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 00:04:37 +00:00

Author	SHA1	Message	Date
shaojiewang	fb238f356f	use pipeline_v2 for gemm kernel	2022-06-05 14:48:11 +08:00
Chao Liu	d0b9a46741	Merge remote-tracking branch 'origin/develop' into improve_pipeline	2022-05-04 04:10:53 +00:00
Chao Liu	c8f6d5d1f5	clean	2022-05-04 03:37:02 +00:00
Chao Liu	7b4de77570	use remove_cvref_t	2022-05-04 03:00:16 +00:00
Adam Osewski	8eca05a633	Introduce GoogleTest framework. (#204) * Use googletest for tests. Add conv2d_fwd UT. * Add conv1D/3D to gtest UT. * Fix: not duplicate test with CTest. * Convert more tests to googltests. * Fix: GIT_SHALLOW is not allowed for git commit hash. * Clang-format * use integer value for GEMM test Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Chao Liu <lc.roy86@gmail.com>	2022-04-30 08:50:16 -05:00
Chao Liu	8a2c69eeee	use integer value for GEMM test (#219)	2022-04-30 08:44:20 -05:00
Chao Liu	e86f3769b8	clean up	2022-04-29 16:59:01 +00:00
Qianfeng	c77ae65d40	Update to gemm_reduce and batched_gemm_reduce (#213) * [Experimental] Change to gemm+reduce and batched-gemm+reduce * Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel * Tiny fix in device_batched_gemm_xdl.hpp * clang-format library/src/utility/conv_fwd_util.cpp	2022-04-29 11:35:25 -05:00
JD	97d8c5045e	Add gfx90a CI stage for tests (#208) * Add gfx90a CI stage * upgrade to ROCm 5.1 and fix formatting	2022-04-29 10:36:19 -05:00
Anthony Chang	95e93430de	Hotfix for gemm test (#214) * pass by ref to avoid throwing away initialization results * EOL CRLF -> LF	2022-04-29 19:03:34 +08:00
Jianfeng Yan	3956085d8e	add comments to batched_gemm (#186) * add comments to batched_gemm * formatting * fix a typo in batched_gemm_documentation * fix naming	2022-04-25 14:32:59 -05:00
Anthony Chang	7c0b149811	profiler: fix fp32 c-shuffle gemm tuning parameter (#194)	2022-04-22 15:48:51 -05:00
Adam Osewski	31d869adc6	Clang-format only modified files. (#181)	2022-04-22 15:48:08 -05:00
Anthony Chang	08a979f188	use inline asm for 4x4 int8 transposition (#187)	2022-04-22 15:47:31 -05:00
Chao Liu	2488092e16	format	2022-04-22 13:36:09 +00:00
Chao Liu	5b3bd032ad	refactor	2022-04-22 05:02:24 +00:00
Chao Liu	76ee0baf12	Merge remote-tracking branch 'origin/develop' into improve_pipeline	2022-04-22 00:00:05 +00:00
Adam Osewski	1a0cd5d160	Convolution FWD profiler refactor. (#183 ) * Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. 
* Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Working example of OpInstanceRunEngine for conv2dfwd UT. * Adhere to coding style rules. * Formatting and adhere to coding style rules. * Fix merge artifacts. * Utility for collecting conv fwd instances. + Plus commmon part for parsing cmdline params. * Refactor FillUniform because of segfault for int8_t. * Naming convention. * Elegant version of device mem allocation. * Use OpInstanceRunEngine in conv fwd nd tests. * Multiple refinements. * conditional init * don't run reference op if not provided. * Use OpInstanceRunEngine for ckProfiler conv_fwd * Refactor common tensor fill function to separate file. 
* Clean up unused functions. * Support different init methods. * Create CMake target for conv_fwd_util. * Add header for profile_convnd_fwd.cpp * Fix CMakefiles to link with conv_fwd_util where needed. * Fix some clutter. Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-04-21 17:39:39 -05:00
Chao Liu	4816890d74	fix build	2022-04-21 22:39:04 +00:00
JD	7353ec0c25	Fix `clang-format` (#189) * Fix clang-format filepath * update docker and fix format	2022-04-21 17:02:15 -05:00
Chao Liu	0d4026f8b9	clean	2022-04-21 21:53:00 +00:00
Chao Liu	d55080e9f9	Merge remote-tracking branch 'origin/develop' into improve_pipeline	2022-04-21 21:46:25 +00:00
Chao Liu	7610e0491a	refactor	2022-04-21 21:45:50 +00:00
Chao Liu	1a24ad25e1	refactor	2022-04-21 17:32:57 +00:00
Chao Liu	7cd48ef11e	refactor	2022-04-21 17:28:53 +00:00
Chao Liu	96c73d709c	add missing type convert	2022-04-21 16:57:40 +00:00
Chao Liu	2d35fac050	refactor	2022-04-21 16:06:15 +00:00
Chao Liu	4a96c2e4ee	refactor	2022-04-21 14:22:24 +00:00
zjing14	860e291c30	removed unused lds loads (#196)	2022-04-20 22:10:35 -05:00
Qianfeng	c1ef73192e	Use ck::half_t for Host Reduction (#195) * Add math functions for host * Change to host reduction to use ck::math: * Remove the using of half_float::half and half.hpp from reduction example/profiler/ctest	2022-04-20 22:09:26 -05:00
Chao Liu	70e8cc7666	Merge remote-tracking branch 'origin/develop' into improve_pipeline	2022-04-20 01:28:22 +00:00
Illia Silin	4221505d3e	Compile CK for all targets (#188) * compile ck for all targets * update the target criteria * change the target condition * fixed some typos * fixed missed file * revert changes in README * revert device_conv3d_fwd_xdl_... * update device_conv3d_fwd_xdl_... * update device_batched_gemm_reduce... * test the unused arguments fix * test the warning suppression * try suppress warnings in device_batched_gemm_reduce_xdl... * fix the last warnings * replace UNUSED with std::ignore * fix a typo * replaced std::ignore with ignore * add igonre header to common_header * refactor atomicAdd Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-04-15 14:17:28 -05:00
Chao Liu	cf226747f8	tweak	2022-04-15 02:39:43 +00:00
Chao Liu	2071869077	adding gemm pipeline	2022-04-14 00:01:29 +00:00
Chao Liu	18707866d9	adding thread group	2022-04-10 03:01:58 +00:00
Chao Liu	ee33b1faf8	adding thread group	2022-04-10 02:05:46 +00:00
Chao Liu	0e877b8481	adding thread group	2022-04-10 02:05:14 +00:00
Chao Liu	3f4af14cc1	format	2022-04-10 02:02:48 +00:00
Chao Liu	f520f9919f	fix typo	2022-04-09 20:03:02 +00:00
Jianfeng Yan	ac0d806650	Fix typo in batched gemm profiler (#176) * forgot passing BatchedCount in some profiler_batched_gemm * delete default BatchCount	2022-04-07 13:17:15 -05:00
Chao Liu	ee92d26282	improving pipeline	2022-04-06 05:39:39 +00:00
Chao Liu	2d5f0683a5	format	2022-04-06 05:39:27 +00:00
Adam Osewski	abf4bdb9a9	Common forward convolution utility refactor. (#141 ) * Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. 
* Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Adhere to coding style rules. * Fix merge artifacts. Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-04-05 15:16:59 -05:00
ltqin	6717168c18	Patch for bwd data comments (#174) * change function name and way to set input zero * change enable if	2022-04-04 20:33:53 -05:00
ltqin	781cacd2e6	NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166) * change backward weight name * start add bwd weight lib and profiler * change tuning paramter * change output info * add bwd weight test * change test info * using conv_util * change wgt to weight * add } * add fp32	2022-04-04 20:32:00 -05:00
Qianfeng	82c8b9f8ee	Improve Reduction kernel api (#152) * Add ThreadwiseReduction functor as per-thread reduction api * Using ThreadwiseReduce api and some change in using PartitionedBlockwiseReduction api to simply the kernels * Add comments and remove useless declarations in the kernels * Tiny updates	2022-04-04 20:31:44 -05:00
Chao Liu	646878162b	fix build (#171)	2022-03-31 20:30:20 -05:00
Anthony Chang	7db48f9008	Tune & add conflict-free LDS gemm kernels (#159) * retune & add conflict-free bf16/fp16 c-shuffle gemm instances amend wrong K1 value in some fp16/bf16 kernel instances * make gemm cshuffle's timing behavior consistent with all other functions * clang-format * retune & add conflict-free fp32 c-shuffle gemm instances * retune & add conflict-free int8 c-shuffle gemm instances * update the underlying gridwise gemm of all c-shuffle gemm kernels * typo	2022-03-31 12:58:41 -05:00
ltqin	c0e95f6204	Patch for bwd data #134 (#168) * remove switch for NDimSpatial * change in, out and wei name * rename reference thumb function name * remove test	2022-03-31 12:34:18 -05:00
Chao Liu	cd167e492a	Compile for gfx908 and gfx90a (#130) * adding compilation for multiple targets * fix build * clean * update Jekinsfile * update readme * update Jenkins * use ck::half_t instead of ushort for bf16 * rename enum classes * clean * rename * clean	2022-03-31 12:33:34 -05:00

593 Commits