composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 17:19:12 +00:00

Author	SHA1	Message	Date
Illia Silin	4fa2ef676a	Add performance tests as a stage of CI. (#247 ) * modify ckProfiler_gemm output * fix syntax * change ckProfiler output and return 0 * fix syntax * output datatype * fix syntax * output datatype in another way * fix syntax * fix syntax * test return values of ckProfiler * add layout info and tests, make sure ckprofiler returns 0 * fix syntax * change layout output * fix syntax * fix syntax again * update script to process perf results * rearrange jenkins stages * fix typo * add python packages to Docker file * adding setuptools-rust package * modify parsing for new test parameters * test db credentials on jenkins * fix syntax * update python script to handle incomplete lines * ungrade python to 3.8 and write the gemm_params table * add sqlalchemy package to docker * move perf data processing to master node * move the master node inside a steps region * add new stage for result processing * move results processing to separate stage * reduce number of tests to speedup debugging * pass config to processPerfResults stage * run script on master in a docker container * replace show_node_info * try loading docker on master node again * use ansible node instead of master * get rid of pymysql package * try ssh connection using paramiko * put back pymysql * put the perf data processing back on the gpu node * put back artifact definition * archive the perf_log before parsing * clean up jenkinsfile, fix parsing * fix typo * enable all perf tests * put all stages in original order, finalize script * fix gpu_arch version * update parsing script * remove obsolete file causing merge conflict [ROCm/composable_kernel commit: `1085794df3`]	2022-05-24 11:14:50 -05:00
Shaojie WANG	d1a0ccb542	add GetWorkSpaceSize to base arg (#253 ) * add GetWorkSpaceSize to base arg and make an example on convnd_bwd_weight * remove redundant compute * use datatype and split k to check whether a workspace is used * remove unused computation for work space size [ROCm/composable_kernel commit: `0d08cf1893`]	2022-05-24 11:13:00 -05:00
Chao Liu	3864685a52	fix build (#246 ) * fix build * Revert "fix build" This reverts commit `d73102384b`. * post PR #235 merge fix * amend Co-authored-by: Anthony Chang <ac.chang@outlook.com> [ROCm/composable_kernel commit: `ba58a93f60`]	2022-05-23 12:10:22 -05:00
Shaojie WANG	7ed9e9a348	example of conv bwd weight 1d/2d/3d fp32/fp16/bf16 xdl (#244 ) * enable example of conv 1d/3d for bwd weight * make bf16 kernel do not use atomic add * using new gridwise gemm for bwd weight on convnd bwd weight Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `ac543313bf`]	2022-05-20 17:20:10 -05:00
Chao Liu	983972fedf	remove options.hpp.in (#240 ) [ROCm/composable_kernel commit: `44943e0e21`]	2022-05-20 14:40:12 -05:00
Anthony Chang	f20803e046	Refactor block to C tile map (#235 ) * refactor block-to-ctile-map * gridwise gemm block2ctile generic validity check * format * amend split-k gemm block2ctile map refactor * add test * format * amend * revert to calculating batch index in kernel instead of passing as block_id_z * move file * add valid ctile index check to gridwise v2r4 [ROCm/composable_kernel commit: `a054f7d604`]	2022-05-20 12:40:51 -05:00
Shaojie WANG	b2dd2b03ed	[conv bwd-weight]Binding gemm k1 to conv n (#202 ) * add some instance to develop * avoid bank conflicts for wrw for all instance * add small K1 test * delete some unused instance * binding gemm k1 to conv n * try using half_4 to do ds_read * reset buffer load oob and ds memcpy to default option * remove useless instances * remove redandunt space * remove printf code * clang-format-10 change * use fastest config * fix clang format for the other files * remove gemmk0 pad for output * add gemmk padding macro * add bank length computation * add template to distinguish the instance that need lds padding for wrw * use rocm5.1 as docker * use integer value for GEMM test * add Right padding macro * add 2 test asm code * using 256x256x32 tile size * 1. move dedicated transform into gridwisegemm's head file. 2. make lds tensor params a struct templete. 3. remove useless code * using small vec * 256128 kernel size for example remove asm files * use a new gridwise gemm header for bwd-weight * revert gridwise gemm v2r4r2 * change foramt * reset gridwise gemm v2r4r2 * remove unused code * revert instance file * revert example instance * format file * remove macros * resolve compile error * rename wrw kernel invoker * use gridwisegemm pipeline struct instead of implement run fucntion in the same header Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `070619fbf1`]	2022-05-20 12:36:25 -05:00
Shaojie WANG	60a849ccb5	remove unused conv bwd data profiler header and cpp (#245 ) [ROCm/composable_kernel commit: `b31b588dd2`]	2022-05-20 12:34:23 -05:00
Shaojie WANG	642e21be7f	[Perf][Bwd-weights]Lds re-layout to avoid ds read/write bank conflict and balance ds ops with address calculations (#190 ) * add some instance to develop * avoid bank conflicts for wrw for all instance * add small K1 test * delete some unused instance * reset buffer load oob and ds memcpy to default option * remove useless instances * remove redandunt space * remove printf code * clang-format-10 change * fix clang format for the other files * add bank length computation * add template to distinguish the instance that need lds padding for wrw * use rocm5.1 as docker * use integer value for GEMM test * 1. move dedicated transform into gridwisegemm's head file. 2. make lds tensor params a struct templete. 3. remove useless code * use a new gridwise gemm header for bwd-weight * revert gridwise gemm v2r4r2 * change foramt * rename kernel invoker Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `b9b9c3b814`]	2022-05-20 00:43:10 -05:00
rocking5566	ff3c5a4063	Hotfix eltiwseop (#242 ) * Use vector constructor instead * Fix typo * Move blockSize to the MakeArgumentPointer * Fix naming * Fix clang format * remove blockSize from DeviceBinaryElementwise::Argument() Co-authored-by: rocking <chunylai@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `bb4b82a95a`]	2022-05-19 22:02:06 -05:00
rocking5566	7100ce8382	Gemm reduce max (#209 ) * [What] Rename the example [Why] Prepare to add unary reduction * Add global oparation to the parameter * Add atomicmax * Fix compile error * Support atomicMax (hip library) * Rename the reduction example * Fix target name * use p_d1_grid as the indicator directly * Prevent performance issue. Let passthrough handle it. * Implement the function template the specialize the float2 * No need to separate into two lines * Remove empty line * add comment * Fix compile error due to merge from develop * make the implementation of atomic_max / atomic_add explicit for each datatype * Refine typo * For future CI test * Fix compiler error in ckProfiler * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe' * simply use remove_pointer * Rename type and var * Refine example * Modify reducemax example * Fix bug in reduction * Change initialize range * Implement F64 version of atomicMax * Move reduction code together * Add buffer atomic_max * Fix coding style by clang-format * Integrate new api of DeviceGemmReduce_Xdl_CShuffle * Integrate Batch gemm reduction * Fix example * fix example * clean up * Fix batch gemm tensor operation * Fix coding style * Fix template augument * Fix clang format * Keep flexible of different stride for each D tensor * Fix compile error for ckProfiler * Fix typo * [What] Fix naming [Why] Prepare to add out elementop * Add DoutElementOp Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: rocking <chunylai@amd.com> [ROCm/composable_kernel commit: `0ffe956ab1`]	2022-05-19 21:56:56 -05:00
rocking5566	8bdd05f366	elementwise op (#238 ) * Add elementwise operation kernel and example * Add comment * Add template argument of dim . Prepare to support multiple dimension * Rename example * Support 1 dimension * Add static assert * Add comment * Extract pad * Remove redundant argument * Support any dimension for elementwise operation * Remove line * Let it be the multiple number of CU * Move thread per block to the parameter of constructor * rename threadPerBlock with blockSize * Support double * rename kernel function name * remove redundant include header * Refine type * Need to the final dimension * Refine variable name * Refine type * Use index_t instead of int in API Co-authored-by: rocking <chunylai@amd.com> [ROCm/composable_kernel commit: `aafc3ac27a`]	2022-05-18 23:34:35 -05:00
Anthony Chang	3a574a0f5c	Validate examples in CI (#233 ) * validate examples in ctest runs * format * fix usage of check_err * amend * add example codes to custom target 'check' Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `9f71ff48e2`]	2022-05-13 16:54:44 -05:00
JD	569dd9f47b	Add host API (#220 ) * Add host API * manually rebase on develop * clean * manually rebase on develop * exclude tests from all target * address review comments * update client app name * fix missing lib name * clang-format update * refactor * refactor * refactor * refactor * refactor * fix test issue * refactor * refactor * refactor * upate cmake and readme Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `cec69bc3bc`]	2022-05-12 09:21:01 -05:00
ltqin	7ad07e23ac	enable convnd bwd data test (#234 ) [ROCm/composable_kernel commit: `0f912e205e`]	2022-05-12 09:18:59 -05:00
Anthony Chang	dd9949bc7f	Manual control of MAC cluster for improved interwave performance (#184 ) * manual control of MAC cluster for improved 2-wave performance ensure setprio's order; ensure inner loop size >= local read size synchronize when single mac cluster * format * use value field from ck::integral_constant * roll out inter-wave loop scheduler to c-shuffle gemm variants will gradually roll out to other applicable device ops when occasional reg spill is resolved * additional comments * format * fix mismatch between inter-wave pipeline and interwave blockwise gemm * address review feedback * amend [ROCm/composable_kernel commit: `76764d8c92`]	2022-05-10 19:19:22 -05:00
Adam Osewski	762d0e382a	Post PR183 review fixes. (#224 ) * Suppress additional warnings for googltest. * Rename file conv_fwd_util to conv_util. * Update includes and ConvParams member access. * Formatting. * Change conv_fwd_util target to conv_util * Fix compiler errors. * Fix leftovers. Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `712e464c4e`]	2022-05-10 15:41:29 -05:00
myamlak	8f767322ca	Resolution of issue #153 : Add compiler warning on comparing int and size_t (#212 ) * Turning compare warnings on * Cleaning part I * Cleaning part II * Explicit static_cast to ck::type_convert * Resolving large tensor size issue. * format * revert change to tensor descriptor; promote lementSpaceSize to 64bit * use integer value for GEMM test * Review remarks * Review remarks + issues with (un)signed arithmetic * Format fix * Format * Clang-format. * fix 2gb limit issue Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `f03a1738d9`]	2022-05-09 15:06:49 -05:00
Wen-Heng (Jack) Chung	1dc34ba98b	Update README.md (#228 ) [ROCm/composable_kernel commit: `968bd93285`]	2022-05-09 15:00:04 -05:00
Chao Liu	a5ad59ed11	Code refactor (#175 ) * format * improving pipeline * fix typo * format * adding thread group * adding thread group * adding thread group * adding gemm pipeline * tweak * refactor * refactor * add missing type convert * refactor * refactor * refactor * clean * fix build * refactor * format * clean up * use remove_cvref_t * clean * clean up * clean up * clean up [ROCm/composable_kernel commit: `ec7c2e912e`]	2022-05-09 14:57:59 -05:00
Illia Silin	6ee5aa4d12	Add Benchmark test into CI (#226 ) * add performance test to jenkins pipeline * fix typo * fix the syntax in conv_fwd_util.cpp * fix the error message syntax spacing * fix the error message syntax spacing again * run profile_gemm and archive results * fix typo * try to figure out the paths * try to figure out the paths one more time * skip the copying step * build ckProfiler release only once * change directory using dir * fix dir syntax * change the gemm parameters * do not pipe script output to file * try running ckProfiler directly * fix typo * use set +e * run profile_gemm.sh || true * run multiple gemms and parse results * fix typo in jenkinsfile * fix syntax * add new gemm sizes, update scripts * put all jenkins steps in original order Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Chao Liu <lc.roy86@gmail.com> [ROCm/composable_kernel commit: `a3c910ac6c`]	2022-05-08 02:44:18 -05:00
Adam Osewski	159494284d	Introduce GoogleTest framework. (#204 ) * Use googletest for tests. Add conv2d_fwd UT. * Add conv1D/3D to gtest UT. * Fix: not duplicate test with CTest. * Convert more tests to googltests. * Fix: GIT_SHALLOW is not allowed for git commit hash. * Clang-format * use integer value for GEMM test Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: Chao Liu <lc.roy86@gmail.com> [ROCm/composable_kernel commit: `8eca05a633`]	2022-04-30 08:50:16 -05:00
Chao Liu	ba71166423	use integer value for GEMM test (#219 ) [ROCm/composable_kernel commit: `8a2c69eeee`]	2022-04-30 08:44:20 -05:00
Qianfeng	fbede07a01	Update to gemm_reduce and batched_gemm_reduce (#213 ) * [Experimental] Change to gemm+reduce and batched-gemm+reduce * Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel * Tiny fix in device_batched_gemm_xdl.hpp * clang-format library/src/utility/conv_fwd_util.cpp [ROCm/composable_kernel commit: `c77ae65d40`]	2022-04-29 11:35:25 -05:00
JD	23a2849670	Add gfx90a CI stage for tests (#208 ) * Add gfx90a CI stage * upgrade to ROCm 5.1 and fix formatting [ROCm/composable_kernel commit: `97d8c5045e`]	2022-04-29 10:36:19 -05:00
Anthony Chang	e79ab59149	Hotfix for gemm test (#214 ) * pass by ref to avoid throwing away initialization results * EOL CRLF -> LF [ROCm/composable_kernel commit: `95e93430de`]	2022-04-29 19:03:34 +08:00
Jianfeng Yan	c6cc2489dd	add comments to batched_gemm (#186 ) * add comments to batched_gemm * formatting * fix a typo in batched_gemm_documentation * fix naming [ROCm/composable_kernel commit: `3956085d8e`]	2022-04-25 14:32:59 -05:00
Anthony Chang	f9a5880af6	profiler: fix fp32 c-shuffle gemm tuning parameter (#194 ) [ROCm/composable_kernel commit: `7c0b149811`]	2022-04-22 15:48:51 -05:00
Adam Osewski	299294647a	Clang-format only modified files. (#181 ) [ROCm/composable_kernel commit: `31d869adc6`]	2022-04-22 15:48:08 -05:00
Anthony Chang	76acb04499	use inline asm for 4x4 int8 transposition (#187 ) [ROCm/composable_kernel commit: `08a979f188`]	2022-04-22 15:47:31 -05:00
Adam Osewski	b32c3df45d	Convolution FWD profiler refactor. (#183 ) * Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. 
* Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Working example of OpInstanceRunEngine for conv2dfwd UT. * Adhere to coding style rules. * Formatting and adhere to coding style rules. * Fix merge artifacts. * Utility for collecting conv fwd instances. + Plus commmon part for parsing cmdline params. * Refactor FillUniform because of segfault for int8_t. * Naming convention. * Elegant version of device mem allocation. * Use OpInstanceRunEngine in conv fwd nd tests. * Multiple refinements. * conditional init * don't run reference op if not provided. * Use OpInstanceRunEngine for ckProfiler conv_fwd * Refactor common tensor fill function to separate file. 
* Clean up unused functions. * Support different init methods. * Create CMake target for conv_fwd_util. * Add header for profile_convnd_fwd.cpp * Fix CMakefiles to link with conv_fwd_util where needed. * Fix some clutter. Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `1a0cd5d160`]	2022-04-21 17:39:39 -05:00
JD	123a0f7c64	Fix `clang-format` (#189 ) * Fix clang-format filepath * update docker and fix format [ROCm/composable_kernel commit: `7353ec0c25`]	2022-04-21 17:02:15 -05:00
zjing14	fdec0370c7	removed unused lds loads (#196 ) [ROCm/composable_kernel commit: `860e291c30`]	2022-04-20 22:10:35 -05:00
Qianfeng	9666dd3dd5	Use ck::half_t for Host Reduction (#195 ) * Add math functions for host * Change to host reduction to use ck::math: * Remove the using of half_float::half and half.hpp from reduction example/profiler/ctest [ROCm/composable_kernel commit: `c1ef73192e`]	2022-04-20 22:09:26 -05:00
Illia Silin	f2455f2507	Compile CK for all targets (#188 ) * compile ck for all targets * update the target criteria * change the target condition * fixed some typos * fixed missed file * revert changes in README * revert device_conv3d_fwd_xdl_... * update device_conv3d_fwd_xdl_... * update device_batched_gemm_reduce... * test the unused arguments fix * test the warning suppression * try suppress warnings in device_batched_gemm_reduce_xdl... * fix the last warnings * replace UNUSED with std::ignore * fix a typo * replaced std::ignore with ignore * add igonre header to common_header * refactor atomicAdd Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `4221505d3e`]	2022-04-15 14:17:28 -05:00
Jianfeng Yan	8a4806a3dc	Fix typo in batched gemm profiler (#176 ) * forgot passing BatchedCount in some profiler_batched_gemm * delete default BatchCount [ROCm/composable_kernel commit: `ac0d806650`]	2022-04-07 13:17:15 -05:00
Adam Osewski	f846457a87	Common forward convolution utility refactor. (#141 ) * Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. 
* Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Adhere to coding style rules. * Fix merge artifacts. Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com> [ROCm/composable_kernel commit: `abf4bdb9a9`]	2022-04-05 15:16:59 -05:00
ltqin	fbc03c595d	Patch for bwd data comments (#174 ) * change function name and way to set input zero * change enable if [ROCm/composable_kernel commit: `6717168c18`]	2022-04-04 20:33:53 -05:00
ltqin	f3eb4639a8	NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166 ) * change backward weight name * start add bwd weight lib and profiler * change tuning paramter * change output info * add bwd weight test * change test info * using conv_util * change wgt to weight * add } * add fp32 [ROCm/composable_kernel commit: `781cacd2e6`]	2022-04-04 20:32:00 -05:00
Qianfeng	573f1de6fa	Improve Reduction kernel api (#152 ) * Add ThreadwiseReduction functor as per-thread reduction api * Using ThreadwiseReduce api and some change in using PartitionedBlockwiseReduction api to simply the kernels * Add comments and remove useless declarations in the kernels * Tiny updates [ROCm/composable_kernel commit: `82c8b9f8ee`]	2022-04-04 20:31:44 -05:00
Chao Liu	5aa380eb6f	fix build (#171 ) [ROCm/composable_kernel commit: `646878162b`]	2022-03-31 20:30:20 -05:00
Anthony Chang	1450193e62	Tune & add conflict-free LDS gemm kernels (#159 ) * retune & add conflict-free bf16/fp16 c-shuffle gemm instances amend wrong K1 value in some fp16/bf16 kernel instances * make gemm cshuffle's timing behavior consistent with all other functions * clang-format * retune & add conflict-free fp32 c-shuffle gemm instances * retune & add conflict-free int8 c-shuffle gemm instances * update the underlying gridwise gemm of all c-shuffle gemm kernels * typo [ROCm/composable_kernel commit: `7db48f9008`]	2022-03-31 12:58:41 -05:00
ltqin	d61727ef72	Patch for bwd data #134 (#168 ) * remove switch for NDimSpatial * change in, out and wei name * rename reference thumb function name * remove test [ROCm/composable_kernel commit: `c0e95f6204`]	2022-03-31 12:34:18 -05:00
Chao Liu	3f732cceab	Compile for gfx908 and gfx90a (#130 ) * adding compilation for multiple targets * fix build * clean * update Jekinsfile * update readme * update Jenkins * use ck::half_t instead of ushort for bf16 * rename enum classes * clean * rename * clean [ROCm/composable_kernel commit: `cd167e492a`]	2022-03-31 12:33:34 -05:00
Jianfeng Yan	59506defde	fixed issue164 (#165 ) * fixed issue164 * removed prints [ROCm/composable_kernel commit: `ecf337bab5`]	2022-03-31 08:50:30 -05:00
Anthony Chang	8bb6c6e120	use single threaded tensor generator (#161 ) [ROCm/composable_kernel commit: `f015c77687`]	2022-03-30 22:28:30 -05:00
Jianfeng Yan	297ef9795d	batched_gemm: use profiler in ctest (#163 ) [ROCm/composable_kernel commit: `c8f3acf9c0`]	2022-03-30 21:32:49 -05:00
Adam Osewski	6a3f751bf7	Fix return type to be conformant with CTest. (#160 ) Co-authored-by: Adam Osewski <aosewski@amd.com> [ROCm/composable_kernel commit: `982f8bbc29`]	2022-03-30 20:05:20 -05:00
Jianfeng Yan	cb97ce68d8	Batched gemm and reduction (#156 ) * adding batched_gemm_and_reduction * batched_gemm_reduce works with bactch_count=1 * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1 * adding profiler for batched_gemm_fp16 * fixed a bug in declaration of d1 and d0; both example and profiler work * clang-format * cleanup * batched_gemm_reduce: add test * minor change * fixed some typo in function names [ROCm/composable_kernel commit: `34c661e71c`]	2022-03-30 11:21:18 -05:00
rocking5566	6d537a8c3e	Refine kernel parameter of int8 (ScalarPerVector) (#155 ) * Change int8 ScalarPerVector * Modify vector width of C [ROCm/composable_kernel commit: `98e1e2d0e9`]	2022-03-29 17:36:21 -05:00

579 Commits