composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 10:09:41 +00:00

Author	SHA1	Message	Date
aledudek	f504e98b5d	Extend pool3d fwd avg, max operations by f8_t, int8_t types (#1483 ) * Extend pool3d fwd avg, max operations by f8_t, int8_t types * Pack MaxPool3dFwd params together * Fix MaxPool3dFwd AVG instances * Decrease verification precision for bf16 * Adjust tests + review changes * Adjust threshold for F8 * Adjusted compute types for MAX op instances * Fix ComputeDataType mismatch in tests and profiler for AVG * Fix naming from max_pool3d_fwd to pool3d_fwd * Adjust CMakeLists --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `a793afc961`]	2024-09-17 15:57:10 +02:00
Mateusz Ozga	d7326fb525	This commit contains implementation of max pool2d for f8 type (#1506 ) Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `6834e5ee74`]	2024-09-16 10:15:06 +02:00
bibek	50ec07c3e3	Fix duplicate CMake tidy-target issue (#1513 ) [ROCm/composable_kernel commit: `49e012dee1`]	2024-09-13 21:15:04 -07:00
jakpiase	4940f07a4b	Add pool2d int8 and fp8 instances (#1508 ) * add pool2d fp8 and int8 * minor fixes * add formatting * add reviewer suggestions * add reviewer suggestions [ROCm/composable_kernel commit: `8f8a2ce396`]	2024-09-13 10:18:21 -07:00
Jun Liu	3739cf9f74	Customize filesystem in CK for legacy systems (#1509 ) * Legacy support: customized filesystem * Update cmakefile for python alternative path * fix build issues * CK has no boost dependency * More fixes to issues found on legay systems * fix clang format issue * Check if blob is correctly generated in cmake * fix the python issues * add a compiler flag for codegen when using alternative python * use target_link_options instead of target_compile_options --------- Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `81bc1496b2`]	2024-09-13 07:51:07 -07:00
Mateusz Ozga	92d1b386b2	Pool2d max/avg kernel in the BWD version (#1494 ) * Add pool2d instance BWD AVG * Add pool2d instance BWD MAX * Fix: avg review * Fix review: part2 * Fix - enable test when type is compiled * Fix review part3 [ROCm/composable_kernel commit: `448c0f56d8`]	2024-09-12 11:47:52 +02:00
jakpiase	cb4975cf70	Rewrite pool2d fwd (#1462 ) * added pool2d fwd * add tests * add reviewers changes * Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new" This reverts commit `6b2ba7ff89`, reversing changes made to `22c82bea0c`. * Revert "add reviewers changes" This reverts commit `22c82bea0c`. * added reviewers comments * revert some old files * add reviewers requests --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `e8d2887cb2`]	2024-09-11 15:21:00 +02:00
Haocong WANG	4e4514caa8	Add gemm universal bf16 instances (#1484 ) * revert ckprofiler change * temp save * Add test and test pass * test pass * Fix bug inside rotating buffer when tensor is not packed * bug fix * clang format --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `5b10dae6a4`]	2024-09-04 20:58:54 -07:00
Illia Silin	132c89b29d	copy all fmha headers when building library (#1497 ) * copy all fmha headers when building library * fix the rocm_install call for mha headers [ROCm/composable_kernel commit: `8b95d9ad52`]	2024-09-04 07:36:41 -07:00
Illia Silin	234bc58d2d	Add an option to select an alternative python version during build. (#1496 ) * locate a newwer version of python when -DRHEL=ON flag is set * allow setting python version on cmake command line [ROCm/composable_kernel commit: `841009c5ee`]	2024-09-04 07:36:27 -07:00
Bartłomiej Kocot	950165c6fb	Add support for NGCHW in grouped conv bwd wei (#1491 ) * Add support for NGCHW in grouped conv bwd wei * Comments fixes * navi fixes * Update function names [ROCm/composable_kernel commit: `73b67f290f`]	2024-09-03 10:52:03 +02:00
Bartłomiej Kocot	9974926658	Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )" (#1455 )" (#1490 ) This reverts commit 725dd433cdc6435d481e806b5442a07b0097c94a. [ROCm/composable_kernel commit: `a9b170b541`]	2024-09-02 10:39:49 +02:00
Andriy Roshchenko	f6c6819b47	Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473 ) * Enable CMakePresets build * Verify Convolution, Scaling and ReLU algorithms. * Add tensor element-wise scale and type cast operation. * Reduction implemented but does not work. * Exploration of Reduction functionality. * Completed example for Convolution scaled with ReLu activation and AMAX reduction. * WIP: Add required instances for convolution. * WIP: Create client example. Implement convolution stage. * Add elementwise instances. * Add elementwise scale + convert example. * Add reduction instances. * WIP: Client example for AMAX reduction. * WIP: Add instances for multistage reduction. * WIP: Implementation of multistage reduction. * Refactoring. * Clean up. * Add CMakePresets.json * Guard off FP8 instances when the data type is not available. * Add example for Scaled FP8 Convolution with AMAX reduction. * Refactor CombConvScaleRelu instances. * Add CombConvScale instances. * Add client example for Scaled FP8 Convolution with AMAX reduction. * Cleanup. [ROCm/composable_kernel commit: `c3515f277c`]	2024-08-21 15:22:41 -07:00
Rostyslav Geyyer	0ab95a332e	Set RNE fp8 conversion as a default (#1458 ) * Set RNE fp8 conversion as a default * Update f8 tests * Disable failing test on gfx11 * Update bf8 tests * Add a flag * Fix the flag * Raise flag for gfx10 as well * Temp commit for tolerance testing * Update tolerances [ROCm/composable_kernel commit: `e20f20efbf`]	2024-08-21 09:09:48 -07:00
Andriy Roshchenko	10edb0c70e	Adding Instances and Examples for FP8-based Scaled Convolution with ReLU Activation and AMAX Reduction. (#1469 ) * Enable CMakePresets build * Verify Convolution, Scaling and ReLU algorithms. * Add tensor element-wise scale and type cast operation. * Reduction implemented but does not work. * Exploration of Reduction functionality. * Completed example for Convolution scaled with ReLu activation and AMAX reduction. * WIP: Add required instances for convolution. * WIP: Create client example. Implement convolution stage. * Add elementwise instances. * Add elementwise scale + convert example. * Add reduction instances. * WIP: Client example for AMAX reduction. * WIP: Add instances for multistage reduction. * WIP: Implementation of multistage reduction. * Refactoring. * Clean up. * Guard off FP8 instances when the data type is not available. * Improve output readability. * Addressing reviewer's comments. [ROCm/composable_kernel commit: `a94113a941`]	2024-08-20 10:30:56 -05:00
Illia Silin	ad65d8d5b0	Re-enable fp8 types for all architectures. (#1470 ) * re-enable fp8 and bf8 for all targets * restore the fp8 gemm instances * re-enable conv_3d fp8 on all architectures * diasble several fp8 gemm instances on all architectures except gfx94 * clang format fix [ROCm/composable_kernel commit: `c8b6b64240`]	2024-08-16 16:07:52 -06:00
Haocong WANG	65d6442b4c	[GEMM] gemm_universal related optimization (#1453 ) * replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by: Jing Zhang <jizhan@fb.com> Co-authored-by: Jing Zhang <jizhan@meta.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `3049b5467c`]	2024-08-14 10:42:30 +08:00
AngryLoki	6a4b36d948	Fix compilation errors with libc++ (#1461 ) This fixes 2 issues when compiled with libc++. First issue is attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min(). _Float16 is extension of libstdc++, it does not exist in C++ standard[2]. Luckily, there is NumericLimits class in composable_kernel, which does everything needed. Second issue with call to 'check_err' is ambiguous: there are 2 candidates. It happens because composable_kernel relies on idea that f8_t (defined as _BitInt(8)) does not pass is_integral trait. However, libc++ treats _BitInt(N) as integral (per standard "any implementation-defined extended integer types" can be integral). Closes: #1460 Signed-off-by: Sv. Lockal <lockalsash@gmail.com> [ROCm/composable_kernel commit: `50c423481b`]	2024-08-13 14:31:15 -05:00
Illia Silin	92df7893df	Disable inapplicable xdl and mha instances for gfx12 (#1464 ) [ROCm/composable_kernel commit: `cbb6f2ab8c`]	2024-08-12 15:11:58 -07:00
Jun Liu	254a7dadb6	Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )" (#1455 ) This reverts commit 0c367d5912486f4fcbae1dbb861a1fb8176ca308. [ROCm/composable_kernel commit: `5ff8eeebf9`]	2024-08-08 19:09:33 -07:00
bibek	c8c3293b0b	adding mha as static lib (#1366 ) * adding mha as static lib * add fmha fwd compile options * typo * fix python version * python version to 3 * increase path length * add max path flag in mha cmake * fix long path issue * mha currently only runs in gfx94x * only buld mha in mi300 * populate gpu_list * add mha compile flags * avoid building mha in gpu other then gfx94x * some comments and include ck_tile in rocm * use rocm_install * place ck_tile in include * correct ck_tile path --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `840c5397bb`]	2024-08-06 11:17:10 -05:00
Bartłomiej Kocot	69a6b563f9	Add Grouped Conv Fwd Large Tensor kernel (#1432 ) * Support 64 bit indexing * Add new grouped conv fwd kernel for large tensors * Add instances large tensor * Fixes for transform conv to gemm * Fixes * fixes * Remove not needed instances * examples fixes * Remove not need ds arrays * Fix tests * Add 2GB check in gridwise dl * Fixes [ROCm/composable_kernel commit: `4ec5c52a0c`]	2024-08-06 10:06:10 +02:00
Illia Silin	8f71de4707	add --offload-compress compiler flag (#1433 ) * add --offload-compress compiler flag * only apply the --offload-compress flag to the ckProfiler * move the --offload-compress flag back to main cmake file * add offload-compress to target compile option of ckProfiler --------- Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `7f57b2e02c`]	2024-08-05 23:26:01 +08:00
Bartłomiej Kocot	1567614d80	Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 ) [ROCm/composable_kernel commit: `33b399cc15`]	2024-07-30 18:36:04 +02:00
Andriy Roshchenko	e3b469a493	Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412 ) * Add CMakePresets configurations. * Add binary elementwise ConvScaleAdd and an example. * Numerical verification of results. Observed significant irregularities in F8 to F32 type conversions: ```log ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000 ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000 ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000 ``` * Implemented ConvScaleAdd + Example. * Add ConvScale+Bias Instances * Add Client Example for ConvScale+Bias * Fix number of bytes in an example.. * Cleanup. [ROCm/composable_kernel commit: `4a8a1befd5`]	2024-07-24 15:49:55 -05:00
Haocong WANG	c69df380b9	disable bad instance (#1410 ) [ROCm/composable_kernel commit: `d22713a719`]	2024-07-23 09:05:03 -07:00
Bartłomiej Kocot	b23a3fcf77	Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) [ROCm/composable_kernel commit: `5d8c3d8190`]	2024-07-22 14:21:24 +02:00
Haocong WANG	a0e0f3cdcc	[GEMM] F8 GEMM, performance optimized. (#1384 ) * add ab_scale init support * enabled interwave * add scale type; update isSupport * adjust example * clean * enable f8 pure gemm rcr ckprofiler * Add gemm_multiply_multiply instances * clang format * Optimize for ScaleBlockMNK=128 * enable abscale f8 gemm ck profiler * Add pure f8 gemm test suite * Reverting to the state of project at `f60fd77` * update copyright * clang format * update copyright --------- Co-authored-by: root <jizhan@amd.com> [ROCm/composable_kernel commit: `8c90f25be3`]	2024-07-19 22:06:52 +08:00
ltqin	50c6703b31	Universal gemm splitk using reduce (with multi-d) (#1341 ) * init for reduce_threadwise multi_d * add reduce_threadwise_multi_d * add reduce_multi_d * clean * start add an other splitk device op * add reduce template parameter to SplitKBatchOffset * add reduce c matrix * clean up code * change example data type to bf16 * add bf16Ai8B example * remove reduce template parameter * add splitk atomic status to v4 * example add multi d parameters * device op add multi-d parameters * add multi-d to reduce * fix kbach=1 bug * change B layout to col in bf16Ai8B example * remove float adding struct * change multi-d interface * change file and class name * remove multi-d of bf16Ai8B example * change IsReduce function to IsReduceAdd * change example layout to RRR from RCR * according layout to set ds stride * reset parameter layout * add gemm universal reduce instance * add reduce factory * add profile_gemm_universal_reduce * add reduce to profiler * fix reduce instance * fix profiler reduce compiling bug * format * format library instance code * add mem instance for reduce library * fix call instance names * add workspace for reduce in ckProfiler * format * add mnpading to reduce library instance * add fp16 instance to reduce of profiler * change copyright time * restore profiler cmake file * add reduce text to instances * add DsLayout and DsDataType to instances template parameter * fixed gemm_reduce_multi_d * add an example without multi_d * Update common.hpp * Update gtest.cmake * Update gemm_xdl_splitk_reduce_bf16.cpp * clean * Update gtest.cmake * format * fixe api * format * default parameter change to RRR * add vector_len for multi_d * format * Update gtest.cmake * fix bf16A iBB elementwiseop * add ReduceDataType * move ReduceDataType to end position * format * remove googletest git method address * fix copyright time * update init data --------- Co-authored-by: root <jizhan@amd.com> Co-authored-by: letaoqin <letaoqin@amd.com> Co-authored-by: Jing Zhang <jizhan@meta.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `c544eb4da0`]	2024-07-19 22:01:22 +08:00
Andriy Roshchenko	a765481437	Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. (#1386 ) * Add CMakePresets configurations. * Add ConvScale+ReLU Functor and an Example * Account for ReLU FLOPs. * Add instances of 3D convolutions with ConvscaleRelu operation. * Implement Client Example * Cleanup [ROCm/composable_kernel commit: `802a8a1df1`]	2024-07-16 08:51:49 -07:00
Haocong WANG	d25cc9596c	Disbale failed instance in rocm6.2 rel (#1388 ) [ROCm/composable_kernel commit: `1ff4f25138`]	2024-07-16 08:46:48 -07:00
Bartłomiej Kocot	07ca6dacf1	Support access per groups and filter3x3 in grouped conv fwd (#1382 ) * Support access per groups and filter3x3 in grouped conv fwd * Fixes for large cases * Fixes for large tensors [ROCm/composable_kernel commit: `82e8a78a3f`]	2024-07-12 11:08:42 -07:00
Rostyslav Geyyer	2cd91e7d12	Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369 ) * Add an example * Add instances * Add a client example [ROCm/composable_kernel commit: `7a46a91c84`]	2024-07-11 13:31:39 -07:00
Illia Silin	3e1d0b3d5d	Fix the cmake logic when building with INSTANCES_ONLY=ON. (#1376 ) * fix the cmake logic when building for various targets * another minor fix [ROCm/composable_kernel commit: `a328df25a1`]	2024-07-08 21:21:16 -07:00
Andriy Roshchenko	0675d258a6	Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. (#1354 ) [ROCm/composable_kernel commit: `eb44e0472a`]	2024-07-08 10:55:54 -07:00
Harisankar Sadasivan	c5f81450e1	Universal streamk with atomics (#1360 ) * universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). * Update README.md * fixing clang-format issues * removed conflicts in struct members between streamk and universal streamk * corrected arg parsing for streamk and universal streamk * added stream-k policies for 3 tile and 4 tile * fixed argument type issue with parsing cmd args * changes suggested in PR review are made- removing comments and correcting copyright * file permissions updated * added default value support for grid_size and streamk-policy selection set to -1 * print messages for arguments * print messages for arguments * print messages for arguments1 [ROCm/composable_kernel commit: `75e622f02f`]	2024-07-05 21:40:30 -07:00
jakpiase	3a04bdded7	Add structural sparsity gemm instruction tests (#1309 ) * first version of smfmac test * add reviewer comments * add reviewer suggestions [ROCm/composable_kernel commit: `ed21948bcd`]	2024-06-27 11:30:32 +02:00
Illia Silin	cd1e33cce4	Merging the gfx12 code into public repo. (#1362 ) [ROCm/composable_kernel commit: `941d1f7ce0`]	2024-06-27 00:33:34 -07:00
Andriy Roshchenko	e05bfee7e5	Add instances of grouped convolution 3d forward with a ConvScale element-wise op for bf8@bf8->fp8 (#1326 ) We are adding more instances of grouped convolution 3d forward with a ConvScale element-wise operation. This commit handles bf8@bf8->fp8 data types combination. * Included an example. * Added instances. * Added a client example. --------- Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com> Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `05b10e0e5a`]	2024-06-21 19:02:57 -06:00
jakpiase	92853de60e	Switch to universal gemm in grouped gemm tile loop (#1335 ) * switch to universal gemm in grouped gemm tile loop * minor fixes * add reviewers comments --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `e2d139201b`]	2024-06-18 09:01:49 -05:00
Rostyslav Geyyer	2832eb1444	Add instances for grouped conv fwd 3d with ConvScale for fp8@bf8->fp8 (#1325 ) * Add fp8 bf8 conv example * Add instances * Add client example * Add random scale values * Format [ROCm/composable_kernel commit: `acda4c5a3c`]	2024-06-12 14:41:56 -05:00
Bartłomiej Kocot	f1600a0db5	Fix nhwgc f16 wmma instances (#1328 ) [ROCm/composable_kernel commit: `5fc1bee4c5`]	2024-06-11 09:52:38 +02:00
Rostyslav Geyyer	9416b16080	Add a convinvscale op, related instances and examples (#1307 ) * Update the element op * Add an example * Add instances * Add a client example * make sure new instances only build on gfx9 * Update element op and its handling * Format * Update instances to take element op as an argument * Update examples to use random scale values * Format * Update client example with random scales * Format --------- Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `ce66277a76`]	2024-06-10 14:48:49 -05:00
Bartłomiej Kocot	4716f8f70b	Integrate universal gemm with conv forward (#1320 ) * Integrate universal gemm with conv fwd * Fix conv fwd wmma test * Fix instances * Remove direct load check [ROCm/composable_kernel commit: `ac58cc5d1d`]	2024-06-05 13:01:29 -05:00
Rostyslav Geyyer	692ae331ca	Add a scale op, related instances and examples (#1242 ) * Add a scale op * Update the element op * Add instances * Add an example * Add a client example * Add a flag check * Revert flag check addition * Fix flag check * Update d strides in example * Update d strides in client example * Apply suggestions from code review Update copyright header Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Move the example * Move the client example * Update element op * Update example with the new element op * Add scalar layout * Update example * Update kernel for scalar Ds * Revert kernel changes * Update element op * Update example to use scales' pointers * Format * Update instances * Update client example * Move element op to unary elements * Update element op to work with values instead of pointers * Update instances to take element op as an argument * Update examples to use random scale values --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `cb0645bedc`]	2024-06-04 19:28:15 -05:00
Illia Silin	c6b1a8b2e9	Split the gemm_multi_abd instances. (#1306 ) * split the gemm_multi_abd instances * update the dates [ROCm/composable_kernel commit: `ec2bae27ff`]	2024-05-23 09:17:02 -07:00
Bartłomiej Kocot	c6431f6c07	Optimize grouped conv bwd weight for small M and N (#1303 ) * Optimize grouped conv bwd weight for small M and N * Fixes [ROCm/composable_kernel commit: `fd72380aeb`]	2024-05-22 21:01:01 +02:00
Illia Silin	6cf9f7f72c	Select appropriate GPU targets for instances, tests, and examples. (#1304 ) * set individual gpu targets for instances, examples, tests * fix path to hip compiler * fix path to hip compiler once more * aggregate device macros in ck_tile config header * fix the cmake logic for instances * fix clang format * add gfx900 and gfx906 to default set of targets [ROCm/composable_kernel commit: `7b027d5643`]	2024-05-22 11:45:27 -07:00
Bartłomiej Kocot	b7ee312021	Change output gemm type to AccDataType in two stage conv bwd wei (#1283 ) [ROCm/composable_kernel commit: `8346af9c68`]	2024-05-10 10:57:42 +02:00
Bartłomiej Kocot	68b2757f11	Add two stage grouped conv bwd weight kernel (#1280 ) [ROCm/composable_kernel commit: `0b6b5d1785`]	2024-05-08 09:53:24 +02:00

353 Commits