Adam Osewski
87b69930e7
Change block gemm pipeline local prefill loop order. ( #1692 )
...
* Fix loop order.
* Fix loop order in pipeline v4
[ROCm/composable_kernel commit: bfe983a151 ]
2024-11-26 17:36:53 +01:00
jakpiase
50ee0ac283
Add check for bf16 splitk support for grouped gemm splitk ( #1673 )
...
* add check for bf16 splitk support for grouped gemm splitk
* Update if condition
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: b70f367f80 ]
2024-11-26 13:56:32 +01:00
Harisankar Sadasivan
0d34db594d
universal streamk fp8 changes ( #1665 )
...
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: d6d4c2788b ]
2024-11-21 08:21:37 -08:00
Illia Silin
dc227604bc
Add bf16 and int8 wmma gemms for Navi3x and Navi4x. ( #1671 )
...
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean-up commented-out instances
[ROCm/composable_kernel commit: 8aba2724cc ]
2024-11-18 14:07:04 -08:00
Bartłomiej Kocot
b89a44ea33
Batched GEMM Multiple D based on Universal GEMM ( #1655 )
...
* Batched GEMM Multiple D based on Universal GEMM
Co-authored-by: Jing Zhang <jizhan@fb.com >
* CI fixes
Co-authored-by: Jing Zhang <jizhan@fb.com >
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
[ROCm/composable_kernel commit: 754adc70e3 ]
2024-11-18 14:03:45 +01:00
Illia Silin
d80f50d5e9
fix clang format ( #1662 )
...
[ROCm/composable_kernel commit: efd9261545 ]
2024-11-13 09:20:18 -08:00
Taylor Ding
7c9257128d
Move checks for compatibility from Argument() to IsSupportedArgument() ( #1653 )
...
[ROCm/composable_kernel commit: 73f02a1083 ]
2024-11-13 11:20:38 -05:00
Illia Silin
ddfcce82ab
enable compilation for generic navi targets ( #1645 )
...
[ROCm/composable_kernel commit: 75c5bfa364 ]
2024-11-07 14:14:42 -08:00
darren-amd
bee5289f56
Statically Cast Pointer Offset ( #1631 )
...
* explicit cast ptr offset
* formatting change
[ROCm/composable_kernel commit: d0e3a70a2e ]
2024-11-05 09:59:08 -08:00
Bartłomiej Kocot
724312aea3
Remove virtual destructors from unary ops ( #1610 )
...
* Remove virtual destructors from unary ops
* Fixes
* Fixes
* clang format fixes
[ROCm/composable_kernel commit: 9a8a52130d ]
2024-10-30 17:42:50 +01:00
Illia Silin
75b1a7a6fe
fix compilation errors for gfx12 with clang20 ( #1606 )
...
[ROCm/composable_kernel commit: 922e42a039 ]
2024-10-28 19:02:48 -07:00
Bartłomiej Kocot
930195c384
Add dynamic elementwise op ( #1426 )
...
* Add dynamic elementwise op
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com >
* CI issues fix
* Custom parameter value for dynamic functions - Comments addressed
---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com >
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com >
[ROCm/composable_kernel commit: 31bf253aeb ]
2024-10-26 15:22:37 +02:00
valarLip
85cf31cf40
add int8 gemm multiply multiply a8w8 ( #1591 )
...
* add int8 gemm multiply multiply a8w8
* uncomment
* clang-format-12
* Add example_gemm_multiply_multiply_xdl_int8
* Remove shell scripts
* update preprocess number for mi308; bring back printout in ckprofiler
* format
---------
Co-authored-by: chenjun <junchen2@amd.com >
Co-authored-by: Haocong WANG <haocwang@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: 37f7afed1e ]
2024-10-26 16:39:34 +08:00
aledudek
2c3399d29b
Generic threshold calculation ( #1546 )
...
* Calculate generic relative threshold pool3dfwd
* Calculate absolute error threshold pool3d fwd
* Generic threshold calculation take max input for relative error pool3dfwd
* Remove max possible value for error calculation at runtime
* Remove debug print in pool3dfwd
* Pool3d fwd adjusted types in generic threshold calculation
* Generic threshold calculation take into account number of accumulations and accdatatype
* Generic threshold fix final error formula
* Generic threshold calculation - num of accs fix
* Generic threshold calculation - adjust absolute error
* Generic threshold calculation - OutDataType in absolute error
[ROCm/composable_kernel commit: 9385caa306 ]
2024-10-25 12:46:24 +02:00
Jatin Chaudhary
2a074cd391
Explicit cast values to half ( #1593 )
...
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 4d5248e2d1 ]
2024-10-22 11:17:32 -07:00
Rostyslav Geyyer
63c478e14f
Add custom type vector support ( #1333 )
...
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT so far
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 4cf70b36c1 ]
2024-10-14 11:56:45 -05:00
Bartłomiej Kocot
f51ed2ad28
Add transpose scale amax example ( #1547 )
...
* Add transpose scale amax example
* fixes
* Tune reduce instance
[ROCm/composable_kernel commit: f21cda2536 ]
2024-10-14 17:39:38 +02:00
Adam Osewski
ec25e56259
Implement GetWorkSpaceSize from BaseOperator. ( #1564 )
...
[ROCm/composable_kernel commit: 29d384d0b2 ]
2024-10-12 14:05:11 +08:00
Christopher Millette
f76d476983
Fixes small memory leak from missing hipEventDestroy ( #1554 )
...
[ROCm/composable_kernel commit: ceaed8e097 ]
2024-10-09 09:41:35 +02:00
Illia Silin
ee93500dad
Fix build logic using GPU_ARCHS. ( #1536 )
...
* update build logic with GPU_ARCHS
* fix the GPU_ARCHS build for codegen
* unset GPU_TARGETS when GPU_ARCHS are set
[ROCm/composable_kernel commit: 7d8ea5f08b ]
2024-10-07 08:18:23 -07:00
Bartłomiej Kocot
58d4e01645
Fix grouped gemm check to avoid overflow ( #1545 )
...
[ROCm/composable_kernel commit: 6b54d2faf8 ]
2024-10-04 17:32:43 +02:00
macurtis-amd
72f57d6c42
Fix compilation errors generated by forthcoming Clang changes ( #1544 )
...
Without this change, the following diagnostic is generated:
a template argument list is expected after a name prefixed by the template
keyword [-Wmissing-template-arg-list-after-template-kw]
See C++17 spec [temp.names] p5.
[ROCm/composable_kernel commit: aeb7c91f48 ]
2024-10-02 13:56:22 -07:00
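The diagnostic above can be reproduced with a minimal snippet (hypothetical code, not taken from the CK sources): under the rule the commit cites, a name prefixed by the `template` keyword must be followed by a template argument list.

```cpp
// Hypothetical snippet (not from the CK sources) reproducing the diagnostic:
// newer Clang rejects a `template`-prefixed name with no argument list.
template <typename T>
struct Holder
{
    template <typename U>
    static U cast(T t) { return static_cast<U>(t); }
};

template <typename T>
int use(T t)
{
    // Ill-formed once Clang enforces [temp.names] p5:
    //   return Holder<T>::template cast(t); // -Wmissing-template-arg-list-after-template-kw
    return Holder<T>::template cast<int>(t); // OK: argument list follows `template`
}
```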
Illia Silin
fb225a93d0
Fix compilation errors with Clang20.0. ( #1533 )
...
* fix clang20 compilation errors for gfx90a
* fix clang20 compilation errors for gfx11 targets
[ROCm/composable_kernel commit: 42e6dceacc ]
2024-09-25 13:45:38 -07:00
Bartłomiej Kocot
9d56280a0a
Add support for NGCHW in grouped conv fwd ( #1499 )
...
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
[ROCm/composable_kernel commit: 4ba52b35dc ]
2024-09-20 10:45:46 +02:00
Adam Osewski
bb7364d3c9
Remove unsupported (fp8) type from Add memory operation. ( #1521 )
...
The dynamic buffer doesn't support fp8 in the `Update` operation, so fp8 cannot use `InMemoryDataOperation::Add`.
[ROCm/composable_kernel commit: 0c39954da9 ]
2024-09-20 09:40:45 +02:00
Jun Liu
3739cf9f74
Customize filesystem in CK for legacy systems ( #1509 )
...
* Legacy support: customized filesystem
* Update cmakefile for python alternative path
* fix build issues
* CK has no boost dependency
* More fixes to issues found on legacy systems
* fix clang format issue
* Check if blob is correctly generated in cmake
* fix the python issues
* add a compiler flag for codegen when using alternative python
* use target_link_options instead of target_compile_options
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 81bc1496b2 ]
2024-09-13 07:51:07 -07:00
Mateusz Ozga
92d1b386b2
Pool2d max/avg kernel in the BWD version ( #1494 )
...
* Add pool2d instance BWD AVG
* Add pool2d instance BWD MAX
* Fix: avg review
* Fix review: part2
* Fix - enable test when type is compiled
* Fix review part3
[ROCm/composable_kernel commit: 448c0f56d8 ]
2024-09-12 11:47:52 +02:00
jakpiase
cb4975cf70
Rewrite pool2d fwd ( #1462 )
...
* added pool2d fwd
* add tests
* add reviewers changes
* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
This reverts commit 6b2ba7ff89 , reversing
changes made to 22c82bea0c .
* Revert "add reviewers changes"
This reverts commit 22c82bea0c .
* added reviewers comments
* revert some old files
* add reviewers requests
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: e8d2887cb2 ]
2024-09-11 15:21:00 +02:00
jakpiase
bf3518b45a
Added structural sparsity blockwise gemm ( #1435 )
...
* Implemented smfmac xdlops
* Added smfmac blockwise xdlops
* fixes
* add reviewers suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: 2a261afcdf ]
2024-09-11 15:19:42 +02:00
M.Emin Ozturk
6c1bd4d47c
Modification to fix the "threadwise_tensor_slice_transfer_v5r1" issue #1279 ( #1492 )
...
* issue fix, one line changed for tmp
* clang
---------
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu >
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com >
[ROCm/composable_kernel commit: 8378855361 ]
2024-09-04 21:52:55 -07:00
Haocong WANG
4e4514caa8
Add gemm universal bf16 instances ( #1484 )
...
* revert ckprofiler change
* temp save
* Add test and test pass
* test pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 5b10dae6a4 ]
2024-09-04 20:58:54 -07:00
Bartłomiej Kocot
950165c6fb
Add support for NGCHW in grouped conv bwd wei ( #1491 )
...
* Add support for NGCHW in grouped conv bwd wei
* Comments fixes
* navi fixes
* Update function names
[ROCm/composable_kernel commit: 73b67f290f ]
2024-09-03 10:52:03 +02:00
Bartłomiej Kocot
9974926658
Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )" ( #1455 )" ( #1490 )
...
This reverts commit 725dd433cdc6435d481e806b5442a07b0097c94a.
[ROCm/composable_kernel commit: a9b170b541 ]
2024-09-02 10:39:49 +02:00
Andriy Roshchenko
f6c6819b47
Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. ( #1473 )
...
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLu activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Add CMakePresets.json
* Guard off FP8 instances when the data type is not available.
* Add example for Scaled FP8 Convolution with AMAX reduction.
* Refactor CombConvScaleRelu instances.
* Add CombConvScale instances.
* Add client example for Scaled FP8 Convolution with AMAX reduction.
* Cleanup.
[ROCm/composable_kernel commit: c3515f277c ]
2024-08-21 15:22:41 -07:00
Rostyslav Geyyer
0ab95a332e
Set RNE fp8 conversion as a default ( #1458 )
...
* Set RNE fp8 conversion as a default
* Update f8 tests
* Disable failing test on gfx11
* Update bf8 tests
* Add a flag
* Fix the flag
* Raise flag for gfx10 as well
* Temp commit for tolerance testing
* Update tolerances
[ROCm/composable_kernel commit: e20f20efbf ]
2024-08-21 09:09:48 -07:00
Haocong WANG
65d6442b4c
[GEMM] gemm_universal related optimization ( #1453 )
...
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
Co-authored-by: Jing Zhang <jizhan@meta.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 3049b5467c ]
2024-08-14 10:42:30 +08:00
Mateusz Ozga
7a4690b077
Support large 12D tensor size for reduction kernel ( #1465 )
...
[ROCm/composable_kernel commit: 0606e5498e ]
2024-08-13 16:15:47 +02:00
Bartłomiej Kocot
15ab8b0d5c
Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle ( #1457 )
...
* Fix typo in TransformConvFwdToGemm
* Fix bug in n offset calculation
[ROCm/composable_kernel commit: 4a870942e6 ]
2024-08-10 13:12:05 +02:00
Jun Liu
254a7dadb6
Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )" ( #1455 )
...
This reverts commit 0c367d5912486f4fcbae1dbb861a1fb8176ca308.
[ROCm/composable_kernel commit: 5ff8eeebf9 ]
2024-08-08 19:09:33 -07:00
Juan Manuel Martinez Caamaño
61ecdbc128
Remove reinterpret_cast uses that result in undefined behaviour. ( #1445 )
...
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.
See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility
Closes #1439
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 901e5f1540 ]
2024-08-07 11:49:02 -07:00
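The undefined-behaviour pattern being removed can be illustrated with a small sketch (assumed example, not the actual CK code): type-punning through `reinterpret_cast` violates the type-accessibility rules linked above, while a `memcpy`-based bitcast is well defined.

```cpp
#include <cstdint>
#include <cstring>

// UB: reads a float object through an unrelated uint32_t lvalue:
//   uint32_t bits = *reinterpret_cast<const uint32_t*>(&f);

// Well defined: copy the object representation instead.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u)); // compilers lower this to a plain register move
    return u;
}
```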
Juan Manuel Martinez Caamaño
e539c37e7d
Add missing constexpr to if conditions ( #1444 )
...
[ROCm/composable_kernel commit: fd9ef4e678 ]
2024-08-06 11:40:34 -07:00
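A minimal sketch of what such a fix matters for (hypothetical example, not the CK diff itself): with a plain `if`, both branches must compile for every instantiation; `if constexpr` discards the untaken branch at compile time.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

template <typename T>
std::size_t payload_size(const T& v)
{
    if constexpr (std::is_arithmetic_v<T>)
        return sizeof(v);
    else
        return v.size(); // with a plain `if`, this line would fail to compile for arithmetic T
}
```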
jakpiase
e8ee8856fa
Fix for beta!=0 in reduce ( #1440 )
...
* fix for beta!=0 in reduce
* add reviewers suggestions
[ROCm/composable_kernel commit: b74d4d4d54 ]
2024-08-06 09:10:39 -07:00
Bartłomiej Kocot
69a6b563f9
Add Grouped Conv Fwd Large Tensor kernel ( #1432 )
...
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove not needed instances
* examples fixes
* Remove not need ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
[ROCm/composable_kernel commit: 4ec5c52a0c ]
2024-08-06 10:06:10 +02:00
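The 2GB concern above can be sketched as follows (hypothetical helper, not the CK implementation): a tensor's flat byte size easily exceeds what 32-bit offsets can address, which is when a large-tensor kernel with 64-bit indexing is needed.

```cpp
#include <cstdint>
#include <limits>

// Returns true when byte offsets into the tensor no longer fit in int32.
inline bool needs_64bit_indexing(std::int64_t num_elements, std::int64_t bytes_per_element)
{
    return num_elements * bytes_per_element >
           static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max());
}
```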
arai713
735984bb5a
Codegen: isSupportedArgument check ( #1417 )
...
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
[ROCm/composable_kernel commit: d32997a792 ]
2024-07-31 07:12:15 -07:00
Bartłomiej Kocot
1567614d80
Revert Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )
...
[ROCm/composable_kernel commit: 33b399cc15 ]
2024-07-30 18:36:04 +02:00
zjing14
a94e87d868
Add rotating buff for gemm_multi_d ( #1411 )
...
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
Co-authored-by: Haocong WANG <haocwang@amd.com >
[ROCm/composable_kernel commit: 105bd708c7 ]
2024-07-25 23:21:21 +08:00
Andriy Roshchenko
e3b469a493
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. ( #1412 )
...
* Add CMakePresets configurations.
* Add binary elementwise ConvScaleAdd and an example.
* Numerical verification of results.
Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000
ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000
ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000
```
* Implemented ConvScaleAdd + Example.
* Add ConvScale+Bias Instances
* Add Client Example for ConvScale+Bias
* Fix number of bytes in an example.
* Cleanup.
[ROCm/composable_kernel commit: 4a8a1befd5 ]
2024-07-24 15:49:55 -05:00
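The "irregularities" in the log above are consistent with ordinary fp8 quantization rather than a conversion bug, assuming an e4m3-style format (3 stored mantissa bits): near 145 the representable values are 16 apart, so 145 can only round to 144 or 160, and near 97 they are 8 apart. A sketch of the spacing computation:

```cpp
#include <cmath>

// Spacing (ulp) between representable values of a float format with
// `mantissa_bits` stored fraction bits, at magnitude x (normal range only).
inline float ulp_at(float x, int mantissa_bits)
{
    int exp;
    std::frexp(std::fabs(x), &exp);                  // |x| = m * 2^exp, m in [0.5, 1)
    return std::ldexp(1.0f, exp - 1 - mantissa_bits); // ulp = 2^(e - mantissa_bits)
}
```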
Bartłomiej Kocot
1f93d3f961
Add support for half_t and bfloat to reduction operations ( #1395 )
...
* Add support for half_t and bfloat to reduction operations
* Fix bhalf convert
* Next fix bf16
[ROCm/composable_kernel commit: ffabd70a15 ]
2024-07-24 12:12:37 -05:00
Bartłomiej Kocot
b23a3fcf77
Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 )
...
[ROCm/composable_kernel commit: 5d8c3d8190 ]
2024-07-22 14:21:24 +02:00
Haocong WANG
a0e0f3cdcc
[GEMM] F8 GEMM, performance optimized. ( #1384 )
...
* add ab_scale init support
* enabled interwave
* add scale type; update isSupport
* adjust example
* clean
* enable f8 pure gemm rcr ckprofiler
* Add gemm_multiply_multiply instances
* clang format
* Optimize for ScaleBlockMNK=128
* enable abscale f8 gemm ck profiler
* Add pure f8 gemm test suite
* Reverting to the state of project at f60fd77
* update copyright
* clang format
* update copyright
---------
Co-authored-by: root <jizhan@amd.com >
[ROCm/composable_kernel commit: 8c90f25be3 ]
2024-07-19 22:06:52 +08:00