Commit Graph

1379 Commits

Author SHA1 Message Date
dependabot[bot]
f86faf86f0 Bump rocm-docs-core from 1.7.1 to 1.7.2 in /docs/sphinx (#1479)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.1...v1.7.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 0d9bf9f154]
2024-08-21 22:40:49 -07:00
Illia Silin
caf5f89ae3 fix the build errors with clang20 (#1478)
[ROCm/composable_kernel commit: 1925b322eb]
2024-08-21 21:29:48 -07:00
Andriy Roshchenko
f6c6819b47 Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)
* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.

[ROCm/composable_kernel commit: c3515f277c]
2024-08-21 15:22:41 -07:00
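The AMAX reduction referenced throughout the entry above is an absolute-maximum reduction, commonly used to derive a per-tensor scaling factor for FP8. A minimal language-agnostic sketch of the concept (not the CK kernel itself; the `fp8_scale` helper and the e4m3-style maximum of 448 are illustrative assumptions):

```python
def amax(values):
    """Absolute-maximum (AMAX) reduction: max(|x|) over all elements."""
    return max(abs(v) for v in values)


def fp8_scale(values, fp8_max=448.0):
    # Hypothetical helper: derive a scale factor from AMAX so the data
    # range maps into FP8's dynamic range (assumed e4m3 max of 448).
    a = amax(values)
    return fp8_max / a if a != 0.0 else 1.0
```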
Rostyslav Geyyer
0ab95a332e Set RNE fp8 conversion as a default (#1458)
* Set RNE fp8 conversion as a default

* Update f8 tests

* Disable failing test on gfx11

* Update bf8 tests

* Add a flag

* Fix the flag

* Raise flag for gfx10 as well

* Temp commit for tolerance testing

* Update tolerances

[ROCm/composable_kernel commit: e20f20efbf]
2024-08-21 09:09:48 -07:00
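RNE ("round to nearest, ties to even") is the rounding mode made the default for fp8 conversion above. Its tie-breaking behaviour can be sketched at a coarser granularity (a conceptual illustration of the rounding rule, not CK's fp8 bit-level converter; Python's `round()` already implements ties-to-even):

```python
def quantize_rne(x, step=0.25):
    """Round x to the nearest multiple of `step`, ties to even (RNE)."""
    return round(x / step) * step
```

On a tie such as 0.125 (exactly halfway between 0.0 and 0.25), RNE picks the even multiple 0.0, avoiding the systematic upward bias of round-half-up.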
Bartłomiej Kocot
c7b7771a91 Convert MIOpen driver to ckProfiler script typos fix (#1476)
[ROCm/composable_kernel commit: dc82daa86e]
2024-08-20 19:04:14 +02:00
Andriy Roshchenko
10edb0c70e Adding Instances and Examples for FP8-based Scaled Convolution with ReLU Activation and AMAX Reduction. (#1469)
* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Guard off FP8 instances when the data type is not available.

* Improve output readability.

* Addressing reviewer's comments.

[ROCm/composable_kernel commit: a94113a941]
2024-08-20 10:30:56 -05:00
dependabot[bot]
2c7d3a1c22 Bump rocm-docs-core from 1.7.0 to 1.7.1 in /docs/sphinx (#1475)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.0...v1.7.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: f48529b511]
2024-08-19 23:02:07 -07:00
Bartłomiej Kocot
6e6277ca02 Add script to convert MIOpen driver to ckProfiler (#1472)
* Add script to convert MIOpen driver to ckProfiler

* Fix

[ROCm/composable_kernel commit: a6a7966505]
2024-08-19 08:24:56 -07:00
Illia Silin
ad65d8d5b0 Re-enable fp8 types for all architectures. (#1470)
* re-enable fp8 and bf8 for all targets

* restore the fp8 gemm instances

* re-enable conv_3d fp8 on all architectures

* disable several fp8 gemm instances on all architectures except gfx94

* clang format fix

[ROCm/composable_kernel commit: c8b6b64240]
2024-08-16 16:07:52 -06:00
Dan Yao
14402bb211 [CK_TILE] FA bwd kernels optimization (#1397)
* tmp save

* fix batch deterministic bugs

* fix group deterministic bugs

* codegen update

* reorder files

* bias support

* hd256 bias support

* bwd smoke test update

* simplify convert dq

* fix hd256 dropout scratch

* do{}while() -> while(){}

* comments

* remove FmhaBwdTilePartitioner

* save clear_tile

* refactor dropout

* code cleanup

* code cleanup

* comments

* fix epilogue problem

* fix fwd dropout

* group convert_dq opt

* fix dq alignment

* Do not store storerandval in bwd for flash attention integration

* fix hd32 error and boost performance

* revert

* Remove duplicated WarpGemm definitions in the policy file

* dropout patch for mrepeat 16*16

* code sync up

* dq_acc stride

* dq_acc stride stuff

* codegen update

* fwd dropout revert

* fix hd128 scratches and boost performance

* receipt 3 for simplified smoke test

* more strides for fa integration

* fix hd64 scratches and boost performance

* non-iglp pipeline for headdim padding cases

* dpad same as dvpad for flash attention integration

* unpadded lse&d for group mode

* Support unpad layout for group lse

* Support unpad lse layout for splitkv

* Fix stride for splitkv kernel

* fix unpadded lse issue in fwd splitkv

* comment

* solve lds read&write conflicts

* rename

* bias rename

* tile index revert

---------

Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: 79a5d9c10c]
2024-08-16 13:40:10 -07:00
Bartłomiej Kocot
dffd5eacc0 Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv

* Resize tests

* Resize tests

* update the python script to parse the grouped_conv results

* Remove int8 tests

* change bwd wei layout

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 2581727d2a]
2024-08-16 07:48:30 -07:00
dependabot[bot]
e77e18da19 Bump rocm-docs-core from 1.6.2 to 1.7.0 in /docs/sphinx (#1467)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 76bd0af6af]
2024-08-15 13:59:40 -07:00
trixirt
1eeb32a64d Check compiler flags before using (#1403)
* Check compiler flags before using

The user's compiler may not support these flags, so check.
Resolves failures on Fedora.

Signed-off-by: Tom Rix <trix@redhat.com>

* fix syntax CMakeLists.txt

Fix syntax in the check_cxx_compiler_flag.

---------

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 49769ec889]
2024-08-14 20:43:10 -07:00
Haocong WANG
65d6442b4c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
AngryLoki
6a4b36d948 Fix compilation errors with libc++ (#1461)
This fixes two issues when compiling with libc++.

The first issue is an attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
_Float16 is an extension of libstdc++; it does not exist in the C++ standard.
Luckily, composable_kernel provides a NumericLimits class that does everything needed.

The second issue is that a call to 'check_err' is ambiguous: there are two candidates.
This happens because composable_kernel relies on f8_t (defined as _BitInt(8)) not passing the is_integral trait.
However, libc++ treats _BitInt(N) as integral (the standard allows "any implementation-defined extended integer types" to be integral).

Closes: #1460

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>

[ROCm/composable_kernel commit: 50c423481b]
2024-08-13 14:31:15 -05:00
Mateusz Ozga
7a4690b077 Support large 12d tensor size for reduction kernel (#1465)
[ROCm/composable_kernel commit: 0606e5498e]
2024-08-13 16:15:47 +02:00
Illia Silin
92df7893df Disable inapplicable xdl and mha instances for gfx12 (#1464)
[ROCm/composable_kernel commit: cbb6f2ab8c]
2024-08-12 15:11:58 -07:00
Mateusz Ozga
b7b9eb73c7 Rewrite *sh reduce unit tests to gtest: part 1 (#1407)
* Rewrite .sh test to Gtest

* review changes

* Remove unused comments

* Review v2

* Typo

* Separate UT: AMAX, MAX, MIN; added template params to trigger them

* Update test/reduce/reduce_no_index.cpp

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: ab60b390f8]
2024-08-12 16:28:10 +02:00
Bartłomiej Kocot
15ab8b0d5c Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
* Fix typo in TransformConvFwdToGemm

* Fix bug in n offset calculation

[ROCm/composable_kernel commit: 4a870942e6]
2024-08-10 13:12:05 +02:00
arai713
ab0829d8bd Codegen build w/CK (#1428)
* initial push

* cleaned up compiler errors

* removed commented code

* build codegen folder only for gfx9 targets

* remove separate stage for codegen tests from CI

* removed commented code from CMake

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: da214a5a58]
2024-08-09 08:15:06 -07:00
Jun Liu
254a7dadb6 Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 0c367d5912486f4fcbae1dbb861a1fb8176ca308.

[ROCm/composable_kernel commit: 5ff8eeebf9]
2024-08-08 19:09:33 -07:00
Illia Silin
bfb128a8cf Enable CI on gfx12. (#1454)
* enable CI build and test on gfx1201

* skip DL kernels in CI for gfx12

* only run CI on gfx12 if rocm version >= 6.2

* remove the rocm version check for CI on gfx12

* add a switch for CI builds on gfx12

[ROCm/composable_kernel commit: 4a5ab67871]
2024-08-08 16:29:15 -07:00
Illia Silin
9e9b3d563b check if the coerce-illegal-types flag is supported (#1451)
[ROCm/composable_kernel commit: ae3b8ff86c]
2024-08-08 07:29:29 -07:00
Illia Silin
38fe3e7936 add rocm-llvm-dev package to docker image (#1452)
[ROCm/composable_kernel commit: 8a75728406]
2024-08-08 07:29:13 -07:00
Juan Manuel Martinez Caamaño
61ecdbc128 Remove reinterpret_cast uses that result in undefined behaviour. (#1445)
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.

See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility

Closes #1439

* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 901e5f1540]
2024-08-07 11:49:02 -07:00
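The defined alternative that the commit above adopts is a value-level bit copy (std::bit_cast, or memcpy before C++20) instead of reading one type through a pointer to another. The same idea in Python, using `struct` to reinterpret a float's 32-bit value representation (an illustrative sketch of the principle, not the CK change itself):

```python
import struct

def float_to_bits(x):
    """Reinterpret a float's IEEE-754 binary32 representation as an int.

    Copies the value representation -- the well-defined move that
    std::bit_cast / memcpy make in C++ -- rather than type-punning
    through pointers, which is undefined behaviour.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits

def bits_to_float(bits):
    # Inverse direction: rebuild the float from its bit pattern.
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x
```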
Illia Silin
760fcb96f3 upgrade to rocm6.2 as new default compiler (#1448)
[ROCm/composable_kernel commit: 5df10432d8]
2024-08-07 09:38:43 -07:00
dependabot[bot]
5d69137f37 Bump rocm-docs-core from 1.6.1 to 1.6.2 in /docs/sphinx (#1449)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: a71d407e35]
2024-08-07 08:22:38 -07:00
Illia Silin
e377ca404b Run CK_TILE FMHA benchmarks and collect the performance data. (#1447)
* run ck_tile benchmarks after the smoke tests and store logs

* change the path of fmha benchmark logs

* change the way of stashing ck_tile fmha logs

* prevent the errors in stages where no logs are generated

* fix the ck_tile fmha log names and headers

* generate the fmha performance logs in the root folder

* change jenkins script arguments format

* use exact file names for stashing

* modify scripts to process FMHA performance results

* unstash FMHA logs before parsing them

[ROCm/composable_kernel commit: 12c1f68dd9]
2024-08-07 08:18:26 -07:00
Max Podkorytov
f4b5582b2a modify python wrapper for addmm (#1441)
[ROCm/composable_kernel commit: 886d14ccb2]
2024-08-06 15:09:27 -07:00
Haocong WANG
78bf13c11a Limit fp8only operator build arch in ckProfiler (#1443)
[ROCm/composable_kernel commit: 6fc7bff58f]
2024-08-06 14:29:14 -07:00
Jun Liu
37921efb24 Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446)
[ROCm/composable_kernel commit: afbf6350f3]
2024-08-06 13:06:53 -07:00
Juan Manuel Martinez Caamaño
e539c37e7d Add missing constexpr to if conditions (#1444)
[ROCm/composable_kernel commit: fd9ef4e678]
2024-08-06 11:40:34 -07:00
bibek
c8c3293b0b adding mha as static lib (#1366)
* adding mha as static lib

* add fmha fwd compile options

* typo

* fix python version

* python version to 3

* increase path length

* add max path flag in mha cmake

* fix long path issue

* mha currently only runs in gfx94x

* only build mha in mi300

* populate gpu_list

* add mha compile flags

* avoid building mha on GPUs other than gfx94x

* some comments and include ck_tile in rocm

* use rocm_install

* place ck_tile in include

* correct ck_tile path

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 840c5397bb]
2024-08-06 11:17:10 -05:00
jakpiase
e8ee8856fa Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce

* add reviewers suggestions

[ROCm/composable_kernel commit: b74d4d4d54]
2024-08-06 09:10:39 -07:00
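The beta != 0 case fixed above is the standard BLAS-style output blend: when beta is nonzero, the reduction result must be accumulated into a scaled copy of the existing output rather than overwriting it. A minimal sketch of the semantics (hypothetical names, not CK's API):

```python
def scaled_reduce_sum(xs, out_prev=0.0, alpha=1.0, beta=0.0):
    """out = alpha * sum(xs) + beta * out_prev.

    With beta == 0 the prior output is ignored; with beta != 0 it must
    be read, scaled, and added -- the path the fix above repairs.
    """
    return alpha * sum(xs) + beta * out_prev
```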
Bartłomiej Kocot
69a6b563f9 Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not need ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes

[ROCm/composable_kernel commit: 4ec5c52a0c]
2024-08-06 10:06:10 +02:00
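The 2GB check mentioned above guards against 32-bit index overflow: once a tensor's byte size reaches 2^31, offsets no longer fit in a signed 32-bit integer and the large-tensor kernel with 64-bit indexing is required. A back-of-the-envelope sketch of that threshold (hypothetical helper, not CK's dispatch logic):

```python
from math import prod

INT32_LIMIT = 2**31  # 2 GiB: largest offset range of a signed 32-bit index

def needs_64bit_indexing(dims, bytes_per_element):
    """Return True when a tensor's byte size exceeds the 32-bit index range."""
    return prod(dims) * bytes_per_element >= INT32_LIMIT
```

For example, an NCHW fp16 tensor of shape (64, 256, 1024, 1024) occupies 32 GiB and clearly needs the 64-bit path.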
Illia Silin
8f71de4707 add --offload-compress compiler flag (#1433)
* add --offload-compress compiler flag

* only apply the --offload-compress flag to the ckProfiler

* move the --offload-compress flag back to main cmake file

* add offload-compress to target compile option of ckProfiler

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: 7f57b2e02c]
2024-08-05 23:26:01 +08:00
Illia Silin
1a8f8fce5b [CI][Jenkins] delete CI docker container upon exit (#1437)
[ROCm/composable_kernel commit: f31ba04afc]
2024-08-05 08:13:56 -07:00
Illia Silin
3d3819e0b3 Add compiler flags for ROCm versions 6.2+ (#1429)
* add compiler flags to fix compiler issues

* fix typo.

* disable test_smfmac_op on all devices except gfx942

* specify full path to compiler in CI

[ROCm/composable_kernel commit: d311c95396]
2024-08-01 08:27:52 -07:00
Sam Wu
604152a68b Update doc requirements (#1423)
[ROCm/composable_kernel commit: 6648fd3b04]
2024-07-31 07:42:42 -07:00
zjing14
807edd542a [HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425)
* fixed a typo

* clean

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>

[ROCm/composable_kernel commit: f31e8dfa80]
2024-07-31 07:19:17 -07:00
arai713
735984bb5a Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op

* adding function call

* remove commented code

[ROCm/composable_kernel commit: d32997a792]
2024-07-31 07:12:15 -07:00
carlushuang
cecee51c65 workaround rocm-6.2 compiler issue (#1421)
[ROCm/composable_kernel commit: b3f86e79dd]
2024-07-31 16:03:59 +08:00
Illia Silin
4e86ab9f21 add docker for rocm6.2_rc4 compiler (#1424)
[ROCm/composable_kernel commit: b527cad4a5]
2024-07-30 11:55:33 -07:00
Bartłomiej Kocot
1567614d80 Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
[ROCm/composable_kernel commit: 33b399cc15]
2024-07-30 18:36:04 +02:00
dependabot[bot]
9ab0227208 Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx (#1420)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: b9ba5b2676]
2024-07-26 14:47:19 -07:00
trixirt
9348857732 Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404)
A standard option in Fedora packaging, used to check the
correctness of C++ usage of the standard C++ library.

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 733f33af78]
2024-07-25 19:28:17 -07:00
zjing14
a94e87d868 Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d

* format

* Update flush_cache.hpp

* Update gtest.cmake

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>

[ROCm/composable_kernel commit: 105bd708c7]
2024-07-25 23:21:21 +08:00
dependabot[bot]
0686a1b400 Bump rocm-docs-core from 1.5.1 to 1.6.0 in /docs/sphinx (#1416)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.1 to 1.6.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.1...v1.6.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 1208082e53]
2024-07-24 22:56:29 -07:00
Andriy Roshchenko
e3b469a493 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.

* Add binary elementwise ConvScaleAdd and an example.

* Numerical verification of results.

Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
```

* Implemented ConvScaleAdd + Example.

* Add ConvScale+Bias Instances

* Add Client Example for ConvScale+Bias

* Fix number of bytes in an example.

* Cleanup.

[ROCm/composable_kernel commit: 4a8a1befd5]
2024-07-24 15:49:55 -05:00
Bartłomiej Kocot
1f93d3f961 Add support for half_t and bfloat to reduction operations (#1395)
* Add support for half_t and bfloat to reduction operations

* Fix bhalf convert

* Next fix bf16

[ROCm/composable_kernel commit: ffabd70a15]
2024-07-24 12:12:37 -05:00