composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 17:00:18 +00:00

Author	SHA1	Message	Date
Illia Silin	0056e0bf4b	disable bad fp8 test on gfx12 (#1481 )	2024-08-22 15:05:20 -07:00
Illia Silin	d3fa00f14c	disable codegen tests when INSTANCES_ONLY is set (#1480 )	2024-08-22 09:50:17 -07:00
arai713	967b1f0fda	Codegen INSTANCES_ONLY build (#1468 ) * initial push - altering codegen build * fix the codegen cmake * enable codegen build for gfx908 and gfx90a * enable building codegen with INSTANCES_ONLY=ON * updating ck_rtc * remove gpu targets for codegen and rename tests * make codegen tests dependencies of tests and check targets --------- Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-08-22 07:24:55 -07:00
dependabot[bot]	0d9bf9f154	Bump rocm-docs-core from 1.7.1 to 1.7.2 in /docs/sphinx (#1479 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.1...v1.7.2) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-21 22:40:49 -07:00
Illia Silin	1925b322eb	fix the build errors with clang20 (#1478 )	2024-08-21 21:29:48 -07:00
Andriy Roshchenko	c3515f277c	Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473 ) * Enable CMakePresets build * Verify Convolution, Scaling and ReLU algorithms. * Add tensor element-wise scale and type cast operation. * Reduction implemented but does not work. * Exploration of Reduction functionality. * Completed example for Convolution scaled with ReLu activation and AMAX reduction. * WIP: Add required instances for convolution. * WIP: Create client example. Implement convolution stage. * Add elementwise instances. * Add elementwise scale + convert example. * Add reduction instances. * WIP: Client example for AMAX reduction. * WIP: Add instances for multistage reduction. * WIP: Implementation of multistage reduction. * Refactoring. * Clean up. * Add CMakePresets.json * Guard off FP8 instances when the data type is not available. * Add example for Scaled FP8 Convolution with AMAX reduction. * Refactor CombConvScaleRelu instances. * Add CombConvScale instances. * Add client example for Scaled FP8 Convolution with AMAX reduction. * Cleanup.	2024-08-21 15:22:41 -07:00
Rostyslav Geyyer	e20f20efbf	Set RNE fp8 conversion as a default (#1458 ) * Set RNE fp8 conversion as a default * Update f8 tests * Disable failing test on gfx11 * Update bf8 tests * Add a flag * Fix the flag * Raise flag for gfx10 as well * Temp commit for tolerance testing * Update tolerances	2024-08-21 09:09:48 -07:00
Bartłomiej Kocot	dc82daa86e	Convert MIOpen driver to ckProfiler script typos fix (#1476 )	2024-08-20 19:04:14 +02:00
Andriy Roshchenko	a94113a941	Adding Instances and Examples for FP8-based Scaled Convolution with ReLU Activation and AMAX Reduction. (#1469 ) * Enable CMakePresets build * Verify Convolution, Scaling and ReLU algorithms. * Add tensor element-wise scale and type cast operation. * Reduction implemented but does not work. * Exploration of Reduction functionality. * Completed example for Convolution scaled with ReLu activation and AMAX reduction. * WIP: Add required instances for convolution. * WIP: Create client example. Implement convolution stage. * Add elementwise instances. * Add elementwise scale + convert example. * Add reduction instances. * WIP: Client example for AMAX reduction. * WIP: Add instances for multistage reduction. * WIP: Implementation of multistage reduction. * Refactoring. * Clean up. * Guard off FP8 instances when the data type is not available. * Improve output readability. * Addressing reviewer's comments.	2024-08-20 10:30:56 -05:00
dependabot[bot]	f48529b511	Bump rocm-docs-core from 1.7.0 to 1.7.1 in /docs/sphinx (#1475 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.0 to 1.7.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.0...v1.7.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-19 23:02:07 -07:00
Bartłomiej Kocot	a6a7966505	Add script to convert MIOpen driver to ckProfiler (#1472 ) * Add script to convert MIOpen driver to ckProfiler * Fix	2024-08-19 08:24:56 -07:00
Illia Silin	c8b6b64240	Re-enable fp8 types for all architectures. (#1470 ) * re-enable fp8 and bf8 for all targets * restore the fp8 gemm instances * re-enable conv_3d fp8 on all architectures * disable several fp8 gemm instances on all architectures except gfx94 * clang format fix	2024-08-16 16:07:52 -06:00
Dan Yao	79a5d9c10c	[CK_TILE] FA bwd kernels optimization (#1397 ) * tmp save * fix batch deterministic bugs * fix group deterministic bugs * codegen update * reorder files * bias support * hd256 bias support * bwd smoke test update * simplify convert dq * fix hd256 dropout scratch * do{}while() -> while(){} * comments * remove FmhaBwdTilePartitioner * save clear_tile * refactor dropout * code cleanup * code cleanup * comments * fix epilogue problem * fix fwd dropout * group convert_dq opt * fix dq alignment * Do not store storerandval in bwd for flash attention integration * fix hd32 error and boost performance * revert * Remove duplicated WarpGemm definitions in the policy file * dropout patch for mrepeat 1616 code sync up * dq_acc stride * dq_acc stride stuff * codegen update * fwd dropout revert * fix hd128 scratches and boost performance * receipt 3 for simplified smoke test * more strides for fa integration * fix hd64 scratches and boost performance * non-iglp pipeline for headdim padding cases * dpad same as dvpad for flash attention integration * unpadded lse&d for group mode * Support unpad layout for group lse * Support unpad lse layout for splitkv * Fix stride for splitkv kernel * fix unpadded lse issue in fwd splitkv * comment * solve lds read&write conflicts * rename * bias rename * tile index revert --------- Co-authored-by: danyao12 <danyao12> Co-authored-by: rocking <ChunYu.Lai@amd.com> Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>	2024-08-16 13:40:10 -07:00
Bartłomiej Kocot	2581727d2a	Add performance and large tensor tests for grouped conv (#1456 ) * Add performance and large tensor tests for grouped conv * Resize tests * Resize tests * update the python script to parse the grouped_conv results * Remove int8 tests * change bwd wei layout --------- Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-16 07:48:30 -07:00
dependabot[bot]	76bd0af6af	Bump rocm-docs-core from 1.6.2 to 1.7.0 in /docs/sphinx (#1467 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-15 13:59:40 -07:00
trixirt	49769ec889	Check compiler flags before using (#1403 ) * Check compiler flags before using The user's compiler may not support these flags, so check. Resolves failures on Fedora. Signed-off-by: Tom Rix <trix@redhat.com> * fix syntax CMakeLists.txt Fix syntax in the check_cxx_compiler_flag. --------- Signed-off-by: Tom Rix <trix@redhat.com> Co-authored-by: Tom Rix <trix@redhat.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-08-14 20:43:10 -07:00
Haocong WANG	3049b5467c	[GEMM] gemm_universal related optimization (#1453 ) * replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by: Jing Zhang <jizhan@fb.com> Co-authored-by: Jing Zhang <jizhan@meta.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-14 10:42:30 +08:00
AngryLoki	50c423481b	Fix compilation errors with libc++ (#1461 ) This fixes 2 issues when compiled with libc++. First issue is attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min(). _Float16 is extension of libstdc++, it does not exist in C++ standard[2]. Luckily, there is NumericLimits class in composable_kernel, which does everything needed. Second issue with call to 'check_err' is ambiguous: there are 2 candidates. It happens because composable_kernel relies on idea that f8_t (defined as _BitInt(8)) does not pass is_integral trait. However, libc++ treats _BitInt(N) as integral (per standard "any implementation-defined extended integer types" can be integral). Closes: #1460 Signed-off-by: Sv. Lockal <lockalsash@gmail.com>	2024-08-13 14:31:15 -05:00
Mateusz Ozga	0606e5498e	Support large: 12d tensor size for reduction kernel (#1465 )	2024-08-13 16:15:47 +02:00
Illia Silin	cbb6f2ab8c	Disable inapplicable xdl and mha instances for gfx12 (#1464 )	2024-08-12 15:11:58 -07:00
Mateusz Ozga	ab60b390f8	Rewrite sh reduce unit tests to gtest: part 1 (#1407 ) Rewrite .sh test to Gtest * review changes * Remove unused comments * Review v2 * Typo * Separate UT: AMAX, MAX, MIN; added template params to trigger them * Update test/reduce/reduce_no_index.cpp --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2024-08-12 16:28:10 +02:00
Bartłomiej Kocot	4a870942e6	Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457 ) * Fix typo in TransformConvFwdToGemm * Fix bug in n offset calculation	2024-08-10 13:12:05 +02:00
arai713	da214a5a58	Codegen build w/CK (#1428 ) * initial push * cleaned up compiler errors * removed commented code * build codegen folder only for gfx9 targets * remove separate stage for codegen tests from CI * removed commented code from CMake --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-09 08:15:06 -07:00
Jun Liu	5ff8eeebf9	Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )" (#1455 ) This reverts commit `33b399cc15`.	2024-08-08 19:09:33 -07:00
Illia Silin	4a5ab67871	Enable CI on gfx12. (#1454 ) * enable CI build and test on gfx1201 * skip DL kernels in CI for gfx12 * only run CI on gfx12 if rocm version >= 6.2 * remove the rocm version check for CI on gfx12 * add a switch for CI builds on gfx12	2024-08-08 16:29:15 -07:00
Illia Silin	ae3b8ff86c	check if the coerce-illegal-types flag is supported (#1451 )	2024-08-08 07:29:29 -07:00
Illia Silin	8a75728406	add rocm-llvm-dev package to docker image (#1452 )	2024-08-08 07:29:13 -07:00
Juan Manuel Martinez Caamaño	901e5f1540	Remove reinterpret_cast uses that result in undefined behaviour. (#1445 ) * Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead. See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility Closes #1439 * fix clang format --------- Co-authored-by: illsilin <Illia.Silin@amd.com>	2024-08-07 11:49:02 -07:00
Illia Silin	5df10432d8	upgrade to rocm6.2 as new default compiler (#1448 )	2024-08-07 09:38:43 -07:00
dependabot[bot]	a71d407e35	Bump rocm-docs-core from 1.6.1 to 1.6.2 in /docs/sphinx (#1449 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-07 08:22:38 -07:00
Illia Silin	12c1f68dd9	Run CK_TILE FMHA benchmarks and collect the performance data. (#1447 ) * run ck_tile benchmarks after the smoke tests and store logs * change the path of fmha benchmark logs * change the way of stashing ck_tile fmha logs * prevent the errors in stages where no logs are generated * fix the ck_tile fmha log names and headers * generate the fmha performance logs in the root folder * change jenkins script arguments format * use exact file names for stashing * modify scripts to process FMHA performance results * unstash FMHA logs before parsing them	2024-08-07 08:18:26 -07:00
Max Podkorytov	886d14ccb2	modify python wrapper for addmm (#1441 )	2024-08-06 15:09:27 -07:00
Haocong WANG	6fc7bff58f	Limit fp8only operator build arch in ckProfiler (#1443 )	2024-08-06 14:29:14 -07:00
Jun Liu	afbf6350f3	Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446 )	2024-08-06 13:06:53 -07:00
Juan Manuel Martinez Caamaño	fd9ef4e678	Add missing constexpr to if conditions (#1444 )	2024-08-06 11:40:34 -07:00
bibek	840c5397bb	adding mha as static lib (#1366 ) * adding mha as static lib * add fmha fwd compile options * typo * fix python version * python version to 3 * increase path length * add max path flag in mha cmake * fix long path issue * mha currently only runs in gfx94x * only build mha in mi300 * populate gpu_list * add mha compile flags * avoid building mha in gpu other than gfx94x * some comments and include ck_tile in rocm * use rocm_install * place ck_tile in include * correct ck_tile path --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-08-06 11:17:10 -05:00
jakpiase	b74d4d4d54	Fix for beta!=0 in reduce (#1440 ) * fix for beta!=0 in reduce * add reviewers suggestions	2024-08-06 09:10:39 -07:00
Bartłomiej Kocot	4ec5c52a0c	Add Grouped Conv Fwd Large Tensor kernel (#1432 ) * Support 64 bit indexing * Add new grouped conv fwd kernel for large tensors * Add instances large tensor * Fixes for transform conv to gemm * Fixes * fixes * Remove not needed instances * examples fixes * Remove not need ds arrays * Fix tests * Add 2GB check in gridwise dl * Fixes	2024-08-06 10:06:10 +02:00
Illia Silin	7f57b2e02c	add --offload-compress compiler flag (#1433 ) * add --offload-compress compiler flag * only apply the --offload-compress flag to the ckProfiler * move the --offload-compress flag back to main cmake file * add offload-compress to target compile option of ckProfiler --------- Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-08-05 23:26:01 +08:00
Illia Silin	f31ba04afc	[CI][Jenkins] delete CI docker container upon exit (#1437 )	2024-08-05 08:13:56 -07:00
Illia Silin	d311c95396	Add compiler flags for ROCm versions 6.2+ (#1429 ) * add compiler flags to fix compiler issues * fix typo. * disable test_smfmac_op on all devices except gfx942 * specify full path to compiler in CI	2024-08-01 08:27:52 -07:00
Sam Wu	6648fd3b04	Update doc requirements (#1423 )	2024-07-31 07:42:42 -07:00
zjing14	f31e8dfa80	[HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425 ) * fixed a typo * clean --------- Co-authored-by: Jing Zhang <jizhan@fb.com>	2024-07-31 07:19:17 -07:00
arai713	d32997a792	Codegen: isSupportedArgument check (#1417 ) * added isSupportedArgument check into codegen device op * adding function call * remove commented code	2024-07-31 07:12:15 -07:00
carlushuang	b3f86e79dd	workaround rocm-6.2 compiler issue (#1421 )	2024-07-31 16:03:59 +08:00
Illia Silin	b527cad4a5	add docker for rocm6.2_rc4 compiler (#1424 )	2024-07-30 11:55:33 -07:00
Bartłomiej Kocot	33b399cc15	Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )	2024-07-30 18:36:04 +02:00
dependabot[bot]	b9ba5b2676	Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx (#1420 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-07-26 14:47:19 -07:00
trixirt	733f33af78	Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404 ) A standard option in Fedora packaging that is used to check the correctness of c++ use of the standard c++ library. Signed-off-by: Tom Rix <trix@redhat.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-07-25 19:28:17 -07:00
zjing14	105bd708c7	Add rotating buff for gemm_multi_d (#1411 ) * add rotating_buff for gemm_multi_d * format * Update flush_cache.hpp * Update gtest.cmake --------- Co-authored-by: Jing Zhang <jizhan@fb.com> Co-authored-by: Haocong WANG <haocwang@amd.com>	2024-07-25 23:21:21 +08:00


1391 Commits