Illia Silin
0056e0bf4b
disable bad fp8 test on gfx12 (#1481)
2024-08-22 15:05:20 -07:00
Illia Silin
d3fa00f14c
disable codegen tests when INSTANCES_ONLY is set (#1480)
2024-08-22 09:50:17 -07:00
arai713
967b1f0fda
Codegen INSTANCES_ONLY build (#1468)
* initial push - altering codegen build
* fix the codegen cmake
* enable codegen build for gfx908 and gfx90a
* enable building codegen with INSTANCES_ONLY=ON
* updating ck_rtc
* remove gpu targets for codegen and rename tests
* make codegen tests dependencies of tests and check targets
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-08-22 07:24:55 -07:00
dependabot[bot]
0d9bf9f154
Bump rocm-docs-core from 1.7.1 to 1.7.2 in /docs/sphinx (#1479)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.1...v1.7.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-21 22:40:49 -07:00
Illia Silin
1925b322eb
fix the build errors with clang20 (#1478)
2024-08-21 21:29:48 -07:00
Andriy Roshchenko
c3515f277c
Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLU activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Add CMakePresets.json
* Guard off FP8 instances when the data type is not available.
* Add example for Scaled FP8 Convolution with AMAX reduction.
* Refactor CombConvScaleRelu instances.
* Add CombConvScale instances.
* Add client example for Scaled FP8 Convolution with AMAX reduction.
* Cleanup.
2024-08-21 15:22:41 -07:00
Rostyslav Geyyer
e20f20efbf
Set RNE fp8 conversion as a default (#1458)
* Set RNE fp8 conversion as a default
* Update f8 tests
* Disable failing test on gfx11
* Update bf8 tests
* Add a flag
* Fix the flag
* Raise flag for gfx10 as well
* Temp commit for tolerance testing
* Update tolerances
2024-08-21 09:09:48 -07:00
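RNE here stands for round-to-nearest, ties-to-even. As a minimal sketch, assuming a plain unsigned bit field (this is not CK's fp8 conversion code, and it ignores exponent adjustment, subnormals, saturation and NaN/Inf handling), the rounding step that distinguishes RNE from plain truncation when a wide mantissa is narrowed to fewer bits looks like this:

```cpp
#include <cassert>
#include <cstdint>

// Drop the low `shift` bits of `value` (requires shift >= 1), rounding the
// result with round-to-nearest, ties-to-even (RNE).
std::uint32_t round_shift_rne(std::uint32_t value, int shift) {
    const std::uint32_t mask = (std::uint32_t{1} << shift) - 1;  // bits being discarded
    const std::uint32_t half = std::uint32_t{1} << (shift - 1);   // exactly half a unit of the kept part
    const std::uint32_t rem = value & mask;
    std::uint32_t q = value >> shift;                               // truncated result
    // Round up if the discarded part is more than one half, or exactly one
    // half while the kept part is odd, so that ties land on an even result.
    if (rem > half || (rem == half && (q & 1u) != 0)) {
        q += 1;
    }
    return q;
}

// Plain truncation (round toward zero), shown for comparison.
std::uint32_t round_shift_trunc(std::uint32_t value, int shift) {
    return value >> shift;
}
```

With shift = 2, the value 6 (0b110) sits exactly halfway between 1 and 2 and rounds to the even result 2, while 10 (0b1010) rounds to 2 rather than 3; truncation would give 1 and 2. If rounding carries out of the kept mantissa bits, a complete float-to-fp8 converter also has to increment the exponent.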
Bartłomiej Kocot
dc82daa86e
Convert MIOpen driver to ckProfiler script typos fix (#1476)
2024-08-20 19:04:14 +02:00
Andriy Roshchenko
a94113a941
Adding Instances and Examples for FP8-based Scaled Convolution with ReLU Activation and AMAX Reduction. (#1469)
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLU activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Guard off FP8 instances when the data type is not available.
* Improve output readability.
* Addressing reviewer's comments.
2024-08-20 10:30:56 -05:00
dependabot[bot]
f48529b511
Bump rocm-docs-core from 1.7.0 to 1.7.1 in /docs/sphinx (#1475)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.0...v1.7.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-19 23:02:07 -07:00
Bartłomiej Kocot
a6a7966505
Add script to convert MIOpen driver to ckProfiler (#1472)
* Add script to convert MIOpen driver to ckProfiler
* Fix
2024-08-19 08:24:56 -07:00
Illia Silin
c8b6b64240
Re-enable fp8 types for all architectures. (#1470)
* re-enable fp8 and bf8 for all targets
* restore the fp8 gemm instances
* re-enable conv_3d fp8 on all architectures
* disable several fp8 gemm instances on all architectures except gfx94
* clang format fix
2024-08-16 16:07:52 -06:00
Dan Yao
79a5d9c10c
[CK_TILE] FA bwd kernels optimization (#1397)
* tmp save
* fix batch deterministic bugs
* fix group deterministic bugs
* codegen update
* reorder files
* bias support
* hd256 bias support
* bwd smoke test update
* simplify convert dq
* fix hd256 dropout scratch
* do{}while() -> while(){}
* comments
* remove FmhaBwdTilePartitioner
* save clear_tile
* refactor dropout
* code cleanup
* code cleanup
* comments
* fix epilogue problem
* fix fwd dropout
* group convert_dq opt
* fix dq alignment
* Do not store storerandval in bwd for flash attention integration
* fix hd32 error and boost performance
* revert
* Remove duplicated WarpGemm definitions in the policy file
* dropout patch for mrepeat 16*16
* code sync up
* dq_acc stride
* dq_acc stride stuff
* codegen update
* fwd dropout revert
* fix hd128 scratches and boost performance
* receipt 3 for simplified smoke test
* more strides for fa integration
* fix hd64 scratches and boost performance
* non-iglp pipeline for headdim padding cases
* dpad same as dvpad for flash attention integration
* unpadded lse&d for group mode
* Support unpad layout for group lse
* Support unpad lse layout for splitkv
* Fix stride for splitkv kernel
* fix unpadded lse issue in fwd splitkv
* comment
* solve lds read&write conflicts
* rename
* bias rename
* tile index revert
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2024-08-16 13:40:10 -07:00
Bartłomiej Kocot
2581727d2a
Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv
* Resize tests
* Resize tests
* update the python script to parse the grouped_conv results
* Remove int8 tests
* change bwd wei layout
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-16 07:48:30 -07:00
dependabot[bot]
76bd0af6af
Bump rocm-docs-core from 1.6.2 to 1.7.0 in /docs/sphinx (#1467)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-15 13:59:40 -07:00
trixirt
49769ec889
Check compiler flags before using (#1403)
* Check compiler flags before using
The user's compiler may not support these flags, so check.
Resolves failures on Fedora.
Signed-off-by: Tom Rix <trix@redhat.com>
* fix syntax CMakeLists.txt
Fix syntax in the check_cxx_compiler_flag.
---------
Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-08-14 20:43:10 -07:00
Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 examples if the arch does not support them
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
AngryLoki
50c423481b
Fix compilation errors with libc++ (#1461)
This fixes two issues when compiling with libc++.
The first issue is an attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
std::numeric_limits support for _Float16 is a libstdc++ extension; _Float16 does not exist in the C++ standard[2].
Fortunately, composable_kernel has its own NumericLimits class, which provides everything needed.
The second issue is that a call to 'check_err' is ambiguous: there are two candidates.
This happens because composable_kernel relies on the assumption that f8_t (defined as _BitInt(8)) does not satisfy the is_integral trait.
However, libc++ treats _BitInt(N) as integral (per the standard, "any implementation-defined extended integer types" can be integral).
Closes: #1460
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
2024-08-13 14:31:15 -05:00
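The check_err ambiguity described above can be reproduced without _BitInt. The sketch below uses hypothetical names (an f8_t alias of std::int8_t and a check_err_kind function) rather than CK's real check_err: because the stand-in type is integral, an overload constrained only on std::is_integral would also accept it and the call would become ambiguous. Excluding the 8-bit float type from the integral overload is one way to remove the ambiguity; it is not necessarily the change made in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an 8-bit float storage type. It is deliberately
// integral, mimicking libc++ reporting std::is_integral for _BitInt(8).
using f8_t = std::int8_t;

// Overload for genuine integer types. Without the !is_same condition, an f8_t
// argument would satisfy both overloads and the call would be ambiguous.
template <typename T,
          std::enable_if_t<std::is_integral_v<T> && !std::is_same_v<T, f8_t>, int> = 0>
std::string check_err_kind(const T&) {
    return "integral";
}

// Overload dedicated to the 8-bit float storage type.
template <typename T, std::enable_if_t<std::is_same_v<T, f8_t>, int> = 0>
std::string check_err_kind(const T&) {
    return "f8";
}
```

With these constraints, check_err_kind(42) selects the integral overload and check_err_kind(f8_t{1}) selects the f8 overload, each without ambiguity.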
Mateusz Ozga
0606e5498e
Support large: 12d tensor size for reduction kernel (#1465)
2024-08-13 16:15:47 +02:00
Illia Silin
cbb6f2ab8c
Disable inapplicable xdl and mha instances for gfx12 (#1464)
2024-08-12 15:11:58 -07:00
Mateusz Ozga
ab60b390f8
Rewrite *sh reduce unit tests to gtest: part 1 (#1407)
* Rewrite .sh test to Gtest
* review changes
* Remove unused comments
* Review v2
* Typo
* Separate UT: AMAX, MAX, MIN; added template params to trigger them
* Update test/reduce/reduce_no_index.cpp
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-08-12 16:28:10 +02:00
Bartłomiej Kocot
4a870942e6
Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
* Fix typo in TransformConvFwdToGemm
* Fix bug in n offset calculation
2024-08-10 13:12:05 +02:00
arai713
da214a5a58
Codegen build w/CK (#1428)
* initial push
* cleaned up compiler errors
* removed commented code
* build codegen folder only for gfx9 targets
* remove separate stage for codegen tests from CI
* removed commented code from CMake
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-09 08:15:06 -07:00
Jun Liu
5ff8eeebf9
Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 33b399cc15.
2024-08-08 19:09:33 -07:00
Illia Silin
4a5ab67871
Enable CI on gfx12. (#1454)
* enable CI build and test on gfx1201
* skip DL kernels in CI for gfx12
* only run CI on gfx12 if rocm version >= 6.2
* remove the rocm version check for CI on gfx12
* add a switch for CI builds on gfx12
2024-08-08 16:29:15 -07:00
Illia Silin
ae3b8ff86c
check if the coerce-illegal-types flag is supported (#1451)
2024-08-08 07:29:29 -07:00
Illia Silin
8a75728406
add rocm-llvm-dev package to docker image (#1452)
2024-08-08 07:29:13 -07:00
Juan Manuel Martinez Caamaño
901e5f1540
Remove reinterpret_cast uses that result in undefined behaviour. (#1445)
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.
See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility
Closes #1439
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-07 11:49:02 -07:00
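The commit replaces reinterpret_cast-based type punning with a bitcast. A minimal C++17 sketch of the idea follows; bit_cast_sketch is a hypothetical helper name, and CK's own bitcast utility may be implemented differently. Copying the object representation with std::memcpy is well defined, whereas reading a float through a reinterpreted std::uint32_t pointer violates the type accessibility rules linked above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Undefined behaviour (what the commit removes):
//   float f = 1.0f;
//   std::uint32_t u = *reinterpret_cast<const std::uint32_t*>(&f);
//
// Well-defined alternative: copy the bytes into an object of the target type.
// In C++20, std::bit_cast<To>(from) from <bit> does the same thing.
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "bit cast requires equal sizes");
    static_assert(std::is_trivially_copyable_v<To> && std::is_trivially_copyable_v<From>,
                  "bit cast requires trivially copyable types");
    To to{};
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

For example, bit_cast_sketch<std::uint32_t>(1.0f) yields 0x3F800000, the IEEE 754 binary32 encoding of 1.0, and casting the bits back to float recovers the original value.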
Illia Silin
5df10432d8
upgrade to rocm6.2 as new default compiler (#1448)
2024-08-07 09:38:43 -07:00
dependabot[bot]
a71d407e35
Bump rocm-docs-core from 1.6.1 to 1.6.2 in /docs/sphinx (#1449)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-07 08:22:38 -07:00
Illia Silin
12c1f68dd9
Run CK_TILE FMHA benchmarks and collect the performance data. (#1447)
* run ck_tile benchmarks after the smoke tests and store logs
* change the path of fmha benchmark logs
* change the way of stashing ck_tile fmha logs
* prevent the errors in stages where no logs are generated
* fix the ck_tile fmha log names and headers
* generate the fmha performance logs in the root folder
* change jenkins script arguments format
* use exact file names for stashing
* modify scripts to process FMHA performance results
* unstash FMHA logs before parsing them
2024-08-07 08:18:26 -07:00
Max Podkorytov
886d14ccb2
modify python wrapper for addmm (#1441)
2024-08-06 15:09:27 -07:00
Haocong WANG
6fc7bff58f
Limit fp8only operator build arch in ckProfiler (#1443)
2024-08-06 14:29:14 -07:00
Jun Liu
afbf6350f3
Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446)
2024-08-06 13:06:53 -07:00
Juan Manuel Martinez Caamaño
fd9ef4e678
Add missing constexpr to if conditions (#1444)
2024-08-06 11:40:34 -07:00
bibek
840c5397bb
adding mha as static lib (#1366)
* adding mha as static lib
* add fmha fwd compile options
* typo
* fix python version
* python version to 3
* increase path length
* add max path flag in mha cmake
* fix long path issue
* mha currently only runs on gfx94x
* only build mha on mi300
* populate gpu_list
* add mha compile flags
* avoid building mha on GPUs other than gfx94x
* some comments and include ck_tile in rocm
* use rocm_install
* place ck_tile in include
* correct ck_tile path
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-08-06 11:17:10 -05:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
* fix for beta!=0 in reduce
* add reviewers' suggestions
2024-08-06 09:10:39 -07:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove unneeded instances
* examples fixes
* Remove unneeded ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00
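The bullets above mention 64-bit indexing and a 2GB check, but the commit does not spell out the exact criterion. As a sketch under stated assumptions (a hypothetical fits_in_2gb function, a packed contiguous layout, and a strict threshold of 2^31 bytes), a size check deciding whether signed 32-bit byte offsets are enough could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// A signed 32-bit offset cannot represent 2^31 (2 GiB) or more, which is why
// larger tensors need 64-bit indexing.
constexpr std::int64_t kTwoGiB = std::int64_t{1} << 31;

// Hypothetical check for a packed (contiguous) tensor: true if its total size
// in bytes is below 2 GiB. Strided or padded layouts would need the largest
// reachable offset instead of the plain element count.
bool fits_in_2gb(const std::vector<std::int64_t>& lengths, std::int64_t bytes_per_element) {
    const std::int64_t elements = std::accumulate(
        lengths.begin(), lengths.end(), std::int64_t{1}, std::multiplies<std::int64_t>());
    return elements * bytes_per_element < kTwoGiB;
}
```

A 1024 x 1024 x 1024 tensor of 2-byte elements occupies exactly 2^31 bytes, so it fails this check, while the same shape with 1-byte elements passes.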
Illia Silin
7f57b2e02c
add --offload-compress compiler flag (#1433)
* add --offload-compress compiler flag
* only apply the --offload-compress flag to the ckProfiler
* move the --offload-compress flag back to main cmake file
* add offload-compress to target compile option of ckProfiler
---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-08-05 23:26:01 +08:00
Illia Silin
f31ba04afc
[CI][Jenkins] delete CI docker container upon exit (#1437)
2024-08-05 08:13:56 -07:00
Illia Silin
d311c95396
Add compiler flags for ROCm versions 6.2+ (#1429)
* add compiler flags to fix compiler issues
* fix typo.
* disable test_smfmac_op on all devices except gfx942
* specify full path to compiler in CI
2024-08-01 08:27:52 -07:00
Sam Wu
6648fd3b04
Update doc requirements (#1423)
2024-07-31 07:42:42 -07:00
zjing14
f31e8dfa80
[HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425)
* fixed a typo
* clean
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
2024-07-31 07:19:17 -07:00
arai713
d32997a792
Codegen: isSupportedArgument check (#1417)
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
2024-07-31 07:12:15 -07:00
carlushuang
b3f86e79dd
workaround rocm-6.2 compiler issue (#1421)
2024-07-31 16:03:59 +08:00
Illia Silin
b527cad4a5
add docker for rocm6.2_rc4 compiler (#1424)
2024-07-30 11:55:33 -07:00
Bartłomiej Kocot
33b399cc15
Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
2024-07-30 18:36:04 +02:00
dependabot[bot]
b9ba5b2676
Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx (#1420)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-26 14:47:19 -07:00
trixirt
733f33af78
Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404)
A standard option in Fedora packaging, used to check that
C++ code uses the standard C++ library correctly.
Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-25 19:28:17 -07:00
zjing14
105bd708c7
Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
2024-07-25 23:21:21 +08:00