danyao12
ee9706ab2c
bf16 rtz update
2024-10-11 15:40:29 +08:00
danyao12
7b12d9b720
some kernels and related api update
2024-10-11 11:55:06 +08:00
danyao12
d4de8495dd
rename & ensure thread safety
2024-10-08 18:14:58 +08:00
danyao12
871c7556f7
add bf16+a16 rtz
2024-09-29 11:35:40 +08:00
danyao12
2dafca1f39
mqa/gqa support for atomic f16 cases
2024-09-27 16:08:49 +08:00
danyao12
2a4c23161a
Merge branch 'develop' into ck_tile/fa_asm_bwd
2024-09-23 11:29:00 +08:00
danyao12
1e01ee0918
add benchmark_bwd_ext
2024-09-23 10:51:35 +08:00
danyao12
36e65bdc81
clang-format
2024-09-23 09:49:44 +00:00
Po Yen Chen
770d2b7725
Early return if seqlen_k=0 on group mode (#1524)
2024-09-22 20:05:58 +08:00
danyao12
2463a22136
code revert
2024-09-21 18:00:56 +08:00
danyao12
78f33529a8
no_coex update
2024-09-21 17:59:04 +08:00
danyao12
8ac3eb3978
asm code update
2024-09-20 17:26:02 +08:00
Bartłomiej Kocot
4ba52b35dc
Add support for NGCHW in grouped conv fwd (#1499)
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
2024-09-20 10:45:46 +02:00
Adam Osewski
0c39954da9
Remove unsupported (fp8) type from Add memory operation (#1521)
The dynamic buffer does not support fp8 in the `Update` operation, so fp8 cannot be used with `InMemoryDataOperation::Add`.
2024-09-20 09:40:45 +02:00
danyao12
67b160c579
enable bwd_fp16_a16
2024-09-19 13:59:31 +08:00
danyao12
c3b406d6e8
clang-format
2024-09-19 11:28:47 +00:00
danyao12
5ab137f44c
add traits
2024-09-19 11:13:21 +08:00
Thomas Ning
694c300145
Ck tile gemm padding dim (#1516)
* Support the N dimension padding
* Finished the padding feature for different dimension of K
2024-09-18 11:32:29 -07:00
dependabot[bot]
e84adec3ba
Bump rocm-docs-core from 1.8.0 to 1.8.1 in /docs/sphinx (#1519)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.1/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.0...v1.8.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-18 07:00:26 -07:00
danyao12
a0491b67bd
code cleanup
2024-09-18 16:11:27 +08:00
danyao12
3efb862142
tmp save
2024-09-18 12:32:15 +00:00
Illia Silin
1658c0dc11
Add rocm6.3_rc1 docker image (#1518)
* add image for rocm6.3_rc1
* fix deb package url
2024-09-17 15:59:26 -07:00
aledudek
a793afc961
Extend pool3d fwd avg, max operations by f8_t, int8_t types (#1483)
* Extend pool3d fwd avg, max operations by f8_t, int8_t types
* Pack MaxPool3dFwd params together
* Fix MaxPool3dFwd AVG instances
* Decrease verification precision for bf16
* Adjust tests + review changes
* Adjust threshold for F8
* Adjusted compute types for MAX op instances
* Fix ComputeDataType mismatch in tests and profiler for AVG
* Fix naming from max_pool3d_fwd to pool3d_fwd
* Adjust CMakeLists
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-17 15:57:10 +02:00
dependabot[bot]
8ec15e644e
Bump rocm-docs-core from 1.7.2 to 1.8.0 in /docs/sphinx (#1517)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.2 to 1.8.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.2...v1.8.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-16 22:37:57 -07:00
Mateusz Ozga
6834e5ee74
This commit contains an implementation of max pool2d for f8 type (#1506)
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-16 10:15:06 +02:00
Thomas Ning
844f5a1712
Ck tile GPU verification sample develop & Add the CK TILE GEMM to the CI/CD test (#1505)
* Finished the feature of gpu verification
* Add the ck_tile_gemm test in the CI CD
* add the include of tensor_layout in reference_gemm
* Comment Addressed
* split ck_tile fmha and gemm tests into separate stages
* restructure the reference gemm
* restructure a new reference_gemm api that could read the device mem
---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-09-14 21:08:40 +08:00
bibek
49e012dee1
Fix duplicate CMake tidy-target issue (#1513)
2024-09-13 21:15:04 -07:00
jakpiase
8f8a2ce396
Add pool2d int8 and fp8 instances (#1508)
* add pool2d fp8 and int8
* minor fixes
* add formatting
* add reviewer suggestions
* add reviewer suggestions
2024-09-13 10:18:21 -07:00
dependabot[bot]
a4982c3b86
Bump sphinxcontrib-bibtex from 2.6.2 to 2.6.3 in /docs/sphinx (#1511)
Bumps [sphinxcontrib-bibtex](https://github.com/mcmtroffaes/sphinxcontrib-bibtex) from 2.6.2 to 2.6.3.
- [Changelog](https://github.com/mcmtroffaes/sphinxcontrib-bibtex/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/mcmtroffaes/sphinxcontrib-bibtex/compare/2.6.2...2.6.3)
---
updated-dependencies:
- dependency-name: sphinxcontrib-bibtex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-13 08:17:26 -07:00
Jun Liu
81bc1496b2
Customize filesystem in CK for legacy systems (#1509)
* Legacy support: customized filesystem
* Update cmakefile for python alternative path
* fix build issues
* CK has no boost dependency
* More fixes to issues found on legacy systems
* fix clang format issue
* Check if blob is correctly generated in cmake
* fix the python issues
* add a compiler flag for codegen when using alternative python
* use target_link_options instead of target_compile_options
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-09-13 07:51:07 -07:00
Illia Silin
e07f1108c0
make sure to rebuild compilers if they changed (#1504)
2024-09-12 07:49:55 -07:00
Mateusz Ozga
448c0f56d8
Pool2d max/avg kernel in the BWD version (#1494)
* Add pool2d instance BWD AVG
* Add pool2d instance BWD MAX
* Fix: avg review
* Fix review: part2
* Fix - enable test when type is compiled
* Fix review part3
2024-09-12 11:47:52 +02:00
jakpiase
e8d2887cb2
Rewrite pool2d fwd (#1462)
* added pool2d fwd
* add tests
* add reviewers changes
* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
This reverts commit 6b2ba7ff89, reversing
changes made to 22c82bea0c.
* Revert "add reviewers changes"
This reverts commit 22c82bea0c.
* added reviewers comments
* revert some old files
* add reviewers requests
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-11 15:21:00 +02:00
jakpiase
2a261afcdf
Added structural sparsity blockwise gemm (#1435)
* Implemented smfmac xdlops
* Added smfmac blockwise xdlops
* fixes
* add reviewers suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-11 15:19:42 +02:00
Dan Yao
d09572e8c2
[CK_TILE] FA bwd repair (#1502)
* fix fa bwd
* revert kernelBlockSize in gemm_kernel.hpp
2024-09-10 10:45:32 -07:00
Thomas Ning
cf08df6b5e
fix the unsupported scenario of Ali TestGemmUniversal (#1501)
2024-09-09 11:31:27 -07:00
Thomas Ning
caacd38830
Ck tile gemm example (#1488)
* Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout
* Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future.
* Fix: Clang Format, API fixed from fmha
* fix with better naming convention
* revert back the pipeline code of fmha
* Fixed: Addressed the comments and merge the GEMM shape of GEMM Operator and FMHA Operator to one.
* clang format with the reference_gemm file
* convert the clang format with the remod.py
* Changed the format and variable name of the kernel gemm_shape and partitioner
---------
Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>
2024-09-07 16:23:32 +08:00
Fang.Che
d4139c8bf8
add fmha asm api: fmha_bwd_ext
2024-09-05 07:51:01 +00:00
danyao12
933ac7c7ff
hsaco reorder
2024-09-05 14:04:25 +00:00
danyao12
d356c4d0f1
hsaco rename
2024-09-05 14:01:13 +00:00
M.Emin Ozturk
8378855361
Modification to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279" (#1492)
* issue fix, one line changed for tmp
* clang
---------
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>
2024-09-04 21:52:55 -07:00
Haocong WANG
5b10dae6a4
Add gemm universal bf16 instances (#1484)
* revert ckprofiler change
* temp save
* Add test and test pass
* test pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-09-04 20:58:54 -07:00
Fang.Che
2d9b53441c
Add the hsaco binary compiled from fmha assembly
2024-09-05 03:00:23 +00:00
Rostyslav Geyyer
52410b49c7
Temporarily disable flaky test for all (#1495)
2024-09-04 07:36:57 -07:00
Illia Silin
8b95d9ad52
copy all fmha headers when building library (#1497)
* copy all fmha headers when building library
* fix the rocm_install call for mha headers
2024-09-04 07:36:41 -07:00
Illia Silin
841009c5ee
Add an option to select an alternative python version during build (#1496)
* locate a newer version of python when -DRHEL=ON flag is set
* allow setting python version on cmake command line
2024-09-04 07:36:27 -07:00
Bartłomiej Kocot
73b67f290f
Add support for NGCHW in grouped conv bwd wei (#1491)
* Add support for NGCHW in grouped conv bwd wei
* Comments fixes
* navi fixes
* Update function names
2024-09-03 10:52:03 +02:00
Bartłomiej Kocot
a9b170b541
Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)" (#1490)
This reverts commit 5ff8eeebf9.
2024-09-02 10:39:49 +02:00
Dan Yao
b8addae293
[CK_TILE] float -> bf16 inline asm rtn (#1482)
* asm rtn
* add asm rtn macro
* reorder macro
---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-08-30 15:38:09 +08:00
Po Yen Chen
461ec98d78
Enable scratch memory workaround on ROCm 6.2 (#1486)
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-08-30 10:40:00 +08:00