rtmadduri
a52f12458f
LWPCK-2429: Device grouped GEMM uses Async Memcpy (#1695)
* LWPCK-2429: Device grouped GEMM uses Async Memcpy
Resolving merge conflicts
* reverting changes to profile_grouped_gemm
* revert date change
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 9488f1c981]
2024-12-02 09:13:56 +01:00
Max Podkorytov
c75a851aad
[Python] Add batched gemm instances parsing (#1684)
* add op
* do not insert ds parameters as they are already parsed
* reset ds parameters
* apply ruff
[ROCm/composable_kernel commit: 44828b7c0f]
2024-11-30 08:11:42 -08:00
Bartłomiej Kocot
d4a9af967e
[CK TILE] Fix universal gemm template keywords (#1704)
[ROCm/composable_kernel commit: cff7fab798]
2024-11-29 20:51:09 -08:00
dependabot[bot]
8f43579a49
Bump rocm-docs-core from 1.9.1 to 1.9.2 in /docs/sphinx (#1702)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.1 to 1.9.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.1...v1.9.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 28e02cf524]
2024-11-29 07:18:43 -08:00
aledudek
ed385de9b2
Ck tile batched gemm example (#1615)
* [CK Tile] Batched GEMM Example
* [CK Tile] Batched GEMM Example - minor refactor
* [CK Tile] Batched GEMM Example - README update
* [CK Tile] Batched Gemm Example - review changes
- Added tensor data layouts as input parameters
- Changed structure of Host and Kernel args
- Fixed a bug with invalid vector reads on non-contiguous memory
* [CK Tile] Batched Gemm Example - remove comment
* [CK Tile] Batched Gemm Example - Add GTests part1
* [CK Tile] Batched Gemm Example - GTests part2 + review changes
* [CK TILE] Batched GEMM post merge fixes
* [CK Tile] Batched GEMM Example - fix pad views
[ROCm/composable_kernel commit: 78f0fea08e]
2024-11-29 11:52:18 +01:00
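The batched GEMM example above runs one independent GEMM per batch, with each batch's A, B and C located a fixed batch stride after the previous one and the tensor layouts passed in as parameters. A minimal host-side reference sketch of that computation, assuming a simple `Layout` enum and the hypothetical names `offset` and `batched_gemm_ref`; it is not the CK Tile kernel API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tag: element (r, c) lives at r*ld + c (RowMajor)
// or at c*ld + r (ColumnMajor), where ld is the leading dimension.
enum class Layout { RowMajor, ColumnMajor };

inline std::size_t offset(Layout l, std::size_t r, std::size_t c, std::size_t ld) {
    return l == Layout::RowMajor ? r * ld + c : c * ld + r;
}

// Reference strided batched GEMM: C[b] = A[b] (M x K) * B[b] (K x N).
// Batch b of each tensor starts b * batch_stride_* elements into its buffer.
void batched_gemm_ref(const std::vector<float>& a, Layout la, std::size_t lda, std::size_t batch_stride_a,
                      const std::vector<float>& b, Layout lb, std::size_t ldb, std::size_t batch_stride_b,
                      std::vector<float>& c, Layout lc, std::size_t ldc, std::size_t batch_stride_c,
                      std::size_t M, std::size_t N, std::size_t K, std::size_t batch_count) {
    // The leading dimension must cover the contiguous extent of each layout.
    assert(lda >= (la == Layout::RowMajor ? K : M));
    assert(ldb >= (lb == Layout::RowMajor ? N : K));
    assert(ldc >= (lc == Layout::RowMajor ? N : M));
    for (std::size_t bi = 0; bi < batch_count; ++bi) {
        const float* A = a.data() + bi * batch_stride_a;
        const float* B = b.data() + bi * batch_stride_b;
        float* C = c.data() + bi * batch_stride_c;
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;  // accumulate in fp32
                for (std::size_t k = 0; k < K; ++k)
                    acc += A[offset(la, m, k, lda)] * B[offset(lb, k, n, ldb)];
                C[offset(lc, m, n, ldc)] = acc;
            }
    }
}
```

Taking the layout per tensor, rather than fixing one, is what lets a single reference check every A/B/C layout combination a test might exercise.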
dependabot[bot]
9bd522d82f
Bump rocm-docs-core from 1.9.0 to 1.9.1 in /docs/sphinx (#1701)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.0 to 1.9.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.0...v1.9.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: bb652696e7]
2024-11-28 10:43:36 -08:00
Illia Silin
7ce688ded2
Reduce docker size and build time in CI. (#1699)
* refactor docker build in CI
* add Dockerfile.compiler
* add input args to Dockerfile.compiler
* rearrange the docker args
[ROCm/composable_kernel commit: aa6e2087f5]
2024-11-28 10:42:19 -08:00
Bartłomiej Kocot
0e0653614a
[CK TILE] Add gemm compute pipeline v3 (#1661)
* [CK TILE] Add gemm compute pipeline v3
* Enable universal gemm compute pipeline.
* Rename example and add compute pipeline.
* Introduce ag bg cr pipeline impl base.
* Refactor to reuse code.
* Cleaning
* Formatting.
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
[ROCm/composable_kernel commit: f49b595dc0]
2024-11-28 17:51:49 +01:00
jakpiase
d8d0168cef
Add interwave scheduler for gemm mem pipeline (#1647)
* add interwave scheduler for gemm mem pipeline
* Fix merge artifacts.
* Refactor unit tests.
* Switch to interwave scheduler for mem example
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
[ROCm/composable_kernel commit: e7b6286441]
2024-11-27 18:25:07 +01:00
Illia Silin
b631991712
move utility headers from library/include to include path (#1697)
[ROCm/composable_kernel commit: fe6b185b97]
2024-11-27 06:12:56 -08:00
Adam Osewski
46b120168b
Polished Grouped GEMM APIs and new BF16 instances (#1600)
* A few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
- delete obsolete two stage as it is now covered by grouped gemm
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix a few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment a few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Review comments.
[ROCm/composable_kernel commit: 061ac0649c]
2024-11-27 13:02:44 +01:00
Illia Silin
1fec113c70
update mainline compiler branch name (#1696)
[ROCm/composable_kernel commit: cb8c7f42d6]
2024-11-26 14:58:35 -08:00
rocking
e116bfef59
support max3 in smoothquant and add + rmsnorm + rdquant (#1654)
* Fix cmake example build
* Support max3 in smoothquant one pass
* support max3 in two pass
* support max3 in add_rmsnorm_rdquant
[ROCm/composable_kernel commit: abae2afc72]
2024-11-27 05:01:15 +08:00
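The commit above uses a three-input maximum ("max3") in the row-wise absolute-max reduction that dynamic quantization needs. A minimal scalar sketch of fused add + RMSNorm + row-wise dynamic int8 quantization, assuming the common recipe scale = absmax / 127 and an epsilon added inside the square root; the function names and signatures are hypothetical, not CK Tile's:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Three-input max: folds two new values into the running maximum per step,
// the reduction shape a hardware max3 instruction allows.
inline float max3(float a, float b, float c) { return std::max(a, std::max(b, c)); }

// One row: y = (x + residual) / rms(x + residual) * gamma, then quantize y
// to int8 with a per-row scale = absmax(y) / 127. Returns the scale.
float add_rmsnorm_rdquant_row(const std::vector<float>& x, const std::vector<float>& residual,
                              const std::vector<float>& gamma, float eps, std::vector<int8_t>& q) {
    const std::size_t n = x.size();
    assert(n > 0 && residual.size() == n && gamma.size() == n);
    std::vector<float> y(n);
    float sum_sq = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = x[i] + residual[i];  // fused residual add
        sum_sq += y[i] * y[i];
    }
    const float inv_rms = 1.0f / std::sqrt(sum_sq / static_cast<float>(n) + eps);
    for (std::size_t i = 0; i < n; ++i) y[i] *= inv_rms * gamma[i];

    // Row-wise absolute maximum, consuming two elements per max3 step.
    float absmax = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) absmax = max3(absmax, std::fabs(y[i]), std::fabs(y[i + 1]));
    if (i < n) absmax = std::max(absmax, std::fabs(y[i]));  // odd-length tail

    const float scale = absmax / 127.0f;
    q.resize(n);
    for (std::size_t j = 0; j < n; ++j) {
        const float v = scale > 0.0f ? y[j] / scale : 0.0f;
        q[j] = static_cast<int8_t>(std::lround(std::clamp(v, -127.0f, 127.0f)));
    }
    return scale;
}
```

Using max3 roughly halves the number of reduction steps compared with a chain of two-input max operations.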
Adam Osewski
87b69930e7
Change block gemm pipeline local prefill loop order. (#1692)
* Fix loop order.
* Fix loop order in pipeline v4
[ROCm/composable_kernel commit: bfe983a151]
2024-11-26 17:36:53 +01:00
jakpiase
50ee0ac283
Add check for bf16 splitk support for grouped gemm splitk (#1673)
* add check for bf16 splitk support for grouped gemm splitk
* Update if condition
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
[ROCm/composable_kernel commit: b70f367f80]
2024-11-26 13:56:32 +01:00
Po Yen Chen
b0cfd7f12e
[CK_TILE] Fix incorrect computation of group mode PagedAttention (#1688)
* Allow getting batch size from splitkv tile partitioner
* Fix wrong paged-kvcache impl for group mode
* Fix wrong example code for page-kvcache
* Undo changes in fmha_fwd.cpp
* Always use 2D block table
* Add is_gappy kernel argument for paged-kvcache
The is_gappy argument is used for differentiating seqstart_k_ptr usage in flash-attention & xformers
* Remove out-of-date comments
* Remove no-longer used method
* Fix wrong # page-block calculation
* Fix wrong comment
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
[ROCm/composable_kernel commit: cf2d635ea2]
2024-11-26 20:37:54 +08:00
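A paged KV-cache stores each sequence's keys and values in fixed-size page blocks scattered through a shared pool. Each sequence has a row in a 2D block table that maps its logical block indices to physical ones, and the number of page blocks it needs is a ceiling division of its key length by the block size. A minimal generic sketch of that address arithmetic, with hypothetical names (`num_page_blocks`, `physical_row`) and an assumed row-major `[num_seqs][max_blocks_per_seq]` block table; this is not the CK Tile kernel code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of page blocks needed to hold seqlen_k tokens (ceiling division).
inline int32_t num_page_blocks(int32_t seqlen_k, int32_t page_block_size) {
    return (seqlen_k + page_block_size - 1) / page_block_size;
}

// Map token position `pos` of sequence `seq` to its row in the KV pool.
// block_table is a 2D table stored row-major: [num_seqs][max_blocks_per_seq].
inline int64_t physical_row(const std::vector<int32_t>& block_table, int32_t max_blocks_per_seq,
                            int32_t seq, int32_t pos, int32_t page_block_size) {
    const int32_t logical_block = pos / page_block_size;
    assert(logical_block < max_blocks_per_seq);
    const int32_t physical_block =
        block_table[static_cast<std::size_t>(seq) * max_blocks_per_seq + logical_block];
    // Physical block start plus the offset inside the block.
    return static_cast<int64_t>(physical_block) * page_block_size + pos % page_block_size;
}
```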
Adam Osewski
0b49f75e9e
CK-Tile first draft of universal block gemm with interwave & intrawave scheduler (#1676)
* Block universal gemm.
* Universal block gemm with interwave scheduler - draft.
* Refactoring
* Move a/b_warp_tiles into BlockGemmImpl
* set BlockGemmImpl as a class member
* Change tile size to be more suitable for memory-bound cases.
* Introduce kKPerThread to WarpGemm
* Add documentation comment.
* Fix Interwave scheduler block gemm.
* Add compute/memory friendly tile configuration.
* Clean
* New tile configurations in gemm mem example.
* Add more static checks and fix loop order in block gemm.
* Add more static checks and use warp gemm mfma dispatcher.
* Add default scheduler block gemm.
* Remove logging in example.
[ROCm/composable_kernel commit: b6bcd76d88]
2024-11-26 08:45:14 +01:00
carlushuang
8acce2dee1
[CK_TILE] fused-moe first version (#1634)
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 440e28b08f]
2024-11-26 11:14:56 +08:00
Po Yen Chen
f81addbe42
[CK_TILE] Fix fMHA fwd MakeKargs() compilation errors (#1689)
* Fix mis-matched tuple<> elem types
* Rename MakeKargs() as MakeKargsImpl()
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
[ROCm/composable_kernel commit: 645fe812f6]
2024-11-25 15:30:35 +08:00
dependabot[bot]
13fa64e90c
Bump rocm-docs-core from 1.8.5 to 1.9.0 in /docs/sphinx (#1691)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.5 to 1.9.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.9.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.5...v1.9.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: c2bcbb1379]
2024-11-24 21:41:52 -08:00
carlushuang
4fad52fea6
[CK_TILE] MoE update index (#1672)
* update MOCK_ID for moe-sorting
* add moe-smoothquant
* update a comment
* fix format
* hot fix
* update topk in overflow case
* update comments
* update bf16 cvt
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 36c7ce4e0e]
2024-11-25 13:12:35 +08:00
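MoE sorting regroups each token's top-k expert assignments by expert and pads every expert's group to a whole number of fixed-size blocks, so that each block maps to exactly one expert; padding slots need a placeholder token id, which appears to be the role of the MOCK_ID above. A minimal CPU sketch of that general idea, assuming a padding id of `num_tokens` and hypothetical names (`moe_sort`, `MoeSorted`); CK Tile's actual id encoding and output buffers may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output of a generic MoE token-sorting step.
struct MoeSorted {
    std::vector<int32_t> token_ids;     // grouped by expert, each group padded to unit_size
    std::vector<float> weights;         // routing weight per entry (0 for padding)
    std::vector<int32_t> block_expert;  // expert id owning each unit_size block
};

// topk_ids / topk_weights are [num_tokens][topk], row-major.
// Padding slots hold pad_id = num_tokens, an assumed out-of-range placeholder.
MoeSorted moe_sort(const std::vector<int32_t>& topk_ids, const std::vector<float>& topk_weights,
                   int32_t num_tokens, int32_t topk, int32_t num_experts, int32_t unit_size) {
    assert(unit_size > 0);
    assert(static_cast<int64_t>(topk_ids.size()) == int64_t{num_tokens} * topk);
    const int32_t pad_id = num_tokens;
    MoeSorted out;
    for (int32_t e = 0; e < num_experts; ++e) {
        const std::size_t start = out.token_ids.size();
        // Gather every (token, k) slot routed to expert e, in token order.
        for (int32_t t = 0; t < num_tokens; ++t)
            for (int32_t k = 0; k < topk; ++k) {
                const std::size_t idx = static_cast<std::size_t>(t) * topk + k;
                if (topk_ids[idx] == e) {
                    out.token_ids.push_back(t);
                    out.weights.push_back(topk_weights[idx]);
                }
            }
        // Pad this expert's segment up to a multiple of unit_size.
        while ((out.token_ids.size() - start) % unit_size != 0) {
            out.token_ids.push_back(pad_id);
            out.weights.push_back(0.0f);
        }
        // Record which expert each block of this segment belongs to.
        for (std::size_t b = start; b < out.token_ids.size(); b += unit_size)
            out.block_expert.push_back(e);
    }
    return out;
}
```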
Qianfeng
e58d441138
Change in fwd-splitkv kernel to support num_splits=1 case (#1690)
* Change in fwd-splitkv kernel to support num_splits=1 case
* Update in codegen fwd-splitkv to make num_splits > 1 cases pass
* Specify instance traits in dispatch
* Fix link error for fp8 kernels
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
[ROCm/composable_kernel commit: ce2bdf42a9]
2024-11-25 12:31:38 +08:00
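Split-KV attention processes the key/value sequence in `num_splits` chunks, producing per-chunk partial outputs and log-sum-exp (LSE) values that a combine step then merges; with `num_splits = 1` the single partial result is already final. A minimal scalar sketch for one query with one scalar value per key, assuming standard softmax attention and scores already computed; the names are hypothetical and this is not the CK Tile implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Partial { float out; float lse; };  // normalized partial output and its log-sum-exp

// Softmax-weighted sum of v over scores s[begin, end), plus the chunk's LSE.
Partial attend_chunk(const std::vector<float>& s, const std::vector<float>& v,
                     std::size_t begin, std::size_t end) {
    float m = -std::numeric_limits<float>::infinity();
    for (std::size_t i = begin; i < end; ++i) m = std::max(m, s[i]);
    float denom = 0.0f, num = 0.0f;
    for (std::size_t i = begin; i < end; ++i) {
        const float p = std::exp(s[i] - m);  // subtract the max for numerical stability
        denom += p;
        num += p * v[i];
    }
    return {num / denom, m + std::log(denom)};
}

// Split [0, n) into num_splits non-empty chunks, then merge partials by LSE.
float split_kv_attention(const std::vector<float>& s, const std::vector<float>& v, std::size_t num_splits) {
    assert(s.size() == v.size() && num_splits >= 1 && num_splits <= s.size());
    const std::size_t n = s.size();
    std::vector<Partial> parts;
    for (std::size_t k = 0; k < num_splits; ++k)
        parts.push_back(attend_chunk(s, v, k * n / num_splits, (k + 1) * n / num_splits));
    if (num_splits == 1) return parts[0].out;  // nothing to combine
    float lse_max = -std::numeric_limits<float>::infinity();
    for (const auto& p : parts) lse_max = std::max(lse_max, p.lse);
    float wsum = 0.0f, acc = 0.0f;
    for (const auto& p : parts) {
        const float w = std::exp(p.lse - lse_max);  // chunk weight up to a common factor
        wsum += w;
        acc += w * p.out;
    }
    return acc / wsum;
}
```

Every split count yields the same result up to rounding, which makes a single-split run a convenient reference for the combine step.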
Illia Silin
0149380318
add --squash flag when building dockers (#1686)
[ROCm/composable_kernel commit: 19d4b79039]
2024-11-22 17:16:08 -08:00
Illia Silin
d6546a87fa
add Andriy to the code owners (#1687)
[ROCm/composable_kernel commit: a420b3b34d]
2024-11-22 16:30:12 -08:00
schung-amd
47b06431d5
[CK_TILE] MakeKargs overloads for backward compatibility (#1681)
* Add overloads for MakeKargs
Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples.
* Re-format files using ck_tile remod.py
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
[ROCm/composable_kernel commit: ff92222f93]
2024-11-23 06:51:35 +08:00
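Overloading for backward compatibility, as described above, keeps the old argument shapes (tuples and braced lists) callable and forwards them to one implementation that takes the newer argument type. A minimal sketch of that pattern; the `std::variant` argument type, the `Kargs` fields and the meaning of the two values are hypothetical, and the real fMHA kernel arguments are far larger:

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <utility>
#include <variant>

// Hypothetical newer interface: either two values or two pointers to values.
using ValuesOrPointers = std::variant<std::pair<uint64_t, uint64_t>, std::pair<const void*, const void*>>;

struct Kargs {  // hypothetical, heavily reduced kernel-argument struct
    bool by_pointer = false;
    uint64_t first = 0, second = 0;
    const void* first_ptr = nullptr;
    const void* second_ptr = nullptr;
};

// Single implementation that all overloads forward to.
Kargs MakeKargsImpl(const ValuesOrPointers& arg) {
    Kargs k;
    if (const auto* v = std::get_if<std::pair<uint64_t, uint64_t>>(&arg)) {
        k.first = v->first;
        k.second = v->second;
    } else {
        const auto& p = std::get<std::pair<const void*, const void*>>(arg);
        assert(p.first != nullptr && p.second != nullptr);
        k.by_pointer = true;
        k.first_ptr = p.first;
        k.second_ptr = p.second;
    }
    return k;
}

// Backward-compatible overloads for callers that pass tuples or braced lists.
Kargs MakeKargs(std::tuple<uint64_t, uint64_t> t) {
    return MakeKargsImpl(std::pair<uint64_t, uint64_t>(std::get<0>(t), std::get<1>(t)));
}
Kargs MakeKargs(std::tuple<void*, void*> t) {
    return MakeKargsImpl(std::pair<const void*, const void*>(std::get<0>(t), std::get<1>(t)));
}
```

A braced call such as `MakeKargs({1, 2})` still resolves, because integers cannot convert to `void*`, so only the `uint64_t` tuple overload is viable.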
Illia Silin
236cc380f1
fix path of ninjatracing (#1685)
[ROCm/composable_kernel commit: 4c7035ff08]
2024-11-22 08:30:01 -08:00
Harisankar Sadasivan
0d34db594d
universal streamk fp8 changes (#1665)
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: d6d4c2788b]
2024-11-21 08:21:37 -08:00
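Stream-K, the decomposition behind the universal stream-K GEMM above, divides the total number of K-loop iterations (output tiles × K-iterations per tile) evenly across a fixed number of workgroups. A workgroup may therefore stop partway through one tile's K range and continue into the next tile, and the partial tiles are combined afterwards. A minimal host-side sketch of that even split, with hypothetical names and without CK's hybrid data-parallel/stream-K scheduling details:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous span of K-iterations assigned to a workgroup within one tile.
struct Span { int64_t tile; int64_t k_begin; int64_t k_end; };

// Evenly split num_tiles * iters_per_tile iterations over num_wgs workgroups,
// cutting each workgroup's range at tile boundaries.
std::vector<std::vector<Span>> streamk_partition(int64_t num_tiles, int64_t iters_per_tile, int64_t num_wgs) {
    assert(num_tiles > 0 && iters_per_tile > 0 && num_wgs > 0);
    const int64_t total = num_tiles * iters_per_tile;
    std::vector<std::vector<Span>> plan(static_cast<std::size_t>(num_wgs));
    for (int64_t wg = 0; wg < num_wgs; ++wg) {
        int64_t it = wg * total / num_wgs;  // first global iteration of this workgroup
        const int64_t end = (wg + 1) * total / num_wgs;
        while (it < end) {
            const int64_t tile = it / iters_per_tile;
            const int64_t k0 = it % iters_per_tile;
            const int64_t k1 = std::min(iters_per_tile, k0 + (end - it));
            plan[static_cast<std::size_t>(wg)].push_back({tile, k0, k1});
            it += k1 - k0;
        }
    }
    return plan;
}
```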
Po Yen Chen
ed8630416e
[CK_TILE] Add paged-kvcache support in group mode fmha fwd splitkv kernels (#1678)
* Generate group mode paged-attn kernel
* Enable paged-kvcache + group mode support
* Add missing header: fused_moe.hpp
* Add comment to explain kernel arg usage
* Make error message more clear
* Add comment for confusing data member names
* Add more comment for confusing variable names
* Fix typo in option description
[ROCm/composable_kernel commit: fb1ccfa9df]
2024-11-21 14:53:10 +08:00
Po Yen Chen
df950309f9
Add QianFeng to code owners (#1682)
[ROCm/composable_kernel commit: 6916d8cc03]
2024-11-21 14:49:13 +08:00
Illia Silin
a5e58cd17a
Optimize docker file. (#1679)
* reduce the docker image size and layers
* clean up docker file
* fix linker error for client example 24
* install CK into the default /opt/rocm/ path
* restore installing CK to alternative path in CI
* add linking for utility lib
[ROCm/composable_kernel commit: d31e8249c1]
2024-11-20 14:01:04 -08:00
Haocong WANG
15ce93a491
fix bug (#1680)
[ROCm/composable_kernel commit: 81ec5eff4a]
2024-11-20 07:03:56 -08:00
Illia Silin
87ed997f4b
add more fp32 dl gemm instances (#1675)
* add more fp32 dl gemm instances
* update the dates
[ROCm/composable_kernel commit: da0c21f661]
2024-11-19 10:00:17 -08:00
dependabot[bot]
74aa10c051
Bump rocm-docs-core from 1.8.4 to 1.8.5 in /docs/sphinx (#1674)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.4 to 1.8.5.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.5/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.4...v1.8.5)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: e4dfe4d892]
2024-11-18 22:00:18 -08:00
Illia Silin
dc227604bc
Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671)
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean-up commented-out instances
[ROCm/composable_kernel commit: 8aba2724cc]
2024-11-18 14:07:04 -08:00
Bartłomiej Kocot
b89a44ea33
Batched GEMM Multiple D based on Universal GEMM (#1655)
* Batched GEMM Multiple D based on Universal GEMM
Co-authored-by: Jing Zhang <jizhan@fb.com>
* CI fixes
Co-authored-by: Jing Zhang <jizhan@fb.com>
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
[ROCm/composable_kernel commit: 754adc70e3]
2024-11-18 14:03:45 +01:00
dependabot[bot]
78d3df3a47
Bump rocm-docs-core from 1.8.3 to 1.8.4 in /docs/sphinx (#1670)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.3 to 1.8.4.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.4/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.3...v1.8.4)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: efb34741fe]
2024-11-15 18:30:58 -05:00
Illia Silin
6179655750
re-enable fp8 gemms in ckProfiler (#1667)
[ROCm/composable_kernel commit: b4a7904582]
2024-11-14 16:15:01 -08:00
Illia Silin
9da305a0f6
re-enable coerce-illegal-types flag for rocm6.3 (#1668)
[ROCm/composable_kernel commit: 3b6a481e92]
2024-11-14 16:14:50 -08:00
Andriy Roshchenko
f8fc165140
Fix example_convnd_fwd_max_xdl_int8 failures on MI300 (#1666)
* Improve test verbosity.
* BUGFIX: Add missing initialization for reduction buffer
* Change default initialization method
Performance may be affected for fp32 and int8 examples.
* Improve test verbosity
* Cleanup
[ROCm/composable_kernel commit: d805a461aa]
2024-11-14 08:40:50 -08:00
feli
44c37881bd
[Ck_tile] hot fix, fix rpcf param setting err (#1657)
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
[ROCm/composable_kernel commit: c1f8d53ce8]
2024-11-14 14:06:36 +08:00
Illia Silin
d80f50d5e9
fix clang format (#1662)
[ROCm/composable_kernel commit: efd9261545]
2024-11-13 09:20:18 -08:00
Taylor Ding
7c9257128d
Move checks for compatibility from Argument() to IsSupportedArgument() (#1653)
[ROCm/composable_kernel commit: 73f02a1083]
2024-11-13 11:20:38 -05:00
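Moving compatibility checks out of the `Argument()` constructor and into `IsSupportedArgument()` lets a caller build an argument for every candidate instance and simply skip the ones that report themselves unsupported, instead of failing during construction. A minimal sketch of that pattern, with hypothetical `Argument` fields and a made-up alignment rule; CK's real device-op interface has many more members and different signatures:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Argument {  // hypothetical: construction never rejects inputs
    int M, N, K;
};

struct DeviceOp {
    virtual ~DeviceOp() = default;
    virtual bool IsSupportedArgument(const Argument& arg) const = 0;
    virtual std::string Name() const = 0;
};

// Hypothetical instance that needs K to be a multiple of its vector width.
struct VectorizedOp : DeviceOp {
    explicit VectorizedOp(int vec) : vec_(vec) { assert(vec > 0); }
    bool IsSupportedArgument(const Argument& arg) const override { return arg.K % vec_ == 0; }
    std::string Name() const override { return "vec" + std::to_string(vec_); }
    int vec_;
};

// Collect the names of instances that accept this argument.
std::vector<std::string> supported_instances(const std::vector<std::unique_ptr<DeviceOp>>& ops,
                                             const Argument& arg) {
    std::vector<std::string> names;
    for (const auto& op : ops)
        if (op->IsSupportedArgument(arg)) names.push_back(op->Name());
    return names;
}
```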
Bartłomiej Kocot
d10e451e7e
[CK TILE] Update gemm universal pipeline (#1644)
* [CK TILE] Update gemm universal pipeline
* Fixes
* fix
* Rebase
[ROCm/composable_kernel commit: d20735691c]
2024-11-13 11:46:18 +01:00
Illia Silin
6b809fd876
test rocm6.3 rc1 build 20 (#1659)
[ROCm/composable_kernel commit: 489c78d073]
2024-11-12 09:35:33 -08:00
Thomas Ning
6bafdd985c
[CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature
* Modified the test file
* Test case update
* address comment
* Addressed the review comment
* Fixed the CI error
[ROCm/composable_kernel commit: 2b6458ddf2]
2024-11-11 18:08:25 -08:00
Illia Silin
5104a01a8a
restore collecting performance of mixed prec gemms (#1648)
[ROCm/composable_kernel commit: 5fb150dbe7]
2024-11-11 09:25:08 -08:00
valarLip
0531381131
[CK_TILE] add more strides for layernorm to support non-contiguous tensors (#1650)
* [CK_TILE] add more strides for layernorm to support non-contiguous tensors
* align CK coding style
* extend strides to layernorm example
* clang-format...
[ROCm/composable_kernel commit: 8ef8a994e7]
2024-11-11 16:02:28 +08:00
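Supporting non-contiguous tensors comes down to addressing each row through its own stride instead of assuming rows are packed `N` elements apart. A minimal scalar reference sketch of row-wise LayerNorm with separate input and output row strides, assuming the standard biased-variance formulation; the function name and parameters are hypothetical, not the CK Tile kernel arguments:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise LayerNorm over an M x N view whose rows start x_stride (input)
// and y_stride (output) elements apart; strides may exceed N, so the rows
// need not be contiguous in memory.
void layernorm_strided(const float* x, std::size_t x_stride, float* y, std::size_t y_stride,
                       const float* gamma, const float* beta,
                       std::size_t M, std::size_t N, float eps) {
    assert(N > 0 && x_stride >= N && y_stride >= N);
    for (std::size_t m = 0; m < M; ++m) {
        const float* xr = x + m * x_stride;  // start of input row m
        float* yr = y + m * y_stride;        // start of output row m
        float mean = 0.0f;
        for (std::size_t n = 0; n < N; ++n) mean += xr[n];
        mean /= static_cast<float>(N);
        float var = 0.0f;
        for (std::size_t n = 0; n < N; ++n) var += (xr[n] - mean) * (xr[n] - mean);
        var /= static_cast<float>(N);
        const float inv_std = 1.0f / std::sqrt(var + eps);
        for (std::size_t n = 0; n < N; ++n) yr[n] = (xr[n] - mean) * inv_std * gamma[n] + beta[n];
    }
}
```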
Po Yen Chen
f541d4382f
Return nullptr when block index is invalid (#1649)
[ROCm/composable_kernel commit: 13332998a4]
2024-11-11 09:28:32 +08:00
dummycoderfe
eec0fed606
Ck tile/moe sorting (#1624)
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile]fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
[ROCm/composable_kernel commit: bec6fbc65f]
2024-11-09 17:57:27 +08:00
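The "magic div" in the MoE sorting commit replaces integer division by a runtime-fixed divisor with a multiply-high, an add and a shift, using constants precomputed once per divisor. A minimal host-side sketch of the round-up method from Granlund and Montgomery's "Division by Invariant Integers using Multiplication", for 32-bit unsigned operands and computed in 64-bit so the intermediate sum cannot overflow; the struct and function names are hypothetical, and CK's own magic-division helper may use a different formulation or operand range:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical precomputed constants for dividing by a fixed d.
struct MagicDiv {
    uint32_t multiplier;
    uint32_t shift;
};

// Precompute constants for a divisor 1 <= d < 2^32.
MagicDiv make_magic(uint32_t d) {
    assert(d != 0);
    uint32_t shift = 0;
    while ((uint64_t{1} << shift) < d) ++shift;  // shift = ceil(log2(d))
    // m = floor(2^32 * (2^shift - d) / d) + 1, which always fits in 32 bits.
    const uint64_t m = ((uint64_t{1} << 32) * ((uint64_t{1} << shift) - d)) / d + 1;
    return {static_cast<uint32_t>(m), shift};
}

// n / d computed as (mulhi(n, multiplier) + n) >> shift.
uint32_t magic_divide(uint32_t n, MagicDiv md) {
    const uint64_t hi = (uint64_t{n} * md.multiplier) >> 32;  // high 32 bits of the product
    return static_cast<uint32_t>((hi + n) >> md.shift);       // 64-bit sum: no overflow
}
```

The precomputation runs once per divisor (for example once per kernel launch), while the divide itself becomes a handful of integer instructions in the inner loop.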
Po Yen Chen
c35b7e3d61
Fix 'sh' command compatibility of smoke_test_fwd.sh (#1553)
[ROCm/composable_kernel commit: af9546d9f4]
2024-11-09 09:55:14 +08:00