dependabot[bot]
28e02cf524
Bump rocm-docs-core from 1.9.1 to 1.9.2 in /docs/sphinx ( #1702 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.1 to 1.9.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.1...v1.9.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 07:18:43 -08:00
aledudek
78f0fea08e
Ck tile batched gemm example ( #1615 )
...
* [CK Tile] Batched GEMM Example
* [CK Tile] Batched GEMM Example - minor refactor
* [CK Tile] Batched GEMM Example - README update
* [CK Tile] Batched Gemm Example - review changes
- Added tensor data layouts as input parameters
- Changed structure of Host and Kernel args
- Removed bug with invalid vector read on non-contiguous memory
* [CK Tile] Batched Gemm Example - remove comment
* [CK Tile] Batched Gemm Example - Add GTests part1
* [CK Tile] Batched Gemm Example - GTests part2 + review changes
* [CK TILE] Batched GEMM post merge fixes
* [CK Tile] Batched GEMM Example - fix pad views
2024-11-29 11:52:18 +01:00
dependabot[bot]
bb652696e7
Bump rocm-docs-core from 1.9.0 to 1.9.1 in /docs/sphinx ( #1701 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.0 to 1.9.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.0...v1.9.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-28 10:43:36 -08:00
Illia Silin
aa6e2087f5
Reduce docker size and build time in CI. ( #1699 )
...
* refactor docker build in CI
* add Dockerfile.compiler
* add input args to Dockerfile.compiler
* rearrange the docker args
2024-11-28 10:42:19 -08:00
Bartłomiej Kocot
f49b595dc0
[CK TILE] Add gemm compute pipeline v3 ( #1661 )
...
* [CK TILE] Add gemm compute pipeline v3
* Enable universal gemm compute pipeline.
* Rename example and add compute pipeline.
* Introduce ag bg cr pipeline impl base.
* Refactor to reuse code.
* Cleaning
* Formatting.
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2024-11-28 17:51:49 +01:00
jakpiase
e7b6286441
Add interwave scheduler for gemm mem pipeline ( #1647 )
...
* add interwave scheduler for gemm mem pipeline
* Fix merge artifacts.
* Refactor unit tests.
* Switch to interwave scheduler for mem example
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2024-11-27 18:25:07 +01:00
Illia Silin
fe6b185b97
move utility headers from library/include to include path ( #1697 )
2024-11-27 06:12:56 -08:00
Adam Osewski
061ac0649c
Polished Grouped GEMM APIs and new BF16 instances ( #1600 )
...
* Few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
- delete obsolete two stage as it is now covered by grouped gemm
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Review comments.
2024-11-27 13:02:44 +01:00
Illia Silin
cb8c7f42d6
update mainline compiler branch name ( #1696 )
2024-11-26 14:58:35 -08:00
rocking
abae2afc72
support max3 in smoothquant and add + rmsnorm + rdquant ( #1654 )
...
* Fix cmake example build
* Support max3 in smoothquant one pass
* support max3 in two pass
* support max3 in add_rmsnorm_rdquant
2024-11-27 05:01:15 +08:00
Adam Osewski
bfe983a151
Change block gemm pipeline local prefill loop order. ( #1692 )
...
* Fix loop order.
* Fix loop order in pipeline v4
2024-11-26 17:36:53 +01:00
jakpiase
b70f367f80
Add check for bf16 splitk support for grouped gemm splitk ( #1673 )
...
* add check for bf16 splitk support for grouped gemm splitk
* Update if condition
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-11-26 13:56:32 +01:00
Po Yen Chen
cf2d635ea2
[CK_TILE] Fix incorrect computation of group mode PagedAttention ( #1688 )
...
* Allow getting batch size from splitkv tile partitioner
* Fix wrong paged-kvcache impl for group mode
* Fix wrong example code for page-kvcache
* Undo changes in fmha_fwd.cpp
* Always use 2D block table
* Add is_gappy kernel argument for paged-kvcache
The is_gappy argument is used for differentiating seqstart_k_ptr usage
in flash-attention & xformers
* Remove out-of-date comments
* Remove no-longer used method
* Fix wrong # page-block calculation
* Fix wrong comment
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
2024-11-26 20:37:54 +08:00
Adam Osewski
b6bcd76d88
CK-Tile first draft of universal block gemm with interwave & intrawave scheduler ( #1676 )
...
* Block universal gemm.
* Universal block gemm with interwave scheduler - draft.
* Refactoring
* Move a/b_warp_tiles into BlockGemmImpl
* set BlockGemmImpl as a class member
* Change tile size to better suit memory-bound cases.
* Introduce kKPerThread to WarpGemm
* Add documentation comment.
* Fix Interwave scheduler block gemm.
* Add compute/memory friendly tile configuration.
* Clean
* New tile configurations in gemm mem example.
* Add more static checks and fix loop order in block gemm.
* Add more static checks and use warp gemm mfma dispatcher.
* Add default scheduler block gemm.
* Remove logging in example.
2024-11-26 08:45:14 +01:00
carlushuang
440e28b08f
[CK_TILE] fused-moe first version ( #1634 )
...
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
2024-11-26 11:14:56 +08:00
Po Yen Chen
645fe812f6
[CK_TILE] Fix fMHA fwd MakeKargs() compilation errors ( #1689 )
...
* Fix mis-matched tuple<> elem types
* Rename MakeKargs() as MakeKargsImpl()
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
2024-11-25 15:30:35 +08:00
dependabot[bot]
c2bcbb1379
Bump rocm-docs-core from 1.8.5 to 1.9.0 in /docs/sphinx ( #1691 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.5 to 1.9.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.9.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.5...v1.9.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-24 21:41:52 -08:00
carlushuang
36c7ce4e0e
[CK_TILE]Moe update index ( #1672 )
...
* update MOCK_ID for moe-sorting
* add moe-smoothquant
* update a comment
* fix format
* hot fix
* update topk in overflow case
* update comments
* update bf16 cvt
---------
Co-authored-by: valarLip <340077269@qq.com>
2024-11-25 13:12:35 +08:00
Qianfeng
ce2bdf42a9
Change in fwd-splitkv kernel to support num_splits=1 case ( #1690 )
...
* Change in fwd-splitkv kernel to support num_splits=1 case
* Update in codegen fwd-splitkv to make num_splits > 1 cases pass
* Specify instance traits in dispatch
* Fix link error for fp8 kernels
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-11-25 12:31:38 +08:00
Illia Silin
19d4b79039
add --squash flag when building dockers ( #1686 )
2024-11-22 17:16:08 -08:00
Illia Silin
a420b3b34d
add Andriy to the code owners ( #1687 )
2024-11-22 16:30:12 -08:00
schung-amd
ff92222f93
[CK_TILE] MakeKargs overloads for backward compatibility ( #1681 )
...
* Add overloads for MakeKargs
Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples.
* Re-format files using ck_tile remod.py
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-11-23 06:51:35 +08:00
Illia Silin
4c7035ff08
fix path of ninjatracing ( #1685 )
2024-11-22 08:30:01 -08:00
Harisankar Sadasivan
d6d4c2788b
universal streamk fp8 changes ( #1665 )
...
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-11-21 08:21:37 -08:00
Po Yen Chen
fb1ccfa9df
[CK_TILE] Add paged-kvcache support in group mode fmha fwd splitkv kernels ( #1678 )
...
* Generate group mode paged-attn kernel
* Enable paged-kvcache + group mode support
* Add missing header: fused_moe.hpp
* Add comment to explain kernel arg usage
* Make error message more clear
* Add comment for confusing data member names
* Add more comments for confusing variable names
* Fix typo in option description
2024-11-21 14:53:10 +08:00
Po Yen Chen
6916d8cc03
Add QianFeng to code owners ( #1682 )
2024-11-21 14:49:13 +08:00
Illia Silin
d31e8249c1
Optimize docker file. ( #1679 )
...
* reduce the docker image size and layers
* clean up docker file
* fix linker error for client example 24
* install CK into the default /opt/rocm/ path
* restore installing CK to alternative path in CI
* add linking for utility lib
2024-11-20 14:01:04 -08:00
Haocong WANG
81ec5eff4a
fix bug ( #1680 )
2024-11-20 07:03:56 -08:00
Illia Silin
da0c21f661
add more fp32 dl gemm instances ( #1675 )
...
* add more fp32 dl gemm instances
* update the dates
2024-11-19 10:00:17 -08:00
dependabot[bot]
e4dfe4d892
Bump rocm-docs-core from 1.8.4 to 1.8.5 in /docs/sphinx ( #1674 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.4 to 1.8.5.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.5/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.4...v1.8.5)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-18 22:00:18 -08:00
Illia Silin
8aba2724cc
Add bf16 and int8 wmma gemms for Navi3x and Navi4x. ( #1671 )
...
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean-up commented-out instances
2024-11-18 14:07:04 -08:00
Bartłomiej Kocot
754adc70e3
Batched GEMM Multiple D based on Universal GEMM ( #1655 )
...
* Batched GEMM Multiple D based on Universal GEMM
Co-authored-by: Jing Zhang <jizhan@fb.com>
* CI fixes
Co-authored-by: Jing Zhang <jizhan@fb.com>
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
2024-11-18 14:03:45 +01:00
dependabot[bot]
efb34741fe
Bump rocm-docs-core from 1.8.3 to 1.8.4 in /docs/sphinx ( #1670 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.3 to 1.8.4.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.4/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.3...v1.8.4)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-15 18:30:58 -05:00
Illia Silin
b4a7904582
re-enable fp8 gemms in ckProfiler ( #1667 )
2024-11-14 16:15:01 -08:00
Illia Silin
3b6a481e92
re-enable coerce-illegal-types flag for rocm6.3 ( #1668 )
2024-11-14 16:14:50 -08:00
Andriy Roshchenko
d805a461aa
Fix example_convnd_fwd_max_xdl_int8 failures on MI300 ( #1666 )
...
* Improve test verbosity.
* BUGFIX: Add missing initialization for reduction buffer
* Change default initialization method
Performance may be affected for fp32 and int8 examples.
* Improve test verbosity
* Cleanup
2024-11-14 08:40:50 -08:00
feli
c1f8d53ce8
[Ck_tile] hot fix, fix rpcf param setting err ( #1657 )
...
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-14 14:06:36 +08:00
Illia Silin
efd9261545
fix clang format ( #1662 )
2024-11-13 09:20:18 -08:00
Taylor Ding
73f02a1083
Move checks for compatibility from Argument() to IsSupportedArgument() ( #1653 )
2024-11-13 11:20:38 -05:00
Bartłomiej Kocot
d20735691c
[CK TILE] Update gemm universal pipeline ( #1644 )
...
* [CK TILE] Update gemm universal pipeline
* Fixes
* fix
* Rebase
2024-11-13 11:46:18 +01:00
Illia Silin
489c78d073
test rocm6.3 rc1 build 20 ( #1659 )
2024-11-12 09:35:33 -08:00
Thomas Ning
2b6458ddf2
[CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM ( #1651 )
...
* Finished the feature
* Modified the test file
* Test case update
* address comment
* Addressed the review comment
* Fixed the CI error
2024-11-11 18:08:25 -08:00
Illia Silin
5fb150dbe7
restore collecting performance of mixed prec gemms ( #1648 )
2024-11-11 09:25:08 -08:00
valarLip
8ef8a994e7
[CK_TILE] add more strides for layernorm to support non-contiguous tensors ( #1650 )
...
* [CK_TILE] add more strides for layernorm to support non-contiguous tensors
* align CK coding style
* extend strides to layernorm example
* clang-format...
2024-11-11 16:02:28 +08:00
Po Yen Chen
13332998a4
Return nullptr when block index is invalid ( #1649 )
2024-11-11 09:28:32 +08:00
dummycoderfe
bec6fbc65f
Ck tile/moe sorting ( #1624 )
...
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile]fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00
Po Yen Chen
af9546d9f4
Fix 'sh' command compatibility of smoke_test_fwd.sh ( #1553 )
2024-11-09 09:55:14 +08:00
Bartłomiej Kocot
ea3640fdea
Add generic instances for two stage conv bwd wei ( #1643 )
...
* Add generic instances for two stage conv bwd wei
* Update layout prefix
2024-11-08 10:04:33 +01:00
dummycoderfe
686a58a912
[Ck tile] layernorm2d fwd optimize ( #1637 )
...
* optimize small N case using vec io and using rcp div
* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass
* [Ck_tile] fix blockSize compute in Generic2dBlockShape
* [Ck_tile]fix kfastfdiv template style
* [Ck_tile] layernorm, fix style in review
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-08 12:28:23 +08:00
Illia Silin
75c5bfa364
enable compilation for generic navi targets ( #1645 )
2024-11-07 14:14:42 -08:00