coderfeli
99c8123f48
change to static
2024-12-04 08:47:59 +00:00
coderfeli
928b6d1a56
split smem to 2array, but still same
2024-12-04 08:12:06 +00:00
coderfeli
c275904b7d
try to fix hint
2024-12-03 06:53:32 +00:00
coderfeli
730c5fffeb
fix linear
2024-12-03 08:32:57 +08:00
coderfeli
4525c5d75c
merge upstream
2024-12-02 05:33:33 +00:00
coderfeli
a8d88d8df8
tmp before merge
2024-12-02 04:12:57 +00:00
coderfeli
c7d08b7c2a
use hasmainloop; no spill for 3tail
2024-12-01 04:50:58 +00:00
Max Podkorytov
44828b7c0f
[Python] Add batched gemm instances parsing (#1684)
...
* add op
* do not insert ds parameters as they are already parsed
* reset ds parameters
* apply ruff
2024-11-30 08:11:42 -08:00
Bartłomiej Kocot
cff7fab798
[CK TILE] Fix universal gemm template keywords (#1704)
2024-11-29 20:51:09 -08:00
coderfeli
532eb870c5
fix warning and use default epilog and one out
2024-11-30 03:00:48 +00:00
dependabot[bot]
28e02cf524
Bump rocm-docs-core from 1.9.1 to 1.9.2 in /docs/sphinx (#1702)
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.1 to 1.9.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.1...v1.9.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 07:18:43 -08:00
root
613e45b961
cshuffle v2 result correct, but perf awful
2024-11-29 15:09:36 +00:00
aledudek
78f0fea08e
Ck tile batched gemm example (#1615)
...
* [CK Tile] Batched GEMM Example
* [CK Tile] Batched GEMM Example - minor refactor
* [CK Tile] Batched GEMM Example - README update
* [CK Tile] Batched Gemm Example - review changes
- Added tensor data layouts as input parameters
- Changed structure of Host and Kernel args
- Removed bug with invalid vector read on non-contiguous memory
* [CK Tile] Batched Gemm Example - remove comment
* [CK Tile] Batched Gemm Example - Add GTests part1
* [CK Tile] Batched Gemm Example - GTests part2 + review changes
* [CK TILE] Batched GEMM post merge fixes
* [CK Tile] Batched GEMM Example - fix pad views
2024-11-29 11:52:18 +01:00
dependabot[bot]
bb652696e7
Bump rocm-docs-core from 1.9.0 to 1.9.1 in /docs/sphinx (#1701)
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.0 to 1.9.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.0...v1.9.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-28 10:43:36 -08:00
Illia Silin
aa6e2087f5
Reduce docker size and build time in CI. (#1699)
...
* refactor docker build in CI
* add Dockerfile.compiler
* add input args to Dockerfile.compiler
* rearrange the docker args
2024-11-28 10:42:19 -08:00
Bartłomiej Kocot
f49b595dc0
[CK TILE] Add gemm compute pipeline v3 (#1661)
...
* [CK TILE] Add gemm compute pipeline v3
* Enable universal gemm compute pipeline.
* Rename example and add compute pipeline.
* Introduce ag bg cr pipeline impl base.
* Refactor to reuse code.
* Cleaning
* Formatting.
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2024-11-28 17:51:49 +01:00
coderfeli
801f995cb4
tmp:add smem cshuffle code but not debug
2024-11-28 14:38:56 +00:00
coderfeli
5a2d93d436
revert code
2024-11-28 10:33:42 +00:00
coderfeli
6a07464b56
change ways but still could not use immediate data as ds_read
2024-11-28 08:54:32 +00:00
jakpiase
e7b6286441
Add interwave scheduler for gemm mem pipeline (#1647)
...
* add interwave scheduler for gemm mem pipeline
* Fix merge artifacts.
* Refactor unit tests.
* Switch to interwave scheduler for mem example
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2024-11-27 18:25:07 +01:00
Illia Silin
fe6b185b97
move utility headers from library/include to include path (#1697)
2024-11-27 06:12:56 -08:00
Adam Osewski
061ac0649c
Polished Grouped GEMM APIs and new BF16 instances (#1600)
...
* Few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
- delete obsolete two stage as it is now covered by grouped gemm
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Review comments.
2024-11-27 13:02:44 +01:00
dummycoderfe
405c05c0be
add prefetch and fix output err
2024-11-27 11:10:28 +00:00
dummycoderfe
6c270303b3
change pipelines to v4. compile ok
2024-11-27 09:12:15 +00:00
Illia Silin
cb8c7f42d6
update mainline compiler branch name (#1696)
2024-11-26 14:58:35 -08:00
rocking
abae2afc72
support max3 in smoothquant and add+ rmsnorm + rdquant (#1654)
...
* Fix cmake example build
* Support max3 in smoothquant one pass
* support max3 in two pass
* support max3 in add_rmsnorm_rdquant
2024-11-27 05:01:15 +08:00
Adam Osewski
bfe983a151
Change block gemm pipeline local prefill loop order. (#1692)
...
* Fix loop order.
* Fix loop order in pipeline v4
2024-11-26 17:36:53 +01:00
jakpiase
b70f367f80
Add check for bf16 splitk support for grouped gemm splitk (#1673)
...
* add check for bf16 splitk support for grouped gemm splitk
* Update if condition
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-11-26 13:56:32 +01:00
Po Yen Chen
cf2d635ea2
[CK_TILE] Fix incorrect computation of group mode PagedAttention (#1688)
...
* Allow getting batch size from splitkv tile partitioner
* Fix wrong paged-kvcache impl for group mode
* Fix wrong example code for page-kvcache
* Undo changes in fmha_fwd.cpp
* Always use 2D block table
* Add is_gappy kernel argument for paged-kvcache
The is_gappy argument is used for differentiating seqstart_k_ptr usage
in flash-attention & xformers
* Remove out-of-date comments
* Remove no-longer used method
* Fix wrong # page-block calculation
* Fix wrong comment
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
2024-11-26 20:37:54 +08:00
dummycoderfe
c808fa6502
change xdl to best now as ref
2024-11-26 09:43:01 +00:00
Adam Osewski
b6bcd76d88
CK-Tile first draft of universal block gemm with interwave & intrawave scheduler (#1676)
...
* Block universal gemm.
* Universal block gemm with interwave scheduler - draft.
* Refactoring
* Move a/b_warp_tiles into BlockGemmImpl
* set BlockGemmImpl as a class member
* Change tile size to better suit memory-bound cases.
* Introduce kKPerThread to WarpGemm
* Add documentation comment.
* Fix Interwave scheduler block gemm.
* Add compute/memory friendly tile configuration.
* Clean
* New tile configurations in gemm mem example.
* Add more static checks and fix loop order in block gemm.
* Add more static checks and use warp gemm mfma dispatcher.
* Add default scheduler block gemm.
* Remove logging in example.
2024-11-26 08:45:14 +01:00
coderfeli
e511bb7803
lds a,b ok
2024-11-26 13:18:02 +08:00
carlushuang
440e28b08f
[CK_TILE] fused-moe first version (#1634)
...
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
2024-11-26 11:14:56 +08:00
Po Yen Chen
645fe812f6
[CK_TILE] Fix fMHA fwd MakeKargs() compilation errors (#1689)
...
* Fix mis-matched tuple<> elem types
* Rename MakeKargs() as MakeKargsImpl()
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
2024-11-25 15:30:35 +08:00
dependabot[bot]
c2bcbb1379
Bump rocm-docs-core from 1.8.5 to 1.9.0 in /docs/sphinx (#1691)
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.5 to 1.9.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.9.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.5...v1.9.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-24 21:41:52 -08:00
carlushuang
36c7ce4e0e
[CK_TILE] Moe update index (#1672)
...
* update MOCK_ID for moe-sorting
* add moe-smoothquant
* update a comment
* fix format
* hot fix
* update topk in overflow case
* update comments
* update bf16 cvt
---------
Co-authored-by: valarLip <340077269@qq.com>
2024-11-25 13:12:35 +08:00
Qianfeng
ce2bdf42a9
Change in fwd-splitkv kernel to support num_splits=1 case (#1690)
...
* Change in fwd-splitkv kernel to support num_splits=1 case
* Update in codegen fwd-splitkv to make num_splits > 1 cases pass
* Specify instance traits in dispatch
* Fix link error for fp8 kernels
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-11-25 12:31:38 +08:00
Illia Silin
19d4b79039
add --squash flag when building dockers (#1686)
2024-11-22 17:16:08 -08:00
Illia Silin
a420b3b34d
add Andriy to the code owners (#1687)
2024-11-22 16:30:12 -08:00
schung-amd
ff92222f93
[CK_TILE] MakeKargs overloads for backward compatibility (#1681)
...
* Add overloads for MakeKargs
Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples.
* Re-format files using ck_tile remod.py
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-11-23 06:51:35 +08:00
Illia Silin
4c7035ff08
fix path of ninjatracing (#1685)
2024-11-22 08:30:01 -08:00
dummycoderfe
d51f4e5290
use 32x32x8 ok, fix scratch store
2024-11-22 14:26:01 +00:00
Bartlomiej Kocot
bc4366d4da
[CK TILE] Add gemm compute pipeline v3
2024-11-22 08:50:24 +00:00
Harisankar Sadasivan
d6d4c2788b
universal streamk fp8 changes (#1665)
...
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-11-21 08:21:37 -08:00
Po Yen Chen
fb1ccfa9df
[CK_TILE] Add paged-kvcache support in group mode fmha fwd splitkv kernels (#1678)
...
* Generate group mode paged-attn kernel
* Enable paged-kvcache + group mode support
* Add missing header: fused_moe.hpp
* Add comment to explain kernel arg usage
* Make error message more clear
* Add comment for confusing data member names
* Add more comment for confusing variable names
* Fix typo in option description
2024-11-21 14:53:10 +08:00
Po Yen Chen
6916d8cc03
Add QianFeng to code owners (#1682)
2024-11-21 14:49:13 +08:00
Illia Silin
d31e8249c1
Optimize docker file. (#1679)
...
* reduce the docker image size and layers
* clean up docker file
* fix linker error for client example 24
* install CK into the default /opt/rocm/ path
* restore installing CK to alternative path in CI
* add linking for utility lib
2024-11-20 14:01:04 -08:00
Haocong WANG
81ec5eff4a
fix bug (#1680)
2024-11-20 07:03:56 -08:00
Illia Silin
da0c21f661
add more fp32 dl gemm instances (#1675)
...
* add more fp32 dl gemm instances
* update the dates
2024-11-19 10:00:17 -08:00
dependabot[bot]
e4dfe4d892
Bump rocm-docs-core from 1.8.4 to 1.8.5 in /docs/sphinx (#1674)
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.4 to 1.8.5.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.5/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.4...v1.8.5)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-18 22:00:18 -08:00