Commit Graph

1547 Commits

Author SHA1 Message Date
rocking
3dcaedda15 support max3 in smoothquant and add+ rmsnorm + rdquant (#1654)
* Fix cmake example build

* Support max3 in smoothquant one pass

* support max3 in two pass

* support max3 in add_rmsnorm_rdquant

[ROCm/composable_kernel commit: abae2afc72]
2024-11-27 05:01:15 +08:00
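
The max3 entries above are terse. This sketch makes two assumptions: that "max3" refers to a three-operand maximum (such as the AMD GPU `v_max3_f32` instruction), and that the smoothquant kernel derives a per-row scale from the absolute maximum of the smoothed input. Under those assumptions, the reduction can fold two new elements into the running maximum at each step. The Python sketch below shows only that idea. `max3`, `row_absmax_max3`, and `smoothquant_row` are hypothetical names, not CK code.

```python
def max3(a: float, b: float, c: float) -> float:
    # Stand-in for a three-operand max instruction (assumed v_max3_f32-like).
    return max(a, b, c)


def row_absmax_max3(row: list) -> float:
    """Absolute-max reduction that folds two new elements per step via max3."""
    acc = 0.0
    i = 0
    # Each step consumes two elements, so a row of length n needs about n/2
    # max3 operations instead of n two-operand max operations.
    while i + 1 < len(row):
        acc = max3(acc, abs(row[i]), abs(row[i + 1]))
        i += 2
    if i < len(row):  # odd-length tail
        acc = max(acc, abs(row[i]))
    return acc


def smoothquant_row(x: list, smooth_scale: list, qmax: int = 127):
    """Per-row SmoothQuant-style int8 quantization (illustrative sketch).

    y = x * smooth_scale (per column); the row scale is absmax(y) / qmax.
    """
    y = [xi * si for xi, si in zip(x, smooth_scale)]
    absmax = row_absmax_max3(y)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in y]
    return q, scale
```

A one-pass kernel would compute the row maximum and quantize in a single sweep over the row. A two-pass kernel would reduce first and quantize on a second read. Both need the same absmax reduction.
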
Adam Osewski
f5a55a2a41 Change block gemm pipeline local prefill loop order. (#1692)
* Fix loop order.

* Fix loop order in pipeline v4

[ROCm/composable_kernel commit: bfe983a151]
2024-11-26 17:36:53 +01:00
jakpiase
cae95c5d46 Add check for bf16 splitk support for grouped gemm splitk (#1673)
* add check for bf16 splitk support for grouped gemm splitk

* Update if condition

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: b70f367f80]
2024-11-26 13:56:32 +01:00
Po Yen Chen
b00d5d4750 [CK_TILE] Fix incorrect computation of group mode PagedAttention (#1688)
* Allow getting batch size from splitkv tile partitioner

* Fix wrong paged-kvcache impl for group mode

* Fix wrong example code for page-kvcache

* Undo changes in fmha_fwd.cpp

* Always use 2D block table

* Add is_gappy kernel argument for paged-kvcache

The is_gappy argument distinguishes how seqstart_k_ptr is used
by flash-attention and by xformers.

* Remove out-of-date comments

* Remove no-longer used method

* Fix wrong # page-block calculation

* Fix wrong comment

---------

Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>

[ROCm/composable_kernel commit: cf2d635ea2]
2024-11-26 20:37:54 +08:00
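
As background for the page-block fixes above, consider a common paged-KV-cache layout (assumed here, not taken from the CK sources). Each sequence's keys and values are stored in fixed-size page blocks. A 2D block table of shape [batch][max_blocks_per_seq] maps logical blocks to physical ones. In group mode, per-sequence lengths are assumed to come from prefix offsets (seqstart). The helpers below are hypothetical and show the page-block count and the lookup.

```python
def seqlens_from_seqstart(seqstart: list) -> list:
    # Group mode (assumed layout): sequence b spans [seqstart[b], seqstart[b+1]).
    return [seqstart[b + 1] - seqstart[b] for b in range(len(seqstart) - 1)]


def num_page_blocks(seqlen_k: int, page_block_size: int) -> int:
    # Ceiling division: a partially filled last page still needs a whole block.
    return (seqlen_k + page_block_size - 1) // page_block_size


def physical_location(block_table: list, batch: int, token_idx: int,
                      page_block_size: int):
    """Translate (batch, logical token index) into (physical block, offset).

    block_table is 2D: block_table[batch][logical_block] -> physical block id.
    """
    logical_block = token_idx // page_block_size
    offset = token_idx % page_block_size
    return block_table[batch][logical_block], offset
```

For example, with a page block size of 16, a 40-token sequence occupies 3 page blocks. Token 35 of that sequence lives at offset 3 of its third logical block.
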
Adam Osewski
25f646f00c CK-Tile first draft of universal block gemm with interwave & intrawave scheduler (#1676)
* Block universal gemm.

* Universal block gemm with interwave scheduler - draft.

* Refactoring

* Move a/b_warp_tiles into BlockGemmImpl
* set BlockGemmImpl as a class member

* Change tile size to be more suitable for memory-bound cases.

* Introduce kKPerThread to WarpGemm

* Add documentation comment.

* Fix Interwave scheduler block gemm.

* Add compute/memory friendly tile configuration.

* Clean

* New tile configurations in gemm mem example.

* Add more static checks and fix loop order in block gemm.

* Add more static checks and use warp gemm mfma dispatcher.

* Add default scheduler block gemm.

* Remove logging in example.

[ROCm/composable_kernel commit: b6bcd76d88]
2024-11-26 08:45:14 +01:00
carlushuang
74b0db75f7 [CK_TILE] fused-moe first version (#1634)
* moe pipeline

* update code

* compile OK

* update

* update cpu reference

* update pipeline_gemm0

* compiler ok

* update pipeline

* rename to ex pipeline

* block-asm

* update

* update

* update first gemm ok

* compute correct

* update file structure

* update README

* update

* update

* update code

* update API

* return unsupported case

* add comment

* update readme

* update

* uncomment

* update

* fix build err

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 440e28b08f]
2024-11-26 11:14:56 +08:00
Po Yen Chen
69aef1e11e [CK_TILE] Fix fMHA fwd MakeKargs() compilation errors (#1689)
* Fix mismatched tuple<> elem types

* Rename MakeKargs() as MakeKargsImpl()

---------

Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>

[ROCm/composable_kernel commit: 645fe812f6]
2024-11-25 15:30:35 +08:00
dependabot[bot]
57fbba49e2 Bump rocm-docs-core from 1.8.5 to 1.9.0 in /docs/sphinx (#1691)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.5 to 1.9.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.9.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.5...v1.9.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: c2bcbb1379]
2024-11-24 21:41:52 -08:00
carlushuang
d6ab951548 [CK_TILE]Moe update index (#1672)
* update MOCK_ID for moe-sorting

* add moe-smoothquant

* update a comment

* fix format

* hot fix

* update topk in overflow case

* update comments

* update bf16 cvt

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 36c7ce4e0e]
2024-11-25 13:12:35 +08:00
Qianfeng
2a36aa8f41 Change in fwd-splitkv kernel to support num_splits=1 case (#1690)
* Change in fwd-splitkv kernel to support num_splits=1 case

* Update in codegen fwd-splitkv to make num_splits > 1 cases pass

* Specify instance traits in dispatch

* Fix link error for fp8 kernels

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: ce2bdf42a9]
2024-11-25 12:31:38 +08:00
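
The following is context for the num_splits=1 change. It is an assumption-level illustration of split-KV (flash-decoding style) attention, not CK's kernel. Partial outputs are computed over chunks of the key/value sequence. The partials are then merged using their log-sum-exp values. With num_splits=1 there is a single partial, and the merge reduces to an identity on it. `attend_chunk` and `splitkv_attention` are hypothetical names.

```python
import math


def attend_chunk(q: list, keys: list, values: list):
    """Softmax attention over one KV chunk; returns (output, log-sum-exp)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(len(values[0]))]
    return out, m + math.log(denom)


def splitkv_attention(q: list, keys: list, values: list, num_splits: int):
    """Split KV into at most num_splits chunks and merge partials by LSE."""
    n = len(keys)
    chunk = -(-n // num_splits)  # ceiling division; may yield fewer chunks
    partials = [attend_chunk(q, keys[s:s + chunk], values[s:s + chunk])
                for s in range(0, n, chunk)]
    # Weight each partial by exp(lse_i), rescaled by the max LSE for stability.
    lse_max = max(lse for _, lse in partials)
    scales = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(scales)
    dim = len(values[0])
    return [sum(sc * out[d] for sc, (out, _) in zip(scales, partials)) / total
            for d in range(dim)]
```

Every split count produces the same result as unsplit attention, up to rounding. Only the amount of parallel work and the cost of the combine step change.
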
Illia Silin
014f1cefd5 add --squash flag when building dockers (#1686)
[ROCm/composable_kernel commit: 19d4b79039]
2024-11-22 17:16:08 -08:00
Illia Silin
a023f82827 add Andriy to the code owners (#1687)
[ROCm/composable_kernel commit: a420b3b34d]
2024-11-22 16:30:12 -08:00
schung-amd
e7c3e5b34b [CK_TILE] MakeKargs overloads for backward compatibility (#1681)
* Add overloads for MakeKargs

Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples.

* Re-format files using ck_tile remod.py

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: ff92222f93]
2024-11-23 06:51:35 +08:00
Illia Silin
a56baa8aeb fix path of ninjatracing (#1685)
[ROCm/composable_kernel commit: 4c7035ff08]
2024-11-22 08:30:01 -08:00
Harisankar Sadasivan
5a5bfe14f4 universal streamk fp8 changes (#1665)
* universal streamk fp8 changes & ckprofiler instances

* revert strides to -1 and verification options

* fp8 exclusion on pre-gfx94 for universal_streamk

* PR review based revisions: permissions reverted, removed hip err checks

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: d6d4c2788b]
2024-11-21 08:21:37 -08:00
Po Yen Chen
326639e80c [CK_TILE] Add paged-kvcache support in group mode fmha fwd splitkv kernels (#1678)
* Generate group mode paged-attn kernel

* Enable paged-kvcache + group mode support

* Add missing header: fused_moe.hpp

* Add comment to explain kernel arg usage

* Make error message more clear

* Add comment for confusing data member names

* Add more comments for confusing variable names

* Fix typo in option description

[ROCm/composable_kernel commit: fb1ccfa9df]
2024-11-21 14:53:10 +08:00
Po Yen Chen
59005fc1de Add QianFeng to code owners (#1682)
[ROCm/composable_kernel commit: 6916d8cc03]
2024-11-21 14:49:13 +08:00
Illia Silin
d42e9795ed Optimize docker file. (#1679)
* reduce the docker image size and layers

* clean up docker file

* fix linker error for client example 24

* install CK into the default /opt/rocm/ path

* restore installing CK to alternative path in CI

* add linking for utility lib

[ROCm/composable_kernel commit: d31e8249c1]
2024-11-20 14:01:04 -08:00
Haocong WANG
d31a1bb7df fix bug (#1680)
[ROCm/composable_kernel commit: 81ec5eff4a]
2024-11-20 07:03:56 -08:00
Illia Silin
f4b13d3ffd add more fp32 dl gemm instances (#1675)
* add more fp32 dl gemm instances

* update the dates

[ROCm/composable_kernel commit: da0c21f661]
2024-11-19 10:00:17 -08:00
dependabot[bot]
54b0206887 Bump rocm-docs-core from 1.8.4 to 1.8.5 in /docs/sphinx (#1674)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.4 to 1.8.5.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.5/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.4...v1.8.5)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: e4dfe4d892]
2024-11-18 22:00:18 -08:00
Illia Silin
ea702b3631 Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671)
* add bf16 gemms for gfx11/gfx12

* reduce the input values in test_gemm

* add int8 wmma gemm instances for gfx11/gfx12

* add example gemm_wmma_int8

* fix bug in gemm_wmma_int8 test

* increase bf16 gemm test tolerance

* update the dates and clean-up commented-out instances

[ROCm/composable_kernel commit: 8aba2724cc]
2024-11-18 14:07:04 -08:00
Bartłomiej Kocot
929a9183dc Batched GEMM Multiple D based on Universal GEMM (#1655)
* Batched GEMM Multiple D based on Universal GEMM

Co-authored-by: Jing Zhang <jizhan@fb.com>

* CI fixes

Co-authored-by: Jing Zhang <jizhan@fb.com>

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>

[ROCm/composable_kernel commit: 754adc70e3]
2024-11-18 14:03:45 +01:00
dependabot[bot]
af52a2eb20 Bump rocm-docs-core from 1.8.3 to 1.8.4 in /docs/sphinx (#1670)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.3 to 1.8.4.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.4/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.3...v1.8.4)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: efb34741fe]
2024-11-15 18:30:58 -05:00
Illia Silin
c01a539b62 re-enable fp8 gemms in ckProfiler (#1667)
[ROCm/composable_kernel commit: b4a7904582]
2024-11-14 16:15:01 -08:00
Illia Silin
0a0f2c1147 re-enable coerce-illegal-types flag for rocm6.3 (#1668)
[ROCm/composable_kernel commit: 3b6a481e92]
2024-11-14 16:14:50 -08:00
Andriy Roshchenko
2b5daba133 Fix example_convnd_fwd_max_xdl_int8 failures on MI300 (#1666)
* Improve test verbosity.

* BUGFIX: Add missing initialization for reduction buffer

* Change default initialization method

Performance may be affected for fp32 and int8 examples.

* Improve test verbosity

* Cleanup

[ROCm/composable_kernel commit: d805a461aa]
2024-11-14 08:40:50 -08:00
feli
0765bd5201 [Ck_tile] hot fix, fix rpcf param setting err (#1657)
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>

[ROCm/composable_kernel commit: c1f8d53ce8]
2024-11-14 14:06:36 +08:00
Illia Silin
41d3c5ab96 fix clang format (#1662)
[ROCm/composable_kernel commit: efd9261545]
2024-11-13 09:20:18 -08:00
Taylor Ding
31e67ccb79 Move checks for compatibility from Argument() to IsSupportedArgument() (#1653)
[ROCm/composable_kernel commit: 73f02a1083]
2024-11-13 11:20:38 -05:00
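
The commit title describes a common split in CK device operations. The argument object only records the problem description. IsSupportedArgument() decides whether a given kernel instance can run that problem, so callers (for example, a profiler looping over instances) can skip unsupported instances without relying on errors raised during construction. The Python sketch below mirrors that split. The class names, `K_PER_BLOCK`, and the specific checks are hypothetical.

```python
class GemmArgument:
    """Hypothetical argument bundle: construction only stores values."""

    def __init__(self, M: int, N: int, K: int, stride_a: int):
        self.M, self.N, self.K, self.stride_a = M, N, K, stride_a


class HypotheticalDeviceGemm:
    K_PER_BLOCK = 32  # assumed tile constraint, for illustration only

    def is_supported_argument(self, arg: GemmArgument) -> bool:
        # All compatibility checks live here rather than in the argument
        # constructor, so probing an instance never raises.
        if arg.K % self.K_PER_BLOCK != 0:
            return False
        if arg.stride_a < arg.K:  # row-major A needs a row stride >= K
            return False
        return True


def pick_instance(instances: list, arg: GemmArgument):
    """Return the first instance that reports support, else None."""
    for inst in instances:
        if inst.is_supported_argument(arg):
            return inst
    return None
```
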
Bartłomiej Kocot
d304f85838 [CK TILE] Update gemm universal pipeline (#1644)
* [CK TILE] Update gemm universal pipeline

* Fixes

* fix

* Rebase

[ROCm/composable_kernel commit: d20735691c]
2024-11-13 11:46:18 +01:00
Illia Silin
2a9a3c062c test rocm6.3 rc1 build 20 (#1659)
[ROCm/composable_kernel commit: 489c78d073]
2024-11-12 09:35:33 -08:00
Thomas Ning
04c3062b89 [CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature

* Modified the test file

* Test case update

* address comment

* Addressed the review comment

* Fixed the CI error

[ROCm/composable_kernel commit: 2b6458ddf2]
2024-11-11 18:08:25 -08:00
Illia Silin
40e53c828f restore collecting performance of mixed prec gemms (#1648)
[ROCm/composable_kernel commit: 5fb150dbe7]
2024-11-11 09:25:08 -08:00
valarLip
bc55a7d920 [CK_TILE] add more strides for layernorm to support non-contiguous tensors (#1650)
* [CK_TILE] add more strides for layernorm to support non-contiguous tensors

* align CK coding style

* extend strides to layernorm example

* clang-format...

[ROCm/composable_kernel commit: 8ef8a994e7]
2024-11-11 16:02:28 +08:00
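
The layernorm change adds strides so the kernel can read non-contiguous inputs. Assume a row stride that may exceed the row length, for example when the input is a slice of a wider tensor. Element (r, c) then lives at offset r * row_stride + c. The Python reference below illustrates that addressing. The function and parameter names are hypothetical, not CK's API.

```python
import math


def layernorm2d_fwd_strided(x: list, rows: int, cols: int, x_row_stride: int,
                            gamma: list, beta: list,
                            y: list, y_row_stride: int, eps: float = 1e-5):
    """Row-wise layernorm over a 2D view of a flat buffer.

    Element (r, c) of the input is x[r * x_row_stride + c], so padding
    between rows (a non-contiguous view) is skipped, never read.
    """
    for r in range(rows):
        base = r * x_row_stride
        row = x[base:base + cols]
        mean = sum(row) / cols
        var = sum((v - mean) ** 2 for v in row) / cols
        inv_std = 1.0 / math.sqrt(var + eps)
        out_base = r * y_row_stride
        for c in range(cols):
            y[out_base + c] = (row[c] - mean) * inv_std * gamma[c] + beta[c]
```

With a stride of 5 and 3 columns, the two padding values per row are never touched. The result therefore matches a packed input with stride 3.
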
Po Yen Chen
c6de3e22d0 Return nullptr when block index is invalid (#1649)
[ROCm/composable_kernel commit: 13332998a4]
2024-11-11 09:28:32 +08:00
dummycoderfe
77f0f4ee48 Ck tile/moe sorting (#1624)
* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magic div to accelerate computation

* add a loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile]fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: bec6fbc65f]
2024-11-09 17:57:27 +08:00
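
One item above mentions "magic div". Magic division replaces integer division by a runtime-invariant divisor with a precomputed multiply-high, an add, and a shift. This is typically cheaper than a general integer division on GPUs. The Python sketch below shows one common formulation: a Granlund-Montgomery style rounded-up multiplier with shift = ceil(log2(d)). It is not taken from the MoE sorting kernel, and `magic_numbers` and `magic_divide` are hypothetical names.

```python
def magic_numbers(divisor: int):
    """Multiplier/shift pair for unsigned 32-bit division by `divisor`."""
    assert 1 <= divisor <= 2**31
    shift = 0
    while (1 << shift) < divisor:  # shift = ceil(log2(divisor))
        shift += 1
    # multiplier + 2**32 == floor(2**(32 + shift) / divisor) + 1
    multiplier = (((1 << shift) - divisor) << 32) // divisor + 1
    return multiplier, shift


def magic_divide(n: int, multiplier: int, shift: int) -> int:
    """Compute n // divisor using a 32-bit multiply-high, an add, and a shift."""
    hi = (n * multiplier) >> 32  # emulates a 32-bit mulhi instruction
    # On 32-bit hardware hi + n must not overflow, so keep n < 2**31.
    return (hi + n) >> shift
```

The magic numbers are computed once on the host for a fixed divisor (for example, a block size). Each GPU thread then only pays for the multiply, add, and shift.
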
Po Yen Chen
ae9d04ac98 Fix 'sh' command compatibility of smoke_test_fwd.sh (#1553)
[ROCm/composable_kernel commit: af9546d9f4]
2024-11-09 09:55:14 +08:00
Bartłomiej Kocot
783dc82064 Add generic instances for two stage conv bwd wei (#1643)
* Add generic instances for two stage conv bwd wei

* Update layout prefix

[ROCm/composable_kernel commit: ea3640fdea]
2024-11-08 10:04:33 +01:00
dummycoderfe
7ba8518112 [Ck tile] layernorm2d fwd optimize (#1637)
* optimize small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile]fix kfastfdiv template style

* [Ck_tile] layernorm, fix style in review

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>

[ROCm/composable_kernel commit: 686a58a912]
2024-11-08 12:28:23 +08:00
Illia Silin
123aae9e6e enable compilation for generic navi targets (#1645)
[ROCm/composable_kernel commit: 75c5bfa364]
2024-11-07 14:14:42 -08:00
rocking
5f89baa4c3 Fix F16 type (#1583)
[ROCm/composable_kernel commit: 3599418aa8]
2024-11-06 11:32:44 -08:00
aledudek
4f8f789834 Generic threshold calculation after merge fixes (#1618)
* Generic threshold calculation: pass the number of accumulations

* Generic threshold - after merge fixes

* Fix cmakelists

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: dcafb1de15]
2024-11-06 10:44:58 +01:00
Andriy Roshchenko
15bd0d9189 Prevent instantiation of undefined FP8 operators. (#1639)
[ROCm/composable_kernel commit: 365f39aed0]
2024-11-05 13:58:29 -08:00
Illia Silin
804c8701e8 remove gfx940;gfx941 from default target lists (#1640)
[ROCm/composable_kernel commit: 54440cf562]
2024-11-05 13:56:20 -08:00
darren-amd
0788008fde Statically Cast Pointer Offset (#1631)
* explicit cast ptr offset

* formatting change

[ROCm/composable_kernel commit: d0e3a70a2e]
2024-11-05 09:59:08 -08:00
Illia Silin
18b5aef6fd Make sure cmake can handle the xnack+/xnack- targets. (#1633)
* make sure cmake can handle xnack targets

* don't build xdl instances for gfx906:xnack-

* don't build xdl tests for gfx906:xnack-

[ROCm/composable_kernel commit: b6e74be1aa]
2024-11-05 08:53:10 -08:00
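
Offload targets can carry feature suffixes such as ":xnack+" or ":xnack-". Code that matches on the full target string therefore misses "gfx906:xnack-" when it expects "gfx906". The Python sketch below illustrates matching on the base architecture instead. The actual change is in CK's CMake scripts, not in this code. The function names and the XDL-capable architecture set are assumptions for illustration.

```python
# Assumed set of architectures with XDL (MFMA) instructions, for illustration.
XDL_ARCHS = {"gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"}


def split_target(target: str):
    """Split 'gfx906:xnack-' into ('gfx906', ['xnack-'])."""
    arch, *features = target.split(":")
    return arch, features


def xdl_targets(gpu_targets: str) -> list:
    """From a ';'-separated target list, keep targets whose base arch has XDL."""
    kept = []
    for target in gpu_targets.split(";"):
        arch, _features = split_target(target)
        if arch in XDL_ARCHS:  # compare the base arch, not the full string
            kept.append(target)
    return kept
```
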
Juan Manuel Martinez Caamaño
4bb95f18ed [generate.py] Overwrite blob list if it already exists (#1635)
Previously, generate.py appended the list to the end of the output file.
When the cmake configuration step was run multiple times on the
examples, the blob list (such as fwd_blob_list.txt) grew with every
configuration.
`library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around
this issue by removing the output file if it existed.

Now, generate.py overwrites the content of the output file.
The workaround in CMakeLists.txt is no longer needed,
and the issue is fixed for the example projects too.

[ROCm/composable_kernel commit: 464abd235e]
2024-11-05 10:09:52 +01:00
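
The fix above amounts to opening the output file in write mode rather than append mode. Write mode makes repeated configure runs idempotent. The Python sketch below illustrates the difference. `write_blob_list` is a hypothetical helper, not the actual function in generate.py.

```python
from pathlib import Path


def write_blob_list(output: Path, blobs: list) -> None:
    """Write one generated file name per line, replacing any previous list."""
    # Mode "w" truncates an existing file. Mode "a" would add another copy
    # of the list on every configure run, which is the growth described above.
    with output.open("w") as f:
        for blob in blobs:
            f.write(blob + "\n")
```

Calling the helper twice with the same blobs leaves exactly one copy of the list in the file.
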
Lin Sun
6cc9f5e486 Linsun/convint8 fwd instances (#1626)
Add instances for int8 grouped conv2d fwd
---------

Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 0c9012fb70]
2024-11-04 16:33:20 -08:00
Bartłomiej Kocot
357cab6560 Temporarily disable part of dynamic op conv instances (#1630)
* Temporarily disable part of dynamic op conv instances

* fix

[ROCm/composable_kernel commit: 4f1fdbb6e3]
2024-11-04 13:34:17 -08:00