composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-24 14:54:47 +00:00

Author	SHA1	Message	Date
Adam Osewski	061ac0649c	Polished Grouped GEMM APIs and new BF16 instances (#1600 ) * Few small fixes. * New GroupedGemm instances (BF16) * Unify and refactor GroupedGEMM device API. * Adapt changes to new API. * Adapt grouped gemm profiler. * Accept multiple kbatches for grouped gemm profiler. - delete obsolete two stage as it is now covered by grouped gemm * Update unit test for grouped gemm. * Fix thresholds for BF16 and F8. Unblock tests. * Fix few instances. * Multiple small fixes. * Adapt to new API, check dynamic casting. * Uncomment few data types in grouped gemm profiler. * Fix call to SetDeviceArgs. * Fix profile grouped gemm multiply tile loop. * Fix grouped gemm tile loop kernel args in client examples. * Review comments.	2024-11-27 13:02:44 +01:00
Illia Silin	cb8c7f42d6	update mainline compiler branch name (#1696 )	2024-11-26 14:58:35 -08:00
rocking	abae2afc72	support max3 in smoothquant and add+ rmsnorm + rdquant (#1654 ) * Fix cmake example build * Support max3 in smoothquant one pass * support max3 in two pass * support max3 in add_rmsnorm_rdquant	2024-11-27 05:01:15 +08:00
Adam Osewski	bfe983a151	Change block gemm pipeline local prefill loop order. (#1692 ) * Fix loop order. * Fix loop order in pipeline v4	2024-11-26 17:36:53 +01:00
jakpiase	b70f367f80	Add check for bf16 splitk support for grouped gemm splitk (#1673 ) * add check for bf16 splitk support for grouped gemm splitk * Update if condition --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-11-26 13:56:32 +01:00
Po Yen Chen	cf2d635ea2	[CK_TILE] Fix incorrect computation of group mode PagedAttention (#1688 ) * Allow getting batch size from splitkv tile partitioner * Fix wrong paged-kvcache impl for group mode * Fix wrong example code for page-kvcache * Undo changes in fmha_fwd.cpp * Always use 2D block table * Add is_gappy kernel argument for paged-kvcache The is_gappy argument is used for differentiating seqstart_k_ptr usage in flash-attention & xformers * Remove out-of-date comments * Remove no-longer used method * Fix wrong # page-block calculation * Fix wrong comment --------- Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>	2024-11-26 20:37:54 +08:00
Adam Osewski	b6bcd76d88	CK-Tile first draft of universal block gemm with interwave & intrawave scheduler (#1676 ) * Block universal gemm. * Universal block gemm with interwave scheduler - draft. * Refactoring * Move a/b_warp_tiles into BlockGemmImpl * set BlockGemmImpl as a class member * Change tile size for more suitable to memory bound cases. * Introduce kKPerThread to WarpGemm * Add documentation comment. * Fix Interwave scheduler block gemm. * Add compute/memory friendly tile configuration. * Clean * New tile configurations in gemm mem example. * Add more static checks and fix loop order in block gemm. * Add more static checks and use warp gemm mfma dispatcher. * Add default scheduler block gemm. * Remove logging in example.	2024-11-26 08:45:14 +01:00
carlushuang	440e28b08f	[CK_TILE] fused-moe first version (#1634 ) * moe pipeline * update code * compile OK * update * update cpu reference * update pipeline_gemm0 * compiler ok * update pipeline * rename to ex pipeline * block-asm * update * update * update first gemm ok * compute correct * update file structure * update README * update * update * update code * update API * return unsupport case * add comment * update readme * update * uncomment * update * fix build err --------- Co-authored-by: valarLip <340077269@qq.com>	2024-11-26 11:14:56 +08:00
Po Yen Chen	645fe812f6	[CK_TILE] Fix fMHA fwd MakeKargs() compilation errors (#1689 ) * Fix mis-matched tuple<> elem types * Rename MakeKargs() as MakeKargsImpl() --------- Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>	2024-11-25 15:30:35 +08:00
dependabot[bot]	c2bcbb1379	Bump rocm-docs-core from 1.8.5 to 1.9.0 in /docs/sphinx (#1691 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.5 to 1.9.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.9.0/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.5...v1.9.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-24 21:41:52 -08:00
carlushuang	36c7ce4e0e	[CK_TILE]Moe update index (#1672 ) * update MOCK_ID for moe-sorting * add moe-smoothquant * update a comment * fix format * hot fix * update topk in overflow case * update comments * update bf16 cvt --------- Co-authored-by: valarLip <340077269@qq.com>	2024-11-25 13:12:35 +08:00
Qianfeng	ce2bdf42a9	Change in fwd-splitkv kernel to support num_splits=1 case (#1690 ) * Change in fwd-splitkv kernel to support num_splits=1 case * Update in codegen fwd-splitkv to make num_splits > 1 cases pass * Specify instance traits in dispatch * Fix link error for fp8 kernels --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2024-11-25 12:31:38 +08:00
Illia Silin	19d4b79039	add --squash flag when building dockers (#1686 )	2024-11-22 17:16:08 -08:00
Illia Silin	a420b3b34d	add Andriy to the code owners (#1687 )	2024-11-22 16:30:12 -08:00
schung-amd	ff92222f93	[CK_TILE] MakeKargs overloads for backward compatibility (#1681 ) * Add overloads for MakeKargs Overload MakeKargs to accept std::tuple<uint64_t, uint64_t> and std::tuple<void*, void*> to preserve functionality of code currently passing in list initializers or tuples. * Re-format files using ck_tile remod.py --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2024-11-23 06:51:35 +08:00
Illia Silin	4c7035ff08	fix path of ninjatracing (#1685 )	2024-11-22 08:30:01 -08:00
Harisankar Sadasivan	d6d4c2788b	universal streamk fp8 changes (#1665 ) * universal streamk fp8 changes & ckprofiler instances * revert strides to -1 and verification options * fp8 exclusion on pre-gfx94 for universal_streamk * PR review based revisions: permissions reverted, removed hip err checks --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-11-21 08:21:37 -08:00
Po Yen Chen	fb1ccfa9df	[CK_TILE] Add paged-kvcache support in group mode fmha fwd splitkv kernels (#1678 ) * Generate group mode paged-attn kernel * Enable paged-kvcache + group mode support * Add missing header: fused_moe.hpp * Add comment to explain kernel arg usage * Make error message more clear * Add comment for confusing data member names * Add more comment for confusing variable names * Fix typo in option description	2024-11-21 14:53:10 +08:00
Po Yen Chen	6916d8cc03	Add QianFeng to code owners (#1682 )	2024-11-21 14:49:13 +08:00
Illia Silin	d31e8249c1	Optimize docker file. (#1679 ) * reduce the docker image size and layers * clean up docker file * fix linker error for client example 24 * install CK into the default /opt/rocm/ path * restore installing CK to alternative path in CI * add linking for utility lib	2024-11-20 14:01:04 -08:00
Haocong WANG	81ec5eff4a	fix bug (#1680 )	2024-11-20 07:03:56 -08:00
Illia Silin	da0c21f661	add more fp32 dl gemm instances (#1675 ) * add more fp32 dl gemm instances * update the dates	2024-11-19 10:00:17 -08:00
dependabot[bot]	e4dfe4d892	Bump rocm-docs-core from 1.8.4 to 1.8.5 in /docs/sphinx (#1674 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.4 to 1.8.5. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.5/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.4...v1.8.5) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-18 22:00:18 -08:00
Illia Silin	8aba2724cc	Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671 ) * add bf16 gemms for gfx11/gfx12 * reduce the input values in test_gemm * add int8 wmma gemm instances for gfx11/gfx12 * add example gemm_wmma_int8 * fix bug in gemm_wmma_int8 test * increase bf16 gemm test tolerance * update the dates and clean-up commented-out instances	2024-11-18 14:07:04 -08:00
Bartłomiej Kocot	754adc70e3	Batched GEMM Multiple D based on Universal GEMM (#1655 ) * Batched GEMM Multiple D based on Universal GEMM Co-authored-by: Jing Zhang <jizhan@fb.com> * CI fixes Co-authored-by: Jing Zhang <jizhan@fb.com> --------- Co-authored-by: Jing Zhang <jizhan@fb.com>	2024-11-18 14:03:45 +01:00
dependabot[bot]	efb34741fe	Bump rocm-docs-core from 1.8.3 to 1.8.4 in /docs/sphinx (#1670 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.3 to 1.8.4. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.4/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.3...v1.8.4) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-15 18:30:58 -05:00
Illia Silin	b4a7904582	re-enable fp8 gemms in ckProfiler (#1667 )	2024-11-14 16:15:01 -08:00
Illia Silin	3b6a481e92	re-enable coerce-illegal-types flag for rocm6.3 (#1668 )	2024-11-14 16:14:50 -08:00
Andriy Roshchenko	d805a461aa	Fix example_convnd_fwd_max_xdl_int8 failures on MI300 (#1666 ) * Improve test verbosity. * BUGFIX: Add missing initialization for reduction buffer * Change default initialization method Performance may be affected for fp32 and int8 examples. * Improve test verbosity * Cleanup	2024-11-14 08:40:50 -08:00
feli	c1f8d53ce8	[Ck_tile] hot fix, fix rpcf param setting err (#1657 ) Co-authored-by: dummycoderfe <noplydummmycoder@163.com>	2024-11-14 14:06:36 +08:00
Illia Silin	efd9261545	fix clang format (#1662 )	2024-11-13 09:20:18 -08:00
Taylor Ding	73f02a1083	Move checks for compatibility from Argument() to IsSupportedArgument() (#1653 )	2024-11-13 11:20:38 -05:00
Bartłomiej Kocot	d20735691c	[CK TILE] Update gemm universal pipeline (#1644 ) * [CK TILE] Update gemm universal pipeline * Fixes * fix * Rebase	2024-11-13 11:46:18 +01:00
Illia Silin	489c78d073	test rocm6.3 rc1 build 20 (#1659 )	2024-11-12 09:35:33 -08:00
Thomas Ning	2b6458ddf2	[CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651 ) * Finished the feature * Modified the test file * Test case update * Address comment * Addressed the review comment * Fixed the CI error	2024-11-11 18:08:25 -08:00
Illia Silin	5fb150dbe7	restore collecting performance of mixed prec gemms (#1648 )	2024-11-11 09:25:08 -08:00
valarLip	8ef8a994e7	[CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650 ) * [CK_TILE] add more stride for layernorm to support un-continuous Tensor * align CK coding style * extend strides to layernorm example * clang-format...	2024-11-11 16:02:28 +08:00
Po Yen Chen	13332998a4	Return nullptr when block index is invalid (#1649 )	2024-11-11 09:28:32 +08:00
dummycoderfe	bec6fbc65f	Ck tile/moe sorting (#1624 ) * add moe_sorting & check ok * fix comments & typo * Run remod.py under include/ck_tile & example/ck_tile directories * format codes * fix output ci check bug * fix moe sorting readme and error commit file * use magic div to accelerate compute * add a loop unroll for moe lds ops * add extblocksnel to set zeros for moebufs * [Ck_tile] moe set zero run ok, add size check and fix ref check * [Ck_tile]fix moe_sorting fuse set_zero remod * [Ck_tile] change name style, fix zero buffer size err, change folder * [Ck_tile] moe_sorting: fix name style * [Ck_tile] moe_sorting, remove useless params in traits * [Ck_tile] change outputtile cnt * unit_size; change output buf alloc --------- Co-authored-by: dummycoderfe <noplydummmycoder@163.com> Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com>	2024-11-09 17:57:27 +08:00
Po Yen Chen	af9546d9f4	Fix 'sh' command compatibility of smoke_test_fwd.sh (#1553 )	2024-11-09 09:55:14 +08:00
Bartłomiej Kocot	ea3640fdea	Add generic instances for two stage conv bwd wei (#1643 ) * Add generic instances for two stage conv bwd wei * Update layout prefix	2024-11-08 10:04:33 +01:00
dummycoderfe	686a58a912	[Ck tile] layernorm2d fwd optimize (#1637 ) * optimize small N case using vec io and using rcp div * [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass * [Ck_tile] fix blockSize compute in Generic2dBlockShape * [Ck_tile]fix kfastfdiv template style * [Ck_tile] layernorm, fix stype in review --------- Co-authored-by: dummycoderfe <noplydummmycoder@163.com>	2024-11-08 12:28:23 +08:00
Illia Silin	75c5bfa364	enable compilation for generic navi targets (#1645 )	2024-11-07 14:14:42 -08:00
rocking	3599418aa8	Fix F16 type (#1583 )	2024-11-06 11:32:44 -08:00
aledudek	dcafb1de15	Generic threshold calculation after merge fixes (#1618 ) * Generic threshold calculation add passing num of accums * Generic threshold - after merge fixes * Fix cmakelists --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2024-11-06 10:44:58 +01:00
Andriy Roshchenko	365f39aed0	Prevent instantiation of undefined FP8 operators. (#1639 )	2024-11-05 13:58:29 -08:00
Illia Silin	54440cf562	remove gfx940;gfx941 from default target lists (#1640 )	2024-11-05 13:56:20 -08:00
darren-amd	d0e3a70a2e	Statically Cast Pointer Offset (#1631 ) * explicit cast ptr offset * formatting change	2024-11-05 09:59:08 -08:00
Illia Silin	b6e74be1aa	Make sure cmake can handle the xnack+/xnack- targets. (#1633 ) * make sure cmake can handle xnack targets * dont build xdl instances for gfx906:xnack- * dont build xdl tests for gfx906:xnack-	2024-11-05 08:53:10 -08:00
Juan Manuel Martinez Caamaño	464abd235e	[generate.py] Override blob list if it already exists (#1635 ) Before, generate.py appended the list at the end of the output file. When running the cmake configuration steps multiple times on the examples, the blob list (such as fwd_blob_list.txt) would grow at every configuration. `library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around this issue by removing the output file if it exists. Now, generate.py overrides the content of the output file. There is no need for the workaround in the CMakeLists.txt; and the issue is solved for the example projects too.	2024-11-05 10:09:52 +01:00
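
The behaviour change described in the generate.py commit above (#1635) comes down to opening the output file in write mode rather than append mode. The following is a minimal sketch of that difference, not the actual generate.py code: the helper names `write_blob_list` and `configure_twice` and the blob file names are hypothetical, and only the output file name `fwd_blob_list.txt` is taken from the commit message.

```python
import tempfile
from pathlib import Path


def write_blob_list(path: Path, blobs: list[str], append: bool) -> None:
    """Write one blob name per line (hypothetical helper, not from generate.py)."""
    # "a" keeps earlier content and adds to it; "w" truncates the file first.
    mode = "a" if append else "w"
    with path.open(mode) as f:
        for blob in blobs:
            f.write(blob + "\n")


def configure_twice(append: bool) -> int:
    """Simulate running the configuration step twice; return the line count."""
    blobs = ["fmha_fwd_a.cpp", "fmha_fwd_b.cpp"]  # illustrative names only
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "fwd_blob_list.txt"
        write_blob_list(out, blobs, append)
        write_blob_list(out, blobs, append)
        return len(out.read_text().splitlines())


if __name__ == "__main__":
    print("append mode, lines after two runs:", configure_twice(append=True))
    print("write mode, lines after two runs:", configure_twice(append=False))
```

In append mode the list doubles on the second run, which matches the growth the commit message describes. In write mode the list stays the same size, so no separate step to delete the file beforehand is needed.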


1549 Commits