composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-14 19:18:35 +00:00

Author	SHA1	Message	Date
Illia Silin	dcbfa79542	turn on the ck_tile gemm tests by default (#1849 )	2025-01-30 07:03:48 -08:00
Bartłomiej Kocot	25e2e0f04a	[CK TILE] Implement cshuffle algorithm (#1842 ) * [CK TILE] Implement cshuffle algorithm * Rebase * Vector store size fixes * fixes * Fixes * fixes * fmha fix * fixes * fixes of fixes	2025-01-30 11:57:39 +01:00
fangche123	c5fff071e5	add batched_transpose implement (#1660 ) * add batched_transpose implement --------- Co-authored-by: root <root@ctr-ubbsmc16.amd.com> Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-01-28 16:22:02 -08:00
darren-amd	d6a4605e1c	Change flag to CK_GFX90A_DENORM_WORKAROUND (#1817 ) * Change flag from CK_WORKAROUND_DENORM_FIX to CK_GFX90A_DENORM_WORKAROUND for more clarity. Also changed the definition macros to be more clear.	2025-01-28 09:58:39 -05:00
Andriy Roshchenko	35aebe5936	Add OCP FP8 support in CK_TILE (#1829 ) * Add OCP FP8 to CK_TILE * Validate OCP FP8 in FMHA FWD under VALID=1	2025-01-27 11:59:49 -07:00
Adam Osewski	39dc25a9b8	[CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835 ) * Refactor universal gemm policy. * Adapt example to refactor changes. * Introduce static encoding pattern * Adding shuffled encoding patterns. * Fix err in reverse tuple. * Add transpose_tile2d * Small refactoring + doc * Enable reading on contiguous dimension in all layouts. * Transpose A/B register tile if needed for comp v3 pipeline. * Take contiguous dim size when calculating dram vector load size. * A/B smem pack size taken from WarpGemm attributes * Update B LDS layout and setup tile distribution pattern at class level. * Fix static assert. * Fix errors in examples. * Formatting & fix IsTranspose * Fix VectorSize & refactor. * Add error logging messages. * Fix VecLoadSize and TransposeC for mem pipeline. * Update unit-tests & disable mem pipeline. * Clang format * Update include/ck_tile/core/tensor/tile_window.hpp Co-authored-by: jakpiase <jakub.piasecki@amd.com> * Fix compilation and reviewers comments. * Refactor unit-test. Fallback to non-universal gemm. Need to use GemmPipelineAGmemBGmemCRegV1 for now, since GemmKernel is now supporting also non-K major vector reads. --------- Co-authored-by: jakpiase <jakub.piasecki@amd.com>	2025-01-27 16:37:19 +01:00
ruanjm	64d5c4d6cb	Implement fp8 quant for layernorm and rmsnorm (#1814 )	2025-01-24 16:40:43 +08:00
carlushuang	5b9b083dbc	[CK_TILE] not using structures under ck_tile/ops for ck_tile/host (#1834 ) * not using structures under ck_tile/ops for ck_tile/host * update as constexpr function * Rename fn * Update other examples. --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2025-01-24 15:35:54 +08:00
carlushuang	052a72655c	add fp8 as dst (#1830 )	2025-01-22 17:34:27 +08:00
dependabot[bot]	1fe2c35291	Bump rocm-docs-core from 1.13.0 to 1.14.1 in /docs/sphinx (#1832 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.14.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.13.0...v1.14.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-21 21:30:30 -08:00
Bartłomiej Kocot	742f5d6b55	Add Conv NGCHW client example (#1831 )	2025-01-22 01:02:03 +01:00
Mateusz Ozga	3db77bc4f2	Simplify static_cast if-lands (#1828 )	2025-01-21 23:23:19 +01:00
Mateusz Ozga	3c93d3c444	CK-Tile Grouped GEMM refactor and post PR fixes (#1756 ) * Grouped gemm simple code refactor * Offset invoker * Invoke generic Run, and replace name of partitioner variable * Tests fix type * Removed namespaces * Add template param to avoid implicit cast * Remove generic function * Constant value * underline enum to int16_t * Generalize partitioner function * Remove whitespaces * Rename function * Using support * Clang-format * Clang-format * Fn-partitioner description fn * Typo * Typo 2 * Better description * Better description * Refactor after review * Use ctr instead of set fn * Invoke ctr and typo * Comments * Remove unnecessary comment * Review, remove modulo	2025-01-21 21:06:10 +01:00
deepsek	e7dce4d247	Added bf16 instances grouped gemm fixed nk (#1825 ) * Feat: Add bf16 input instances * feat: Add BF16 profiler code * fix: reorder enum types * fix: CI fail due to clang-format * fix: clang script format issue * fix: clang format broke cmakelist file	2025-01-20 09:13:09 -08:00
lucbruni-amd	3fb2f5acc7	Add CK_TIME_KERNEL as toggleable CMake Variable (#1794 ) * Disable CK_TIME_KERNEL by Default, Add as CMake Variable * Enable CK_TIME_KERNEL by Default, Maintaining CMake Variable Functionality. * Fix build error.	2025-01-20 07:09:19 -08:00
Mingtao Gu	86d1b46aa6	fix a bug for int4 scale weight only kernel (#1820 ) Co-authored-by: mtgu0705 <mtgu@amd.com>	2025-01-19 11:18:18 +08:00
Bartłomiej Kocot	bdddf1eace	[CK_TILE] Add error threshold calculation for gemm examples (#1821 )	2025-01-18 01:01:52 +01:00
deepsek	0fcbb25f70	fix: preprocessor directives logic error if/else (#1764 ) * fix: preprocessors logic error if/else * fix: added macros as preferred by CK team	2025-01-16 20:31:15 -08:00
Aviral Goel	54de3e55e1	Implementing Test Filters for Smoke and Regression Tests (#1819 ) * smoke and regression targets working with tests * test filters work for both examples and test * removed unnecessary comments * added a missing comment * added a missing comment * fixed typo in the comments * updated README * Update PULL_REQUEST_TEMPLATE.md updating the template for future addition of test cases * Update PULL_REQUEST_TEMPLATE.md	2025-01-16 16:40:08 -08:00
Bartłomiej Kocot	1519ce91a3	Fix and optimize dynamic unary elementwise (#1818 ) * Fix and optimize dynamic unary elementwise * fix	2025-01-16 13:48:39 -08:00
carlushuang	1ff50e78c6	[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808 ) * fix mock token id * prepare host for g1u1 * reformat inline-asm * restructure uk_0 * restructure gate_up * done * change default to init=1 * update readme * fix a bug in interleave pipeline * rcp for silu	2025-01-16 17:51:10 +08:00
Illia Silin	8c29e06f3c	disable inductor codegen tests on legacy OS (#1816 )	2025-01-15 12:11:54 -08:00
Bartłomiej Kocot	7790e8c3f7	Add rounding for float to bf16 conversion as default (#1812 ) * Add rounding for float to bf16 conversion * Add bhalf test * Add inf test bhalf * Refactor * update cmake * Fixes	2025-01-15 16:41:21 +01:00
ruanjm	04dd314883	[CK_TILE] Add Various Fusion Functions to RMSNorm (#1802 ) * Add shortcut to RMSNorm * Modify test for adding shortcut for RMSNorm * Add fused parameter into tests * 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp * 1. Supports various stride and precisions. * Add support of Epilogue * Add fuse and epilogue support to rmsnorm ref * Modify rmsnorm example * Refactor tests/examples * Bug fix for newly added tests/examples * Bug fix for new tests 2 * Modify smoke test scripts remove dbg code * Supports non-smooth dynamic quant * Update Rmsnorm2dFwd::GetName() * rename xscale and prec_sx to smoothscale and prec_sm Bug fix after rename Remove files * change example_rmsnorm2d_fwd.cpp * update performance calculator * Fix issue in two-pass when fuse add is enabled * Remove comment of beta --------- Co-authored-by: rocking <ChunYu.Lai@amd.com>	2025-01-15 10:23:48 +08:00
Max Podkorytov	c0b90f130f	fix parsing instances for pt inductor (#1796 ) add unit test for gen instances for gemms add unit tests for conv and batched gemms add unit test for preselected gemm instances apply ruff lint add license header for the unit test add inductor pytest to CI verbose pip install switch the directory before installing python packages move the inductor codegen test try yet another workdir Update Jenkinsfile The directory looks right, fixing pip module not found by invoking pip directly Update Jenkinsfile invoke pytest directly since the module is not found Update Dockerfile Install setuptools update package structure bump setuptools maybe fix data path for library sources fix library search path for conv instances fix path in pyproject definition compare path used in gen_instances with one in pyproject.toml; fix the difference Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-01-13 13:51:08 -08:00
feli	53ab1b9047	Dev/merge u8w8 (#1774 ) * port tiles from a8w8 * rm debug used files * add instances * remove all non gemm in cmake * merge; impl fp16 * recover cmake from develop * add missed files; fix clang format --------- Co-authored-by: coderfeli <coderfeli@163.com>	2025-01-13 10:25:14 -08:00
Thomas Ning	5d671a5fc4	CK Tile GEMM CICD fixed & register block method refactor (#1776 ) * refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm * Finished the 2x2 warp gemm policy and the block selection mechanism * Clang format * address poyen's comment * Address feedbacks * Fixed the compilation issue * Change the function name	2025-01-13 13:10:44 +08:00
ClementLinCF	0b8f117f1a	[CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779 ) * Observed a 2x perf improvement with kBlockSize = 256 * Using 512 threads may lead to redundant computations	2025-01-12 20:50:32 -08:00
Qianfeng	3d50f57f43	Update for fmha_fwd qs_ks_vs pipeline (#1810 ) * Update for fmha_fwd qs_ks_vs pipeline * Remove __builtin_amdgcn_sched_barrier(0) * Move p_compute to p converting earlier for trying to increase vgprs re-using * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation * Re-add __builtin_amdgcn_sched_barrier(0) --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2025-01-13 12:43:05 +08:00
Bartłomiej Kocot	fd46a01d8b	Grouped convolution backward weight special vector size loads (#1772 ) * Grouped convolution backward weight special vector size loads * Instances and tests * Fixes * Add 7 and 13 special cases * fix comments * Fix * Fix2 * fixes * fix atomic add bf16	2025-01-10 22:02:30 +08:00
Thomas Ning	73a076eee1	Ck tile/gemm perf measure (#1750 ) * Finished adding the performance benchmark for ck tile gemm * Fix the executable rename problem * fix the executable name error * delete the unsupported layout combinations * Update run_full_test.sh * Update benchmark_mem_pipeline.sh * Update benchmark_basic.sh * change the executable of gemm_universal * change ck_tile_gemm script permissions * Addressed the comment * Addressed the comment * Fixed the comments * Fixed Comment * roll back the malfunctioned change * Fix the Typo * finalize the tile_gemm_fp16 performance monitoring * fix the stash names for ck_tile gemm logs * change the stashing logic * change stashing syntax --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2025-01-09 17:41:49 -08:00
darren-amd	26b3829c02	Disable building DPP kernels by default (#1804 ) * Disable building DPP kernels by default * Disable building dpp instances, examples, or tests if DPP_KERNELS is not set * Add new DPP_KERNELS flag to readme	2025-01-08 13:50:42 -05:00
Max Podkorytov	ad697c78ac	mark unused args	2025-01-08 10:09:54 -08:00
Max Podkorytov	a2e6ad62e2	run clang-format -style=file	2025-01-08 10:09:54 -08:00
Max Podkorytov	aa59ecaa22	run clang-format==12	2025-01-08 10:09:54 -08:00
Max Podkorytov	82fb3f84fb	update comment in the policy	2025-01-08 10:09:54 -08:00
Max Podkorytov	4daa82b451	update qsksvs comment	2025-01-08 10:09:54 -08:00
Max Podkorytov	66c5b715c9	remove dead code	2025-01-08 10:09:54 -08:00
Max Podkorytov	edb78a4729	clang-format and remove dead code	2025-01-08 10:09:54 -08:00
Max Podkorytov	60113859fa	roll back splitkv	2025-01-08 10:09:54 -08:00
Max Podkorytov	bfc997a7e6	update qsksvs pipeline	2025-01-08 10:09:54 -08:00
Max Podkorytov	f7942b993c	qsksvs pipeline changes to mirror qrksvs	2025-01-08 10:09:54 -08:00
AMD-dteng	d5c8a334ca	enable bias feature that add bias before adding residual (for rtpllm project) (#1741 ) * 1. enable bias feature that add bias before adding residual; 2. change block size from 128->64 when m<64 in fp16 * delete comment * 1.remove fmha change 2.change buffer name from bias to xbias * Now bias can be used independently from fadd * change kbias to kxbias --------- Co-authored-by: feli <felix.li@amd.com>	2025-01-08 17:51:06 +08:00
spolifroni-amd	a6b761c39a	Update LICENSE to 2025 (#1797 )	2025-01-07 08:29:40 -08:00
dependabot[bot]	9f6bf9ab26	Bump rocm-docs-core from 1.12.1 to 1.13.0 in /docs/sphinx (#1798 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.1...v1.13.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-07 08:26:31 -08:00
Po Yen Chen	24b12d04af	[CK_TILE] fmha fwd splitkv optimization for decode (seqlen_q=1) (#1789 ) * Update license year * Add initial code to override decode problem * Fix splitkv traits/args overriding error * Reshape and transpose lse for decode * Remove debug code * Prettify example code * Use better function name * Add kMergeNumHeadGroupsSeqLenQ flag Kernel user can use this switch to turn on/off optimization for some problem sizes * Add missing flag declarations * Default turn off kMergeNumHeadGroupsSeqLenQ in codegen * Group similar statements together * Remove assumption of seqlen_q=1 * Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel * Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel * Run kMergeNumHeadGroupsSeqLenQ=true kernels when need * Fix group mode block skip logics * Undo changes of normal fwd kernel * Update in GridSize() and using GridSize() for splitkv kernel (#1799) --------- Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>	2025-01-07 18:49:24 +08:00
Bartłomiej Kocot	888317e698	Fix universal gemm profiler for pk_i4_t (#1790 ) * Fix universal gemm profiler for pk_i4_t * fix	2025-01-04 14:01:33 +01:00
dependabot[bot]	37b3514648	Bump rocm-docs-core from 1.12.0 to 1.12.1 in /docs/sphinx (#1788 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.0...v1.12.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-03 17:47:48 -08:00
Illia Silin	8ea375bb58	terminology clean-up (#1792 )	2025-01-03 16:38:22 -08:00
carlushuang	6df5fe2ad8	[CK_TILE]naive attn support FP8 KVCache quant (#1747 ) * quant * fix bug * simple smoothquant after softmax * update kv-quant * update stride * fix fp8-pertoken-kvcache * update int8/fp8 quant support --------- Co-authored-by: so <a.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>	2025-01-03 18:43:07 +08:00

1663 Commits