composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-19 02:01:01 +00:00

Author	SHA1	Message	Date
Bartłomiej Kocot	d5ffc2b1b9	Add Grouped Convolution and GEMM documentation (#1719) * Add Grouped Convolution docs * Add gemm docs * Update docs * fix [ROCm/composable_kernel commit: `85d6fcd30a`]	2025-02-04 16:41:49 +01:00
Bartłomiej Kocot	3cddb295b1	Fix duplication of pk_add_f16 symbols (#1858) [ROCm/composable_kernel commit: `11e4082dd8`]	2025-02-04 14:42:11 +01:00
Bartłomiej Kocot	5835ed012d	Fix pk_int4 cast and add pk_int4 dtype in ck tile (#1854) * Fix pk_int4 cast and add pk_int4 dtype in ck tile * fixes * Improvements * fix typo [ROCm/composable_kernel commit: `9ee69dd297`]	2025-02-04 10:32:07 +01:00
Ben Richard	2753e26d39	SWDEV-506789 - composable_kernel does not honor cmake BUILD_SHARED (#1844) * Honor BUILD_SHARED_LIBS * Add .so versioning when building shared libraries [ROCm/composable_kernel commit: `9c5b2f3936`]	2025-02-01 09:21:25 -08:00
arai713	1ca4ad2739	Codegen hipRTC compilation (#1579 ) * updating codegen build for MIOpen access: adding .cmake for codegen component * updating CMake * adding in header guards for some headers due to issues with hiprtc compilation in MIOpen * some more header guards * putting env file in header guard * cleaning up some includes * updated types file for hiprtc purposes * fixed types file: bit-wise/memcpy issue * updating multiple utility files to deal with standard header inclusion for hiprtc * added some more header guards in the utility files, replacing some standard header functionality * added some more header guards * fixing some conflicts in utility files, another round of header guards * fixing errors in data type file * resolved conflict errors in a few utility files * added header guards/replicated functionality in device files * resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd * resolved issues with standard headers in device files: device_base.hpp, device_grouped_conv_fwd_multiple_abd.hpp, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp * added header guards for gridwise gemm files: gridwise_gemm_multiple_abd_xdl_cshuffle.hpp and gridwise_gemm_multiple_d_xdl_cshuffle.hpp * fixed issue with numerics header, removed from transform_conv_fwd_to_gemm and added to device_column_to_image_impl, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3, device_image_to_column_impl * replaced standard header usage and added header guards in block to ctile map and gridwise_gemm_pipeline_selector * resolved errors in device_gemm_xdl_splitk_c_shuffle files in regards to replacement of standard headers in previous commit * added replicated functionality for standard header methods in utility files * replaced standard header functionality in threadwise tensor slice transfer files and added header guards in element_wise_operation.hpp * temp fix for namespace 
error in MIOpen * remove standard header usage in codegen device op * removed standard header usage in elementwise files, resolved namespace errors * formatting fix * changed codegen argument to ON for testing * temporarily removing codegen compiler flag for testing purposes * added codegen flag again, set default to ON * set codegen flag default back to OFF * replaced enable_if_t standard header usage in data_type.hpp * added some debug prints to pinpoint issues in MIOpen * added print outs to debug in MIOpen * removed debug print outs from device op * resolved stdexcept include error * formatting fix * adding includes to new fp8 file to resolve ck::enable_if_t errors * made changes to amd_wave_read_first_lane * updated functionality in type utility file * fixed end of file issue * resovled errors in type utility file, added functionality to array utility file * fixed standard header usage replication in data_type file, resolves error with failing examples on navi3x * formatting fix * replaced standard header usage in amd_ck_fp8 file * added include to random_gen file * removed and replicated standard header usage from data_type and type_convert files for fp8 changes * replicated standard unsigned integer types in random_gen * resolved comments from review: put calls to reinterpret_cast for size_t in header guards * updated/added copyright headers * removed duplicate header * fixed typo in header guard * updated copyright headers --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `2e3183af4f`]	2025-01-31 09:48:39 -08:00
Illia Silin	9c2c2e7b63	fix ck_tile gemm scripts (#1851 ) [ROCm/composable_kernel commit: `2ab8bf4c12`]	2025-01-31 09:42:43 -08:00
Illia Silin	d38451ad09	Enable ck_tile gemms build in CI by default. (#1850 ) * turn on the ck_tile gemm tests by default * enable ck_tile gemms CI build by default [ROCm/composable_kernel commit: `7cf8931677`]	2025-01-30 16:01:43 -08:00
Adam Osewski	e029eaedae	[CK Tile] Spatially local GEMM tile partitioner. (#1843 ) * Add spatially local tile partitioner * Use 1D Grid size & create partitioner object. * Docs & use 1D partitioner in example. * Clang format. * Change kernel grid size Now: X is the # of output C-tiles, Y is the batch count Z is the splitK * Formatting & more doc. * Clang format. * Fix batched gemm test. Use 1d partitioner. * Move condition. * FIx ctor. * clang-format. [ROCm/composable_kernel commit: `ce448002ee`]	2025-01-31 00:10:16 +01:00
dependabot[bot]	e94eaabc18	Bump rocm-docs-core from 1.14.1 to 1.15.0 in /docs/sphinx (#1848 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.14.1 to 1.15.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.14.1...v1.15.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/composable_kernel commit: `e6d4180498`]	2025-01-30 07:04:27 -08:00
Illia Silin	71e9030f34	turn on the ck_tile gemm tests by default (#1849) [ROCm/composable_kernel commit: `dcbfa79542`]	2025-01-30 07:03:48 -08:00
Bartłomiej Kocot	4f2c699f90	[CK TILE] Implement cschuflle algorithm (#1842 ) * [CK TILE] Implement cschuflle algorithm * Rebase * Vector store size fixes * fixes * Fixes * fixes * fmha fix * fixes * fixes of fixes [ROCm/composable_kernel commit: `25e2e0f04a`]	2025-01-30 11:57:39 +01:00
fangche123	e4d8548dc5	add batched_transpose implement (#1660 ) * add batched_transpose implement --------- Co-authored-by: root <root@ctr-ubbsmc16.amd.com> Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> [ROCm/composable_kernel commit: `c5fff071e5`]	2025-01-28 16:22:02 -08:00
darren-amd	54ae306398	Change flag to CK_GFX90A_DENORM_WORKAROUND (#1817) * Change flag from CK_WORKAROUND_DENORM_FIX to CK_GFX90A_DENORM_WORKAROUND for more clarity. Also changed the definition macros to be more clear. [ROCm/composable_kernel commit: `d6a4605e1c`]	2025-01-28 09:58:39 -05:00
Andriy Roshchenko	78f8490cb6	Add OCP FP8 support in CK_TILE (#1829) * Add OCP FP8 to CK_TILE * Validate OCP FP8 in FMHA FWD under VALID=1 [ROCm/composable_kernel commit: `35aebe5936`]	2025-01-27 11:59:49 -07:00
Adam Osewski	89093ac431	[CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835 ) * Refactor universal gemm policy. * Adapt example to refactor changes. * Introduce static encoding pattern * Adding shuffled encoding patterns. * Fix err in reverse tuple. * Add transpose_tile2d * Small refactoring + doc * Enable reading on contiguous dimension in all layouts. * Transpose A/B register tile if needed for comp v3 pipeline. * Take contiguous dim size when calculating dram vector load size. * A/B smem pack size taken from WarpGemm attributes * Update B LDS layout and setup tile distribution pattern at class level. * Fix static assert. * Fix errors in examples. * Formatting & fix IsTranspose * Fix VectorSize & refactor. * Add error loging messages. * Fix VecLoadSize and TranspseC for mem pipeline. * Update unit-tests & disable mem pipeline. * Clang format * Update include/ck_tile/core/tensor/tile_window.hpp Co-authored-by: jakpiase <jakub.piasecki@amd.com> * Fix compilation and reviewers comments. * Refactor unit-test. Fallback to non-universal gemm. Need to use GemmPipelineAGmemBGmemCRegV1 for now, since GemmKernel is now supporting also non-K major vector reads. --------- Co-authored-by: jakpiase <jakub.piasecki@amd.com> [ROCm/composable_kernel commit: `39dc25a9b8`]	2025-01-27 16:37:19 +01:00
ruanjm	dce7207ece	Implement fp8 quant for layernorm and rmsnorm (#1814) [ROCm/composable_kernel commit: `64d5c4d6cb`]	2025-01-24 16:40:43 +08:00
carlushuang	ae3d2e47c4	[CK_TILE] not using structures under ck_tile/ops for ck_tile/host (#1834 ) * not using structures under ck_tile/ops for ck_tile/host * update as constexpr function * Rename fn * Update other examples. --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Adam Osewski <Adam.Osewski@amd.com> [ROCm/composable_kernel commit: `5b9b083dbc`]	2025-01-24 15:35:54 +08:00
carlushuang	74b2592535	add fp8 as dst (#1830) [ROCm/composable_kernel commit: `052a72655c`]	2025-01-22 17:34:27 +08:00
dependabot[bot]	ac89e47893	Bump rocm-docs-core from 1.13.0 to 1.14.1 in /docs/sphinx (#1832 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.14.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.13.0...v1.14.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/composable_kernel commit: `1fe2c35291`]	2025-01-21 21:30:30 -08:00
Bartłomiej Kocot	d86a5e1418	Add Conv NGCHW client example (#1831) [ROCm/composable_kernel commit: `742f5d6b55`]	2025-01-22 01:02:03 +01:00
Mateusz Ozga	91965f3411	Simplify static_cast if-lands (#1828) [ROCm/composable_kernel commit: `3db77bc4f2`]	2025-01-21 23:23:19 +01:00
Mateusz Ozga	b63e4bc4b8	CK-Tile Grouped GEMM refactor and post PR fixes (#1756 ) * Grouped gemm simple code refactor * Offset invoker * Invoke generic Run, and replace name of parrtitioner variable * Tests fix type * Removed namespaces * Add template param to avoid implicit cast * Remove generic function * Constant value * underline enum to int16_t * Generalize partitioner function * Remove whitespaces * Rename function * Using support * Clang-format * Clang-format * Fn-partitioner description fn * Typo * Typo 2 * Better description * Better description * Refactor after review * Use ctr instead of set fn * Inovke ctr and typo * Comments * Remove unnecessary comment * Review, remove modulo [ROCm/composable_kernel commit: `3c93d3c444`]	2025-01-21 21:06:10 +01:00
deepsek	c6a6e93628	Added bf16 instances grouped gemm fixed nk (#1825 ) * Feat: Add bf16 input instances * feat: Add BF16 profiler code * fix: reorder enum types * fix: CI fail due to clang-format * fix: clang script format issue * fix: clang format broke cmakelist file [ROCm/composable_kernel commit: `e7dce4d247`]	2025-01-20 09:13:09 -08:00
lucbruni-amd	80a206156b	Add CK_TIME_KERNEL as toggleable CMake Variable (#1794 ) * Disable CK_TIME_KERNEL by Default, Add as CMake Variable * Enable CK_TIME_KERNEL by Default, Maintaining CMake Variable Functionality. * Fix build error. [ROCm/composable_kernel commit: `3fb2f5acc7`]	2025-01-20 07:09:19 -08:00
Mingtao Gu	462b62be6b	fix a bug for int4 scale weight only kernel (#1820) Co-authored-by: mtgu0705 <mtgu@amd.com> [ROCm/composable_kernel commit: `86d1b46aa6`]	2025-01-19 11:18:18 +08:00
Bartłomiej Kocot	6472bdb4ed	[CK_TILE] Add error threshold calculation for gemm examples (#1821) [ROCm/composable_kernel commit: `bdddf1eace`]	2025-01-18 01:01:52 +01:00
deepsek	cb909428d0	fix: preprocessor directives logic error if/else (#1764) * fix: preprocessors logic error if/else * fix: added macros as preferred by CK team [ROCm/composable_kernel commit: `0fcbb25f70`]	2025-01-16 20:31:15 -08:00
Aviral Goel	552845ecd0	Implementing Test Filters for Smoke and Regression Tests (#1819 ) * smoke and regression targets working with tests * test filters work for both examples and test * removed uneccesary comments * added a missing comment * added a missing comment * fixed typo in the comments * updated README * Update PULL_REQUEST_TEMPLATE.md updating the template for future addition of test cases * Update PULL_REQUEST_TEMPLATE.md [ROCm/composable_kernel commit: `54de3e55e1`]	2025-01-16 16:40:08 -08:00
Bartłomiej Kocot	e65a010b5e	Fix and optimize dynamic unary elementwise (#1818) * Fix and optimize dynamic unary elementwise * fix [ROCm/composable_kernel commit: `1519ce91a3`]	2025-01-16 13:48:39 -08:00
carlushuang	2fec988802	[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808 ) * fix mock token id * prepare host for g1u1 * reformat inline-asm * restructure uk_0 * restructure gate_up * done * change default to init=1 * update readme * fix a bug in interleave pipeline * rcp for silu [ROCm/composable_kernel commit: `1ff50e78c6`]	2025-01-16 17:51:10 +08:00
Illia Silin	9955ac560b	disable inductor codegen tests on legacy OS (#1816) [ROCm/composable_kernel commit: `8c29e06f3c`]	2025-01-15 12:11:54 -08:00
Bartłomiej Kocot	2c4a1cce43	Add rounding for float to bf16 conversion as default (#1812 ) * Add rounding for float to bf16 conversion * Add bhalf test * Add inf test bhalf * Refactor * update cmake * Fixes [ROCm/composable_kernel commit: `7790e8c3f7`]	2025-01-15 16:41:21 +01:00
ruanjm	9f9eddd0cf	[CK_TILE] Add Various Fusion Functions to RMSNorm (#1802 ) * Add shortcut to RMSNorm * Modify test for adding shortcut for RMSNorm * Add fused parameter into tests * 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp * 1. Supports various stride and percisions. * Add support of Epilogue * Add fuse and epilogue support to rmsnorm ref * Modify rmsnorm example * Refactor tests/examples * Bug fix for newly added tests/examples * Bug fix for new tests 2 * Modify smoke test scripts remove dbg code * Supports non-smooth dyanmic quant * Update Rmsnorm2dFwd::GetName() * rename xscale and prec_sx to smoothscale and prec_sm Bug fix after rename Remove files * change example_rmsnorm2d_fwd.cpp * update performance calculator * Fix issue in two-pass when fuse add is enabled * Remove comment of beta --------- Co-authored-by: rocking <ChunYu.Lai@amd.com> [ROCm/composable_kernel commit: `04dd314883`]	2025-01-15 10:23:48 +08:00
Max Podkorytov	51c8a8e291	fix parsing instances for pt inductor (#1796 ) add unit test for gen instances for gemms add unit tests for conv and batched gemms add unit test for preselected gemm instances apply ruff lint add license header for the unit test add inductor pytest to CI verbose pip install switch the directory before installing python packages move the inductor codegen test try yet another workdir Update Jenkinsfile The directory looks right, fixing pip module not found by invoking pip directly Update Jenkinsfile invoke pytest directly since the module is not found Update Dockerfile Install setuptools update package structure bump setuptools maybe fix data path for library sources fix library search path for conv instances fix path in pyproject definition compare path used in gen_instances with one in pyproject.toml; fix the difference Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `c0b90f130f`]	2025-01-13 13:51:08 -08:00
feli	1f484facd3	Dev/merge u8w8 (#1774 ) * port tiles from a8w8 * rm debug used files * add instances * remove all non gemm in cmake * merge; impl fp16 * recover cmake from develop * add missed files; fix clang format --------- Co-authored-by: coderfeli <coderfeli@163.com> [ROCm/composable_kernel commit: `53ab1b9047`]	2025-01-13 10:25:14 -08:00
Thomas Ning	70e79bc56f	CK Tile GEMM CICD fixed & register block method refactor (#1776 ) * refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm * Finished the 2x2 warp gemm policy and the block selection mechanism * Clang format * address poyen's comment * Address feedbacks * Fixed the compilation issue * Change the function name [ROCm/composable_kernel commit: `5d671a5fc4`]	2025-01-13 13:10:44 +08:00
ClementLinCF	bbd54d3dfb	[CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779 ) * Observed a 2x perf improvement with kBlockSize = 256 * Using 512 threads may lead to redundant computations [ROCm/composable_kernel commit: `0b8f117f1a`]	2025-01-12 20:50:32 -08:00
Qianfeng	3cc02417a9	Update for fmha_fwd qs_ks_vs pipeline (#1810 ) * Update for fmha_fwd qs_ks_vs pipeline * Remove _builtin_amdgcn_sched_barrier(0) * Move p_compute to p converting earlier for trying to increase vgprs re-using * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation * Re-add __builtin_amdgcn_sched_barrier(0) --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `3d50f57f43`]	2025-01-13 12:43:05 +08:00
Bartłomiej Kocot	da7d6023cf	Grouped convolution backward weight special vector size loads (#1772 ) * Grouped convolution backward weight special vector size loads * Instnaces and tests * Fixes * Add 7 and 13 special cases * fix comments * Fix * Fix2 * fixes * fix atomic add bf16 [ROCm/composable_kernel commit: `fd46a01d8b`]	2025-01-10 22:02:30 +08:00
Thomas Ning	dc1b18eebf	Ck tile/gemm perf measure (#1750 ) * Finished adding the performance benchmark for ck tile gemm * Fix the executable rename problem * fix the executable name error * delete the unsupported layout combinations * Update run_full_test.sh * Update benchmark_mem_pipeline.sh * Update benchmark_basic.sh * change the executable of gemm_universal * change ck_tile_gemm script permissions * Addressed the comment * Addressed the comment * Fixed the comments * Fixed Comment * roll back the malfunctioned change * Fix the Typo * finalize the tile_gemm_fp16 performance monitoring * fix the stash names for ck_tile gemm logs * change the stashing logic * change stashing syntax --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `73a076eee1`]	2025-01-09 17:41:49 -08:00
darren-amd	6bc57cf274	Disable building DPP kernels by default (#1804 ) * Disable building DPP kernels by default * Disable building dpp instances, examples, or tests if DPP_KERNELS is not set * Add new DPP_KERNELS flag to readme [ROCm/composable_kernel commit: `26b3829c02`]	2025-01-08 13:50:42 -05:00
Max Podkorytov	4c98908e17	mark unused args [ROCm/composable_kernel commit: `ad697c78ac`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	d7a2a81051	run clang-format -style=file [ROCm/composable_kernel commit: `a2e6ad62e2`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	e1896982b5	run clang-format==12 [ROCm/composable_kernel commit: `aa59ecaa22`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	715635839b	update comment in the policy [ROCm/composable_kernel commit: `82fb3f84fb`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	00c32ecda2	update qsksvs comment [ROCm/composable_kernel commit: `4daa82b451`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	099e23be84	remove dead code [ROCm/composable_kernel commit: `66c5b715c9`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	63cc962000	clang-format and remove dead code [ROCm/composable_kernel commit: `edb78a4729`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	25fdfe3df8	roll back splitkv [ROCm/composable_kernel commit: `60113859fa`]	2025-01-08 10:09:54 -08:00
Max Podkorytov	d3d53433aa	update qsksvs pipeline [ROCm/composable_kernel commit: `bfc997a7e6`]	2025-01-08 10:09:54 -08:00

1663 Commits