composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 10:09:41 +00:00

Author	SHA1	Message	Date
Andriy Roshchenko	e70035cf5a	MX GEMM - Add FP8 GEMM Tests for Different Layouts (#2152) * Add gemm_mx_fp8_bf8 example with row-major B * Add more overloads of MX MFMA instructions * Add MK_KN (RRR) tests * Add KM_NK (CCR) tests * Add more problem sizes to Large tests * Add test_gemm_mx to the list of regression tests [ROCm/composable_kernel commit: `79b0bfeb41`]	2025-05-01 11:55:48 -06:00
Aviral Goel	7cedc33703	Add documentation for ck_tile::array<T,N> (#2078) * added documentation for ck_tile::array<T,N> * clang format fix * spelling errros Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> * spelling errros Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Revert "spelling errros" This reverts commit `4179e7d193`. * Revert "spelling errros" This reverts commit `3f90733dbe`. --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Co-authored-by: John Afaganis <john.afaganis@amd.com> [ROCm/composable_kernel commit: `1d8ef40760`]	2025-04-30 16:43:36 -07:00
Illia Silin	0b07559cbe	Revert "Add ck tile examples to package (#1880)" (#2150) [ROCm/composable_kernel commit: `9a9f59ae69`]	2025-04-30 10:20:16 -07:00
Aviral Goel	d79abc2e03	Add Matrix A and Matrix B Swizzle for LDS in Computev4 policy (#2136) * fixed computev4 policy bug for lds swizzle * added swizzle for input matrix B * Improved ComputeV4 policy and pipeline by swizzling A and B * consolidated LDS descriptor functions in parent struct [ROCm/composable_kernel commit: `65f182d617`]	2025-04-28 18:20:47 -07:00
Khushbu Agarwal	aeb46e6a49	Support for MFMA_16x16x128 for fp8/bf8 (#2125) * Adding 16x16x128 support for gfx950 * Support for fp8 and bf8 * fix input arguments for MFMA scale instruction * clang-formatted * Fixes for lwpck-3145 (#2138) * Fix lds tile & cmake dep & default epilogue * Fallback BTypeToUse to ADataType in WOQ cases * reverting instance json file * reverting instance json file --------- Co-authored-by: Yi DING <yi.ding@amd.com> [ROCm/composable_kernel commit: `d107f3c3a5`]	2025-04-28 18:19:50 -07:00
Bartłomiej Kocot	7942bb905b	Integrate universal gemm with conv bwd data and add SplitK (#1315) * Integrate universal gemm with conv bwd data * Fix multi d kernel * Add splitK support * instances refactor * instances refactor * refactor * fixes * fixes * 16x16 instances * Fixes * Fix * Fix * Fix * Fix * Fix * Fixes * fix * fix [ROCm/composable_kernel commit: `4094ad158a`]	2025-04-28 23:54:49 +02:00
jakpiase	28783ec2f4	Add ck tile examples to package (#1880) * add ck tile examples to package * Update jenkinsfile * fix for jenkinsfile * fix for building ck tile code on non gfx9 * compile ck tile examples only for gfx94 * include ck tile examples in all target * fix for basic gemm UseStructuredSparsity * Update CMakeLists.txt * Update gemm_pipeline_problem.hpp * add targets to rocm install --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `434d19f696`]	2025-04-28 09:53:19 -07:00
Anton Gorenko	420de0f22d	DeviceGemm_Wmma_CShuffleV3 with BlockGemmPipelineVersion::v3 (#2096) * Prepare files for DeviceGemm_Wmma_CShuffleV3 * Implement main part of CShuffleV3 with block pipeline v3 for WMMA * Remove unused functions and template params for A/B descriptors * Support both gfx11 and gfx12 * Enable SplitK for gfx12 and disable for gfx11 * Added RowColRow layout for DeviceGemmV2 fp16 * Added more instances for Row, Col, Row data layout * Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Row, Row data layout * Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Col, Row data layout * Added more instances for DeviceGemm_Wmma_CShuffleV3, Row, Row, Row data layout * Fix formatting * Add documentation Based on cc666c6a19dabc2cce8141e7ae23bd460ceef331 * Enable gemm_universal profiling for gfx11/12 * Add WMMA intrinsics for F8/BF8 * Support F8/BF8 DeviceGemm_Wmma_CShuffleV3, add basic instances * Add BF16 instances and tests * Fix test_gemm_universal_wmma_fp8 by adding CK_USE_WMMA_FP8 --------- Co-authored-by: Anca Hamuraru <anca@streamhpc.com> [ROCm/composable_kernel commit: `edd92fc546`]	2025-04-28 10:14:21 +05:00
Yi DING	77c7fb1e6b	Fix fp8 convert & add option for basic example (#2129) [ROCm/composable_kernel commit: `8add2cf45d`]	2025-04-27 16:26:05 -07:00
Khushbu Agarwal	5c17305a82	MFMA_32x32x16 for gfx950 (#2121) * Enable MFMA_32x32x16 for fp16/BF16 for gfx950 * clang formatted [ROCm/composable_kernel commit: `a2ed34a112`]	2025-04-24 10:20:22 -07:00
Illia Silin	08f859b460	make code compliant with std=c++20 (#2123) [ROCm/composable_kernel commit: `01cb8379cd`]	2025-04-24 10:14:52 -07:00
carlushuang	65199be10e	[CK_TILE] support gfx950 matrix core in 01_fmha fwd (#2110) * gfx950 01_fmha fwd * fix comment --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `5487289fc4`]	2025-04-23 12:40:18 -07:00
lalala-sh	8f426b1216	Moe gemm activation (#2026) * fix useless code and remove useless oob * clang format * fix coredump in e2e test * fix2 * fix clang format * fix output oob * impl int64 but result not correct * int64 index ok now * input output all ok * fix uint32 * revert v1 test * use uint32 * mork to support 13w tokens * moe sorting fix moebuf * fix merge * update moe api fix aiter build * fix build * fuse silu * silu ok * acale ok * add silu * change code * gemm2 ok * gufusion compatible ok, fix warnings * gu fusion for m32 m64 ok * support bf16 cshuffle * i4 gemm2 ok * i4 gemm2 ok and i4 gemm1 build * 16x16 run ok * change flops; change cshuffle dtype * fuse gelu silu act in moe gemm1 * fp8 with act ready * int4 act ready * remove useless changes * remove useless code change * fix clang format * add the arch limit of int4 moe gemm * fuse moe activation * fix fp8 16x16 * fix no quant case * fix bugs * fix fp8 gufusion bug * remove useless comments * refine activation code & complete moe example * fix int8 bugs * merge tkw1 --------- Co-authored-by: coderfeli <coderfeli@163.com> Co-authored-by: feli <felix.li@amd.com> Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: root <root@hjbog-srdc-51.amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `39ba03f25d`]	2025-04-23 10:35:34 +08:00
Gino Lu	983dac1699	[CK-Tile] warp-gemm support for using V_MFMA_F32_16x16x32_BF16 (#2073) * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * fix error while testing new command * Finished the feature of new mfma 161632 * Addressed the comment --------- Co-authored-by: ThomasNing <thomas.ning@amd.com> [ROCm/composable_kernel commit: `504f563f78`]	2025-04-22 15:52:36 -07:00
Rostyslav Geyyer	dbcaae42bf	Temporarily disable MX FP4 device tests (#2112) [ROCm/composable_kernel commit: `416e851584`]	2025-04-22 16:08:48 -05:00
Thomas Ning	005e61ce63	GEMM Multiply Multiply Fix (#2102) * fix the type convert and increase the BF16 conversion + the profile comment * fix the CI [ROCm/composable_kernel commit: `0cca8fa28f`]	2025-04-22 01:13:22 -07:00
Muhammed Emin Ozturk	eb8306b5f0	MI308 fix for streamk 1-Tile floating point exception (#2101) [ROCm/composable_kernel commit: `b092c18da7`]	2025-04-21 11:44:07 -07:00
Thomas Ning	83197c769f	MFMA 16x16x32fp8 (#2103) * add mfma_16x16x32_fp8 * clang format code * Finished the fix for gemm basic * clang format * rebuild CI * recover gemm.hpp * add MFMA 161632bf8 --------- Co-authored-by: solin <bingzhou@amd.com> [ROCm/composable_kernel commit: `a738e43445`]	2025-04-21 10:21:35 -07:00
solin	dd2c3289c9	fix CI build fail [ROCm/composable_kernel commit: `c318ec0778`]	2025-04-21 16:00:12 +08:00
lalala-sh	dc58110b06	enable do top k weights in moe stage1 gemm (#2094) * add switch for mul topk weights * fix bf16/f16 bugs * complete [ROCm/composable_kernel commit: `bcf5bb41be`]	2025-04-18 10:45:49 +08:00
Andriy Roshchenko	7972a39081	MX GEMM - Parameterized Test Template (#2088) * Tests for MX FP8 GEMM * Improve documentation [ROCm/composable_kernel commit: `213b203a3c`]	2025-04-16 19:56:00 -06:00
Andriy Roshchenko	a96e96ab4e	MX GEMM - Add MX BF8 example (#2071) * Add MX GEMM example for MX BF8 * Verified MX FP8 with 16x16x128 scale builtin * Verify MX BF8 GEMM with BF16 output [ROCm/composable_kernel commit: `da54464cce`]	2025-04-16 15:25:02 -06:00
BingYuan.Zhou	f6b51a1c7b	[flatmm] implement basic fp16 flatmm (#2089) * [flatmm] implement basic fp16 flatmm * fix CI build fail --------- Co-authored-by: root <root@hjbog-srdc-50.amd.com> Co-authored-by: solin <bingzhou@amd.com> [ROCm/composable_kernel commit: `eaf1f0bf3b`]	2025-04-16 16:51:17 +08:00
Andriy Roshchenko	5e2bd20672	MX GEMM - New GEMM pipeline for MX data types (#2059) * Allow selection of mfma_scale instructions * Read B tensor from LDS to VGPR in chunks of 16 in MFMA order * Add constexpr and synchronize return type for `get_exponent_value` * Pass scales by reference and add comments to `mfma_scale_f32_32x32x64` * Add support for microscaling instructions in `XdlopsGemm` * Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper * Remove software implementation of MX GEMM * Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction * Update README * Updated CHANGELOG * Remove unused static methods [ROCm/composable_kernel commit: `7106976a72`]	2025-04-15 17:17:07 -06:00
Mingtao Gu	3cbda9a11c	CK pk_i4_t test failures fix (SWDEV-518629) (#2075) * fix pk_i4_v3 tests failures in Ubuntu env. * fix pk_i4_t tests failure on Ubuntu issues. * some fixed. --------- Co-authored-by: mtgu0705 <mtgu@amd.com> [ROCm/composable_kernel commit: `56378f810f`]	2025-04-14 16:58:57 +08:00
Thomas Ning	1d62f5edb9	Solve the Static Encoding Pattern compile error when the tile size is too small (#2079) [ROCm/composable_kernel commit: `269f4f6af5`]	2025-04-13 20:09:30 -07:00
Illia Silin	c91d046350	Fix build issues for multiple targets. (#2077) * build for multiple targets on gfx942 * add missing ignore statements [ROCm/composable_kernel commit: `0d4f145078`]	2025-04-11 12:12:53 -07:00
jakpiase	addcd203eb	[CK_TILE] Add 2:4 structured sparsity support for fp16 gemm (#1957) * add structured sparsity fp16 support for gemm * added reviewer suggestions * update changelog * update changelog * add reviewers suggestions * Minor fix * clang fix * fix doxygen [ROCm/composable_kernel commit: `6c61f4d237`]	2025-04-11 12:18:26 +02:00
slippedJim	cca9cca699	add fmha fwd splitkv receipt for aiter c++ api (#2068) * add s_randval for c++ api * Fix bug of bias in splitkv --------- Co-authored-by: rocking <ChunYu.Lai@amd.com> [ROCm/composable_kernel commit: `5f885d2b7a`]	2025-04-10 23:21:13 +08:00
Juan Manuel Martinez Caamaño	c4e0659e51	Replace inline assembly with builtins in FMHA (#2067) * Replace inline assembly with builtins in FMHA --------- Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `f14e648e7c`]	2025-04-10 09:48:37 +02:00
Illia Silin	9cc561987f	enable gfx115x support (#2065) [ROCm/composable_kernel commit: `3e6d21adeb`]	2025-04-09 10:06:42 -07:00
MHYang-gh	4e5b335b5e	Make buffer coherence configurable in tensor view (#2041) * Make buffer coherence configurable in tensor view * Fix clang-format for tensor_view.hpp [ROCm/composable_kernel commit: `03ce8729fd`]	2025-04-08 15:34:11 -07:00
valarLip	aa6eeb7f66	add passthrough for int32->float32 (#2062) [ROCm/composable_kernel commit: `2c563fecf7`]	2025-04-08 15:16:30 -07:00
Max Podkorytov	1eec5cc073	simplify generate_tuple (#2043) [ROCm/composable_kernel commit: `6ce0797dad`]	2025-04-08 09:00:51 -07:00
aledudek	a42081842c	[CK_TILE] Fix GEMM Memory Pipeline (#2034) * [CK_TILE] Fix GEMM Memory Pipeline * Fix transpose tile * Add comments [ROCm/composable_kernel commit: `80aae6119b`]	2025-04-08 12:40:04 +02:00
Illia Silin	ad65c947c9	fix codegen issues (#2052) [ROCm/composable_kernel commit: `1793228422`]	2025-04-07 07:08:39 -07:00
Illia Silin	6d90b2eb50	Split env.hpp header from the ck.hpp header. (#2049) * split env.hpp out of main headers * fix namespace logic [ROCm/composable_kernel commit: `572cd820ce`]	2025-04-03 15:30:21 -07:00
Rostyslav Geyyer	1716380358	Add FP16/BF16<->FP8/BF8 conversions (#2035) * Move conversion functions and add missing conversions * Add tests * Add missing conversions * Add missing conversions * Add bf8 tests * Update clipping for vectors * Add missing conversions * Add bf16 fp8 tests * Add bf16 bf8 tests * Fix device conversion * Fix conversions * Fix vector use * Minor fix * Add a workaround flag * Add a workaround flag for bf16 conversion * Add another workaround * Add a workaround for fp16 to bf8 conversion * Update type alias * Add docstrings and missing wrappers * Fix if defined macros * Fix more if defined macros * Add comments * Remove __host__ specifier * Add a gfx950 guard * Update function naming [ROCm/composable_kernel commit: `265af71a71`]	2025-04-03 12:42:03 -05:00
aledudek	7a78bc823a	Post-merge changes for fully async args copy in ck grouped gemm (#1991) * Post-merge changes for fully async args copy in ck grouped gemm * Post-merge documentation and naming changes * Build fix and updated changelog * Revised comments [ROCm/composable_kernel commit: `9329432f6c`]	2025-04-03 13:35:43 +02:00
Bartłomiej Kocot	49565538fe	Add support for GKCYX grouped conv weight (#2023) * Grouped conv bwd weight GKCYX support * fix and changelog * fix * fix * fixes * comments * fix [ROCm/composable_kernel commit: `2ccf914888`]	2025-04-02 23:59:49 +02:00
Adam Osewski	0a607256cf	Basic docs for universal gemm & ck-tile gemm. (#2014) * Basic docs for universal gemm & ck-tile gemm. * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Reviewers suggestions. * Align tparam names in doc with class tparams. * More reviewers fine tuning ;) --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> [ROCm/composable_kernel commit: `e5ad48a784`]	2025-04-02 11:03:40 +02:00
Bartłomiej Kocot	50fb390d6f	Grouped conv backward data GKCYX support (#2029) * Grouped conv backward data GKCYX support * profiler * Converter * split instances [ROCm/composable_kernel commit: `8c0ab61ece`]	2025-04-01 13:24:38 -07:00
Bartłomiej Kocot	b417137276	Grouped conv fwd v3 fix for SplitN and G > 1 (#2038) * Grouped conv fwd v3 fix for SplitN and G > 1 * Remove int8 large test * Restore int8 test [ROCm/composable_kernel commit: `ec742908bd`]	2025-04-01 13:19:35 -07:00
Seunghoon Lee	0ac4c37028	Fix Windows build. (#2012) * Remove duplicate using uint64_t. * Cast before shift. [ROCm/composable_kernel commit: `df32020f93`]	2025-04-01 12:22:10 -07:00
Max Podkorytov	70ad8571c2	add a fast compilation path for static for (0..N) (#2005) * add a fast compilation path for static for (0..N) * Update functional2.hpp add comment and put range applier into detail namespace * Update functional.hpp ditto for ck-tile * prettify * prettify more * add comment * clang-format [ROCm/composable_kernel commit: `c59a8bb206`]	2025-04-01 12:06:25 -07:00
rocking	2b657d9a2c	Reduce redundant space in bias tensor (#2024) Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `8a20b62e91`]	2025-03-28 21:58:06 +08:00
felix	817752cdb4	hotfix fix sorting int64 (#2025) * fix sorting int64 * clang format * fix example issue * update WA issue # --------- Co-authored-by: coderfeli <coderfeli@163.com> Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `a82f338fb9`]	2025-03-28 11:31:52 +08:00
Rostyslav Geyyer	48fa126a9e	Add MX FP4 device conversion tests (#1889) * Add conversion tests * Fix ctor * Fix nan logic * Fix conversion logic * Permute packed f4_t values * Fix conversion to float, repack vector elements * Fix device tests * Permute elements in a vector * Add a repro test * Add a conversion for a repro test * Update test vectors * Update conversion * Fix the test * Update test vector generator * Fix vector sr conversion * Permute conversion args * Update conversion * Test * Fix packing * Simplify conversion function * Pack conversion in a loop * Pack conversion in a loop * Pack another conversion in a loop * Pack one more conversion in a loop * Pack the last conversion in a loop * Clean up * Add printf to fix intrinsic * Add a sw-based workaround [ROCm/composable_kernel commit: `441343a23d`]	2025-03-26 19:23:01 -05:00
Bartłomiej Kocot	f967fd7296	Add support for GKCYX grouped conv fwd (#2015) * Add support for GKCYX grouped conv fwd * fixes * fix * changelog * Fixes [ROCm/composable_kernel commit: `54c81a1fcf`]	2025-03-26 21:13:38 +01:00
Andriy Roshchenko	3f06d019ba	MX GEMM examples with FP8, FP16, and E8M0 scales (#2016) * Add `scalar_type` specification for E8M0 exponent * Specialize `nnvb_data_t_selector` for E8M0 exponent * Remove partial specializations for `scalar_type` of `non_native_vector_base` template * Reword command line helper string * Create MX GEMM examples for different scales [ROCm/composable_kernel commit: `72d888821c`]	2025-03-25 15:33:03 -06:00

764 Commits