composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-06 07:12:19 +00:00

Author	SHA1	Message	Date
lalala-sh	960b2bce1c	update	2025-05-08 09:48:23 +00:00
lalala-sh	abff33eaab	tune fp8 example	2025-05-06 08:46:38 +00:00
lalala-sh	0ab978584d	fix bugs	2025-05-06 07:36:59 +00:00
lalala-sh	9c06c3817a	[fix] align v3 gufusion pipeline	2025-04-30 02:27:39 +00:00
lalala-sh	b8427b812e	align v3 gufusion pipeline	2025-04-30 01:35:49 +00:00
aska-0096	bc9c819aa4	generalized bpreshuffle pipeline optimization	2025-04-27 11:50:30 +00:00
aska-0096	49338edb1b	tempsave	2025-04-27 08:05:20 +00:00
aska-0096	1637dd5297	Merge branch 'swdev_528812_moe' of https://github.com/ROCm/composable_kernel into swdev_528812	2025-04-25 05:14:01 +00:00
aska-0096	946a2119cd	temp save	2025-04-25 05:12:47 +00:00
coderfeli	e07ed1eda8	use v3	2025-04-25 03:15:48 +00:00
coderfeli	f9c29b5ec7	set 16x16	2025-04-25 03:09:53 +00:00
coderfeli	cd7955ce8b	Merge branch 'swdev_528812' into dev/moe_opt	2025-04-25 02:51:33 +00:00
coderfeli	ddb5f36eeb	add missing file	2025-04-24 11:10:31 +00:00
coderfeli	c3c4a1e252	change test	2025-04-24 11:09:41 +00:00
coderfeli	ceaa5a984b	gu fusion v3	2025-04-24 11:05:08 +00:00
coderfeli	2054e165bc	fix moe pipeline and change to compute tile	2025-04-24 06:26:12 +00:00
aska-0096	abd40d3569	found a case that seems like have vectorizer issue	2025-04-23 15:55:59 +00:00
aska-0096	d6e2dd92fe	enable f8 new mfma for preshuffle gemm. found some vectorizer issue even with slp flag	2025-04-23 15:53:15 +00:00
lalala-sh	39ba03f25d	Moe gemm activation (#2026) * fix useless code and remove usless oob * clang format * fix coredump in e2e test * fix2 * fix clang format * fix output oob * impl int64 but result not correct * int64 index ok now * input output all ok * fix uint32 * revert v1 test * use uint32 * mork to support 13w tokens * moe sorting fix moebuf * fix merge * update moe api fix aiter build * fix buid * fuse silu * silu ok * acale ok * add silu * change code * gemm2 ok * gufusion compatible ok, fix warnings * gu fusion for m32 m64 ok * support bf16 cshuffle * i4 gemm2 ok * i4 gemm2 ok and i4 gemm1 build * 16x16 run ok * change flops; change cshuffle dtype * fuse gelu silu act in moe gemm1 * fp8 with act ready * int4 act ready * remove useless changes * remove useless code change * fix clang format * add the arch limit of int4 moe gemm * fuse moe activation * fix fp8 16x16 * fix no quant case * fix bugs * fix fp8 gufusion bug * remove useless comments * refine activation code & complete moe example * fix int8 bugs * merge tkw1 --------- Co-authored-by: coderfeli <coderfeli@163.com> Co-authored-by: feli <felix.li@amd.com> Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: root <root@hjbog-srdc-51.amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-04-23 10:35:34 +08:00
aska-0096	25bb0d2fee	add flags to avoid vectorizer problem	2025-04-23 02:08:44 +00:00
Khushbu Agarwal	94662b02d0	Adding include directory in tile_engine (#2116)	2025-04-22 15:55:19 -07:00
Gino Lu	504f563f78	[CK-Tile] warp-gemm support for using V_MFMA_F32_16x16x32_BF16 (#2073) * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * fix error while testing new command * Finished the feature of new mfma 161632 * Addressed the comment --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-04-22 15:52:36 -07:00
Rostyslav Geyyer	416e851584	Temporarily disable MX FP4 device tests (#2112)	2025-04-22 16:08:48 -05:00
aska-0096	5366d3415b	f8 mfma issue	2025-04-22 10:59:03 +00:00
Thomas Ning	0cca8fa28f	GEMM Multiply Multiply Fix (#2102) * fix the type convert and increase the BF16 conversion + the profile comment * fix the CI	2025-04-22 01:13:22 -07:00
Thomas Ning	4bef60aa57	update code owner (#2113)	2025-04-21 13:53:03 -07:00
Muhammed Emin Ozturk	b092c18da7	MI308 fix for streamk 1-Tile floating point exception (#2101)	2025-04-21 11:44:07 -07:00
Thomas Ning	a738e43445	MFMA 16x16x32fp8 (#2103) * add mfma_16x16x32_fp8 * clang format code * Finished the fix for gemm basic * clang foramt * rebuild CI * recover gemm.hpp * add MFMA 161632bf8 --------- Co-authored-by: solin <bingzhou@amd.com>	2025-04-21 10:21:35 -07:00
Illia Silin	ce61759538	fix daily gfx942 build (#2106)	2025-04-21 08:48:22 -07:00
Khushbu Agarwal	7cadf187e2	multi instance generation for CkTileEngine (#2080) * Add support for multi-instance verification, print detail for each instance, documentation fix * clang formatted * Added Readme file * updated readme * Addressing review comments * clang formatted * Updated ReadMe and GPU reference code * simplified dispatch kernel code * indentation	2025-04-21 08:39:45 -07:00
solin	c318ec0778	fix CI build fail	2025-04-21 16:00:12 +08:00
lalala-sh	bcf5bb41be	enable do top k weights in moe stage1 gemm (#2094) * add switch for mul topk weights * fix bf16/f16 bugs * complete	2025-04-18 10:45:49 +08:00
Andriy Roshchenko	213b203a3c	MX GEMM - Parameterized Test Template (#2088) * Tests for MX FP8 GEMM * Improve documentation	2025-04-16 19:56:00 -06:00
Andriy Roshchenko	da54464cce	MX GEMM - Add MX BF8 example (#2071) * Add MX GEMM example for MX BF8 * Verified MX FP8 with 16x16x128 scale builtin * Verify MX BF8 GEMM with BF16 output	2025-04-16 15:25:02 -06:00
Illia Silin	3bb62f16cd	Upgrade default docker to Ubuntu24.04 (#2090) * upgrade docker to Ubuntu24.04 * add break-system-packages flag to pip install * fix dockerfile	2025-04-16 12:10:15 -07:00
aledudek	7c32652e03	Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 (#2069) * Part1 * Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 * Add missing coma * Add missing cpp instance files * Fix 3d layout * Add missing closing bracket * Add missing comp x2 and part2 instances * Fix typo in instance name * fix * Fix --------- Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>	2025-04-16 11:00:55 +02:00
BingYuan.Zhou	eaf1f0bf3b	[flatmm] implement basic fp16 flatmm (#2089) * [flatmm] implement basic fp16 flatmm * fix CI build fail --------- Co-authored-by: root <root@hjbog-srdc-50.amd.com> Co-authored-by: solin <bingzhou@amd.com>	2025-04-16 16:51:17 +08:00
felix	c5975529bb	add preshuffle gemm fp16 (#2036) * add preshuffle gemm fp16 * clang format and test ok * Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp remove useless comments in example * Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp remove 2 --------- Co-authored-by: coderfeli <coderfeli@163.com>	2025-04-16 10:53:21 +08:00
joyeamd	94d47b1680	fmha hdim256 vectorize improve (#2086) For hdim 256, will not have vectorized buffer load when seqlen % 256 != 0 and hdim % 256 = 0; this commit tries to solve this condition.	2025-04-16 09:21:04 +08:00
Andriy Roshchenko	7106976a72	MX GEMM - New GEMM pipeline for MX data types (#2059) * Allow selection of mfma_scale instructions * Read B tensor from LDS to VGPR in chunks of 16 in MFMA order * Add constexpr and synchronize return type for `get_exponent_value` * Pass scales by reference and add comments to `mfma_scale_f32_32x32x64` * Add support for microscaling instructions in `XdlopsGemm` * Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper * Remove software implementation of MX GEMM * Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction * Update README * Updated CHANGELOG * Remove unused static methods	2025-04-15 17:17:07 -06:00
Illia Silin	d55c9cb313	Upgrade default docker image to ROCm6.4 release. (#2082) * upgrade to rocm6.4 * fix gfx10 generic target syntax * use gfx1101 target for unit tests * use gfx1201 target for unit tests * do not use generic targets until 6.4.1 release * update target list and dockerfile.compiler	2025-04-14 16:41:47 -07:00
Mingtao Gu	56378f810f	CK pk_i4_t test failures fix (SWDEV-518629) (#2075) * fix pk_i4_v3 tests failures in Unbuntu env. * fix pk_i4_t tests failure on Unbuntu issues. * some fixed. --------- Co-authored-by: mtgu0705 <mtgu@amd.com>	2025-04-14 16:58:57 +08:00
Thomas Ning	269f4f6af5	Solve the Static Encoding Pattern compile error when the tile size is too small (#2079)	2025-04-13 20:09:30 -07:00
Illia Silin	0d4f145078	Fix build issues for multiple targets. (#2077) * build for multiple targets on gfx942 * add missing ignore statements	2025-04-11 12:12:53 -07:00
Muhammed Emin Ozturk	74fda2e796	CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test Redo PR #2044 (#2070) * fix and split gemm_universal test * Update test_gemm_universal_streamk_ut_cases_fp8.inc	2025-04-11 10:17:29 -07:00
jakpiase	6c61f4d237	[CK_TILE] Add 2:4 structured sparsity support for fp16 gemm (#1957) * add structured sparsity fp16 support for gemm * added reviewer suggestions * update changelog * update changelog * add reviewers suggestions * Minor fix * clang fix * fix doxygen	2025-04-11 12:18:26 +02:00
slippedJim	5f885d2b7a	add fmha fwd splitkv receipt for aiter c++ api (#2068) * add s_randval for c++ api * Fix bug of bias in splitkv --------- Co-authored-by: rocking <ChunYu.Lai@amd.com>	2025-04-10 23:21:13 +08:00
Juan Manuel Martinez Caamaño	f14e648e7c	Replace inline assembly with builtins in FHMA (#2067) * Replace inline assembly with builtins in FHMA --------- Co-authored-by: illsilin <Illia.Silin@amd.com>	2025-04-10 09:48:37 +02:00
Illia Silin	3e6d21adeb	enable gfx115x support (#2065)	2025-04-09 10:06:42 -07:00
MHYang-gh	03ce8729fd	Make buffer coherence configurable in tensor view (#2041) * Make buffer coherence configurable in tensor view * Fix clang-format for tensor_view.hpp	2025-04-08 15:34:11 -07:00

1859 Commits