composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 11:16:59 +00:00

Author	SHA1	Message	Date
joye	b406b584d7	fix clang format check	2025-04-28 08:56:00 +08:00
joye	292f89c860	fix a transpose index issue	2025-04-27 17:39:52 +08:00
joye	bd091d9a88	fix a transpose index issue	2025-04-27 17:33:32 +08:00
joye	de9407ed93	fix 16x16 related dimension transpose	2025-04-27 02:44:49 -05:00
joye	57e8e34705	exchange the iteration order	2025-04-26 21:37:59 -05:00
joye	944f36efb8	update transpose load example	2025-04-27 10:13:46 +08:00
joye	25ab38f913	fix transpose related codes	2025-04-27 09:55:29 +08:00
joye	a823e00dc0	delete unused variables	2025-04-25 14:47:49 +08:00
joye	a6564da629	update output tensor distribution	2025-04-25 14:45:45 +08:00
joye	119a8e0e16	hack for transpose 16x16	2025-04-25 00:56:59 -05:00
joye	e2f3c95d24	miss output tile distribution mapping	2025-04-24 21:55:03 -05:00
joye	6beb585dad	update tile transpose	2025-04-24 20:36:09 -05:00
joye	efa7243ee5	transpose load enable	2025-04-24 19:19:44 -05:00
joye	df9769afba	can pass; but no logic	2025-04-24 19:05:47 -05:00
joye	90a4501869	add some fix	2025-04-24 18:29:38 +08:00
joye	34040f43b6	update some codes	2025-04-24 17:53:19 +08:00
joye	8d75983536	fix a distribution issue	2025-04-24 02:24:31 -05:00
joye	6893165818	add some fixes	2025-04-24 01:20:35 -05:00
joye	afb1cec9c4	update transpose load logic	2025-04-24 11:14:07 +08:00
joye	3918a35870	Merge branch 'mi355_transpose_load_dev' of https://github.com/ROCm/composable_kernel into mi355_transpose_load_dev	2025-04-24 08:28:14 +08:00
joye	aaab4aacc5	Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mi355_transpose_load_dev	2025-04-24 08:27:53 +08:00
joye	c862437f05	fix some issues	2025-04-23 19:27:29 -05:00
carlushuang	5487289fc4	[CK_TILE] support gfx950 matrix core in 01_fmha fwd (#2110) * gfx950 01_fmha fwd * fix comment --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-04-23 12:40:18 -07:00
John Afaganis	854159fd00	Update CODEOWNERS (#2119)	2025-04-23 10:25:41 -07:00
joye	acce2df3bf	fix some compile errors	2025-04-23 05:27:52 -05:00
joye	e14a16359f	add transpose load; no real logic	2025-04-23 16:55:13 +08:00
lalala-sh	39ba03f25d	Moe gemm activation (#2026) * fix useless code and remove usless oob * clang format * fix coredump in e2e test * fix2 * fix clang format * fix output oob * impl int64 but result not correct * int64 index ok now * input output all ok * fix uint32 * revert v1 test * use uint32 * mork to support 13w tokens * moe sorting fix moebuf * fix merge * update moe api fix aiter build * fix buid * fuse silu * silu ok * acale ok * add silu * change code * gemm2 ok * gufusion compatible ok, fix warnings * gu fusion for m32 m64 ok * support bf16 cshuffle * i4 gemm2 ok * i4 gemm2 ok and i4 gemm1 build * 16x16 run ok * change flops; change cshuffle dtype * fuse gelu silu act in moe gemm1 * fp8 with act ready * int4 act ready * remove useless changes * remove useless code change * fix clang format * add the arch limit of int4 moe gemm * fuse moe activation * fix fp8 16x16 * fix no quant case * fix bugs * fix fp8 gufusion bug * remove useless comments * refine activation code & complete moe example * fix int8 bugs * merge tkw1 --------- Co-authored-by: coderfeli <coderfeli@163.com> Co-authored-by: feli <felix.li@amd.com> Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: root <root@hjbog-srdc-51.amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-04-23 10:35:34 +08:00
Khushbu Agarwal	94662b02d0	Adding include directory in tile_engine (#2116)	2025-04-22 15:55:19 -07:00
Gino Lu	504f563f78	[CK-Tile] warp-gemm support for using V_MFMA_F32_16x16x32_BF16 (#2073) * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * draft v_mfma_f32_16x16x32_bf16 * fix error config and add debug code. * Solve the CShuffle Problem * fix error while testing new command * Finished the feature of new mfma 161632 * Addressed the comment --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-04-22 15:52:36 -07:00
Rostyslav Geyyer	416e851584	Temporarily disable MX FP4 device tests (#2112)	2025-04-22 16:08:48 -05:00
Thomas Ning	0cca8fa28f	GEMM Multiply Multiply Fix (#2102) * fix the type convert and increase the BF16 conversion + the profile comment * fix the CI	2025-04-22 01:13:22 -07:00
Thomas Ning	4bef60aa57	update code owner (#2113)	2025-04-21 13:53:03 -07:00
Muhammed Emin Ozturk	b092c18da7	MI308 fix for streamk 1-Tile floating point exception (#2101)	2025-04-21 11:44:07 -07:00
Thomas Ning	a738e43445	MFMA 16x16x32fp8 (#2103) * add mfma_16x16x32_fp8 * clang format code * Finished the fix for gemm basic * clang foramt * rebuild CI * recover gemm.hpp * add MFMA 161632bf8 --------- Co-authored-by: solin <bingzhou@amd.com>	2025-04-21 10:21:35 -07:00
Illia Silin	ce61759538	fix daily gfx942 build (#2106)	2025-04-21 08:48:22 -07:00
Khushbu Agarwal	7cadf187e2	multi instance generation for CkTileEngine (#2080) * Add support for multi-instance verification, print detail for each instance, documentation fix * clang formatted * Added Readme file * updated readme * Addressing review comments * clang formatted * Updated ReadMe and GPU reference code * simplified dispatch kernel code * indentation	2025-04-21 08:39:45 -07:00
solin	c318ec0778	fix CI build fail	2025-04-21 16:00:12 +08:00
lalala-sh	bcf5bb41be	enable do top k weights in moe stage1 gemm (#2094) * add switch for mul topk weights * fix bf16/f16 bugs * complete	2025-04-18 10:45:49 +08:00
Andriy Roshchenko	213b203a3c	MX GEMM - Parameterized Test Template (#2088) * Tests for MX FP8 GEMM * Improve documentation	2025-04-16 19:56:00 -06:00
Andriy Roshchenko	da54464cce	MX GEMM - Add MX BF8 example (#2071) * Add MX GEMM example for MX BF8 * Verified MX FP8 with 16x16x128 scale builtin * Verify MX BF8 GEMM with BF16 output	2025-04-16 15:25:02 -06:00
Illia Silin	3bb62f16cd	Upgrade default docker to Ubuntu24.04 (#2090) * upgrade docker to Ubuntu24.04 * add break-system-packages flag to pip install * fix dockerfile	2025-04-16 12:10:15 -07:00
aledudek	7c32652e03	Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 (#2069) * Part1 * Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 * Add missing coma * Add missing cpp instance files * Fix 3d layout * Add missing closing bracket * Add missing comp x2 and part2 instances * Fix typo in instance name * fix * Fix --------- Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>	2025-04-16 11:00:55 +02:00
BingYuan.Zhou	eaf1f0bf3b	[flatmm] implement basic fp16 flatmm (#2089) * [flatmm] implement basic fp16 flatmm * fix CI build fail --------- Co-authored-by: root <root@hjbog-srdc-50.amd.com> Co-authored-by: solin <bingzhou@amd.com>	2025-04-16 16:51:17 +08:00
felix	c5975529bb	add preshuffle gemm fp16 (#2036) * add preshuffle gemm fp16 * clang format and test ok * Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp remove useless comments in example * Update gemm_multiply_multiply_xdl_fp16_bpreshuffle.cpp remove 2 --------- Co-authored-by: coderfeli <coderfeli@163.com>	2025-04-16 10:53:21 +08:00
joyeamd	94d47b1680	fmha hdim256 vectorize improve (#2086) For hdim 256, will not have vectorized buffer load when seqlen % 256 != 0 and hdim % 256 = 0; this commit tries to solve this condition.	2025-04-16 09:21:04 +08:00
Andriy Roshchenko	7106976a72	MX GEMM - New GEMM pipeline for MX data types (#2059) * Allow selection of mfma_scale instructions * Read B tensor from LDS to VGPR in chunks of 16 in MFMA order * Add constexpr and synchronize return type for `get_exponent_value` * Pass scales by reference and add comments to `mfma_scale_f32_32x32x64` * Add support for microscaling instructions in `XdlopsGemm` * Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper * Remove software implementation of MX GEMM * Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction * Update README * Updated CHANGELOG * Remove unused static methods	2025-04-15 17:17:07 -06:00
Illia Silin	d55c9cb313	Upgrade default docker image to ROCm6.4 release. (#2082) * upgrade to rocm6.4 * fix gfx10 generic target syntax * use gfx1101 target for unit tests * use gfx1201 target for unit tests * do not use generic targets until 6.4.1 release * update target list and dockerfile.compiler	2025-04-14 16:41:47 -07:00
Mingtao Gu	56378f810f	CK pk_i4_t test failures fix (SWDEV-518629) (#2075) * fix pk_i4_v3 tests failures in Unbuntu env. * fix pk_i4_t tests failure on Unbuntu issues. * some fixed. --------- Co-authored-by: mtgu0705 <mtgu@amd.com>	2025-04-14 16:58:57 +08:00
Thomas Ning	269f4f6af5	Solve the Static Encoding Pattern compile error when the tile size is too small (#2079)	2025-04-13 20:09:30 -07:00
Illia Silin	0d4f145078	Fix build issues for multiple targets. (#2077) * build for multiple targets on gfx942 * add missing ignore statements	2025-04-11 12:12:53 -07:00

1865 Commits