Aviral Goel
004784ef98
chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
...
* chore(copyright) update library wide CMakeLists.txt files copyright header template
* Fix build
---------
Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
Max Podkorytov
79aae7c7f7
[CK Tile] enable building examples by default (#3259)
...
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
2025-11-26 16:24:44 -08:00
Yi DING
47e2ed838e
[CK_TILE] Add Flatmm MX FP8 (#3208)
...
* Use async for flatmm mxfp4
* Fix preshuffle
* Add flatmm mxfp8
* Thanks, Copilot
* Thanks Copilot again~
2025-11-20 10:35:15 +08:00
Yi DING
b6720531de
[CK_TILE] MX Flatmm Split kernel instances (#3207)
...
* [CK_TILE] MX Flatmm Split kernel instances
* Fix flatmm example compile
2025-11-18 13:46:30 +08:00
Yi DING
e135dd518d
[CK_TILE] Add mxfp4 flatmm (#3080)
...
* Squashed commit of the following:
commit 3e1a851dad834776efbe4fe365ac82c4ed312010
Author: Ding, Yi <yi.ding@amd.com >
Date: Thu Oct 23 06:10:54 2025 +0000
Fix & clean after rebase
commit 1edf485092f44411da9a1796a4a6b72d5cdb67c6
Author: Ding, Yi <yi.ding@amd.com >
Date: Wed Oct 22 10:46:13 2025 +0000
Squashed commit of the following:
commit 0b6b9dbd1b
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 02:04:27 2025 -0500
fix bandwidth calculation
commit 9aebf53bb7
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 00:58:59 2025 -0500
updates
commit 62607de56c
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Sep 19 00:39:46 2025 -0500
fix a bug, set the A DS_read preload size to 4 for MXFP4
commit 92ad6fcc0a
Author: mtgu0705 <mtgu@amd.com >
Date: Thu Sep 18 01:19:03 2025 -0500
fix a_wrap preload issue for large MPerBlock.
commit f2db44710f
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 21:34:03 2025 -0500
optimized the VGPR repack issue for MXFP4
commit 346a400027
Author: Gino Lu <gino.lu@amd.com >
Date: Wed Sep 17 04:19:44 2025 -0500
fix time error
commit 80c1743034
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 03:58:00 2025 -0500
updated, function passed.
commit ce26d9071e
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 22:21:39 2025 -0500
fix, function partially passed
commit 0a89ed13a5
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 03:01:12 2025 -0500
fix, reference function passed, next check kernel function
commit ec9bcef591
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Sep 16 02:29:01 2025 -0500
let pack/unpack return pk_fp4_t
commit a333206929
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 20:50:26 2025 -0500
fix
commit 3893c06540
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 15 05:51:06 2025 -0500
fix bug
commit 8052bea019
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 04:02:05 2025 -0500
fix core dump issue, function is not correct.
commit 9ceb3fd508
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 03:03:02 2025 -0500
updates, build pass
commit cc94eb6045
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 00:05:18 2025 -0500
updates
commit 22586c3135
Author: Gino Lu <gino.lu@amd.com >
Date: Sun Sep 14 23:40:28 2025 -0500
fix bug
commit e92e67b8dd
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 03:28:50 2025 -0500
fix interface
commit 8b1dd60c08
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 02:53:50 2025 -0500
add interface in warp_gemm_impl
commit c6135f6abe
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 10 05:03:08 2025 -0500
updates some fixes.
commit b0d71b8d19
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 9 04:37:42 2025 -0500
fix after merge ginolu/add_wgmfma_dispatcher
commit f119c30317
Merge: c5030e602 72c8ef856
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 22:09:15 2025 -0500
Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev
commit c5030e602e
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 21:42:47 2025 -0500
update mx flatmm tail pipeline
commit 72c8ef8567
Merge: 9661bb400 e4a772890
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:10:23 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 9661bb400b
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:09:55 2025 -0500
fix type error
commit 0509597f55
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 04:01:40 2025 -0500
update hotloop pipeline
commit 754ae0461b
Merge: 15d44406e 83f607e2a
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:22:26 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 15d44406e5
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:21:26 2025 -0500
fix clang format
commit 146963d62a
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 3 10:00:54 2025 -0500
some updates
commit 12526b626a
Merge: 47cee0471 00fd72b2d
Author: asleepzzz <hanwen.chang@amd.com >
Date: Wed Sep 3 13:22:03 2025 +0800
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 47cee04712
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 02:11:02 2025 -0500
fix vec size error
commit d2892925e5
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 01:23:39 2025 -0500
fix format error
commit 16993acd1d
Author: mtgu0705 <mtgu@amd.com >
Date: Sat Aug 30 03:19:07 2025 -0500
update codes
commit 9c37e55d13
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Aug 29 11:27:33 2025 -0500
init ck_tile mxfp4 flatmm
commit 5c484a5672
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 28 08:02:50 2025 +0000
Add bias for f16xf4 moe_flatmm
commit dd6539f366
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 27 13:39:47 2025 +0000
update case construction
commit 65b702454c
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 26 12:32:29 2025 +0000
support swiglu activation and use rcpf to accelerate silu
commit b422e41e08
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Aug 26 02:33:55 2025 -0500
first commit
commit d05eed931d
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 04:01:59 2025 -0500
add line to last
commit d69cab7f0c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 03:20:46 2025 -0500
adjust A_LDS descriptor to avoid bankconflict
commit 65989e940c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Thu Aug 21 09:46:52 2025 -0500
enable hotloop
commit c378e9bdf8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 09:12:21 2025 +0000
support atomic_pk_add_bf16 on gfx950
commit 85976b0b87
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 06:58:55 2025 +0000
use int64_t as expert stride to avoid overflow
commit 9fbcc8f8a4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 13:53:32 2025 +0000
use v4i32 as the storage type for B to avoid repack operation
commit 81899bd920
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 06:40:03 2025 +0000
add pk_fp4_t and e8m0_t support for amd_buffer_load_impl
commit c27eb0771a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 04:39:14 2025 +0000
optimize cvt_pkf4_to_f16 implementation
commit 3ca0bd500a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 19 14:56:46 2025 +0000
optimize A_LDS descriptor to avoid bankconflict
commit f7f0306eea
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 18:43:37 2025 +0000
fix gate-up when GU_NRepeat > 1
commit be55c0f9cb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 17:28:11 2025 +0000
add fp16xf4 moe
commit 599e1f5b32
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Aug 17 17:51:18 2025 +0000
rename example
commit 7899fb4a8d
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 15 06:20:46 2025 +0000
remove additional check when e8m0->float
commit 714b341797
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 14 09:34:12 2025 +0000
eliminate repeat dequant
commit 53e8c0c533
Merge: 5de620895 cc9c7b9e5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:51:49 2025 +0000
Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm
commit 5de6208952
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:16:48 2025 +0000
update f16xMXF4
commit 732ebdee8b
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 10:48:53 2025 +0000
update scale-preshuffle for MXF4
commit edb58d0680
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 11:24:34 2025 +0000
update
commit cc9c7b9e58
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 08:38:23 2025 +0000
optimize gemm2 atomic_add pattern
commit 200a11afc8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:59:47 2025 +0000
update scale for mxfp4
commit 87aed564dc
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:56:14 2025 +0000
update case construction
commit 8b85fa6cf2
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 06:03:06 2025 +0000
update granularity control
commit 1b8c7097b8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 03:42:46 2025 +0000
fix TileConfig
commit 8ba1c708dc
Author: Gino Lu <gino.lu@amd.com >
Date: Thu Aug 7 21:37:28 2025 +0800
Add e8m0 scaled convert into CK_TILE (#2617)
* first commit
* remove redundant code
* modify according to comments.
* fix type_convert error with scaled_type_convert
commit f788d3d629
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 8 20:19:16 2025 +0000
add mixed_prec fp16xfp4
commit 3dea10a277
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 7 09:22:04 2025 +0000
debug mixed_prec flatmm
commit 0ba513b148
Merge: 90e910f3a c0cb4d036
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Aug 6 16:49:47 2025 +0800
Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk
fix split k
commit 6d3cbc7c0e
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 6 08:33:33 2025 +0000
add moe_flatmm
commit c0cb4d036d
Author: coderfeli <coderfeli@163.com >
Date: Wed Aug 6 02:45:31 2025 +0000
fix split k
commit 90e910f3a7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 4 07:16:36 2025 +0000
fix flatmm with scaling when WarpTileM == 32
commit aa5e008fa5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 11:01:23 2025 +0000
optimize scaling epilogue
commit ac5908c0bb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 07:28:38 2025 +0000
fix wrong config for fp8 scaling
commit 3f43b841d4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 06:20:30 2025 +0000
prune debug message
commit 2e5d4c74cd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 04:52:08 2025 +0000
fix compile error
commit c117a1986a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Jul 29 15:42:58 2025 +0000
Add persistent option on flatmm for tuning
commit a587701117
Author: AMD-dteng <dteng@amd.com >
Date: Tue Jul 29 22:48:00 2025 +0800
update pipeline v1: add atomic IGLP schedule
commit f9e48148d2
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 09:09:27 2025 +0000
fix error log throwing
commit 1b6d7cf407
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Jul 28 08:24:51 2025 +0000
crz idea
commit 5473f06461
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Jul 27 11:57:38 2025 +0000
Add permuteN optimization when NRepeat % 2 == 0 on flatmm
commit bfb9f4002f
Author: sjfeng <j514681085@icloud.com >
Date: Sun Jul 27 17:24:08 2025 +0800
try to remove c_shuffle_lds
commit 1264f4d2ab
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Jul 25 07:41:48 2025 +0000
fix loop-dim mismatch and improve c_shuffle alu parallelism
commit 1239d8a546
Merge: 406645448 b908f5e80
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 08:46:51 2025 +0000
merge flatmm -scale
commit 4066454483
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 16:19:58 2025 +0800
revert delete of inc file
commit 68390988c9
Author: solin <bingzhou@amd.com >
Date: Thu Jul 24 04:38:16 2025 +0000
reorg flatmm code
commit b908f5e803
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:12:31 2025 +0000
fix flatmm syntax error on gfx950
commit 5a1183ebbd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:04:22 2025 +0000
support flatmm scaling
commit 89fa639207
Author: valarLip <340077269@qq.com >
Date: Wed Jul 23 08:44:12 2025 +0000
merge flatmm pipe v0 from dteng_flatmm_opt
commit 3f7d848dd3
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:38:12 2025 +0800
build pass
commit 6dacf833da
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 07:20:26 2025 +0000
fix bug
commit 7e1bd4b839
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:01:53 2025 +0800
sync
commit 46a538e39e
Author: valarLip <340077269@qq.com >
Date: Tue Jul 22 08:09:35 2025 +0000
adaptive scheduler instead of Macro definition
commit 9aa3396a79
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 17 08:40:35 2025 +0000
fix tail handler bug
commit fb76450e63
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 16 10:12:19 2025 +0000
merge from dteng_flatmm_opt
* Fix crash on small M
* Apply suggestion from @Copilot
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-10-31 11:29:05 +08:00
lalala-sh
211d64e18a
[CK_TILE] Update flatmm related kernels (#3022)
...
---------
Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-10-22 22:36:11 +08:00
linqunAMD
37e1a27537
[CK_TILE] Refine fp8 support in flatmm (#2239)
...
* [CK_TILE] Refine fp8 in flatmm
1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp definition to a separate struct
* fix clang format
* fix clang format
* keep default behavior unchanged (warp tile = 16x16)
* fix tile engine build error
* fix a typo in codegen_utils.py
* address review comments
* address review comments
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-06-25 01:07:45 -07:00
Khushbu Agarwal
bd270fe4bc
fix flatmm kernel for bigger size for fp16 datatype (#2302)
2025-06-10 11:13:40 -07:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge (#2168)
...
* sync with function interface of cshuffleepilogue, fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
Illia Silin
9a9f59ae69
Revert "Add ck tile examples to package (#1880)" (#2150)
2025-04-30 10:20:16 -07:00
jakpiase
434d19f696
Add ck tile examples to package (#1880)
...
* add ck tile examples to package
* Update jenkinsfile
* fix for jenkinsfile
* fix for building ck tile code on non gfx9
* compile ck tile examples only for gfx94
* include ck tile examples in all target
* fix for basic gemm UseStructuredSparsity
* Update CMakeLists.txt
* Update gemm_pipeline_problem.hpp
* add targets to rocm install
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-04-28 09:53:19 -07:00
BingYuan.Zhou
eaf1f0bf3b
[flatmm] implement basic fp16 flatmm (#2089)
...
* [flatmm] implement basic fp16 flatmm
* fix CI build fail
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00