Yi DING
b0ea67e377
[CK_TILE] MX FLATMM Fix M Padding ( #3489 )
...
* Fix M Padding
* Fix tensor desc ele space size
2025-12-29 09:09:12 +08:00
Lyu, Xudong
8b73633e65
fix: handle void return type in TailHandler error path with ROCm6 compiler (clang++) ( #3477 )
...
Replace `decltype(TailHandler<>(...)){}` with direct function call
to fix compilation error when return type is void.
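A minimal sketch of the pattern this message describes, not the actual CK_TILE code: `dispatch`, `launch`, `measure`, and the two counters are hypothetical names introduced for illustration. When the handler returns void, `decltype(handler(false)){}` on the error path expands to `void{}`, which the commit reports as a compile error with the ROCm6 clang++; returning the call result directly works for both void and non-void handlers.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counters so the effect of each path can be observed.
int g_calls  = 0;
int g_errors = 0;

// Hypothetical handler returning void, as a kernel-launch wrapper might.
void launch(bool has_tail) { g_calls += has_tail ? 2 : 1; }

// Hypothetical handler returning a value.
int measure(bool has_tail) { return has_tail ? 2 : 1; }

template <typename Handler>
auto dispatch(Handler&& handler, int tail_number)
{
    if(tail_number == 0)
        return handler(false);
    if(tail_number == 1)
        return handler(true);
    // Error path. `return decltype(handler(false)){};` would become `void{}`
    // for a void handler; returning the call itself is valid for any return type.
    ++g_errors;
    std::fprintf(stderr, "unsupported tail number %d\n", tail_number);
    return handler(false);
}
```

Calling `dispatch(launch, 7)` takes the error path with a void handler, and `dispatch(measure, 7)` takes the same path with an int handler; both compile because every `return` statement has the same deduced type.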
Co-authored-by: Yi DING <yi.ding@amd.com>
2025-12-23 15:03:18 +08:00
Yi DING
2220cbaba7
[CK_TILE] MX Flatmm Use Byte Pointer Arithmetic for A Tensor ( #3446 )
...
* A as bytes
* Reformat with static_for_product
2025-12-19 10:28:13 +08:00
yadaish
c0ee71d735
Dev/a8w4 and a8w8splitk ( #3447 )
...
* Ck moe bs splitk pr (#3440 )
* splitk kick-off. Compilation fail
* splitk hack pass
* fix scale offset calc.
* clang-format for a8w8_moe_blk_gemm1 splitk change
* fix testcase error
---------
Co-authored-by: oscar <huaiguxu@amd.com>
Co-authored-by: huaiguxu <145733371+huaiguxu@users.noreply.github.com>
* Zan/moe a8w4 (#3441 )
* update
* update
* update ck moe a8w4
* update
* update
* update
* compile pass
* update
* update
* python3 op_tests/test_moe_2stage.py -t 16 -e 1 -k 1 -dim 256,256 ready
* support new a8w4 kernel
* update
* update ck_tile
* re format
* update
* update
* fix conflict
* fix build
* update ck_tile moe
* fix clang format
* fix the problem
* fix accuracy issue
* fix
---------
Co-authored-by: oscar <huaiguxu@amd.com>
Co-authored-by: huaiguxu <145733371+huaiguxu@users.noreply.github.com>
Co-authored-by: Zzz9990 <zanzhang@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-12-19 09:26:52 +08:00
Bartłomiej Kocot
700b2ec9c0
Update AMD buffer coherency ( #3403 )
...
* Update AMD buffer coherency [AICK-421]
* fixes
* fix
* fixes
* fixes
* Add backward compatibility
* fix
* fixes
* fix
* fix
* fix
* Update grouped_convolution_backward_weight_kernel.hpp
2025-12-18 10:16:22 +01:00
Yi DING
57e1e4a848
[CK_TILE] Add FP8xF4 Flatmm ( #3401 )
...
* Refactor policy
* fix a bank conflict
* Enable mixed mx flatmm
* Update
2025-12-17 10:01:48 +08:00
Zzz9990
1aa93ef551
[CK_TILE MOE] add NT & preshuffle permute to cktile MOE ( #3377 )
...
* update coherence
---------
Co-authored-by: Zzz9990 <Zzz9990>
2025-12-10 10:03:28 +08:00
Yi DING
878b4e7f46
[CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 ( #3287 )
...
* [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2
* typo
2025-12-08 19:20:44 +08:00
Yi DING
f211156ce6
[CK_Tile] Flatmm MX Cleanup & Explicit Offset Calculation ( #3286 )
2025-12-02 14:21:12 +08:00
Aviral Goel
de6466481f
chore(copyright): update copyright header for include directory ( #3293 )
2025-11-26 11:00:05 -07:00
Yi DING
c7dce2ac29
[CK_TILE] Fix Compilation of Flatmm Examples ( #3285 )
2025-11-26 10:11:43 +08:00
Thomas Ning
de6a9590ab
Reorganize of KPack in GEMM ( #3247 )
...
* add the reorganize of KPack
* fix the compilation error
* fix the compilation error
2025-11-24 12:38:59 -08:00
Yi DING
47e2ed838e
[CK_TILE] Add Flatmm MX FP8 ( #3208 )
...
* Use async for flatmm mxfp4
* Fix preshuffle
* Add flatmm mxfp8
* Thanks, Copilot
* Thanks Copilot again~
2025-11-20 10:35:15 +08:00
Yi DING
b6720531de
[CK_TILE] MX Flatmm Split kernel instances ( #3207 )
...
* [CK_TILE] MX Flatmm Split kernel instances
* Fix flatmm example compile
2025-11-18 13:46:30 +08:00
Yi DING
e135dd518d
[CK_TILE] Add mxfp4 flatmm ( #3080 )
...
* Squashed commit of the following:
commit 3e1a851dad834776efbe4fe365ac82c4ed312010
Author: Ding, Yi <yi.ding@amd.com >
Date: Thu Oct 23 06:10:54 2025 +0000
Fix & clean after rebase
commit 1edf485092f44411da9a1796a4a6b72d5cdb67c6
Author: Ding, Yi <yi.ding@amd.com >
Date: Wed Oct 22 10:46:13 2025 +0000
Squashed commit of the following:
commit 0b6b9dbd1b
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 02:04:27 2025 -0500
fix bandwidth calculation
commit 9aebf53bb7
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 00:58:59 2025 -0500
updates
commit 62607de56c
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Sep 19 00:39:46 2025 -0500
fix a bug, set the A DS_read preload size to 4 for MXFP4
commit 92ad6fcc0a
Author: mtgu0705 <mtgu@amd.com >
Date: Thu Sep 18 01:19:03 2025 -0500
fix a_wrap preload issue for large MPerBlock.
commit f2db44710f
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 21:34:03 2025 -0500
optimized the VGPR repack issue for MXFP4
commit 346a400027
Author: Gino Lu <gino.lu@amd.com >
Date: Wed Sep 17 04:19:44 2025 -0500
fix time error
commit 80c1743034
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 03:58:00 2025 -0500
updated, function passed.
commit ce26d9071e
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 22:21:39 2025 -0500
fix, function partially passed
commit 0a89ed13a5
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 03:01:12 2025 -0500
fix, reference function passed, next check kernel function
commit ec9bcef591
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Sep 16 02:29:01 2025 -0500
let pack/unpack return pk_fp4_t
commit a333206929
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 20:50:26 2025 -0500
fix
commit 3893c06540
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 15 05:51:06 2025 -0500
fix bug
commit 8052bea019
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 04:02:05 2025 -0500
fix core dump issue, function is not correct.
commit 9ceb3fd508
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 03:03:02 2025 -0500
updates, build pass
commit cc94eb6045
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 00:05:18 2025 -0500
updates
commit 22586c3135
Author: Gino Lu <gino.lu@amd.com >
Date: Sun Sep 14 23:40:28 2025 -0500
fix bug
commit e92e67b8dd
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 03:28:50 2025 -0500
fix interface
commit 8b1dd60c08
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 02:53:50 2025 -0500
add interface in warp_gemm_impl
commit c6135f6abe
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 10 05:03:08 2025 -0500
updates some fixes.
commit b0d71b8d19
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 9 04:37:42 2025 -0500
fix after merge ginolu/add_wgmfma_dispatcher
commit f119c30317
Merge: c5030e602 72c8ef856
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 22:09:15 2025 -0500
Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev
commit c5030e602e
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 21:42:47 2025 -0500
update mx flatmm tail pipeline
commit 72c8ef8567
Merge: 9661bb400 e4a772890
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:10:23 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 9661bb400b
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:09:55 2025 -0500
fix type error
commit 0509597f55
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 04:01:40 2025 -0500
update hotloop pipeline
commit 754ae0461b
Merge: 15d44406e 83f607e2a
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:22:26 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 15d44406e5
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:21:26 2025 -0500
fix clang format
commit 146963d62a
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 3 10:00:54 2025 -0500
some updates
commit 12526b626a
Merge: 47cee0471 00fd72b2d
Author: asleepzzz <hanwen.chang@amd.com >
Date: Wed Sep 3 13:22:03 2025 +0800
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 47cee04712
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 02:11:02 2025 -0500
fix vec size error
commit d2892925e5
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 01:23:39 2025 -0500
fix format error
commit 16993acd1d
Author: mtgu0705 <mtgu@amd.com >
Date: Sat Aug 30 03:19:07 2025 -0500
update codes
commit 9c37e55d13
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Aug 29 11:27:33 2025 -0500
init ck_tile mxfp4 flatmm
commit 5c484a5672
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 28 08:02:50 2025 +0000
Add bias for f16xf4 moe_flatmm
commit dd6539f366
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 27 13:39:47 2025 +0000
update case construction
commit 65b702454c
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 26 12:32:29 2025 +0000
support swiglu activation and use rcpf to accelerate silu
commit b422e41e08
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Aug 26 02:33:55 2025 -0500
first commit
commit d05eed931d
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 04:01:59 2025 -0500
add line to last
commit d69cab7f0c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 03:20:46 2025 -0500
adjust A_LDS descriptor to avoid bankconflict
commit 65989e940c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Thu Aug 21 09:46:52 2025 -0500
enable hotloop
commit c378e9bdf8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 09:12:21 2025 +0000
support atomic_pk_add_bf16 on gfx950
commit 85976b0b87
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 06:58:55 2025 +0000
use int64_t as expert stride to avoid overflow
commit 9fbcc8f8a4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 13:53:32 2025 +0000
use v4i32 as the storage type for B to avoid repack operation
commit 81899bd920
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 06:40:03 2025 +0000
add pk_fp4_t and e8m0_t support for amd_buffer_load_impl
commit c27eb0771a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 04:39:14 2025 +0000
optimize cvt_pkf4_to_f16 implementation
commit 3ca0bd500a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 19 14:56:46 2025 +0000
optimize A_LDS descriptor to avoid bankconflict
commit f7f0306eea
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 18:43:37 2025 +0000
fix gate-up when GU_NRepeat > 1
commit be55c0f9cb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 17:28:11 2025 +0000
add fp16xf4 moe
commit 599e1f5b32
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Aug 17 17:51:18 2025 +0000
rename example
commit 7899fb4a8d
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 15 06:20:46 2025 +0000
remove additional check when e8m0->float
commit 714b341797
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 14 09:34:12 2025 +0000
eliminate repeat dequant
commit 53e8c0c533
Merge: 5de620895 cc9c7b9e5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:51:49 2025 +0000
Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm
commit 5de6208952
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:16:48 2025 +0000
update f16xMXF4
commit 732ebdee8b
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 10:48:53 2025 +0000
update scale-preshuffle for MXF4
commit edb58d0680
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 11:24:34 2025 +0000
update
commit cc9c7b9e58
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 08:38:23 2025 +0000
optimize gemm2 atomic_add pattern
commit 200a11afc8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:59:47 2025 +0000
update scale for mxfp4
commit 87aed564dc
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:56:14 2025 +0000
update case construction
commit 8b85fa6cf2
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 06:03:06 2025 +0000
update granularity control
commit 1b8c7097b8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 03:42:46 2025 +0000
fix TileConfig
commit 8ba1c708dc
Author: Gino Lu <gino.lu@amd.com >
Date: Thu Aug 7 21:37:28 2025 +0800
Add e8m0 scaled convert into CK_TILE (#2617 )
* first commit
* remove redundant code
* modify according to comments.
* fix type_convert error with scaled_type_convert
commit f788d3d629
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 8 20:19:16 2025 +0000
add mixed_prec fp16xfp4
commit 3dea10a277
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 7 09:22:04 2025 +0000
debug mixed_prec flatmm
commit 0ba513b148
Merge: 90e910f3a c0cb4d036
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Aug 6 16:49:47 2025 +0800
Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk
fix split k
commit 6d3cbc7c0e
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 6 08:33:33 2025 +0000
add moe_flatmm
commit c0cb4d036d
Author: coderfeli <coderfeli@163.com >
Date: Wed Aug 6 02:45:31 2025 +0000
fix split k
commit 90e910f3a7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 4 07:16:36 2025 +0000
fix flatmm with scaling when WarpTileM == 32
commit aa5e008fa5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 11:01:23 2025 +0000
optimize scaling epilogue
commit ac5908c0bb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 07:28:38 2025 +0000
fix wrong config for fp8 scaling
commit 3f43b841d4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 06:20:30 2025 +0000
prune debug message
commit 2e5d4c74cd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 04:52:08 2025 +0000
fix compile error
commit c117a1986a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Jul 29 15:42:58 2025 +0000
Add persistent option on flatmm for tuning
commit a587701117
Author: AMD-dteng <dteng@amd.com >
Date: Tue Jul 29 22:48:00 2025 +0800
update pipeline v1: add atomic IGLP schedule
commit f9e48148d2
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 09:09:27 2025 +0000
fix error log throwing
commit 1b6d7cf407
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Jul 28 08:24:51 2025 +0000
crz idea
commit 5473f06461
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Jul 27 11:57:38 2025 +0000
Add permuteN optimization when NRepeat % 2 == 0 on flatmm
commit bfb9f4002f
Author: sjfeng <j514681085@icloud.com >
Date: Sun Jul 27 17:24:08 2025 +0800
try to remove c_shuffle_lds
commit 1264f4d2ab
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Jul 25 07:41:48 2025 +0000
fix loop-dim mismatch and improve c_shuffle alu parallelism
commit 1239d8a546
Merge: 406645448 b908f5e80
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 08:46:51 2025 +0000
merge flatmm -scale
commit 4066454483
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 16:19:58 2025 +0800
revert delete of inc file
commit 68390988c9
Author: solin <bingzhou@amd.com >
Date: Thu Jul 24 04:38:16 2025 +0000
reorg flatmm code
commit b908f5e803
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:12:31 2025 +0000
fix flatmm syntax error on gfx950
commit 5a1183ebbd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:04:22 2025 +0000
support flatmm scaling
commit 89fa639207
Author: valarLip <340077269@qq.com >
Date: Wed Jul 23 08:44:12 2025 +0000
merge flatmm pipe v0 from dteng_flatmm_opt
commit 3f7d848dd3
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:38:12 2025 +0800
build pass
commit 6dacf833da
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 07:20:26 2025 +0000
fix bug
commit 7e1bd4b839
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:01:53 2025 +0800
sync
commit 46a538e39e
Author: valarLip <340077269@qq.com >
Date: Tue Jul 22 08:09:35 2025 +0000
adaptive scheduler instead of Macro definition
commit 9aa3396a79
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 17 08:40:35 2025 +0000
fix tail handler bug
commit fb76450e63
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 16 10:12:19 2025 +0000
merge from dteng_flatmm_opt
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
* Fix crash on small M
* Apply suggestion from @Copilot
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-10-31 11:29:05 +08:00
lalala-sh
211d64e18a
[CK_TILE] Update flatmm related kernels ( #3022 )
...
---------
Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-10-22 22:36:11 +08:00
linqunAMD
df4ee556d6
[CK_TILE] Fix flatmm on gfx11 and gfx12 ( #2790 )
...
1. Correct shuffle_b and MakeBFlatDramTileDistribution according to WMMA warp layout
2. Add FlatmmConfig16_Wmma for gfx11 and gfx12
2025-09-10 08:28:00 +08:00
linqunAMD
9fcc1ee9fd
Support Wave32 in CK_TILE - Part 1 ( #2594 )
...
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Tianyuan Wu
68134b60e4
[CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 ( #2466 )
...
* WMMA GEMM F16 Implementation
Signed-off-by: root <tianyuwu@amd.com >
* Self-review
Signed-off-by: root <tianyuwu@amd.com >
* ASIC check minor tweak
Signed-off-by: root <tianyuwu@amd.com >
* add missing include file
* Set GPU_TARGETS to gfx11/12 generic
Signed-off-by: root <tianyuwu@amd.com >
* INT8 GFX12
Signed-off-by: root <tianyuwu@amd.com >
* add int8x16 branch
* Fix CI script
Signed-off-by: root <tianyuwu@amd.com >
* Fix typo
Signed-off-by: root <tianyuwu@amd.com >
* Add CK_Tile WMMA example
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* Fix CI
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* fix clang format
* Set M/N_Warp Back to Constant
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* Use GemmConfigComputeV3 by default
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Remove CK_Tile wmma gemm examples from the CI list
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Add atomic add fallback method for gfx11
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix typo
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Omit copyright year
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Support non-square cases
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Add get_device_ip()
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Revert "Add atomic add fallback method for gfx11"
This reverts commit 07a79e797d.
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"
This reverts commit ceee918007.
* Revise method name and typos
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Try fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Revert "Try fix CI"
This reverts commit 7a7241085e.
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix typo caused by merge
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* Fix typo caused by merging
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
---------
Signed-off-by: root <tianyuwu@amd.com>
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-08-15 16:22:27 -07:00
Aviral Goel
1441a0a7ee
Integration of a new pipeline for weight preshuffle into gemm examples ( #2516 )
...
* something khushbu can help with
* v1 v2 works with flatmm develop
* v0 v1 v2 numerical error gone
* Fixing numerical error, and interchange preshuffle configs to match with flatmm
* Refactor GEMM pipeline configurations and integrate preshuffle support
- Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
- Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
- Removed obsolete code and comments
* clang format
* fix vectorloadsize bug
* add the Preshuffle3
* update kwarp calculation in gemm utils
* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch
* fix: add CK_GFX950_SUPPORT macro for gfx950 detection
* default disable rotating buffer
* docs(CHANGELOG): update changelog for rocm 7.0
* Revert "docs(CHANGELOG): update changelog for rocm 7.0"
This reverts commit 2bc16fff84.
* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.
* revert example/ck_tile/flatmm to its original state
* remove comment added by second author
* switch to xor ALDSDescriptor
* modify the MakeALdsDescriptor()
* temporary profiling script
* getting rid of line marker compiler error
* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy
* add a minor fix for the config
* typo fix
* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2
* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp
* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp
* reenable the GemmSpatiallyLocalTilePartitioner
* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline
* remove hardcoded true for preshuffle bool template argument
* rename script
* remove gemm_profilie.sh script
* merge conflict resolve
* clang formatted
* typo fix
* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp
* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy
* Fix missing newline at end of file in run_gemm_example.inc
* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1
* addressing review comments
* removing debug code
* addressing review comments
* Revert "addressing review comments"
This reverts commit 29c45192ba.
* updating tile_engine code
* addressing review comments
---------
Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-01 00:04:54 -07:00
Illia Silin
504b101da3
upgrade from clang-format-12 to clang-format-18 ( #2568 )
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
Cong Ma
e62710e461
ck_tile kernel for gemm with groupwise quantized A tensor ( #2473 )
...
* ck_tile kernel for gemm with groupwise quantized A or B tensor.
This change introduces new pipelines with the Intrawave scheduler and block gemm primitives that load the scale tensor into registers to perform dequantization post MFMA on the C tensor in registers.
Scale tensor data, AQ/BQ is spliced across threads in registers and not stored in LDS.
Current support is for the following combinations, but it should be fairly straightforward to extend support to more formats.
1. fp8, fp8 -> f32
2. bf8, bf8 -> f32
3. i4, fp8 -> f32
4. i4, bf8 -> f32
Group size can go down to as low as K length of underlying WarpGemm primitive.
For Gemm problems with quantized B tensor, this change also introduces preliminary support for flatmm pipeline which loads B tensor directly into registers.
* [Block Scale Gemm] Only run gemm quant examples on __gfx94__
- Only run gemm quant examples on __gfx94__ for usage of
`v_cvt_pk_fp8_f32`
- Format the code
* [Block Scale Gemm] Remove Bquant Gemm BlockScale
This cleanup is in preparation for future development of bquant. By
isolating Aquant-related code, we can streamline the codebase and make
it easier to add and maintain bquant functionality in subsequent
updates.
* [Block Scale Gemm] Format code with clang-format-12
The latest clang-format (v19) in ROCm 7.0 generates different results than
clang-format-12, which is used in CK CI.
Format code with clang-format-12 for consistency.
* [Block Scale Gemm] Split the k direction loop
- Split the k direction loop in block_universal_gemm_as_quant_bs_cr.hpp
to make the logic clearer.
- Disable C transposition.
* [Block Scale Gemm] Move block scale gemm example to 38_block_scale_gemm
* [Block Scale Gemm] Update copyright
* test
* Add TailHandler
* Move TileDistributionEncodingPatternAQ
* Refactor
* refactor
* fix bug
* fix bug
* help solve the PR comment
* Format the code
* [Block Scale Gemm] Add unit tests
* [Block Scale Gemm] Add support to 16x16x32 MFMA
- Add support to 16x16x32 MFMA
- Fix a bug when exchanging data across lanes
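The first item of this commit message describes applying the A-tensor scales after the matrix multiply-accumulate, one scale per group along K. The host-side reference below is only a sketch of that arithmetic under stated assumptions: the function name, the row-major layouts, and the use of `float` for the quantized values are all hypothetical and are not taken from the CK_TILE sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM with a groupwise-quantized A tensor (hypothetical helper).
// a_q:     M x K, row-major, quantized values stored as float for simplicity
// a_scale: M x (K / group), row-major, one scale per row and K-group
// b:       K x N, row-major
// The partial product of each K-group is accumulated first and only then
// multiplied by its scale, mirroring "dequantization post MFMA".
std::vector<float> gemm_aquant_reference(const std::vector<float>& a_q,
                                         const std::vector<float>& a_scale,
                                         const std::vector<float>& b,
                                         std::size_t M,
                                         std::size_t N,
                                         std::size_t K,
                                         std::size_t group)
{
    assert(group > 0 && K % group == 0);
    const std::size_t groups = K / group;
    std::vector<float> c(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for(std::size_t g = 0; g < groups; ++g)
            {
                float partial = 0.0f; // unscaled accumulation within one group
                for(std::size_t k = g * group; k < (g + 1) * group; ++k)
                    partial += a_q[m * K + k] * b[k * N + n];
                acc += a_scale[m * groups + g] * partial; // scale once per group
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

With a 1x4 A of `{1, 2, 3, 4}`, group size 2, scales `{0.5, 2}`, and B all ones, the two group sums are 3 and 7, so the single output is 0.5 * 3 + 2 * 7 = 15.5.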
---------
Co-authored-by: Vijay Krishnamoorthy <vjkrish@meta.com>
Co-authored-by: Cong MA <congma13@ctr2-alola-ctrl-01.amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-07-23 00:10:16 -07:00
Khushbu Agarwal
d239b91fd5
Merge flatmm Operator with universal gemm ( #2434 )
...
* Initial commit
* Adding new tile partitioner to flatmm
* intermediate changes
* debugging kernels
* Updating flatmm example to universal gemm example
* updated flatmm kernel to run via gemmKernel
* update universal gemm to incorporate flatmm
* debug
* Fix flatmm call
* Fixing other kernels and tests for API changes
* clang formatted
* fixing gemm tests
* added test for flatmm and simplify kernel arguments
* adding flatmm test
* fix test for flatmm
* simplify gemm kernel with flatmm
* remove flatmm related files
* addressing review comments and code clean up
* resolving empty file
* resolving empty file
* clang formatted
* addressing review comments
* enable persistent kernel for flatmm
* reverted the removed files for flatmm
* reverted the removed files for flatmm
* changed flatmm to weightPreshuffle; removed the _1 added in the flatmm example
* some more renames
* clang formatted
2025-07-11 08:27:55 -07:00
linqunAMD
37e1a27537
[CK_TILE] Refine fp8 support in flatmm ( #2239 )
...
* [CK_TILE] Refine fp8 in flatmm
1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp definition to a separate struct
* fix clang format
* fix clang format
* keep default behavior unchanged (warp tile = 16x16)
* fix tile engine build error
* fix a typo in codegen_utils.py
* address review comments
* address review comments
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-06-25 01:07:45 -07:00
Khushbu Agarwal
bd270fe4bc
fix flatmm kernel for bigger size for fp16 datatype ( #2302 )
2025-06-10 11:13:40 -07:00
Andriy Roshchenko
00247e3c29
Optimized GEMMs for MX FP4/8 ( #2294 )
...
Adds V3 GEMM pipeline for MX FP4 and MX FP8
Adds V3 GEMM pipeline for MX FP4 with preshuffling
Adds MXFP4 GEMM tests (#2275 )
Adds MXFP4 GEMM examples
Adds MXFP4 GEMMs to ckProfiler
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>
Co-authored-by: aska-0096 <haocwang@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: OscarXu <huaiguxu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: feifei14119 <feiw@amd.com>
Co-authored-by: Lin, Qun <qlin@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-06-05 13:54:15 -06:00
BingYuan.Zhou
41c17d0a95
fix moe sorting build fail ( #2190 )
...
* fix moe sorting build fail
* refile code
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-14 09:31:26 +08:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge ( #2168 )
...
* sync with function interface of cshuffleepilogue, fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
Illia Silin
9a9f59ae69
Revert "Add ck tile examples to package ( #1880 )" ( #2150 )
2025-04-30 10:20:16 -07:00
jakpiase
434d19f696
Add ck tile examples to package ( #1880 )
...
* add ck tile examples to package
* Update jenkinsfile
* fix for jenkinsfile
* fix for building ck tile code on non gfx9
* compile ck tile examples only for gfx94
* include ck tile examples in all target
* fix for basic gemm UseStructuredSparsity
* Update CMakeLists.txt
* Update gemm_pipeline_problem.hpp
* add targets to rocm install
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-04-28 09:53:19 -07:00
solin
c318ec0778
fix CI build fail
2025-04-21 16:00:12 +08:00
BingYuan.Zhou
eaf1f0bf3b
[flatmm] implement basic fp16 flatmm ( #2089 )
...
* [flatmm] implement basic fp16 flatmm
* fix CI build fail
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00