Tianxing Wu
0a2b6c4bcd
[rocm-libraries] ROCm/rocm-libraries#4297 (commit 5ff580c)
...
moe flatmm xcd remap
co-authors: @Chi-Chu319 @juuso-oskari
Added XCD remapping for flatmm moe
batch | Mixtral (TFLOPS, wip_355) | Mixtral-7B (TFLOPS, our branch) | perf boost
-- | -- | -- | --
64 | 865.424 | 995.455 | 15.0%
256 | 886.336 | 1020.96 | 15.2%
1024 | 890.808 | 1022.53 | 14.8%
2026-02-18 19:33:24 +00:00
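Note on the XCD remapping commit above: on multi-die GPUs, consecutive workgroup IDs are commonly dispatched round-robin across the XCDs, so neighbouring tiles can land on different dies with separate caches. A remap renumbers workgroups so that each XCD works on a contiguous range of tiles. The sketch below shows only the general idea. The function name `remap_xcd`, the default of 8 XCDs, and the round-robin dispatch order are assumptions for illustration and are not taken from the PR.

```python
def remap_xcd(pid: int, grid: int, num_xcds: int = 8) -> int:
    """Map a hardware workgroup id to a logical tile id.

    Assumes workgroup `pid` runs on die `pid % num_xcds` (round-robin dispatch).
    After the remap, each die owns one contiguous range of tile ids.
    """
    xcd = pid % num_xcds      # die that receives this workgroup (assumed)
    local = pid // num_xcds   # position of the workgroup within its die
    # When grid is not divisible by num_xcds, the first `rem` dies get one extra tile.
    base, rem = divmod(grid, num_xcds)
    start = xcd * base + min(xcd, rem)
    return start + local


if __name__ == "__main__":
    # With 16 workgroups on 8 dies, die 0 runs pids 0 and 8 and now owns tiles 0 and 1.
    print([remap_xcd(p, 16) for p in (0, 8)])
```

The mapping is a permutation of `range(grid)`, so every tile is still processed exactly once.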
Thomas Ning
5cb8109535
[rocm-libraries] ROCm/rocm-libraries#4640 (commit 37b8c81)
...
Fix the Composable Kernel CI and versions incompatibility (#4640)
## Motivation
This PR contains 4 patches:
1. Fix the grouped GEMM CI error.
2. Fix the incompatibility with older Linux versions.
3. Fix potential errors in flatmm.
4. Address the earlier review comments on the abquant eight-warps pipeline solution.
2026-02-18 15:00:26 +00:00
Max Podkorytov
e339101e9c
[CK-Tile] move out memory operation from cshuffle epilogue class (#3359)
...
* initial poc
* factor out common parts in operator()
* cv4
* rest of the universal gemm pipelines
* fix test
* remove boilerplate from tile engine
* fix example
* fix example
* format
* fix tests build for gemm
* remove base pipeline codegen from gemm instance builder
* unify v3 logic with the rest of universal gemm pipelines
* fix build for multi abd test
* fix test gemm multi d
* fix build for weight preshuffle
* fix grouped gemm test
* fix grouped gemm multi d test
* fix grouped gemm preshuffle
* fix grouped gemm example except for quant
* fix gemm preshuffle
* fix splitk 2 stage example
* fix batched gemm example
* fix multid example
* fix multiabd example
* fix batched gemm test
* fixup
* fix examples build
* fix grouped gemm test build
* fix smoke builder
* hacky poc
* fix tile engine
* kill the lambda
* maybe fix test build
* more fixes
* clang-format
* save temp
* clang-format
* mostly fix examples
* clang-format
* remove dead code
* more cleanup
* fix fmha bwd build (default epilogue set/add appears to be broken)
* fix default epilogue tests but not correctness
* clang-format
* fix bquant
* clang-format
* cleanup dead code
* rearrange make windows for readability
* restore changes to IsSupportedArgument
* fix smoke-builder
* clang-format
* fixup rename class
* build fixes
* clang-format
* fix builder
* fixup
* remove set from builder tests
* fix test
* clang-format
* re-refactor the kernels
* clang-format
* fix header license
* remove memory operation from conv bwd test
* clang-format
* clang-format example,include
* clang-format test
* build fixes
* clang-format
* solve compilation error
* fix the CI
* solve compilation error
* clang format
* solve merge conflict
* solve merge conflict
* solve the gfx11 error
* solve test error
* moar build fixes
* remove AtomicAddRequiresKBatchGreaterThanOne test since the property is removed from the kernel scope
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2026-01-04 03:28:14 -08:00
yadaish
dae85ead64
[CK_TILE] support split-k a16w4 gemm1 (#3389)
...
* initial version to support moe gemm1 split-k
* add missing args
* fix build warning
* update reference
* for split-k disable bias and weight
* remove debug log
* fix format
* fix div by zero errors
* fix cmake config
* update
* resolve conflicts
* remove useless changes
* reformat
* fix
* remove useless changes
* fix ci
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: root <root@smci355-ccs-aus-m01-25.cs-aus.dcgpu>
2025-12-29 23:05:35 +08:00
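Note on "for split-k disable bias and weight" in the commit above: with split-K, each split produces only a partial sum over its slice of the K dimension, and the partial sums are reduced afterwards. One plausible reason to disable bias inside the split kernel is that a bias added per split would be counted `k_batch` times. The sketch below illustrates that reasoning in plain Python. The function name `matvec_split_k` and the choice to apply the bias after the reduction are illustrative assumptions, not the kernel's actual design.

```python
def matvec_split_k(a, w, bias, k_batch):
    """Compute a @ w + bias for a length-K vector `a` and a K x N matrix `w`,
    with the K dimension divided into `k_batch` chunks (k_batch >= 1)."""
    k, n = len(a), len(w[0])
    chunk = (k + k_batch - 1) // k_batch   # ceiling division
    out = [0.0] * n                        # zero-initialised accumulation target
    for s in range(k_batch):
        lo, hi = s * chunk, min((s + 1) * chunk, k)
        for j in range(n):
            # Partial sum contributed by split `s`; an empty range adds nothing.
            out[j] += sum(a[i] * w[i][j] for i in range(lo, hi))
    # The bias is applied once, after all partial sums have been reduced.
    return [o + b for o, b in zip(out, bias)]
```

The result is the same for every `k_batch`, which would not hold if each split added the bias itself.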
yadaish
c0ee71d735
Dev/a8w4 and a8w8splitk (#3447)
...
* Ck moe bs splitk pr (#3440)
* splitk kick-off. Compilation fail
* splitk hack pass
* fix scale offset calc.
* clang-format for a8w8_moe_blk_gemm1 splitk change
* fix testcase error
---------
Co-authored-by: oscar <huaiguxu@amd.com>
Co-authored-by: huaiguxu <145733371+huaiguxu@users.noreply.github.com>
* Zan/moe a8w4 (#3441)
* update
* update
* update ck moe a8w4
* update
* update
* update
* compile pass
* update
* update
* python3 op_tests/test_moe_2stage.py -t 16 -e 1 -k 1 -dim 256,256 ready
* support new a8w4 kernel
* update
* update ck_tile
* re format
* update
* update
* fix conflict
* fix build
* update ck_tile moe
* fix clang format
* fix the problem
* fix accuracy issue
* fix
---------
Co-authored-by: oscar <huaiguxu@amd.com>
Co-authored-by: huaiguxu <145733371+huaiguxu@users.noreply.github.com>
Co-authored-by: Zzz9990 <zanzhang@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-12-19 09:26:52 +08:00
Yi DING
57e1e4a848
[CK_TILE] Add FP8xF4 Flatmm (#3401)
...
* Refactor policy
* fix a bank conflict
* Enable mixed mx flatmm
* Update
2025-12-17 10:01:48 +08:00
Zzz9990
1aa93ef551
[CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
...
* update coherence
---------
Co-authored-by: Zzz9990 <Zzz9990>
2025-12-10 10:03:28 +08:00
lalala-sh
6f0966e1e9
fix a16w4 moe bugs (#3373)
...
* fix valid mask bug
* update format
2025-12-09 17:54:55 +08:00
Yi DING
878b4e7f46
[CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287)
...
* [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2
* typo
2025-12-08 19:20:44 +08:00
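Note on the "division by 2" in the commit above: FP4 values are stored two per byte, so turning an element index or count into a byte offset involves halving. The sketch below only illustrates where that factor of two comes from and how a packed stride can be computed once up front instead of inside a loop. It is not a description of what the commit changed. The names `pack_fp4`, `load_code`, `ROW_STRIDE_BYTES`, and the low-nibble-first ordering are assumptions for illustration.

```python
PACK = 2  # two 4-bit codes per byte (assumed packing)


def pack_fp4(codes):
    """Pack 4-bit codes (0..15) pairwise into bytes, low nibble first (assumed order)."""
    assert len(codes) % PACK == 0
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
        for i in range(0, len(codes), PACK)
    )


def load_code(packed, i):
    """Fetch element i: it lives in byte i // 2, nibble i % 2."""
    byte = packed[i >> 1]
    return (byte >> 4) if (i & 1) else (byte & 0xF)


# Computing the packed row stride once keeps the halving out of inner loops.
K = 64                         # elements per row (example value)
ROW_STRIDE_BYTES = K // PACK   # bytes per row of packed FP4 data
```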
msaffari-amd
f875ab0bbc
Add validity checks for MoE FlatMM scatter and enable bf16 hardware atomic-add (#3236)
...
* Add validity checks for MoE FlatMM scatter and enable bf16 hardware atomic
* correct clang-format
* removed unused rtol_atol variable from example code
* clang format correction
* remove unused variable max_accumulated_value from example
2025-11-28 09:43:01 +01:00
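Note on the scatter validity checks in the commit above: in an MoE second GEMM, each computed row is typically scaled by its routing weight and accumulated into the output row of the token it belongs to, and padded slots must not write anywhere. The sketch below shows that pattern in plain Python. The function name `scatter_add_rows` and the convention that an out-of-range token id marks a padded slot are illustrative assumptions. On a GPU the accumulation would be an atomic add.

```python
def scatter_add_rows(out, rows, token_ids, weights):
    """Accumulate weights[i] * rows[i] into out[token_ids[i]], skipping invalid ids."""
    num_tokens = len(out)
    for row, tok, w in zip(rows, token_ids, weights):
        if not (0 <= tok < num_tokens):
            continue  # padded or out-of-range slot: nothing to write
        for j, v in enumerate(row):
            out[tok][j] += w * v  # stands in for an atomic add on the GPU
    return out
```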
Aviral Goel
de6466481f
chore(copyright): update copyright header for include directory (#3293)
2025-11-26 11:00:05 -07:00
Yi DING
47e2ed838e
[CK_TILE] Add Flatmm MX FP8 (#3208)
...
* Use async for flatmm mxfp4
* Fix preshuffle
* Add flatmm mxfp8
* Thanks, Copilot
* Thanks Copilot again~
2025-11-20 10:35:15 +08:00
Max Podkorytov
a3a4eb12bd
[CK-Tile] Remove usage of tile partitioner's full gemm shape (#3204)
...
gemm shape should be used from the pipeline instead (where it gets from a problem description struct)
2025-11-18 09:56:40 -08:00
BingYuan.Zhou
4d629cd2b0
fix build error (#3195)
...
Co-authored-by: root <root@hjbog-srdc-39.amd.com>
2025-11-14 09:46:13 +08:00
Yi DING
e135dd518d
[CK_TILE] Add mxfp4 flatmm (#3080)
...
* Squashed commit of the following:
commit 3e1a851dad834776efbe4fe365ac82c4ed312010
Author: Ding, Yi <yi.ding@amd.com >
Date: Thu Oct 23 06:10:54 2025 +0000
Fix & clean after rebase
commit 1edf485092f44411da9a1796a4a6b72d5cdb67c6
Author: Ding, Yi <yi.ding@amd.com >
Date: Wed Oct 22 10:46:13 2025 +0000
Squashed commit of the following:
commit 0b6b9dbd1b
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 02:04:27 2025 -0500
fix bandwidth calculation
commit 9aebf53bb7
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 00:58:59 2025 -0500
updates
commit 62607de56c
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Sep 19 00:39:46 2025 -0500
fix a bug, set the A DS_read preload size to 4 for MXFP4
commit 92ad6fcc0a
Author: mtgu0705 <mtgu@amd.com >
Date: Thu Sep 18 01:19:03 2025 -0500
fix a_wrap preload issue for large MPerBlock.
commit f2db44710f
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 21:34:03 2025 -0500
optimized the VGPR repack issue for MXFP4
commit 346a400027
Author: Gino Lu <gino.lu@amd.com >
Date: Wed Sep 17 04:19:44 2025 -0500
fix time error
commit 80c1743034
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 03:58:00 2025 -0500
updated, function passed.
commit ce26d9071e
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 22:21:39 2025 -0500
fix, function partially passed
commit 0a89ed13a5
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 03:01:12 2025 -0500
fix, reference function passed, next check kernel function
commit ec9bcef591
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Sep 16 02:29:01 2025 -0500
let pack/unpack return pk_fp4_t
commit a333206929
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 20:50:26 2025 -0500
fix
commit 3893c06540
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 15 05:51:06 2025 -0500
fix bug
commit 8052bea019
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 04:02:05 2025 -0500
fix core dump issue, function is not correct.
commit 9ceb3fd508
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 03:03:02 2025 -0500
updates, build pass
commit cc94eb6045
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 00:05:18 2025 -0500
updates
commit 22586c3135
Author: Gino Lu <gino.lu@amd.com >
Date: Sun Sep 14 23:40:28 2025 -0500
fix bug
commit e92e67b8dd
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 03:28:50 2025 -0500
fix interface
commit 8b1dd60c08
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 02:53:50 2025 -0500
add interface in warp_gemm_impl
commit c6135f6abe
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 10 05:03:08 2025 -0500
updates some fixes.
commit b0d71b8d19
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 9 04:37:42 2025 -0500
fix after merge ginolu/add_wgmfma_dispatcher
commit f119c30317
Merge: c5030e602 72c8ef856
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 22:09:15 2025 -0500
Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev
commit c5030e602e
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 21:42:47 2025 -0500
update mx flatmm tail pipeline
commit 72c8ef8567
Merge: 9661bb400 e4a772890
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:10:23 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 9661bb400b
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:09:55 2025 -0500
fix type error
commit 0509597f55
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 04:01:40 2025 -0500
update hotloop pipeline
commit 754ae0461b
Merge: 15d44406e 83f607e2a
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:22:26 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 15d44406e5
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:21:26 2025 -0500
fix clang format
commit 146963d62a
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 3 10:00:54 2025 -0500
some updates
commit 12526b626a
Merge: 47cee0471 00fd72b2d
Author: asleepzzz <hanwen.chang@amd.com >
Date: Wed Sep 3 13:22:03 2025 +0800
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 47cee04712
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 02:11:02 2025 -0500
fix vec size error
commit d2892925e5
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 01:23:39 2025 -0500
fix format error
commit 16993acd1d
Author: mtgu0705 <mtgu@amd.com >
Date: Sat Aug 30 03:19:07 2025 -0500
update codes
commit 9c37e55d13
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Aug 29 11:27:33 2025 -0500
init ck_tile mxfp4 flatmm
commit 5c484a5672
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 28 08:02:50 2025 +0000
Add bias for f16xf4 moe_flatmm
commit dd6539f366
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 27 13:39:47 2025 +0000
update case construction
commit 65b702454c
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 26 12:32:29 2025 +0000
support swiglu activation and use rcpf to accelerate silu
commit b422e41e08
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Aug 26 02:33:55 2025 -0500
first commit
commit d05eed931d
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 04:01:59 2025 -0500
add line to last
commit d69cab7f0c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 03:20:46 2025 -0500
adjust A_LDS descriptor to avoid bankconflict
commit 65989e940c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Thu Aug 21 09:46:52 2025 -0500
enable hotloop
commit c378e9bdf8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 09:12:21 2025 +0000
support atomic_pk_add_bf16 on gfx950
commit 85976b0b87
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 06:58:55 2025 +0000
use int64_t as expert stride to avoid overflow
commit 9fbcc8f8a4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 13:53:32 2025 +0000
use v4i32 as the storage type for B to avoid repack operation
commit 81899bd920
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 06:40:03 2025 +0000
add pk_fp4_t and e8m0_t support for amd_buffer_load_impl
commit c27eb0771a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 04:39:14 2025 +0000
optimize cvt_pkf4_to_f16 implementation
commit 3ca0bd500a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 19 14:56:46 2025 +0000
optimize A_LDS descriptor to avoid bankconflict
commit f7f0306eea
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 18:43:37 2025 +0000
fix gate-up when GU_NRepeat > 1
commit be55c0f9cb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 17:28:11 2025 +0000
add fp16xf4 moe
commit 599e1f5b32
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Aug 17 17:51:18 2025 +0000
rename example
commit 7899fb4a8d
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 15 06:20:46 2025 +0000
remove additional check when e8m0->float
commit 714b341797
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 14 09:34:12 2025 +0000
eliminate repeat dequant
commit 53e8c0c533
Merge: 5de620895 cc9c7b9e5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:51:49 2025 +0000
Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm
commit 5de6208952
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:16:48 2025 +0000
update f16xMXF4
commit 732ebdee8b
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 10:48:53 2025 +0000
update scale-preshuffle for MXF4
commit edb58d0680
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 11:24:34 2025 +0000
update
commit cc9c7b9e58
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 08:38:23 2025 +0000
optimize gemm2 atomic_add pattern
commit 200a11afc8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:59:47 2025 +0000
update scale for mxfp4
commit 87aed564dc
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:56:14 2025 +0000
update case construction
commit 8b85fa6cf2
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 06:03:06 2025 +0000
update granularity control
commit 1b8c7097b8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 03:42:46 2025 +0000
fix TileConfig
commit 8ba1c708dc
Author: Gino Lu <gino.lu@amd.com >
Date: Thu Aug 7 21:37:28 2025 +0800
Add e8m0 scaled convert into CK_TILE (#2617)
* first commit
* remove redundant code
* modify according to comments.
* fix type_convert error with scaled_type_convert
commit f788d3d629
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 8 20:19:16 2025 +0000
add mixed_prec fp16xfp4
commit 3dea10a277
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 7 09:22:04 2025 +0000
debug mixed_prec flatmm
commit 0ba513b148
Merge: 90e910f3a c0cb4d036
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Aug 6 16:49:47 2025 +0800
Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk
fix split k
commit 6d3cbc7c0e
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 6 08:33:33 2025 +0000
add moe_flatmm
commit c0cb4d036d
Author: coderfeli <coderfeli@163.com >
Date: Wed Aug 6 02:45:31 2025 +0000
fix split k
commit 90e910f3a7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 4 07:16:36 2025 +0000
fix flatmm with scaling when WarpTileM == 32
commit aa5e008fa5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 11:01:23 2025 +0000
optimize scaling epilogue
commit ac5908c0bb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 07:28:38 2025 +0000
fix wrong config for fp8 scaling
commit 3f43b841d4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 06:20:30 2025 +0000
prune debug message
commit 2e5d4c74cd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 04:52:08 2025 +0000
fix compile error
commit c117a1986a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Jul 29 15:42:58 2025 +0000
Add persistent option on flatmm for tuning
commit a587701117
Author: AMD-dteng <dteng@amd.com >
Date: Tue Jul 29 22:48:00 2025 +0800
update pipeline v1: add atomic IGLP schedule
commit f9e48148d2
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 09:09:27 2025 +0000
fix error log throwing
commit 1b6d7cf407
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Jul 28 08:24:51 2025 +0000
crz idea
commit 5473f06461
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Jul 27 11:57:38 2025 +0000
Add permuteN optimization when NRepeat % 2 == 0 on flatmm
commit bfb9f4002f
Author: sjfeng <j514681085@icloud.com >
Date: Sun Jul 27 17:24:08 2025 +0800
try to remove c_shuffle_lds
commit 1264f4d2ab
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Jul 25 07:41:48 2025 +0000
fix loop-dim mismatch and improve c_shuffle alu parallelism
commit 1239d8a546
Merge: 406645448 b908f5e80
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 08:46:51 2025 +0000
merge flatmm -scale
commit 4066454483
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 16:19:58 2025 +0800
revert delete of inc file
commit 68390988c9
Author: solin <bingzhou@amd.com >
Date: Thu Jul 24 04:38:16 2025 +0000
reorg flatmm code
commit b908f5e803
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:12:31 2025 +0000
fix flatmm syntax error on gfx950
commit 5a1183ebbd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:04:22 2025 +0000
support flatmm scaling
commit 89fa639207
Author: valarLip <340077269@qq.com >
Date: Wed Jul 23 08:44:12 2025 +0000
merge flatmm pipe v0 from dteng_flatmm_opt
commit 3f7d848dd3
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:38:12 2025 +0800
build pass
commit 6dacf833da
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 07:20:26 2025 +0000
fix bug
commit 7e1bd4b839
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:01:53 2025 +0800
sync
commit 46a538e39e
Author: valarLip <340077269@qq.com >
Date: Tue Jul 22 08:09:35 2025 +0000
adaptive scheduler instead of Macro definition
commit 9aa3396a79
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 17 08:40:35 2025 +0000
fix tail handler bug
commit fb76450e63
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 16 10:12:19 2025 +0000
merge from dteng_flatmm_opt
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
* Fix crash on small M
* Apply suggestion from @Copilot
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-10-31 11:29:05 +08:00
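Note on the MXFP4 terms in the commit above (`pk_fp4_t`, `e8m0_t`): in the OCP microscaling formats, a block of FP4 (E2M1) elements shares one E8M0 scale, which is a pure power of two. The sketch below dequantizes such a block in plain Python to show how the two types combine. The helper names are illustrative and are not CK_TILE APIs. The block length is left to the caller rather than fixed here.

```python
import math

# Magnitudes representable by E2M1 for codes 0..7; bit 3 of the code is the sign.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def e8m0_to_float(e):
    """E8M0 is an exponent-only scale: 2**(e - 127), with 0xFF reserved for NaN."""
    return math.nan if e == 0xFF else 2.0 ** (e - 127)


def fp4_to_float(code):
    """Decode one 4-bit E2M1 code into a Python float."""
    mag = E2M1_VALUES[code & 0x7]
    return -mag if code & 0x8 else mag


def dequant_block(codes, scale_e8m0):
    """Dequantize FP4 codes that share a single E8M0 scale."""
    s = e8m0_to_float(scale_e8m0)
    return [fp4_to_float(c) * s for c in codes]
```

Because the scale is a power of two, applying it changes only the exponent of each element, which is why a scaled convert can be cheap in hardware.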
lalala-sh
211d64e18a
[CK_TILE] Update flatmm related kernels (#3022)
...
---------
Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-10-22 22:36:11 +08:00
Khushbu Agarwal
b56e5d1d79
Fix for Add the API to load SGPR (#2913)
...
* Revert "Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904)"
This reverts commit f161b5b738.
* Fix: sgpr minor issue
* cyclic dependency resolved
* clang formatted
* removing unused variable
* clang formatted
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-09-25 10:32:42 -07:00
asleepzzz
f161b5b738
Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904)
...
This reverts commit 2cbbf5dcb3.
2025-09-23 14:33:51 -07:00
Thomas Ning
2cbbf5dcb3
[CK-Tile] Add the API to load SGPR (#2878)
...
* Have a workable version for SGPR
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* substitute with the new sgpr read api
* update the CHANGELOG
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* change to static for logic
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
2025-09-23 01:23:56 -07:00
linqunAMD
df4ee556d6
[CK_TILE] Fix flatmm on gfx11 and gfx12 (#2790)
...
1. Correct shuffle_b and MakeBFlatDramTileDistribution according to WMMA warp layout
2. Add FlatmmConfig16_Wmma for gfx11 and gfx12
2025-09-10 08:28:00 +08:00
linqunAMD
9fcc1ee9fd
Support Wave32 in CK_TILE - Part 1 (#2594)
...
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Aviral Goel
1441a0a7ee
Integration of a new pipeline for weight preshuffle into gemm examples (#2516)
...
* something khushbu can help with
* v1 v2 works with flatmm develop
* v0 v1 v2 numerical error gone
* Fixing numerical error, and interchange preshuffle configs to match with flatmm
* Refactor GEMM pipeline configurations and integrate preshuffle support
- Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
- Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
- Removed obsolete code and comments
* clang format
* fix vectorloadsize bug
* add the Preshuffle3
* update kwarp calculation in gemm utils
* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch
* fix: add CK_GFX950_SUPPORT macro for gfx950 detection
* default disable rotating buffer
* docs(CHANGELOG): update changelog for rocm 7.0
* Revert "docs(CHANGELOG): update changelog for rocm 7.0"
This reverts commit 2bc16fff84.
* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.
* revert example/ck_tile/flatmm to its original state
* remove comment added by second author
* switch to xor ALDSDescriptor
* modify the MakeALdsDescriptor()
* temporary profiling script
* getting rid of line marker compiler error
* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy
* add a minor fix for the config
* typo fix
* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2
* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp
* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp
* reenable the GemmSpatiallyLocalTilePartitioner
* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline
* remove hardcoded true for preshuffle bool template argument
* rename script
* remove gemm_profilie.sh script
* merge conflict resolve
* clang formatted
* typo fix
* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp
* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy
* Fix missing newline at end of file in run_gemm_example.inc
* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1
* addressing review comments
* removing debug code
* addressing review comments
* Revert "addressing review comments"
This reverts commit 29c45192ba.
* updating tile_engine code
* addressing review comments
---------
Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-01 00:04:54 -07:00
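Note on "switch to xor ALDSDescriptor" in the commit above: an XOR-based shared-memory layout is a common way to reduce bank conflicts. The physical column of an element is the logical column XORed with a function of the row, so threads that read the same logical column from different rows hit different banks. The sketch below is a generic illustration of that technique and does not reproduce the descriptor used in CK_TILE. The function names and the one-column-per-bank model are assumptions.

```python
def xor_swizzle(row, col, num_cols):
    """Physical column for logical (row, col); num_cols must be a power of two."""
    return col ^ (row % num_cols)


def banks_touched(col, num_rows, num_cols, swizzled):
    """Column slots (stand-ins for banks) hit when `num_rows` rows read one logical column."""
    return {
        xor_swizzle(r, col, num_cols) if swizzled else col
        for r in range(num_rows)
    }
```

Within a single row the swizzle is a permutation of the columns, so no two elements collide and the layout needs no padding.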
Khushbu Agarwal
d239b91fd5
Merge flatmm Operator with universal gemm (#2434)
...
* Initial commit
* Adding new tile partitioner to flatmm
* intermediate changes
* debugging kernels
* Updating flatmm example to universal gemm example
* updated flatmm kernel to run via gemmKernel
* update universal gemm to incorporate flatmm
* debug
* Fix flatmm call
* Fixing other kernels and tests for API changes
* clang formatted
* fixing gemm tests
* added test for flatmm and simplify kernel arguments
* adding flatmm test
* fix test for flatmm
* simplify gemm kernel with flatmm
* remove flatmm related files
* addressing review comments and code clean up
* resolving empty file
* resolving empty file
* clang formatted
* addressing review comments
* enable persistent kernel for flatmm
* reverted the removed files for flatmm
* reverted the removed files for flatmm
* changed flatmm to weightPReshuffle; removed the _1 added in the flatmm example
* some more renames
* clang formatted
2025-07-11 08:27:55 -07:00
Thomas Ning
3c4cdfac4f
Fix the CK Tile related operators (#2356)
...
* fix the flatmm
* Fix the pipeline
* address the comment
2025-06-16 17:38:52 -07:00
Illia Silin
5523df4b2d
Revert "fix the flatmm (#2349)" (#2352)
...
This reverts commit d996bc78be.
2025-06-16 07:54:55 -07:00
Thomas Ning
d996bc78be
fix the flatmm (#2349)
2025-06-16 02:17:53 -07:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge (#2168)
...
* sync with function interface of cshuffle epilogue, fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
BingYuan.Zhou
eaf1f0bf3b
[flatmm] implement basic fp16 flatmm (#2089)
...
* [flatmm] implement basic fp16 flatmm
* fix CI build fail
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00