Commit Graph

22 Commits

Author SHA1 Message Date
Zzz9990
1aa93ef551 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------

Co-authored-by: Zzz9990 <Zzz9990>
2025-12-10 10:03:28 +08:00
lalala-sh
6f0966e1e9 fix a16w4 moe bugs (#3373)
* fix valid mask bug

* update format
2025-12-09 17:54:55 +08:00
Yi DING
878b4e7f46 [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287)
* [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2

* typo
2025-12-08 19:20:44 +08:00
msaffari-amd
f875ab0bbc Add validity checks for MoE FlatMM scatter and enable bf16 hardware atomic-add (#3236)
* Add validity checks for MoE FlatMM scatter and enable bf16 hardware atomic

* correct clang-format

* removed unused rtol_atol variable from example code

* clang format correction

* remove unused varable max_accumulated_value from example
2025-11-28 09:43:01 +01:00
Aviral Goel
de6466481f chore(copyright): update copyright header for include directory (#3293) 2025-11-26 11:00:05 -07:00
Yi DING
47e2ed838e [CK_TILE] Add Flatmm MX FP8 (#3208)
* Use async for flatmm mxfp4

* Fix preshuffle

* Add flatmm mxfp8

* Thanks, Copilot

* Thanks Copilot again~
2025-11-20 10:35:15 +08:00
Max Podkorytov
a3a4eb12bd [CK-Tile] Remove usage of tile partitioner's full gemm shape (#3204)
gemm shape should be used from the pipeline instead (where it gets from a problem description struct)
2025-11-18 09:56:40 -08:00
BingYuan.Zhou
4d629cd2b0 fix build error (#3195)
Co-authored-by: root <root@hjbog-srdc-39.amd.com>
2025-11-14 09:46:13 +08:00
Yi DING
e135dd518d [CK_TILE] Add mxfp4 flatmm (#3080)
* Squashed commit of the following:

commit 3e1a851dad834776efbe4fe365ac82c4ed312010
Author: Ding, Yi <yi.ding@amd.com>
Date:   Thu Oct 23 06:10:54 2025 +0000

    Fix & clean after rebase

commit 1edf485092f44411da9a1796a4a6b72d5cdb67c6
Author: Ding, Yi <yi.ding@amd.com>
Date:   Wed Oct 22 10:46:13 2025 +0000

    Squashed commit of the following:

    commit 0b6b9dbd1b
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 22 02:04:27 2025 -0500

        fix bandwidth calculation

    commit 9aebf53bb7
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 22 00:58:59 2025 -0500

        updates

    commit 62607de56c
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Fri Sep 19 00:39:46 2025 -0500

        fix a bug, set the A DS_read preload size to 4 for MXFP4

    commit 92ad6fcc0a
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Thu Sep 18 01:19:03 2025 -0500

        fix a_wrap preload issue for large MPerBlock.

    commit f2db44710f
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Wed Sep 17 21:34:03 2025 -0500

        optimized the VGPR repack issue for MXFP4

    commit 346a400027
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Wed Sep 17 04:19:44 2025 -0500

        fix time error

    commit 80c1743034
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Wed Sep 17 03:58:00 2025 -0500

        updated, function passed.

    commit ce26d9071e
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Tue Sep 16 22:21:39 2025 -0500

        fix, function partially passed

    commit 0a89ed13a5
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Tue Sep 16 03:01:12 2025 -0500

        fix, reference function passed, next check kernel function

    commit ec9bcef591
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Tue Sep 16 02:29:01 2025 -0500

        let pack/unpack return pk_fp4_t

    commit a333206929
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 15 20:50:26 2025 -0500

        fix

    commit 3893c06540
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Mon Sep 15 05:51:06 2025 -0500

        fix bug

    commit 8052bea019
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 15 04:02:05 2025 -0500

        fix core dump issue, function is not correct.

    commit 9ceb3fd508
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 15 03:03:02 2025 -0500

        updates, build pass

    commit cc94eb6045
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 15 00:05:18 2025 -0500

        updates

    commit 22586c3135
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Sun Sep 14 23:40:28 2025 -0500

        fix bug

    commit e92e67b8dd
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Fri Sep 12 03:28:50 2025 -0500

        fix interface

    commit 8b1dd60c08
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Fri Sep 12 02:53:50 2025 -0500

        add interface in warp_gemm_impl

    commit c6135f6abe
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Wed Sep 10 05:03:08 2025 -0500

        updates some fixes.

    commit b0d71b8d19
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Tue Sep 9 04:37:42 2025 -0500

        fix after merge ginolu/add_wgmfma_dispatcher

    commit f119c30317
    Merge: c5030e602 72c8ef856
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 8 22:09:15 2025 -0500

        Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev

    commit c5030e602e
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 8 21:42:47 2025 -0500

        update mx flatmm tail pipeline

    commit 72c8ef8567
    Merge: 9661bb400 e4a772890
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Mon Sep 8 19:10:23 2025 -0500

        Merge branch 'develop' into ginolu/add_wgmfma_dispatcher

    commit 9661bb400b
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Mon Sep 8 19:09:55 2025 -0500

        fix type error

    commit 0509597f55
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Mon Sep 8 04:01:40 2025 -0500

        update hotloop pipeline

    commit 754ae0461b
    Merge: 15d44406e 83f607e2a
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Fri Sep 5 04:22:26 2025 -0500

        Merge branch 'develop' into ginolu/add_wgmfma_dispatcher

    commit 15d44406e5
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Fri Sep 5 04:21:26 2025 -0500

        fix clang format

    commit 146963d62a
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Wed Sep 3 10:00:54 2025 -0500

        some updates

    commit 12526b626a
    Merge: 47cee0471 00fd72b2d
    Author: asleepzzz <hanwen.chang@amd.com>
    Date:   Wed Sep 3 13:22:03 2025 +0800

        Merge branch 'develop' into ginolu/add_wgmfma_dispatcher

    commit 47cee04712
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Mon Sep 1 02:11:02 2025 -0500

        fix vec size error

    commit d2892925e5
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Mon Sep 1 01:23:39 2025 -0500

        fix format error

    commit 16993acd1d
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Sat Aug 30 03:19:07 2025 -0500

        update codes

    commit 9c37e55d13
    Author: mtgu0705 <mtgu@amd.com>
    Date:   Fri Aug 29 11:27:33 2025 -0500

        init ck_tile mxfp4 flatmm

    commit 5c484a5672
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Thu Aug 28 08:02:50 2025 +0000

        Add bias for f16xf4 moe_flatmm

    commit dd6539f366
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 27 13:39:47 2025 +0000

        update case construction

    commit 65b702454c
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Tue Aug 26 12:32:29 2025 +0000

        support swiglu activaion and use rcpf to accelerate silu

    commit b422e41e08
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Tue Aug 26 02:33:55 2025 -0500

        first commit

    commit d05eed931d
    Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu>
    Date:   Fri Aug 22 04:01:59 2025 -0500

        add line to last

    commit d69cab7f0c
    Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu>
    Date:   Fri Aug 22 03:20:46 2025 -0500

        adjust A_LDS descriptor to avoid bankconflict

    commit 65989e940c
    Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu>
    Date:   Thu Aug 21 09:46:52 2025 -0500

        enable hotloop

    commit c378e9bdf8
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Thu Aug 21 09:12:21 2025 +0000

        support atomic_pk_add_bf16 on gfx950

    commit 85976b0b87
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Thu Aug 21 06:58:55 2025 +0000

        use int64_t as expert stride to avoid overflow

    commit 9fbcc8f8a4
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 20 13:53:32 2025 +0000

        use v4i32 as the storage type for B to avoid repack operation

    commit 81899bd920
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 20 06:40:03 2025 +0000

        add pk_fp4_t and e8m0_t support for amd_buffer_load_impl

    commit c27eb0771a
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 20 04:39:14 2025 +0000

        optimize cvt_pkf4_to_f16 implementation

    commit 3ca0bd500a
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Tue Aug 19 14:56:46 2025 +0000

        optimize A_LDS descriptor to avoid bankconflict

    commit f7f0306eea
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 18 18:43:37 2025 +0000

        fix gate-up when GU_NRepeat > 1

    commit be55c0f9cb
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 18 17:28:11 2025 +0000

        add fp16xf4 moe

    commit 599e1f5b32
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Sun Aug 17 17:51:18 2025 +0000

        rename example

    commit 7899fb4a8d
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Fri Aug 15 06:20:46 2025 +0000

        remove additional check when e8m0->float

    commit 714b341797
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Thu Aug 14 09:34:12 2025 +0000

        eliminate repeat dequant

    commit 53e8c0c533
    Merge: 5de620895 cc9c7b9e5
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 13 16:51:49 2025 +0000

        Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm

    commit 5de6208952
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 13 16:16:48 2025 +0000

        update f16xMXF4

    commit 732ebdee8b
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 13 10:48:53 2025 +0000

        update scale-preshuffle for MXF4

    commit edb58d0680
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 11 11:24:34 2025 +0000

        update

    commit cc9c7b9e58
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 11 08:38:23 2025 +0000

        optimize gemm2 atomic_add pattern

    commit 200a11afc8
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 11 07:59:47 2025 +0000

        update scale for mxfp4

    commit 87aed564dc
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 11 07:56:14 2025 +0000

        update case construction

    commit 8b85fa6cf2
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 11 06:03:06 2025 +0000

        update granularity control

    commit 1b8c7097b8
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 11 03:42:46 2025 +0000

        fix TileConfig

    commit 8ba1c708dc
    Author: Gino Lu <gino.lu@amd.com>
    Date:   Thu Aug 7 21:37:28 2025 +0800

        Add e8m0 scaled convert into CK_TILE (#2617)

        * first commit

        * remove redundent code

        * modify according to comments.

        * fix type_convert error with scaled_type_convert

    commit f788d3d629
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Fri Aug 8 20:19:16 2025 +0000

        add mixed_prec fp16xfp4

    commit 3dea10a277
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Thu Aug 7 09:22:04 2025 +0000

        debug mixed_prec flatmm

    commit 0ba513b148
    Merge: 90e910f3a c0cb4d036
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Wed Aug 6 16:49:47 2025 +0800

        Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk

        fix split k

    commit 6d3cbc7c0e
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Aug 6 08:33:33 2025 +0000

        add moe_flatmm

    commit c0cb4d036d
    Author: coderfeli <coderfeli@163.com>
    Date:   Wed Aug 6 02:45:31 2025 +0000

        fix split k

    commit 90e910f3a7
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Aug 4 07:16:36 2025 +0000

        fix flatmm with scaling when WarpTileM == 32

    commit aa5e008fa5
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Fri Aug 1 11:01:23 2025 +0000

        optimize scaling epilogue

    commit ac5908c0bb
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Fri Aug 1 07:28:38 2025 +0000

        fix wrong config for fp8 scaling

    commit 3f43b841d4
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Jul 30 06:20:30 2025 +0000

        prune debug message

    commit 2e5d4c74cd
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Jul 30 04:52:08 2025 +0000

        fix compile error

    commit c117a1986a
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Tue Jul 29 15:42:58 2025 +0000

        Add persistent option on flatmm for tuning

    commit a587701117
    Author: AMD-dteng <dteng@amd.com>
    Date:   Tue Jul 29 22:48:00 2025 +0800

        update pipeline v1: add atomic IGLP schedule

    commit f9e48148d2
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Thu Jul 24 09:09:27 2025 +0000

        fix error log throwing

    commit 1b6d7cf407
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Mon Jul 28 08:24:51 2025 +0000

        crz idea

    commit 5473f06461
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Sun Jul 27 11:57:38 2025 +0000

        Add permuteN optimzization when NRepeat % 2 == 0 on flatmm

    commit bfb9f4002f
    Author: sjfeng <j514681085@icloud.com>
    Date:   Sun Jul 27 17:24:08 2025 +0800

        try to remove c_shuffle_lds

    commit 1264f4d2ab
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Fri Jul 25 07:41:48 2025 +0000

        fix loop-dim mismatch and improve c_shuffle alu parallelism

    commit 1239d8a546
    Merge: 406645448 b908f5e80
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Thu Jul 24 08:46:51 2025 +0000

        merge flatmm -scale

    commit 4066454483
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Thu Jul 24 16:19:58 2025 +0800

        revert delete of inc file

    commit 68390988c9
    Author: solin <bingzhou@amd.com>
    Date:   Thu Jul 24 04:38:16 2025 +0000

        reorg  flatmm code

    commit b908f5e803
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Jul 23 19:12:31 2025 +0000

        fix flatmm syntax error on gfx950

    commit 5a1183ebbd
    Author: Feng Shijie <Shijie.Feng@amd.com>
    Date:   Wed Jul 23 19:04:22 2025 +0000

        support flatmm scaling

    commit 89fa639207
    Author: valarLip <340077269@qq.com>
    Date:   Wed Jul 23 08:44:12 2025 +0000

        merge flatmm pipe v0 from dteng_flatmm_opt

    commit 3f7d848dd3
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Wed Jul 23 15:38:12 2025 +0800

        build pass

    commit 6dacf833da
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Wed Jul 23 07:20:26 2025 +0000

         fix bug

    commit 7e1bd4b839
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Wed Jul 23 15:01:53 2025 +0800

        sync

    commit 46a538e39e
    Author: valarLip <340077269@qq.com>
    Date:   Tue Jul 22 08:09:35 2025 +0000

        adaptive scheduler instead of Macro definition

    commit 9aa3396a79
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Thu Jul 17 08:40:35 2025 +0000

        fix tail handler bug

    commit fb76450e63
    Author: lalala-sh <Jiaxing.Wen@amd.com>
    Date:   Wed Jul 16 10:12:19 2025 +0000

        merge from dteng_flatmm_opt

---------

Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>

* Fix crash on small M

* Apply suggestion from @Copilot

---------

Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: AMD-dteng <dteng@amd.com>
Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: sjfeng <j514681085@icloud.com>
Co-authored-by: valarLip <340077269@qq.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: Gino Lu <gino.lu@amd.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-10-31 11:29:05 +08:00
lalala-sh
211d64e18a [CK_TILE] Update flatmm related kernels (#3022)
---------

Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-10-22 22:36:11 +08:00
Khushbu Agarwal
b56e5d1d79 Fix for Add the API to load SGPR (#2913)
* Revert "Revert "[CK-Tile] Add the API to load SGPR  (#2878)" (#2904)"

This reverts commit f161b5b738.

* Fix: sgpr minor issue

* cyclic dependency resolved

* clang formatted

* removing unused variable

* clang formatted

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-09-25 10:32:42 -07:00
asleepzzz
f161b5b738 Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904)
This reverts commit 2cbbf5dcb3.
2025-09-23 14:33:51 -07:00
Thomas Ning
2cbbf5dcb3 [CK-Tile] Add the API to load SGPR (#2878)
* Have a workable version for SGPR

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

* substitute with the new sgpr read api

* update the CHANGELOG

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

* change to static for logic

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
2025-09-23 01:23:56 -07:00
linqunAMD
df4ee556d6 [CK_TILE] Fix flatmm on gfx11 and gfx12 (#2790)
1. Correct shuffle_b and MakeBFlatDramTileDistribution according to WMMA warp layout
2. Add FlatmmConfig16_Wmma for gfx11 and gfx12
2025-09-10 08:28:00 +08:00
linqunAMD
9fcc1ee9fd Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1

* remove blocksize in kernel launch

* fix build error

* fix clang format

* fix clang format 2

* fix clang format 3

* fix fmha build error

* fix fmha build 2

* fix fmha build 3

* fix build error 4

* address review comment

* update change log

* replace KernelBlockSize with kBlockSize

* fix CI fail

* fix clang format

* address review comment and rebase code.

* fix universal test fail

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Aviral Goel
1441a0a7ee Integration of a new pipeline for weight preshuffle into gemm examples (#2516)
* something khushbu can help with

* v1 v2 works with flatmm develop

* v0 v1 v2 numerical error gone

* Fixing numerical error, and interchange preshuffle configs to match with flatmm

* Refactor GEMM pipeline configurations and integrate preshuffle support

- Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
- Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
- Removed obsolete code and comments

* clang format

* fix vectorloadsize bug

* add the Preshuffle3

* update kwarp calculation in gemm utils

* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch

* fix: add CK_GFX950_SUPPORT macro for gfx950 detection

* default disable rotating buffer

* docs(CHANGELOG): update changelog for rocm 7.0

* Revert "docs(CHANGELOG): update changelog for rocm 7.0"

This reverts commit 2bc16fff84.

* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.

* revert example/ck_tile/flatmm to its original state

* remove comment added by second author

* switch to xor ALDSDescriptor

* modify the MakeALdsDescriptor()

* temporary profiling script

* getting rid of line marker compiler error

* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy

* add a minor fix for the config

* typo fix

* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2

* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp

* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp

* reenable the GemmSpatiallyLocalTilePartitioner

* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline

* remove hardcoded true for preshuffle bool template argument

* rename script

* remove gemm_profilie.sh script

* merge conflict resolve

* clang formatted

* typo fix

* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp

* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy

* Fix missing newline at end of file in run_gemm_example.inc

* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1

* addressing review comments

* removing debug code

* addressing review comments

* Revert "addressing review comments"

This reverts commit 29c45192ba.

* updating tile_engine code

* addressing review comments

---------

Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-01 00:04:54 -07:00
Khushbu Agarwal
d239b91fd5 Merge flatmm Operator with universal gemm (#2434)
* Initial commit

* Adding new tile partitioner to flatmm

* intermediate changes

* debugging kernels

* Updating flatmm example to universal gemm example

* updated flatmm kernel to run via gemmKernel

* update universal gemm to incorporate flatmm

* debug

* Fix flatmm call

* Fixing other kernels and tests for API changes

* clang formatted

* fixing gemm tests

* added test for flatmm and simplify kernel arguments

* adding flatmm test

* fix test for flatmm

* simplify gemm kernel with flatmm

* remove flatmm related files

* addressing review comments and code clean up

* resolving empty file

* resolving empty file

* clang formatted

* addressing review comments

* enable persistent kernel for flatmm

* reverted the removed files for flatmm

* reverted the removed files for flatmm

* changed flatmm to weightPReshuffle; removed the _1 added in teh faltmm example

* some more renames

* clang formatted
2025-07-11 08:27:55 -07:00
Thomas Ning
3c4cdfac4f Fix the CK Tile related operators (#2356)
* fix the flatmm

* Fix the pipeline

* address the comment
2025-06-16 17:38:52 -07:00
Illia Silin
5523df4b2d Revert "fix the flatmm (#2349)" (#2352)
This reverts commit d996bc78be.
2025-06-16 07:54:55 -07:00
Thomas Ning
d996bc78be fix the flatmm (#2349) 2025-06-16 02:17:53 -07:00
BingYuan.Zhou
6a3960c1e1 Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge,fix flatmm build fail

* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm

---------

Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
BingYuan.Zhou
eaf1f0bf3b [flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm

* fix CI build fail

---------

Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00