Commit Graph

75 Commits

Author SHA1 Message Date
mtgu0705
f2db44710f optimized the VGPR repack issue for MXFP4 2025-09-17 21:34:03 -05:00
mtgu0705
80c1743034 updated, function passed. 2025-09-17 03:58:00 -05:00
mtgu0705
ce26d9071e fix, function partially passed 2025-09-16 22:21:39 -05:00
mtgu0705
0a89ed13a5 fix, reference function passed, next check kernel function 2025-09-16 03:01:12 -05:00
mtgu0705
9ceb3fd508 updates, build pass 2025-09-15 03:03:02 -05:00
mtgu0705
cc94eb6045 updates 2025-09-15 00:05:18 -05:00
mtgu0705
c6135f6abe updates some fixes. 2025-09-10 05:03:08 -05:00
mtgu0705
b0d71b8d19 fix after merge ginolu/add_wgmfma_dispatcher 2025-09-09 04:37:42 -05:00
mtgu0705
f119c30317 Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev 2025-09-08 22:09:15 -05:00
mtgu0705
c5030e602e update mx flatmm tail pipeline 2025-09-08 21:42:47 -05:00
mtgu0705
0509597f55 update hotloop pipeline 2025-09-08 04:01:40 -05:00
mtgu0705
146963d62a some updates 2025-09-03 10:00:54 -05:00
mtgu0705
16993acd1d update codes 2025-08-30 03:19:07 -05:00
Feng Shijie
65b702454c support swiglu activation and use rcpf to accelerate silu 2025-08-26 12:32:29 +00:00
root
d05eed931d add line to last 2025-08-22 04:01:59 -05:00
root
d69cab7f0c adjust A_LDS descriptor to avoid bankconflict 2025-08-22 03:20:46 -05:00
root
65989e940c enable hotloop 2025-08-21 09:46:52 -05:00
Feng Shijie
9fbcc8f8a4 use v4i32 as the storage type for B to avoid repack operation 2025-08-20 13:53:32 +00:00
Feng Shijie
c27eb0771a optimize cvt_pkf4_to_f16 implementation 2025-08-20 04:39:14 +00:00
Feng Shijie
3ca0bd500a optimize A_LDS descriptor to avoid bankconflict 2025-08-19 14:56:46 +00:00
Feng Shijie
be55c0f9cb add fp16xf4 moe 2025-08-18 17:28:11 +00:00
linqunAMD
9fcc1ee9fd Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1

* remove blocksize in kernel launch

* fix build error

* fix clang format

* fix clang format 2

* fix clang format 3

* fix fmha build error

* fix fmha build 2

* fix fmha build 3

* fix build error 4

* address review comment

* update change log

* replace KernelBlockSize with kBlockSize

* fix CI fail

* fix clang format

* address review comment and rebase code.

* fix universal test fail

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Feng Shijie
599e1f5b32 rename example 2025-08-17 17:51:18 +00:00
Tianyuan Wu
68134b60e4 [CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 (#2466)
* WMMA GEMM F16 Implementation

Signed-off-by: root <tianyuwu@amd.com>

* Self-review

Signed-off-by: root <tianyuwu@amd.com>

* ASIC check minor tweak

Signed-off-by: root <tianyuwu@amd.com>

* add missing include file

* Set GPU_TARGETS to gfx11/12 generic

Signed-off-by: root <tianyuwu@amd.com>

* INT8 GFX12

Signed-off-by: root <tianyuwu@amd.com>

* add int8x16 branch

* Fix CI script

Signed-off-by: root <tianyuwu@amd.com>

* Fix typo

Signed-off-by: root <tianyuwu@amd.com>

* Add CK_Tile WMMA example

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* Fix CI

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* fix clang format

* Set M/N_Warp Back to Constant

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* Use GemmConfigComputeV3 by default

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Remove CK_Tile wmma gemm examples from the CI list

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Add atomic add fallback method for gfx11

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix typo

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Omit copyright year

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Support non-square cases

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix CI

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Add get_device_ip()

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Revert "Add atomic add fallback method for gfx11"

This reverts commit 07a79e797d.

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"

This reverts commit ceee918007.

* Revise method name and typos

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* clang-format

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Try fix CI

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Revert "Try fix CI"

This reverts commit 7a7241085e.

* clang-format

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix typo caused by merge

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* Fix typo caused by merging

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

---------

Signed-off-by: root <tianyuwu@amd.com>
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-08-15 16:22:27 -07:00
Feng Shijie
7899fb4a8d remove additional check when e8m0->float 2025-08-15 06:20:46 +00:00
Feng Shijie
714b341797 eliminate repeat dequant 2025-08-14 09:34:12 +00:00
Feng Shijie
53e8c0c533 Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm 2025-08-13 16:51:49 +00:00
Feng Shijie
5de6208952 update f16xMXF4 2025-08-13 16:16:48 +00:00
Feng Shijie
732ebdee8b update scale-preshuffle for MXF4 2025-08-13 10:48:53 +00:00
joyeamd
0856b3f4a2 [CK_TILE]fix ck_tile's moe_sorting example in gfx11 (#2667)
* fix ck_tile's moe_sorting example in gfx11

* fix clang format

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-08-12 12:33:56 -07:00
Feng Shijie
edb58d0680 update 2025-08-11 11:24:34 +00:00
Feng Shijie
200a11afc8 update scale for mxfp4 2025-08-11 07:59:47 +00:00
Feng Shijie
f788d3d629 add mixed_prec fp16xfp4 2025-08-08 20:19:16 +00:00
Feng Shijie
3dea10a277 debug mixed_prec flatmm 2025-08-07 09:22:04 +00:00
Feng Shijie
6d3cbc7c0e add moe_flatmm 2025-08-06 08:33:33 +00:00
coderfeli
c0cb4d036d fix split k 2025-08-06 02:45:31 +00:00
Aviral Goel
1441a0a7ee Integration of a new pipeline for weight preshuffle into gemm examples (#2516)
* something khushbu can help with

* v1 v2 works with flatmm develop

* v0 v1 v2 numerical error gone

* Fixing numerical error, and interchange preshuffle configs to match with flatmm

* Refactor GEMM pipeline configurations and integrate preshuffle support

- Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
- Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
- Removed obsolete code and comments

* clang format

* fix vectorloadsize bug

* add the Preshuffle3

* update kwarp calculation in gemm utils

* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch

* fix: add CK_GFX950_SUPPORT macro for gfx950 detection

* default disable rotating buffer

* docs(CHANGELOG): update changelog for rocm 7.0

* Revert "docs(CHANGELOG): update changelog for rocm 7.0"

This reverts commit 2bc16fff84.

* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.

* revert example/ck_tile/flatmm to its original state

* remove comment added by second author

* switch to xor ALDSDescriptor

* modify the MakeALdsDescriptor()

* temporary profiling script

* getting rid of line marker compiler error

* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy

* add a minor fix for the config

* typo fix

* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2

* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp

* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp

* reenable the GemmSpatiallyLocalTilePartitioner

* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline

* remove hardcoded true for preshuffle bool template argument

* rename script

* remove gemm_profilie.sh script

* merge conflict resolve

* clang formatted

* typo fix

* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp

* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy

* Fix missing newline at end of file in run_gemm_example.inc

* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1

* addressing review comments

* removing debug code

* addressing review comments

* Revert "addressing review comments"

This reverts commit 29c45192ba.

* updating tile_engine code

* addressing review comments

---------

Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-01 00:04:54 -07:00
Feng Shijie
3f43b841d4 prune debug message 2025-07-30 06:37:26 +00:00
Feng Shijie
2e5d4c74cd fix compile error 2025-07-30 04:52:08 +00:00
Feng Shijie
c117a1986a Add persistent option on flatmm for tuning 2025-07-29 15:42:58 +00:00
AMD-dteng
a587701117 update pipeline v1: add atomic IGLP schedule 2025-07-29 14:59:32 +00:00
Illia Silin
504b101da3 upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18

* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
Feng Shijie
1b6d7cf407 crz idea 2025-07-28 08:24:51 +00:00
Feng Shijie
5473f06461 Add permuteN optimization when NRepeat % 2 == 0 on flatmm 2025-07-27 11:57:38 +00:00
lalala-sh
1239d8a546 merge flatmm -scale 2025-07-24 08:46:51 +00:00
Feng Shijie
b908f5e803 fix flatmm syntax error on gfx950 2025-07-23 19:12:31 +00:00
Feng Shijie
5a1183ebbd support flatmm scaling 2025-07-23 19:04:22 +00:00
valarLip
89fa639207 merge flatmm pipe v0 from dteng_flatmm_opt 2025-07-23 09:50:33 +00:00
lalala-sh
3f7d848dd3 build pass 2025-07-23 15:38:12 +08:00
lalala-sh
6dacf833da fix bug 2025-07-23 07:20:26 +00:00