ZheWang
418ee44844
Mx fp6 flatmm (#3601)
* add fp6 data-type and support sync/async dwordx3 load/store
* clang-format
* pre-commit
* 1st commit
* default mnk pass ut
* fix a distribution
* fix
* fix bdram distr
* update
* pass ut
* improve perf
* update
* clean code
* resolve copilot comment
* resolve comment
* clang-format
---------
Co-authored-by: ZheWang <zhewan@amd.com >
[ROCm/composable_kernel commit: e6bcd192d4 ]
2026-02-02 16:04:40 +08:00
yadaish
fc3ffa0d75
[CK_TILE] support split-k a16w4 gemm1 (#3389)
* initial version to support moe gemm1 split-k
* add missing args
* fix build warning
* update reference
* for split-k disable bias and weight
* remove debug log
* fix format
* fix div by zero errors
* fix cmake config
* update
* resolve conflicts
* remove useless changes
* reformat
* fix
* remove useless changes
* fix ci
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: root <root@smci355-ccs-aus-m01-25.cs-aus.dcgpu >
[ROCm/composable_kernel commit: dae85ead64 ]
2025-12-29 23:05:35 +08:00
Yi DING
fc71fcd9ad
[CK_TILE] MX Flatmm Use Byte Pointer Arithmetic for A Tensor (#3446)
* A as bytes
* Reformat with static_for_product
[ROCm/composable_kernel commit: 2220cbaba7 ]
2025-12-19 10:28:13 +08:00
Aviral Goel
0861395425
chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template
* Fix build
---------
Co-authored-by: Sami Remes <samremes@amd.com >
[ROCm/composable_kernel commit: 004784ef98 ]
2025-11-28 13:49:54 -08:00
Max Podkorytov
0ce4a61da5
[CK Tile] enable building examples by default (#3259)
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com >
Co-authored-by: Ding, Yi <yi.ding@amd.com >
[ROCm/composable_kernel commit: 79aae7c7f7 ]
2025-11-26 16:24:44 -08:00
Yi DING
e27e760d5a
[CK_TILE] Add Flatmm MX FP8 (#3208)
* Use async for flatmm mxfp4
* Fix preshuffle
* Add flatmm mxfp8
* Thanks, Copilot
* Thanks Copilot again~
[ROCm/composable_kernel commit: 47e2ed838e ]
2025-11-20 10:35:15 +08:00
Yi DING
e2060bd1fb
[CK_TILE] MX Flatmm Split kernel instances (#3207)
* [CK_TILE] MX Flatmm Split kernel instances
* Fix flatmm example compile
[ROCm/composable_kernel commit: b6720531de ]
2025-11-18 13:46:30 +08:00
Yi DING
acec30dd09
[CK_TILE] Add mxfp4 flatmm (#3080)
* Squashed commit of the following:
commit 3e1a851dad834776efbe4fe365ac82c4ed312010
Author: Ding, Yi <yi.ding@amd.com >
Date: Thu Oct 23 06:10:54 2025 +0000
Fix & clean after rebase
commit 1edf485092f44411da9a1796a4a6b72d5cdb67c6
Author: Ding, Yi <yi.ding@amd.com >
Date: Wed Oct 22 10:46:13 2025 +0000
Squashed commit of the following:
commit 5276b28a51dac7b5d2106fbae8e78de190ee0de1
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 02:04:27 2025 -0500
fix bandwidth calculation
commit d645bb20c6d879154c30ecd82bbff4d2a9206750
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 00:58:59 2025 -0500
updates
commit 0fa7e6b88aaf81a36034aa7607746de295de4263
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Sep 19 00:39:46 2025 -0500
fix a bug, set the A DS_read preload size to 4 for MXFP4
commit 50cafa824e2267f2b2f0dfeeb93e69a673630c61
Author: mtgu0705 <mtgu@amd.com >
Date: Thu Sep 18 01:19:03 2025 -0500
fix a_wrap preload issue for large MPerBlock.
commit e6333bbbc6ef540e24f92095040085f1ed59041e
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 21:34:03 2025 -0500
optimized the VGPR repack issue for MXFP4
commit e99e4932c401b9f6d1893dd5044c2827d6b3f145
Author: Gino Lu <gino.lu@amd.com >
Date: Wed Sep 17 04:19:44 2025 -0500
fix time error
commit 4586ce6da7fba0514f2e01a8124c76b7d494e124
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 03:58:00 2025 -0500
updated, function passed.
commit c4f25e7579573db5681b9160f6bdb1349f3566f1
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 22:21:39 2025 -0500
fix, function partially passed
commit a51b56eb6b00b99a4e8d2802dbf5b5b5277b54d8
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 03:01:12 2025 -0500
fix, reference function passed, next check kernel function
commit 5b02643ebab18960e8f9ba66c6bd2f91774f9cae
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Sep 16 02:29:01 2025 -0500
let pack/unpack return pk_fp4_t
commit 76d37c5d4b17530e95c6fced31bff66a35d54b8f
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 20:50:26 2025 -0500
fix
commit e5be3e162b9a20e5355bd556d2b27afb6d8bf085
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 15 05:51:06 2025 -0500
fix bug
commit 39a024efe4aa773df589712b1290803bb5ab5d1d
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 04:02:05 2025 -0500
fix core dump issue, function is not correct.
commit 16c49d268cfe065b5112b960b2d852b26552686a
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 03:03:02 2025 -0500
updates, build pass
commit fe7a961852dee6eff3be3cf1e0d0fabec5cd42ee
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 00:05:18 2025 -0500
updates
commit aaf9fe8022a72df59e04e4d5886dca3ba9c23400
Author: Gino Lu <gino.lu@amd.com >
Date: Sun Sep 14 23:40:28 2025 -0500
fix bug
commit a3da89290e1553b85fbf1171c07e93ac0f5584db
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 03:28:50 2025 -0500
fix interface
commit c5ff747e72d877461ba61dc19a0fe15527d3161e
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 02:53:50 2025 -0500
add interface in warp_gemm_impl
commit 0a48d369e601cc798589fc59e0784bdbfc0a22f9
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 10 05:03:08 2025 -0500
updates some fixes.
commit aaa2beca30ff5546d171a2028d1894fd4e131d4e
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 9 04:37:42 2025 -0500
fix after merge ginolu/add_wgmfma_dispatcher
commit bf87449b09cba690922b2f3f78ba39bf1b1e472e
Merge: 05ab58e3d 991d7fdbb
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 22:09:15 2025 -0500
Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev
commit 05ab58e3de2b708aceda63d704089c0fa89437ae
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 21:42:47 2025 -0500
update mx flatmm tail pipeline
commit 991d7fdbb726d65091a91b5cc2800f798a6661fc
Merge: ad046084a b2f280046
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:10:23 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit ad046084a2f6e4ebf0cd8b47d0d72b74815061fa
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:09:55 2025 -0500
fix type error
commit 42e16b43a035364a42789d7ce45a1e6a7d1d2609
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 04:01:40 2025 -0500
update hotloop pipeline
commit c2f69745346545087c8ce24acaba2961bb93ef0b
Merge: adbeeb90b 8b4be3a0e
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:22:26 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit adbeeb90be1533f8aeb8c1d5aea6470d45a455a0
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:21:26 2025 -0500
fix clang format
commit e2378ac393bb79ac80a8eef84677bffce86d9e0a
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 3 10:00:54 2025 -0500
some updates
commit bdc18a2269db49ff88e1ef1db30f83ea430d7544
Merge: 6c5cea2b7 feec59755
Author: asleepzzz <hanwen.chang@amd.com >
Date: Wed Sep 3 13:22:03 2025 +0800
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 6c5cea2b7a306f5d0ad346cb9baf6370ea2a73fe
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 02:11:02 2025 -0500
fix vec size error
commit 76d1dfa352087dfd5867c8909b73726d3a1e853e
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 01:23:39 2025 -0500
fix format error
commit a9061aaa1b4bfaa9db102c75b9d74863f39708a9
Author: mtgu0705 <mtgu@amd.com >
Date: Sat Aug 30 03:19:07 2025 -0500
update codes
commit 0caa184a271a8824ef40f87de456d0fa2500c8ad
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Aug 29 11:27:33 2025 -0500
init ck_tile mxfp4 flatmm
commit 5d46a6635f04bd69b76f7eda1438862e271b987a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 28 08:02:50 2025 +0000
Add bias for f16xf4 moe_flatmm
commit dd112dc302d17f541737671a3ac557d7c09ff969
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 27 13:39:47 2025 +0000
update case construction
commit b1aca68a073d82c7b3c7bb53286e5f415999edc1
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 26 12:32:29 2025 +0000
support swiglu activation and use rcpf to accelerate silu
commit 49235bd42349a84fc2ebd7ad0b100cc2545bb80a
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Aug 26 02:33:55 2025 -0500
first commit
commit c169e39d6381b932cf7098cc118db29df91da1cb
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 04:01:59 2025 -0500
add line to last
commit 318f9bf317306454941bbf394c1940023edcf0ac
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 03:20:46 2025 -0500
adjust A_LDS descriptor to avoid bank conflict
commit 9d066120ed068d6d102da25d619e170a28a04d18
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Thu Aug 21 09:46:52 2025 -0500
enable hotloop
commit 61a895e6b821798970afffd0e9432a21e2f04df8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 09:12:21 2025 +0000
support atomic_pk_add_bf16 on gfx950
commit 9f14864e45f21d8c1bc70a94988fb86c2c0017d8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 06:58:55 2025 +0000
use int64_t as expert stride to avoid overflow
commit e63af46b32e1139a1e59dee6f46b9971047c4026
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 13:53:32 2025 +0000
use v4i32 as the storage type for B to avoid repack operation
commit 6cf0224dd8a229bf2be726ca861c736c9b5f5415
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 06:40:03 2025 +0000
add pk_fp4_t and e8m0_t support for amd_buffer_load_impl
commit 67a591f2240b0b035029edad904627f98b3839fd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 04:39:14 2025 +0000
optimize cvt_pkf4_to_f16 implementation
commit 51c7126e77e9b17af694eaa57040e487f9d443e8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 19 14:56:46 2025 +0000
optimize A_LDS descriptor to avoid bank conflict
commit c113160f326353290a2878d7b8febf7daed91d71
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 18:43:37 2025 +0000
fix gate-up when GU_NRepeat > 1
commit a45ca0e9934ca4bb9114f65621d5c9582d937a45
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 17:28:11 2025 +0000
add fp16xf4 moe
commit dc8c8e484804f7bca10c8f0764540af3b5884e83
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Aug 17 17:51:18 2025 +0000
rename example
commit b177c967141cfdc401d3f36bf17830fe99893600
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 15 06:20:46 2025 +0000
remove additional check when e8m0->float
commit d467f9688c3d35f391e15089135edb1ad1d38b05
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 14 09:34:12 2025 +0000
eliminate repeat dequant
commit 1b20674b26ab3ce6bd2f710dd729fd4cc0f79428
Merge: faa3c0278 7d02625e7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:51:49 2025 +0000
Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm
commit faa3c0278cf11b7105a4302dea3a4416520b2cc7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:16:48 2025 +0000
update f16xMXF4
commit a2a2e1dab05501cc2136133236c01c08d51db4ea
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 10:48:53 2025 +0000
update scale-preshuffle for MXF4
commit eac9667feb899419dda1628164c092b969852660
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 11:24:34 2025 +0000
update
commit 7d02625e7678882af653f52c2a4ddaf64568a41c
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 08:38:23 2025 +0000
optimize gemm2 atomic_add pattern
commit d5f3c3e3ec72d0e6739467c4dc0b4e209f6d1192
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:59:47 2025 +0000
update scale for mxfp4
commit 15db198084614466bd4cfd4943fcb549cab2069a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:56:14 2025 +0000
update case construction
commit 5dff349d82a5f70b6eea821d2622df51f90ef200
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 06:03:06 2025 +0000
update granularity control
commit d32cdc52144f65ec473f4ec8e45ea23968811184
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 03:42:46 2025 +0000
fix TileConfig
commit 26f38c5716304ee5f84e5c4f6f88144d9f3dddaf
Author: Gino Lu <gino.lu@amd.com >
Date: Thu Aug 7 21:37:28 2025 +0800
Add e8m0 scaled convert into CK_TILE (#2617)
* first commit
* remove redundant code
* modify according to comments.
* fix type_convert error with scaled_type_convert
commit 419041478745f65dfec18859e75a13d975089519
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 8 20:19:16 2025 +0000
add mixed_prec fp16xfp4
commit 92e2a8b0308b9b107df9d2fd63a961efce706402
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 7 09:22:04 2025 +0000
debug mixed_prec flatmm
commit dea3ce80496ebcb00512979f0c3bb897f25e11a5
Merge: fde443bc3 b4f45fe14
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Aug 6 16:49:47 2025 +0800
Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk
fix split k
commit d480e8150358cc4ef8b05e25afe299141fad4fde
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 6 08:33:33 2025 +0000
add moe_flatmm
commit b4f45fe14d11569f34de40c8a205cd6760b61357
Author: coderfeli <coderfeli@163.com >
Date: Wed Aug 6 02:45:31 2025 +0000
fix split k
commit fde443bc38fe60e52195817ecb2c7b20d772eedb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 4 07:16:36 2025 +0000
fix flatmm with scaling when WarpTileM == 32
commit 5a0667afa889a5af8c6b8509232eabd50cf5efef
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 11:01:23 2025 +0000
optimize scaling epilogue
commit 5c3502bbf71833c6f6f7d4a1cc4f4fd93811f522
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 07:28:38 2025 +0000
fix wrong config for fp8 scaling
commit eb2d0653cdb86603cb11539cbac466b6431b58b7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 06:20:30 2025 +0000
prune debug message
commit 0c089cb56343a39e02a1ee38e9cabeb71ba35e92
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 04:52:08 2025 +0000
fix compile error
commit 61759ca30ce3787f70e228c3919b3e4d354016dd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Jul 29 15:42:58 2025 +0000
Add persistent option on flatmm for tuning
commit b36dc5dd55f15fc1ce8eb21637bdec862e56a883
Author: AMD-dteng <dteng@amd.com >
Date: Tue Jul 29 22:48:00 2025 +0800
update pipeline v1: add atomic IGLP schedule
commit f886f26994454fc2b4fc3433c86bf699767a2a7c
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 09:09:27 2025 +0000
fix error log throwing
commit 4b4686ab144daa9061fbda17f3df4c17600c8e9a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Jul 28 08:24:51 2025 +0000
crz idea
commit 7099af44a81be41431ba70ae60827b60116d02d2
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Jul 27 11:57:38 2025 +0000
Add permuteN optimization when NRepeat % 2 == 0 on flatmm
commit b147524c92e69a267337c8e48b6e64bcb1483551
Author: sjfeng <j514681085@icloud.com >
Date: Sun Jul 27 17:24:08 2025 +0800
try to remove c_shuffle_lds
commit 2dd94f59d1a7740a5689e1713ed45588cd0d55dd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Jul 25 07:41:48 2025 +0000
fix loop-dim mismatch and improve c_shuffle alu parallelism
commit 4e93f0c5e27806adc070e4caa81661069295751c
Merge: 3f12ef5aa 0eb7455f1
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 08:46:51 2025 +0000
merge flatmm -scale
commit 3f12ef5aa52ced1bff3bfb57b878358330e9e095
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 16:19:58 2025 +0800
revert delete of inc file
commit 08c3a0d184d7581dc5be364f5b36f16fb4a8d6fa
Author: solin <bingzhou@amd.com >
Date: Thu Jul 24 04:38:16 2025 +0000
reorg flatmm code
commit 0eb7455f106604d5254ed16b0daeda68e2a148e3
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:12:31 2025 +0000
fix flatmm syntax error on gfx950
commit 695ff87e68fdcbe28452c1805cd4dbb643c45495
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:04:22 2025 +0000
support flatmm scaling
commit e3c29d9dea8758db96b998982ccc8bd1c4e8298d
Author: valarLip <340077269@qq.com >
Date: Wed Jul 23 08:44:12 2025 +0000
merge flatmm pipe v0 from dteng_flatmm_opt
commit 425c366fa4c30426ff36cade89b39fd8cb7b9732
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:38:12 2025 +0800
build pass
commit 6b377a9481535696de40f175d7e2159263d21bdc
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 07:20:26 2025 +0000
fix bug
commit b6dc58d1ea676fe480c0243ae098c875498f6d6a
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:01:53 2025 +0800
sync
commit 904359f401866ee810484e6b8f5b46d79d9e25c8
Author: valarLip <340077269@qq.com >
Date: Tue Jul 22 08:09:35 2025 +0000
adaptive scheduler instead of Macro definition
commit f29916c17228c17de9923aab62e7d72d7a30f4e9
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 17 08:40:35 2025 +0000
fix tail handler bug
commit e2c60a90929fec955d91db909d50db538d58363b
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 16 10:12:19 2025 +0000
merge from dteng_flatmm_opt
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: AMD-dteng <dteng@amd.com >
Co-authored-by: solin <bingzhou@amd.com >
Co-authored-by: sjfeng <j514681085@icloud.com >
Co-authored-by: valarLip <340077269@qq.com >
Co-authored-by: asleepzzz <hanwen.chang@amd.com >
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com >
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: Gino Lu <gino.lu@amd.com >
Co-authored-by: mtgu0705 <mtgu@amd.com >
* Fix crash on small M
* Apply suggestion from @Copilot
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: AMD-dteng <dteng@amd.com >
Co-authored-by: solin <bingzhou@amd.com >
Co-authored-by: sjfeng <j514681085@icloud.com >
Co-authored-by: valarLip <340077269@qq.com >
Co-authored-by: asleepzzz <hanwen.chang@amd.com >
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com >
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: Gino Lu <gino.lu@amd.com >
Co-authored-by: mtgu0705 <mtgu@amd.com >
[ROCm/composable_kernel commit: e135dd518d ]
2025-10-31 11:29:05 +08:00
lalala-sh
63e0a73bd3
[CK_TILE] Update flatmm related kernels (#3022)
---------
Co-authored-by: Ding, Yi <yi.ding@amd.com >
Co-authored-by: felix <felix.li@amd.com >
[ROCm/composable_kernel commit: 211d64e18a ]
2025-10-22 22:36:11 +08:00
linqunAMD
511f170dab
[CK_TILE] Refine fp8 support in flatmm (#2239)
* [CK_TILE] Refine fp8 in flatmm
1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp definition to a separate struct
* fix clang format
* fix clang format
* keep default behavior unchanged (warp tile = 16x16)
* fix tile engine build error
* fix a typo in codegen_utils.py
* address review comments
* address review comments
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
[ROCm/composable_kernel commit: 37e1a27537 ]
2025-06-25 01:07:45 -07:00
Khushbu Agarwal
7afee6c536
fix flatmm kernel for bigger size for fp16 datatype (#2302)
[ROCm/composable_kernel commit: bd270fe4bc ]
2025-06-10 11:13:40 -07:00
BingYuan.Zhou
977b7d0928
Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge,fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com >
[ROCm/composable_kernel commit: 6a3960c1e1 ]
2025-05-08 12:59:57 +08:00
Illia Silin
df8daef645
Revert "Add ck tile examples to package (#1880)" (#2150)
[ROCm/composable_kernel commit: 9a9f59ae69 ]
2025-04-30 10:20:16 -07:00
jakpiase
d48f3b585b
Add ck tile examples to package (#1880)
* add ck tile examples to package
* Update jenkinsfile
* fix for jenkinsfile
* fix for building ck tile code on non gfx9
* compile ck tile examples only for gfx94
* include ck tile examples in all target
* fix for basic gemm UseStructuredSparsity
* Update CMakeLists.txt
* Update gemm_pipeline_problem.hpp
* add targets to rocm install
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 434d19f696 ]
2025-04-28 09:53:19 -07:00
BingYuan.Zhou
4ec293cb4b
[flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm
* fix CI build fail
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com >
Co-authored-by: solin <bingzhou@amd.com >
[ROCm/composable_kernel commit: eaf1f0bf3b ]
2025-04-16 16:51:17 +08:00