Ding, Yi
331ccb8ca2
Merge remote-tracking branch 'origin/develop' into gfx950-mxfp4
2025-06-03 05:38:56 +00:00
Ding, Yi
0cd2e6e782
Fix OOB; add MB96 instances
2025-05-30 07:46:28 +00:00
Ding, Yi
69418725a6
Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4
2025-05-30 03:15:47 +00:00
Ding, Yi
e4a40c7214
Add fp8 profiler instances
2025-05-29 08:19:31 +00:00
Bartłomiej Kocot
e7906dd644
Change relu to clamp for grouped conv fwd instances (#2249)
2025-05-29 00:51:25 +02:00
Adam Dickin
6df1c56ad6
Changes to allow MIOpen to build CK as part of its build. (#2247)
* tweaks to the miopen specific build. add way to skip clang-tidy checks and a way to skip some custom build targets MIOpen also has.
* move the tidy if statement
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-05-28 13:51:15 -07:00
BrianHarrisonAMD
e91be7d96a
Add option to disable offload compress for CK builds (#2250)
* Add option to disable offload compress for CK builds
* Remove gemm exe offload compress flag conditional
2025-05-28 13:47:56 -07:00
Ding, Yi
aecac410d0
Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2
2025-05-28 11:15:22 +00:00
Ding, Yi
ad7fd89c1d
Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4
2025-05-28 09:28:26 +00:00
Ding, Yi
4f9bfb1566
Add more fp4 wp instances
2025-05-28 07:33:40 +00:00
Ding, Yi
857ef9f8c4
Merge preshuffle device
2025-05-28 07:02:28 +00:00
Ding, Yi
e2e0e0025e
Profiler add f4 wp
2025-05-28 05:12:39 +00:00
Ding, Yi
b99c50a5d5
pad ascale
2025-05-28 03:35:33 +00:00
Ding, Yi
cf5b4c11a2
Pad shuffled a scale only
2025-05-28 02:37:14 +00:00
Bartłomiej Kocot
b1ed92b131
Revert "Remove not needed bwd wei merged groups instances (#2218)" (#2235)
This reverts commit 4583aeffad.
2025-05-26 23:26:04 +02:00
Bartłomiej Kocot
4583aeffad
Remove not needed bwd wei merged groups instances (#2218)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups
* Fix
* fix
* Revert "Restore oddc instances (#2201)"
This reverts commit 6342f6b5e8.
* fix
---------
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-26 22:46:18 +02:00
Ding, Yi
91eb136937
Fix v1; use M padding
2025-05-26 10:32:26 +00:00
Andriy Roshchenko
fdfc9c6fd8
Merge remote-tracking branch 'origin/develop' into andriy/wip-f4
2025-05-23 23:02:43 +00:00
Andriy Roshchenko
f03da29b65
Merge branch origin/wip-f4 into andriy/wip-f4
2025-05-23 22:14:30 +00:00
Illia Silin
bc2551ac3b
disable building device_mha_operations by default (#2225)
2025-05-22 14:03:04 -07:00
Adam Dickin
417a6b65b6
Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires (#2224)
* cut out anything we don't need for MIOpen to test
* refactor exclusion code to be more streamlined.
2025-05-22 11:14:33 -07:00
Ding, Yi
ce50d4bd62
Fix fp4 ckProfiler
2025-05-22 09:39:22 +00:00
Andriy Roshchenko
e302ab8f0c
Merge branch origin/develop into wip-fp4
2025-05-22 06:31:47 +00:00
Bartłomiej Kocot
ebc5a6ef87
Grouped conv bwd wei add for larger filter and Merge Groups optimization (#2197)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups
* Fix
* fix
* Restore removed instances
---------
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-21 22:47:34 +02:00
OscarXu
fc9ef98e7b
Add gemm2 64x128x128 asm. Fix BF16 ref.
2025-05-21 16:57:57 +08:00
Thomas Ning
1386924749
Add the instances for small sized GEMM in preshuffle and improve CMake Flag (#2212)
* Add small instance, add the bug fix, & improve the example CMake
* clang format
2025-05-20 15:05:08 -07:00
OscarXu
b146191da2
v2 tok_weight WIP
2025-05-20 16:34:07 +08:00
Ding, Yi
667a356cc3
Add mx fp4 pipeline v1 instances
2025-05-20 03:28:24 +00:00
Andriy Roshchenko
57e0f5df29
MX GEMM - Expand MX MFMA Testing to BF8, FP6, and BF6 Data Types (#2199)
* Unify test interface for different layouts.
* WIP: Introducing FP4/FP6/FP8 abstractions
* WIP: Introducing packed storage abstraction
* WIP: Introducing packed storage abstraction
* WIP: Improved support for FP6 data type
* Refactor packed storage for f6_t
* WIP: FP6 MFMA test
* Test if we correctly represent all FP6/FP4 numbers
* Additional output for failed FP4 test.
* More failing conversion tests
* Even more failing conversion tests
* Working FP6 MFMA tests
* Expand MX MFMA testing to BF8/6
* Update and verify MX MFMA test for packed types
* Fix fp4 and fp6 conversions on host
* Working MX MFMA tests for FP8/6/4
* Cleanup
* Add missing type
* Cleanup
* Final cleanup
* Restrict FP6/4 values output to CK_LOGGING=1
* Use CHAR_BIT instead of number 8
* Fix typo
* Remove FP6 and FP4 from the list of native types
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
2025-05-19 16:52:51 -05:00
Ding, Yi
ec240f391a
Fix mx f4 ckProfiler
2025-05-19 06:39:43 +00:00
mtgu0705
a4b5a374b9
Merge remote-tracking branch 'origin/wip-f4-pk' into mx_moe_f4_scale_shuffle
2025-05-17 09:49:24 -05:00
Bartłomiej Kocot
6342f6b5e8
Restore oddc instances (#2201)
2025-05-16 18:42:02 -07:00
Illia Silin
40668c9a99
Build and store CK library deb package for all targets daily. (#2196)
* generate and store library package for all targets
* use ninja to build packages for all targets
* make sure to use ftime-trace when using ninja
* make sure build trace only runs on gfx9
* archive lib package and stash only library package
2025-05-16 07:40:53 -07:00
OscarXu
ec8d00d58d
mx_moe_fp4 ready for aiter with clang-format.
2025-05-16 04:09:26 -05:00
OscarXu
c5be9a501b
v1 function pass.
2025-05-16 03:16:38 -05:00
Ding, Yi
70e0d94932
Add f4 profiler examples
2025-05-16 07:49:55 +00:00
Ding, Yi
dc30e7d025
Add f4 ckProfiler
2025-05-16 07:19:22 +00:00
Ding, Yi
c04d44b5f6
Merge remote-tracking branch 'origin/develop' into wip-f4
2025-05-16 07:11:26 +00:00
Bartłomiej Kocot
7c0e29cc0f
Extend 64x64 with 4 waves instances for grouped conv bwd wei (#2187)
* Extend 64x64 with 4 waves instances for grouped conv bwd wei
* Fix
* fix
* fix
2025-05-15 16:21:34 +02:00
OscarXu
f70f778e27
v1 compile pass. Function not ready
2025-05-15 08:01:56 -05:00
OscarXu
68dbe558df
compile error fix
2025-05-15 16:55:20 +08:00
OscarXu
17922821ec
Add gemm1 v1
2025-05-15 16:11:43 +08:00
OscarXu
98606fad94
Merge remote-tracking branch 'origin/wjx/moe_v3_aiter' into moe_bs_stage1_dev
2025-05-14 20:48:42 -05:00
mtgu0705
102151ebcf
temp save
2025-05-14 08:13:47 -05:00
aska-0096
74686bf008
tempsave. Almost all instances passed.
2025-05-14 10:15:59 +00:00
feifei14119
7be8730247
fix cpu ref
2025-05-14 10:22:18 +08:00
mtgu0705
6dfe24c53e
updated
2025-05-13 04:15:53 -05:00
valarLip
eaebefb278
update moe v1 pipeline
2025-05-13 08:42:49 +00:00
OscarXu
9ead312164
Gemm1 GUFusion function pass. Perf WIP
2025-05-13 15:32:39 +08:00
mtgu0705
5ba86c210b
updated and build passed
2025-05-13 14:49:37 +08:00