Max Podkorytov
79aae7c7f7
[CK Tile] enable building examples by default (#3259)
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
2025-11-26 16:24:44 -08:00
Aviral Goel
d85f065b15
chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory
* chore(copyright): update copyright header for example directory
2025-11-24 18:02:41 -08:00
felix
4c826abfff
Felix/opt sorting (#2902)
* merge felix/sorting
* opt moe sorting (#2822)
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-10-15 09:24:03 +08:00
linqunAMD
9fcc1ee9fd
Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Illia Silin
504b101da3
upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
carlushuang
cfe211cc60
[CK_TILE] moe sorting optimize local_token (#2469)
* fix bug in loops that need to use local tokens to compute
* support extra chain local_token
* update
* update
* refine some main
* update
* support dispatch_policy
* fix 15 example
2025-07-15 09:42:18 +08:00
carlushuang
a4e1248dba
[CK_TILE] moe_sorting support "local_tokens" feature for EP case (#2335)
* support local_token for hipgraph
* update README
* fix comment
* fix fmoe example
2025-06-18 10:49:43 +08:00
carlushuang
4e9b76f88c
[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)
* combine 2-3 as single stage
* support zeroing
* improve long tokens
* update specialization
* b16 ws
* 8bit topk optimize
* update 15 example
2025-05-06 17:32:07 +08:00
felix
a82f338fb9
hotfix fix sorting int64 (#2025)
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2025-03-28 11:31:52 +08:00
carlushuang
e3c9886cdf
[CK_TILE] return value with macro in ck_tile::kernel_launch API (#1982)
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992)
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
2025-03-20 11:00:29 -07:00
valarLip
52b1cd7780
hotfix fmoe build issue (#1976)
2025-03-13 15:11:59 +08:00
carlushuang
353a612b44
[CK_TILE] add moe-sorting MP kernel (#1910)
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
* support mp kernel
* hint mp
* remove useless code
* porting to example 15
---------
Co-authored-by: valarLip <340077269@qq.com>
2025-02-25 17:56:55 +08:00
valarLip
0e5e29c4e2
porting fmoe_sorting from moe_sorting (#1884)
* porting fmoe_sorting from moe_sorting
* pass default example test
* remod
2025-02-13 15:34:34 +08:00
carlushuang
c0adab4850
[CK_TILE] moe sorting ex kernel to support expert > 128 (#1840)
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
2025-02-11 17:49:17 +08:00
carlushuang
1ff50e78c6
[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
2025-01-16 17:51:10 +08:00
carlushuang
3d15f364b3
[CK_TILE] optimize moe-sorting kernel (#1771)
* opt moe sorting
* remove commented code
2024-12-23 10:59:02 +08:00
carlushuang
440e28b08f
[CK_TILE] fused-moe first version (#1634)
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
2024-11-26 11:14:56 +08:00