Yi DING
b726f9606c
[CK_TILE] Generate random tensor values with multiple threads ( #3324 )
...
[ROCm/composable_kernel commit: c1c2e41a03 ]
2025-12-09 11:02:33 +08:00
Aviral Goel
a535de0f75
chore(copyright): update copyright header for example directory ( #3273 )
...
* chore(copyright): update copyright header for codegen directory
* chore(copyright): update copyright header for example directory
[ROCm/composable_kernel commit: d85f065b15 ]
2025-11-24 18:02:41 -08:00
Aviral Goel
358f7ab285
fix(copyright header): add header to missing files ( #2807 )
...
[ROCm/composable_kernel commit: f3239395dc ]
2025-09-11 12:27:08 -07:00
Illia Silin
c217c0fa93
Fix latest AITER failure and add more AITER tests in CK CI. ( #2782 )
...
* add aiter tests and move json_dump header
* remove example/include path from cmake
* extend time for aiter and pytorch stages
[ROCm/composable_kernel commit: ef6c28e989 ]
2025-09-04 13:44:00 -07:00
rahjain-amd
7674eb6416
Add json dump support to output details from CK/CKTile Examples. ( #2551 )
...
* Adding RapidJson Library
* Adding Json Dumps in all CK_Tile Examples
Not verified yet
* Adding json to cktile Batched Transpose
* adding json dumps to layernorm2d_fwd
* Adding json dump to flatmm_basic
* Adding json in 03_gemm
* Add json dump to 16_batched_gemm
* Add json dump to gemm_multi_d_fp16
* Add json dump to grouped_gemm
* fix fmha_bwd/fwd
* Fix clang-format errors
exclude include/rapidjson in jenkins as it's a third-party library
* Separating function and definition.
* Update Documentation of 03_gemm
* Refactoring as per code review
* Disable fp8 instances on unsupported targets (#2592)
* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt
* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt
* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
---------
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
* fix clang format
* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt
* Fixing Readme and unifying jsondumps
* adding moe_smoothquant
* adding fused_moe
* Fixing Readme for batched_gemm
* Fixing Readme for grouped_gemm
* adding flatmm
* adding gemm_multi_d_fp16
* adding elementwise
* adding File name when json is dumped
* Fixing Reduce after merge
* adding batched_transpose
* Adding Warptile in Gemm
* Fixing Clang Format
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
[ROCm/composable_kernel commit: 4d041837ad ]
2025-09-02 23:31:29 -07:00
Illia Silin
3345f5f417
upgrade from clang-format-12 to clang-format-18 ( #2568 )
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3 ]
2025-07-28 11:34:07 -07:00
carlushuang
34ca5f6a68
[CK_TILE] moe sorting optimize local_token ( #2469 )
...
* fix bug in loops that need use local tokens to compute
* support extra chain local_token
* update
* update
* refine some main
* update
* support dispatch_policy
* fix 15 example
[ROCm/composable_kernel commit: cfe211cc60 ]
2025-07-15 09:42:18 +08:00
carlushuang
8660f6ef22
[CK_TILE] moe_sorting support "local_tokens" feature for EP case ( #2335 )
...
* support local_token for hipgraph
* update README
* fix comment
* fix fmoe example
[ROCm/composable_kernel commit: a4e1248dba ]
2025-06-18 10:49:43 +08:00
carlushuang
807501ac3d
[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x ( #2153 )
...
* combine 2-3 as single stage
* support zeroing
* improve long tokens
* update specialization
* b16 ws
* 8bit topk optimize
* update 15 example
[ROCm/composable_kernel commit: 4e9b76f88c ]
2025-05-06 17:32:07 +08:00
carlushuang
581c75f3b7
[CK_TILE] add moe-sorting MP kernel ( #1910 )
...
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
* support mp kernel
* hint mp
* remove useless code
* porting to example 15
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 353a612b44 ]
2025-02-25 17:56:55 +08:00
valarLip
baf4710ef6
porting fmoe_sorting from moe_sorting ( #1884 )
...
* porting fmoe_sorting from moe_sorting
* pass default example test
* remod
[ROCm/composable_kernel commit: 0e5e29c4e2 ]
2025-02-13 15:34:34 +08:00
carlushuang
2fec988802
[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block ( #1808 )
...
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
[ROCm/composable_kernel commit: 1ff50e78c6 ]
2025-01-16 17:51:10 +08:00
carlushuang
8acce2dee1
[CK_TILE] fused-moe first version ( #1634 )
...
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 440e28b08f ]
2024-11-26 11:14:56 +08:00