Illia Silin
504b101da3
upgrade from clang-format-12 to clang-format-18 (#2568)
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
Satyanvesh Dittakavi
4c57157d50
Do not use warpSize as compile time constant as it is removed (#2320)
...
* Do not use warpSize as compile time constant as it is removed
* Update tile_image_to_column_shape.hpp
update warpSize usage.
* clean-up all use of warpSize, make sure code builds
* fix
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-06-17 11:54:30 -07:00
Illia Silin
a88bf76ecc
Replace buffer load/store intrinsics with builtins (#1876)
...
* replace buffer load/store intrinsics with builtins
* fix clang format
* replace buffer load/store intrinsics with built-ins in ck_tile
* fix clang format
* add switch between buffer intrinsics and built-ins
* change the builtins threshold to clang20
* fix clang format
* fix some compilation errors
* revert changes in ck_tile
* revert changes in ck_tile
* delete all root files and folders when CI completes
* try changing the username in CI
* fix groovy syntax
* add user and group id info to ci dockers
* change ownership of all files in CI to jenkins at the end
* update changelog
2025-03-05 14:33:28 -08:00
carlushuang
c0adab4850
[CK_TILE] moe sorting ex kernel to support expert > 128 (#1840)
...
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
2025-02-11 17:49:17 +08:00
carlushuang
1ff50e78c6
[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
...
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
2025-01-16 17:51:10 +08:00
carlushuang
3d15f364b3
[CK_TILE] optimize moe-sorting kernel (#1771)
...
* opt moe sorting
* remove commented code
2024-12-23 10:59:02 +08:00
Xu, Shengnan
f57d720c67
added moe interleaving pipeline (#1712)
...
* added moe interleaving pipeline
* remove redundant code
* formatter
---------
Co-authored-by: root <root@hjbog-srdc-14.amd.com>
2024-12-15 20:13:10 +08:00
carlushuang
440e28b08f
[CK_TILE] fused-moe first version (#1634)
...
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
2024-11-26 11:14:56 +08:00
dummycoderfe
bec6fbc65f
Ck tile/moe sorting (#1624)
...
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile] fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00