composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 09:08:35 +00:00

Files

lalala-sh 39ba03f25d Moe gemm activation (#2026)

* fix useless code and remove useless oob

* clang format

* fix coredump in e2e test

* fix2

* fix clang format

* fix output oob

* impl int64 but result not correct

* int64 index ok now

* input output all ok

* fix uint32

* revert v1 test

* use uint32

* work to support 130k tokens

* moe sorting fix moebuf

* fix merge

* update moe api fix aiter build

* fix build

* fuse silu

* silu ok

* scale ok

* add silu

* change code

* gemm2 ok

* gufusion compatible ok, fix warnings

* gu fusion for m32 m64 ok

* support bf16 cshuffle

* i4 gemm2 ok

* i4 gemm2 ok and i4 gemm1 build

* 16x16 run ok

* change flops; change cshuffle dtype

* fuse gelu silu act in moe gemm1

* fp8 with act ready

* int4 act ready

* remove useless changes

* remove useless code change

* fix clang format

* add the arch limit of int4 moe gemm

* fuse moe activation

* fix fp8 16x16

* fix no quant case

* fix bugs

* fix fp8 gufusion bug

* remove useless comments

* refine activation code & complete moe example

* fix int8 bugs

* merge tkw1

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: feli <felix.li@amd.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: root <root@hjbog-srdc-51.amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

2025-04-23 10:35:34 +08:00

reduction_functions_threadwise.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_contraction_dl.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_gemm_dlops_v3.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_set.hpp

update copyright headers (#726)