mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-01 20:21:23 +00:00
BlockWarps, WarpTile in Generic2dBlockShape are wave size dependent, it causes mangled name mismatch between host and device side. Solution: Replace them with ThreadPerBlock and move BlockWarps, WarpTile calculation into Generic2dBlockShape
moe-smoothquant
This folder contains example for moe-smoothquant using ck_tile tile-programming implementation.

Unlike standard smoothquant op, the input scale is from different expert [expert, hidden], we need reuse the topk-id from previous topk-softmax and select the corresponding expert from current topk, and expand the output/per-token-scale by topk
build
# in the root of ck_tile
mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../ <arch> # you can replace this <arch> to gfx90a, gfx942...
make tile_example_moe_smoothquant -j`nproc`
This will result in an executable build/bin/tile_example_moe_smoothquant
example
args:
-t tokens dimension (default:3328)
-h hidden_size dimension (default:4096)
-e experts (default:32)
-k topk (default:5)
-stride stride per row, if -1 then equal to hidden_size (default:-1)
-v cpu validation or not (default:1)
-kname print kernel name or not (default:1)
-prec_i input precision, fp16/bf16 (default:fp16)
-prec_o precision, int8/fp8 (default:int8)
-warmup cold iter (default:5)
-repeat hot iter (default:20)
-json 0: No Json, 1: Dump Results in Json format (default:0)
-jsonfile json file name to dump results (default:moe_smoothquant.json)