mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 19:28:33 +00:00

Files

Aviral Goel c7eb33078c [rocm-libraries] ROCm/rocm-libraries#6302 (commit 8d419e8)

CK: Remove 41 commented-out dead code blocks (~200 lines) (#6302)

Depends on #6300

## Summary

Remove 41 commented-out code blocks across 33 files in Composable
Kernel, totaling ~200 lines.

Identified using an automated dead code scanning skill (`ck-dead-code`)
with a calibrated two-stage pipeline:
1. **Pre-filter**: Keyword-based scan found 1,338 `//`-commented blocks.
Calibrated heuristics (trained on 50-sample expert classification)
reduced to 89 high-confidence candidates — 93% noise reduction.
2. **Expert triage**: LLM expert classified each block in context as
CODE_REMOVE, CODE_KEEP, or NOT_CODE.

| Classification | Count |
|---------------|-------|
| Removed (this PR) | 41 |
| Kept (debug helpers, alt configs, reference impls) | 32 |
| Not code (false positives) | 16 |

Removed blocks include: superseded implementations, old test data,
abandoned stubs, unreachable code, and buggy dead code.

2026-04-10 11:17:11 -04:00

instances

[rocm-libraries] ROCm/rocm-libraries#6302 (commit 8d419e8)

2026-04-10 11:17:11 -04:00

misc

[CK_TILE] MoE update index (#1672)

2024-11-25 13:12:35 +08:00

script

chore(copyright): update copyright header for example directory (#3273)

2025-11-24 18:02:41 -08:00

CMakeLists.txt

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

moe_smoothquant.cpp

chore(copyright): update copyright header for example directory (#3273)

2025-11-24 18:02:41 -08:00

moe_smoothquant.hpp

chore(copyright): update copyright header for example directory (#3273)

2025-11-24 18:02:41 -08:00

README.md

[DOCS] Documentation Addition (Readme updates) (#2495)

2025-10-16 03:10:57 -07:00

README.md

MoE-SmoothQuant with CK Tile

This example demonstrates MoE-SmoothQuant, a fused quantization operation for Mixture-of-Experts (MoE) models, using the CK Tile programming model. Unlike standard SmoothQuant, the input scale is expert-dependent, and the operation is fused with top-k expert selection: for each token, it quantizes the rows belonging to the top-k experts using each expert's own scale. The smooth scales are stored per expert with shape [experts, hidden]. The kernel reuses the top-k indices produced by a preceding topk-softmax step to select the scale row for each chosen expert, so the quantized output and the per-token scale are both expanded by a factor of topk.

This diagram depicts moe-smoothquant using ck_tile tile-programming implementation.

Algorithm and Math

Given:

Input: $X$ of shape $[\text{tokens}, \text{topk}, \text{hidden}]$
Expert scales: $S$ of shape $[\text{experts}, \text{hidden}]$
TopK indices: $I$ of shape $[\text{tokens}, \text{topk}]$

Steps:

Expert-dependent scaling (for each token $t$ and each top-k slot $k$, with $j$ indexing the hidden dimension):
- Select the scale row $S_{I_{t,k}, :}$ of the expert chosen in slot $k$.
- Scale: $Y_{t,k,j} = X_{t,k,j} \cdot S_{I_{t,k}, j}$
Rowwise Dynamic Quantization (per token-expert pair):
- $s_{t,k} = \max_j |Y_{t,k,j}| / 127$
- $Q_{t,k,j} = \text{round}(Y_{t,k,j} / s_{t,k})$, with $Q_{t,k,j} \in [-127, 127]$ stored as int8

Output:

Quantized tensor $Q$ (int8), same shape as $X$
Per-token-expert scale $s$ (fp32) of shape $[\text{tokens}, \text{topk}]$
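
The steps above can be sketched as a plain CPU reference. This is a minimal illustration of the math only, not the CK Tile kernel: the function name `moe_smoothquant_reference`, the nested-list layout, the zero-row fallback scale, and the use of Python's built-in `round` (which rounds ties to even, possibly unlike the device rounding mode) are all assumptions made for this sketch.

```python
def moe_smoothquant_reference(x, scales, topk_ids, qmax=127):
    """Hypothetical CPU reference for the formulas above.

    x:        [tokens][topk][hidden] floats
    scales:   [experts][hidden] floats
    topk_ids: [tokens][topk] expert indices
    Returns (q, s): q is [tokens][topk][hidden] ints, s is [tokens][topk] floats.
    """
    q_out, s_out = [], []
    for t, token_rows in enumerate(x):
        q_tok, s_tok = [], []
        for k, row in enumerate(token_rows):
            expert = topk_ids[t][k]                      # indirect index I[t][k]
            y = [v * sc for v, sc in zip(row, scales[expert])]
            absmax = max(abs(v) for v in y)
            # Assumption: an all-zero row gets scale 1.0 to avoid dividing by zero.
            s_tk = absmax / qmax if absmax > 0.0 else 1.0
            # Clamp defensively so the result always fits the int8 range used here.
            q_tok.append([max(-qmax, min(qmax, round(v / s_tk))) for v in y])
            s_tok.append(s_tk)
        q_out.append(q_tok)
        s_out.append(s_tok)
    return q_out, s_out


# One token, topk = 2, hidden = 4, three experts.
x = [[[1.0, -2.0, 0.5, 4.0], [0.25, 0.5, -1.0, 0.0]]]
scales = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 0.5, 1.0, 0.25],
    [0.5, 0.5, 2.0, 1.0],
]
topk_ids = [[1, 2]]

q, s = moe_smoothquant_reference(x, scales, topk_ids)
print(q)
print(s)
```

In each row, the element with the largest magnitude after scaling maps to +127 or -127, and multiplying `q` by the matching entry of `s` recovers `Y` to within half a quantization step.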

Tile Programming Model

Tiles: Each thread block processes a tile (block of tokens, experts, or hidden units).
Tile Engine: Loads input, selects expert scales via top-k indices, applies scaling and quantization, and writes results.
Pipeline: Modular, can be extended for further fusion.
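
To illustrate why tiling along the hidden dimension fits this operation, the sketch below quantizes one row in fixed-size tiles using two passes: the first accumulates a running absolute maximum across tiles, the second quantizes each tile with the final scale. The function `quantize_row_tiled`, the two-pass structure, and the tile sizes are assumptions chosen for clarity; the actual CK Tile pipeline may organize loads and reductions differently.

```python
def quantize_row_tiled(row, scale_row, tile=4, qmax=127):
    """Hypothetical tiled version of the per-row scale-and-quantize step."""
    # Pass 1: walk the row tile by tile and keep a running absmax of X * S.
    absmax = 0.0
    for start in range(0, len(row), tile):
        x_tile = row[start:start + tile]
        s_tile = scale_row[start:start + tile]
        y_tile = [xv * sv for xv, sv in zip(x_tile, s_tile)]
        absmax = max(absmax, max(abs(v) for v in y_tile))

    # Assumption: an all-zero row gets scale 1.0 to avoid dividing by zero.
    s = absmax / qmax if absmax > 0.0 else 1.0

    # Pass 2: revisit each tile and quantize it with the row-wide scale.
    q = []
    for start in range(0, len(row), tile):
        x_tile = row[start:start + tile]
        s_tile = scale_row[start:start + tile]
        q.extend(round(xv * sv / s) for xv, sv in zip(x_tile, s_tile))
    return q, s


row = [0.5, -1.5, 3.0, 0.25, -0.75, 2.0]
scale_row = [1.0, 2.0, 0.5, 4.0, 1.0, 0.5]

# The result does not depend on the tile size, including a ragged last tile.
q_small, s_small = quantize_row_tiled(row, scale_row, tile=2)
q_large, s_large = quantize_row_tiled(row, scale_row, tile=4)
print(q_small == q_large, s_small == s_large)
```

Because the row-wide scale depends on every element, quantization cannot start until the reduction over all tiles has finished, which is the dependency a tiled kernel has to schedule around.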

Build & Run

mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_moe_smoothquant -j`nproc`
./bin/tile_example_moe_smoothquant -?

Source Structure

Kernel: moe_smoothquant.hpp (tile-programming kernel template)
Executable: moe_smoothquant.cpp
Build: CMakeLists.txt, instances/, misc/, script/

Technical Notes

Expert-dependent scaling: Each token's top-k experts use their own per-hidden-unit scale, requiring indirect indexing and efficient memory access.
Fused with top-k: The kernel uses top-k indices from gating to select the correct expert scale for each token.
Rowwise quantization: Each token-expert pair is quantized independently with its own scale, so an outlier in one row does not reduce the precision of any other row.
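
The benefit of quantizing each row independently can be seen in a small worked comparison. The sketch below contrasts a per-row scale with a single scale shared across all rows; the helper names and the sample values are hypothetical and chosen only to make the effect visible.

```python
def absmax_scale(values, qmax=127):
    """Scale that maps the largest magnitude in `values` to qmax."""
    m = max(abs(v) for v in values)
    return m / qmax if m > 0.0 else 1.0


def max_roundtrip_error(row, s):
    """Largest absolute error after quantizing with scale s and dequantizing."""
    return max(abs(v - round(v / s) * s) for v in row)


# Row 0 holds small values, row 1 holds large ones (an "outlier" row).
rows = [
    [0.01, -0.02, 0.015],
    [50.0, -100.0, 25.0],
]

# Rowwise: each row gets its own scale.
per_row_err = [max_roundtrip_error(r, absmax_scale(r)) for r in rows]

# Shared: one scale derived from all values together.
shared = absmax_scale([v for r in rows for v in r])
per_tensor_err = [max_roundtrip_error(r, shared) for r in rows]

print(per_row_err)
print(per_tensor_err)
```

With the shared scale, every value in row 0 rounds to zero, so its error equals the largest magnitude in that row, while the rowwise scale keeps the error within half of that row's own quantization step.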

Related CK Tile Examples

09_topk_softmax: TopK-Softmax for MoE gating
13_moe_sorting: MoE sorting for expert dispatch
12_smoothquant: Standard SmoothQuant

For details on tile distribution, see include/ck_tile/tile_program/tile_distribution/.


example

args:
          -t    tokens dimension (default:3328)
          -h    hidden_size dimension (default:4096)
          -e    experts (default:32)
          -k    topk (default:5)
     -stride    stride per row, if -1 then equal to hidden_size (default:-1)
          -v    cpu validation or not (default:1)
      -kname    print kernel name or not (default:1)
     -prec_i    input precision, fp16/bf16 (default:fp16)
     -prec_o    precision, int8/fp8 (default:int8)
     -warmup    cold iter (default:5)
     -repeat    hot iter (default:20)
       -json    0: No Json, 1: Dump Results in Json format (default:0)
   -jsonfile    json file name to dump results (default:moe_smoothquant.json)
