mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-01 20:21:23 +00:00
docs: add quant mode comparison to readme (#3032)
* docs: add quant mode comparison to readme * Update example/ck_tile/38_block_scale_gemm/README.md Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com> --------- Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
This commit is contained in:
@@ -4,9 +4,18 @@ This folder contains examples of quant GEMMs using the ck_tile tile-programming
|
||||
|
||||
- AQuant kernel with blocks of A matrix sharing scales: custom GEMM pipeline
|
||||
- BQuant kernel with blocks of B matrix sharing scales: custom GEMM pipeline
|
||||
- Row and Column-wise scaled: All of the rowwise elements in A Matrix and columwise elements in B Matrix will share the same quantization element and the elementwisde operation will complete in epilogue.
|
||||
- Row and Column-wise scaled: All of the row-wise elements in A Matrix and column-wise elements in B Matrix will share the same quantization element and the element-wise operation will complete in epilogue.
|
||||
- Tensor-wise scaled: Share the same scalar scale across the whole tensor of A or B
|
||||
|
||||
## Quantization Mode Comparison
|
||||
|
||||
| Quant Mode | A Matrix Organization | A Scale Shape | B Matrix Organization | B Scale Shape |
|
||||
|------------|----------------------|---------------|----------------------|---------------|
|
||||
| **AQuant** | Blocks along K dimension<br/>Each M×GroupSize block shares one scale | `[M, K/GroupSize]` | Not quantized | N/A |
|
||||
| **BQuant** | Not quantized | N/A | Blocks along K dimension<br/>Each GroupSize×N block shares one scale | `[K/GroupSize, N]` |
|
||||
| **RowColQuant** | Per-row quantization<br/>All K elements in each row share one scale | `[M, 1]` | Per-column quantization<br/>All K elements in each column share one scale | `[1, N]` |
|
||||
| **TensorQuant** | Tensor-wise quantization<br/>All M×K elements share one scale | `[1]` | Tensor-wise quantization<br/>All K×N elements share one scale | `[1]` |
|
||||
|
||||
---
|
||||
|
||||
## Features
|
||||
|
||||
Reference in New Issue
Block a user