
smoothquant

This folder contains an example of smoothquant implemented with the ck_tile tile-programming model.
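This README does not spell out the math, so the following is only a rough sketch of one common smoothquant-style formulation, not the kernel's actual code: each column of the input is multiplied by a per-column smoothing scale, then each row is quantized to the int8 range with its own dynamic scale. The function name `smoothquant_reference`, the symmetric range of 127, and the zero-row handling are all assumptions made for illustration.

```python
def smoothquant_reference(x, smooth_scale):
    """Hypothetical reference, not taken from the ck_tile sources.

    x            : list of m rows, each a list of n floats
    smooth_scale : list of n per-column smoothing factors (assumed input)
    Returns (row_scales, q), where q holds integers in [-127, 127].
    """
    row_scales, q = [], []
    for row in x:
        # Step 1: apply the per-column smoothing scale.
        y = [v * s for v, s in zip(row, smooth_scale)]
        # Step 2: per-row dynamic scale from the largest magnitude.
        amax = max(abs(v) for v in y)
        scale = amax / 127.0 if amax > 0 else 1.0  # assumed guard for all-zero rows
        row_scales.append(scale)
        # Step 3: round and clamp to the symmetric int8 range.
        q.append([max(-127, min(127, round(v / scale))) for v in y])
    return row_scales, q


x = [[1.0, -2.0, 0.5], [0.0, 4.0, -8.0]]
smooth_scale = [1.0, 0.5, 2.0]
scales, q = smoothquant_reference(x, smooth_scale)
print(q)
```

A host-side check of this kind is presumably what the `-v` option compares the GPU result against, though the exact reference used by the example may differ.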

build

# in the root of the composable_kernel repository
mkdir build && cd build
sh ../script/cmake-ck-dev.sh  ../ <arch>  # replace <arch> with your GPU target, e.g. gfx90a, gfx942...
make tile_smoothquant -j`nproc`

This produces the executable build/bin/tile_smoothquant.

cmdline

args:
          -m    m dimension (default:3328)
          -n    n dimension (default:4096)
   -x_stride    input stride per row, if -1 then equal to n (default:-1)
   -y_stride    output stride per row, if -1 then equal to n (default:-1)
          -v    cpu validation or not (default:1)
      -kname    print kernel name or not (default:1)
       -prec    precision (default:fp16)
     -warmup    cold iter (default:5)
     -repeat    hot iter (default:20)
       -json    0: No Json, 1: Dump Results in Json format (default:0)
   -jsonfile    json file name to dump results (default:smoothquant.json)
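
The -x_stride and -y_stride options describe the distance between the starts of consecutive rows, with -1 meaning the rows are packed (stride equal to n). A small sketch of that addressing rule, assuming the stride is counted in elements and must be at least n; the helper names `resolve_stride` and `flat_index` are hypothetical and not part of ck_tile:

```python
def resolve_stride(stride: int, n: int) -> int:
    """Return the effective row stride; -1 means packed rows (stride == n)."""
    if stride == -1:
        return n
    if stride < n:
        # Assumption: a stride smaller than n would make rows overlap.
        raise ValueError("row stride must be at least n")
    return stride


def flat_index(row: int, col: int, stride: int) -> int:
    """Offset (in elements) of element (row, col) in a row-major buffer."""
    return row * stride + col


n = 8  # small n for illustration; the example's default is 4096
packed = resolve_stride(-1, n)   # behaves like stride == n
padded = resolve_stride(12, n)   # 4 unused elements after each row

print(flat_index(2, 3, packed))  # 2 * 8 + 3 = 19
print(flat_index(2, 3, padded))  # 2 * 12 + 3 = 27
```

With a padded stride, the buffer needs room for roughly m * stride elements rather than m * n, which is why the input and output strides are given separately.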