mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
added gemm universal example in readme (#2216)
[ROCm/composable_kernel commit: 990d645578]
@@ -1,5 +1,29 @@
[Back to the main page](../README.md)
# Composable Kernel profiler
## Profile GEMM Universal kernels
```bash
# arg1: tensor operation (gemm_universal: Universal GEMM)
# arg2: data type (0: fp32; 1: fp16; 2: bf16; 3: int8; 4: f8@f16; 5: f16@f8; 6: f16->f8; 7: f8->bf16, comp f8; 8: f16@i4; 9: bf16@i4)
# arg3: matrix layout (0: A[m, k] * B[k, n] = C[m, n];
#                      1: A[m, k] * B[n, k] = C[m, n];
#                      2: A[k, m] * B[k, n] = C[m, n];
#                      3: A[k, m] * B[n, k] = C[m, n])
# arg4: verification (0: no; 1: yes)
# arg5: initialization (0: no init; 1: integer value; 2: decimal value)
# arg6: print tensor value (0: no; 1: yes)
# arg7: time kernel (0: no; 1: yes)
# arg8 to 13: M, N, K, StrideA, StrideB, StrideC
# arg14: split K into multiple batches
# optional:
# arg15: number of warm-up cycles (default 1)
# arg16: number of iterations (default 10)
# arg17: memory for rotating buffer (default 0, size in MB)

################ op datatype layout verify init print time M N K StrideA StrideB StrideC SplitK WarmupCycles Iterations MemoryBuffer
./bin/ckProfiler gemm_universal 1 0 1 1 0 1 4096 4096 4096 4096 4096 4096 1 1 10 0
```
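
Because the profiler takes its arguments positionally, a long invocation is easy to get wrong. The sketch below is one possible way to keep the arguments readable: it assembles the same command line as the example above from named shell variables and only prints it (a dry run). The variable names are illustrative assumptions and are not part of ckProfiler.

```bash
# Dry-run sketch: build the ckProfiler command line from named variables.
# All variable names here are hypothetical helpers, not ckProfiler options.
PROFILER=./bin/ckProfiler   # binary path as used in the example above
OP=gemm_universal           # arg1: tensor operation
DATATYPE=1                  # arg2: 1 = fp16
LAYOUT=0                    # arg3: A[m, k] * B[k, n] = C[m, n]
VERIFY=1                    # arg4: verification on
INIT=1                      # arg5: integer initialization
PRINT=0                     # arg6: do not print tensor values
TIME=1                      # arg7: time the kernel
M=4096; N=4096; K=4096      # arg8 to 10: problem size
STRIDE_A=4096               # arg11
STRIDE_B=4096               # arg12
STRIDE_C=4096               # arg13
SPLIT_K=1                   # arg14
WARMUP=1                    # arg15 (optional)
ITERS=10                    # arg16 (optional)
ROTATING_MB=0               # arg17 (optional)

# Join everything in positional order.
cmd="$PROFILER $OP $DATATYPE $LAYOUT $VERIFY $INIT $PRINT $TIME"
cmd="$cmd $M $N $K $STRIDE_A $STRIDE_B $STRIDE_C"
cmd="$cmd $SPLIT_K $WARMUP $ITERS $ROTATING_MB"

# Print instead of executing, so the line can be inspected first.
echo "$cmd"
```

To run the profiler for real, the printed line can be pasted into the shell once it looks right.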
## Profile GEMM kernels
```bash
# arg1: tensor operation (gemm: GEMM)