Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-04 21:51:28 +00:00)
Add contraction profiler and tests (#701)
* Add contraction profiler and tests
* Build and style fixes
* Allow to use any elementwise operator for ref_contraction
* Introduce profile_contraction_scale and profile_contraction_bilinear
* Make ref_contraction generic and extend interface tests
* Stylistic minor fixes
* Extend test_contraction_interface
@@ -46,3 +46,33 @@ out_n_k_ho_wo: dim 4, lengths {128, 256, 36, 36}, strides {331776, 1, 9216, 256}
....
Best Perf: 1.42509 ms, 102.988 TFlops, 234.086 GB/s
```

## Profile contraction kernels

```bash
#arg1: tensor operation (contraction_bilinear=CONTRACTION+Bilinear)
#arg2: data type (0: fp32; 1: fp64)
#arg3: matrix layout (0: A[m0, m1, k0, k1] * B[k0, k1, n0, n1] + D[m0, m1, n0, n1] = E[m0, m1, n0, n1];
#                     1: A[m0, m1, k0, k1] * B[n0, n1, k0, k1] + D[m0, m1, n0, n1] = E[m0, m1, n0, n1];
#                     2: A[k0, k1, m0, m1] * B[k0, k1, n0, n1] + D[m0, m1, n0, n1] = E[m0, m1, n0, n1];
#                     3: A[k0, k1, m0, m1] * B[n0, n1, k0, k1] + D[m0, m1, n0, n1] = E[m0, m1, n0, n1])
#arg4: verification (0: no; 1: yes)
#arg5: initialization (0: no init; 1: integer value; 2: decimal value)
#arg6: print tensor value (0: no; 1: yes)
#arg7: time kernel (0: no, 1: yes)
#arg8 and arg9: alpha and beta
#arg10 to 15: M0, M1, N0, N1, K0, K1
#arg16 to 31: Strides for A, B, D and E (skip for default)

################ op datatype layout verify init log time alpha beta M0 M1 N0 N1 K0 K1
./bin/ckProfiler contraction_bilinear 0 1 0 0 0 1 1.0 1.0 128 128 128 128 128 128
```
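
The positional arguments above are easy to misorder, so a small wrapper can help when scripting sweeps. The following is a minimal sketch, not part of the profiler: the helper name `contraction_profiler_argv` and its keyword names are hypothetical, and only the argument order and the example values are taken from the listing above.

```python
def contraction_profiler_argv(
    op="contraction_bilinear",  # arg1: tensor operation
    data_type=0,                # arg2: 0 = fp32, 1 = fp64
    layout=1,                   # arg3: matrix layout 0..3
    verify=0,                   # arg4: verification
    init=0,                     # arg5: initialization method
    log=0,                      # arg6: print tensor values
    time_kernel=1,              # arg7: time the kernel
    alpha=1.0,                  # arg8
    beta=1.0,                   # arg9
    lengths=(128, 128, 128, 128, 128, 128),  # arg10-15: M0, M1, N0, N1, K0, K1
    strides=(),                 # arg16-31: strides for A, B, D, E (empty = default)
    binary="./bin/ckProfiler",  # path used in the example above
):
    """Hypothetical helper: build the argument vector in the documented order."""
    if len(lengths) != 6:
        raise ValueError("expected 6 lengths: M0, M1, N0, N1, K0, K1")
    if strides and len(strides) != 16:
        raise ValueError("expected 16 strides (4 each for A, B, D, E) or none")
    args = [binary, op, data_type, layout, verify, init, log, time_kernel,
            alpha, beta, *lengths, *strides]
    return [str(a) for a in args]


cmd = " ".join(contraction_profiler_argv())
print(cmd)
```

With the defaults, the printed command is the same as the example invocation above. The list form can be passed directly to `subprocess.run` if the profiler binary has been built.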

Result (MI100)
```bash
a_m_k: dim 4, lengths {128, 128, 128, 128}, strides {2097152, 16384, 128, 1}
b_k_n: dim 4, lengths {128, 128, 128, 128}, strides {128, 1, 2097152, 16384}
d_m_n: dim 4, lengths {128, 128, 128, 128}, strides {2097152, 16384, 128, 1}
e_m_n: dim 4, lengths {128, 128, 128, 128}, strides {2097152, 16384, 128, 1}
....
Best Perf: 211.405 ms, 41.6077 TFlops, 15.2372 GB/s
```
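
The reported strides and throughput can be sanity-checked by hand. The sketch below rests on assumptions that are not stated in the output itself: default strides are packed row-major in each tensor's storage order, the flop count is `2 * M * N * K` with `M = M0 * M1` and so on (the bilinear elementwise step is ignored), and the byte count covers only A, B and E at 4 bytes per fp32 element. These assumptions reproduce the printed figures to about four significant digits, but they are inferred from the numbers, not taken from the profiler source.

```python
def packed_strides(lengths):
    """Row-major strides: the last dimension is contiguous."""
    strides = [1] * len(lengths)
    for i in range(len(lengths) - 2, -1, -1):
        strides[i] = strides[i + 1] * lengths[i + 1]
    return strides


M0 = M1 = N0 = N1 = K0 = K1 = 128
time_ms = 211.405        # "Best Perf" time from the result above
bytes_per_element = 4    # data type 0 = fp32

# Layout 1 stores A as [m0, m1, k0, k1].
a_strides = packed_strides([M0, M1, K0, K1])

# Layout 1 stores B as [n0, n1, k0, k1]; the log lists it in (k0, k1, n0, n1) order.
b_stored = packed_strides([N0, N1, K0, K1])
b_strides = b_stored[2:] + b_stored[:2]

# Flatten the 4-D contraction into an equivalent GEMM size.
M, N, K = M0 * M1, N0 * N1, K0 * K1
flop = 2 * M * N * K                                       # assumption: multiply-add only
num_bytes = bytes_per_element * (M * K + K * N + M * N)    # assumption: A, B, E only

tflops = flop / 1e9 / time_ms
gb_per_s = num_bytes / 1e6 / time_ms

print(a_strides, b_strides)
print(f"{tflops:.3f} TFlops, {gb_per_s:.3f} GB/s")
```

The low GB/s figure relative to TFlops is expected for this shape: each of the three counted tensors has 2^28 elements, while the arithmetic grows with the product of all six lengths.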