Files
composable_kernel/include/ck/tensor_operation/gpu/block
Anthony Chang 76764d8c92 Manual control of MAC cluster for improved interwave performance (#184)
* manual control of MAC cluster for improved 2-wave performance

ensure setprio's order; ensure inner loop size >= local read size

synchronize when single mac cluster

* format

* use value field from ck::integral_constant

* roll out inter-wave loop scheduler to c-shuffle gemm variants

will gradually roll out to other applicable device ops when occasional reg spill is resolved

* additional comments

* format

* fix mismatch between inter-wave pipeline and interwave blockwise gemm

* address review feedback

* amend
2022-05-10 19:19:22 -05:00
..