mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-30 11:47:48 +00:00
Provide static_for_indexed as an alternative to static_for. Instead of
a lambda, it takes a struct whose operator() is templated on the loop
index. This reduces instantiation bloat: a named struct is a single
type that can be reused across call sites, whereas every lambda
expression has its own unique closure type, even when the bodies are
identical.
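
The type-identity point can be checked in isolation. The snippet below
is a minimal sketch and is not taken from the library: NamedBody,
site_one, site_two, lam_one and lam_two are hypothetical names used
only for illustration.

```cpp
#include <type_traits>

// Hypothetical named functor, standing in for a loop body struct.
struct NamedBody
{
    void operator()() const {}
};

// Two "call sites" that each create the named functor.
auto site_one() { return NamedBody{}; }
auto site_two() { return NamedBody{}; }

// Two "call sites" that each create a textually identical lambda.
auto lam_one() { return [] {}; }
auto lam_two() { return [] {}; }

// The named struct is one type regardless of where it is created...
constexpr bool named_types_match =
    std::is_same_v<decltype(site_one()), decltype(site_two())>;

// ...while each lambda expression yields a distinct closure type.
constexpr bool lambda_types_match =
    std::is_same_v<decltype(lam_one()), decltype(lam_two())>;

static_assert(named_types_match);
static_assert(!lambda_types_match);
```

Any template parameterized on the functor type is therefore
instantiated once for NamedBody but once per lambda expression.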
Usage:

    struct MyLoop {
        Array& arr;
        template <index_t I>
        void operator()() const { arr[I] = I * 2; }
    };
    static_for_indexed<0, 8, 1>{}(MyLoop{arr});
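
For readers who want to see how such a helper could dispatch to the
templated operator(), here is a self-contained sketch. It is an
assumption-laden illustration, not the implementation added by this
change: index_t is aliased to int, Array is std::array, only positive
steps are handled, and the functor is marked constexpr so the result
can be checked at compile time.

```cpp
#include <array>
#include <utility>

using index_t = int; // assumption: stand-in for the library's index_t

// Sketch: calls f.operator()<I>() for I = Begin, Begin + Step, ... while I < End.
template <index_t Begin, index_t End, index_t Step>
struct static_for_indexed
{
    static_assert(Step > 0, "this sketch only handles positive steps");
    static_assert(End >= Begin, "this sketch requires End >= Begin");

    template <typename F>
    constexpr void operator()(F&& f) const
    {
        // Number of indices visited, rounding up for partial strides.
        constexpr index_t count = (End - Begin + Step - 1) / Step;
        call(f, std::make_integer_sequence<index_t, count>{});
    }

private:
    template <typename F, index_t... Is>
    static constexpr void call(F& f, std::integer_sequence<index_t, Is...>)
    {
        // Fold over the comma operator: one explicit-template call per index.
        (f.template operator()<Begin + Is * Step>(), ...);
    }
};

using Array = std::array<index_t, 8>; // assumption: stand-in container

struct MyLoop
{
    Array& arr;
    template <index_t I>
    constexpr void operator()() const { arr[I] = I * 2; }
};

// Fills an array with arr[I] = I * 2 for I in [0, 8).
constexpr Array fill_doubled()
{
    Array arr{};
    static_for_indexed<0, 8, 1>{}(MyLoop{arr});
    return arr;
}

static_assert(fill_doubled()[3] == 6);
```

Because the helper is parameterized on the functor type F, every call
site that passes MyLoop with the same bounds shares one instantiation.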
Note: This is opt-in and intended for high-impact code paths. The
example_grouped_conv target uses static_for sparingly (26
instantiations), so it gains little; larger targets like
blockwise_gemm_pipeline (90+ usages per file) would benefit more.