mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-30 11:47:48 +00:00
Lambdas create unique types per call site, causing duplicate template instantiations. Named functors are shared across call sites.

Results:
- container_concat: 186 → 93 instantiations (50% reduction)
- Wall-clock: 518ms → 309ms (40% reduction)
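
The effect described above can be reproduced in a small C++17 sketch. The names `apply_twice` and `AddOne` are hypothetical and do not come from composable_kernel; `apply_twice` merely stands in for any function template parameterized on a callable type. Two textually identical lambdas have distinct closure types, so each one forces its own instantiation, whereas a named functor is a single type that every call site reuses.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a function template parameterized on a callable.
// The compiler emits one instantiation per distinct F.
template <typename F>
constexpr int apply_twice(F f, int x) { return f(f(x)); }

// Named functor: a single type shared by every call site.
struct AddOne {
    constexpr int operator()(int v) const { return v + 1; }
};

// Two textually identical lambdas still have two distinct closure types.
const auto lambda_a = [](int v) { return v + 1; };
const auto lambda_b = [](int v) { return v + 1; };
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>,
              "each lambda expression has its own closure type");

// Lambda call sites: apply_twice is instantiated twice, once per closure type.
inline int site_a_lambda(int x) { return apply_twice([](int v) { return v + 1; }, x); }
inline int site_b_lambda(int x) { return apply_twice([](int v) { return v + 1; }, x); }

// Functor call sites: both reuse the single apply_twice<AddOne> instantiation.
inline int site_a_functor(int x) { return apply_twice(AddOne{}, x); }
inline int site_b_functor(int x) { return apply_twice(AddOne{}, x); }
```

Behavior is the same in both variants; only the number of instantiations the compiler has to produce differs.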