Replace specific benchmark numbers with qualitative descriptions since
measurements vary across environments and may become outdated.
Co-authored-by: Claude <noreply@anthropic.com>
[ROCm/composable_kernel commit: 3f04d27b68]
This document describes techniques for reducing C++ template instantiation
overhead in the Composable Kernel codebase, including:
- Replacing recursive templates with pack expansion (O(N) → O(1) depth)
- Using named functors instead of lambdas to share instantiations
- Replacing template recursion with constexpr loops
- Using fold expressions for accumulation operations
These techniques can significantly reduce build times for template-heavy code.
[ROCm/composable_kernel commit: b66597ed96]