Files
composable_kernel/include
Max Podkorytov bbf5c5e926 Replace generate_tuple lambda with pack expansion in InitializeElementSize
The InitializeElementSize function used generate_tuple with a lambda to
compute visible dimension lengths. Each TensorDescriptor type created
a unique lambda type, causing 78 instantiations (385ms).

Replace with direct pack expansion using helper functions, eliminating
the lambda instantiation overhead entirely.

Results on example_grouped_conv_fwd_xdl_fp16:
- generate_tuple lambdas: 178 -> 100 (44% reduction)
- Template instantiation time: 19.5s -> 19.0s
2026-01-16 21:46:50 -06:00
..