Files
composable_kernel/include
Max Podkorytov f65e9e4c99 Replace nested static_for lambdas with compile-time search helper
The GetTransformAndItsUpperDimension function used nested static_for
loops with lambdas to search for a hidden dimension in UpperDimensionIdss.
This caused 918 applier::operator() instantiations (81% of all applier
instantiations).

Replace with find_in_tuple_of_sequences helper that uses constexpr
array lookup and if-constexpr recursion, eliminating the lambda
instantiation overhead.

Results on example_grouped_conv_fwd_xdl_fp16:
- applier instantiations: 1132 -> 127 (89% reduction)
- TensorDescriptor instantiations: 2503 -> 664 (73% reduction)
- Template instantiation time: 23.4s -> 19.4s (17% reduction)
2026-01-22 01:11:15 +00:00
..