composable_kernel/include
Max Podkorytov 991274aaaf Optimize sequence_gen and uniform_sequence_gen using __make_integer_seq
Replace recursive template instantiation with the compiler intrinsic
__make_integer_seq plus pack expansion, giving O(1) instantiation depth.

Before: Maximum nesting depth of 90 levels with recursive divide-and-conquer
After: Maximum nesting depth of 26 levels using flat pack expansion

Performance improvements measured on example_grouped_conv_fwd_xdl_fp16:
- Template instantiation wall-clock time: 36.8s -> 18.7s (49% reduction)
- Template instantiation cumulative time: 56.6s -> 25.8s (54% reduction)
- Maximum nesting depth: 90 -> 26 (71% reduction)

The key changes:
- sequence_gen: Uses __make_integer_seq to generate indices 0..N-1,
  then applies functor F via pack expansion in a single step
- uniform_sequence_gen: Uses __make_integer_seq with pack expansion
  to generate N copies of a constant value
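
A minimal sketch of the two key changes above, not the code from this commit.
The names Sequence, index_pack, make_index_pack, and the *_impl helpers are
hypothetical stand-ins, and the functor is assumed to take a plain int (the
real functor interface may differ). The std::make_integer_sequence fallback is
an addition for compilers that lack __make_integer_seq.

```cpp
#include <type_traits>
#include <utility>

#ifndef __has_builtin
#define __has_builtin(x) 0 // compilers without __has_builtin take the fallback
#endif

namespace sketch {

// Hypothetical stand-in for the library's compile-time integer sequence type.
template <int... Is>
struct Sequence {};

// Receiver for the generated indices; this is the template shape that
// __make_integer_seq expects: template <typename T, T...>.
template <typename T, T... Is>
struct index_pack {};

#if __has_builtin(__make_integer_seq)
// One builtin instantiation yields index_pack<int, 0, 1, ..., N-1>.
template <int N>
using make_index_pack = __make_integer_seq<index_pack, int, N>;
#else
// Fallback (assumption, not in the commit): adapt std::make_integer_sequence.
template <typename Seq>
struct to_index_pack;
template <int... Is>
struct to_index_pack<std::integer_sequence<int, Is...>> {
    using type = index_pack<int, Is...>;
};
template <int N>
using make_index_pack =
    typename to_index_pack<std::make_integer_sequence<int, N>>::type;
#endif

// sequence_gen: apply functor F to every index in one pack expansion.
template <typename F, typename Pack>
struct sequence_gen_impl;
template <typename F, int... Is>
struct sequence_gen_impl<F, index_pack<int, Is...>> {
    using type = Sequence<F{}(Is)...>;
};
template <int N, typename F>
struct sequence_gen {
    using type = typename sequence_gen_impl<F, make_index_pack<N>>::type;
};

// uniform_sequence_gen: discard each index and emit Value instead.
template <int Value, typename Pack>
struct uniform_sequence_gen_impl;
template <int Value, int... Is>
struct uniform_sequence_gen_impl<Value, index_pack<int, Is...>> {
    using type = Sequence<(static_cast<void>(Is), Value)...>;
};
template <int N, int Value>
struct uniform_sequence_gen {
    using type = typename uniform_sequence_gen_impl<Value, make_index_pack<N>>::type;
};

// Example functor (hypothetical): maps index i to i * i.
struct Square {
    constexpr int operator()(int i) const { return i * i; }
};

static_assert(std::is_same<sequence_gen<4, Square>::type,
                           Sequence<0, 1, 4, 9>>::value,
              "functor applied to indices 0..3");
static_assert(std::is_same<uniform_sequence_gen<3, 7>::type,
                           Sequence<7, 7, 7>>::value,
              "three copies of the constant");

} // namespace sketch
```

In this sketch each generator instantiates a fixed number of templates
regardless of N, which is the property the flat pack expansion relies on.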

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 21:15:57 -06:00