composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-30 03:37:38 +00:00

Files

Max Podkorytov bce6ec11cd Optimize tensor descriptor functor template instantiation

Replace inline lambdas with named functor structs in transform_tensor_descriptor
to reduce template instantiation overhead and improve compile times.
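
A minimal sketch of why named functors can cut instantiation work, assuming C++17. Every lambda expression has its own closure type, so a helper template instantiated with a lambda is instantiated again at each call site, even when the lambda bodies are identical. A named functor struct is a single type, so every call site shares one instantiation. The `sketch::generate_tuple` helper and the `times_two` functor below are hypothetical stand-ins written for this illustration; they are not the composable_kernel implementations.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical helper: calls f(integral_constant<I>) for I in [0, N)
// and collects the results into a std::tuple.
template <typename F, std::size_t... Is>
constexpr auto generate_tuple_impl(F&& f, std::index_sequence<Is...>)
{
    return std::make_tuple(f(std::integral_constant<std::size_t, Is>{})...);
}

template <std::size_t N, typename F>
constexpr auto generate_tuple(F&& f)
{
    return generate_tuple_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}

// Named functor: one type, so generate_tuple<N, times_two> is
// instantiated once no matter how many call sites use it.
struct times_two
{
    template <std::size_t I>
    constexpr std::size_t operator()(std::integral_constant<std::size_t, I>) const
    {
        return 2 * I;
    }
};

// Two lambdas with identical bodies still have distinct closure types,
// so each one forces a separate instantiation of generate_tuple.
inline constexpr auto lambda_a = [](auto i) { return 2 * i.value; };
inline constexpr auto lambda_b = [](auto i) { return 2 * i.value; };

static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>,
              "each lambda expression yields a unique type");

} // namespace sketch
```

Calling `sketch::generate_tuple<3>(sketch::times_two{})` from many translation-unit locations reuses the same specialization, whereas writing the lambda inline at each location would not.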

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges

- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>

- Update 14 call sites across threadwise transfer, wrapper, and device files
  to use generate_identity_sequences() instead of generate_tuple with lambdas

- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors
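
The two utility functions listed above can be sketched as follows, assuming C++17. This is an illustration only: it uses `std::integer_sequence` and `std::tuple` as stand-ins for the library's own `Sequence` and `Tuple` types, and the signatures are assumptions, since the commit message does not show the actual declarations in sequence_helper.hpp or tuple_helper.hpp.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Stand-ins for the library's Sequence and Tuple types (assumption).
template <int... Is>
using Sequence = std::integer_sequence<int, Is...>;

template <typename... Ts>
using Tuple = std::tuple<Ts...>;

template <std::size_t... Is>
constexpr auto generate_identity_sequences_impl(std::index_sequence<Is...>)
{
    return Tuple<Sequence<static_cast<int>(Is)>...>{};
}

// Produces Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without a lambda.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return generate_identity_sequences_impl(std::make_index_sequence<N>{});
}

// Concatenates any number of sequences into one.
constexpr auto merge_sequences() { return Sequence<>{}; }

template <int... As>
constexpr auto merge_sequences(Sequence<As...>)
{
    return Sequence<As...>{};
}

template <int... As, int... Bs, typename... Rest>
constexpr auto merge_sequences(Sequence<As...>, Sequence<Bs...>, Rest... rest)
{
    return merge_sequences(Sequence<As..., Bs...>{}, rest...);
}

// Unpacks a tuple of sequences and merges them into a single sequence.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(Tuple<Seqs...>)
{
    return merge_sequences(Seqs{}...);
}

static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             Tuple<Sequence<0>, Sequence<1>, Sequence<2>>>,
              "identity sequences for N = 3");

} // namespace sketch
```

Because both helpers are ordinary function templates, a call such as `sketch::generate_identity_sequences<4>()` names the same specialization everywhere it appears, which is the property the call-site update relies on.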

Co-Authored-By: Claude <noreply@anthropic.com>

2026-01-29 14:26:43 -07:00

ck_tile

[CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm (#3629)

2026-01-28 19:45:09 -08:00

rapidjson

Update pre-commit to fixed versions, run remod for ck_tile (#2895)

2025-10-16 15:29:17 -07:00