composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-03 13:48:30 +00:00

Author	SHA1	Message	Date
Max Podkorytov	bce6ec11cd	Optimize tensor descriptor functor template instantiation Replace inline lambdas with named functor structs in transform_tensor_descriptor to reduce template instantiation overhead and improve compile times. Changes: - Add three named functors in tensor_descriptor.hpp: - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges - Add utility functions in sequence_helper.hpp and tuple_helper.hpp: - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...> - Update 14 call sites across threadwise transfer, wrapper, and device files to use generate_identity_sequences() instead of generate_tuple with lambdas - Add comprehensive unit tests: - unit_sequence_helper.cpp: tests for new utility functions - unit_tensor_descriptor_functors.cpp: tests for new functors Co-Authored-By: Claude <noreply@anthropic.com>	2026-01-29 14:26:43 -07:00
Aviral Goel	de6466481f	chore(copyright): update copyright header for include directory (#3293)	2025-11-26 11:00:05 -07:00
Bartłomiej Kocot	42fc8eddd2	Fix warnings during wrapper docs generation (#1192) * Fix warnings during wrapper docs generation * Fixes	2024-03-08 17:13:03 -08:00
Bartłomiej Kocot	1e73adbc28	Add optimized blockwise gemm using ck wrapper (#1157) * Add optimized blockwise gemm using ck wrapper * Add basic gemm example * Update docs * Add tutorial for gemm using ck wrapper * Add perf note * edits * Fix cmake * Fixes --------- Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>	2024-02-13 17:04:36 +01:00
Bartłomiej Kocot	f3b6c23ac5	Add blockwise gemm to ck wrapper (#1139) * Add blockwise gemm to ck wrapper * Add blockwise gemm traits * Disable test_gemm for non xdl devices * Fixes * Add c layout descritpions	2024-01-31 21:24:40 +01:00
Bartłomiej Kocot	7e4eb4b800	Add optimized copy to ck wrapper (#1126) * Add optimized copy to ck wrapper * Example optimizations * Fixes * Move img2col test to client example * Refactor example * Fix docs * Fixes * Fix * Fixes * Fixes * Fixes * Fixes * Fixes --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2024-01-19 11:29:00 +01:00
Bartłomiej Kocot	4234b3a691	Add tensor partition and generic copy for ck wrapper (#1108) * Add tensor partition and generic copy for ck wrapper * Update changelog * Stylistic fixes * Change shape/strides logic to descriptor transforms * Fixes * Fix client example * Fix comments	2024-01-03 01:10:57 +01:00
Bartłomiej Kocot	07092d68f0	Add tensor structure to wrapper (#1098) * Add tensor structure to wrapper * update changelog * Fix names * Comment fixes	2023-12-15 12:45:08 +01:00

8 Commits
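
Note on commit bce6ec11cd: the message describes replacing inline lambdas with named functors and adding generate_identity_sequences(), which creates Tuple<Sequence<0>, Sequence<1>, ...>. The sketch below illustrates the general idea in standard C++ only. It is not the composable_kernel implementation: Sequence, generate_tuple, and make_identity_sequence here are simplified, hypothetical stand-ins built on std::tuple and std::integral_constant, and only the name generate_identity_sequences is taken from the commit message. The underlying point is that every lambda expression has its own unique closure type, so a generic helper is instantiated again for each call site that passes a lambda, while a named functor is one type shared by all call sites.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a compile-time index list (not CK's Sequence).
template <std::size_t... Is>
struct Sequence
{
};

// Named functor: maps the index constant I to Sequence<I>.
// It is a single type, so every caller shares the same instantiations.
struct make_identity_sequence
{
    template <std::size_t I>
    constexpr Sequence<I> operator()(std::integral_constant<std::size_t, I>) const
    {
        return Sequence<I>{};
    }
};

// Generic helper (assumed shape): calls f on 0..N-1 and packs the results.
template <typename F, std::size_t... Is>
constexpr auto generate_tuple_impl(F f, std::index_sequence<Is...>)
{
    return std::make_tuple(f(std::integral_constant<std::size_t, Is>{})...);
}

template <std::size_t N, typename F>
constexpr auto generate_tuple(F f)
{
    return generate_tuple_impl(f, std::make_index_sequence<N>{});
}

// Functor-based version: generate_tuple is instantiated once per N,
// because make_identity_sequence is the same type at every call site.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return generate_tuple<N>(make_identity_sequence{});
}

// Lambda-based version for comparison: the closure type is unique to this
// lambda expression, so each such call site gets its own generate_tuple.
template <std::size_t N>
auto generate_identity_sequences_lambda()
{
    return generate_tuple<N>([](auto i) { return Sequence<decltype(i)::value>{}; });
}

// Both versions produce the same tuple type.
static_assert(std::is_same<decltype(generate_identity_sequences<3>()),
                           std::tuple<Sequence<0>, Sequence<1>, Sequence<2>>>::value,
              "functor version yields Tuple<Sequence<0>, Sequence<1>, Sequence<2>>");
static_assert(std::is_same<decltype(generate_identity_sequences_lambda<3>()),
                           decltype(generate_identity_sequences<3>())>::value,
              "lambda version yields the same type");
```

The result types are identical; only the number of distinct template instantiations differs. How much compile time this saves in practice depends on the number of call sites and is not measured here.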