8 Commits

Author SHA1 Message Date
Max Podkorytov
bce6ec11cd Optimize tensor descriptor functor template instantiation
Replace inline lambdas with named functor structs in transform_tensor_descriptor
to reduce template instantiation overhead and improve compile times.

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges

- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>

- Update 14 call sites across threadwise transfer, wrapper, and device files
  to use generate_identity_sequences() instead of generate_tuple with lambdas

- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors
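
For illustration only, the idea behind generate_identity_sequences() and the named-functor approach can be sketched as below. This is a minimal sketch, not the code from this commit. Sequence and Tuple here are hypothetical stand-ins built on the standard library, and the names detail::make_identity_sequences and to_singleton_sequence are invented for the example.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the library's Sequence and Tuple types
// (assumption: the real types live in the library's own headers).
template <std::size_t... Is>
struct Sequence
{
};

template <typename... Ts>
using Tuple = std::tuple<Ts...>;

namespace detail {
// Expands the index pack once, producing Tuple<Sequence<0>, Sequence<1>, ...>.
template <std::size_t... Is>
constexpr auto make_identity_sequences(std::index_sequence<Is...>)
{
    return Tuple<Sequence<Is>...>{};
}
} // namespace detail

// Sketch of the utility: one function template instantiation per N,
// instead of a fresh closure type at every call site that writes a lambda.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return detail::make_identity_sequences(std::make_index_sequence<N>{});
}

// A named functor (hypothetical name). Unlike a lambda, its type is the same
// everywhere it is used, so templates parameterized on it can be reused.
struct to_singleton_sequence
{
    template <std::size_t I>
    constexpr auto operator()(std::integral_constant<std::size_t, I>) const
    {
        return Sequence<I>{};
    }
};
```

Each lambda expression has its own unique closure type, so a template that takes a lambda is instantiated again for every call site. A named struct or a shared helper function gives the compiler a single type to reuse, which is the compile-time saving the commit message describes.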

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-29 14:26:43 -07:00
Aviral Goel
de6466481f chore(copyright): update copyright header for include directory (#3293) 2025-11-26 11:00:05 -07:00
Bartłomiej Kocot
42fc8eddd2 Fix warnings during wrapper docs generation (#1192)
* Fix warnings during wrapper docs generation

* Fixes
2024-03-08 17:13:03 -08:00
Bartłomiej Kocot
1e73adbc28 Add optimized blockwise gemm using ck wrapper (#1157)
* Add optimized blockwise gemm using ck wrapper

* Add basic gemm example

* Update docs

* Add tutorial for gemm using ck wrapper

* Add perf note

* edits

* Fix cmake

* Fixes

---------

Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
2024-02-13 17:04:36 +01:00
Bartłomiej Kocot
f3b6c23ac5 Add blockwise gemm to ck wrapper (#1139)
* Add blockwise gemm to ck wrapper

* Add blockwise gemm traits

* Disable test_gemm for non xdl devices

* Fixes

* Add c layout descriptions
2024-01-31 21:24:40 +01:00
Bartłomiej Kocot
7e4eb4b800 Add optimized copy to ck wrapper (#1126)
* Add optimized copy to ck wrapper

* Example optimizations

* Fixes

* Move img2col test to client example

* Refactor example

* Fix docs

* Fixes

* Fix

* Fixes

* Fixes

* Fixes

* Fixes

* Fixes

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2024-01-19 11:29:00 +01:00
Bartłomiej Kocot
4234b3a691 Add tensor partition and generic copy for ck wrapper (#1108)
* Add tensor partition and generic copy for ck wrapper

* Update changelog

* Stylistic fixes

* Change shape/strides logic to descriptor transforms

* Fixes

* Fix client example

* Fix comments
2024-01-03 01:10:57 +01:00
Bartłomiej Kocot
07092d68f0 Add tensor structure to wrapper (#1098)
* Add tensor structure to wrapper

* update changelog

* Fix names

* Comment fixes
2023-12-15 12:45:08 +01:00