Files
composable_kernel/include/ck/tensor_operation/gpu/device
Enrico Degregori 3d29bff2f0 Wmma support for multiple ABD GEMM (#2803)
* multi_abd wmma support:

 - Add multiple A and B support to multiple D implementation (gridwise level)
 - Add multi_abd GEMM (device level)
 - Add instances (xdl parity)
 - Add tests (both xdl and wmma)
 - Add examples
 - Add ckProfiler support (both xdl and wmma)

* Fix bug in device print function

* Fix unused template parameter

* Fix batched gemm for multiABD gridwise implementation

* Fix gemm_universal_reduce with multiABDs gridwise implementation

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-09-22 18:49:06 -07:00
..
2024-03-08 17:11:51 -08:00
2025-03-10 11:16:44 +08:00
2023-08-15 02:25:28 +08:00
2024-06-25 16:37:35 -05:00