composable_kernel/include
Sami Remes 180a436cca WIP: Some numerical issues still, maybe from tail?
This change should allow decoupling the MFMA from the FMA scaling by using
two c_thread_buf_per_scale buffers, together with look-ahead fetching of
the a/b thread bufs.

Performance is still about the same as without double buffering.
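
The double-buffering idea in the commit message can be illustrated with a small CPU-side sketch. This is not the kernel code from this commit: `raw_block_dot`, `scaled_dot_double_buffered`, and `scaled_dot_reference` are hypothetical names, a plain loop stands in for the MFMA, and a scalar multiply-add stands in for the FMA scaling. It only shows the ping-pong pattern, assuming one scale per K-block: the unscaled partial for block `i` is written to one buffer while the partial for block `i - 1` is scaled and accumulated from the other, with an explicit prologue and tail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the MFMA step: unscaled partial dot product
// over one K-block [k0, k0 + block).
float raw_block_dot(const std::vector<float>& a, const std::vector<float>& b,
                    std::size_t k0, std::size_t block)
{
    float acc = 0.0f;
    for (std::size_t k = k0; k < k0 + block; ++k)
        acc += a[k] * b[k];
    return acc;
}

// Ping-pong version: block i is accumulated into partial[i % 2] while the
// partial of block i - 1 is scaled and folded into the result.
float scaled_dot_double_buffered(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& scale,
                                 std::size_t block)
{
    assert(a.size() == b.size());
    assert(a.size() == scale.size() * block);

    const std::size_t num_blocks = scale.size();
    if (num_blocks == 0)
        return 0.0f;

    float partial[2] = {0.0f, 0.0f}; // two per-scale accumulators
    float result     = 0.0f;

    // Prologue: fill the first buffer before the main loop starts.
    partial[0] = raw_block_dot(a, b, 0, block);

    for (std::size_t i = 1; i < num_blocks; ++i)
    {
        // "MFMA" for the current block goes into one buffer ...
        partial[i % 2] = raw_block_dot(a, b, i * block, block);
        // ... while the "FMA scaling" drains the other buffer.
        result += scale[i - 1] * partial[(i - 1) % 2];
    }

    // Tail: the last buffer still has to be scaled and accumulated.
    result += scale[num_blocks - 1] * partial[(num_blocks - 1) % 2];
    return result;
}

// Single-buffer reference: scale each block right after computing it.
float scaled_dot_reference(const std::vector<float>& a,
                           const std::vector<float>& b,
                           const std::vector<float>& scale,
                           std::size_t block)
{
    float result = 0.0f;
    for (std::size_t i = 0; i < scale.size(); ++i)
        result += scale[i] * raw_block_dot(a, b, i * block, block);
    return result;
}
```

Both functions apply the scaled accumulations in the same block order, so they should agree exactly; for a = {1, 2, 3, 4}, b = {1, 1, 1, 1}, scale = {2, 0.5}, and block = 2, both return 9.5. Dropping the tail step in this sketch silently loses the last block's contribution, which is one way a double-buffered loop can end up numerically wrong.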
2025-09-11 12:22:51 +00:00
..
2025-09-02 14:14:10 +03:00