ik_llama.cpp/ggml/src
Kawrakow ce2b0292e1 CUDA: faster FA TG for GQA models (#370)
* cuda: WIP MMA FA

* Use MMA for TG also when quantized

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-04 09:17:44 +03:00