ik_llama.cpp/gguf-py/gguf
saood06 d58dee869a Deepseek MLA Optimizations V2 (#195)
* Avoid allocating MHA KV cache when MLA is turned on

* Added missing gguf-py file

* Added final optimizations

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

* Make sure we do have wk_b and wv_b before enabling MLA

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-02-09 09:36:54 +02:00
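Two bullets in the commit above describe a gating rule. MLA is only enabled when the model provides both wk_b and wv_b, and once MLA is on, the regular MHA KV cache is not allocated. The following is a minimal, hypothetical Python sketch of that decision logic and is not the code from this commit. The function plan_kv_cache, the KVCachePlan type, and the cache labels "mla_latent", "mha_k" and "mha_v" are illustrative assumptions. Only the tensor names wk_b and wv_b come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class KVCachePlan:
    # Hypothetical summary of which KV caches would be allocated.
    use_mla: bool
    caches: list = field(default_factory=list)


def plan_kv_cache(tensor_names: set, mla_requested: bool) -> KVCachePlan:
    """Sketch of the MLA gating rule described in the commit (names are assumptions)."""
    # Only enable MLA if the model actually ships the wk_b and wv_b tensors.
    has_split_tensors = {"wk_b", "wv_b"} <= tensor_names
    use_mla = mla_requested and has_split_tensors
    if use_mla:
        # With MLA on, skip the full MHA K/V cache and keep only the
        # (hypothetically named) compressed latent cache.
        return KVCachePlan(use_mla=True, caches=["mla_latent"])
    # Otherwise fall back to the standard multi-head attention K and V caches.
    return KVCachePlan(use_mla=False, caches=["mha_k", "mha_v"])


if __name__ == "__main__":
    print(plan_kv_cache({"wq", "wk_b", "wv_b"}, mla_requested=True))
    print(plan_kv_cache({"wq", "wk_b"}, mla_requested=True))
```

With both tensors present and MLA requested, the sketch allocates only the latent cache. If either tensor is missing, it silently falls back to MHA instead of failing, which mirrors the intent of "make sure we do have wk_b and wv_b before enabling MLA".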