ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-23 07:59:25 +00:00

Files

Iwan Kawrakow 31c85a8949 FlashMLA(CUDA) - allow q8_0 for KV cache

This works, and prompt processing (PP) is not bad, but token generation (TG) is still quite a bit slower.

2025-03-11 16:53:05 +02:00

.gitignore      2024-07-27 07:55:01 +02:00
CMakeLists.txt  2025-02-09 18:59:33 +02:00