ik_llama.cpp/ggml
Kawrakow 478b56871f Faster long context TG on CUDA for GLM-4.5/4.6/4.7/AIR (part 2) (#1190)
* This works

* Make quantized KV cache work

* Remove the glm45 graph building changes

* Add condition
2026-01-26 07:21:47 +02:00