ik_llama.cpp

Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-05-11 00:20:19 +00:00)

Files

Kawrakow 478b56871f Faster long context TG on CUDA for GLM-4.5/4.6/4.7/AIR (part 2) (#1190)

* This works

* Make quantized KV cache work

* Remove the glm45 graph building changes

* Add condition

2026-01-26 07:21:47 +02:00

.gitignore

2024-07-27 07:55:01 +02:00

CMakeLists.txt

2026-01-22 13:20:23 +02:00