ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-07 04:20:03 +00:00

Files

Iwan Kawrakow 8a83e1f083 cuda: re-add q8_0 -> q8_0 transpose

so mla = 2 can be used with CUDA graphs and q8_0 cache.

2025-08-14 19:33:58 +03:00

.gitignore        2024-07-27 07:55:01 +02:00

CMakeLists.txt    2025-08-07 17:26:21 +03:00