Files
ik_llama.cpp/ggml/src
Iwan Kawrakow df066ced5e Seems to be working on CUDA
For a dense model we get a 2-3% speedup for PP (prompt processing) and ~0.6% for TG (token generation).
2025-08-30 12:09:54 +03:00