ik_llama.cpp/ggml
Iwan Kawrakow 0fee6c54d9 Much better
The issue was that I did not change the number of warps
used for 3D matrix multiplications (wk_b * kv_cache, MoE),
so we ended up using 4 warps for token generation (TG).
By going to 1 warp in these cases, we get a significant
boost in TG performance (tested with DeepSeek-Lite).
2025-05-06 12:59:05 +03:00
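The commit does not include the kernel code, so the following is only an illustrative sketch of the selection logic it describes. All names are hypothetical and are not the ggml CUDA API.

The sketch models a mat-vec style kernel launched with `nwarps * 32` threads per block. The buggy rule applies the 1-warp TG setting only to plain 2D matmuls, so 3D cases (wk_b * kv_cache, MoE) fall through to 4 warps. The fixed rule ignores the third dimension. The value of 4 warps for batches larger than one token is an assumption taken from the "4 warps" the commit mentions.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def select_nwarps_buggy(ncols: int, nmat: int) -> int:
    """Hypothetical pre-fix rule.

    ncols: number of src1 columns (tokens); ncols == 1 means TG.
    nmat:  matrices along the third dimension (1 = plain 2D matmul,
           > 1 = 3D case such as wk_b * kv_cache or MoE experts).
    The 1-warp TG setting is only applied when nmat == 1, so 3D
    multiplications still launch 4 warps during TG.
    """
    if nmat == 1 and ncols == 1:
        return 1
    return 4  # assumed default for the non-TG case


def select_nwarps(ncols: int, nmat: int = 1) -> int:
    """Hypothetical post-fix rule: TG uses 1 warp regardless of nmat."""
    return 1 if ncols == 1 else 4


def threads_per_block(ncols: int, nmat: int = 1) -> int:
    """Block size implied by the (fixed) warp-count rule."""
    return select_nwarps(ncols, nmat) * WARP_SIZE


if __name__ == "__main__":
    for ncols, nmat in [(1, 1), (1, 16), (8, 16)]:
        print(f"ncols={ncols} nmat={nmat}: "
              f"buggy={select_nwarps_buggy(ncols, nmat)} warps, "
              f"fixed={select_nwarps(ncols, nmat)} warps, "
              f"{threads_per_block(ncols, nmat)} threads/block")
```

For a single-token 3D multiplication (`ncols=1, nmat=16`), the buggy rule returns 4 warps and the fixed rule returns 1. Plain 2D TG is unaffected by the fix.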