ik_llama.cpp/ggml
Kawrakow 6970ef925f CUDA: small PP performance improvement for MoE models (#589)
* Trying to implement quantized fmoe - not working yet

* This works, but is slower than the non-working version

* quantize_mmq_q8_1_id

* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-07-07 07:23:12 +02:00