ik_llama.cpp/src
Nexes the Elder 39e712620d Streamline a bit the quant strategies (#443)
* Streamline a bit the quant strategies

No change to the existing patterns, except for a bump of attn_k and attn_v for models with 4 and 6 experts (several frankensteins seen on HF, which also use GQA).
The rest applies the existing patterns to the new IQ_K quants.
Also, a Q8_0 for attn_q had slipped into the 8-expert MoE rule; I removed it because that tensor is much bigger than attn_k or attn_v.

* Remove the <=8 experts condition.
2025-05-22 18:04:47 +03:00