mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 09:09:50 +00:00
🔀 #20 - iq2_k: slightly better bpw - accuracy compromise
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2024-08-19 |
| Updated | 2024-08-19 |
Description
For LLaMA-3.1 models:
- It is better to quantize all of attn_v with iq3_k than to quantize half of attn_v with iq4_k.
- Quantizing attn_output with iq3_k yields a larger PPL decrease than the added bpw would suggest.