Files
ik_llama.cpp/examples/quantize-stats
Iwan Kawrakow f21dd3fb15 Testing Trellis quantization
Using 12 bits per 8 weights (i.e., 1.5 bpw before the per-group scales) I get a better RMSE than
iq2_xxs. I still need to see how quantizing the group-of-8
scales will affect accuracy. With the search for the best code
SIMDified via AVX2, LLaMA-3.1-8B gets quantized in 130 seconds
on the Ryzen-7950X CPU - sluggish but still acceptable.
2024-11-21 08:16:40 +02:00
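The commit message above describes spending 12 bits on every group of 8 weights and searching exhaustively for the code that best reconstructs the group, given a per-group scale. The sketch below is a minimal, illustrative version of that idea only: a scalar brute-force search over all 2^12 candidate codes plus an RMSE tally. The names (`decode_code`, `find_best_code`), the hash-based decoder, and the max-abs group scale are assumptions for the sake of a self-contained example, not the actual ik_llama.cpp implementation; the real code additionally SIMDifies this search with AVX2.

```cpp
// Hypothetical sketch of a brute-force "best code" search for a codebook/trellis
// quantizer using 12 bits per group of 8 weights. Names and the decoder are
// illustrative assumptions, not the ik_llama.cpp implementation.
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// Assumed decoder: maps a 12-bit code to 8 reconstruction values in [-1, 1].
// A simple LCG-style hash stands in for the real trellis decoder here.
static std::array<float, 8> decode_code(uint32_t code) {
    std::array<float, 8> out;
    uint32_t state = code * 2654435761u + 1013904223u;
    for (int i = 0; i < 8; ++i) {
        state = state * 1664525u + 1013904223u;
        out[i] = (int32_t)(state >> 16) / 32768.0f - 1.0f; // top 16 bits -> [-1, 1]
    }
    return out;
}

// Exhaustive search over all 2^12 codes for the one that best reconstructs
// the 8 weights x[] after multiplying the decoded values by the group scale d.
static uint32_t find_best_code(const float * x, float d, float & best_err) {
    uint32_t best = 0;
    best_err = INFINITY;
    for (uint32_t code = 0; code < (1u << 12); ++code) {
        const std::array<float, 8> q = decode_code(code);
        float err = 0.0f;
        for (int i = 0; i < 8; ++i) {
            const float diff = x[i] - d * q[i];
            err += diff * diff;
        }
        if (err < best_err) { best_err = err; best = code; }
    }
    return best;
}

int main() {
    // Toy tensor: 1024 weights -> 128 groups of 8, 12 bits each
    // (1.5 bpw before the per-group scales are counted).
    std::mt19937 rng(42);
    std::normal_distribution<float> dist(0.0f, 0.5f);
    std::vector<float> weights(1024);
    for (float & w : weights) w = dist(rng);

    double total_err = 0.0;
    for (size_t g = 0; g < weights.size(); g += 8) {
        // crude per-group scale: max |w| over the group
        float amax = 0.0f;
        for (int i = 0; i < 8; ++i) amax = std::max(amax, std::fabs(weights[g + i]));
        float err;
        find_best_code(weights.data() + g, amax, err);
        total_err += err;
    }
    printf("rmse = %g\n", std::sqrt(total_err / weights.size()));
    return 0;
}
```

An AVX2 version of the same search would evaluate several candidate codes (or all 8 lanes of a group) per iteration, which is what brings the full LLaMA-3.1-8B quantization down to the ~130 seconds quoted above.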