ik_llama.cpp/examples/quantize-stats
Iwan Kawrakow 766fa600c8 WIP - try larger blocks
With blocks of 32 and 16 bits per group of 8, the brute-force
search becomes prohibitive in terms of CPU time (30+ minutes
for 8B LLaMA after SIMDifying with AVX2). The trick is to
group the points into clusters, find the nearest cluster,
and only search within that cluster.
2024-11-21 08:16:41 +02:00
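The cluster trick described in the commit message can be sketched roughly as below. This is a minimal illustration and not the code from this commit: the names `Point`, `ClusteredCodebook`, `nearest_brute_force` and `nearest_clustered` are hypothetical, the clustering is assumed to be precomputed by some other step, and plain scalar loops stand in for the AVX2 code the message mentions.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

constexpr int kGroupSize = 8;  // values per group, as in the commit message
using Point = std::array<float, kGroupSize>;

// Squared Euclidean distance between two groups of 8 values.
inline float dist2(const Point& a, const Point& b) {
    float d = 0.0f;
    for (int i = 0; i < kGroupSize; ++i) {
        const float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Hypothetical container: the codebook plus a precomputed clustering of it.
struct ClusteredCodebook {
    std::vector<Point> points;              // all codebook entries
    std::vector<Point> centroids;           // one centroid per cluster
    std::vector<std::vector<int>> members;  // per cluster: indices into points
};

// Baseline: compare x against every codebook entry. Returns -1 if empty.
int nearest_brute_force(const std::vector<Point>& points, const Point& x) {
    int best = -1;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < points.size(); ++i) {
        const float d = dist2(points[i], x);
        if (d < best_d) { best_d = d; best = static_cast<int>(i); }
    }
    return best;
}

// Two-stage search: pick the nearest centroid, then search only that cluster.
// Returns -1 if there are no clusters or the chosen cluster is empty.
int nearest_clustered(const ClusteredCodebook& cb, const Point& x) {
    const int cluster = nearest_brute_force(cb.centroids, x);
    if (cluster < 0) return -1;
    int best = -1;
    float best_d = std::numeric_limits<float>::max();
    for (int idx : cb.members[cluster]) {
        const float d = dist2(cb.points[idx], x);
        if (d < best_d) { best_d = d; best = idx; }
    }
    return best;
}
```

Assuming 16 bits per group means a codebook of 2^16 = 65536 points, brute force costs 65536 distance evaluations per group. With, say, 256 evenly sized clusters (an illustrative number, not one taken from the commit), the two-stage search costs about 256 + 256 = 512. The result is approximate: the true nearest point can lie in a neighbouring cluster when the input falls close to a cluster boundary.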