ik_llama.cpp/iqk_mul_mat.cpp
Iwan Kawrakow e05cca9ef6 bitnet(scale in a separate tensor): CPU improvements
Arrange Q8 quants in blocks of 128 and adapt iqk_mul_mat
to deal with that. This improves PP speed by a few percent.
2024-06-22 12:02:52 +03:00

193 KiB