Kawrakow
39982764d7
Bitnet: 2.25 bpw version
...
Just scalar and AVX2 for now.
PP-512 is even faster (325 t/s on the Ryzen-7950X, 404 t/s on the
Ryzen-5975WX). We lose ~6-7% for TG due to being memory bound and
the model being 10% larger.
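For reference, the 2.25 bpw figure is consistent with a block layout of 64 ternary weights at 2 bits each plus one fp16 block scale: (64*2 + 16) / 64 = 2.25 bits per weight. A minimal sketch of such a layout follows; the struct name and field names are hypothetical, not necessarily the actual ggml definitions:

    #include <stdint.h>

    // Hypothetical 2.25 bpw block: 64 weights at 2 bits each plus an
    // fp16 scale -> (64*2 + 16) / 64 = 2.25 bits per weight.
    typedef struct {
        uint16_t d;       // per-block scale, stored as fp16 bits
        uint8_t  qs[16];  // 64 x 2-bit quants, 4 per byte
    } block_2_25bpw_sketch;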
2024-06-22 12:02:52 +03:00
Kawrakow
318899c8b7
bitnet: add 2 bpw quantization
...
The scalar dot product already achieves 37 t/s for TG!
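To illustrate what a scalar kernel for a 2 bpw ternary type can look like, here is a minimal sketch, assuming weights are packed 4 per byte with codes {0,1,2} mapping to {-1,0,+1} and activations quantized to int8. The function name is hypothetical; this is not the actual iqk kernel:

    #include <stdint.h>

    // Scalar dot product sketch: 2-bit ternary weights against int8
    // activations. Each byte of qs holds 4 weight codes; code-1 gives
    // the ternary value -1, 0, or +1.
    static int32_t dot_ternary_q8_sketch(const uint8_t * qs,
                                         const int8_t * q8, int n) {
        int32_t sum = 0;
        for (int i = 0; i < n; ++i) {
            int w = ((qs[i >> 2] >> (2 * (i & 3))) & 3) - 1;
            sum += w * q8[i];
        }
        return sum;
    }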
2024-06-22 12:02:51 +03:00
Kawrakow
f9ba085ef7
Move Q8_K64 quantization to iqk-quantize.cpp and add copyright notice
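The exact Q8_K64 details live in iqk-quantize.cpp, but a generic 8-bit row quantizer over groups of 64 looks roughly like the sketch below: find the max magnitude per group, derive a scale, round to int8. This is an assumption-based illustration, not the actual implementation:

    #include <math.h>
    #include <stdint.h>

    // Sketch: quantize n floats (n a multiple of 64) to int8 in
    // groups of 64, storing one float scale per group.
    static void quantize_row_q8_64_sketch(const float * x, int8_t * q,
                                          float * scales, int n) {
        for (int ib = 0; ib < n/64; ++ib) {
            float amax = 0.f;
            for (int j = 0; j < 64; ++j) {
                float ax = fabsf(x[64*ib + j]);
                if (ax > amax) amax = ax;
            }
            float d  = amax / 127.f;
            float id = d ? 1.f/d : 0.f;
            scales[ib] = d;
            for (int j = 0; j < 64; ++j) {
                q[64*ib + j] = (int8_t)lroundf(id * x[64*ib + j]);
            }
        }
    }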
2024-06-22 12:02:51 +03:00
Kawrakow
b0967ffa79
bitnet: fix scalar dot product
...
I had forgotten to adjust for the change to q8_K64.
On the M2 I'm getting 10.8 t/s with the scalar version!
2024-06-22 12:02:51 +03:00
Kawrakow
81576cdcac
bitnet: python + llama
2024-06-22 12:02:51 +03:00