ik_llama.cpp/iqk-quantize.cpp at a2e43b83c9344e7c1130e3e95917bdd61dfb6aab

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-03 10:30:27 +00:00

Iwan Kawrakow 58d9e8f1d2 bitnet: put the scale in a separate tensor

and correspondingly add an extra ggml_mul_mat operation.
As per @ggerganov, this is how things should be done.
It seems to be working, but as far as I can tell this
results in a ~15% performance penalty for prompt processing.
Committing so I can go and test on other platforms.

2024-06-22 12:02:52 +03:00

12 KiB
