mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-02 18:10:02 +00:00
Rquires to change the get_rows and matrix multiplication kernels to use a dequantizer type rather than a dequantization function. But once this is done, we can simply reuse the iq1_bn implementation. This change will also allow to add other quantization types that have meta data (such as a row scale) stored at the beginning of a row (or change existing quantization types to row-wise scales).