Mirror of https://github.com/ikawrakow/ik_llama.cpp.git
Synced 2026-02-18 20:30:11 +00:00
* WIP: adding mainline mmq_id implementation
* This seems to work
* Now -fmoe also works
* WIP
* WIP
* WIP
* This works for mainline-supported quants
* mmq_id: add iq2_k, iq2_k_r4
* mmq_id: don't assume row size is a multiple of type size (per-row scales)
* mmq_id: don't assume row size is a multiple of type size
* mmq_id: add iq2_ks, so we are sure it works with per-row scales
* mmq_id: add iq2_kl
* mmq_id: add iq3_ks
* mmq_id: add iq3_k, iq3_k_r4
* mmq_id: add iq4_kss, iq4_ks, iq4_ks_r4
* mmq_id: add iq4_k, iq4_k_r4
* mmq_id: add iq5_ks, iq5_ks_r4
* mmq_id: add iq5_k, iq5_k_r4, q6_0
* mmq_id: add iq6_k
* mmq_id: add iq1_s_r4
* mmq_id: add iq1_kt, iq2_kt
* mmq_id: add iq3_kt, iq4_kt
* Add CUDA fp8 header

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>