mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-24 23:24:13 +00:00
This allows us to optimize TG performance for GQA models. E.g., for IQ4_XS L3-8B with 8k TG-64 goes from 8.6 to 10.26 t/s. Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>