ik_llama.cpp/ggml
Kawrakow 0551e7630b Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp (#207)
This allows us to optimize TG performance for GQA models.
E.g., for IQ4_XS L3-8B with 8k TG-64 goes from 8.6 to 10.26 t/s.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-02-15 08:45:45 +02:00
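The commit message does not explain why 4D gemm handling matters for GQA token generation (TG). The following is a minimal sketch under stated assumptions, not the code in iqk_mul_mat.cpp. In grouped-query attention (GQA), several query heads share one K/V head. During TG there is a single query row per head, so a per-head loop performs one matrix-vector product per query head and re-reads the same K rows for every head in a group. Grouping the query heads that share a KV head lets each K row be loaded once and applied to the whole group, turning several GEMVs into one small GEMM. This reuse is an assumed explanation for the speedup. All names and sizes below (`scores_per_head`, `scores_grouped`, `scores_match`, `N_HEAD`, `N_HEAD_KV`, `HEAD_DIM`, `N_KV`) are hypothetical.

```c
#include <assert.h>

/* Hypothetical sizes for illustration only (not taken from ik_llama.cpp). */
#define N_HEAD     8   /* query heads */
#define N_HEAD_KV  2   /* key/value heads shared across query heads (GQA) */
#define HEAD_DIM   4   /* per-head embedding size */
#define N_KV       3   /* tokens currently in the KV cache */

/* Per-head loop: one matrix-vector product per query head.
   Each K row is re-read once for every query head in its group. */
static void scores_per_head(float q[N_HEAD][HEAD_DIM],
                            float k[N_HEAD_KV][N_KV][HEAD_DIM],
                            float s[N_HEAD][N_KV]) {
    const int group = N_HEAD / N_HEAD_KV;
    for (int h = 0; h < N_HEAD; ++h) {
        const int hk = h / group;            /* KV head used by query head h */
        for (int t = 0; t < N_KV; ++t) {
            float acc = 0.0f;
            for (int d = 0; d < HEAD_DIM; ++d) acc += q[h][d] * k[hk][t][d];
            s[h][t] = acc;
        }
    }
}

/* Grouped loop: for each KV head, load each K row once and apply it to all
   query heads in that group (a small GEMM instead of `group` GEMVs). */
static void scores_grouped(float q[N_HEAD][HEAD_DIM],
                           float k[N_HEAD_KV][N_KV][HEAD_DIM],
                           float s[N_HEAD][N_KV]) {
    const int group = N_HEAD / N_HEAD_KV;
    for (int hk = 0; hk < N_HEAD_KV; ++hk) {
        for (int t = 0; t < N_KV; ++t) {
            const float *krow = k[hk][t];    /* reused by every head in the group */
            for (int g = 0; g < group; ++g) {
                const int h = hk * group + g;
                float acc = 0.0f;
                for (int d = 0; d < HEAD_DIM; ++d) acc += q[h][d] * krow[d];
                s[h][t] = acc;
            }
        }
    }
}

/* Fills deterministic inputs, runs both loops, and returns 1 if every score
   matches exactly (same summation order, so results are bitwise equal). */
int scores_match(void) {
    assert(N_HEAD % N_HEAD_KV == 0);         /* GQA requires whole groups */
    float q[N_HEAD][HEAD_DIM];
    float k[N_HEAD_KV][N_KV][HEAD_DIM];
    float s1[N_HEAD][N_KV], s2[N_HEAD][N_KV];
    for (int h = 0; h < N_HEAD; ++h)
        for (int d = 0; d < HEAD_DIM; ++d) q[h][d] = 0.1f * (float)(h + d);
    for (int hk = 0; hk < N_HEAD_KV; ++hk)
        for (int t = 0; t < N_KV; ++t)
            for (int d = 0; d < HEAD_DIM; ++d)
                k[hk][t][d] = 0.01f * (float)(hk * 7 + t * 3 + d);
    scores_per_head(q, k, s1);
    scores_grouped(q, k, s2);
    for (int h = 0; h < N_HEAD; ++h)
        for (int t = 0; t < N_KV; ++t)
            if (s1[h][t] != s2[h][t]) return 0;
    return 1;
}
```

Both loops compute identical attention scores. Only the memory access pattern differs: with `N_HEAD / N_HEAD_KV = 4`, the grouped version reads each K row once instead of four times. Reading fewer bytes is what one would expect to help a memory-bound step like TG.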