ik_llama.cpp/ggml/src/iqk/iqk_gemm_iquants.h at 4d5dcba7c955b198d9482085ea4e484ed5fbfff3 - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-30 19:31:48 +00:00


Kawrakow bdf1b34493 IQ2_XXS: much faster CPU prompt processing (#515)

* Much faster iq2_xxs GEMM

PP-512 = 290 t/s vs ~110 t/s (iq2_xxs) or 148 t/s (iq2_xxs_r4) on main.

* iq2_xxs: q8_2_x4 GEMM

* iq2_xxs: use template for q8_2_x4 GEMM

* Fix AVX2

* Cleanup

* NEON is not working yet, so still use Q8_K GEMM

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-06-11 11:12:30 +03:00

14 lines

313 B

C++


 #pragma once
 #include "iqk_common.h"
 #ifdef IQK_IMPLEMENT
 #include <array>
 bool iqk_set_kernels_iquants(int ne00, int typeA, int typeB, std::array<mul_mat_t, IQK_MAX_NY>& kernels, mul_mat_t& func16);
 bool iqk_convert_iquants_q80_r8(int type, int n, const void * vx, size_t bx, void * vy, int nrc_x);
 #endif
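
The first declaration takes a `std::array` of kernels and returns a `bool`, which suggests a "fill a dispatch table, report success" pattern. Below is a minimal, self-contained sketch of that pattern only. It is not the project's implementation: `kernel_fn`, `kMaxNy`, `demo_kernel`, `set_demo_kernels` and `run_demo` are hypothetical stand-ins (the real `mul_mat_t` and `IQK_MAX_NY` are defined in `iqk_common.h`, which is not shown here), and indexing the table by the number of right-hand-side columns, as well as the multiple-of-256 check, are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins; the real mul_mat_t and IQK_MAX_NY are not shown in this file.
constexpr int kMaxNy = 8;              // assumed table size
using kernel_fn = int (*)(int n);      // assumed, much simplified kernel signature

// One "kernel" per supported column count NY; here it only returns n * NY.
template <int NY>
int demo_kernel(int n) { return n * NY; }

// Fill the dispatch table. Returns false when the row size is unsupported
// (assumption for this sketch: rows must be a multiple of 256).
bool set_demo_kernels(int ne00, std::array<kernel_fn, kMaxNy>& kernels) {
    if (ne00 <= 0 || ne00 % 256 != 0) return false;
    kernels = std::array<kernel_fn, kMaxNy>{{
        &demo_kernel<1>, &demo_kernel<2>, &demo_kernel<3>, &demo_kernel<4>,
        &demo_kernel<5>, &demo_kernel<6>, &demo_kernel<7>, &demo_kernel<8>}};
    return true;
}

// Pick the kernel for ny columns and run it; -1 signals "no kernel available".
int run_demo(int ne00, int ny) {
    if (ny < 1 || ny > kMaxNy) return -1;
    std::array<kernel_fn, kMaxNy> kernels{};
    if (!set_demo_kernels(ne00, kernels)) return -1;
    assert(kernels[ny - 1] != nullptr);
    return kernels[ny - 1](ne00);
}
```

In this sketch a caller checks the boolean result before touching the table, so an unsupported configuration can fall back to another code path instead of calling a null function pointer.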