mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-13 15:30:03 +00:00
AVX512+AVXVNNI GEMM implementation for quants using Q8_K for activations (#710)
* q8_k_r16: basics
* q8_k_r16: iq4_xs now uses q8_k_r16 on Zen4+. PP performance is about the same as with q8_k_r8 on the Ryzen-7950X, so we expect nice gains on Zen5, and we don't need to worry about maintaining two different q8_k_r8 implementations for fancy SIMD.
* q8_k_r16: iq2_xxs now uses q8_k_r16 on Zen4+
* q8_k_r16: iq2_xs now uses q8_k_r16 on Zen4+
* q8_k_r16: iq2_s now uses q8_k_r16 on Zen4+
* q8_k_r16: iq3_xxs now uses q8_k_r16 on Zen4+
* q8_k_r16: iq3_s now uses q8_k_r16 on Zen4+
* q8_k_r16: iq1_s and iq1_m now use q8_k_r16 on Zen4+
* q8_k_r16: q2_K and q3_K now use q8_k_r16 on Zen4+
* q8_k_r16: iq2_ks and iq2_k now use q8_k_r16 on Zen4+
* q8_k_r16: iq2_kl now uses q8_k_r16 on Zen4+
* q8_k_r16: iq3_ks and iq3_k now use q8_k_r16 on Zen4+
* q8_k_r16: iq4_kss, iq4_ks, and iq4_k now use q8_k_r16 on Zen4+
* q8_k_r16: iq5_ks, iq5_k, and iq6_k now use q8_k_r16 on Zen4+
* Fix AVX2
* Just always set num_rows to 16

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
@@ -475,6 +475,7 @@ extern "C" {
     GGML_TYPE_IQ5_K_R4  = 340,
     GGML_TYPE_IQ4_KS_R4 = 344,
     GGML_TYPE_IQ5_KS_R4 = 352,
+    GGML_TYPE_Q8_K_R16  = 397,
     GGML_TYPE_Q8_KV_R8  = 398,
     GGML_TYPE_Q8_K_R8   = 399,
     GGML_TYPE_COUNT,
@@ -571,6 +572,7 @@ extern "C" {
     GGML_FTYPE_MOSTLY_IQ5_K_R4  = 333, // except 1d tensors
     GGML_FTYPE_MOSTLY_IQ4_KS_R4 = 337, // except 1d tensors
     GGML_FTYPE_MOSTLY_IQ5_KS_R4 = 341, // except 1d tensors
+    GGML_FTYPE_MOSTLY_Q8_K_R16  = 397, // except 1d tensors
     GGML_FTYPE_MOSTLY_Q8_KV_R8  = 398, // except 1d tensors
     GGML_FTYPE_MOSTLY_Q8_K_R8   = 399, // except 1d tensors
 };