mirror of https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-05-01 20:01:42 +00:00
Update README.md
48
README.md
@@ -13,7 +13,7 @@ If you are not already familiar with [llama.cpp](https://github.com/ggerganov/ll

Note that I have published some, but not all, of the code in this repository in a series of [llamafile](https://github.com/Mozilla-Ocho/llamafile) PRs ([394](https://github.com/Mozilla-Ocho/llamafile/pull/394), [405](https://github.com/Mozilla-Ocho/llamafile/pull/405), [428](https://github.com/Mozilla-Ocho/llamafile/pull/428), [435](https://github.com/Mozilla-Ocho/llamafile/pull/435), [453](https://github.com/Mozilla-Ocho/llamafile/pull/453), and [464](https://github.com/Mozilla-Ocho/llamafile/pull/464))

-The implementation of matrix multiplications is in a single C++ source file (`iqk_mul_mat.cpp`) with just two interface functions `iqk_mul_mat` (`fp16/fp32` and quantized matrix multiplications) and `iqk_mul_mat_moe` (as `iqk_mul_mat` but meant to be used for the FFN part of a MoE model). Under the hood `iqk_mul_mat_moe` uses the same implementation as `iqk_mul_mat`, with the only difference being where results are stored in memory. Bitnet quantization related stuff is in `iqk-quantize.cpp`.
+The implementation of matrix-matrix and matrix-vector multiplications is in a single C++ source file (`iqk_mul_mat.cpp`) with just two interface functions `iqk_mul_mat` (`fp16/fp32` and quantized matrix multiplications) and `iqk_mul_mat_moe` (as `iqk_mul_mat` but meant to be used for the FFN part of a MoE model). Under the hood `iqk_mul_mat_moe` uses the same implementation as `iqk_mul_mat`, with the only difference being where results are stored in memory. Bitnet quantization related stuff is in `iqk-quantize.cpp`.

## Why?

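The paragraph changed in this hunk states that `iqk_mul_mat_moe` runs the same implementation as `iqk_mul_mat` and differs only in where results are stored in memory. A minimal sketch of that design, under loudly stated assumptions: a plain fp32 triple loop stands in for the real quantized kernels, and the names `mul_mat_impl`, `sketch_mul_mat`, `sketch_mul_mat_moe`, and the `row_ids` routing array are hypothetical, not taken from `iqk_mul_mat.cpp`.

```cpp
#include <cstddef>

// HYPOTHETICAL sketch, not the actual iqk_mul_mat.cpp code.
// One shared kernel computes out[i] = dot(row i of A, row j of B) for every
// B row j; the caller-supplied dst_row(j) decides which output row receives
// the results for B row j. Rows of A and B are K floats long (fp32 only).
template <typename RowMap>
void mul_mat_impl(int M, int N, int K, const float* A, const float* B,
                  float* C, std::size_t ldc, RowMap dst_row) {
    for (int j = 0; j < N; ++j) {
        float* out = C + dst_row(j) * ldc;  // only this line differs per entry point
        for (int i = 0; i < M; ++i) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k) sum += A[i * K + k] * B[j * K + k];
            out[i] = sum;
        }
    }
}

// Dense entry point (hypothetical): results for B row j go to output row j.
void sketch_mul_mat(int M, int N, int K, const float* A, const float* B,
                    float* C, std::size_t ldc) {
    mul_mat_impl(M, N, K, A, B, C, ldc,
                 [](int j) { return static_cast<std::size_t>(j); });
}

// MoE entry point (hypothetical): B holds only the tokens routed to one expert,
// and row_ids[j] says which output row the j-th routed token belongs to.
void sketch_mul_mat_moe(int M, int N, int K, const float* A, const float* B,
                        float* C, std::size_t ldc, const int* row_ids) {
    mul_mat_impl(M, N, K, A, B, C, ldc,
                 [row_ids](int j) { return static_cast<std::size_t>(row_ids[j]); });
}
```

Passing the output mapping as a template parameter keeps the inner loops identical for both entry points, which is one way to get "same implementation, different storage location" without duplicating kernel code.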
@@ -93,7 +93,7 @@ The command line to generate the data was
```

| Quantization| size | backend | threads | t/s (llama.cpp) | t/s (iqk_mul_mat)| Speedup |
-| ------------------------ | ---------: | ---------- | ------: | ---------------: | ---------------: | ------: |
+| ---------- | ---------: | ---------- | ------: | ---------------: | ---------------: | ------: |
| 8B F16 | 14.96 GiB | AVX2 | 1 | 2.20 ± 0.00 | 2.25 ± 0.00 | 1.023 |
| | | | 2 | 3.63 ± 0.00 | 3.68 ± 0.00 | 1.014 |
| | | | 4 | 4.20 ± 0.00 | 4.20 ± 0.00 | 1.000 |
||||
@@ -125,24 +125,24 @@ The command line to generate the data was
| 7B Q5_1 | 4.72 GiB | NEON | 2 | 6.51 ± 0.01 | 6.42 ± 0.03 | 0.986 |
| | | | 4 | 12.26 ± 0.18 | 12.21 ± 0.14 | 0.996 |
| | | | 8 | 20.33 ± 0.52 | 21.85 ± 0.22 | 1.075 |
-| 8B Q2_K - Small | 2.78 GiB | AVX2 | 2 | 11.30 ± 0.00 | 13.06 ± 0.01 | 1.156 |
+| 8B Q2_K_S | 2.78 GiB | AVX2 | 2 | 11.30 ± 0.00 | 13.06 ± 0.01 | 1.156 |
| | | | 4 | 18.70 ± 0.00 | 19.04 ± 0.65 | 1.014 |
-| 7B Q2_K - Small | 2.16 GiB | NEON | 2 | 8.42 ± 0.05 | 11.97 ± 0.10 | 1.422 |
+| 7B Q2_K_S | 2.16 GiB | NEON | 2 | 8.42 ± 0.05 | 11.97 ± 0.10 | 1.422 |
| | | | 4 | 15.74 ± 0.01 | 22.09 ± 0.08 | 1.403 |
| | | | 8 | 27.35 ± 0.05 | 38.32 ± 0.05 | 1.401 |
-| 8B Q3_K - Small | 3.41 GiB | AVX2 | 2 | 8.58 ± 0.00 | 10.82 ± 0.00 | 1.261 |
+| 8B Q3_K_S | 3.41 GiB | AVX2 | 2 | 8.58 ± 0.00 | 10.82 ± 0.00 | 1.261 |
| | | | 4 | 15.26 ± 0.01 | 16.25 ± 0.01 | 1.065 |
-| 7B Q3_K - Small | 2.75 GiB | NEON | 2 | 6.40 ± 0.02 | 9.12 ± 0.09 | 1.425 |
+| 7B Q3_K_S | 2.75 GiB | NEON | 2 | 6.40 ± 0.02 | 9.12 ± 0.09 | 1.425 |
| | | | 4 | 12.17 ± 0.00 | 17.11 ± 0.03 | 1.406 |
| | | | 8 | 22.04 ± 0.08 | 31.39 ± 0.31 | 1.424 |
-| 8B Q4_K - Small | 4.36 GiB | AVX2 | 2 | 9.61 ± 0.00 | 10.72 ± 0.01 | 1.116 |
+| 8B Q4_K_S | 4.36 GiB | AVX2 | 2 | 9.61 ± 0.00 | 10.72 ± 0.01 | 1.116 |
| | | | 4 | 13.24 ± 0.31 | 13.28 ± 0.01 | 1.003 |
-| 7B Q4_K - Small | 3.59 GiB | NEON | 2 | 11.15 ± 0.05 | 12.93 ± 0.09 | 1.160 |
+| 7B Q4_K_S | 3.59 GiB | NEON | 2 | 11.15 ± 0.05 | 12.93 ± 0.09 | 1.160 |
| | | | 4 | 20.24 ± 0.16 | 23.49 ± 0.29 | 1.161 |
| | | | 8 | 25.76 ± 0.07 | 28.31 ± 0.22 | 1.099 |
-| 8B Q5_K - Small | 5.21 GiB | AVX2 | 2 | 7.45 ± 0.00 | 9.73 ± 0.00 | 1.306 |
+| 8B Q5_K_S | 5.21 GiB | AVX2 | 2 | 7.45 ± 0.00 | 9.73 ± 0.00 | 1.306 |
| | | | 4 | 11.05 ± 0.33 | 11.43 ± 0.02 | 1.034 |
-| 7B Q5_K - Small | 4.33 GiB | NEON | 2 | 7.20 ± 0.04 | 8.81 ± 0.04 | 1.224 |
+| 7B Q5_K_S | 4.33 GiB | NEON | 2 | 7.20 ± 0.04 | 8.81 ± 0.04 | 1.224 |
| | | | 4 | 13.62 ± 0.15 | 16.81 ± 0.16 | 1.234 |
| | | | 8 | 20.56 ± 0.19 | 23.96 ± 0.14 | 1.165 |
| 8B Q6_K | 6.14 GiB | AVX2 | 2 | 7.53 ± 0.00 | 9.42 ± 0.00 | 1.251 |
@@ -150,39 +150,39 @@ The command line to generate the data was
| 7B Q6_K | 5.15 GiB | NEON | 2 | 6.85 ± 0.04 | 8.30 ± 0.06 | 1.212 |
| | | | 4 | 13.03 ± 0.05 | 15.47 ± 0.17 | 1.187 |
| | | | 8 | 18.52 ± 0.07 | 20.67 ± 0.08 | 1.116 |
-| 8B IQ2_XXS - 2.0625 bpw | 2.23 GiB | AVX2 | 2 | 5.33 ± 0.01 | 6.40 ± 0.00 | 1.201 |
+| 8B IQ2_XXS | 2.23 GiB | AVX2 | 2 | 5.33 ± 0.01 | 6.40 ± 0.00 | 1.201 |
| | | | 4 | 10.06 ± 0.03 | 11.76 ± 0.03 | 1.169 |
-| 7B IQ2_XXS - 2.0625 bpw | 1.73 GiB | NEON | 2 | 5.07 ± 0.04 | 5.22 ± 0.05 | 1.030 |
+| 7B IQ2_XXS | 1.73 GiB | NEON | 2 | 5.07 ± 0.04 | 5.22 ± 0.05 | 1.030 |
| | | | 4 | 9.63 ± 0.00 | 9.91 ± 0.07 | 1.029 |
| | | | 8 | 17.40 ± 0.50 | 18.65 ± 0.22 | 1.072 |
-| 8B IQ2_XS - 2.3125 bpw | 2.42 GiB | AVX2 | 2 | 5.83 ± 0.00 | 6.55 ± 0.00 | 1.123 |
+| 8B IQ2_XS | 2.42 GiB | AVX2 | 2 | 5.83 ± 0.00 | 6.55 ± 0.00 | 1.123 |
| | | | 4 | 10.88 ± 0.09 | 12.07 ± 0.07 | 1.109 |
-| 7B IQ2_XS - 2.3125 bpw | 1.89 GiB | NEON | 2 | 5.52 ± 0.01 | 5.60 ± 0.00 | 1.014 |
+| 7B IQ2_XS | 1.89 GiB | NEON | 2 | 5.52 ± 0.01 | 5.60 ± 0.00 | 1.014 |
| | | | 4 | 10.50 ± 0.01 | 11.15 ± 0.00 | 1.062 |
| | | | 8 | 18.19 ± 1.30 | 20.94 ± 0.19 | 1.151 |
-| 8B IQ2_M - 2.7 bpw | 2.74 GiB | AVX2 | 2 | 5.12 ± 0.01 | 5.17 ± 0.00 | 1.010 |
+| 8B IQ2_M | 2.74 GiB | AVX2 | 2 | 5.12 ± 0.01 | 5.17 ± 0.00 | 1.010 |
| | | | 4 | 9.60 ± 0.28 | 9.68 ± 0.16 | 1.008 |
-| 7B IQ2_M - 2.7 bpw | 2.20 GiB | NEON | 2 | 3.73 ± 0.02 | 4.53 ± 0.00 | 1.214 |
+| 7B IQ2_M | 2.20 GiB | NEON | 2 | 3.73 ± 0.02 | 4.53 ± 0.00 | 1.214 |
| | | | 4 | 7.14 ± 0.05 | 8.70 ± 0.06 | 1.218 |
| | | | 8 | 11.99 ± 0.48 | 16.41 ± 0.05 | 1.369 |
-| 8B IQ3_XXS - 3.0625 bpw | 3.04 GiB | AVX2 | 2 | 4.06 ± 0.01 | 5.00 ± 0.00 | 1.232 |
+| 8B IQ3_XXS | 3.04 GiB | AVX2 | 2 | 4.06 ± 0.01 | 5.00 ± 0.00 | 1.232 |
| | | | 4 | 7.75 ± 0.02 | 9.13 ± 0.45 | 1.178 |
-| 7B IQ3_XXS - 3.0625 bpw | 2.41 GiB | NEON | 2 | 3.53 ± 0.00 | 3.82 ± 0.00 | 1.082 |
+| 7B IQ3_XXS | 2.41 GiB | NEON | 2 | 3.53 ± 0.00 | 3.82 ± 0.00 | 1.082 |
| | | | 4 | 6.74 ± 0.04 | 7.42 ± 0.07 | 1.103 |
| | | | 8 | 11.96 ± 0.40 | 13.19 ± 0.29 | 1.103 |
-| 8B IQ3_S - 3.4375 bpw | 3.42 GiB | AVX2 | 2 | 3.62 ± 0.00 | 4.06 ± 0.00 | 1.122 |
+| 8B IQ3_S | 3.42 GiB | AVX2 | 2 | 3.62 ± 0.00 | 4.06 ± 0.00 | 1.122 |
| | | | 4 | 6.80 ± 0.01 | 7.62 ± 0.10 | 1.121 |
-| 7B IQ3_S - 3.4375 bpw | 2.75 GiB | NEON | 2 | 2.96 ± 0.01 | 3.21 ± 0.03 | 1.084 |
+| 7B IQ3_S | 2.75 GiB | NEON | 2 | 2.96 ± 0.01 | 3.21 ± 0.03 | 1.084 |
| | | | 4 | 5.68 ± 0.01 | 6.25 ± 0.05 | 1.100 |
| | | | 8 | 10.32 ± 0.25 | 11.11 ± 0.37 | 1.077 |
-| 8B IQ4_XS - 4.25 bpw | 4.13 GiB | AVX2 | 2 | 8.08 ± 0.00 | 11.35 ± 0.00 | 1.405 |
+| 8B IQ4_XS | 4.13 GiB | AVX2 | 2 | 8.08 ± 0.00 | 11.35 ± 0.00 | 1.405 |
| | | | 4 | 13.36 ± 0.72 | 14.32 ± 0.24 | 1.072 |
-| 7B IQ4_XS - 4.25 bpw | 3.37 GiB | NEON | 2 | 9.87 ± 0.03 | 12.06 ± 0.00 | 1.222 |
+| 7B IQ4_XS | 3.37 GiB | NEON | 2 | 9.87 ± 0.03 | 12.06 ± 0.00 | 1.222 |
| | | | 4 | 17.78 ± 0.23 | 22.06 ± 0.28 | 1.241 |
| | | | 8 | 27.62 ± 0.09 | 29.70 ± 0.39 | 1.075 |
-| 8B IQ4_NL - 4.5 bpw | 4.35 GiB | AVX2 | 2 | 5.52 ± 0.00 | 10.26 ± 0.00 | 1.859 |
+| 8B IQ4_NL | 4.35 GiB | AVX2 | 2 | 5.52 ± 0.00 | 10.26 ± 0.00 | 1.859 |
| | | | 4 | 10.78 ± 0.01 | 13.69 ± 0.08 | 1.270 |
-| 7B IQ4_NL - 4.5 bpw | 3.56 GiB | NEON | 2 | 8.32 ± 0.01 | 13.54 ± 0.01 | 1.627 |
+| 7B IQ4_NL | 3.56 GiB | NEON | 2 | 8.32 ± 0.01 | 13.54 ± 0.01 | 1.627 |
| | | | 4 | 15.89 ± 0.00 | 24.28 ± 0.29 | 1.528 |
| | | | 8 | 26.56 ± 0.36 | 29.87 ± 0.08 | 1.125 |

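The Speedup column in the table touched by this commit appears to be the ratio of the two throughput columns, t/s (iqk_mul_mat) divided by t/s (llama.cpp), rounded to three decimals; for example, the 8B IQ4_NL AVX2 two-thread row gives 10.26 / 5.52 ≈ 1.859. A minimal C++ check of that reading, where `speedup` and `round3` are hypothetical helper names not present in the repository:

```cpp
#include <cmath>

// Hypothetical helper: Speedup = iqk_mul_mat throughput / llama.cpp throughput
// for the same model, backend, and thread count (both in tokens per second).
double speedup(double tps_llama_cpp, double tps_iqk_mul_mat) {
    return tps_iqk_mul_mat / tps_llama_cpp;
}

// Hypothetical helper: round to the three decimals shown in the Speedup column.
double round3(double x) {
    return std::round(x * 1000.0) / 1000.0;
}
```

For instance, `round3(speedup(5.52, 10.26))` reproduces the 1.859 in the 8B IQ4_NL row, and `round3(speedup(6.51, 6.42))` gives the 0.986 in the 7B Q5_1 row; a value below 1 means iqk_mul_mat was slower for that configuration.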