### 🔀 [#351](https://github.com/ikawrakow/ik_llama.cpp/pull/351) - CPU FA improvements
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-04-28 |
| **Updated** | 2025-04-29 |
---
#### Description
This PR further improves CPU flash attention (FA) performance for GQA models. It does not affect FlashMLA (relevant for DeepSeek models), but the same strategy could also be applied there; I have left that for a future PR.
Here are some performance data and graphs for LLaMA-3.1-8B and Gemma3-12B. In all cases a `Q8_0` quantized KV cache is used. The model weights are quantized with `Q4_0`, chosen specifically because it has the best performance in mainline `llama.cpp`, owing to the extraordinary amount of attention this quantization type receives.
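For reading the tables: each row gives the time to process a 512-token prompt batch (`T_PP`) and to generate 128 tokens (`T_TG`) at a given KV cache depth `N_KV`, along with the corresponding throughputs. A minimal sketch of how the columns relate, assuming the throughput columns are simply token count divided by elapsed time (the helper name `throughput` is made up for illustration):

```python
def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second, rounded to two decimals like the table columns."""
    return round(tokens / seconds, 2)

# Row copied from the Gemma3-12B, Ryzen-7950X, mainline llama.cpp table (N_KV = 12288).
pp, tg = 512, 128            # prompt tokens, generated tokens
t_pp, t_tg = 16.000, 19.480  # seconds

s_pp = throughput(pp, t_pp)  # compare with the S_PP t/s column
s_tg = throughput(tg, t_tg)  # compare with the S_TG t/s column
print(f"S_PP = {s_pp:.2f} t/s, S_TG = {s_tg:.2f} t/s")
```

Because the reported times are rounded to three decimals, recomputing other rows this way may differ from the table in the last digit.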
## Gemma3-12B, Ryzen-7950X CPU
![g3_tg_7950](https://github.com/user-attachments/assets/e1f27dfb-8234-4157-9603-6fae9fc40dc0)
![g3_pp_7950](https://github.com/user-attachments/assets/13712509-db82-40a1-945c-670d2b40eee8)
<details>
<summary>Gemma3-12B, Ryzen-7950X CPU, mainline llama.cpp</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 4.855 | 105.46 | 15.816 | 8.09 |
| 512 | 128 | 512 | 5.743 | 89.15 | 16.529 | 7.74 |
| 512 | 128 | 1024 | 6.337 | 80.80 | 17.091 | 7.49 |
| 512 | 128 | 1536 | 6.516 | 78.58 | 17.199 | 7.44 |
| 512 | 128 | 2048 | 6.688 | 76.56 | 17.309 | 7.39 |
| 512 | 128 | 2560 | 6.882 | 74.40 | 17.416 | 7.35 |
| 512 | 128 | 3072 | 7.075 | 72.36 | 17.526 | 7.30 |
| 512 | 128 | 3584 | 7.291 | 70.22 | 17.638 | 7.26 |
| 512 | 128 | 4096 | 7.493 | 68.33 | 17.746 | 7.21 |
| 512 | 128 | 4608 | 7.751 | 66.05 | 17.769 | 7.20 |
| 512 | 128 | 5120 | 8.153 | 62.80 | 17.957 | 7.13 |
| 512 | 128 | 5632 | 8.658 | 59.13 | 18.072 | 7.08 |
| 512 | 128 | 6144 | 9.215 | 55.56 | 18.165 | 7.05 |
| 512 | 128 | 6656 | 9.792 | 52.29 | 18.264 | 7.01 |
| 512 | 128 | 7168 | 10.360 | 49.42 | 18.378 | 6.97 |
| 512 | 128 | 7680 | 10.964 | 46.70 | 18.484 | 6.92 |
| 512 | 128 | 8192 | 11.576 | 44.23 | 18.599 | 6.88 |
| 512 | 128 | 8704 | 12.193 | 41.99 | 18.687 | 6.85 |
| 512 | 128 | 9216 | 12.805 | 39.98 | 18.817 | 6.80 |
| 512 | 128 | 9728 | 13.402 | 38.20 | 18.923 | 6.76 |
| 512 | 128 | 10240 | 13.914 | 36.80 | 19.047 | 6.72 |
| 512 | 128 | 10752 | 14.442 | 35.45 | 19.226 | 6.66 |
| 512 | 128 | 11264 | 14.966 | 34.21 | 19.333 | 6.62 |
| 512 | 128 | 11776 | 15.517 | 33.00 | 19.372 | 6.61 |
| 512 | 128 | 12288 | 16.000 | 32.00 | 19.480 | 6.57 |
| 512 | 128 | 12800 | 16.504 | 31.02 | 19.593 | 6.53 |
| 512 | 128 | 13312 | 16.998 | 30.12 | 19.706 | 6.50 |
| 512 | 128 | 13824 | 17.607 | 29.08 | 19.810 | 6.46 |
| 512 | 128 | 14336 | 18.041 | 28.38 | 19.976 | 6.41 |
| 512 | 128 | 14848 | 18.543 | 27.61 | 20.092 | 6.37 |
| 512 | 128 | 15360 | 19.050 | 26.88 | 20.216 | 6.33 |
| 512 | 128 | 15872 | 19.514 | 26.24 | 20.393 | 6.28 |
</details>
<details>
<summary>Gemma3-12B, Ryzen-7950X CPU, ik_llama.cpp, main branch</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 2.913 | 175.75 | 15.638 | 8.18 |
| 512 | 128 | 512 | 2.998 | 170.78 | 15.889 | 8.06 |
| 512 | 128 | 1024 | 3.094 | 165.46 | 16.178 | 7.91 |
| 512 | 128 | 1536 | 3.180 | 160.99 | 16.474 | 7.77 |
| 512 | 128 | 2048 | 3.269 | 156.61 | 16.668 | 7.68 |
| 512 | 128 | 2560 | 3.360 | 152.39 | 16.895 | 7.58 |
| 512 | 128 | 3072 | 3.447 | 148.55 | 17.145 | 7.47 |
| 512 | 128 | 3584 | 3.539 | 144.66 | 17.415 | 7.35 |
| 512 | 128 | 4096 | 3.627 | 141.16 | 17.672 | 7.24 |
| 512 | 128 | 4608 | 3.715 | 137.82 | 17.924 | 7.14 |
| 512 | 128 | 5120 | 3.805 | 134.58 | 18.184 | 7.04 |
| 512 | 128 | 5632 | 3.892 | 131.56 | 18.448 | 6.94 |
| 512 | 128 | 6144 | 3.985 | 128.47 | 18.702 | 6.84 |
| 512 | 128 | 6656 | 4.081 | 125.45 | 18.951 | 6.75 |
| 512 | 128 | 7168 | 4.180 | 122.50 | 19.199 | 6.67 |
| 512 | 128 | 7680 | 4.289 | 119.38 | 19.444 | 6.58 |
| 512 | 128 | 8192 | 4.376 | 117.00 | 19.689 | 6.50 |
| 512 | 128 | 8704 | 4.481 | 114.27 | 19.927 | 6.42 |
| 512 | 128 | 9216 | 4.570 | 112.04 | 20.185 | 6.34 |
| 512 | 128 | 9728 | 4.684 | 109.31 | 20.427 | 6.27 |
| 512 | 128 | 10240 | 4.766 | 107.42 | 20.689 | 6.19 |
| 512 | 128 | 10752 | 4.870 | 105.13 | 20.921 | 6.12 |
| 512 | 128 | 11264 | 4.983 | 102.75 | 21.177 | 6.04 |
| 512 | 128 | 11776 | 5.076 | 100.87 | 21.430 | 5.97 |
| 512 | 128 | 12288 | 5.213 | 98.21 | 21.661 | 5.91 |
| 512 | 128 | 12800 | 5.324 | 96.16 | 21.924 | 5.84 |
| 512 | 128 | 13312 | 5.356 | 95.59 | 22.439 | 5.70 |
| 512 | 128 | 13824 | 5.468 | 93.63 | 22.689 | 5.64 |
| 512 | 128 | 14336 | 5.558 | 92.11 | 22.964 | 5.57 |
| 512 | 128 | 14848 | 5.684 | 90.07 | 23.209 | 5.52 |
| 512 | 128 | 15360 | 5.829 | 87.84 | 23.803 | 5.38 |
| 512 | 128 | 15872 | 5.971 | 85.75 | 24.068 | 5.32 |
</details>
<details>
<summary>Gemma3-12B, Ryzen-7950X CPU, ik_llama.cpp, PR</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 2.871 | 178.35 | 15.620 | 8.19 |
| 512 | 128 | 512 | 2.952 | 173.46 | 15.752 | 8.13 |
| 512 | 128 | 1024 | 3.033 | 168.84 | 15.861 | 8.07 |
| 512 | 128 | 1536 | 3.112 | 164.53 | 15.995 | 8.00 |
| 512 | 128 | 2048 | 3.187 | 160.65 | 16.099 | 7.95 |
| 512 | 128 | 2560 | 3.265 | 156.82 | 16.227 | 7.89 |
| 512 | 128 | 3072 | 3.339 | 153.32 | 16.339 | 7.83 |
| 512 | 128 | 3584 | 3.419 | 149.75 | 16.463 | 7.77 |
| 512 | 128 | 4096 | 3.490 | 146.68 | 16.577 | 7.72 |
| 512 | 128 | 4608 | 3.566 | 143.60 | 16.701 | 7.66 |
| 512 | 128 | 5120 | 3.643 | 140.56 | 16.814 | 7.61 |
| 512 | 128 | 5632 | 3.721 | 137.61 | 16.940 | 7.56 |
| 512 | 128 | 6144 | 3.802 | 134.66 | 17.057 | 7.50 |
| 512 | 128 | 6656 | 3.884 | 131.84 | 17.165 | 7.46 |
| 512 | 128 | 7168 | 3.966 | 129.10 | 17.282 | 7.41 |
| 512 | 128 | 7680 | 4.051 | 126.38 | 17.402 | 7.36 |
| 512 | 128 | 8192 | 4.127 | 124.05 | 17.521 | 7.31 |
| 512 | 128 | 8704 | 4.208 | 121.68 | 17.631 | 7.26 |
| 512 | 128 | 9216 | 4.288 | 119.39 | 17.751 | 7.21 |
| 512 | 128 | 9728 | 4.366 | 117.28 | 17.861 | 7.17 |
| 512 | 128 | 10240 | 4.447 | 115.13 | 17.986 | 7.12 |
| 512 | 128 | 10752 | 4.526 | 113.13 | 18.099 | 7.07 |
| 512 | 128 | 11264 | 4.609 | 111.08 | 18.209 | 7.03 |
| 512 | 128 | 11776 | 4.698 | 108.99 | 18.330 | 6.98 |
| 512 | 128 | 12288 | 4.765 | 107.44 | 18.448 | 6.94 |
| 512 | 128 | 12800 | 4.843 | 105.71 | 18.559 | 6.90 |
| 512 | 128 | 13312 | 4.923 | 104.00 | 18.686 | 6.85 |
| 512 | 128 | 13824 | 4.999 | 102.42 | 18.797 | 6.81 |
| 512 | 128 | 14336 | 5.081 | 100.76 | 18.915 | 6.77 |
| 512 | 128 | 14848 | 5.160 | 99.23 | 19.029 | 6.73 |
| 512 | 128 | 15360 | 5.234 | 97.81 | 19.144 | 6.69 |
| 512 | 128 | 15872 | 5.320 | 96.24 | 19.265 | 6.64 |
</details>
## LLaMA-3.1-8B, Ryzen-7950X CPU
![l3_tg_7950](https://github.com/user-attachments/assets/ffa3c090-1155-45a1-af25-5cd9501bb59e)
![l3_pp_7950](https://github.com/user-attachments/assets/cc941136-e853-4389-a5e5-46cc96344869)
<details>
<summary>LLaMA-3.1-8B, Ryzen-7950X CPU, mainline llama.cpp</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.142 | 162.97 | 9.757 | 13.12 |
| 512 | 128 | 512 | 3.843 | 133.21 | 10.188 | 12.56 |
| 512 | 128 | 1024 | 4.755 | 107.68 | 10.650 | 12.02 |
| 512 | 128 | 1536 | 5.603 | 91.37 | 11.111 | 11.52 |
| 512 | 128 | 2048 | 6.516 | 78.58 | 11.663 | 10.98 |
| 512 | 128 | 2560 | 7.336 | 69.79 | 11.965 | 10.70 |
| 512 | 128 | 3072 | 8.223 | 62.27 | 12.806 | 10.00 |
| 512 | 128 | 3584 | 8.933 | 57.32 | 13.365 | 9.58 |
| 512 | 128 | 4096 | 9.856 | 51.95 | 13.786 | 9.28 |
| 512 | 128 | 4608 | 10.706 | 47.82 | 14.193 | 9.02 |
| 512 | 128 | 5120 | 11.364 | 45.05 | 14.343 | 8.92 |
| 512 | 128 | 5632 | 12.454 | 41.11 | 14.798 | 8.65 |
| 512 | 128 | 6144 | 13.314 | 38.46 | 15.306 | 8.36 |
| 512 | 128 | 6656 | 14.295 | 35.82 | 16.040 | 7.98 |
| 512 | 128 | 7168 | 15.305 | 33.45 | 16.261 | 7.87 |
| 512 | 128 | 7680 | 16.176 | 31.65 | 16.296 | 7.85 |
| 512 | 128 | 8192 | 17.431 | 29.37 | 16.787 | 7.62 |
| 512 | 128 | 8704 | 18.729 | 27.34 | 17.301 | 7.40 |
| 512 | 128 | 9216 | 19.666 | 26.03 | 18.312 | 6.99 |
| 512 | 128 | 9728 | 20.288 | 25.24 | 18.825 | 6.80 |
| 512 | 128 | 10240 | 21.463 | 23.86 | 19.068 | 6.71 |
| 512 | 128 | 10752 | 23.474 | 21.81 | 19.701 | 6.50 |
| 512 | 128 | 11264 | 25.045 | 20.44 | 21.869 | 5.85 |
| 512 | 128 | 11776 | 27.214 | 18.81 | 21.128 | 6.06 |
| 512 | 128 | 12288 | 29.659 | 17.26 | 21.934 | 5.84 |
| 512 | 128 | 12800 | 32.139 | 15.93 | 22.233 | 5.76 |
| 512 | 128 | 13312 | 34.763 | 14.73 | 23.041 | 5.56 |
| 512 | 128 | 13824 | 34.760 | 14.73 | 24.010 | 5.33 |
| 512 | 128 | 14336 | 37.343 | 13.71 | 24.287 | 5.27 |
| 512 | 128 | 14848 | 42.109 | 12.16 | 25.254 | 5.07 |
| 512 | 128 | 15360 | 44.581 | 11.48 | 26.290 | 4.87 |
| 512 | 128 | 15872 | 45.159 | 11.34 | 25.655 | 4.99 |
</details>
<details>
<summary>LLaMA-3.1-8B, Ryzen-7950X CPU, ik_llama.cpp, main branch</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.812 | 282.53 | 9.859 | 12.98 |
| 512 | 128 | 512 | 1.856 | 275.84 | 9.971 | 12.84 |
| 512 | 128 | 1024 | 1.911 | 267.87 | 10.082 | 12.70 |
| 512 | 128 | 1536 | 1.976 | 259.05 | 10.207 | 12.54 |
| 512 | 128 | 2048 | 2.025 | 252.81 | 10.323 | 12.40 |
| 512 | 128 | 2560 | 2.078 | 246.34 | 10.442 | 12.26 |
| 512 | 128 | 3072 | 2.137 | 239.57 | 10.559 | 12.12 |
| 512 | 128 | 3584 | 2.210 | 231.72 | 10.674 | 11.99 |
| 512 | 128 | 4096 | 2.248 | 227.76 | 10.791 | 11.86 |
| 512 | 128 | 4608 | 2.299 | 222.75 | 10.909 | 11.73 |
| 512 | 128 | 5120 | 2.357 | 217.24 | 11.024 | 11.61 |
| 512 | 128 | 5632 | 2.408 | 212.60 | 11.140 | 11.49 |
| 512 | 128 | 6144 | 2.467 | 207.51 | 11.255 | 11.37 |
| 512 | 128 | 6656 | 2.519 | 203.22 | 11.369 | 11.26 |
| 512 | 128 | 7168 | 2.578 | 198.63 | 11.488 | 11.14 |
| 512 | 128 | 7680 | 2.628 | 194.79 | 11.607 | 11.03 |
| 512 | 128 | 8192 | 2.688 | 190.46 | 11.720 | 10.92 |
| 512 | 128 | 8704 | 2.742 | 186.70 | 11.842 | 10.81 |
| 512 | 128 | 9216 | 2.796 | 183.10 | 11.965 | 10.70 |
| 512 | 128 | 9728 | 2.848 | 179.75 | 12.078 | 10.60 |
| 512 | 128 | 10240 | 2.910 | 175.97 | 12.194 | 10.50 |
| 512 | 128 | 10752 | 2.964 | 172.76 | 12.319 | 10.39 |
| 512 | 128 | 11264 | 3.021 | 169.48 | 12.440 | 10.29 |
| 512 | 128 | 11776 | 3.077 | 166.40 | 12.547 | 10.20 |
| 512 | 128 | 12288 | 3.136 | 163.27 | 12.670 | 10.10 |
| 512 | 128 | 12800 | 3.193 | 160.33 | 12.799 | 10.00 |
| 512 | 128 | 13312 | 3.252 | 157.42 | 12.913 | 9.91 |
| 512 | 128 | 13824 | 3.309 | 154.71 | 13.018 | 9.83 |
| 512 | 128 | 14336 | 3.372 | 151.85 | 13.152 | 9.73 |
| 512 | 128 | 14848 | 3.429 | 149.30 | 13.270 | 9.65 |
| 512 | 128 | 15360 | 3.491 | 146.65 | 13.370 | 9.57 |
| 512 | 128 | 15872 | 3.554 | 144.08 | 13.496 | 9.48 |
</details>
<details>
<summary>LLaMA-3.1-8B, Ryzen-7950X CPU, ik_llama.cpp, PR</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.848 | 277.10 | 9.838 | 13.01 |
| 512 | 128 | 512 | 1.834 | 279.16 | 9.893 | 12.94 |
| 512 | 128 | 1024 | 1.891 | 270.70 | 9.971 | 12.84 |
| 512 | 128 | 1536 | 1.951 | 262.37 | 10.033 | 12.76 |
| 512 | 128 | 2048 | 2.003 | 255.62 | 10.082 | 12.70 |
| 512 | 128 | 2560 | 2.057 | 248.90 | 10.147 | 12.61 |
| 512 | 128 | 3072 | 2.111 | 242.51 | 10.200 | 12.55 |
| 512 | 128 | 3584 | 2.169 | 236.00 | 10.258 | 12.48 |
| 512 | 128 | 4096 | 2.217 | 230.97 | 10.314 | 12.41 |
| 512 | 128 | 4608 | 2.268 | 225.72 | 10.368 | 12.35 |
| 512 | 128 | 5120 | 2.322 | 220.51 | 10.423 | 12.28 |
| 512 | 128 | 5632 | 2.372 | 215.83 | 10.479 | 12.22 |
| 512 | 128 | 6144 | 2.430 | 210.68 | 10.538 | 12.15 |
| 512 | 128 | 6656 | 2.477 | 206.73 | 10.575 | 12.10 |
| 512 | 128 | 7168 | 2.530 | 202.39 | 10.626 | 12.05 |
| 512 | 128 | 7680 | 2.580 | 198.42 | 10.685 | 11.98 |
| 512 | 128 | 8192 | 2.637 | 194.15 | 10.738 | 11.92 |
| 512 | 128 | 8704 | 2.682 | 190.88 | 10.791 | 11.86 |
| 512 | 128 | 9216 | 2.740 | 186.87 | 10.847 | 11.80 |
| 512 | 128 | 9728 | 2.785 | 183.83 | 10.903 | 11.74 |
| 512 | 128 | 10240 | 2.849 | 179.69 | 10.959 | 11.68 |
| 512 | 128 | 10752 | 2.892 | 177.03 | 11.015 | 11.62 |
| 512 | 128 | 11264 | 2.949 | 173.60 | 11.068 | 11.56 |
| 512 | 128 | 11776 | 2.995 | 170.93 | 11.122 | 11.51 |
| 512 | 128 | 12288 | 3.058 | 167.45 | 11.179 | 11.45 |
| 512 | 128 | 12800 | 3.102 | 165.06 | 11.233 | 11.39 |
| 512 | 128 | 13312 | 3.164 | 161.82 | 11.285 | 11.34 |
| 512 | 128 | 13824 | 3.210 | 159.52 | 11.339 | 11.29 |
| 512 | 128 | 14336 | 3.271 | 156.54 | 11.394 | 11.23 |
| 512 | 128 | 14848 | 3.319 | 154.26 | 11.447 | 11.18 |
| 512 | 128 | 15360 | 3.380 | 151.49 | 11.504 | 11.13 |
| 512 | 128 | 15872 | 3.428 | 149.34 | 11.560 | 11.07 |
</details>
## LLaMA-3.1-8B, M2-Max CPU
![l3_tg_m2](https://github.com/user-attachments/assets/79f32577-cfa7-4034-998f-ba819fa6f294)
![l3_pp_m2](https://github.com/user-attachments/assets/be6834ec-ff5e-4eb6-869f-d373c0e7d71b)
<details>
<summary>LLaMA-3.1-8B, M2-Max CPU, mainline llama.cpp</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 4.775 | 107.22 | 4.909 | 26.08 |
| 512 | 128 | 512 | 6.157 | 83.15 | 5.462 | 23.43 |
| 512 | 128 | 1024 | 8.047 | 63.63 | 5.981 | 21.40 |
| 512 | 128 | 1536 | 9.752 | 52.50 | 6.553 | 19.53 |
| 512 | 128 | 2048 | 11.760 | 43.54 | 7.078 | 18.08 |
| 512 | 128 | 2560 | 13.010 | 39.36 | 7.527 | 17.01 |
| 512 | 128 | 3072 | 13.878 | 36.89 | 8.051 | 15.90 |
| 512 | 128 | 3584 | 15.967 | 32.07 | 8.611 | 14.87 |
| 512 | 128 | 4096 | 17.357 | 29.50 | 9.099 | 14.07 |
| 512 | 128 | 4608 | 17.953 | 28.52 | 9.664 | 13.25 |
| 512 | 128 | 5120 | 20.917 | 24.48 | 10.123 | 12.64 |
| 512 | 128 | 5632 | 21.812 | 23.47 | 10.720 | 11.94 |
| 512 | 128 | 6144 | 24.313 | 21.06 | 11.310 | 11.32 |
| 512 | 128 | 6656 | 26.592 | 19.25 | 12.010 | 10.66 |
| 512 | 128 | 7168 | 28.705 | 17.84 | 12.549 | 10.20 |
| 512 | 128 | 7680 | 29.934 | 17.10 | 13.435 | 9.53 |
</details>
<details>
<summary>LLaMA-3.1-8B, M2-Max CPU, ik_llama.cpp, main branch</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 4.026 | 127.16 | 4.793 | 26.70 |
| 512 | 128 | 512 | 4.150 | 123.36 | 4.949 | 25.87 |
| 512 | 128 | 1024 | 4.322 | 118.45 | 5.292 | 24.19 |
| 512 | 128 | 1536 | 4.524 | 113.18 | 5.263 | 24.32 |
| 512 | 128 | 2048 | 4.740 | 108.01 | 5.415 | 23.64 |
| 512 | 128 | 2560 | 4.966 | 103.11 | 5.558 | 23.03 |
| 512 | 128 | 3072 | 5.154 | 99.34 | 5.708 | 22.42 |
| 512 | 128 | 3584 | 5.330 | 96.06 | 5.930 | 21.59 |
| 512 | 128 | 4096 | 5.471 | 93.59 | 6.072 | 21.08 |
| 512 | 128 | 4608 | 5.636 | 90.85 | 6.161 | 20.78 |
| 512 | 128 | 5120 | 5.755 | 88.96 | 6.449 | 19.85 |
| 512 | 128 | 5632 | 5.919 | 86.50 | 6.473 | 19.78 |
| 512 | 128 | 6144 | 6.142 | 83.36 | 6.672 | 19.19 |
| 512 | 128 | 6656 | 6.242 | 82.03 | 6.838 | 18.72 |
| 512 | 128 | 7168 | 6.287 | 81.44 | 6.923 | 18.49 |
| 512 | 128 | 7680 | 6.406 | 79.93 | 7.077 | 18.09 |
</details>
<details>
<summary>LLaMA-3.1-8B, M2-Max CPU, ik_llama.cpp, PR</summary>
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 4.035 | 126.88 | 4.842 | 26.73 |
| 512 | 128 | 512 | 4.139 | 123.70 | 4.868 | 26.29 |
| 512 | 128 | 1024 | 4.250 | 120.46 | 4.955 | 25.83 |
| 512 | 128 | 1536 | 4.408 | 116.16 | 5.055 | 25.32 |
| 512 | 128 | 2048 | 4.605 | 111.19 | 5.181 | 24.70 |
| 512 | 128 | 2560 | 4.790 | 106.90 | 5.250 | 24.38 |
| 512 | 128 | 3072 | 5.022 | 101.96 | 5.362 | 23.87 |
| 512 | 128 | 3584 | 5.198 | 98.50 | 5.379 | 23.80 |
| 512 | 128 | 4096 | 5.395 | 94.90 | 5.460 | 23.44 |
| 512 | 128 | 4608 | 5.546 | 92.31 | 5.543 | 23.09 |
| 512 | 128 | 5120 | 5.671 | 90.28 | 5.717 | 22.39 |
| 512 | 128 | 5632 | 5.793 | 88.39 | 5.718 | 22.39 |
| 512 | 128 | 6144 | 5.967 | 85.80 | 5.820 | 21.99 |
| 512 | 128 | 6656 | 6.051 | 84.61 | 5.901 | 21.69 |
| 512 | 128 | 7168 | 6.147 | 83.29 | 5.972 | 21.43 |
| 512 | 128 | 7680 | 6.228 | 82.21 | 6.081 | 21.05 |
</details>
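
To put the tables above into a single figure per configuration, here is a small sketch that computes the speedup of this PR over the ik_llama.cpp main branch and over mainline llama.cpp at the deepest measured `N_KV`. The throughput values are copied from the last row of each table; the dictionary layout and the `speedups` helper are illustrative choices, not part of the PR.

```python
# Throughput in t/s at the deepest measured N_KV, copied from the tables above.
# Tuple order: (mainline llama.cpp, ik_llama.cpp main branch, this PR).
results = {
    "Gemma3-12B, Ryzen-7950X, N_KV=15872": {
        "pp": (26.24, 85.75, 96.24),
        "tg": (6.28, 5.32, 6.64),
    },
    "LLaMA-3.1-8B, Ryzen-7950X, N_KV=15872": {
        "pp": (11.34, 144.08, 149.34),
        "tg": (4.99, 9.48, 11.07),
    },
    "LLaMA-3.1-8B, M2-Max, N_KV=7680": {
        "pp": (17.10, 79.93, 82.21),
        "tg": (9.53, 18.09, 21.05),
    },
}

def speedups(triple):
    """Return (PR vs. main branch, PR vs. mainline) throughput ratios."""
    mainline, main, pr = triple
    return pr / main, pr / mainline

for name, row in results.items():
    for kind in ("pp", "tg"):
        vs_main, vs_mainline = speedups(row[kind])
        print(f"{name} [{kind.upper()}]: "
              f"{vs_main:.2f}x vs main, {vs_mainline:.2f}x vs mainline")
```

Ratios are taken on throughput, so values above 1 mean the PR is faster than the baseline it is compared against.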