🗣️ #266 - Benchmarking DeepSeek R1 - 16x3090
| Author | davidsyoung |
|---|---|
| Created | 2025-03-18 |
| Updated | 2025-03-21 |
Description
I wanted to create a resource for anyone looking to optimise -b, -ub and -amb together with -mla 2 -fa -fmoe when DeepSeek R1 is fully offloaded to CUDA with ik_llama.cpp @ dcdfad29f7.
Layers are not evenly spread over the 16 GPUs, and GPU utilisation is only 5-10% on average, at under 150 W per GPU.
I'm not sure how useful this is, but I ran it overnight. It hit an OOM error on -b 4096 pp8192, but I still feel it's useful!
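One way to digest a sweep like this is to parse the table rows and pick the fastest configuration per test. Below is a minimal Python sketch, assuming every data row has exactly the 13 columns of the table in this post. `parse_rows` and `best_per_test` are hypothetical helper names made up for illustration, not part of ik_llama.cpp, and `SAMPLE` holds only three pp8192 rows copied from the table.

```python
# Three pp8192 rows copied from the table in this post.
# Paste the full table into SAMPLE to analyse every configuration.
SAMPLE = """\
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.37 ± 0.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.47 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 274.27 ± 0.22 |
"""

# Column names, in the order used by the table header.
COLUMNS = ["model", "size", "params", "backend", "ngl", "n_batch", "n_ubatch",
           "fa", "mla", "amb", "fmoe", "test", "t/s"]


def parse_rows(markdown):
    """Yield one dict per data row, skipping header and separator rows."""
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS) or cells[0] in ("model", "---"):
            continue
        row = dict(zip(COLUMNS, cells))
        # The t/s cell looks like "183.37 ± 0.73": mean, then standard deviation.
        mean, _, stddev = row["t/s"].partition("±")
        row["mean"] = float(mean)
        row["stddev"] = float(stddev)
        yield row


def best_per_test(rows):
    """Return {test name: row with the highest mean t/s for that test}."""
    best = {}
    for row in rows:
        current = best.get(row["test"])
        if current is None or row["mean"] > current["mean"]:
            best[row["test"]] = row
    return best


if __name__ == "__main__":
    for test, row in best_per_test(parse_rows(SAMPLE)).items():
        print(f"{test}: {row['mean']} t/s at -b {row['n_batch']} "
              f"-ub {row['n_ubatch']} -amb {row['amb']}")
```

Comparing means alone ignores the ± spread, so near-ties (for example the tg rows, which barely move across settings) are better read as equal than as a ranking.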
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | mla | amb | fmoe | test | t/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp512 | 216.01 ± 4.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp1024 | 219.99 ± 2.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp2048 | 219.74 ± 1.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp4096 | 208.57 ± 0.58 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.37 ± 0.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.22 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg256 | 17.84 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg512 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg1024 | 18.02 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg2048 | 17.74 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.55 ± 2.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp1024 | 235.57 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp2048 | 226.29 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp4096 | 208.86 ± 0.10 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.56 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg128 | 17.23 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg256 | 17.87 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg512 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg1024 | 18.01 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg2048 | 17.75 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp512 | 239.67 ± 1.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp1024 | 235.22 ± 1.85 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp2048 | 225.73 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp4096 | 207.66 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp8192 | 179.22 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg128 | 17.25 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg256 | 17.85 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg512 | 18.05 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp512 | 239.69 ± 0.92 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp1024 | 235.48 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp2048 | 224.92 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp4096 | 205.77 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp8192 | 176.72 ± 0.14 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg128 | 17.21 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg256 | 17.85 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg512 | 18.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg1024 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg2048 | 17.77 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp512 | 236.20 ± 0.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp1024 | 233.43 ± 0.95 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp2048 | 222.88 ± 0.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp4096 | 203.34 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp8192 | 173.21 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg128 | 17.27 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg256 | 17.85 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg1024 | 18.02 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg2048 | 17.79 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp512 | 238.70 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp1024 | 303.92 ± 1.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp2048 | 295.71 ± 0.91 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp4096 | 276.63 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 244.18 ± 0.26 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg512 | 18.09 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp512 | 239.64 ± 1.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp1024 | 305.79 ± 0.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp2048 | 296.58 ± 0.75 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp4096 | 276.62 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 244.26 ± 0.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg2048 | 17.70 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp512 | 238.73 ± 1.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp1024 | 304.83 ± 0.61 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp2048 | 295.23 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp4096 | 275.28 ± 0.29 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp8192 | 239.76 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg128 | 17.21 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg256 | 17.82 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg512 | 18.05 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg1024 | 18.01 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg2048 | 17.71 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp512 | 237.98 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp1024 | 304.20 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp2048 | 293.80 ± 1.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp4096 | 272.19 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp8192 | 235.64 ± 0.42 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg128 | 17.14 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg256 | 17.79 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg512 | 18.02 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg1024 | 18.00 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp512 | 238.40 ± 1.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp1024 | 301.66 ± 1.64 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp2048 | 290.44 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp4096 | 267.12 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.98 ± 0.19 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg128 | 17.16 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg256 | 17.76 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg512 | 18.01 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg1024 | 17.97 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg2048 | 17.73 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp512 | 240.23 ± 1.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp1024 | 305.03 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp2048 | 349.22 ± 0.37 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp4096 | 327.33 ± 0.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.90 ± 0.26 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.21 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg256 | 17.84 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg512 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg1024 | 18.01 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg2048 | 17.74 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.12 ± 3.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp1024 | 305.13 ± 1.86 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp2048 | 349.84 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp4096 | 328.46 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.47 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg128 | 17.24 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg256 | 17.81 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg512 | 18.02 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg2048 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp512 | 238.52 ± 1.44 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp1024 | 304.77 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp2048 | 348.11 ± 0.69 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp4096 | 326.30 ± 0.69 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp8192 | 288.35 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg128 | 17.24 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg256 | 17.88 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg512 | 18.07 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg1024 | 18.05 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp512 | 238.42 ± 1.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp1024 | 304.32 ± 1.66 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp2048 | 344.70 ± 1.92 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp4096 | 323.64 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp8192 | 283.02 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg128 | 17.22 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg1024 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg2048 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp512 | 236.64 ± 1.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp1024 | 301.44 ± 1.56 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp2048 | 343.13 ± 0.36 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp4096 | 317.60 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 274.27 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg512 | 18.08 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg1024 | 18.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp512 | 238.37 ± 1.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp1024 | 304.95 ± 1.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp2048 | 349.14 ± 0.52 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp4096 | 327.89 ± 0.19 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp8192 | 291.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg128 | 17.25 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg256 | 17.81 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg512 | 18.06 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp512 | 238.06 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp1024 | 304.73 ± 0.74 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp2048 | 348.72 ± 1.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp4096 | 328.20 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp8192 | 290.87 ± 0.49 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg256 | 17.88 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg2048 | 17.72 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp512 | 239.80 ± 0.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp1024 | 306.38 ± 1.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp2048 | 348.17 ± 0.55 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp4096 | 325.50 ± 0.88 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp8192 | 288.20 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg128 | 17.25 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg256 | 17.83 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg512 | 18.10 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg1024 | 18.06 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg2048 | 17.76 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp512 | 237.92 ± 2.32 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp1024 | 304.37 ± 0.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp2048 | 347.09 ± 0.66 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp4096 | 323.48 ± 0.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp8192 | 283.28 ± 0.14 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg128 | 17.20 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg512 | 18.05 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp512 | 238.77 ± 2.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp1024 | 302.54 ± 0.90 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp2048 | 342.62 ± 0.56 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp4096 | 317.58 ± 0.10 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp8192 | 274.23 ± 0.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg1024 | 17.98 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg2048 | 17.78 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp512 | 240.30 ± 2.99 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp1024 | 236.20 ± 1.81 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp2048 | 226.46 ± 0.49 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp4096 | 209.52 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.03 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.24 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg512 | 18.08 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg1024 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg2048 | 17.77 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.21 ± 0.99 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp1024 | 236.32 ± 1.53 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp2048 | 225.41 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp4096 | 209.14 ± 0.30 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.42 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg128 | 17.24 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg256 | 17.86 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg1024 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg2048 | 17.78 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp512 | 239.31 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp1024 | 234.58 ± 0.88 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp2048 | 224.77 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp4096 | 207.35 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp8192 | 178.79 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg128 | 17.26 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp512 | 239.12 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp1024 | 235.30 ± 1.41 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp2048 | 224.94 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp4096 | 206.20 ± 0.28 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp8192 | 176.54 ± 0.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg128 | 17.29 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg1024 | 17.99 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp512 | 238.94 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp1024 | 233.23 ± 0.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp2048 | 222.40 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp4096 | 203.04 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp8192 | 173.09 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg128 | 17.25 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg512 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg2048 | 17.76 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp512 | 239.80 ± 0.48 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp1024 | 305.07 ± 0.33 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp2048 | 295.09 ± 0.13 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp4096 | 275.70 ± 0.25 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 243.52 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg128 | 17.25 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg256 | 17.87 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg512 | 18.03 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg1024 | 17.97 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp512 | 241.05 ± 0.59 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp1024 | 304.85 ± 1.84 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp2048 | 295.04 ± 0.48 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp4096 | 276.20 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 243.36 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg128 | 17.17 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg512 | 18.00 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg1024 | 17.98 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg2048 | 17.76 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp512 | 238.47 ± 0.34 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp1024 | 305.42 ± 1.32 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp2048 | 295.28 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp4096 | 274.18 ± 0.37 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp8192 | 239.55 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg256 | 17.85 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg512 | 17.99 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp512 | 239.49 ± 0.90 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp1024 | 303.09 ± 1.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp2048 | 292.21 ± 1.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp4096 | 271.27 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp8192 | 234.84 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg128 | 17.23 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg256 | 17.83 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg512 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg2048 | 17.73 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp512 | 238.09 ± 1.33 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp1024 | 302.10 ± 0.35 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp2048 | 289.34 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp4096 | 266.76 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.52 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg128 | 17.29 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg256 | 17.80 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg2048 | 17.74 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp512 | 239.40 ± 0.85 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp1024 | 304.81 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp2048 | 348.47 ± 1.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp4096 | 327.77 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.58 ± 0.18 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg256 | 17.86 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg512 | 18.08 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg1024 | 18.01 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg2048 | 17.67 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.10 ± 1.34 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp1024 | 304.24 ± 2.13 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp2048 | 348.34 ± 0.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp4096 | 327.32 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.58 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg256 | 17.83 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg2048 | 17.71 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp512 | 239.16 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp1024 | 304.15 ± 0.87 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp2048 | 347.30 ± 0.52 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp4096 | 325.70 ± 0.67 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp8192 | 287.87 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg128 | 17.20 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg256 | 17.82 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg512 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg1024 | 18.01 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp512 | 240.31 ± 3.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp1024 | 303.77 ± 1.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp2048 | 346.19 ± 0.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp4096 | 323.25 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp8192 | 282.42 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg128 | 17.18 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg512 | 17.99 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg1024 | 18.02 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp512 | 237.68 ± 1.86 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp1024 | 302.20 ± 1.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp2048 | 342.06 ± 0.96 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp4096 | 317.32 ± 0.50 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 273.87 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg256 | 17.85 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg512 | 18.03 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp512 | 238.93 ± 0.91 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp1024 | 305.36 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp2048 | 348.42 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp4096 | 346.42 ± 0.52 |
Feel free to create whichever graphs you find interesting from this. There's a lot of data, so it's quite hard to isolate trends:
PP
TG shows no notable difference.
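
One way to isolate trends from a table this large is to parse the rows and pick the fastest configuration per test. Here is a minimal Python sketch, assuming the rows are pasted as text in the same column order as the table header. The helper names `parse_rows` and `best_per_test` are hypothetical, and only three sample rows from the table are included:

```python
# Column names in the order they appear in the llama-bench table header.
COLUMNS = ["model", "size", "params", "backend", "ngl", "n_batch",
           "n_ubatch", "fa", "mla", "amb", "fmoe", "test", "t/s"]

# Three rows copied from the table above; paste the full table to analyse everything.
SAMPLE = """
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 243.36 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.58 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 273.87 ± 0.54 |
"""

def parse_rows(text):
    """Turn markdown table rows into dicts, skipping header, separator and blank lines."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS) or cells[0] == "model" or cells[0].startswith("---"):
            continue
        row = dict(zip(COLUMNS, cells))
        # The t/s cell looks like "290.58 ± 0.09": mean, then standard deviation.
        mean, _, std = row["t/s"].partition("±")
        row["mean"] = float(mean)
        row["std"] = float(std)
        rows.append(row)
    return rows

def best_per_test(rows):
    """For each test name (pp512, tg128, ...), keep the row with the highest mean t/s."""
    best = {}
    for row in rows:
        current = best.get(row["test"])
        if current is None or row["mean"] > current["mean"]:
            best[row["test"]] = row
    return best

rows = parse_rows(SAMPLE)
best = best_per_test(rows)
for test, row in best.items():
    print(test, row["n_batch"], row["n_ubatch"], row["amb"], row["mean"])
```

With the three sample rows, the best pp8192 result is the `n_ubatch` 2048, `amb` 512 row at 290.58 t/s. The same `rows` list could be fed to a plotting library to graph t/s against `n_ubatch` or `amb`.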
🗣️ Discussion
👤 davidsyoung replied the 2025-03-18 at 09:37:29:
Mixed quant of Q8 for attn, Q5 down / IQ4_XS up|gate for layers 3-8, and IQ4_XS down / IQ3_S up|gate for layers 9-60.
| Component | Blocks 0-2 | Blocks 3-8 | Blocks 9-60 |
|---|---|---|---|
| Attention Query/Key/Value | q8_0 | q8_0 | q8_0 |
| Attention Output | q8_0 | q8_0 | q8_0 |
| FFN Down (regular) | q8_0 | - | - |
| FFN Gate/Up (regular) | q8_0 | - | - |
| FFN Down Shared Experts | - | q5_K | q5_K |
| FFN Gate/Up Shared Experts | - | q5_K | q5_K |
| FFN Down Experts | - | q5_K | iq4_xs |
| FFN Gate/Up Experts | - | iq4_xs | iq3_s |
| Output Layer | q8_0 | q8_0 | q8_0 |
| Compression Results | | | |
| Original size: 1,282,038 MB (~1.2 TB) | | | |
| Quantized size: 314,569 MB (~307 GB) | | | |
| Compression ratio: 4.1x | | | |
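
The compression figures can be sanity-checked with a little arithmetic. This is a sketch that assumes "MB" in the table means MiB (2**20 bytes), which is what makes 314,569 MB line up with the 307.20 GiB reported by llama-bench. It also assumes the 672.05 B parameter count from the benchmark table applies to both the original and the quantized model:

```python
original_mb = 1_282_038   # original size from the table (assumed to be MiB)
quantized_mb = 314_569    # quantized size from the table (assumed to be MiB)
params = 672.05e9         # parameter count reported by llama-bench

ratio = original_mb / quantized_mb
quantized_gib = quantized_mb / 1024

# Average bits per weight = total bits / number of parameters.
original_bpw = original_mb * 2**20 * 8 / params
quantized_bpw = quantized_mb * 2**20 * 8 / params

print(f"compression ratio: {ratio:.1f}x")
print(f"quantized size:    {quantized_gib:.1f} GiB")
print(f"bits per weight:   {original_bpw:.2f} -> {quantized_bpw:.2f}")
```

Under those assumptions the original works out to roughly 16 bits per weight, which would be consistent with a 16-bit source, and the mixed quant lands a little under 4 bits per weight on average.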
PPL
perplexity: tokenizing the input ..
perplexity: tokenization took 1195.26 ms
perplexity: calculating perplexity over 561 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 11.69 seconds per pass - ETA 27.32 minutes
[1]2.5779,[2]3.3447,[3]2.4073,[4]2.0140,[5]1.8352,[6]1.6862,[7]1.5895,[8]1.5208,[9]1.4715,[10]1.4284,[11]1.4147,[12]1.4406,[13]1.4529,[14]1.5824,[15]1.7144,[16]1.7752,[17]1.9408,[18]2.0703,[19]2.0333,[20]2.0250,[21]2.1305,[22]2.1021,[23]2.0764,[24]2.0880,[25]2.0581,[26]2.0330,[27]2.0797,[28]2.0888,[29]2.1391,[30]2.1698,[31]2.2044,[32]2.2227,[33]2.2626,[34]2.3049,[35]2.3566,[36]2.4115,[37]2.4463,[38]2.4930,[39]2.5346,[40]2.5926,[41]2.6353,[42]2.6458,[43]2.6948,[44]2.7107,[45]2.7909,[46]2.8420,[47]2.8003,[48]2.7549,[49]2.7298,[50]2.7498,[51]2.7964,[52]2.8105,[53]2.8597,[54]2.8734,[55]2.9047,[56]2.9384,[57]2.9550,[58]2.9926,[59]3.0027,[60]3.0502,[61]3.0906,[62]3.1475,[63]3.1812,[64]3.2262,[65]3.2360,[66]3.2179,[67]3.1954,[68]3.2271,[69]3.2225,[70]3.2377,[71]3.2562,[72]3.2726,[73]3.2860,[74]3.3095,[75]3.2881,[76]3.2396,[77]3.1959,[78]3.1931,[79]3.1728,[80]3.1563,[81]3.1190,[82]3.1220,[83]3.0918,[84]3.0554,[85]3.0218,[86]2.9995,[87]2.9958,[88]2.9686,[89]2.9537,[90]2.9261,[91]2.8966,[92]2.8704,[93]2.8441,[94]2.8196,[95]2.7964,[96]2.7947,[97]2.8024,[98]2.7882,[99]2.7728,[100]2.7752,[101]2.7671,[102]2.7843,[103]2.8105,[104]2.8288,[105]2.8261,[106]2.8486,[107]2.8737,[108]2.8953,[109]2.9296,[110]2.9637,[111]2.9837,[112]2.9567,[113]2.9436,[114]2.9207,[115]2.9047,[116]2.8905,[117]2.8672,[118]2.8450,[119]2.8235,[120]2.8040,[121]2.7884,[122]2.7698,[123]2.7532,[124]2.7334,[125]2.7156,[126]2.6981,[127]2.6840,[128]2.6757,[129]2.6662,[130]2.6551,[131]2.6472,[132]2.6548,[133]2.6649,[134]2.6714,[135]2.6822,[136]2.6990,[137]2.7145,[138]2.7231,[139]2.7348,[140]2.7353,[141]2.7368,[142]2.7356,[143]2.7359,[144]2.7320,[145]2.7228,[146]2.7211,[147]2.7254,[148]2.7248,[149]2.7265,[150]2.7210,[151]2.7192,[152]2.7157,[153]2.7114,[154]2.7119,[155]2.7159,[156]2.7180,[157]2.7237,[158]2.7322,[159]2.7339,[160]2.7428,[161]2.7509,[162]2.7605,[163]2.7660,[164]2.7863,[165]2.8095,[166]2.8270,[167]2.8399,[168]2.8647,[169]2.8872,[170]2.9083,[171]2.9311,[172]2.9150,[173]2.8980,[174]2.8843,[175]2.8712,
[176]2.8589,[177]2.8467,[178]2.8338,[179]2.8193,[180]2.8228,[181]2.8370,[182]2.8519,[183]2.8669,[184]2.8813,[185]2.8915,[186]2.9083,[187]2.9241,[188]2.9381,[189]2.9489,[190]2.9490,[191]2.9561,[192]2.9601,[193]2.9652,[194]2.9848,[195]2.9935,[196]3.0068,[197]3.0167,[198]3.0211,[199]3.0267,[200]3.0261,[201]3.0415,[202]3.0361,[203]3.0413,[204]3.0446,[205]3.0447,[206]3.0468,[207]3.0552,[208]3.0645,[209]3.0737,[210]3.0738,[211]3.0688,[212]3.0689,[213]3.0765,[214]3.0781,[215]3.0837,[216]3.0847,[217]3.0805,[218]3.0804,[219]3.0811,[220]3.0800,[221]3.0803,[222]3.0803,[223]3.0805,[224]3.0856,[225]3.0871,[226]3.0791,[227]3.0772,[228]3.0792,[229]3.0835,[230]3.0900,[231]3.0962,[232]3.0880,[233]3.0801,[234]3.0803,[235]3.0787,[236]3.0879,[237]3.0957,[238]3.1050,[239]3.1151,[240]3.1241,[241]3.1353,[242]3.1498,[243]3.1632,[244]3.1713,[245]3.1831,[246]3.1937,[247]3.1927,[248]3.1884,[249]3.1867,[250]3.1804,[251]3.1782,[252]3.1805,[253]3.1841,[254]3.1910,[255]3.1971,[256]3.2005,[257]3.2032,[258]3.2042,[259]3.2076,[260]3.2098,[261]3.2107,[262]3.2099,[263]3.2158,[264]3.2179,[265]3.2182,[266]3.2199,[267]3.2230,[268]3.2267,[269]3.2298,[270]3.2290,[271]3.2271,[272]3.2205,[273]3.2208,[274]3.2143,[275]3.2037,[276]3.1934,[277]3.1951,[278]3.2052,[279]3.2115,[280]3.2195,[281]3.2272,[282]3.2333,[283]3.2398,[284]3.2466,[285]3.2603,[286]3.2626,[287]3.2661,[288]3.2707,[289]3.2732,[290]3.2648,[291]3.2557,[292]3.2544,[293]3.2536,[294]3.2513,[295]3.2487,[296]3.2507,[297]3.2513,[298]3.2562,[299]3.2620,[300]3.2651,[301]3.2691,[302]3.2713,[303]3.2734,[304]3.2726,[305]3.2845,[306]3.2922,[307]3.3033,[308]3.2916,[309]3.2865,[310]3.2769,[311]3.2804,[312]3.2825,[313]3.2893,[314]3.2915,[315]3.2946,[316]3.2959,[317]3.2974,[318]3.2979,[319]3.2982,[320]3.3026,[321]3.3028,[322]3.3042,[323]3.3106,[324]3.3112,[325]3.3167,[326]3.3214,[327]3.3255,[328]3.3282,[329]3.3297,[330]3.3360,[331]3.3396,[332]3.3443,[333]3.3428,[334]3.3425,[335]3.3428,[336]3.3429,[337]3.3437,[338]3.3441,[339]3.3466,[340]3.3502,[341]3.3555,[342]3.3649,
[343]3.3744,[344]3.3797,[345]3.3713,[346]3.3640,[347]3.3597,[348]3.3523,[349]3.3488,[350]3.3471,[351]3.3521,[352]3.3671,[353]3.3761,[354]3.3892,[355]3.3977,[356]3.4029,[357]3.4148,[358]3.4246,[359]3.4279,[360]3.4346,[361]3.4439,[362]3.4526,[363]3.4586,[364]3.4649,[365]3.4715,[366]3.4822,[367]3.4909,[368]3.4975,[369]3.5054,[370]3.5138,[371]3.5277,[372]3.5368,[373]3.5401,[374]3.5435,[375]3.5485,[376]3.5616,[377]3.5727,[378]3.5754,[379]3.5749,[380]3.5715,[381]3.5762,[382]3.5816,[383]3.5853,[384]3.5894,[385]3.5931,[386]3.5996,[387]3.6055,[388]3.6087,[389]3.5980,[390]3.5883,[391]3.5774,[392]3.5715,[393]3.5623,[394]3.5535,[395]3.5438,[396]3.5336,[397]3.5245,[398]3.5146,[399]3.5042,[400]3.4963,[401]3.4863,[402]3.4756,[403]3.4668,[404]3.4563,[405]3.4465,[406]3.4364,[407]3.4270,[408]3.4178,[409]3.4090,[410]3.4031,[411]3.4038,[412]3.3993,[413]3.4012,[414]3.4038,[415]3.4009,[416]3.4009,[417]3.4034,[418]3.3979,[419]3.3991,[420]3.3966,[421]3.3953,[422]3.3970,[423]3.3964,[424]3.4006,[425]3.4005,[426]3.4009,[427]3.3997,[428]3.4021,[429]3.4037,[430]3.4064,[431]3.4074,[432]3.4064,[433]3.4027,[434]3.4028,[435]3.3956,[436]3.3891,[437]3.3851,[438]3.3833,[439]3.3805,[440]3.3855,[441]3.3905,[442]3.3979,[443]3.3964,[444]3.3972,[445]3.3983,[446]3.4029,[447]3.4058,[448]3.4083,[449]3.4114,[450]3.4154,[451]3.4184,[452]3.4206,[453]3.4223,[454]3.4208,[455]3.4229,[456]3.4232,[457]3.4257,[458]3.4311,[459]3.4317,[460]3.4318,[461]3.4284,[462]3.4322,[463]3.4396,[464]3.4448,[465]3.4381,[466]3.4361,[467]3.4344,[468]3.4355,[469]3.4328,[470]3.4301,[471]3.4304,[472]3.4311,[473]3.4304,[474]3.4295,[475]3.4308,[476]3.4290,[477]3.4282,[478]3.4288,[479]3.4307,[480]3.4334,[481]3.4290,[482]3.4325,[483]3.4316,[484]3.4353,[485]3.4416,[486]3.4444,[487]3.4479,[488]3.4531,[489]3.4555,[490]3.4603,[491]3.4665,[492]3.4709,[493]3.4707,[494]3.4719,[495]3.4746,[496]3.4764,[497]3.4794,[498]3.4798,[499]3.4790,[500]3.4832,[501]3.4877,[502]3.4865,[503]3.4849,[504]3.4871,[505]3.4905,[506]3.4988,[507]3.5016,[508]3.5050,[509]3.4973,
[510]3.4914,[511]3.4851,[512]3.4810,[513]3.4750,[514]3.4738,[515]3.4761,[516]3.4714,[517]3.4713,[518]3.4704,[519]3.4710,[520]3.4755,[521]3.4744,[522]3.4730,[523]3.4790,[524]3.4775,[525]3.4761,[526]3.4715,[527]3.4663,[528]3.4628,[529]3.4599,[530]3.4568,[531]3.4536,[532]3.4479,[533]3.4415,[534]3.4370,[535]3.4382,[536]3.4410,[537]3.4443,[538]3.4469,[539]3.4496,[540]3.4550,[541]3.4584,[542]3.4607,[543]3.4552,[544]3.4512,[545]3.4508,[546]3.4440,[547]3.4374,[548]3.4307,[549]3.4240,[550]3.4178,[551]3.4116,[552]3.4060,[553]3.4002,[554]3.3983,[555]3.3970,[556]3.3998,[557]3.4039,[558]3.4098,[559]3.4145,[560]3.4197,[561]3.4178,
Final estimate: PPL = 3.4178 +/- 0.01891
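
Each `[N]value` token in the log above appears to be the running perplexity estimate after chunk N, since the last one, `[561]3.4178`, matches the final estimate. To plot or inspect how the estimate converges, the tokens can be pulled out with a regular expression. This is a small sketch in which `parse_running_ppl` is a hypothetical helper and `LOG` holds only a short excerpt of the real output:

```python
import re

# A short excerpt of the log above; paste the full output to analyse all 561 chunks.
LOG = "[1]2.5779,[2]3.3447,[3]2.4073,[559]3.4145,[560]3.4197,[561]3.4178,"

def parse_running_ppl(log):
    """Return (chunk_index, perplexity) tuples parsed from `[N]value,` tokens."""
    # Strip newlines first in case the pasted log was hard-wrapped mid-number.
    log = log.replace("\n", "")
    return [(int(i), float(v)) for i, v in re.findall(r"\[(\d+)\](\d+\.\d+)", log)]

points = parse_running_ppl(LOG)
last_chunk, last_ppl = points[-1]
print(f"{len(points)} points parsed, last: chunk {last_chunk} -> PPL {last_ppl}")
```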
👤 fredlas replied the 2025-03-19 at 15:49:40:
Were you thinking of uploading this to huggingface, by any chance? I can reproduce and upload it myself if necessary, but I haven't downloaded the full R1 weights yet, and would be happy to continue avoiding that if possible!
👤 ubergarm replied the 2025-03-19 at 22:37:04:
@fredlas do you have any specific hardware configuration in mind? e.g. how much system RAM, and GPUs / VRAM? I put together rough notes on making your own custom quant in this quick-start guide discussion. I believe @davidsyoung has tailored the quant specifically to his 16x3090 = 384 GB VRAM setup.
I've made a couple of quants now and have one okay one for a 256GB RAM + 24GB VRAM single GPU configuration, with better perplexity than unsloth `UD-Q2_K_XL` but just a little bit slower. I'm still experimenting to see how the various types affect generation speed vs perplexity while fitting inside the envelope of my current hardware.
You can get started with `ik_llama.cpp`, including `-mla 2` and repacked quants, now with an existing unsloth quant or whatever you have, probably. (Sorry if you already know this, I'm still new here!) Cheers!
👤 davidsyoung replied the 2025-03-19 at 23:18:56:
I might be able to upload if you give me enough time, however, I actually recommend getting used to quanting as there’s a lot of tweaking you may want to do.
For example, I don’t actually think this quant suits my setup best yet, and I’m actually underutilising one GPU. I just haven’t found a way to split the layers that well yet.
👤 fredlas replied the 2025-03-21 at 02:37:16:
@ubergarm 307GiB happens to be right around the size I'm thinking of. 72GiB VRAM + 256GiB RAM, for queuing up jobs to run overnight with 16k context - should just fit in there, I think. Funny coincidence for an extremely different configuration! Thanks for that guide - I made my own quants of Wizard2 8x22B a while back, but long enough ago that I was probably going to have to basically relearn it.
@davidsyoung I'd say don't upload them just for my sake if you weren't already planning to - I just thought I'd check in case I could stay lazy. Plus this size range is probably pretty niche anyways; might not really be worth it in terms of helping people.
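As a rough check of the "should just fit" estimate, the combined memory can be compared against the model size. This sketch uses only the numbers quoted in the thread. It does not try to estimate the KV cache at 16k context, the compute buffers, or what the OS needs, all of which would have to come out of the remaining headroom:

```python
vram_gib = 72        # GPU memory quoted above
ram_gib = 256        # system memory quoted above
model_gib = 307.20   # model size reported by llama-bench in the tables

total_gib = vram_gib + ram_gib
headroom_gib = total_gib - model_gib

print(f"total memory: {total_gib} GiB")
print(f"headroom after weights: {headroom_gib:.1f} GiB")
```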
👤 ikawrakow replied the 2025-03-18 at 09:44:15:
Thank you for this. I think it can be really useful for people.
👤 saood06 replied the 2025-03-18 at 20:14:25:
@ikawrakow Can I convert this to a discussion?
👤 davidsyoung replied the 2025-03-18 at 20:19:37:
All good with me @saood06
👤 ikawrakow replied the 2025-03-18 at 20:29:32:
> @ikawrakow Can I convert this to a discussion?
Sure, go ahead