### 🗣️ [#266](https://github.com/ikawrakow/ik_llama.cpp/discussions/266) - Benchmarking DeepSeek R1 - 16x3090
| **Author** | `davidsyoung` |
| :--- | :--- |
| **Created** | 2025-03-18 |
| **Updated** | 2025-03-21 |
---
#### Description
I wanted to create a resource for anyone looking to optimise `-b`, `-ub` and `-amb` together with `-mla 2 -fa -fmoe` when fully offloading DeepSeek R1 to CUDA with ik_llama.cpp @ https://github.com/ikawrakow/ik_llama.cpp/commit/dcdfad29f7d2b831f1c84751f00bda14cc359a84.
Layers are not evenly spread over the 16 GPUs, and GPU utilisation is only 5-10% on average, at under 150 W per GPU.
I'm not sure how useful this is, but I ran it overnight. It hit an OOM error on `-b 4096 pp8192`, but I still feel it's useful!
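
With this many rows it can help to reduce the table to the fastest configuration per test. The following is a minimal sketch of how that could be done, not something that was used to produce these results. It assumes the rows keep the column names shown in the table header (`n_batch`, `n_ubatch`, `amb`, `test`, `t/s`); the helper names `parse_rows` and `best_per_test` are made up for illustration, and `SAMPLE` holds only five rows copied from the table, so the "best" it reports is the best among those five, not across the whole sweep.

```python
# Sketch: pick the fastest configuration per test from llama-bench style
# markdown rows. Function names are hypothetical; SAMPLE is a small excerpt.

SAMPLE = """\
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | mla | amb | fmoe | test | t/s |
| --- | ---: | ---: | --- | --: | --: | --: | -: | --: | --: | --: | --: | --: |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.37 ± 0.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.47 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 274.27 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.22 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.02 |
"""


def parse_rows(markdown):
    """Turn markdown table text into a list of dicts keyed by column name."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the delimiter row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        row = dict(zip(header, cells))
        # "183.37 ± 0.73" -> mean and standard deviation as floats
        mean, _, std = row["t/s"].partition("±")
        row["t/s"] = float(mean)
        row["std"] = float(std)
        rows.append(row)
    return rows


def best_per_test(rows):
    """Return {test name: row with the highest mean t/s for that test}."""
    best = {}
    for row in rows:
        current = best.get(row["test"])
        if current is None or row["t/s"] > current["t/s"]:
            best[row["test"]] = row
    return best


best = best_per_test(parse_rows(SAMPLE))
for test, row in best.items():
    print(f"{test}: {row['t/s']} t/s with "
          f"-b {row['n_batch']} -ub {row['n_ubatch']} -amb {row['amb']}")
```

To run it over the full results, replace `SAMPLE` with the complete table text. Note that it compares mean t/s only; differences smaller than the reported ± values (as in the `tg` rows) are probably not meaningful.
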
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | mla | amb | fmoe | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --: | ----: | ---: | ------------: | ---------------: |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp512 | 216.01 ± 4.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp1024 | 219.99 ± 2.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp2048 | 219.74 ± 1.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp4096 | 208.57 ± 0.58 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.37 ± 0.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.22 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg256 | 17.84 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg512 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg1024 | 18.02 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg2048 | 17.74 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.55 ± 2.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp1024 | 235.57 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp2048 | 226.29 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp4096 | 208.86 ± 0.10 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.56 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg128 | 17.23 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg256 | 17.87 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg512 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg1024 | 18.01 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg2048 | 17.75 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp512 | 239.67 ± 1.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp1024 | 235.22 ± 1.85 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp2048 | 225.73 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp4096 | 207.66 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp8192 | 179.22 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg128 | 17.25 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg256 | 17.85 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg512 | 18.05 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp512 | 239.69 ± 0.92 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp1024 | 235.48 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp2048 | 224.92 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp4096 | 205.77 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp8192 | 176.72 ± 0.14 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg128 | 17.21 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg256 | 17.85 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg512 | 18.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg1024 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg2048 | 17.77 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp512 | 236.20 ± 0.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp1024 | 233.43 ± 0.95 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp2048 | 222.88 ± 0.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp4096 | 203.34 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp8192 | 173.21 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg128 | 17.27 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg256 | 17.85 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg1024 | 18.02 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg2048 | 17.79 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp512 | 238.70 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp1024 | 303.92 ± 1.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp2048 | 295.71 ± 0.91 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp4096 | 276.63 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 244.18 ± 0.26 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg512 | 18.09 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp512 | 239.64 ± 1.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp1024 | 305.79 ± 0.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp2048 | 296.58 ± 0.75 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp4096 | 276.62 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 244.26 ± 0.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg2048 | 17.70 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp512 | 238.73 ± 1.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp1024 | 304.83 ± 0.61 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp2048 | 295.23 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp4096 | 275.28 ± 0.29 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp8192 | 239.76 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg128 | 17.21 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg256 | 17.82 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg512 | 18.05 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg1024 | 18.01 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg2048 | 17.71 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp512 | 237.98 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp1024 | 304.20 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp2048 | 293.80 ± 1.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp4096 | 272.19 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp8192 | 235.64 ± 0.42 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg128 | 17.14 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg256 | 17.79 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg512 | 18.02 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg1024 | 18.00 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp512 | 238.40 ± 1.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp1024 | 301.66 ± 1.64 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp2048 | 290.44 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp4096 | 267.12 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.98 ± 0.19 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg128 | 17.16 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg256 | 17.76 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg512 | 18.01 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg1024 | 17.97 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg2048 | 17.73 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp512 | 240.23 ± 1.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp1024 | 305.03 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp2048 | 349.22 ± 0.37 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp4096 | 327.33 ± 0.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.90 ± 0.26 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.21 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg256 | 17.84 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg512 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg1024 | 18.01 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg2048 | 17.74 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.12 ± 3.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp1024 | 305.13 ± 1.86 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp2048 | 349.84 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp4096 | 328.46 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.47 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg128 | 17.24 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg256 | 17.81 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg512 | 18.02 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg2048 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp512 | 238.52 ± 1.44 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp1024 | 304.77 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp2048 | 348.11 ± 0.69 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp4096 | 326.30 ± 0.69 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp8192 | 288.35 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg128 | 17.24 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg256 | 17.88 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg512 | 18.07 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg1024 | 18.05 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp512 | 238.42 ± 1.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp1024 | 304.32 ± 1.66 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp2048 | 344.70 ± 1.92 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp4096 | 323.64 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp8192 | 283.02 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg128 | 17.22 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg1024 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg2048 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp512 | 236.64 ± 1.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp1024 | 301.44 ± 1.56 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp2048 | 343.13 ± 0.36 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp4096 | 317.60 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 274.27 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg512 | 18.08 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg1024 | 18.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp512 | 238.37 ± 1.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp1024 | 304.95 ± 1.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp2048 | 349.14 ± 0.52 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp4096 | 327.89 ± 0.19 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp8192 | 291.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg128 | 17.25 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg256 | 17.81 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg512 | 18.06 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp512 | 238.06 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp1024 | 304.73 ± 0.74 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp2048 | 348.72 ± 1.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp4096 | 328.20 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp8192 | 290.87 ± 0.49 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg256 | 17.88 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg2048 | 17.72 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp512 | 239.80 ± 0.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp1024 | 306.38 ± 1.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp2048 | 348.17 ± 0.55 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp4096 | 325.50 ± 0.88 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp8192 | 288.20 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg128 | 17.25 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg256 | 17.83 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg512 | 18.10 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg1024 | 18.06 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg2048 | 17.76 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp512 | 237.92 ± 2.32 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp1024 | 304.37 ± 0.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp2048 | 347.09 ± 0.66 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp4096 | 323.48 ± 0.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp8192 | 283.28 ± 0.14 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg128 | 17.20 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg512 | 18.05 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp512 | 238.77 ± 2.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp1024 | 302.54 ± 0.90 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp2048 | 342.62 ± 0.56 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp4096 | 317.58 ± 0.10 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp8192 | 274.23 ± 0.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg1024 | 17.98 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg2048 | 17.78 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp512 | 240.30 ± 2.99 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp1024 | 236.20 ± 1.81 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp2048 | 226.46 ± 0.49 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp4096 | 209.52 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.03 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.24 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg512 | 18.08 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg1024 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg2048 | 17.77 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.21 ± 0.99 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp1024 | 236.32 ± 1.53 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp2048 | 225.41 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp4096 | 209.14 ± 0.30 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.42 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg128 | 17.24 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg256 | 17.86 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg1024 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg2048 | 17.78 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp512 | 239.31 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp1024 | 234.58 ± 0.88 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp2048 | 224.77 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp4096 | 207.35 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp8192 | 178.79 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg128 | 17.26 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp512 | 239.12 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp1024 | 235.30 ± 1.41 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp2048 | 224.94 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp4096 | 206.20 ± 0.28 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp8192 | 176.54 ± 0.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg128 | 17.29 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg1024 | 17.99 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp512 | 238.94 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp1024 | 233.23 ± 0.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp2048 | 222.40 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp4096 | 203.04 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp8192 | 173.09 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg128 | 17.25 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg512 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg2048 | 17.76 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp512 | 239.80 ± 0.48 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp1024 | 305.07 ± 0.33 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp2048 | 295.09 ± 0.13 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp4096 | 275.70 ± 0.25 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 243.52 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg128 | 17.25 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg256 | 17.87 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg512 | 18.03 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg1024 | 17.97 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp512 | 241.05 ± 0.59 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp1024 | 304.85 ± 1.84 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp2048 | 295.04 ± 0.48 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp4096 | 276.20 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 243.36 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg128 | 17.17 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg512 | 18.00 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg1024 | 17.98 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg2048 | 17.76 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp512 | 238.47 ± 0.34 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp1024 | 305.42 ± 1.32 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp2048 | 295.28 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp4096 | 274.18 ± 0.37 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp8192 | 239.55 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg256 | 17.85 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg512 | 17.99 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp512 | 239.49 ± 0.90 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp1024 | 303.09 ± 1.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp2048 | 292.21 ± 1.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp4096 | 271.27 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp8192 | 234.84 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg128 | 17.23 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg256 | 17.83 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg512 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg2048 | 17.73 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp512 | 238.09 ± 1.33 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp1024 | 302.10 ± 0.35 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp2048 | 289.34 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp4096 | 266.76 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.52 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg128 | 17.29 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg256 | 17.80 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg2048 | 17.74 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp512 | 239.40 ± 0.85 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp1024 | 304.81 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp2048 | 348.47 ± 1.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp4096 | 327.77 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.58 ± 0.18 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg256 | 17.86 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg512 | 18.08 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg1024 | 18.01 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg2048 | 17.67 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.10 ± 1.34 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp1024 | 304.24 ± 2.13 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp2048 | 348.34 ± 0.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp4096 | 327.32 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.58 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg256 | 17.83 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg2048 | 17.71 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp512 | 239.16 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp1024 | 304.15 ± 0.87 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp2048 | 347.30 ± 0.52 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp4096 | 325.70 ± 0.67 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp8192 | 287.87 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg128 | 17.20 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg256 | 17.82 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg512 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg1024 | 18.01 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp512 | 240.31 ± 3.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp1024 | 303.77 ± 1.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp2048 | 346.19 ± 0.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp4096 | 323.25 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp8192 | 282.42 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg128 | 17.18 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg512 | 17.99 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg1024 | 18.02 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp512 | 237.68 ± 1.86 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp1024 | 302.20 ± 1.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp2048 | 342.06 ± 0.96 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp4096 | 317.32 ± 0.50 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 273.87 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg256 | 17.85 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg512 | 18.03 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp512 | 238.93 ± 0.91 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp1024 | 305.36 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp2048 | 348.42 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp4096 | 346.42 ± 0.52 |
---
Feel free to build whatever graphs you find interesting from this data. There's a lot of it, so individual effects are quite hard to isolate:
# PP
![Image](https://github.com/user-attachments/assets/20ebe637-909c-4290-92b1-4f20460e8ed2)
![Image](https://github.com/user-attachments/assets/70bc8604-53f1-4723-a0ff-8c28fb694c67)
![Image](https://github.com/user-attachments/assets/fab55341-9c3f-48eb-afc1-8b5facbedbb2)
_TG shows no notable difference._
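For anyone who wants to slice the table themselves, here is a minimal sketch of one way to do it in Python. It assumes the rows are pasted as plain text in the llama-bench Markdown layout shown above; the helper names `parse_rows` and `best_per_test` are made up for this example and are not part of ik_llama.cpp. Only a handful of rows from the table are included here.

```python
# Sketch: parse llama-bench Markdown rows and find the fastest config per test.
# ROWS holds a few rows copied from the table above; paste the full table to
# analyse everything. The helper names are hypothetical, not a project API.
ROWS = """
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 243.52 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.52 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.58 ± 0.18 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 273.87 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.01 |
"""

COLUMNS = ["model", "size", "params", "backend", "ngl", "n_batch",
           "n_ubatch", "fa", "mla", "amb", "fmoe", "test", "t/s"]


def parse_rows(text):
    """Turn Markdown table rows into dicts, skipping header and separator rows."""
    records = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue  # blank line or unrelated text
        if cells[0] == "model" or set(cells[0]) <= set("-: "):
            continue  # header row or the |---| separator row
        rec = dict(zip(COLUMNS, cells))
        mean, _, std = rec["t/s"].partition("±")
        rec["tps"] = float(mean)
        rec["tps_std"] = float(std)
        for key in ("n_batch", "n_ubatch", "amb"):
            rec[key] = int(rec[key])
        records.append(rec)
    return records


def best_per_test(records):
    """Return the highest-throughput record for each test name (pp512, tg128, ...)."""
    best = {}
    for rec in records:
        current = best.get(rec["test"])
        if current is None or rec["tps"] > current["tps"]:
            best[rec["test"]] = rec
    return best


for test, rec in sorted(best_per_test(parse_rows(ROWS)).items()):
    print(f"{test}: {rec['tps']} t/s at -b {rec['n_batch']} "
          f"-ub {rec['n_ubatch']} -amb {rec['amb']}")
```

Grouping by `n_ubatch` or `amb` instead of taking the maximum is a small change to `best_per_test`, which is probably the more useful view for the PP graphs.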
---
#### 🗣️ Discussion
👤 **davidsyoung** replied the **2025-03-18** at **09:37:29**:<br>
### Mixed quant of `Q8` for attn, `Q5 down / IQ4_XS up|gate` for layers 3-8, and `IQ4_XS down / IQ3_S up|gate` for layers 9-60.
| Component | Blocks 0-2 | Blocks 3-8 | Blocks 9-60 |
|-----------|------------|------------|-------------|
| Attention Query/Key/Value | q8_0 | q8_0 | q8_0 |
| Attention Output | q8_0 | q8_0 | q8_0 |
| FFN Down (regular) | q8_0 | - | - |
| FFN Gate/Up (regular) | q8_0 | - | - |
| FFN Down Shared Experts | - | q5_K | q5_K |
| FFN Gate/Up Shared Experts | - | q5_K | q5_K |
| FFN Down Experts | - | q5_K | iq4_xs |
| FFN Gate/Up Experts | - | iq4_xs | iq3_s |
| Output Layer | q8_0 | q8_0 | q8_0 |
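The per-block layout in the table can also be read as a small lookup. The sketch below is only an illustration of that mapping: the component keys (`attn_qkv`, `experts_down`, and so on) are hypothetical labels chosen for this example, not GGUF tensor names, and the function is not how the quant was actually produced.

```python
# Sketch: the per-block quant layout from the table as a lookup function.
# Component keys are hypothetical labels for the table rows, not tensor names.
ATTENTION = {"attn_qkv", "attn_output"}
DENSE_FFN = {"ffn_down", "ffn_gate_up"}        # "regular" FFN rows
SHARED = {"shared_down", "shared_gate_up"}     # shared-expert rows
EXPERTS = {"experts_down", "experts_gate_up"}  # routed-expert rows


def quant_for(block, component):
    """Return the quant type for a component in a block, or None for a '-' cell."""
    if not 0 <= block <= 60:
        raise ValueError(f"block {block} is outside 0-60")
    if component not in ATTENTION | DENSE_FFN | SHARED | EXPERTS:
        raise ValueError(f"unknown component {component!r}")
    if component in ATTENTION:
        return "q8_0"
    if component in DENSE_FFN:
        return "q8_0" if block <= 2 else None
    if block <= 2:
        return None  # blocks 0-2 have no expert rows in the table
    if component in SHARED:
        return "q5_K"
    if component == "experts_down":
        return "q5_K" if block <= 8 else "iq4_xs"
    return "iq4_xs" if block <= 8 else "iq3_s"  # experts_gate_up


print(quant_for(5, "experts_gate_up"), quant_for(40, "experts_down"))
```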
**Compression Results**
- Original size: 1,282,038 MB (~1.2 TB)
- Quantized size: 314,569 MB (~307 GB)
- Compression ratio: 4.1x
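As a quick sanity check on these figures, the arithmetic can be reproduced in a few lines. This is a sketch under two assumptions: that "MB" here means MiB (which is what makes 314,569 line up with the 307.20 GiB that llama-bench reports), and that the parameter count is the 672.05 B shown in the benchmark table.

```python
# Sketch: reproduce the compression figures. Assumes "MB" means MiB and uses
# the 672.05 B parameter count from the llama-bench table.
ORIGINAL_MIB = 1_282_038
QUANTIZED_MIB = 314_569
PARAMS = 672.05e9

ratio = ORIGINAL_MIB / QUANTIZED_MIB
quantized_gib = QUANTIZED_MIB / 1024


def bits_per_weight(size_mib, params=PARAMS):
    """Average bits per parameter for a model file of the given size in MiB."""
    return size_mib * 1024**2 * 8 / params


print(f"compression ratio: {ratio:.2f}x")
print(f"quantized size:    {quantized_gib:.2f} GiB")
print(f"original bpw:      {bits_per_weight(ORIGINAL_MIB):.2f}")
print(f"quantized bpw:     {bits_per_weight(QUANTIZED_MIB):.2f}")
```

The original size works out to roughly 16 bits per weight, consistent with a 16-bit source, and the mixed quant lands a little under 4 bits per weight on average.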
---
### PPL
```
perplexity: tokenizing the input ..
perplexity: tokenization took 1195.26 ms
perplexity: calculating perplexity over 561 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 11.69 seconds per pass - ETA 27.32 minutes
[1]2.5779,[2]3.3447,[3]2.4073,[4]2.0140,[5]1.8352,[6]1.6862,[7]1.5895,[8]1.5208,[9]1.4715,[10]1.4284,[11]1.4147,[12]1.4406,[13]1.4529,[14]1.5824,[15]1.7144,[16]1.7752,[17]1.9408,[18]2.0703,[19]2.0333,[20]2.0250,[21]2.1305,[22]2.1021,[23]2.0764,[24]2.0880,[25]2.0581,[26]2.0330,[27]2.0797,[28]2.0888,[29]2.1391,[30]2.1698,[31]2.2044,[32]2.2227,[33]2.2626,[34]2.3049,[35]2.3566,[36]2.4115,[37]2.4463,[38]2.4930,[39]2.5346,[40]2.5926,[41]2.6353,[42]2.6458,[43]2.6948,[44]2.7107,[45]2.7909,[46]2.8420,[47]2.8003,[48]2.7549,[49]2.7298,[50]2.7498,[51]2.7964,[52]2.8105,[53]2.8597,[54]2.8734,[55]2.9047,[56]2.9384,[57]2.9550,[58]2.9926,[59]3.0027,[60]3.0502,[61]3.0906,[62]3.1475,[63]3.1812,[64]3.2262,[65]3.2360,[66]3.2179,[67]3.1954,[68]3.2271,[69]3.2225,[70]3.2377,[71]3.2562,[72]3.2726,[73]3.2860,[74]3.3095,[75]3.2881,[76]3.2396,[77]3.1959,[78]3.1931,[79]3.1728,[80]3.1563,[81]3.1190,[82]3.1220,[83]3.0918,[84]3.0554,[85]3.0218,[86]2.9995,[87]2.9958,[88]2.9686,[89]2.9537,[90]2.9261,[91]2.8966,[92]2.8704,[93]2.8441,[94]2.8196,[95]2.7964,[96]2.7947,[97]2.8024,[98]2.7882,[99]2.7728,[100]2.7752,[101]2.7671,[102]2.7843,[103]2.8105,[104]2.8288,[105]2.8261,[106]2.8486,[107]2.8737,[108]2.8953,[109]2.9296,[110]2.9637,[111]2.9837,[112]2.9567,[113]2.9436,[114]2.9207,[115]2.9047,[116]2.8905,[117]2.8672,[118]2.8450,[119]2.8235,[120]2.8040,[121]2.7884,[122]2.7698,[123]2.7532,[124]2.7334,[125]2.7156,[126]2.6981,[127]2.6840,[128]2.6757,[129]2.6662,[130]2.6551,[131]2.6472,[132]2.6548,[133]2.6649,[134]2.6714,[135]2.6822,[136]2.6990,[137]2.7145,[138]2.7231,[139]2.7348,[140]2.7353,[141]2.7368,[142]2.7356,[143]2.7359,[144]2.7320,[145]2.7228,[146]2.7211,[147]2.7254,[148]2.7248,[149]2.7265,[150]2.7210,[151]2.7192,[152]2.7157,[153]2.7114,[154]2.7119,[155]2.7159,[156]2.7180,[157]2.7237,[158]2.7322,[159]2.7339,[160]2.7428,[161]2.7509,[162]2.7605,[163]2.7660,[164]2.7863,[165]2.8095,[166]2.8270,[167]2.8399,[168]2.8647,[169]2.8872,[170]2.9083,[171]2.9311,[172]2.9150,[173]2.8980,[174]2.8843,[175]2.8712,[176]2.8
589,[177]2.8467,[178]2.8338,[179]2.8193,[180]2.8228,[181]2.8370,[182]2.8519,[183]2.8669,[184]2.8813,[185]2.8915,[186]2.9083,[187]2.9241,[188]2.9381,[189]2.9489,[190]2.9490,[191]2.9561,[192]2.9601,[193]2.9652,[194]2.9848,[195]2.9935,[196]3.0068,[197]3.0167,[198]3.0211,[199]3.0267,[200]3.0261,[201]3.0415,[202]3.0361,[203]3.0413,[204]3.0446,[205]3.0447,[206]3.0468,[207]3.0552,[208]3.0645,[209]3.0737,[210]3.0738,[211]3.0688,[212]3.0689,[213]3.0765,[214]3.0781,[215]3.0837,[216]3.0847,[217]3.0805,[218]3.0804,[219]3.0811,[220]3.0800,[221]3.0803,[222]3.0803,[223]3.0805,[224]3.0856,[225]3.0871,[226]3.0791,[227]3.0772,[228]3.0792,[229]3.0835,[230]3.0900,[231]3.0962,[232]3.0880,[233]3.0801,[234]3.0803,[235]3.0787,[236]3.0879,[237]3.0957,[238]3.1050,[239]3.1151,[240]3.1241,[241]3.1353,[242]3.1498,[243]3.1632,[244]3.1713,[245]3.1831,[246]3.1937,[247]3.1927,[248]3.1884,[249]3.1867,[250]3.1804,[251]3.1782,[252]3.1805,[253]3.1841,[254]3.1910,[255]3.1971,[256]3.2005,[257]3.2032,[258]3.2042,[259]3.2076,[260]3.2098,[261]3.2107,[262]3.2099,[263]3.2158,[264]3.2179,[265]3.2182,[266]3.2199,[267]3.2230,[268]3.2267,[269]3.2298,[270]3.2290,[271]3.2271,[272]3.2205,[273]3.2208,[274]3.2143,[275]3.2037,[276]3.1934,[277]3.1951,[278]3.2052,[279]3.2115,[280]3.2195,[281]3.2272,[282]3.2333,[283]3.2398,[284]3.2466,[285]3.2603,[286]3.2626,[287]3.2661,[288]3.2707,[289]3.2732,[290]3.2648,[291]3.2557,[292]3.2544,[293]3.2536,[294]3.2513,[295]3.2487,[296]3.2507,[297]3.2513,[298]3.2562,[299]3.2620,[300]3.2651,[301]3.2691,[302]3.2713,[303]3.2734,[304]3.2726,[305]3.2845,[306]3.2922,[307]3.3033,[308]3.2916,[309]3.2865,[310]3.2769,[311]3.2804,[312]3.2825,[313]3.2893,[314]3.2915,[315]3.2946,[316]3.2959,[317]3.2974,[318]3.2979,[319]3.2982,[320]3.3026,[321]3.3028,[322]3.3042,[323]3.3106,[324]3.3112,[325]3.3167,[326]3.3214,[327]3.3255,[328]3.3282,[329]3.3297,[330]3.3360,[331]3.3396,[332]3.3443,[333]3.3428,[334]3.3425,[335]3.3428,[336]3.3429,[337]3.3437,[338]3.3441,[339]3.3466,[340]3.3502,[341]3.3555,[342]3.3649,[343
]3.3744,[344]3.3797,[345]3.3713,[346]3.3640,[347]3.3597,[348]3.3523,[349]3.3488,[350]3.3471,[351]3.3521,[352]3.3671,[353]3.3761,[354]3.3892,[355]3.3977,[356]3.4029,[357]3.4148,[358]3.4246,[359]3.4279,[360]3.4346,[361]3.4439,[362]3.4526,[363]3.4586,[364]3.4649,[365]3.4715,[366]3.4822,[367]3.4909,[368]3.4975,[369]3.5054,[370]3.5138,[371]3.5277,[372]3.5368,[373]3.5401,[374]3.5435,[375]3.5485,[376]3.5616,[377]3.5727,[378]3.5754,[379]3.5749,[380]3.5715,[381]3.5762,[382]3.5816,[383]3.5853,[384]3.5894,[385]3.5931,[386]3.5996,[387]3.6055,[388]3.6087,[389]3.5980,[390]3.5883,[391]3.5774,[392]3.5715,[393]3.5623,[394]3.5535,[395]3.5438,[396]3.5336,[397]3.5245,[398]3.5146,[399]3.5042,[400]3.4963,[401]3.4863,[402]3.4756,[403]3.4668,[404]3.4563,[405]3.4465,[406]3.4364,[407]3.4270,[408]3.4178,[409]3.4090,[410]3.4031,[411]3.4038,[412]3.3993,[413]3.4012,[414]3.4038,[415]3.4009,[416]3.4009,[417]3.4034,[418]3.3979,[419]3.3991,[420]3.3966,[421]3.3953,[422]3.3970,[423]3.3964,[424]3.4006,[425]3.4005,[426]3.4009,[427]3.3997,[428]3.4021,[429]3.4037,[430]3.4064,[431]3.4074,[432]3.4064,[433]3.4027,[434]3.4028,[435]3.3956,[436]3.3891,[437]3.3851,[438]3.3833,[439]3.3805,[440]3.3855,[441]3.3905,[442]3.3979,[443]3.3964,[444]3.3972,[445]3.3983,[446]3.4029,[447]3.4058,[448]3.4083,[449]3.4114,[450]3.4154,[451]3.4184,[452]3.4206,[453]3.4223,[454]3.4208,[455]3.4229,[456]3.4232,[457]3.4257,[458]3.4311,[459]3.4317,[460]3.4318,[461]3.4284,[462]3.4322,[463]3.4396,[464]3.4448,[465]3.4381,[466]3.4361,[467]3.4344,[468]3.4355,[469]3.4328,[470]3.4301,[471]3.4304,[472]3.4311,[473]3.4304,[474]3.4295,[475]3.4308,[476]3.4290,[477]3.4282,[478]3.4288,[479]3.4307,[480]3.4334,[481]3.4290,[482]3.4325,[483]3.4316,[484]3.4353,[485]3.4416,[486]3.4444,[487]3.4479,[488]3.4531,[489]3.4555,[490]3.4603,[491]3.4665,[492]3.4709,[493]3.4707,[494]3.4719,[495]3.4746,[496]3.4764,[497]3.4794,[498]3.4798,[499]3.4790,[500]3.4832,[501]3.4877,[502]3.4865,[503]3.4849,[504]3.4871,[505]3.4905,[506]3.4988,[507]3.5016,[508]3.5050,[509]3.4973,
[510]3.4914,[511]3.4851,[512]3.4810,[513]3.4750,[514]3.4738,[515]3.4761,[516]3.4714,[517]3.4713,[518]3.4704,[519]3.4710,[520]3.4755,[521]3.4744,[522]3.4730,[523]3.4790,[524]3.4775,[525]3.4761,[526]3.4715,[527]3.4663,[528]3.4628,[529]3.4599,[530]3.4568,[531]3.4536,[532]3.4479,[533]3.4415,[534]3.4370,[535]3.4382,[536]3.4410,[537]3.4443,[538]3.4469,[539]3.4496,[540]3.4550,[541]3.4584,[542]3.4607,[543]3.4552,[544]3.4512,[545]3.4508,[546]3.4440,[547]3.4374,[548]3.4307,[549]3.4240,[550]3.4178,[551]3.4116,[552]3.4060,[553]3.4002,[554]3.3983,[555]3.3970,[556]3.3998,[557]3.4039,[558]3.4098,[559]3.4145,[560]3.4197,[561]3.4178,
Final estimate: PPL = 3.4178 +/- 0.01891
```
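To plot how the running estimate converges, the log can be parsed with a short script. This is a minimal sketch, assuming the `[chunk]value,` and `Final estimate: PPL = ... +/- ...` layout seen in the output above; `parse_perplexity_log` is a name invented for this example. The log as pasted wraps mid-value in places, so the parser joins lines before matching. `SAMPLE` is a short excerpt from the log, not the full run.

```python
import re

# Excerpt from the log above, including one value that wraps across a line break.
SAMPLE = """[1]2.5779,[2]3.3447,[3]2.4073,[175]2.8712,[176]2.8
589,[177]2.8467,
Final estimate: PPL = 3.4178 +/- 0.01891
"""


def parse_perplexity_log(text):
    """Return ({chunk: running_ppl}, (final_ppl, final_err)) from a perplexity log."""
    # Find the final estimate first, while its line is still intact.
    final = None
    match = re.search(r"Final estimate: PPL = ([\d.]+) \+/- ([\d.]+)", text)
    if match:
        final = (float(match.group(1)), float(match.group(2)))
    # Join wrapped lines so values split across a line break are whole again.
    joined = text.replace("\r", "").replace("\n", "")
    running = {int(chunk): float(value)
               for chunk, value in re.findall(r"\[(\d+)\](\d+\.\d+)", joined)}
    return running, final


running, final = parse_perplexity_log(SAMPLE)
print(len(running), "running estimates; final:", final)
```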
> 👤 **fredlas** replied the **2025-03-19** at **15:49:40**:<br>
> Were you thinking of uploading this to huggingface, by any chance? I can reproduce and upload it myself if necessary, but I haven't downloaded the full R1 weights yet, and would be happy to continue avoiding that if possible!
>
> 👤 **ubergarm** replied the **2025-03-19** at **22:37:04**:<br>
> @fredlas do you have any specific hardware configuration in mind? e.g. how much system RAM, and GPUs / VRAM? I put together rough notes on making your own custom quant in [this quick-start guide discussion](https://github.com/ikawrakow/ik_llama.cpp/discussions/258). I believe @davidsyoung has tailored the quant specific to his 16x3090 = 384 GB VRAM setup.
>
> I've made a couple of quants now and have one okay one for a 256GB RAM + 24GB VRAM single GPU configuration, with better perplexity than unsloth `UD-Q2_K_XL` but just a little bit slower. I'm still experimenting to see how the various types affect generation speed vs perplexity while fitting inside the envelope of my current hardware.
>
> You can get started with `ik_llama.cpp` including `-mla 2` and repacked quants now with an existing unsloth quant or whatever you have probably. (sorry if you already know this, I'm still new here!) Cheers!
>
> 👤 **davidsyoung** replied the **2025-03-19** at **23:18:56**:<br>
> I might be able to upload if you give me enough time. However, I actually recommend getting used to quanting, as there's _a lot_ of tweaking you may want to do.
>
> For example, I don't actually think this quant suits my setup best yet, and I'm actually underutilising one GPU. I just haven't found a way to split the layers that well yet.
>
> 👤 **fredlas** replied the **2025-03-21** at **02:37:16**:<br>
> @ubergarm 307GiB happens to be right around the size I'm thinking of. 72GiB VRAM + 256GiB RAM, for queuing up jobs to run overnight with 16k context - should just fit in there, I think. Funny coincidence for an extremely different configuration! Thanks for that guide - I made my own quants of Wizard2 8x22B a while back, but long enough ago that I was probably going to have to basically relearn it.
>
> @davidsyoung I'd say don't upload them just for my sake if you weren't already planning to - I just thought I'd check in case I could stay lazy. Plus this size range is probably pretty niche anyways; might not really be worth it in terms of helping people.
---
👤 **ikawrakow** replied the **2025-03-18** at **09:44:15**:<br>
Thank you for this. I think it can be really useful for people.
---
👤 **saood06** replied the **2025-03-18** at **20:14:25**:<br>
@ikawrakow Can I convert this to a discussion?
---
👤 **davidsyoung** replied the **2025-03-18** at **20:19:37**:<br>
All good with me @saood06
---
👤 **ikawrakow** replied the **2025-03-18** at **20:29:32**:<br>
> @ikawrakow Can I convert this to a discussion?
Sure, go ahead