### 🗣️ [#266](https://github.com/ikawrakow/ik_llama.cpp/discussions/266) - Benchmarking DeepSeek R1 - 16x3090

| **Author** | `davidsyoung` |
| :--- | :--- |
| **Created** | 2025-03-18 |
| **Updated** | 2025-03-21 |

---

#### Description

I wanted to create a resource for anyone looking to optimise `-b`, `-ub` and `-amb` alongside `-mla 2 -fa -fmoe` when offloading DeepSeek R1 fully to CUDA with ik_llama.cpp @ https://github.com/ikawrakow/ik_llama.cpp/commit/dcdfad29f7d2b831f1c84751f00bda14cc359a84.

Layers are not evenly spread over the 16 GPUs, and GPU utilisation is only 5-10% on average, at <150 W per GPU.

I'm not sure how useful this is, but I ran it overnight. It hit an OOM error on `-b 4096 pp8192`, but I still feel it's useful!
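
For anyone who wants to pull the best setting out of a long results table like the one below, here is a minimal Python sketch. It is not part of the original run: the helper names (`parse_rows`, `best_per_test`) and the `SAMPLE`/`COLUMNS` constants are made up for illustration, and the sample only contains eight rows copied from the table (`n_batch` 2048, `amb` 512). It assumes every row has the 13 columns shown in the table header and a `t/s` cell of the form `mean ± std`.

```python
# Column names as they appear in the results table header.
COLUMNS = ["model", "size", "params", "backend", "ngl", "n_batch",
           "n_ubatch", "fa", "mla", "amb", "fmoe", "test", "t/s"]

# Eight rows copied from the results table (n_batch = 2048, amb = 512).
SAMPLE = """\
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.55 ± 2.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.56 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp512 | 239.64 ± 1.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 244.26 ± 0.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.12 ± 3.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.47 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp512 | 238.06 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp8192 | 290.87 ± 0.49 |
"""


def parse_rows(text):
    """Turn markdown table rows into dicts, skipping header/separator/stray lines."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue  # not a data row (blank line, stray pipe, etc.)
        if cells[0] == "model" or set(cells[0]) <= set("-: "):
            continue  # header row or the |---|---| separator row
        row = dict(zip(COLUMNS, cells))
        mean, _, std = row["t/s"].partition("±")
        row["mean"] = float(mean)
        row["std"] = float(std) if std.strip() else 0.0
        rows.append(row)
    return rows


def best_per_test(rows):
    """For each test (pp512, tg128, ...) keep the row with the highest mean t/s."""
    best = {}
    for row in rows:
        current = best.get(row["test"])
        if current is None or row["mean"] > current["mean"]:
            best[row["test"]] = row
    return best


for test, row in best_per_test(parse_rows(SAMPLE)).items():
    print(f"{test}: {row['mean']} t/s at -b {row['n_batch']} "
          f"-ub {row['n_ubatch']} -amb {row['amb']}")
```
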

| model | size | params | backend | ngl | n_batch | n_ubatch | fa | mla | amb | fmoe | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --: | ----: | ---: | ------------: | ---------------: |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp512 | 216.01 ± 4.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp1024 | 219.99 ± 2.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp2048 | 219.74 ± 1.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp4096 | 208.57 ± 0.58 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.37 ± 0.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.22 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg256 | 17.84 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg512 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg1024 | 18.02 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 1024 | 1 | tg2048 | 17.74 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.55 ± 2.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp1024 | 235.57 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp2048 | 226.29 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp4096 | 208.86 ± 0.10 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.56 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg128 | 17.23 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg256 | 17.87 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg512 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg1024 | 18.01 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 512 | 1 | tg2048 | 17.75 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp512 | 239.67 ± 1.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp1024 | 235.22 ± 1.85 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp2048 | 225.73 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp4096 | 207.66 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | pp8192 | 179.22 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg128 | 17.25 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg256 | 17.85 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg512 | 18.05 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp512 | 239.69 ± 0.92 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp1024 | 235.48 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp2048 | 224.92 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp4096 | 205.77 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | pp8192 | 176.72 ± 0.14 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg128 | 17.21 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg256 | 17.85 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg512 | 18.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg1024 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 64 | 1 | tg2048 | 17.77 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp512 | 236.20 ± 0.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp1024 | 233.43 ± 0.95 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp2048 | 222.88 ± 0.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp4096 | 203.34 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | pp8192 | 173.21 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg128 | 17.27 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg256 | 17.85 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg1024 | 18.02 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 512 | 1 | 2 | 32 | 1 | tg2048 | 17.79 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp512 | 238.70 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp1024 | 303.92 ± 1.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp2048 | 295.71 ± 0.91 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp4096 | 276.63 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 244.18 ± 0.26 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg512 | 18.09 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 1024 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp512 | 239.64 ± 1.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp1024 | 305.79 ± 0.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp2048 | 296.58 ± 0.75 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp4096 | 276.62 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 244.26 ± 0.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 512 | 1 | tg2048 | 17.70 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp512 | 238.73 ± 1.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp1024 | 304.83 ± 0.61 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp2048 | 295.23 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp4096 | 275.28 ± 0.29 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | pp8192 | 239.76 ± 0.39 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg128 | 17.21 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg256 | 17.82 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg512 | 18.05 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg1024 | 18.01 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 128 | 1 | tg2048 | 17.71 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp512 | 237.98 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp1024 | 304.20 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp2048 | 293.80 ± 1.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp4096 | 272.19 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | pp8192 | 235.64 ± 0.42 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg128 | 17.14 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg256 | 17.79 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg512 | 18.02 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg1024 | 18.00 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 64 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp512 | 238.40 ± 1.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp1024 | 301.66 ± 1.64 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp2048 | 290.44 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp4096 | 267.12 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.98 ± 0.19 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg128 | 17.16 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg256 | 17.76 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg512 | 18.01 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg1024 | 17.97 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 1024 | 1 | 2 | 32 | 1 | tg2048 | 17.73 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp512 | 240.23 ± 1.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp1024 | 305.03 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp2048 | 349.22 ± 0.37 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp4096 | 327.33 ± 0.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.90 ± 0.26 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.21 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg256 | 17.84 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg512 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg1024 | 18.01 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 1024 | 1 | tg2048 | 17.74 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.12 ± 3.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp1024 | 305.13 ± 1.86 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp2048 | 349.84 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp4096 | 328.46 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.47 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg128 | 17.24 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg256 | 17.81 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg512 | 18.02 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 512 | 1 | tg2048 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp512 | 238.52 ± 1.44 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp1024 | 304.77 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp2048 | 348.11 ± 0.69 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp4096 | 326.30 ± 0.69 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | pp8192 | 288.35 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg128 | 17.24 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg256 | 17.88 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg512 | 18.07 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg1024 | 18.05 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp512 | 238.42 ± 1.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp1024 | 304.32 ± 1.66 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp2048 | 344.70 ± 1.92 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp4096 | 323.64 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | pp8192 | 283.02 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg128 | 17.22 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg1024 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 64 | 1 | tg2048 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp512 | 236.64 ± 1.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp1024 | 301.44 ± 1.56 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp2048 | 343.13 ± 0.36 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp4096 | 317.60 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 274.27 ± 0.22 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg512 | 18.08 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg1024 | 18.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 2048 | 1 | 2 | 32 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp512 | 238.37 ± 1.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp1024 | 304.95 ± 1.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp2048 | 349.14 ± 0.52 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp4096 | 327.89 ± 0.19 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | pp8192 | 291.05 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg128 | 17.25 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg256 | 17.81 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg512 | 18.06 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 1024 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp512 | 238.06 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp1024 | 304.73 ± 0.74 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp2048 | 348.72 ± 1.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp4096 | 328.20 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | pp8192 | 290.87 ± 0.49 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg256 | 17.88 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 512 | 1 | tg2048 | 17.72 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp512 | 239.80 ± 0.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp1024 | 306.38 ± 1.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp2048 | 348.17 ± 0.55 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp4096 | 325.50 ± 0.88 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | pp8192 | 288.20 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg128 | 17.25 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg256 | 17.83 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg512 | 18.10 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg1024 | 18.06 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 128 | 1 | tg2048 | 17.76 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp512 | 237.92 ± 2.32 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp1024 | 304.37 ± 0.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp2048 | 347.09 ± 0.66 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp4096 | 323.48 ± 0.46 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | pp8192 | 283.28 ± 0.14 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg128 | 17.20 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg512 | 18.05 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 64 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp512 | 238.77 ± 2.73 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp1024 | 302.54 ± 0.90 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp2048 | 342.62 ± 0.56 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp4096 | 317.58 ± 0.10 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | pp8192 | 274.23 ± 0.40 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg1024 | 17.98 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 2048 | 4096 | 1 | 2 | 32 | 1 | tg2048 | 17.78 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp512 | 240.30 ± 2.99 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp1024 | 236.20 ± 1.81 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp2048 | 226.46 ± 0.49 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp4096 | 209.52 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | pp8192 | 183.03 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg128 | 17.24 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg512 | 18.08 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg1024 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 1024 | 1 | tg2048 | 17.77 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp512 | 238.21 ± 0.99 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp1024 | 236.32 ± 1.53 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp2048 | 225.41 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp4096 | 209.14 ± 0.30 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | pp8192 | 182.42 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg128 | 17.24 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg256 | 17.86 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg512 | 18.09 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg1024 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 512 | 1 | tg2048 | 17.78 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp512 | 239.31 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp1024 | 234.58 ± 0.88 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp2048 | 224.77 ± 0.60 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp4096 | 207.35 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | pp8192 | 178.79 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg128 | 17.26 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg256 | 17.88 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 128 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp512 | 239.12 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp1024 | 235.30 ± 1.41 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp2048 | 224.94 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp4096 | 206.20 ± 0.28 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | pp8192 | 176.54 ± 0.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg128 | 17.29 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg256 | 17.86 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg1024 | 17.99 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 64 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp512 | 238.94 ± 0.70 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp1024 | 233.23 ± 0.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp2048 | 222.40 ± 0.23 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp4096 | 203.04 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | pp8192 | 173.09 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg128 | 17.25 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg256 | 17.89 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg512 | 18.06 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 512 | 1 | 2 | 32 | 1 | tg2048 | 17.76 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp512 | 239.80 ± 0.48 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp1024 | 305.07 ± 0.33 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp2048 | 295.09 ± 0.13 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp4096 | 275.70 ± 0.25 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 243.52 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg128 | 17.25 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg256 | 17.87 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg512 | 18.03 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg1024 | 17.97 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp512 | 241.05 ± 0.59 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp1024 | 304.85 ± 1.84 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp2048 | 295.04 ± 0.48 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp4096 | 276.20 ± 0.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | pp8192 | 243.36 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg128 | 17.17 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg512 | 18.00 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg1024 | 17.98 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 512 | 1 | tg2048 | 17.76 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp512 | 238.47 ± 0.34 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp1024 | 305.42 ± 1.32 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp2048 | 295.28 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp4096 | 274.18 ± 0.37 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | pp8192 | 239.55 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg256 | 17.85 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg512 | 17.99 ± 0.06 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 128 | 1 | tg2048 | 17.77 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp512 | 239.49 ± 0.90 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp1024 | 303.09 ± 1.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp2048 | 292.21 ± 1.47 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp4096 | 271.27 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | pp8192 | 234.84 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg128 | 17.23 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg256 | 17.83 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg512 | 18.06 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg1024 | 18.05 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 64 | 1 | tg2048 | 17.73 ± 0.05 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp512 | 238.09 ± 1.33 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp1024 | 302.10 ± 0.35 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp2048 | 289.34 ± 0.51 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp4096 | 266.76 ± 0.16 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.52 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg128 | 17.29 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg256 | 17.80 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg512 | 18.07 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | tg2048 | 17.74 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp512 | 239.40 ± 0.85 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp1024 | 304.81 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp2048 | 348.47 ± 1.08 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp4096 | 327.77 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.58 ± 0.18 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg128 | 17.26 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg256 | 17.86 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg512 | 18.08 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg1024 | 18.01 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | tg2048 | 17.67 ± 0.11 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp512 | 239.10 ± 1.34 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp1024 | 304.24 ± 2.13 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp2048 | 348.34 ± 0.82 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp4096 | 327.32 ± 0.20 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | pp8192 | 290.58 ± 0.09 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg128 | 17.27 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg256 | 17.83 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg512 | 18.06 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg1024 | 18.04 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 512 | 1 | tg2048 | 17.71 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp512 | 239.16 ± 0.38 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp1024 | 304.15 ± 0.87 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp2048 | 347.30 ± 0.52 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp4096 | 325.70 ± 0.67 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | pp8192 | 287.87 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg128 | 17.20 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg256 | 17.82 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg512 | 18.04 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg1024 | 18.01 ± 0.00 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 128 | 1 | tg2048 | 17.72 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp512 | 240.31 ± 3.17 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp1024 | 303.77 ± 1.31 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp2048 | 346.19 ± 0.76 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp4096 | 323.25 ± 0.24 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | pp8192 | 282.42 ± 0.07 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg128 | 17.18 ± 0.12 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg256 | 17.79 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg512 | 17.99 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg1024 | 18.02 ± 0.02 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 64 | 1 | tg2048 | 17.78 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp512 | 237.68 ± 1.86 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp1024 | 302.20 ± 1.45 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp2048 | 342.06 ± 0.96 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp4096 | 317.32 ± 0.50 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 273.87 ± 0.54 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg128 | 17.28 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg256 | 17.85 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg512 | 18.03 ± 0.03 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg1024 | 18.04 ± 0.04 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | tg2048 | 17.77 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp512 | 238.93 ± 0.91 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp1024 | 305.36 ± 0.21 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 4096 | 1 | 2 | 1024 | 1 | pp2048 | 348.42 ± 0.27 |
| deepseek2 671B Q8_0            | 307.20 GiB |   672.05 B | CUDA       |  63 |    4096 |     4096 |  1 |   2 |  1024 |    1 |        pp4096 |    346.42 ± 0.52 |

---

Feel free to create whatever graphs you find interesting from this data; there is so much of it that individual effects are quite hard to isolate.
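For anyone who wants to slice the table themselves, here is a minimal sketch (not the script used to produce the plots in this post) that parses `llama-bench`-style Markdown rows and picks the fastest configuration per test. The function names `parse_rows` and `best_per_test` are hypothetical; the four sample rows are copied from the table above, and the column order is assumed to match its header.

```python
# Column order assumed to match the header of the benchmark table above.
COLUMNS = ["model", "size", "params", "backend", "ngl", "n_batch",
           "n_ubatch", "fa", "mla", "amb", "fmoe", "test", "t/s"]

# Four pp8192 rows copied from the table; paste the full table to analyse everything.
SAMPLE = """
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 1024 | 1 | pp8192 | 243.52 ± 0.27 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 1024 | 1 | 2 | 32 | 1 | pp8192 | 229.52 ± 0.01 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 1024 | 1 | pp8192 | 290.58 ± 0.18 |
| deepseek2 671B Q8_0 | 307.20 GiB | 672.05 B | CUDA | 63 | 4096 | 2048 | 1 | 2 | 32 | 1 | pp8192 | 273.87 ± 0.54 |
"""


def parse_rows(text):
    """Turn Markdown table rows into dicts, skipping header, separator and stray lines."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue  # not a data row (blank line, prose, residue)
        if cells[0] == "model" or set(cells[0]) <= set("-: "):
            continue  # header or separator row
        row = dict(zip(COLUMNS, cells))
        mean, _, err = row["t/s"].partition("±")
        row["tps"] = float(mean)
        row["tps_err"] = float(err) if err.strip() else 0.0
        for key in ("n_batch", "n_ubatch", "amb"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def best_per_test(rows):
    """Return the highest-throughput row for each test name (pp512, tg128, ...)."""
    best = {}
    for row in rows:
        current = best.get(row["test"])
        if current is None or row["tps"] > current["tps"]:
            best[row["test"]] = row
    return best


if __name__ == "__main__":
    for test, row in best_per_test(parse_rows(SAMPLE)).items():
        print(test, row["n_batch"], row["n_ubatch"], row["amb"], row["tps"])
```

Grouping the parsed rows by `n_ubatch` or `amb` instead of taking the maximum gives the series needed for plots like the ones below.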
# PP




|
||
|
||
_TG shows no notable difference._
|
||
|
||
---
|
||
|
||
#### 🗣️ Discussion
|
||
|
||
👤 **davidsyoung** replied the **2025-03-18** at **09:37:29**:<br>
|
||
|
||
### Mixed quant of `Q8` for attn, `Q5 down / IQ4_XS up|gate` for layers 3-8, and `IQ4_XS down / IQ3_S up|gate`.
|
||
|
||
| Component | Blocks 0-2 | Blocks 3-8 | Blocks 9-60 |
|-----------|------------|------------|-------------|
| Attention Query/Key/Value | q8_0 | q8_0 | q8_0 |
| Attention Output | q8_0 | q8_0 | q8_0 |
| FFN Down (regular) | q8_0 | - | - |
| FFN Gate/Up (regular) | q8_0 | - | - |
| FFN Down Shared Experts | - | q5_K | q5_K |
| FFN Gate/Up Shared Experts | - | q5_K | q5_K |
| FFN Down Experts | - | q5_K | iq4_xs |
| FFN Gate/Up Experts | - | iq4_xs | iq3_s |
| Output Layer | q8_0 | q8_0 | q8_0 |

**Compression results**

- Original size: 1,282,038 MB (~1.2 TB)
- Quantized size: 314,569 MB (~307 GB)
- Compression ratio: 4.1x
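The figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch under one assumption: that "MB" here means MiB (2^20 bytes), which is what makes the quantized size line up with the 307.20 GiB shown in the benchmark table. The parameter count is taken from the `params` column of that table, and the helper names are made up for illustration.

```python
# Sizes as reported above. Assumption: "MB" means MiB (2**20 bytes).
ORIGINAL_MIB = 1_282_038
QUANTIZED_MIB = 314_569
N_PARAMS = 672.05e9  # "params" column of the benchmark table (672.05 B)


def compression_ratio(original, quantized):
    """How many times smaller the quantized model is."""
    return original / quantized


def mib_to_gib(mib):
    """Convert MiB to GiB."""
    return mib / 1024


def bits_per_weight(size_mib, n_params):
    """Average storage cost per parameter, in bits, for the whole file."""
    return size_mib * 2**20 * 8 / n_params


print(f"ratio:     {compression_ratio(ORIGINAL_MIB, QUANTIZED_MIB):.1f}x")
print(f"quantized: {mib_to_gib(QUANTIZED_MIB):.1f} GiB")
print(f"average:   {bits_per_weight(QUANTIZED_MIB, N_PARAMS):.2f} bits per weight")
```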
---

### PPL
```
perplexity: tokenizing the input ..
perplexity: tokenization took 1195.26 ms
perplexity: calculating perplexity over 561 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 11.69 seconds per pass - ETA 27.32 minutes
[1]2.5779,[2]3.3447,[3]2.4073,[4]2.0140,[5]1.8352,[6]1.6862,[7]1.5895,[8]1.5208,[9]1.4715,[10]1.4284,[11]1.4147,[12]1.4406,[13]1.4529,[14]1.5824,[15]1.7144,[16]1.7752,[17]1.9408,[18]2.0703,[19]2.0333,[20]2.0250,[21]2.1305,[22]2.1021,[23]2.0764,[24]2.0880,[25]2.0581,[26]2.0330,[27]2.0797,[28]2.0888,[29]2.1391,[30]2.1698,[31]2.2044,[32]2.2227,[33]2.2626,[34]2.3049,[35]2.3566,[36]2.4115,[37]2.4463,[38]2.4930,[39]2.5346,[40]2.5926,[41]2.6353,[42]2.6458,[43]2.6948,[44]2.7107,[45]2.7909,[46]2.8420,[47]2.8003,[48]2.7549,[49]2.7298,[50]2.7498,[51]2.7964,[52]2.8105,[53]2.8597,[54]2.8734,[55]2.9047,[56]2.9384,[57]2.9550,[58]2.9926,[59]3.0027,[60]3.0502,[61]3.0906,[62]3.1475,[63]3.1812,[64]3.2262,[65]3.2360,[66]3.2179,[67]3.1954,[68]3.2271,[69]3.2225,[70]3.2377,[71]3.2562,[72]3.2726,[73]3.2860,[74]3.3095,[75]3.2881,[76]3.2396,[77]3.1959,[78]3.1931,[79]3.1728,[80]3.1563,[81]3.1190,[82]3.1220,[83]3.0918,[84]3.0554,[85]3.0218,[86]2.9995,[87]2.9958,[88]2.9686,[89]2.9537,[90]2.9261,[91]2.8966,[92]2.8704,[93]2.8441,[94]2.8196,[95]2.7964,[96]2.7947,[97]2.8024,[98]2.7882,[99]2.7728,[100]2.7752,[101]2.7671,[102]2.7843,[103]2.8105,[104]2.8288,[105]2.8261,[106]2.8486,[107]2.8737,[108]2.8953,[109]2.9296,[110]2.9637,[111]2.9837,[112]2.9567,[113]2.9436,[114]2.9207,[115]2.9047,[116]2.8905,[117]2.8672,[118]2.8450,[119]2.8235,[120]2.8040,[121]2.7884,[122]2.7698,[123]2.7532,[124]2.7334,[125]2.7156,[126]2.6981,[127]2.6840,[128]2.6757,[129]2.6662,[130]2.6551,[131]2.6472,[132]2.6548,[133]2.6649,[134]2.6714,[135]2.6822,[136]2.6990,[137]2.7145,[138]2.7231,[139]2.7348,[140]2.7353,[141]2.7368,[142]2.7356,[143]2.7359,[144]2.7320,[145]2.7228,[146]2.7211,[147]2.7254,[148]2.7248,[149]2.7265,[150]2.7210,[151]2.7192,[152]2.7157,[153]2.7114,[154]2.7119,[155]2.7159,[156]2.7180,[157]2.7237,[158]2.7322,[159]2.7339,[160]2.7428,[161]2.7509,[162]2.7605,[163]2.7660,[164]2.7863,[165]2.8095,[166]2.8270,[167]2.8399,[168]2.8647,[169]2.8872,[170]2.9083,[171]2.9311,[172]2.9150,[173]2.8980,[174]2.8843,[175]2.8712,[176]2.8
589,[177]2.8467,[178]2.8338,[179]2.8193,[180]2.8228,[181]2.8370,[182]2.8519,[183]2.8669,[184]2.8813,[185]2.8915,[186]2.9083,[187]2.9241,[188]2.9381,[189]2.9489,[190]2.9490,[191]2.9561,[192]2.9601,[193]2.9652,[194]2.9848,[195]2.9935,[196]3.0068,[197]3.0167,[198]3.0211,[199]3.0267,[200]3.0261,[201]3.0415,[202]3.0361,[203]3.0413,[204]3.0446,[205]3.0447,[206]3.0468,[207]3.0552,[208]3.0645,[209]3.0737,[210]3.0738,[211]3.0688,[212]3.0689,[213]3.0765,[214]3.0781,[215]3.0837,[216]3.0847,[217]3.0805,[218]3.0804,[219]3.0811,[220]3.0800,[221]3.0803,[222]3.0803,[223]3.0805,[224]3.0856,[225]3.0871,[226]3.0791,[227]3.0772,[228]3.0792,[229]3.0835,[230]3.0900,[231]3.0962,[232]3.0880,[233]3.0801,[234]3.0803,[235]3.0787,[236]3.0879,[237]3.0957,[238]3.1050,[239]3.1151,[240]3.1241,[241]3.1353,[242]3.1498,[243]3.1632,[244]3.1713,[245]3.1831,[246]3.1937,[247]3.1927,[248]3.1884,[249]3.1867,[250]3.1804,[251]3.1782,[252]3.1805,[253]3.1841,[254]3.1910,[255]3.1971,[256]3.2005,[257]3.2032,[258]3.2042,[259]3.2076,[260]3.2098,[261]3.2107,[262]3.2099,[263]3.2158,[264]3.2179,[265]3.2182,[266]3.2199,[267]3.2230,[268]3.2267,[269]3.2298,[270]3.2290,[271]3.2271,[272]3.2205,[273]3.2208,[274]3.2143,[275]3.2037,[276]3.1934,[277]3.1951,[278]3.2052,[279]3.2115,[280]3.2195,[281]3.2272,[282]3.2333,[283]3.2398,[284]3.2466,[285]3.2603,[286]3.2626,[287]3.2661,[288]3.2707,[289]3.2732,[290]3.2648,[291]3.2557,[292]3.2544,[293]3.2536,[294]3.2513,[295]3.2487,[296]3.2507,[297]3.2513,[298]3.2562,[299]3.2620,[300]3.2651,[301]3.2691,[302]3.2713,[303]3.2734,[304]3.2726,[305]3.2845,[306]3.2922,[307]3.3033,[308]3.2916,[309]3.2865,[310]3.2769,[311]3.2804,[312]3.2825,[313]3.2893,[314]3.2915,[315]3.2946,[316]3.2959,[317]3.2974,[318]3.2979,[319]3.2982,[320]3.3026,[321]3.3028,[322]3.3042,[323]3.3106,[324]3.3112,[325]3.3167,[326]3.3214,[327]3.3255,[328]3.3282,[329]3.3297,[330]3.3360,[331]3.3396,[332]3.3443,[333]3.3428,[334]3.3425,[335]3.3428,[336]3.3429,[337]3.3437,[338]3.3441,[339]3.3466,[340]3.3502,[341]3.3555,[342]3.3649,[343
]3.3744,[344]3.3797,[345]3.3713,[346]3.3640,[347]3.3597,[348]3.3523,[349]3.3488,[350]3.3471,[351]3.3521,[352]3.3671,[353]3.3761,[354]3.3892,[355]3.3977,[356]3.4029,[357]3.4148,[358]3.4246,[359]3.4279,[360]3.4346,[361]3.4439,[362]3.4526,[363]3.4586,[364]3.4649,[365]3.4715,[366]3.4822,[367]3.4909,[368]3.4975,[369]3.5054,[370]3.5138,[371]3.5277,[372]3.5368,[373]3.5401,[374]3.5435,[375]3.5485,[376]3.5616,[377]3.5727,[378]3.5754,[379]3.5749,[380]3.5715,[381]3.5762,[382]3.5816,[383]3.5853,[384]3.5894,[385]3.5931,[386]3.5996,[387]3.6055,[388]3.6087,[389]3.5980,[390]3.5883,[391]3.5774,[392]3.5715,[393]3.5623,[394]3.5535,[395]3.5438,[396]3.5336,[397]3.5245,[398]3.5146,[399]3.5042,[400]3.4963,[401]3.4863,[402]3.4756,[403]3.4668,[404]3.4563,[405]3.4465,[406]3.4364,[407]3.4270,[408]3.4178,[409]3.4090,[410]3.4031,[411]3.4038,[412]3.3993,[413]3.4012,[414]3.4038,[415]3.4009,[416]3.4009,[417]3.4034,[418]3.3979,[419]3.3991,[420]3.3966,[421]3.3953,[422]3.3970,[423]3.3964,[424]3.4006,[425]3.4005,[426]3.4009,[427]3.3997,[428]3.4021,[429]3.4037,[430]3.4064,[431]3.4074,[432]3.4064,[433]3.4027,[434]3.4028,[435]3.3956,[436]3.3891,[437]3.3851,[438]3.3833,[439]3.3805,[440]3.3855,[441]3.3905,[442]3.3979,[443]3.3964,[444]3.3972,[445]3.3983,[446]3.4029,[447]3.4058,[448]3.4083,[449]3.4114,[450]3.4154,[451]3.4184,[452]3.4206,[453]3.4223,[454]3.4208,[455]3.4229,[456]3.4232,[457]3.4257,[458]3.4311,[459]3.4317,[460]3.4318,[461]3.4284,[462]3.4322,[463]3.4396,[464]3.4448,[465]3.4381,[466]3.4361,[467]3.4344,[468]3.4355,[469]3.4328,[470]3.4301,[471]3.4304,[472]3.4311,[473]3.4304,[474]3.4295,[475]3.4308,[476]3.4290,[477]3.4282,[478]3.4288,[479]3.4307,[480]3.4334,[481]3.4290,[482]3.4325,[483]3.4316,[484]3.4353,[485]3.4416,[486]3.4444,[487]3.4479,[488]3.4531,[489]3.4555,[490]3.4603,[491]3.4665,[492]3.4709,[493]3.4707,[494]3.4719,[495]3.4746,[496]3.4764,[497]3.4794,[498]3.4798,[499]3.4790,[500]3.4832,[501]3.4877,[502]3.4865,[503]3.4849,[504]3.4871,[505]3.4905,[506]3.4988,[507]3.5016,[508]3.5050,[509]3.4973,
[510]3.4914,[511]3.4851,[512]3.4810,[513]3.4750,[514]3.4738,[515]3.4761,[516]3.4714,[517]3.4713,[518]3.4704,[519]3.4710,[520]3.4755,[521]3.4744,[522]3.4730,[523]3.4790,[524]3.4775,[525]3.4761,[526]3.4715,[527]3.4663,[528]3.4628,[529]3.4599,[530]3.4568,[531]3.4536,[532]3.4479,[533]3.4415,[534]3.4370,[535]3.4382,[536]3.4410,[537]3.4443,[538]3.4469,[539]3.4496,[540]3.4550,[541]3.4584,[542]3.4607,[543]3.4552,[544]3.4512,[545]3.4508,[546]3.4440,[547]3.4374,[548]3.4307,[549]3.4240,[550]3.4178,[551]3.4116,[552]3.4060,[553]3.4002,[554]3.3983,[555]3.3970,[556]3.3998,[557]3.4039,[558]3.4098,[559]3.4145,[560]3.4197,[561]3.4178,
Final estimate: PPL = 3.4178 +/- 0.01891
```
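To compare runs without eyeballing the log, the running estimates can be pulled out programmatically. The sketch below assumes the bracketed values are the cumulative perplexity after each chunk (which is consistent with `[561]` matching the final estimate) and that perplexity is the exponential of the mean negative log-likelihood per token. The helper names are hypothetical and `LOG` is an abridged excerpt of the output above.

```python
import math
import re

# Abridged excerpt of the log above (first three and last two chunks).
LOG = """[1]2.5779,[2]3.3447,[3]2.4073,[560]3.4197,[561]3.4178,
Final estimate: PPL = 3.4178 +/- 0.01891"""


def parse_running_ppl(log):
    """Return (chunk_index, running_ppl) pairs.

    Whitespace is removed first so values that were wrapped across
    lines in a copied log are still read correctly.
    """
    compact = re.sub(r"\s+", "", log)
    return [(int(i), float(v)) for i, v in re.findall(r"\[(\d+)\](\d+\.\d+)", compact)]


def parse_final(log):
    """Return (ppl, stderr) from the 'Final estimate' line, or None if absent."""
    m = re.search(r"Final estimate: PPL = ([\d.]+) \+/- ([\d.]+)", log)
    return (float(m.group(1)), float(m.group(2))) if m else None


def mean_nll(ppl):
    """Mean negative log-likelihood per token in nats, since PPL = exp(mean NLL)."""
    return math.log(ppl)


running = parse_running_ppl(LOG)
final_ppl, final_err = parse_final(LOG)
print(f"chunks parsed: {len(running)}, last running PPL: {running[-1][1]}")
print(f"final PPL: {final_ppl} +/- {final_err}, mean NLL: {mean_nll(final_ppl):.4f} nats/token")
```

Running `parse_running_ppl` on the logs of two different quants gives two series that can be plotted against chunk index to see whether they diverge early or only near the end.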
> 👤 **fredlas** replied the **2025-03-19** at **15:49:40**:<br>
> Were you thinking of uploading this to huggingface, by any chance? I can reproduce and upload it myself if necessary, but I haven't downloaded the full R1 weights yet, and would be happy to continue avoiding that if possible!
>
> 👤 **ubergarm** replied the **2025-03-19** at **22:37:04**:<br>
> @fredlas do you have any specific hardware configuration in mind? e.g. how much system RAM, and GPUs / VRAM? I put together rough notes on making your own custom quant in [this quick-start guide discussion](https://github.com/ikawrakow/ik_llama.cpp/discussions/258). I believe @davidsyoung has tailored the quant specifically to his 16x3090 = 384 GB VRAM setup.
>
> I've made a couple of quants now and have one okay one for a 256GB RAM + 24GB VRAM single-GPU configuration, with better perplexity than unsloth `UD-Q2_K_XL` but just a little bit slower. I'm still experimenting to see how the various types affect generation speed vs perplexity while fitting inside the envelope of my current hardware.
>
> You can probably get started with `ik_llama.cpp`, including `-mla 2` and repacked quants, right now using an existing unsloth quant or whatever you already have. (sorry if you already know this, I'm still new here!) Cheers!
>
> 👤 **davidsyoung** replied the **2025-03-19** at **23:18:56**:<br>
> I might be able to upload if you give me enough time, however, I actually recommend getting used to quanting as there’s _a lot_ of tweaking you may want to do.
>
> For example, I don’t actually think this quant suits my setup best yet, and I’m actually underutilising one GPU. I just haven’t found a way to split the layers that well yet.
>
> 👤 **fredlas** replied the **2025-03-21** at **02:37:16**:<br>
> @ubergarm 307GiB happens to be right around the size I'm thinking of. 72GiB VRAM + 256GiB RAM, for queuing up jobs to run overnight with 16k context - should just fit in there, I think. Funny coincidence for an extremely different configuration! Thanks for that guide - I made my own quants of Wizard2 8x22B a while back, but long enough ago that I was probably going to have to basically relearn it.
>
> @davidsyoung I'd say don't upload them just for my sake if you weren't already planning to - I just thought I'd check in case I could stay lazy. Plus this size range is probably pretty niche anyways; might not really be worth it in terms of helping people.

---

👤 **ikawrakow** replied the **2025-03-18** at **09:44:15**:<br>

Thank you for this. I think it can be really useful for people.

---

👤 **saood06** replied the **2025-03-18** at **20:14:25**:<br>

@ikawrakow Can I convert this to a discussion?

---

👤 **davidsyoung** replied the **2025-03-18** at **20:19:37**:<br>

All good with me @saood06

---

👤 **ikawrakow** replied the **2025-03-18** at **20:29:32**:<br>

> @ikawrakow Can I convert this to a discussion?

Sure, go ahead