Mirror of https://github.com/kvcache-ai/ktransformers.git, synced 2026-03-14 18:37:23 +00:00
update smallthinker and glm4 readme
@@ -5,15 +5,15 @@
 ### Overview
 
 We are excited to announce that **KTransformers now supports both SmallThinker and GLM-4-MoE**.
 
-- **SmallThinker-21B (bf16)**: ~26 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~84 GB DRAM.
-- **GLM-4-MoE 110B (bf16)**: ~11 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~440 GB DRAM.
-- **GLM-4-MoE 110B (AMX INT8)**: prefill ~309 TPS / decode ~16 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~220 GB DRAM.
+- **SmallThinker-21BA3B-Instruct (bf16)**: ~26 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~84 GB DRAM.
+- **GLM-4.5-Air (bf16)**: ~11 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~440 GB DRAM.
+- **GLM-4.5-Air (AMX INT8)**: prefill ~309 TPS / decode ~16 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~220 GB DRAM.
 
 ### Model & Resource Links
 
-- **SmallThinker-21B**
-  - *(to be announced)*
-- **GLM-4-MoE 110B**
-  - *(to be announced)*
+- **SmallThinker-21BA3B-Instruct**
+  - [SmallThinker-21BA3B-Instruct](https://huggingface.co/PowerInfer/SmallThinker-21BA3B-Instruct)
+- **GLM-4.5-Air**
+  - [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air)
 
 ---
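As a back-of-envelope check on the DRAM figures above (an illustrative sketch, assuming one full copy of the weights per CPU socket, which is why the dual-socket overview numbers are double the single-copy table numbers):

```python
GB = 1e9  # decimal GB, to match the rounded figures above

def weight_gb(params_billion: float, bytes_per_param: int, sockets: int = 1) -> float:
    """Raw weight footprint: bf16 = 2 bytes/param, int8 = 1 byte/param,
    assuming one full copy of the weights is held per CPU socket."""
    return params_billion * 1e9 * bytes_per_param * sockets / GB

assert weight_gb(21, 2) == 42.0               # one bf16 copy of a 21B model
assert weight_gb(21, 2, sockets=2) == 84.0    # ~84 GB DRAM, dual socket
assert weight_gb(110, 2, sockets=2) == 440.0  # ~440 GB DRAM, dual socket
assert weight_gb(110, 1, sockets=2) == 220.0  # int8 halves it: ~220 GB DRAM
```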
@@ -23,9 +23,9 @@ We are excited to announce that **KTransformers now supports both SmallThinker a
 
 | Model                        | Precision | Experts | DRAM Needed | GPU Memory Needed\* | TPS (approx.) |
 | ---------------------------- | --------- | ------- | ----------- | ------------------- | ------------- |
-| SmallThinker-21B             | bf16      | 32      | \~42 GB     | 14 GB               | \~26 TPS      |
-| GLM-4-MoE 110B               | bf16      | 128     | \~220 GB    | 14 GB               | \~11 TPS      |
-| GLM-4-MoE 110B (AMX INT8)    | int8      | 128     | \~220 GB    | 14 GB               | \~16 TPS      |
+| SmallThinker-21BA3B-Instruct | bf16      | 32      | \~42 GB     | 14 GB               | \~26 TPS      |
+| GLM-4.5-Air                  | bf16      | 128     | \~220 GB    | 14 GB               | \~11 TPS      |
+| GLM-4.5-Air (AMX INT8)       | int8      | 128     | \~220 GB    | 14 GB               | \~16 TPS      |
 
 \* Exact GPU memory depends on sequence length, batch size, and kernels used.
@@ -37,12 +37,12 @@ We are excited to announce that **KTransformers now supports both SmallThinker a
-# (Fill in actual repos/filenames yourself)
-
-# SmallThinker-21B
-huggingface-cli download --resume-download placeholder-org/Model-TBA \
-  --local-dir ./Model-TBA
+# SmallThinker-21BA3B-Instruct
+huggingface-cli download --resume-download PowerInfer/SmallThinker-21BA3B-Instruct \
+  --local-dir ./SmallThinker-21BA3B-Instruct
 
-# GLM-4-MoE 110B
-huggingface-cli download --resume-download placeholder-org/Model-TBA \
-  --local-dir ./Model-TBA
+# GLM-4.5-Air
+huggingface-cli download --resume-download zai-org/GLM-4.5-Air \
+  --local-dir ./GLM-4.5-Air
 ```
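The same downloads can be driven from Python via `huggingface_hub` (a sketch: the repo IDs come from the model links above, the local directory names are a choice, and the lazy import assumes `huggingface_hub` is installed):

```python
# Repo IDs from the model links above; local_dir names are arbitrary choices.
MODELS = {
    "PowerInfer/SmallThinker-21BA3B-Instruct": "./SmallThinker-21BA3B-Instruct",
    "zai-org/GLM-4.5-Air": "./GLM-4.5-Air",
}

def download_all(models=MODELS):
    # imported lazily so the module loads even without huggingface_hub installed
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    for repo_id, local_dir in models.items():
        # snapshot_download resumes partially fetched files by default
        snapshot_download(repo_id=repo_id, local_dir=local_dir)

if __name__ == "__main__":
    download_all()
```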
@@ -94,7 +94,7 @@ curl -X POST http://localhost:10021/v1/chat/completions \
     "messages": [
       {"role": "user", "content": "hello"}
     ],
-    "model": "SmallThinker-21B",
+    "model": "SmallThinker-21BA3B-Instruct",
     "temperature": 0.3,
    "top_p": 1.0,
     "stream": true
@@ -109,7 +109,7 @@ curl -X POST http://localhost:10110/v1/chat/completions \
     "messages": [
       {"role": "user", "content": "hello"}
     ],
-    "model": "GLM-4-MoE-110B",
+    "model": "GLM-4.5-Air",
     "temperature": 0.3,
     "top_p": 1.0,
     "stream": true
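Both curl snippets follow the OpenAI-compatible chat-completions schema, so the same request can be built with only the Python standard library (a sketch, not the project's official client; the ports 10021/10110 and model names are taken from the snippets above):

```python
import json
import urllib.request

def chat_request(port: int, model: str, content: str = "hello") -> urllib.request.Request:
    """Build a chat-completions request mirroring the curl payload above."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "model": model,
        "temperature": 0.3,
        "top_p": 1.0,
        "stream": False,  # the curl examples use true; False returns one JSON body
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = chat_request(10110, "GLM-4.5-Air")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Use port 10021 with model `SmallThinker-21BA3B-Instruct` for the first server; with `"stream": True` you would instead need to parse server-sent events line by line.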