diff --git a/doc/en/SmallThinker_and_Glm4moe.md b/doc/en/SmallThinker_and_Glm4moe.md
index ac6dd07..fb4c9d1 100644
--- a/doc/en/SmallThinker_and_Glm4moe.md
+++ b/doc/en/SmallThinker_and_Glm4moe.md
@@ -5,15 +5,15 @@
 ### Overview
 
-We are excited to announce that **KTransformers now supports both SmallThinker and GLM-4-MoE**.
+We are excited to announce that **KTransformers now supports both SmallThinker and GLM-4.5-Air**.
 
-- **SmallThinker-21B (bf16)**: ~26 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~84 GB DRAM.
-- **GLM-4-MoE 110B (bf16)**: ~11 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~440 GB DRAM.
-- **GLM-4-MoE 110B (AMX INT8)**: prefill ~309 TPS / decode ~16 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~220 GB DRAM.
+- **SmallThinker-21BA3B-Instruct (bf16)**: ~26 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~84 GB DRAM.
+- **GLM-4.5-Air (bf16)**: ~11 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~440 GB DRAM.
+- **GLM-4.5-Air (AMX INT8)**: prefill ~309 TPS / decode ~16 TPS **on a dual-socket CPU with one consumer-grade GPU**, requiring ~220 GB DRAM.
 
 ### Model & Resource Links
-- **SmallThinker-21B**
-  - *(to be announced)*
-- **GLM-4-MoE 110B**
-  - *(to be announced)*
+- **SmallThinker-21BA3B-Instruct**
+  - [SmallThinker-21BA3B-Instruct](https://huggingface.co/PowerInfer/SmallThinker-21BA3B-Instruct)
+- **GLM-4.5-Air**
+  - [GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air)
 
 ---
 
@@ -23,9 +23,9 @@ We are excited to announce that **KTransformers now supports both SmallThinker a
 
 | Model | Precision | Experts | DRAM Needed | GPU Memory Needed\* | TPS (approx.) |
 | ------------------------- | ---------- | ------- | ----------- | ------------------- | --------------------------------------- |
-| SmallThinker-21B | bf16 | 32 | \~42 GB | 14 GB | \~26 TPS |
-| GLM-4-MoE 110B | bf16 | 128 | \~220 GB | 14 GB | \~11 TPS |
-| GLM-4-MoE 110B (AMX INT8) | int8 | 128 | \~220 GB | 14 GB | \~16 TPS
+| SmallThinker-21BA3B-Instruct | bf16 | 32 | \~42 GB | 14 GB | \~26 TPS |
+| GLM-4.5-Air | bf16 | 128 | \~220 GB | 14 GB | \~11 TPS |
+| GLM-4.5-Air (AMX INT8) | int8 | 128 | \~220 GB | 14 GB | \~16 TPS |
 
 \* Exact GPU memory depends on sequence length, batch size, and kernels used.
 
@@ -37,12 +37,12 @@ We are excited to announce that **KTransformers now supports both SmallThinker a
-# (Fill in actual repos/filenames yourself)
+# (Official Hugging Face repos)
 
-# SmallThinker-21B
-huggingface-cli download --resume-download placeholder-org/Model-TBA \
-  --local-dir ./Model-TBA
+# SmallThinker-21BA3B-Instruct
+huggingface-cli download --resume-download PowerInfer/SmallThinker-21BA3B-Instruct \
+  --local-dir ./SmallThinker-21BA3B-Instruct
 
-# GLM-4-MoE 110B
-huggingface-cli download --resume-download placeholder-org/Model-TBA \
-  --local-dir ./Model-TBA
+# GLM-4.5-Air
+huggingface-cli download --resume-download zai-org/GLM-4.5-Air \
+  --local-dir ./GLM-4.5-Air
 
 ```
 
@@ -94,7 +94,7 @@ curl -X POST http://localhost:10021/v1/chat/completions \
     "messages": [
       {"role": "user", "content": "hello"}
     ],
-    "model": "SmallThinker-21B",
+    "model": "SmallThinker-21BA3B-Instruct",
     "temperature": 0.3,
     "top_p": 1.0,
     "stream": true
@@ -109,7 +109,7 @@ curl -X POST http://localhost:10110/v1/chat/completions \
     "messages": [
       {"role": "user", "content": "hello"}
    ],
-    "model": "GLM-4-MoE-110B",
+    "model": "GLM-4.5-Air",
    "temperature": 0.3,
    "top_p": 1.0,
    "stream": true