mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2026-04-20 06:18:59 +00:00
Update reference to optimize rules directory
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
@@ -123,7 +123,7 @@ Download source code and compile:
```shell
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
```
- For Windows (Windows native temprarily deprecated, please try WSL)
- For Windows (Windows native temporarily deprecated, please try WSL)
```shell
install.bat
@@ -166,7 +166,7 @@ It features the following arguments:
> Note: <strong>.safetensors</strong> files are not required in the directory. We only need config files to build model and tokenizer.
>
- `--gguf_path` (required): Path of a directory containing GGUF files which could that can be downloaded from [Hugging Face](https://huggingface.co/mzwing/DeepSeek-V2-Lite-Chat-GGUF/tree/main). Note that the directory should only contains GGUF of current model, which means you need one separate directory for each model.
- `--optimize_config_path` (required except for Qwen2Moe and DeepSeek-V2): Path of YAML file containing optimize rules. There are two rule files pre-written in the [ktransformers/optimize/optimize_rules](ktransformers/optimize/optimize_rules) directory for optimizing DeepSeek-V2 and Qwen2-57B-A14, two SOTA MoE models.
- `--optimize_config_path` (required except for Qwen2Moe and DeepSeek-V2): Path of YAML file containing optimize rules. There are two rule files pre-written in the [ktransformers/optimize/optimize_rules](https://github.com/kvcache-ai/ktransformers/tree/main/ktransformers/optimize/optimize_rules) directory for optimizing DeepSeek-V2 and Qwen2-57B-A14, two SOTA MoE models.
- `--max_new_tokens`: Int (default=1000). Maximum number of new tokens to generate.
- `--cpu_infer`: Int (default=10). The number of CPUs used for inference. Should ideally be set to the (total number of cores - 2).