mirror of https://github.com/kvcache-ai/ktransformers.git (synced 2026-05-11 00:10:07 +00:00)
Update prefix_cache.md
@@ -1,6 +1,6 @@
 ## Enabling Prefix Cache Mode in KTransformers

-To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
+Balance serve now supports prefix cache reuse! To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.

 ### Step 1: Modify the Configuration File
@@ -31,4 +31,8 @@ Then recompile the project:
 USE_BALANCE_SERVE=1 bash ./install.sh
 # For machines with two CPUs and 1 TB of RAM (dual NUMA):
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
 ```
+
+## Note
+Balance serve uses a 3-layer (GPU-CPU-Disk) scheme to store and reuse KVCache. Deleting KVCache is not supported yet. If you accumulate too much KVCache, you can simply remove the kvcache files.
+
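The manual cleanup described in the note could be scripted roughly as follows. This is a minimal sketch, not part of KTransformers: the helper name `clear_kvcache_dir` is hypothetical, and the directory argument is an assumption, because the note does not say where balance serve writes its kvcache files. It is probably safest to run it only while the server is stopped, although the note does not state whether that is required.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not shipped with KTransformers.
# ASSUMPTION: the caller knows which directory holds the kvcache files;
# the note in this commit does not name that location.
clear_kvcache_dir() {
    local dir="${1:?usage: clear_kvcache_dir <kvcache-directory>}"

    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi

    # Report how much disk space the cache uses before deleting anything.
    du -sh "$dir"

    # Delete regular files only; the directory tree itself is kept.
    find "$dir" -type f -delete
}
```

Call it with the directory your deployment actually uses, for example `clear_kvcache_dir /path/to/kvcache`, where the path is a placeholder. Deleting only regular files keeps the directory in place, so a restarted server would not need it recreated.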