Update prefix_cache.md

ErvinXie
2025-06-30 15:04:37 +08:00
committed by GitHub
parent a9a72e52c3
commit 5a73aaf652

@@ -1,6 +1,6 @@
## Enabling Prefix Cache Mode in KTransformers
To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
Balance serve now supports prefix cache reuse! To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
### Step 1: Modify the Configuration File
@@ -32,3 +32,7 @@ USE_BALANCE_SERVE=1 bash ./install.sh
# For those who have two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
```
## Note
Balance serve uses a three-tier (GPU, CPU, disk) scheme to store and reuse KVCache. Deleting KVCache entries is not supported yet. If the cache grows too large, you can free space by deleting the KVCache files on disk.
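
The manual cleanup mentioned above could look roughly like the sketch below. The function name `clear_kvcache` is hypothetical and not part of KTransformers, and the directory you pass in is an assumption: use whatever disk path your own configuration file sets for KVCache storage. It is probably safest to stop the server before running it.

```bash
# clear_kvcache DIR: delete everything inside DIR but keep DIR itself.
# Hypothetical helper; DIR is the on-disk KVCache path from your own config.
clear_kvcache() {
    local dir="$1"
    # Refuse to run on an empty or missing path to avoid deleting the wrong files.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "clear_kvcache: not a directory: '$dir'" >&2
        return 1
    fi
    du -sh "$dir"                      # report current disk usage
    find "$dir" -mindepth 1 -delete    # remove contents, keep the directory
}
```

Call it as `clear_kvcache /path/to/kvcache`, replacing the placeholder path with your actual cache directory.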