mirror of
https://github.com/theroyallab/tabbyAPI.git
synced 2026-03-14 15:57:27 +00:00
Model: Change cache_size/max_seq_len behavior
- Cache size is now given only by the cache_size config option. Default is 4096 (user should always override to max out VRAM)
- max_seq_len, if not overridden in the config, will default to the model's config.json
- max_seq_len is reduced to be no larger than the cache
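The resolution order described above can be sketched as a small helper. This is an illustrative sketch only, assuming the behavior stated in the commit message; the function name `resolve_lengths` and the parameter `model_max_seq_len` (the value read from the model's config.json) are hypothetical, not tabbyAPI's actual API.

```python
from typing import Optional, Tuple

# Default per the commit message; users should override to max out VRAM
DEFAULT_CACHE_SIZE = 4096


def resolve_lengths(
    cache_size: Optional[int],
    max_seq_len: Optional[int],
    model_max_seq_len: int,
) -> Tuple[int, int]:
    """Return (cache_size, max_seq_len) under the new behavior.

    - cache_size comes only from the cache_size option (default 4096)
    - max_seq_len falls back to the model's config.json value
    - max_seq_len is reduced so it never exceeds the cache
    """
    cache = cache_size if cache_size is not None else DEFAULT_CACHE_SIZE
    seq_len = max_seq_len if max_seq_len is not None else model_max_seq_len
    seq_len = min(seq_len, cache)
    return cache, seq_len
```

For example, with nothing overridden and a model config.json of 8192, the cache stays at 4096 and max_seq_len is clamped down to 4096 to fit it.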
@@ -85,7 +85,7 @@ class ModelLoadRequest(BaseModel):
         examples=[4096],
     )
     cache_size: Optional[int] = Field(
-        description=("Number in tokens, must be greater than or equal to max_seq_len"),
+        description="Number in tokens, must be multiple of 256",
         default=None,
         examples=[4096],
     )