Model: Change cache_size/max_seq_len behavior

- Cache size is now set only by the cache_size config option. The default is 4096; users should always override it to make full use of available VRAM.
- max_seq_len, if not overridden in the config, defaults to the value in the model's config.json.
- max_seq_len is clamped so that it is never larger than the cache size.
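The resolution order described above can be sketched as a small helper. This is an illustrative sketch of the behavior, not the actual implementation; the function and argument names are hypothetical.

```python
from typing import Optional

DEFAULT_CACHE_SIZE = 4096  # default stated in the commit message

def resolve_lengths(
    cache_size: Optional[int],
    max_seq_len: Optional[int],
    model_config_max_seq_len: int,
) -> tuple[int, int]:
    # Cache size comes only from the cache_size config option
    if cache_size is None:
        cache_size = DEFAULT_CACHE_SIZE
    # max_seq_len falls back to the model's config.json value
    if max_seq_len is None:
        max_seq_len = model_config_max_seq_len
    # max_seq_len is reduced to be no larger than the cache
    max_seq_len = min(max_seq_len, cache_size)
    return cache_size, max_seq_len
```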
Author: turboderp
Date: 2025-10-05 22:15:27 +02:00
Parent: d672dc2137
Commit: 4235f98e83
5 changed files with 37 additions and 63 deletions

```diff
@@ -85,7 +85,7 @@ class ModelLoadRequest(BaseModel):
         examples=[4096],
     )
     cache_size: Optional[int] = Field(
-        description=("Number in tokens, must be greater than or equal to max_seq_len"),
+        description="Number in tokens, must be multiple of 256",
         default=None,
         examples=[4096],
     )
```
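The diff only updates the field's description string; the commit itself does not show where the "multiple of 256" constraint is enforced. A minimal standalone check mirroring that constraint might look like the following (hypothetical sketch, not the project's actual validation code):

```python
def validate_cache_size(cache_size: int) -> int:
    # Enforce the constraint stated in the field description:
    # cache_size must be a multiple of 256 (hypothetical check)
    if cache_size % 256 != 0:
        raise ValueError(
            f"cache_size must be a multiple of 256, got {cache_size}"
        )
    return cache_size
```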