Model: Correct exl3 generation, add concurrency, and cleanup

Fixes application of sampler parameters by adding a new sampler builder
interface. Also expose the generator class-wide and add wait_for_jobs.

Finally, allow inline loading to specify the backend.
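
As a rough illustration of the last point: the diff below changes the config default for `backend` to `None` and notes that the fallback to exllamav2 is applied in common/model.py. A minimal sketch of that kind of fallback might look like the following. The names `resolve_backend`, `SUPPORTED_BACKENDS`, and `DEFAULT_BACKEND` are hypothetical and do not come from this commit; only the two backend option strings are taken from the config description.

```python
from typing import Optional

# Hypothetical names for illustration only; not taken from the commit.
SUPPORTED_BACKENDS = ("exllamav2", "exllamav3")
DEFAULT_BACKEND = "exllamav2"


def resolve_backend(requested: Optional[str]) -> str:
    """Return the backend to load, falling back to the default when unset."""
    # A None value means neither the config nor the inline load request chose one.
    if requested is None:
        return DEFAULT_BACKEND

    # Normalize so that "ExLlamaV3" and " exllamav3 " are treated the same.
    backend = requested.strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {requested!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    return backend
```

Keeping the config default as `None` lets the loader tell "not specified" apart from an explicit "exllamav2", which is presumably what allows an inline load request to supply its own value.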

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
kingbri
2025-04-30 22:59:25 -04:00
parent c744790f14
commit 303e2dde12
2 changed files with 155 additions and 85 deletions

@@ -163,8 +163,10 @@ class ModelConfig(BaseConfigModel):
"Example: ['max_seq_len', 'cache_mode']."
),
)
+# Defaults to exllamav2 in common/model.py
backend: Optional[str] = Field(
-"exllamav2",
+None,
description=(
"Backend to use for this model (default: exllamav2)\n"
"Options: exllamav2, exllamav3",