Model: Add proper jobs cleanup and fix var calls

Jobs should be started when the generation stream is called and
cleaned up immediately afterward. Expose a stream_generate function
and add it to the base class, since the name is more idiomatic than
generate_gen.

The exl2 container's generate_gen function is now internal.
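
As a rough illustration of the pattern described above, here is a minimal
sketch of a public stream_generate wrapping an internal generator, with job
cleanup in a finally block. Only the names stream_generate and generate_gen
come from this commit; BaseModelContainer, SketchContainer, active_jobs, and
the request_id/prompt parameters are assumptions made for the example and
are not the project's actual code.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, Dict


class BaseModelContainer(ABC):
    """Hypothetical base class; only stream_generate is named in the commit."""

    @abstractmethod
    def stream_generate(self, request_id: str, prompt: str) -> AsyncIterator[str]:
        """Public streaming entry point that every backend exposes."""


class SketchContainer(BaseModelContainer):
    """Illustrative backend; not the real exl2 container."""

    def __init__(self) -> None:
        # Hypothetical registry of in-flight jobs, keyed by request ID.
        self.active_jobs: Dict[str, str] = {}

    async def stream_generate(self, request_id: str, prompt: str):
        try:
            # Delegate to the internal generator, which registers the job.
            async for chunk in self._generate_gen(request_id, prompt):
                yield chunk
        finally:
            # Runs on normal completion, on error, and when the stream is
            # closed early, so the job entry never outlives the stream.
            self.active_jobs.pop(request_id, None)

    async def _generate_gen(self, request_id: str, prompt: str):
        # Internal: start the job, then emit one "token" per word.
        self.active_jobs[request_id] = prompt
        for token in prompt.split():
            await asyncio.sleep(0)  # stand-in for real generation work
            yield token


async def demo():
    container = SketchContainer()
    tokens = [t async for t in container.stream_generate("req-1", "a b c")]
    return tokens, dict(container.active_jobs)
```

Callers that stop consuming the stream early should call aclose() on the
generator so the finally block runs right away instead of at garbage
collection.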

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
kingbri
2025-04-24 21:30:55 -04:00
parent 7e007f0761
commit f070587e9f
6 changed files with 45 additions and 26 deletions


@@ -14,7 +14,7 @@ if dependencies.extras:
 class InfinityContainer:
     model_dir: pathlib.Path
-    model_loaded: bool = False
+    loaded: bool = False
     # Use a runtime type hint here
     engine: Optional["AsyncEmbeddingEngine"] = None
@@ -37,7 +37,7 @@ class InfinityContainer:
         self.engine = AsyncEmbeddingEngine.from_args(engine_args)
         await self.engine.astart()
-        self.model_loaded = True
+        self.loaded = True
         logger.info("Embedding model successfully loaded.")
     async def unload(self):