Model: Add proper jobs cleanup and fix var calls

Jobs should be started when the generation stream is called and cleaned up immediately afterward. Expose a stream_generate function and add it to the base class, since the name is more idiomatic than generate_gen. The exl2 container's generate_gen function is now internal.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2026-04-20 14:28:54 +00:00 · 2025-04-24 21:30:55 -04:00
parent 7e007f0761
commit f070587e9f
6 changed files with 45 additions and 26 deletions
--- a/backends/infinity/model.py
+++ b/backends/infinity/model.py
@@ -14,7 +14,7 @@ if dependencies.extras:

 class InfinityContainer:
    model_dir: pathlib.Path
-    model_loaded: bool = False
+    loaded: bool = False

    # Use a runtime type hint here
    engine: Optional["AsyncEmbeddingEngine"] = None
@@ -37,7 +37,7 @@ class InfinityContainer:
        self.engine = AsyncEmbeddingEngine.from_args(engine_args)
        await self.engine.astart()

-        self.model_loaded = True
+        self.loaded = True
        logger.info("Embedding model successfully loaded.")

    async def unload(self):