Tree: Switch to async generators

Async generators remove many of the roadblocks that come with managing
tasks using threads. They should also allow for abortables and other
modern async paradigms.
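
As an illustration of what "abortables" could look like with async generators, here is a minimal sketch. It is not code from this commit: `generate_tokens`, `consume`, and `cleanup_log` are hypothetical names. It only shows that a consumer can stop early and that the generator's `finally` block then runs when the generator is closed.

```python
import asyncio
from typing import AsyncGenerator, List

# Hypothetical log used only to show that cleanup ran.
cleanup_log: List[str] = []


async def generate_tokens(count: int) -> AsyncGenerator[str, None]:
    """Hypothetical stand-in for a streaming generation call."""
    try:
        for i in range(count):
            # Yield control to the event loop between tokens.
            await asyncio.sleep(0)
            yield f"token-{i}"
    finally:
        # Runs on normal exhaustion and when the consumer aborts.
        cleanup_log.append("released")


async def consume(limit: int) -> List[str]:
    """Read at most `limit` tokens, then abort the stream."""
    received: List[str] = []
    stream = generate_tokens(1000)
    try:
        async for token in stream:
            received.append(token)
            if len(received) >= limit:
                break  # abort: stop pulling from the generator
    finally:
        # Explicitly close the generator so its `finally` block runs now.
        await stream.aclose()
    return received


if __name__ == "__main__":
    print(asyncio.run(consume(3)))
```

With a thread-based design, stopping work mid-stream usually needs a shared flag or event that the worker polls. Here the abort is simply the consumer no longer iterating.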

NOTE: Exllamav2 itself is not an asynchronous library. It has only
been fitted into tabby's async design to allow for a fast and concurrent
API server. Whether to run stream_ex in a separate thread or to manage
it manually using asyncio.sleep(0) is still being debated.
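
The two options in the NOTE above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `blocking_step` is a hypothetical stand-in for one synchronous generation step, and `stream_inline` / `stream_threaded` are made-up names. `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import time
from typing import AsyncGenerator, List


def blocking_step(index: int) -> str:
    """Hypothetical synchronous step, standing in for library work."""
    time.sleep(0.001)  # simulate blocking compute
    return f"chunk-{index}"


async def stream_inline(steps: int) -> AsyncGenerator[str, None]:
    """Option A: run the step on the event loop, then yield control."""
    for i in range(steps):
        chunk = blocking_step(i)  # blocks the loop while it runs
        await asyncio.sleep(0)    # lets other tasks run between steps
        yield chunk


async def stream_threaded(steps: int) -> AsyncGenerator[str, None]:
    """Option B: run each step in a worker thread."""
    for i in range(steps):
        # The event loop stays free while the step executes.
        chunk = await asyncio.to_thread(blocking_step, i)
        yield chunk


async def collect(stream: AsyncGenerator[str, None]) -> List[str]:
    """Drain an async generator into a list."""
    return [chunk async for chunk in stream]


if __name__ == "__main__":
    print(asyncio.run(collect(stream_inline(3))))
    print(asyncio.run(collect(stream_threaded(3))))
```

The trade-off: option A keeps everything on one thread but still blocks the loop for the length of each step, while option B keeps the loop responsive at the cost of thread hand-offs per step.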

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri
2024-03-14 10:27:39 -04:00
committed by Brian Dashore
parent 33e2df50b7
commit 7fded4f183
10 changed files with 84 additions and 88 deletions

11
main.py

@@ -5,9 +5,6 @@ import os
 import pathlib
 import signal
 import sys
-import threading
-import time
-from functools import partial
 from loguru import logger
 from typing import Optional
@@ -121,13 +118,7 @@ async def entrypoint(args: Optional[dict] = None):
         lora_dir = pathlib.Path(unwrap(lora_config.get("lora_dir"), "loras"))
         model.container.load_loras(lora_dir.resolve(), **lora_config)
-    # TODO: Replace this with abortables, async via producer consumer, or something else
-    api_thread = threading.Thread(target=partial(start_api, host, port), daemon=True)
-    api_thread.start()
-    # Keep the program alive
-    while api_thread.is_alive():
-        time.sleep(0.5)
+    await start_api(host, port)
 if __name__ == "__main__":