tabbyAPI

mirror of https://github.com/theroyallab/tabbyAPI.git synced 2026-05-11 08:20:08 +00:00

Author	SHA1	Message	Date
kingbri	09a4c79847	Model: Auto-scale max_tokens by default If max_tokens is None, it automatically scales to fill up the context. This does not mean the generation will fill up that context since EOS stops also exist. Originally suggested by #86 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 22:54:59 -04:00
kingbri	8cbb59d6e1	Model: Cleanup some comments Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 22:20:45 -04:00
kingbri	4f75fb5588	Model: Adjust max output len Max output len should be hardcoded to 16 since it's the amount of tokens to predict per forward pass. 16 is a good value for both normal inference and speculative decoding which also helps save vram compared to 2048 which was the previous default. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 22:16:53 -04:00
kingbri	2704ff8344	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 16:02:29 -04:00
kingbri	5c7fc69ded	API: Fix finish_reason returns OAI expects finish_reason to be "stop" or "length" (there are others, but they're not in the current scope of this project). Make all completions and chat completions responses return this from the model generation itself rather than putting a placeholder. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 15:59:28 -04:00
kingbri	25f5d4a690	API: Cleanup permission endpoint Don't return an OAI specific type from a common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 15:13:26 -04:00
kingbri	3c08f46c51	Endpoints: Add key permission checker This is a definite way to check if an authorized key is API or admin. The endpoint only runs if the key is valid in the first place to keep in line with the API's security model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 00:53:27 -04:00
kingbri	c9a6d9ae1f	Model: Switch to begin_stream_ex Allows for dynamically passing logprobs params instead of assuming on initialization of the generator. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-17 14:41:16 -04:00
kingbri	08bcc6307a	Config: Update description part 2 Forgot to change wording. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-17 01:07:23 -04:00
kingbri	7abbac098a	Config: Update Q4 in comments Wasn't present when the option was added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-17 01:04:12 -04:00
kingbri	14d8ec2007	Signal: Fix signal handlers for uvicorn Add the ability to override uvicorn's signal handler in addition to using main's signal handler for any SIGINTs before the API server starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	95e44c20d6	Model: Fix load if model didn't load properly If the model didn't load properly, the container still exists until unload is called. However, the name check still registered as true. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	2755fd1af0	API: Fix blocking iterator execution Run these iterators on the background thread. On startup, the API spawns a background thread as needed to run sync code on without blocking the event loop. Use asyncio's run_thread function since it allows for errors to be propagated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	7fded4f183	Tree: Switch to async generators Async generation helps remove many roadblocks to managing tasks using threads. It should allow for abortables and modern-day paradigms. NOTE: Exllamav2 itself is not an asynchronous library. It's just been added into tabby's async nature to allow for a fast and concurrent API server. It's still being debated whether to run stream_ex in a separate thread or manually manage it using asyncio.sleep(0) Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	33e2df50b7	API: Disable SSE ping chunks These are mainly used for some clients that ping to see if the request is alive. However, we don't need this. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-14 20:47:05 -04:00
kingbri	7006fa4cc8	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 23:33:18 -04:00
kingbri	efc01d947b	API + Model: Add speculative ngram decoding Speculative ngram decoding is like speculative decoding without the draft model. It's not as useful because it only decodes on predictable sequences, but it depends on the use case. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 23:32:11 -04:00
kingbri	2ebefe8258	Logging: Move metrics to gen logging This didn't have a place in the generation function. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 23:13:55 -04:00
kingbri	1ec8eb9620	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 00:02:55 -04:00
kingbri	8e4745920c	Requirements: Update Ruff v0.3.2 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 00:02:55 -04:00
kingbri	6f03be9523	API: Split functions into their own files Previously, generation functions were bundled with the request function causing the overall code structure and API to look ugly and unreadable. Split these up and cleanup a lot of the methods that were previously overlooked in the API itself. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	104a6121cb	API: Split into separate folder Moving the API into its own directory helps compartmentalize it and allows for cleaning up the main file to just contain bootstrapping and the entry point. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	5a2de30066	Tree: Update to cleanup globals Use the module singleton pattern to share global state. This can also be a modified version of the Global Object Pattern. The main reason this pattern is used is for ease of use when handling global state rather than adding extra dependencies for a DI parameter. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	b373b25235	API: Move to ModelManager This is a shared module which manages the model container and provides extra utility functions around it to help slim down the API. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	8b46282aef	Model: Fix state flag sets on unload The load state should be false only if the models are unloaded. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	894be4a818	Startup: Check if the port is available and fallback Similar to Gradio, fall back to port + 1 if the config port isn't bindable. If both ports aren't available, let the user know and exit. An infinite loop of finding a port isn't advisable. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 21:57:28 -04:00
kingbri	7c6fd7ac60	Main: Cleanup Remove leftover debug statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 18:10:35 -04:00
kingbri	53d889e0f0	Logging: Fix legacy warn statement Warn is not a valid method with loguru. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 01:31:43 -04:00
kingbri	ba3da6d92f	Logging: Escape rich markup sequences Rich markup sequences inside the log string were causing issues with printing. Fix this by using their escape function. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 00:28:48 -04:00
kingbri	4cc0b59bdc	Requirements: Add sse-starlette Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 19:41:08 -04:00
kingbri	42c0dbe795	Generation: Explicitly release semaphore on disconnect This prevents any lockups when querying another request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:54:48 -04:00
kingbri	2025a1c857	Requirements: Unpin uvicorn v0.28.0 works now and the underlying errors were fixed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:48:43 -04:00
kingbri	bbb1a4ec20	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:45:09 -04:00
kingbri	045262f51f	Logging: Loglevel INFO This is the max that Tabby should log because debug and trace aren't used within the application. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:44:19 -04:00
kingbri	d45e847c7a	API: Fix disconnect handling on streaming responses Starlette's StreamingResponse has an issue where it yields after a request has disconnected. A bugfix to starlette will fix this issue, but FastAPI uses starlette <= 0.36 which isn't ideal. Therefore, switch back to sse-starlette which handles these disconnects correctly. Also don't try yielding after the request is disconnected. Just return out of the generator instead. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:43:13 -04:00
kingbri	6b4f100db2	Logger: Escape tags Angle brackets should be escaped to avoid mistaken color formatting. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 01:24:50 -05:00
kingbri	e33971859b	Requirements: Pin uvicorn Pin uvicorn due to issues with request disconnection in the latest version. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 01:23:36 -05:00
kingbri	a69ee976f0	API: Let the user know if a disconnect occurred If a user disconnects from a request, log this in the console. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-09 15:48:27 -05:00
kingbri	c77259bfbb	Logger: Fix reformatting of message Use the reformatted message when splitting lines instead of the raw message to prevent exceptions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-09 15:40:37 -05:00
kingbri	4d09226364	Logging: Fix Uvicorn hook The Uvicorn logging config wasn't being set. Fix that when creating a new server. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 17:56:48 -05:00
kingbri	2295b12643	Progress: Fix bar with draft models Show two bars and clarify which bar is which. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:48:06 -05:00
kingbri	c9b4b7c509	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	cad72315f4	Init: Switch to display redoc endpoint Redoc looks much better than Swagger docs, so show that by default. Both endpoints still exist. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	ef2dc326f5	Logging: Fix inconsistent formatting Some colorization was incorrect and the separator insertion has become more robust. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	228c227c1e	Logging: Switch to loguru Loguru is a flexible logger that allows for easier hooking and imports into Rich with no problems. Also makes progress bars stick to the bottom of the terminal window. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	fe0ff240e7	Progress: Switch to Rich Rich is a more mature library for displaying progress bars, logging, and console output. This should help properly align progress bars within the terminal. Side note: "We're Rich!" Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	39617adb65	Requirements: Update Exllamav2 v0.0.15 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-06 22:29:55 -05:00
Brian Dashore	47c42a23d4	Merge pull request #72 from djmaze/patch-1 Remove explicit install of pytorch & exllamav2 in Dockerfile	2024-03-06 01:13:37 -05:00
kingbri	9a007c4707	Model: Add support for Q4 cache Add this in addition to 8bit cache and 16bit cache. Passing "Q4" with the cache_mode request parameter will set this on model load. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-06 00:59:28 -05:00
kingbri	0b25c208d6	API: Fix error reporting Make a disconnect on load error consistently. It should be safer to warn the user to run unload (or re-run load) if a model does not load correctly. Also don't log the traceback for request errors that don't have one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-05 18:16:02 -05:00
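
The message for commit 09a4c79847 says that when max_tokens is None, it scales to fill up the remaining context. A minimal sketch of that rule follows. The helper name `resolve_max_tokens` and the parameters `max_seq_len` and `prompt_len` are hypothetical and are not taken from the tabbyAPI source; the repository's actual implementation may differ.

```python
from typing import Optional


def resolve_max_tokens(
    max_tokens: Optional[int], max_seq_len: int, prompt_len: int
) -> int:
    """Hypothetical helper: pick the token budget for one generation."""
    if max_tokens is None:
        # No explicit limit: use whatever context is left after the prompt.
        # Floor at zero in case the prompt already exceeds the context.
        return max(max_seq_len - prompt_len, 0)
    # An explicit limit is passed through unchanged in this sketch.
    return max_tokens
```

As the commit message notes, this budget is only an upper bound: generation can still end earlier on an EOS stop.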


361 Commits
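
The message for commit 894be4a818 describes a startup check: try the configured port, fall back to port + 1, and exit if neither is bindable rather than searching indefinitely. The sketch below illustrates that behavior using only the standard library. The function names `port_is_bindable` and `choose_port` and the injectable `is_free` parameter are assumptions made for illustration, not the project's code.

```python
import socket


def port_is_bindable(host: str, port: int) -> bool:
    """Try to bind a TCP socket; True means the port is currently free."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def choose_port(host: str, port: int, is_free=port_is_bindable) -> int:
    """Return the configured port, or port + 1 as the single fallback."""
    # Only two candidates are tried, so there is no unbounded search.
    for candidate in (port, port + 1):
        if is_free(host, candidate):
            return candidate
    raise RuntimeError(f"Ports {port} and {port + 1} are both unavailable")
```

A caller could catch the `RuntimeError`, tell the user, and exit. The `is_free` parameter exists only so the selection logic can be exercised without opening real sockets.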