Commit Graph

117 Commits

Author SHA1 Message Date
kingbri
d8f7b93c54 Model: Fix fetching of draft args
These were mistakenly fetched from the parent kwargs instead of the
scoped draft_config variable.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-05 22:24:27 -05:00
DocShotgun
3f2fcbcc45 Add fallback to draft_rope_scale to 1.0 2023-12-05 18:51:36 -08:00
DocShotgun
39f7a2aabd Expose draft_rope_scale 2023-12-05 12:59:32 -08:00
Brian Dashore
e085b806e8 Merge pull request #22 from DocShotgun/main
Update colab, expose additional args
2023-12-05 01:22:33 -05:00
DocShotgun
67507105d0 Update colab, expose additional args
* Exposed draft model args for speculative decoding
* Exposed int8 cache, dummy models, and no flash attention
* Resolved CUDA 11.8 dependency issue
2023-12-04 22:20:46 -08:00
Brian Dashore
37f8f3ef8b Merge pull request #20 from veryamazinglystupid/main
make colab better, fix libcudart errors
2023-12-05 01:14:21 -05:00
kingbri
621e11b940 Update documentation
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-05 00:33:43 -05:00
kingbri
8ba3bfa6b3 API: Fix load exception handling
Models do not fully unload if an exception is caught in load. Therefore,
leave it to the client to unload on cancel.

Also add handlers in the event an SSE stream is cancelled. These packets
can't be sent back to the client since the client has severed the
connection, so print them in the terminal.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-05 00:23:15 -05:00
kingbri
7c92968558 API: Fix mistaken debug statement
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-04 18:07:12 -05:00
kingbri
5e54911cc8 API: Fix semaphore handling and chat completion errors
Chat completions previously always yielded a final packet to say that
a generation finished. However, this caused errors because a yield was
executed after GeneratorExit. The error is legitimate: Python's
garbage collector can't clean up the generator after exit since the
finally block still executes.

In addition, SSE endpoints close off the connection, so the finish packet
can only be yielded once the response has completed; skip the yield on
exception.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-04 15:51:25 -05:00
kingbri
30fc5b3d29 Merge branch 'main' of github.com:theroyallab/tabbyAPI 2023-12-03 22:55:51 -05:00
kingbri
ed6c962aad API: Fix sequential requests
FastAPI is kinda weird with queueing. If an await is used within an
async def, requests aren't executed sequentially. Restore sequential
requests by using a semaphore to limit concurrent execution from
generator functions.

Also scaffold the framework to move generator functions to their own
file.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 22:54:34 -05:00
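The semaphore approach described in this commit can be sketched as follows (the names here are illustrative, not the actual TabbyAPI code): a semaphore of size 1 forces incoming generation requests to run one at a time even though the route handlers are async.

```python
import asyncio

# Allow only one generation to run at a time; further requests queue up
# behind the semaphore instead of interleaving.
generate_semaphore = asyncio.Semaphore(1)

async def generate_with_semaphore(generator_func):
    """Run an async generator function while holding the semaphore."""
    async with generate_semaphore:
        async for chunk in generator_func():
            yield chunk
```

Because the semaphore is held for the full lifetime of the generator, a second request awaits until the first stream finishes.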
veryamazinglystupid
ad1a12a0f2 make colab better, fix libcudart errors
:3
2023-12-03 14:07:52 +05:30
DocShotgun
2a9e4ca051 Add Colab example
* Note: this uses wheels for Python 3.10 and torch 2.1.0+cu118, which is the current default in Colab
2023-12-03 02:21:51 -05:00
kingbri
e740b53478 Requirements: Update Flash Attention 2
Bump to 2.3.6

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 01:56:29 -05:00
kingbri
c67c9f6d66 Model + Config: Remove low_mem option
Low_mem doesn't work in exl2 and it was an experimental option to
begin with. Keep the loading code commented out in case it gets fixed
in the future.

A better alternative is to use 8bit cache which works and helps save
VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 01:07:42 -05:00
Brian Dashore
109e4223e0 Merge pull request #18 from DocShotgun/main
Add automatic NTK-aware alpha scaling to model
2023-12-03 01:06:50 -05:00
kingbri
27fc0c0069 Model: Cleanup and compartmentalize auto rope functions
Also handle an edge case where ratio <= 1, since NTK scaling is only
used for values > 1.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 01:05:09 -05:00
DocShotgun
bd2c5d0d09 Force auto-alpha to 1.0 if config ctx == base ctx 2023-12-02 21:19:59 -08:00
DocShotgun
1c398b0be7 Add automatic NTK-aware alpha scaling to model
* Enables automatic calculation of NTK-aware alpha scaling for models when the rope_alpha arg is not passed in the config, using the same formula as for draft models
2023-12-02 21:02:29 -08:00
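The auto-alpha calculation these commits describe can be sketched as below. The quadratic coefficients are the fit commonly used in exllamav2-based projects; treat them (and the function name) as illustrative rather than the exact repo code. The ratio <= 1 edge case falls back to 1.0, since NTK scaling only applies when the target context exceeds the base context.

```python
def calculate_rope_alpha(base_seq_len: int, target_seq_len: int) -> float:
    """Estimate a RoPE NTK-aware alpha for a target context length.

    Uses a quadratic fit of alpha against the context ratio; for
    ratio <= 1 no scaling is needed, so alpha is forced to 1.0.
    """
    ratio = target_seq_len / base_seq_len
    if ratio <= 1:
        return 1.0
    return -0.13436 + 0.80541 * ratio + 0.28833 * ratio ** 2
```

For example, doubling a 4096-token base context yields an alpha of roughly 2.63 under this fit.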
kingbri
61f6e51fdb OAI: Add separator style fallback
Some models may return None for separator style with FastChat. Fall
back to LLAMA2 if this is the case.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 23:30:19 -05:00
kingbri
ae69b18583 API: Use FastAPI streaming instead of sse_starlette
sse_starlette kept firing a ping response if it was taking too long
to set an event. Rather than using a hacky workaround, switch to
FastAPI's inbuilt streaming response and construct SSE requests with
a utility function.

This helps the API become more robust and removes an extra requirement.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 01:54:35 -05:00
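The utility function mentioned in this commit amounts to formatting each payload as a server-sent-events `data:` packet and yielding it through FastAPI's built-in StreamingResponse. A minimal sketch (the helper name is an assumption):

```python
import json

def get_sse_packet(payload: dict) -> str:
    """Serialize a payload into the SSE wire format: a 'data:' line
    terminated by a blank line."""
    return f"data: {json.dumps(payload)}\n\n"

# In a FastAPI route this would be returned roughly as:
#   return StreamingResponse(generator(), media_type="text/event-stream")
```

Building packets by hand avoids sse_starlette's automatic ping events while keeping the wire format clients expect.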
kingbri
6493b1d2aa OAI: Add ability to send dummy models
Some APIs require an OAI model to be sent against the models endpoint.
Fix this by adding a GPT 3.5 turbo entry first in the list to cover
as many APIs as possible.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 00:27:28 -05:00
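The dummy-model idea can be sketched as follows (function and field names are illustrative, not the repo's actual code): list a `gpt-3.5-turbo` entry first so clients that hard-require an OpenAI model name still work against the models endpoint.

```python
def get_model_list(current_model: str) -> dict:
    """OAI-style models response with a dummy gpt-3.5-turbo entry first."""
    names = ["gpt-3.5-turbo", current_model]
    return {
        "object": "list",
        "data": [{"id": name, "object": "model"} for name in names],
    }
```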
kingbri
aef411bed5 OAI: Fix chat completion streaming
The OAI spec requires chat completions to provide a finish reason
once streaming has completed. This is different from a non-streaming
chat completion response.

Also fix some errors that were raised from the endpoint.

References #15

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 00:14:24 -05:00
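In OAI streaming, intermediate chunks carry a null `finish_reason` and only the final chunk sets it. A sketch of building that final chunk (the helper name is hypothetical):

```python
def build_final_chunk(model: str, reason: str = "stop") -> dict:
    """Final streamed chat-completion chunk: empty delta, finish_reason set."""
    return {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": reason}],
    }
```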
Brian Dashore
c4d8c901e1 Merge pull request #13 from ziadloo/main
Adding the usage stat support (prompt_tokens, completion_tokens, and total_tokens)
2023-11-30 01:57:44 -05:00
kingbri
8a5ac5485b Model: Fix rounding
generated_tokens is always a whole number.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-30 01:55:46 -05:00
kingbri
e703c716ee Merge branch 'main' of https://github.com/ziadloo/tabbyAPI into ziadloo-main 2023-11-30 01:01:48 -05:00
kingbri
56f9b1d1a8 API: Add generator error handling
If the generator errors, there's no proper handling to send an error
packet and close the connection.

This is especially important for unloading models if the load fails
at any stage to reclaim a user's VRAM. Raising an exception caused
the model_container object to lock and not get freed by the GC.

It made sense to propagate SSE errors across all generator functions
rather than relying on abort signals.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-30 00:37:48 -05:00
kingbri
2bc3da0155 YAML: Force all files to open with utf8
The default encoding when opening files on Windows is cp1252,
which doesn't support all of Unicode and can cause issues.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 22:04:29 -05:00
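The fix described above amounts to always passing an explicit encoding when opening the YAML files, since Windows otherwise defaults to cp1252. A minimal stdlib-only sketch (the helper name is illustrative):

```python
def read_text_utf8(path: str) -> str:
    """Open with an explicit encoding; without it, Windows defaults to
    cp1252 and non-Latin text can raise UnicodeDecodeError."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```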
kingbri
3957316b79 Revert "API: Rename repetition_decay -> repetition_slope"
This reverts commit cad144126f.

Change this parameter back to repetition_decay. This is different from
rep_pen_slope used in other backends such as kobold and NAI.

Still keep the fallback condition though.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 22:03:45 -05:00
kingbri
94696543bc Model: Warn user if context > max_seq_len
Unlike other backends, tabby attempts to generate even if the context
is greater than the max sequence length, by truncating the given
context.

Rather than artificially erroring out, give a warning that the console
metrics output will be incorrect and that the user should make sure
context <= max_seq_len.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:35:32 -05:00
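The warn-instead-of-error behavior can be sketched as below (names are illustrative): the check flags an oversized prompt but lets generation proceed with a truncated context.

```python
import logging

def check_context_length(prompt_tokens: int, max_seq_len: int) -> bool:
    """Warn rather than error when the prompt exceeds max_seq_len.

    Generation proceeds with a truncated context, but reported console
    metrics will be off. Returns True when the context fits.
    """
    if prompt_tokens > max_seq_len:
        logging.warning(
            "Context length %d exceeds max_seq_len %d; console metrics "
            "will be inaccurate.", prompt_tokens, max_seq_len,
        )
        return False
    return True
```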
kingbri
cad144126f API: Rename repetition_decay -> repetition_slope
Also fix the fallback to use 0 for sanity checking and validation.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:13:05 -05:00
kingbri
5cbf7f13da OAI: Fix repetition range
Alias repetition_penalty_range to repetition_range since that's used
as an internal variable. Perhaps in the future there should be a function
that iterates through request aliases and gives a default value.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 00:53:19 -05:00
Mehran Ziadloo
b0c42d0f05 Leveraging local variables 2023-11-27 20:56:56 -08:00
Mehran Ziadloo
ead503c75b Adding token usage support 2023-11-27 20:05:05 -08:00
kingbri
44e7f7b0ee Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-25 23:47:48 -05:00
Brian Dashore
0914bc313f Merge pull request #12 from DocShotgun/main
Add start-up shell script for Linux
2023-11-25 00:29:47 -05:00
kingbri
d929e0c826 API: Fix error points and exceptions
On /v1/model/load, some internal server errors weren't being sent,
so move the directory check out and also add a check to make sure
the proposed model path exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-25 00:27:02 -05:00
DocShotgun
cffd20f580 Add start-up shell script for Linux
- Requires the user to have already installed the prerequisites in a venv
2023-11-23 19:03:52 -08:00
kingbri
d47c39da54 API: Don't include draft directory in response
The draft directory should be returned for a draft model request (TBD).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-23 00:07:56 -05:00
kingbri
13c9c09398 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-22 00:20:21 -05:00
kingbri
d25310e55d Requirements: Update Flash Attention 2
Use 2.3.4 from tgw. However, keep the 2.3.3 wheels in requirements
in case the newer wheels don't work.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 22:12:55 -05:00
kingbri
71b9a53336 API: Add temperature_last support
Documented in previous commits. Also, for version checking, check the
value in kwargs instead of whether the key is present, since requests
pass default values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 21:20:59 -05:00
turboderp
3337fe6acc Warning if unsupported samplers are used 2023-11-21 18:35:22 +01:00
turboderp
a54de11cf3 Add new samplers 2023-11-21 18:16:53 +01:00
kingbri
c92ee24bb4 Tree: Add batch script
A simple batch script to activate a venv and start TabbyAPI. This
can be used with nssm in Windows for a systemd-like background service.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-20 01:48:06 -05:00
kingbri
2aa9c145be Auth: Fix an oops with headers
I copy pasted the code wrong.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-20 00:16:12 -05:00
kingbri
39ea730be5 Auth: Allow admin keys to work with api key routes
Admin keys carry administrator permissions, so it makes sense to allow
them for API key routes as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 23:53:07 -05:00
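The key check described above reduces to accepting either credential on API-key routes, since admin permissions are a superset. A sketch (names are illustrative, not the repo's auth code):

```python
def check_api_key(header_key: str, api_key: str, admin_key: str) -> bool:
    """API-key routes accept the API key or the admin key; admin-only
    routes would check against admin_key alone."""
    return header_key in (api_key, admin_key)
```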
turboderp
8ef730f016 Merge pull request #11 from veden/patch-1
Fix incorrect ratio calculation for draft model
2023-11-20 04:23:34 +01:00
Veden
f960fac8ff Fix incorrect ratio calculation for draft model 2023-11-19 13:12:53 -08:00