tabbyAPI

Mirror of https://github.com/theroyallab/tabbyAPI.git, synced 2026-03-15 00:07:28 +00:00

Author	SHA1	Message	Date
kingbri	d15eb55f20	Model: Fix exl2 cache mode check. FP16 was not included in the validation step. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
kingbri	8996dc7b02	API: Add default for backend in model load request. Should be None so pydantic doesn't complain. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
Brian	b555eeb6e7	Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings. fix: Aligned Parameter Name in chat completions generate_tool_calls	2025-05-11 20:41:58 -04:00
kingbri	f4adca1f3e	API: Remove default fallback from backend param. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-11 09:56:53 -04:00
Brian	3674d7b9b5	Merge pull request #341 from theroyallab/exl3. Exl3	2025-05-10 23:43:02 -04:00
kingbri	6379081dd8	Sampling: Make add_bos_token override concise. Also set the default to None so text completions follows the same pattern. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-10 19:07:35 -04:00
kingbri	656af41b5d	Model: Always enable decode_special_tokens. The frontend should handle the special tokens if they get emitted. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:25:50 -04:00
kingbri	83826b56be	Main: Remove unnecessary import. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:14:11 -04:00
kingbri	42346c6b39	Sampling: Remove skip_special_tokens. This parameter is way too confusing and does not make sense in the modern LLM space. Change approved by all maintainers. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:11:33 -04:00
kingbri	25c77ebf77	Model: Remove exllamav2-specific version check. No longer necessary thanks to the agnostic check. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:08:15 -04:00
kingbri	48ea1737cf	Startup: Check agnostically for inference deps. If an inference dep isn't present, force exit the application. This occurs after all subcommands have been appropriately processed. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 21:59:00 -04:00
kingbri	33ac016023	Dependencies: Add ExllamaV3 v0.0.1. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 21:42:07 -04:00
Brian	f26ca23f1a	Merge pull request #336 from DocShotgun/backend-detect. Automatically select model backend based on config.json	2025-05-09 01:56:44 -04:00
Brian	02a8d68e17	Merge branch 'exl3' into backend-detect	2025-05-08 23:50:33 -04:00
kingbri	d5963007f0	Model: Add backend print. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:45:04 -04:00
kingbri	cfee16905b	Model: Migrate backend detection to a separate function. Seemed out of place in the common load function. In addition, rename the transformers utils signature which actually takes a directory instead of a file. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:42:39 -04:00
Brian	527afc206b	Merge pull request #329 from DocShotgun/exl3. Exllamav3 cache quantization	2025-05-08 23:11:45 -04:00
kingbri	638eef401a	Model: Move cache creation to a common function. Prevents repetitiveness while also creating a Cache class. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:10:03 -04:00
Maximilian Klem	22f7f1e1ec	fix: flipped parameter name with variable name	2025-05-07 21:04:30 +02:00
DocShotgun	f8070e7707	Model: Auto detect model backend from config. Use exllamav3 for exl3 models, exllamav2 otherwise	2025-05-06 18:51:58 -07:00
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
kingbri	bc0a84241a	API: Patch kobold generation call. Calling the model requires different args now. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 22:11:21 -04:00
kingbri	b683545d0e	Config: Fix argparse help. Adding a comma in the description converts the string to a tuple, which isn't parseable by argparse's help. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 21:52:30 -04:00
turboderp	ff38305145	Common: Fix exception f-string	2025-05-05 02:01:16 +02:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
DocShotgun	a635a719d7	Model: Enable draft model q-cache in Exl3. Remove unneeded default fp16 cache layer import	2025-05-03 20:59:36 -07:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
DocShotgun	68a660bdb3	Model: Initial Exl3 cache quantization support	2025-05-03 20:35:35 -07:00
turboderp	036af02bf6	Common: No default add_bos_token value for chat completion requests	2025-05-04 05:25:58 +02:00
turboderp	92ea7ee7cd	Model: Add draft model/speculative decoding	2025-05-04 01:27:42 +02:00
turboderp	1db2cb99cb	Model: Avoid initializing class variables	2025-05-04 01:26:42 +02:00
turboderp	0405a94a89	Model: Cast penalty range to int	2025-05-03 22:28:36 +02:00
turboderp	58c380b8ca	Model: Create generator on load	2025-05-03 18:33:37 +02:00
turboderp	0d949d00b9	Model: Set default max_batch_size	2025-05-03 18:33:37 +02:00
turboderp	8c75b29923	Model: Fix some warnings	2025-05-03 18:33:36 +02:00
kingbri	15cc480cb0	Exl3: Simplify add_bos_token handling. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:50:42 -04:00
randoentity	d8a8ccfc2a	Model: fix add_bos_token	2025-05-02 21:33:25 -04:00
kingbri	0d02af3c81	Model: Set model_dir on init. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	c89bea030e	Model: Add template fetching to Exl3. Use the same functionality as exl2's loader. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	e8f00412f6	Model: Fetch from generation_config and tokenizer_config in Exl3. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	59d081fe83	Common: Add hardware file. Removed from a commit as well. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	eca403a0e4	Model: Add Exllamav3 sampler. File was not included in previous commit. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	bdc5189a4b	Exl3: Add chunk size, cache size, and model info. Use the same algorithm for estimating and adjusting cache size based on multiples of 256 and above max seq len. Same applies for chunk size. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	303e2dde12	Model: Correct exl3 generation, add concurrency, and cleanup. Fixes application of sampler parameters by adding a new sampler builder interface. Also expose the generator class-wide and add wait_for_jobs. Finally, allow inline loading to specify the backend. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
randoentity	c744790f14	fixup: add sampler logs. Also passing sampler to job with this, no idea if this is correct	2025-05-02 21:33:25 -04:00
randoentity	b35c48da37	fixup: some metrics	2025-05-02 21:33:25 -04:00
randoentity	c0f268f33e	fixup: autosplit, start work on metrics	2025-05-02 21:33:25 -04:00
randoentity	306fc7cd15	fixup: autosplit reserve. This probably breaks v2 support	2025-05-02 21:33:25 -04:00
randoentity	acb3adb953	fixup: auto split	2025-05-02 21:33:25 -04:00
randoentity	14fb573371	fixup: max_seq_len. Whoops	2025-05-02 21:33:25 -04:00
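
Commits f26ca23f1a and f8070e7707 describe selecting the backend from a model's config.json: exllamav3 for exl3 models, exllamav2 otherwise. The log does not say which config field marks an exl3 model, so this sketch assumes it is `quantization_config.quant_method == "exl3"`. The names `backend_from_config` and `detect_backend` are hypothetical, and this is not tabbyAPI's actual code.

```python
import json
from pathlib import Path


def backend_from_config(config: dict) -> str:
    """Choose a backend from an already-parsed config.json dict."""
    # Assumption: exl3 quants advertise quant_method "exl3" under
    # quantization_config; anything else falls back to exllamav2.
    quant_config = config.get("quantization_config") or {}
    if quant_config.get("quant_method") == "exl3":
        return "exllamav3"
    return "exllamav2"


def detect_backend(model_dir: str) -> str:
    """Read <model_dir>/config.json and choose a backend from it."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return backend_from_config(json.load(f))


print(backend_from_config({"quantization_config": {"quant_method": "exl3"}}))  # exllamav3
print(backend_from_config({"model_type": "llama"}))  # exllamav2
```

Keeping the decision in a pure function over the parsed dict makes the rule testable without model files on disk; `detect_backend` only adds the file read.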
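
Commit b683545d0e fixes a common Python pitfall: a comma between parenthesized string literals builds a tuple instead of one implicitly concatenated string, and argparse's help formatter expects a string. A minimal sketch of the failure, using a hypothetical `--cache-mode` option and placeholder help text rather than tabbyAPI's real config definitions:

```python
import argparse

# Fix: adjacent string literals are concatenated into a single str.
HELP_GOOD = (
    "First part of the help text, "
    "second part of the help text."
)
# Bug: a comma between the literals builds a 2-tuple instead.
HELP_BAD = (
    "First part of the help text, ",
    "second part of the help text.",
)


def help_renders(help_text) -> bool:
    """Return True if argparse can render --help for an option using help_text."""
    parser = argparse.ArgumentParser(prog="example")
    # Hypothetical option, used only to illustrate the pitfall.
    parser.add_argument("--cache-mode", help=help_text)
    try:
        parser.format_help()
    except (TypeError, AttributeError):
        # The exact exception depends on the Python version's HelpFormatter.
        return False
    return True


print(type(HELP_GOOD).__name__, help_renders(HELP_GOOD))  # str True
print(type(HELP_BAD).__name__, help_renders(HELP_BAD))    # tuple False
```

`add_argument` accepts the tuple without complaint; the error only appears when help is rendered, which is why this kind of bug can go unnoticed until someone runs `--help`.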
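
Commit bdc5189a4b says Exl3 reuses the exl2 rule for cache size: keep it at or above `max_seq_len` and align it to a multiple of 256. A minimal sketch of that rule as described, assuming the value is rounded up (rounding down could drop it below `max_seq_len`). `adjust_cache_size` is a hypothetical helper, not the function in tabbyAPI.

```python
def adjust_cache_size(cache_size: int, max_seq_len: int, multiple: int = 256) -> int:
    """Hypothetical helper: raise cache_size to at least max_seq_len,
    then round it up to the next multiple of `multiple` (256 by default)."""
    # The cache must hold at least one full-length sequence.
    size = max(cache_size, max_seq_len)
    # Round up (never down) so the result stays >= max_seq_len.
    remainder = size % multiple
    if remainder:
        size += multiple - remainder
    return size


print(adjust_cache_size(4000, 4096))  # 4096 (raised to max_seq_len)
print(adjust_cache_size(5000, 4096))  # 5120 (rounded up to a multiple of 256)
print(adjust_cache_size(8192, 4096))  # 8192 (already valid)
```

The commit states that chunk size follows the same approach but does not spell out whether the `max_seq_len` floor also applies there, so this sketch covers cache size only.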

990 Commits