exllamav2

Mirror of https://github.com/turboderp-org/exllamav2.git, last synced 2026-04-20 14:29:28 +00:00

Author	SHA1	Message	Date
TerminalMan	d92ff8d9e4	improve installation experience (#666)	2024-11-02 21:11:14 +01:00
Brian Dashore	84b1f9017d	Torch 2.5 (#659) * Actions: Add helpful comments (useful for updating dependencies when building) * Actions: Add torch 2.5 builds; Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-02 21:09:23 +01:00
turboderp	9cd077dc96	Fix regression	2024-10-20 21:25:05 +02:00
turboderp	a8d8a41dc4	Add multimodal experiment	2024-10-20 21:21:21 +02:00
turboderp	0347b062bf	Add indexed embeddings support to dynamic gen	2024-10-20 21:21:07 +02:00
turboderp	1f35150e81	Fix thread-local device setup in safetensors loader, fix for #647	2024-10-15 20:42:18 +02:00
Valeriy Selitskiy	e55c4ad283	feat: try to create `out_dir` if it does not exist (#654)	2024-10-15 19:30:33 +02:00
turboderp	a40c07a333	Update Formatron example (supports conlist since 0.4.6)	2024-10-15 19:28:00 +02:00
turboderp	acccc930cc	Don't yield thread early for background filter evaluation (benchmarks slightly faster in some cases)	2024-10-03 00:00:45 +02:00
turboderp	7bacab2a55	Rename JSON example	2024-10-02 23:59:53 +02:00
turboderp	ed6dc9b7b3	Add some debug functions	2024-10-02 23:58:33 +02:00
turboderp	b651f4abab	Add XTC and DRY options to chatbot example.	2024-10-02 00:01:49 +02:00
turboderp	2616fd74d0	Add Formatron example	2024-09-30 00:41:51 +02:00
turboderp	22cbff66cf	Add logit masking mode to filters	2024-09-30 00:35:15 +02:00
turboderp	1b580ce15f	Sampling: Fix inefficient top-K when most probs are zero	2024-09-30 00:32:59 +02:00
turboderp	03b2d551b2	Bump to v0.2.3 (tag: v0.2.3)	2024-09-29 13:00:18 +02:00
turboderp	cad7848375	HumanEval: Rename new args to match other scripts	2024-09-29 12:57:06 +02:00
turboderp	ef7cdda31c	Merge remote-tracking branch 'refs/remotes/LlamaEnjoyer/add_more_args_to_humaneval' into dev	2024-09-29 12:52:52 +02:00
turboderp	5d4359317d	Add YaRN factor override to model_init	2024-09-29 12:35:45 +02:00
turboderp	c84f5979c8	Merge branch 'refs/heads/dev-yarn' into dev	2024-09-29 12:21:22 +02:00
turboderp	f1adff9472	Fix multi-token character decoding for Qwen2 (legacy gen)	2024-09-29 00:15:05 +02:00
turboderp	431479207f	Fix multi-token character decoding for Qwen2	2024-09-28 23:47:07 +02:00
turboderp	be3de0fa85	Add some code for evaluating FPx (not enabled)	2024-09-28 16:07:39 +02:00
turboderp	d393bfe4a7	Merge remote-tracking branch 'origin/dev' into dev	2024-09-28 16:05:01 +02:00
Downtown-Case	6b73184d4f	Use specified max context.py Instead of original_max_position_embeddings. This appears to be what transformers intended, and does not update dynamically with sequence leng there.	2024-09-27 23:31:53 -04:00
Downtown-Case	8dca1abf44	Only trigger if long context config is set	2024-09-27 19:44:49 -04:00
Downtown-Case	b1955039c6	Pesky space.py	2024-09-27 19:06:05 -04:00
Downtown-Case	aff1e5a547	Add YaRN	2024-09-27 18:57:15 -04:00
Downtown-Case	0d78f034b1	Add YaRN	2024-09-27 18:53:22 -04:00
Llama Enjoyer	b2af0bbad3	Remove stray import.	2024-09-24 17:32:09 +02:00
Llama Enjoyer	3a389131de	Add more arguments to accept values passed via the cmd line.	2024-09-24 17:28:02 +02:00
Llama Enjoyer	e960dfd68d	Fix the temperature argument to accept values passed via the cmd line.	2024-09-24 17:18:08 +02:00
Sinan	7c7b1993b4	Added draft token count as parameter to chat.py (#635)	2024-09-24 11:16:30 +02:00
turboderp	8361f3f4a0	Add missing cp310+cu118 torch 2.4 windows wheel	2024-09-23 17:38:53 +02:00
turboderp	15e54046ba	More stream gymnastics	2024-09-23 17:28:55 +02:00
turboderp	a5132d072e	Add XTC sampler	2024-09-22 23:09:19 +02:00
turboderp	6d7b2e8e7a	Revert snapshot interval	2024-09-22 19:11:32 +02:00
turboderp	43a0be35df	Make measurement less sensitive to very sparse inf values in reference fwd pass	2024-09-22 19:01:29 +02:00
turboderp	a17f6665cb	Fix streams in quantizer	2024-09-22 18:15:05 +02:00
turboderp	9946f45f1c	Force tensor loading onto priority stream	2024-09-20 22:05:20 +02:00
turboderp	e155e0a5b0	Fix loading in new thread	2024-09-18 19:46:56 +02:00
turboderp	c4a03e09f5	Tokenizer: Give priority to tokenizer.json instead of tokenizer.model	2024-09-18 00:41:43 +02:00
turboderp	12bceb9f4b	Cleanup	2024-09-18 00:32:00 +02:00
turboderp	0695f3a854	Fix potential bug in filter evaluation	2024-09-17 00:34:28 +02:00
turboderp	8a25e0f2b3	Merge branch 'refs/heads/master' into dev	2024-09-17 00:33:36 +02:00
turboderp	b25210778c	Remove fasttensors, add platform-agnostic multithreaded ST loader	2024-09-17 00:33:16 +02:00
turboderp	144c576bdb	Fix bottlenecks in quantized tensor loading	2024-09-17 00:00:27 +02:00
turboderp	10a8842b25	Fix JSON inference example	2024-09-14 21:35:02 +02:00
turboderp	b2c7cf280c	Add cp310 cu121 torch2.4 Windows wheel (tag: v0.2.2)	2024-09-14 21:17:52 +02:00
turboderp	46eff43403	Merge branch 'refs/heads/dev'	2024-09-14 21:13:46 +02:00

1290 Commits