Commit Graph

1456 Commits

Author SHA1 Message Date
turboderp
9961dbdcaf Bump to 0.2.4 (tag: v0.2.4) 2024-11-12 04:13:33 +01:00
turboderp
2a888dbd47 Pixtral example 2024-11-12 03:46:29 +01:00
turboderp
16cd5ef384 Generator: Make sampler settings optional instead of default arg 2024-11-12 03:41:59 +01:00
turboderp
90895967b1 Fix quantization for Pixtral, copy vision tower tensors to quantized model 2024-11-10 16:22:57 +01:00
turboderp
d37cf7e764 Fix regressions 2024-11-10 13:38:21 +01:00
turboderp
b28300c0db Pixtral: Refactor vision model, update example 2024-11-10 12:34:42 +01:00
turboderp
7c876ef091 Update Pixtral experiment 2024-11-10 11:17:21 +01:00
turboderp
193a6b2b36 Pixtral: Add vision tower and preprocessor 2024-11-10 11:15:06 +01:00
turboderp
9504b515f7 Formatting 2024-11-10 11:13:49 +01:00
turboderp
a2f0f87713 Pixtral: Load vision tower and preprocessor config 2024-11-10 10:42:08 +01:00
turboderp
26406f9360 Make attn keys mappable, switch attn/MLP shapes for vision model 2024-11-10 10:40:58 +01:00
turboderp
79ca8fb65b Add alt. RoPE sin/cos table as attn parameter, and non-causal option 2024-11-10 10:35:49 +01:00
turboderp
c5a21bccb7 Add Conv2D module 2024-11-10 10:32:08 +01:00
turboderp
525b3204e0 Fix PIL dependency, skip version check in preprocessor 2024-11-10 10:31:21 +01:00
turboderp
4e6783f97b Pixtral arch definition, load projector from quantized model 2024-11-04 00:47:39 +01:00
turboderp
6cdfa5e52f Refactor architecture logic for code reuse between LLM/VLM 2024-11-03 22:34:25 +01:00
TerminalMan
d92ff8d9e4 improve installation experience (#666) 2024-11-02 21:11:14 +01:00
Brian Dashore
84b1f9017d Torch 2.5 (#659)
* Actions: Add helpful comments

Useful for updating dependencies when building.

Signed-off-by: kingbri <bdashore3@proton.me>

* Actions: Add torch 2.5 builds

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-02 21:09:23 +01:00
turboderp
9cd077dc96 Fix regression 2024-10-20 21:25:05 +02:00
turboderp
a8d8a41dc4 Add multimodal experiment 2024-10-20 21:21:21 +02:00
turboderp
0347b062bf Add indexed embeddings support to dynamic gen 2024-10-20 21:21:07 +02:00
turboderp
1f35150e81 Fix thread-local device setup in safetensors loader, fix for #647 2024-10-15 20:42:18 +02:00
Valeriy Selitskiy
e55c4ad283 feat: try to create out_dir if it does not exist (#654) 2024-10-15 19:30:33 +02:00
turboderp
a40c07a333 Update Formatron example (supports conlist since 0.4.6) 2024-10-15 19:28:00 +02:00
turboderp
acccc930cc Don't yield thread early for background filter evaluation (benchmarks slightly faster in some cases) 2024-10-03 00:00:45 +02:00
turboderp
7bacab2a55 Rename JSON example 2024-10-02 23:59:53 +02:00
turboderp
ed6dc9b7b3 Add some debug functions 2024-10-02 23:58:33 +02:00
turboderp
b651f4abab Add XTC and DRY options to chatbot example. 2024-10-02 00:01:49 +02:00
turboderp
2616fd74d0 Add Formatron example 2024-09-30 00:41:51 +02:00
turboderp
22cbff66cf Add logit masking mode to filters 2024-09-30 00:35:15 +02:00
turboderp
1b580ce15f Sampling: Fix inefficient top-K when most probs are zero 2024-09-30 00:32:59 +02:00
turboderp
03b2d551b2 Bump to v0.2.3 (tag: v0.2.3) 2024-09-29 13:00:18 +02:00
turboderp
cad7848375 HumanEval: Rename new args to match other scripts 2024-09-29 12:57:06 +02:00
turboderp
ef7cdda31c Merge remote-tracking branch 'refs/remotes/LlamaEnjoyer/add_more_args_to_humaneval' into dev 2024-09-29 12:52:52 +02:00
turboderp
5d4359317d Add YaRN factor override to model_init 2024-09-29 12:35:45 +02:00
turboderp
c84f5979c8 Merge branch 'refs/heads/dev-yarn' into dev 2024-09-29 12:21:22 +02:00
turboderp
f1adff9472 Fix multi-token character decoding for Qwen2 (legacy gen) 2024-09-29 00:15:05 +02:00
turboderp
431479207f Fix multi-token character decoding for Qwen2 2024-09-28 23:47:07 +02:00
turboderp
be3de0fa85 Add some code for evaluating FPx (not enabled) 2024-09-28 16:07:39 +02:00
turboderp
d393bfe4a7 Merge remote-tracking branch 'origin/dev' into dev 2024-09-28 16:05:01 +02:00
Downtown-Case
6b73184d4f Use specified max context.py
Instead of original_max_position_embeddings.

This appears to be what transformers intended, and does not update dynamically with sequence length there.
2024-09-27 23:31:53 -04:00
Downtown-Case
8dca1abf44 Only trigger if long context config is set 2024-09-27 19:44:49 -04:00
Downtown-Case
b1955039c6 Pesky space.py 2024-09-27 19:06:05 -04:00
Downtown-Case
aff1e5a547 Add YaRN 2024-09-27 18:57:15 -04:00
Downtown-Case
0d78f034b1 Add YaRN 2024-09-27 18:53:22 -04:00
Llama Enjoyer
b2af0bbad3 Remove stray import. 2024-09-24 17:32:09 +02:00
Llama Enjoyer
3a389131de Add more arguments to accept values passed via the cmd line. 2024-09-24 17:28:02 +02:00
Llama Enjoyer
e960dfd68d Fix the temperature argument to accept values passed via the cmd line. 2024-09-24 17:18:08 +02:00
Sinan
7c7b1993b4 Added draft token count as parameter to chat.py (#635) 2024-09-24 11:16:30 +02:00
turboderp
8361f3f4a0 Add missing cp310+cu118 torch 2.4 windows wheel 2024-09-23 17:38:53 +02:00