exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-04-25 00:39:05 +00:00

Author	SHA1	Message	Date
turboderp	4b5dbecdc1	Allow key prefix for lm_head (Gemma3)	2025-03-15 00:06:56 +01:00
turboderp	4844f3873c	Upcast MM embeddings when residual is FP32	2025-03-15 00:06:56 +01:00
turboderp	fe51a8f4b5	Correctly include Q/K norms when compiling model	2025-03-15 00:06:56 +01:00
turboderp	38f4d7c87d	Allow loading transposed unquantized linear layer	2025-03-15 00:06:56 +01:00
turboderp	9669fa33c9	Allow component models to use learned pos embeddings without regarding LLM max_seq_len	2025-03-15 00:06:56 +01:00
turboderp	7b05acd233	Allow per-layer RoPE theta	2025-03-15 00:06:56 +01:00
turboderp	23395dfa42	Fix FP32 residual for paged attn	2025-03-14 23:09:31 +01:00
Thomas	eaf8ad1041	Update chat.py, include multi-line input support and context clearing through input (#738) * Update chat.py, include multi-line input support and context clearing - Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows) - Allow context clearing outside of amnesia by inputting "clear" * Adding qwq chat mode, adding the ability to forget thinking context	2025-03-10 15:28:33 +01:00
turboderp	d8fa1a8250	Support partial_rotary_factor (Phi-4 mini)	2025-02-28 08:51:11 +01:00
turboderp	2e630aefdd	Fix alt pos embeddings and block diagonal mask when flash-attn is disabled	2025-02-13 22:13:48 +01:00
turboderp	1a80d38891	Update build actions	2025-02-08 03:29:05 +01:00
turboderp	f1c4126045	Update build actions	2025-02-08 03:14:32 +01:00
turboderp	f98a7b7099	Update build actions	2025-02-08 03:08:45 +01:00
turboderp	096076b3fd	Update build actions	2025-02-08 02:49:21 +01:00
turboderp	0f4a9f0042	Update build actions	2025-02-08 01:36:51 +01:00
turboderp	f3de3cbd34	Update build actions	2025-02-08 01:05:22 +01:00
turboderp	94e57904bc	Update build actions (tag: v0.2.8)	2025-02-08 00:57:29 +01:00
turboderp	3a9618d471	Update build actions	2025-02-08 00:44:44 +01:00
turboderp	3486f9eb71	Merge branch 'refs/heads/dev'	2025-02-08 00:26:52 +01:00
turboderp	6e4a84a1e3	Bump to 0.2.8	2025-02-08 00:26:30 +01:00
turboderp	d05fbcc854	Fix Pixtral regression	2025-02-04 21:01:23 +01:00
turboderp	96b2f9df77	Add Qwen2.5 mode to grounding demo	2025-01-29 22:41:36 +01:00
turboderp	cce6f95cd3	Initial support for Qwen2.5-VL	2025-01-29 03:03:36 +01:00
turboderp	d0413b06f8	Check length of gpu_split in model_init	2025-01-09 11:36:25 +01:00
turboderp	c8fa853c89	Test script: Allow --eval_rows in wiki2 ppl test	2025-01-09 11:14:48 +01:00
turboderp	318435db81	Sampler: Remove superfluous pre-sort pass	2025-01-09 11:14:19 +01:00
turboderp	d302fa3d37	Optimizer: Ensure weight budget is fully used up	2025-01-09 11:14:03 +01:00
turboderp	b400394f06	Update build actions	2025-01-09 11:13:03 +01:00
turboderp	b9c025b4b1	Enable large runner	2024-12-30 05:47:11 +01:00
turboderp	c41acd5c11	Extra ROCm 6.2 actions	2024-12-30 04:31:44 +01:00
turboderp	7c08c6df71	Deactivate mamba	2024-12-30 04:07:50 +01:00
turboderp	c8075cabf4	Update conda-incubator	2024-12-30 04:00:15 +01:00
turboderp	ae241a9af5	Fix video example (tag: v0.2.7)	2024-12-30 02:24:49 +01:00
turboderp	1ef618389b	Bump to v0.2.7	2024-12-30 02:19:19 +01:00
turboderp	b010cb950f	Fix compilation errors on aarch64	2024-12-29 20:30:59 +01:00
turboderp	fb5000ac62	Don't compile AVX2 functions when building without AVX2 support	2024-12-29 19:05:54 +01:00
turboderp	82bb648517	Fix Granite3 logit scaling	2024-12-27 19:54:19 +01:00
turboderp	bee449d116	Support Granite 3.x arch	2024-12-27 19:11:21 +01:00
turboderp	ab4d9e15eb	Chat example Granite3 template	2024-12-27 18:32:46 +01:00
turboderp	ebfefc4bed	Support Cohere2 architecture	2024-12-25 20:14:45 +01:00
turboderp	d815f5f9e1	Fix RoPE alpha after refactor in #4d25874	2024-12-25 18:09:11 +01:00
nintwentydo	b2dd5a7e06	Modify handling for Pixtral Large model params (#701) * Modify handling for Pixtral Large model params. * Fix multimodal_projector_bias to default to True if not in model config.json	2024-12-21 19:58:41 +01:00
turboderp	cf7fcd18d2	Fix chat example system prompt	2024-12-18 07:52:09 +01:00
turboderp	f76bc8537a	Read number of vision tower layers from config for Pixtral (fix Pixtral-Large)	2024-12-18 01:29:20 +01:00
turboderp	4061c24373	Qwen2-VL: Basic video support	2024-12-15 23:32:41 +01:00
turboderp	c78d9027aa	Fix ChatML template in multimodal example	2024-12-15 21:38:40 +01:00
turboderp	9934f06442	Refactoring	2024-12-15 21:37:29 +01:00
turboderp	edf1a3575a	Util function to get byte size of MM embeddings object	2024-12-09 23:22:42 +01:00
turboderp	254e76b178	Merge remote-tracking branch 'origin/dev' into dev	2024-12-09 20:15:23 +01:00
turboderp	8bb283d319	Cleanup build actions	2024-12-09 20:14:35 +01:00
1444 Commits