exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-04-28 10:11:37 +00:00

Author	SHA1	Message	Date
turboderp	fb5000ac62	Don't compile AVX2 functions when building without AVX2 support	2024-12-29 19:05:54 +01:00
turboderp	82bb648517	Fix Granite3 logit scaling	2024-12-27 19:54:19 +01:00
turboderp	bee449d116	Support Granite 3.x arch	2024-12-27 19:11:21 +01:00
turboderp	ab4d9e15eb	Chat example Granite3 template	2024-12-27 18:32:46 +01:00
turboderp	ebfefc4bed	Support Cohere2 architecture	2024-12-25 20:14:45 +01:00
turboderp	d815f5f9e1	Fix RoPE alpha after refactor in #4d25874	2024-12-25 18:09:11 +01:00
nintwentydo	b2dd5a7e06	Modify handling for Pixtral Large model params (#701) * Modify handling for Pixtral Large model params. * Fix multimodal_projector_bias to default to True if not in model config.json	2024-12-21 19:58:41 +01:00
turboderp	cf7fcd18d2	Fix chat example system prompt	2024-12-18 07:52:09 +01:00
turboderp	f76bc8537a	Read number of vision tower layers from config for Pixtral (fix Pixtral-Large)	2024-12-18 01:29:20 +01:00
turboderp	4061c24373	Qwen2-VL: Basic video support	2024-12-15 23:32:41 +01:00
turboderp	c78d9027aa	Fix ChatML template in multimodal example	2024-12-15 21:38:40 +01:00
turboderp	9934f06442	Refactoring	2024-12-15 21:37:29 +01:00
turboderp	edf1a3575a	Util function to get byte size of MM embeddings object	2024-12-09 23:22:42 +01:00
turboderp	254e76b178	Merge remote-tracking branch 'origin/dev' into dev	2024-12-09 20:15:23 +01:00
turboderp	8bb283d319	Cleanup build actions	2024-12-09 20:14:35 +01:00
turboderp	f4119aec5b	Fix background filter eval when draft model used	2024-12-09 20:12:50 +01:00
DocShotgun	af69ce9458	Prevent UnboundLocalError when loading with yarn/su with short ctx len (#694) * scaling_factor is left unbound when the requested max_seq_len < the model's original unscaled max_seq_len	2024-12-08 21:24:45 +01:00
turboderp	4f83f52d7d	Merge branch 'refs/heads/dev' v0.2.6	2024-12-07 15:56:16 +01:00
turboderp	15b5df784a	Cleanup build actions	2024-12-07 15:55:53 +01:00
turboderp	ebaf819bc0	Merge remote-tracking branch 'origin/dev' into dev	2024-12-07 15:55:33 +01:00
turboderp	83a57c74ed	Bump to v0.2.6	2024-12-07 15:55:11 +01:00
turboderp	ba9774f1c8	Enable noise tokens for Qwen2-VL quantizatino	2024-12-07 15:53:52 +01:00
turboderp	c55656cc0c	Fix system RAM consumption while quantizing, fixes #692	2024-12-05 21:16:36 +01:00
turboderp	c86f62c3b8	Ensure MRoPE ID tensor is contiguous	2024-12-05 18:02:02 +01:00
Philipp Emanuel Weidmann	db78601226	Prevent NPE in `deallocate_pages` (#688) Prevent NPE in `deallocate_pages` If `deallocate_pages` is called on a job for which `allocate_pages` has not been called (see `iterate_start_jobs` for conditions under which this is true), `allocated_pages` is `None`, raising a NPE when attempting to iterate. In particular, this prevents `clear_queue` from working. In practice, this problem readily occurs when starting a few jobs and then calling `clear_queue`.	2024-12-01 22:02:32 +01:00
turboderp	663eea1b53	Fix 64-bit dtype for MSVC	2024-12-01 20:09:40 +01:00
turboderp	bc7db9395d	Merge remote-tracking branch 'origin/master' v0.2.5	2024-12-01 14:29:44 +01:00
turboderp	e3b5549e0b	Bump to v0.2.5	2024-12-01 14:21:59 +01:00
turboderp	fa7e89c197	Update example	2024-12-01 14:20:33 +01:00
turboderp	48e6306193	Update chat example, prompt formats	2024-11-30 13:31:35 +01:00
turboderp	1f685bd8d3	Update grounding demo	2024-11-23 14:46:51 +01:00
turboderp	638cf3015f	Add Qwen2-VL grounding demo	2024-11-23 12:19:02 +01:00
turboderp	bfa4b4f043	Don't clamp FP32 residual during quantization	2024-11-22 09:30:36 +01:00
turboderp	142190e1f8	DRY: Avoid out-of-bounds error when computing penalty for sequence with image tokens	2024-11-22 02:13:26 +01:00
turboderp	9cacd66229	Fix MRoPE model inference when no MM embeddings present	2024-11-20 05:49:03 +01:00
turboderp	69bb9d6cff	Add optional noise embeddings during quantization	2024-11-20 05:48:22 +01:00
turboderp	5857ea9846	MLP: Fix temp state size calculation (for Qwen2-VL-72B mmp)	2024-11-18 16:31:57 +01:00
turboderp	c16aa9b3eb	Update multimodal example	2024-11-18 07:51:41 +01:00
turboderp	c81603c441	Update multimodal example	2024-11-18 07:16:48 +01:00
turboderp	6aab7064e2	Support MRoPE (dynamic gen prompt ingest only)	2024-11-18 05:33:11 +01:00
turboderp	b1e786cee3	Fix regression	2024-11-18 04:25:42 +01:00
turboderp	d6177d568f	Add grid def etc. MM embedding, keep embeddings in system RAM by default	2024-11-18 03:31:50 +01:00
turboderp	4d258742ed	Refactor RoPE initialization	2024-11-18 03:24:26 +01:00
turboderp	be3eeb403d	Add Qwen2-VL arch definition, preprocessor and vision tower	2024-11-16 11:36:42 +01:00
turboderp	2ac584cb24	MLP: Support quick_gelu in Torch fwd	2024-11-16 11:36:42 +01:00
turboderp	70bdb0969a	MLP: Group input states	2024-11-16 11:36:42 +01:00
turboderp	6405225582	Refactoring	2024-11-16 11:36:42 +01:00
turboderp	5129f96231	Support Conv3d	2024-11-16 11:26:40 +01:00
turboderp	1776223296	Pixtral: Fix hardcoded device ID	2024-11-16 11:23:27 +01:00
turboderp	576303a152	Fix unload for Conv2D	2024-11-16 04:46:27 +01:00

1459 Commits