1459 Commits

Author SHA1 Message Date
turboderp
fb5000ac62 Don't compile AVX2 functions when building without AVX2 support 2024-12-29 19:05:54 +01:00
turboderp
82bb648517 Fix Granite3 logit scaling 2024-12-27 19:54:19 +01:00
turboderp
bee449d116 Support Granite 3.x arch 2024-12-27 19:11:21 +01:00
turboderp
ab4d9e15eb Chat example Granite3 template 2024-12-27 18:32:46 +01:00
turboderp
ebfefc4bed Support Cohere2 architecture 2024-12-25 20:14:45 +01:00
turboderp
d815f5f9e1 Fix RoPE alpha after refactor in #4d25874 2024-12-25 18:09:11 +01:00
nintwentydo
b2dd5a7e06 Modify handling for Pixtral Large model params (#701)
* Modify handling for Pixtral Large model params.

* Fix multimodal_projector_bias to default to True if not in model config.json
2024-12-21 19:58:41 +01:00
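
Note on b2dd5a7e06: the second bullet describes a config lookup that falls back to `True` when the key is missing. A minimal sketch of that behaviour follows, assuming the config is plain JSON. The helper name `read_projector_bias` is hypothetical and is not taken from the repository.

```python
import json


def read_projector_bias(config_text: str) -> bool:
    """Return multimodal_projector_bias, defaulting to True when the key is absent.

    Hypothetical helper for illustration only; the real loader may differ.
    """
    config = json.loads(config_text)
    # dict.get returns the fallback only when the key is missing,
    # so an explicit `false` in config.json is still respected.
    return config.get("multimodal_projector_bias", True)


if __name__ == "__main__":
    print(read_projector_bias("{}"))
    print(read_projector_bias('{"multimodal_projector_bias": false}'))
```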
turboderp
cf7fcd18d2 Fix chat example system prompt 2024-12-18 07:52:09 +01:00
turboderp
f76bc8537a Read number of vision tower layers from config for Pixtral (fix Pixtral-Large) 2024-12-18 01:29:20 +01:00
turboderp
4061c24373 Qwen2-VL: Basic video support 2024-12-15 23:32:41 +01:00
turboderp
c78d9027aa Fix ChatML template in multimodal example 2024-12-15 21:38:40 +01:00
turboderp
9934f06442 Refactoring 2024-12-15 21:37:29 +01:00
turboderp
edf1a3575a Util function to get byte size of MM embeddings object 2024-12-09 23:22:42 +01:00
turboderp
254e76b178 Merge remote-tracking branch 'origin/dev' into dev 2024-12-09 20:15:23 +01:00
turboderp
8bb283d319 Cleanup build actions 2024-12-09 20:14:35 +01:00
turboderp
f4119aec5b Fix background filter eval when draft model used 2024-12-09 20:12:50 +01:00
DocShotgun
af69ce9458 Prevent UnboundLocalError when loading with yarn/su with short ctx len (#694)
* scaling_factor is left unbound when the requested max_seq_len < the model's original unscaled max_seq_len
2024-12-08 21:24:45 +01:00
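
Note on af69ce9458: the failure mode is a local variable that is assigned in only one branch and then read unconditionally. The sketch below reproduces that pattern in isolation. The function names, the ratio formula, and the fallback value of 1.0 are assumptions made for illustration; they are not the repository's actual yarn/su scaling code.

```python
def buggy_scaling_factor(max_seq_len: int, original_max_seq_len: int) -> float:
    # scaling_factor is assigned only when the context is extended ...
    if max_seq_len > original_max_seq_len:
        scaling_factor = max_seq_len / original_max_seq_len  # placeholder formula
    # ... so a shorter requested context reaches this line with no binding
    return scaling_factor


def fixed_scaling_factor(max_seq_len: int, original_max_seq_len: int) -> float:
    # Bind a default first (1.0 is an assumed "no scaling" value)
    scaling_factor = 1.0
    if max_seq_len > original_max_seq_len:
        scaling_factor = max_seq_len / original_max_seq_len  # placeholder formula
    return scaling_factor


if __name__ == "__main__":
    try:
        buggy_scaling_factor(2048, 4096)
    except UnboundLocalError as exc:
        print("buggy version failed:", type(exc).__name__)
    print("fixed version:", fixed_scaling_factor(2048, 4096))
```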
turboderp
4f83f52d7d Merge branch 'refs/heads/dev' v0.2.6 2024-12-07 15:56:16 +01:00
turboderp
15b5df784a Cleanup build actions 2024-12-07 15:55:53 +01:00
turboderp
ebaf819bc0 Merge remote-tracking branch 'origin/dev' into dev 2024-12-07 15:55:33 +01:00
turboderp
83a57c74ed Bump to v0.2.6 2024-12-07 15:55:11 +01:00
turboderp
ba9774f1c8 Enable noise tokens for Qwen2-VL quantization 2024-12-07 15:53:52 +01:00
turboderp
c55656cc0c Fix system RAM consumption while quantizing, fixes #692 2024-12-05 21:16:36 +01:00
turboderp
c86f62c3b8 Ensure MRoPE ID tensor is contiguous 2024-12-05 18:02:02 +01:00
Philipp Emanuel Weidmann
db78601226 Prevent NPE in deallocate_pages (#688)
Prevent NPE in `deallocate_pages`

If `deallocate_pages` is called on a job for which `allocate_pages`
has not been called (see `iterate_start_jobs` for conditions under
which this is true), `allocated_pages` is `None`, raising an NPE
when attempting to iterate.

In particular, this prevents `clear_queue` from working. In
practice, this problem readily occurs when starting a few jobs
and then calling `clear_queue`.
2024-12-01 22:02:32 +01:00
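
Note on db78601226: in Python, iterating over `None` surfaces as `TypeError: 'NoneType' object is not iterable`, which is the error the message calls an NPE. The sketch below shows the guard pattern on a stand-in class. `Job` here is a simplified hypothetical, and the `free_list` parameter is an assumption; only the names `allocate_pages`, `deallocate_pages`, and `allocated_pages` come from the commit message.

```python
class Job:
    """Simplified stand-in for a generator job; not the repository's class."""

    def __init__(self):
        # Stays None until allocate_pages has been called
        self.allocated_pages = None

    def allocate_pages(self, pages):
        self.allocated_pages = list(pages)

    def deallocate_pages(self, free_list):
        # Guard: a job that never started has nothing to release
        if self.allocated_pages is None:
            return
        for page in self.allocated_pages:
            free_list.append(page)
        self.allocated_pages = None


if __name__ == "__main__":
    free = []
    queued = Job()
    queued.deallocate_pages(free)   # no-op instead of a TypeError
    started = Job()
    started.allocate_pages([3, 7])
    started.deallocate_pages(free)
    print(free)
```

With the guard in place, clearing a queue that mixes started and not-yet-started jobs can call `deallocate_pages` on every job without special-casing.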
turboderp
663eea1b53 Fix 64-bit dtype for MSVC 2024-12-01 20:09:40 +01:00
turboderp
bc7db9395d Merge remote-tracking branch 'origin/master' v0.2.5 2024-12-01 14:29:44 +01:00
turboderp
e3b5549e0b Bump to v0.2.5 2024-12-01 14:21:59 +01:00
turboderp
fa7e89c197 Update example 2024-12-01 14:20:33 +01:00
turboderp
48e6306193 Update chat example, prompt formats 2024-11-30 13:31:35 +01:00
turboderp
1f685bd8d3 Update grounding demo 2024-11-23 14:46:51 +01:00
turboderp
638cf3015f Add Qwen2-VL grounding demo 2024-11-23 12:19:02 +01:00
turboderp
bfa4b4f043 Don't clamp FP32 residual during quantization 2024-11-22 09:30:36 +01:00
turboderp
142190e1f8 DRY: Avoid out-of-bounds error when computing penalty for sequence with image tokens 2024-11-22 02:13:26 +01:00
turboderp
9cacd66229 Fix MRoPE model inference when no MM embeddings present 2024-11-20 05:49:03 +01:00
turboderp
69bb9d6cff Add optional noise embeddings during quantization 2024-11-20 05:48:22 +01:00
turboderp
5857ea9846 MLP: Fix temp state size calculation (for Qwen2-VL-72B mmp) 2024-11-18 16:31:57 +01:00
turboderp
c16aa9b3eb Update multimodal example 2024-11-18 07:51:41 +01:00
turboderp
c81603c441 Update multimodal example 2024-11-18 07:16:48 +01:00
turboderp
6aab7064e2 Support MRoPE (dynamic gen prompt ingest only) 2024-11-18 05:33:11 +01:00
turboderp
b1e786cee3 Fix regression 2024-11-18 04:25:42 +01:00
turboderp
d6177d568f Add grid def etc. MM embedding, keep embeddings in system RAM by default 2024-11-18 03:31:50 +01:00
turboderp
4d258742ed Refactor RoPE initialization 2024-11-18 03:24:26 +01:00
turboderp
be3eeb403d Add Qwen2-VL arch definition, preprocessor and vision tower 2024-11-16 11:36:42 +01:00
turboderp
2ac584cb24 MLP: Support quick_gelu in Torch fwd 2024-11-16 11:36:42 +01:00
turboderp
70bdb0969a MLP: Group input states 2024-11-16 11:36:42 +01:00
turboderp
6405225582 Refactoring 2024-11-16 11:36:42 +01:00
turboderp
5129f96231 Support Conv3d 2024-11-16 11:26:40 +01:00
turboderp
1776223296 Pixtral: Fix hardcoded device ID 2024-11-16 11:23:27 +01:00
turboderp
576303a152 Fix unload for Conv2D 2024-11-16 04:46:27 +01:00