894 Commits

Author SHA1 Message Date
turboderp
4e587cd19b Actions: Fix typo 2026-04-27 21:17:32 +02:00
turboderp
5975a65cb1 Actions: Add retry for downloading windows dependencies (tag: v0.0.31) 2026-04-27 20:40:37 +02:00
turboderp
aa86876907 Bump to v0.0.31 2026-04-27 20:38:03 +02:00
turboderp
f262b0851b Qwen3-VL: Accept "qwen3_5_vision" vision model type 2026-04-27 20:37:35 +02:00
turboderp
76a178ebc1 RYS: Add RYS relayering 2026-04-26 23:39:26 +02:00
turboderp
57031e1f1b chat.py: Fix colors 2026-04-26 23:04:40 +02:00
turboderp
602cc7447f Merge branch 'fork/mayakwd/feature/lora' into dev 2026-04-25 21:16:59 +02:00
turboderp
dc8de61cdd Bit of cleanup 2026-04-25 20:40:27 +02:00
turboderp
5416d5acc9 spec_decode eval: Allow overriding default num_draft_tokens 2026-04-25 04:00:18 +02:00
turboderp
001c952b71 DFlash: Fix incorrect input layer dim 2026-04-25 03:38:41 +02:00
turboderp
67ddb6b4ec Add spec_decode eval 2026-04-25 03:20:48 +02:00
turboderp
036cfecec7 Update branch_decode example 2026-04-25 03:17:37 +02:00
turboderp
8de991aa3b DFlash: Enable SWA 2026-04-24 03:00:23 +02:00
turboderp
cfe137b308 GatedDeltaNet: Fix rewind logic 2026-04-24 02:55:00 +02:00
turboderp
14a2cbff1d Bit of cleanup 2026-04-24 01:01:42 +02:00
turboderp
ee1a45fb0a Add DFlash 2026-04-24 01:01:25 +02:00
turboderp
dcd7cf6310 Add paged KV update kernel 2026-04-24 00:57:16 +02:00
turboderp
875dbc6a9f Update generator example 2026-04-24 00:07:47 +02:00
turboderp
cb50b9fa6a afmoe: Assert topk_groups=1 and use dots router 2026-04-23 21:05:21 +02:00
turboderp
a909138d1c Merge remote-tracking branch 'origin/dev' into dev 2026-04-23 20:03:21 +02:00
turboderp
34bbeca42b Merge pull request #198 from AlpinDale/feat/afmoe-support
feat: add support for `AfmoeForCausalLM` family of models
2026-04-23 20:03:10 +02:00
turboderp
d602ede0eb chat.py: Optional sysprompt in chatml template 2026-04-23 19:27:25 +02:00
turboderp
f5ab2bcf4e RoPE: Fix bad optional access in kernel launch 2026-04-23 19:25:19 +02:00
AlpinDale
e7511e1df8 feat: add support for AfmoeForCausalLM family of models 2026-04-23 21:00:41 +04:30
turboderp
1d90dfb64e Architectures: Make sure layer_idx is set and propagated for all top-level modules 2026-04-22 14:15:21 +02:00
turboderp
fa9bf06468 Add ability to use recurrent/SWA models as SD targets 2026-04-22 13:47:06 +02:00
turboderp
f8be5ff566 Generator: Add ngram drafting 2026-04-22 13:45:07 +02:00
turboderp
5f550a379f Bump to v0.0.30 (tag: v0.0.30) 2026-04-19 19:48:11 +02:00
turboderp
2fa7593f3b model_init: Add draft model initialization 2026-04-19 19:44:34 +02:00
turboderp
e6ec078e17 SWA: Fix regression 2026-04-18 22:54:34 +02:00
turboderp
716f1748bd Merge remote-tracking branch 'origin/dev' into dev 2026-04-18 18:55:21 +02:00
turboderp
1677054088 Merge pull request #195 from jwinpbe/patch-1
update setup.py with current repository URL
2026-04-18 18:54:49 +02:00
turboderp
9a67b44535 Step3.5: Support SWA mode
- Add headwise gate to SlidingAttention
2026-04-18 18:53:29 +02:00
turboderp
7f6459f259 MMLU eval: Add redux option 2026-04-18 18:30:25 +02:00
turboderp
471a6c7ead SlidingAttention: Use variable-length state, append and only roll every 256 tokens 2026-04-18 18:30:25 +02:00
turboderp
5418863174 Attn: Avoid potentially processing (and masking out) future tokens in SWA 2026-04-18 16:41:01 +02:00
turboderp
93f92280df Gemma4: Fix multimodal span logic 2026-04-18 16:36:59 +02:00
turboderp
0d3893face Loader: Allow specifying max bsz for autosplit to better estimate recurrent state VRAM overhead 2026-04-18 14:12:29 +02:00
JwinPBE
f7da9c58e1 update setup.py with current repository URL 2026-04-18 02:29:04 -04:00
turboderp
07d3765772 Merge remote-tracking branch 'origin/dev' into dev 2026-04-17 01:09:21 +02:00
turboderp
224110db73 Gemma4: Add SlidingAttention 2026-04-17 01:01:58 +02:00
turboderp
6744acca7a Add SlidingAttention 2026-04-17 01:01:44 +02:00
turboderp
2233f62df5 Merge pull request #192 from mratsim/patch-2
fix regression: 'final_bits' in recompile.py - mirrors 2e52f80480
2026-04-16 03:55:13 +02:00
turboderp
e765043846 Generator: Allow atomic prefill with recurrent modules 2026-04-16 03:47:42 +02:00
turboderp
1301689070 Generator: Architecture-specific defaults for checkpoint interval 2026-04-16 03:47:11 +02:00
turboderp
75a6655dba Generator: Ensure state checkpoint when requeuing 2026-04-16 03:46:04 +02:00
turboderp
a743bfa856 Generator: Allow state checkpoint to cover span of positions 2026-04-16 03:45:26 +02:00
turboderp
18203ae7bf Loader: More precise memory limits for autosplit, simulate memory allocations for cached inference. Also account for recurrent states 2026-04-16 03:42:56 +02:00
turboderp
3853eb44aa Slight refactoring and cleanup 2026-04-16 03:38:38 +02:00
Mamy Ratsimbazafy
e7ccf0fe2b fix regression: 'final_bits' in recompile.py - mirrors 2e52f80480 2026-04-15 21:34:04 +02:00