turboderp | 4e587cd19b | Actions: Fix typo | 2026-04-27 21:17:32 +02:00
turboderp | 5975a65cb1 | Actions: Add retry for downloading windows dependencies [v0.0.31] | 2026-04-27 20:40:37 +02:00
turboderp | aa86876907 | Bump to v0.0.31 | 2026-04-27 20:38:03 +02:00
turboderp | f262b0851b | Qwen3-VL: Accept "qwen3_5_vision" vision model type | 2026-04-27 20:37:35 +02:00
turboderp | 76a178ebc1 | RYS: Add RYS relayering | 2026-04-26 23:39:26 +02:00
turboderp | 57031e1f1b | chat.py: Fix colors | 2026-04-26 23:04:40 +02:00
turboderp | 602cc7447f | Merge branch 'fork/mayakwd/feature/lora' into dev | 2026-04-25 21:16:59 +02:00
turboderp | dc8de61cdd | Bit of cleanup | 2026-04-25 20:40:27 +02:00
turboderp | 5416d5acc9 | spec_decode eval: Allow overriding default num_draft_tokens | 2026-04-25 04:00:18 +02:00
turboderp | 001c952b71 | DFlash: Fix incorrect input layer dim | 2026-04-25 03:38:41 +02:00
turboderp | 67ddb6b4ec | Add spec_decode eval | 2026-04-25 03:20:48 +02:00
turboderp | 036cfecec7 | Update branch_decode example | 2026-04-25 03:17:37 +02:00
turboderp | 8de991aa3b | DFlash: Enable SWA | 2026-04-24 03:00:23 +02:00
turboderp | cfe137b308 | GatedDeltaNet: Fix rewind logic | 2026-04-24 02:55:00 +02:00
turboderp | 14a2cbff1d | Bit of cleanup | 2026-04-24 01:01:42 +02:00
turboderp | ee1a45fb0a | Add DFlash | 2026-04-24 01:01:25 +02:00
turboderp | dcd7cf6310 | Add paged KV update kernel | 2026-04-24 00:57:16 +02:00
turboderp | 875dbc6a9f | Update generator example | 2026-04-24 00:07:47 +02:00
turboderp | cb50b9fa6a | afmoe: Assert topk_groups=1 and use dots router | 2026-04-23 21:05:21 +02:00
turboderp | a909138d1c | Merge remote-tracking branch 'origin/dev' into dev | 2026-04-23 20:03:21 +02:00
turboderp | 34bbeca42b | Merge pull request #198 from AlpinDale/feat/afmoe-support (feat: add support for `AfmoeForCausalLM` family of models) | 2026-04-23 20:03:10 +02:00
turboderp | d602ede0eb | chat.py: Optional sysprompt in chatml template | 2026-04-23 19:27:25 +02:00
turboderp | f5ab2bcf4e | RoPE: Fix bad optional access in kernel launch | 2026-04-23 19:25:19 +02:00
AlpinDale | e7511e1df8 | feat: add support for AfmoeForCausalLM family of models | 2026-04-23 21:00:41 +04:30
turboderp | 1d90dfb64e | Architectures: Make sure layer_idx is set and propagated for all top-level modules | 2026-04-22 14:15:21 +02:00
turboderp | fa9bf06468 | Add ability to use recurrent/SWA models as SD targets | 2026-04-22 13:47:06 +02:00
turboderp | f8be5ff566 | Generator: Add ngram drafting | 2026-04-22 13:45:07 +02:00
turboderp | 5f550a379f | Bump to v0.0.30 [v0.0.30] | 2026-04-19 19:48:11 +02:00
turboderp | 2fa7593f3b | model_init: Add draft model initialization | 2026-04-19 19:44:34 +02:00
turboderp | e6ec078e17 | SWA: Fix regression | 2026-04-18 22:54:34 +02:00
turboderp | 716f1748bd | Merge remote-tracking branch 'origin/dev' into dev | 2026-04-18 18:55:21 +02:00
turboderp | 1677054088 | Merge pull request #195 from jwinpbe/patch-1 (update setup.py with current repository URL) | 2026-04-18 18:54:49 +02:00
turboderp | 9a67b44535 | Step3.5: Support SWA mode; add headwise gate to SlidingAttention | 2026-04-18 18:53:29 +02:00
turboderp | 7f6459f259 | MMLU eval: Add redux option | 2026-04-18 18:30:25 +02:00
turboderp | 471a6c7ead | SlidingAttention: Use variable-length state, append and only roll every 256 tokens | 2026-04-18 18:30:25 +02:00
turboderp | 5418863174 | Attn: Avoid potentially processing (and masking out) future tokens in SWA | 2026-04-18 16:41:01 +02:00
turboderp | 93f92280df | Gemma4: Fix multimodal span logic | 2026-04-18 16:36:59 +02:00
turboderp | 0d3893face | Loader: Allow specifying max bsz for autosplit to better estimate recurrent state VRAM overhead | 2026-04-18 14:12:29 +02:00
JwinPBE | f7da9c58e1 | update setup.py with current repository URL | 2026-04-18 02:29:04 -04:00
turboderp | 07d3765772 | Merge remote-tracking branch 'origin/dev' into dev | 2026-04-17 01:09:21 +02:00
turboderp | 224110db73 | Gemma4: Add SlidingAttention | 2026-04-17 01:01:58 +02:00
turboderp | 6744acca7a | Add SlidingAttention | 2026-04-17 01:01:44 +02:00
turboderp | 2233f62df5 | Merge pull request #192 from mratsim/patch-2 (fix regression: 'final_bits' in recompile.py - mirrors 2e52f80480) | 2026-04-16 03:55:13 +02:00
turboderp | e765043846 | Generator: Allow atomic prefill with recurrent modules | 2026-04-16 03:47:42 +02:00
turboderp | 1301689070 | Generator: Architecture-specific defaults for checkpoint interval | 2026-04-16 03:47:11 +02:00
turboderp | 75a6655dba | Generator: Ensure state checkpoint when requeuing | 2026-04-16 03:46:04 +02:00
turboderp | a743bfa856 | Generator: Allow state checkpoint to cover span of positions | 2026-04-16 03:45:26 +02:00
turboderp | 18203ae7bf | Loader: More precise memory limits for autosplit, simulate memory allocations for cached inference. Also account for recurrent states | 2026-04-16 03:42:56 +02:00
turboderp | 3853eb44aa | Slight refactoring and cleanup | 2026-04-16 03:38:38 +02:00
Mamy Ratsimbazafy | e7ccf0fe2b | fix regression: 'final_bits' in recompile.py - mirrors 2e52f80480 | 2026-04-15 21:34:04 +02:00