Kawrakow
3735e88925
Remove unused tensors from delta-net (#1350)
2026-03-02 16:02:40 +01:00
Kawrakow
a568e12c8f
Minor delta-net tweak (#1337)
2026-03-01 17:45:02 +01:00
Kawrakow
0ff3a43289
Bring back #1333 and #1335 (#1340)
* Bring back fused delta net 3
* Remove autoregressive and chunking
2026-02-28 14:31:42 +01:00
Kawrakow
1922449b2c
Revert delta net 3 (#1339)
* Revert "Simplify delta-net (#1335)"
This reverts commit e5fc30244c.
* Revert "Fused delta net 3 (#1333)"
This reverts commit 7b68353e09.
2026-02-28 13:12:08 +01:00
Kawrakow
e5fc30244c
Simplify delta-net (#1335)
* Simplify delta-net
* Minor
* Minor
2026-02-28 11:12:19 +01:00
Kawrakow
7b68353e09
Fused delta net 3 (#1333)
* This is better than chunked
* Keep the state in registers
* Cleanup
* Remove unused stuff
* Minor
* Make fused delta-net the default
* Fix race
2026-02-27 15:02:56 +01:00
Kawrakow
c77ec4b8b8
Fused delta-net (#1315)
* Revive fused delta-net
* Add command line argument for fused delta net
* Simplify/improve CUDA delta-net
* Add -fdn to llama-bench
* More CUDA fused delta net optimizations
* CPU optimizations
* Much faster fused delta-net on the CPU
It seems it is faster than the chunked implementation!
* Change meaning of fdn from bool flag to threshold value
* Use eps = 1e-6
* Give some nodes a name
2026-02-25 14:12:48 +01:00
Kawrakow
38ca19d828
Minor delta-net tweak (#1308)
* Make sure we pick the reduced tensor from the right GPU
* Minor
* Minor delta-net tweak
2026-02-24 15:22:57 +01:00
Kawrakow
5dacb5355a
Graph parallel for Qwen3-Next (#1292)
* WIP
* This works, but is slower than split mode layer
2026-02-23 07:58:00 +01:00
Kawrakow
13c3d83ce7
Qwen3.5-MoE support (#1288)
* WIP: loads and runs, but not correct
Very high PPL, empty TG.
* This appears to work
2026-02-21 08:33:06 +01:00
Kawrakow
04cf685e82
Factor out delta net (#1286)
* WIP: factor out delta net implementation
* WIP
* Use the standard FFN functions
* More standard attn for Qwen3-Next
2026-02-18 17:16:17 +01:00