Commit Graph

11 Commits

Author SHA1 Message Date
Kawrakow
3735e88925 Remove unused tensors from delta-net (#1350) 2026-03-02 16:02:40 +01:00
Kawrakow
a568e12c8f Minor delta-net tweak (#1337) 2026-03-01 17:45:02 +01:00
Kawrakow
0ff3a43289 Bring back #1333 and #1335 (#1340)
* Bring back fused delta net 3

* Remove autoregressive and chunking
2026-02-28 14:31:42 +01:00
Kawrakow
1922449b2c Revert delta net 3 (#1339)
* Revert "Simplify delta-net (#1335)"

This reverts commit e5fc30244c.

* Revert "Fused delta net 3 (#1333)"

This reverts commit 7b68353e09.
2026-02-28 13:12:08 +01:00
Kawrakow
e5fc30244c Simplify delta-net (#1335)
* Simplify delta-net

* Minor

* Minor
2026-02-28 11:12:19 +01:00
Kawrakow
7b68353e09 Fused delta net 3 (#1333)
* This is better than chunked

* Keep the state in registers

* Cleanup

* Remove unused stuff

* Minor

* Make fused delta-net the default

* Fix race
2026-02-27 15:02:56 +01:00
Kawrakow
c77ec4b8b8 Fused delta-net (#1315)
* Revive fused delta-net

* Add command line argument for fused delta net

* Simplify/improve CUDA delta-net

* Add -fdn to llama-bench

* More CUDA fused delta net optimizations

* CPU optimizations

* Much faster fused delta-net on the CPU

It seems it is faster than the chunked implementation!

* Change meaning of fdn from bool flag to threshold value

* Use eps = 1e-6

* Give some nodes a name
2026-02-25 14:12:48 +01:00
Kawrakow
38ca19d828 Minor delta-net tweak (#1308)
* Make sure we pick the reduced tensor from the right GPU

* Minor

* Minor delta-net tweak
2026-02-24 15:22:57 +01:00
Kawrakow
5dacb5355a Graph parallel for Qwen3-Next (#1292)
* WIP

* This works, but is slower than split mode layer
2026-02-23 07:58:00 +01:00
Kawrakow
13c3d83ce7 Qwen3.5-MoE support (#1288)
* WIP: loads and runs, but not correct

Very high PPL, empty TG.

* This appears to work
2026-02-21 08:33:06 +01:00
Kawrakow
04cf685e82 Factor out delta net (#1286)
* WIP: factor out delta net implementation

* WIP

* Use the standard FFN functions

* More standard attn for Qwen3-Next
2026-02-18 17:16:17 +01:00