Files
ik_llama.cpp/ggml
Kawrakow 2616efa296 Fused delta net 2 (#1320)
* Revive fused delta-net

* Add command line argument for fused delta net

* Simplify/improve CUDA delta-net

* Add -fdn to llama-bench

* More CUDA fused delta net optimizations

* CPU optimizations

* Much faster fused delta-net on the CPU

It seems it is faster than the chunked implementation!

* Change meaning of fdn from bool flag to threshold value

* Use eps = 1e-6

* Give some nodes a name

* Don't re-apply L2 norm - it has already been done

* This seems quite a bit better

* More tweaks

* Restore per context buffer size log

Not everybody uses models split in 2000 parts, and those who do,
actually want to see the biffer sizes.
2026-02-26 06:53:43 +01:00
..
2024-07-27 07:55:01 +02:00
2026-02-25 14:12:48 +01:00
2026-02-26 06:53:43 +01:00
2024-07-27 07:55:01 +02:00