Fused delta-net (#1315)

* Revive fused delta-net

* Add command line argument for fused delta net

* Simplify/improve CUDA delta-net

* Add -fdn to llama-bench

* More CUDA fused delta net optimizations

* CPU optimizations

* Much faster fused delta-net on the CPU

It seems it is faster than the chunked implementation!

* Change meaning of fdn from bool flag to threshold value

* Use eps = 1e-6

* Give some nodes a name
Kawrakow
2026-02-25 14:12:48 +01:00
committed by GitHub
parent 0bf7043a7b
commit c77ec4b8b8
15 changed files with 1002 additions and 13 deletions


@@ -456,6 +456,7 @@ extern "C" {
 bool split_mode_graph_scheduling; // if true, force split mode graph scheduling
 //bool split_mode_f16; // if true, cast intermediate results to f16 before copying to other GPUs
 bool scheduler_async; // if true, with split mode "graph" graph evaluation will be done using multiple threads
+int fused_delta_net;
 bool mtp; // Activate MTP if supported
 enum llama_mtp_op_type mtp_op_type;