Fused delta-net (#1315)

* Revive fused delta-net
* Add command line argument for fused delta net
* Simplify/improve CUDA delta-net
* Add -fdn to llama-bench
* More CUDA fused delta net optimizations
* CPU optimizations
* Much faster fused delta-net on the CPU

  It seems it is faster than the chunked implementation!

* Change meaning of fdn from bool flag to threshold value
* Use eps = 1e-6
* Give some nodes a name
2026-03-07 04:20:03 +00:00 · 2026-02-25 14:12:48 +01:00
parent 0bf7043a7b
commit c77ec4b8b8
15 changed files with 1002 additions and 13 deletions
--- a/include/llama.h
+++ b/include/llama.h
@@ -456,6 +456,7 @@ extern "C" {
        bool split_mode_graph_scheduling; // if true, force split mode graph scheduling
        //bool split_mode_f16;    // if true, cast intermediate results to f16 before copying to other GPUs
        bool scheduler_async;   // if true, with split mode "graph" graph evaluation will be done using multiple threads
+        int  fused_delta_net;
        bool mtp;   // Activate MTP if supported
        enum llama_mtp_op_type mtp_op_type;