Fused delta net 2 (#1320)

* Revive fused delta-net
* Add command line argument for fused delta net
* Simplify/improve CUDA delta-net
* Add -fdn to llama-bench
* More CUDA fused delta net optimizations
* CPU optimizations
* Much faster fused delta-net on the CPU
  It seems it is faster than the chunked implementation!
* Change the meaning of fdn from a bool flag to a threshold value
* Use eps = 1e-6
* Give some nodes a name
* Don't re-apply L2 norm - it has already been done
* This seems quite a bit better
* More tweaks
* Restore per context buffer size log
  Not everybody uses models split in 2000 parts, and those who do actually want to see the buffer sizes.
2026-02-28 17:14:17 +00:00 · 2026-02-26 06:53:43 +01:00
parent 87b35dac0c
commit 2616efa296
3 changed files with 41 additions and 78 deletions
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -2222,7 +2222,7 @@ static bool llm_load_tensors(

    // print memory requirements
    for (ggml_backend_buffer_t buf : model.bufs) {
-        LLAMA_LOG_DEBUG("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
+        LLAMA_LOG_INFO("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
    }

    // populate tensors_by_name