mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-28 17:14:17 +00:00
Fused delta net 2 (#1320)
* Revive fused delta-net * Add command line argument for fused delta net * Simplify/improve CUDA delta-net * Add -fdn to llama-bench * More CUDA fused delta net optimizations * CPU optimizations * Much faster fused delta-net on the CPU It seems it is faster than the chunked implementation! * Change meaning of fdn from bool flag to threshold value * Use eps = 1e-6 * Give some nodes a name * Don't re-apply L2 norm - it has already been done * This seems quite a bit better * More tweaks * Restore per context buffer size log Not everybody uses models split in 2000 parts, and those who do, actually want to see the biffer sizes.
This commit is contained in:
@@ -2222,7 +2222,7 @@ static bool llm_load_tensors(
|
||||
|
||||
// print memory requirements
|
||||
for (ggml_backend_buffer_t buf : model.bufs) {
|
||||
LLAMA_LOG_DEBUG("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
|
||||
LLAMA_LOG_INFO("%s: %10s buffer size = %8.2f MiB\n", __func__, ggml_backend_buffer_name(buf), ggml_backend_buffer_get_size(buf) / 1024.0 / 1024.0);
|
||||
}
|
||||
|
||||
// populate tensors_by_name
|
||||
|
||||
Reference in New Issue
Block a user