ik_llama.cpp/github-data/pull_requests/582 - Vulkan_ adding GGML_OP_MULTI_ADD implementation.md
2025-07-23 13:31:53 +02:00


🔀 #582 - Vulkan: adding GGML_OP_MULTI_ADD implementation

Author ikawrakow
State Closed
Created 2025-07-04
Updated 2025-07-04

Description

This is relevant for MoE models. The performance improvement is surprisingly small. It has been mentioned elsewhere that Vulkan kernel launch overhead is significantly larger than CUDA's, so I would have expected a bigger benefit from the reduced node count. For DeepSeek-Lite, with this PR the compute graph in ik_llama.cpp has 1420 nodes vs 1871 in mainline llama.cpp.

But, if nothing else, this removes the last Vulkan special-casing when building the compute graph.