ik_llama.cpp/github-data/pull_requests/582 - Vulkan_ adding GGML_OP_MULTI_ADD implementation.md
2025-07-23 13:31:53 +02:00


🔀 #582 - Vulkan: adding GGML_OP_MULTI_ADD implementation

Author ikawrakow
State Closed
Created 2025-07-04
Updated 2025-07-04

Description

This is relevant for MoE models. The performance improvement is surprisingly small. It has been mentioned elsewhere that Vulkan kernel launch overhead is significantly larger than CUDA's, so I would have expected a bigger benefit from the reduced node count. For DeepSeek-Lite, with this PR the compute graph in ik_llama.cpp has 1420 nodes vs 1871 in mainline llama.cpp.

But, if nothing else, this removes the last Vulkan special-casing when building the compute graph.