🔀 #238 - A better way to measure the cost of ggml_barrier

Author ikawrakow
State Closed
Created 2025-03-01
Updated 2025-03-01

Description

Trying to measure the cost on each ggml_barrier invocation is too imprecise, as the best time resolution we have in ggml is 1 μs. Hence, measure the total graph execution time and the sum of the node execution times. The difference is then the cost of thread synchronization via ggml_barrier.

Using this on token generation (TG) runs with DeepSeek-Lite, I'm finding that ggml_barrier costs about 7% of the graph evaluation time when running on the CPU.
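
The subtraction described above can be sketched in a few lines of C. This is a minimal illustration, not the code from this PR: `now_us`, `graph_timing`, `run_graph_timed`, `sync_cost_us` and `sync_fraction` are hypothetical names, the graph is modeled as a plain array of callbacks, and the loop is single-threaded, whereas in ggml every worker thread walks the graph and meets the others at the barrier. It assumes a POSIX system for `clock_gettime`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic clock in microseconds, the same
 * resolution the description mentions. */
static int64_t now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

typedef void (*node_fn)(void *ctx);     /* stand-in for computing one graph node */
typedef void (*barrier_fn)(void *ctx);  /* stand-in for the thread barrier */

typedef struct {
    int64_t total_us;  /* wall time of the whole graph evaluation */
    int64_t nodes_us;  /* sum of the per-node compute times */
} graph_timing;

/* Time the whole loop and each node, but never an individual barrier call:
 * a single barrier is too short to resolve at 1 us. */
graph_timing run_graph_timed(node_fn *nodes, size_t n_nodes,
                             barrier_fn barrier, void *ctx) {
    assert(n_nodes == 0 || (nodes != NULL && barrier != NULL));
    graph_timing t = {0, 0};
    int64_t t_start = now_us();
    for (size_t i = 0; i < n_nodes; ++i) {
        int64_t t0 = now_us();
        nodes[i](ctx);
        t.nodes_us += now_us() - t0;
        barrier(ctx);  /* deliberately outside the per-node timer */
    }
    t.total_us = now_us() - t_start;
    return t;
}

/* Synchronization cost is whatever the node timers did not account for. */
int64_t sync_cost_us(graph_timing t) {
    return t.total_us - t.nodes_us;
}

/* The same cost as a fraction of the total graph time (0 for an empty run). */
double sync_fraction(graph_timing t) {
    return t.total_us > 0 ? (double)sync_cost_us(t) / (double)t.total_us : 0.0;
}
```

Because the difference is taken over a whole graph evaluation, the many sub-microsecond barrier waits add up to something well above the timer resolution. Note that in this sketch the difference also absorbs loop and timer overhead, so it is an upper bound on the barrier cost rather than an exact figure.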


💬 Conversation

👤 davidsyoung commented the 2025-03-01 at 09:51:17:

@ikawrakow you are seriously cooking!


👤 ikawrakow commented the 2025-03-01 at 15:12:54:

> @ikawrakow you are seriously cooking!

I like cooking. Well, at least this kind of cooking. Real cooking I tend to avoid by going to restaurants.