mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 09:09:50 +00:00
🔀 #238 - A better way to measure the cost of ggml_barrier
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2025-03-01 |
| Updated | 2025-03-01 |
Description
Measuring the cost on each ggml_barrier invocation is too imprecise, because the best time resolution we have in ggml is 1 us. Hence, measure the total graph execution time and the sum of the node execution times. The difference is then the cost of thread synchronization via ggml_barrier.
Using this on TG runs with DeepSeek-Lite I'm finding that ggml_barrier costs about 7% of the graph evaluation time when running on the CPU.
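The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `barrier_cost_stats`, `add_node_time`, `barrier_cost_us`, and `barrier_cost_fraction` are hypothetical names, and the sketch assumes a single thread's view, with timestamps taken from a microsecond timer (in ggml that would presumably be something like `ggml_time_us()`).

```c
#include <stdint.h>

/* Hypothetical accumulator for illustration; not a ggml type. */
typedef struct {
    int64_t graph_us; /* wall-clock time of the whole graph evaluation */
    int64_t nodes_us; /* sum of the individual node execution times    */
} barrier_cost_stats;

/* Add one node's execution time, given start/end timestamps in microseconds. */
static void add_node_time(barrier_cost_stats *s, int64_t t_start_us, int64_t t_end_us) {
    s->nodes_us += t_end_us - t_start_us;
}

/* Time not spent inside any node, attributed here to synchronization. */
static int64_t barrier_cost_us(const barrier_cost_stats *s) {
    return s->graph_us - s->nodes_us;
}

/* Same cost as a fraction of the total graph time (0.0 if nothing was timed). */
static double barrier_cost_fraction(const barrier_cost_stats *s) {
    if (s->graph_us <= 0) {
        return 0.0;
    }
    return (double)barrier_cost_us(s) / (double)s->graph_us;
}
```

As a made-up example (not a measurement), two nodes taking 400 us and 530 us inside a graph that takes 1000 us in total would leave 70 us, or 7% of the graph time, attributed to synchronization. Because only two long intervals are subtracted, the 1 us timer resolution matters far less than it would when timing each barrier individually.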
💬 Conversation
👤 davidsyoung commented the 2025-03-01 at 09:51:17:
@ikawrakow you are seriously cooking!
👤 ikawrakow commented the 2025-03-01 at 15:12:54:
> @ikawrakow you are seriously cooking!
I like cooking. Well, at least this kind of cooking. Real cooking I tend to avoid by going to restaurants.