ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-29 02:41:47 +00:00

Thomas 0451f10a42 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

🔀 #98 - Avoid rebuild of GGML graph for each token

| Author | `agray3` |
| :--- | :--- |
| State | ❌ Closed |
| Created | 2024-10-19 |
| Updated | 2024-10-20 |

Description

Introduces caching of the GGML graph to avoid an unnecessary full rebuild for each token. KV cache parameters, which change with each token, are updated directly in the cached GGML graph. Caching can be disabled by setting the `GGML_DISABLE_GRAPH_CACHING` environment variable.
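The idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `ToyNode`, `ToyGraph`, `ToyGraphCache`, `build_graph`, and `get_graph` are hypothetical names, the "graph" is a plain vector rather than a real GGML graph, and treating the environment variable as "disabled when set to any value" is an assumption about how the switch is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a graph node that writes into the KV cache.
struct ToyNode {
    std::size_t kv_offset; // per-token parameter: position in the KV cache
};

// Hypothetical stand-in for a compute graph (one node per layer).
struct ToyGraph {
    std::vector<ToyNode> nodes;
};

// Holds the graph from the previous token plus a build counter for inspection.
struct ToyGraphCache {
    ToyGraph graph;
    bool     valid    = false;
    int      n_builds = 0;
};

// Assumption: caching is considered disabled whenever the variable is set.
inline bool graph_caching_disabled() {
    return std::getenv("GGML_DISABLE_GRAPH_CACHING") != nullptr;
}

// Full (expensive) rebuild: creates every node from scratch.
inline ToyGraph build_graph(int n_layers, std::size_t kv_head) {
    assert(n_layers > 0);
    ToyGraph g;
    g.nodes.reserve(static_cast<std::size_t>(n_layers));
    for (int il = 0; il < n_layers; ++il) {
        g.nodes.push_back(ToyNode{kv_head});
    }
    return g;
}

// Returns a graph for the current token. When a cached graph with the same
// topology exists, only the KV cache offsets are patched in place.
inline const ToyGraph & get_graph(ToyGraphCache & cache, int n_layers, std::size_t kv_head) {
    const bool same_topology =
        cache.valid && cache.graph.nodes.size() == static_cast<std::size_t>(n_layers);

    if (same_topology && !graph_caching_disabled()) {
        for (ToyNode & node : cache.graph.nodes) {
            node.kv_offset = kv_head; // cheap per-token update
        }
        return cache.graph;
    }

    cache.graph = build_graph(n_layers, kv_head); // full rebuild
    cache.valid = true;
    ++cache.n_builds;
    return cache.graph;
}
```

In this sketch, calling `get_graph` once per generated token with an increasing `kv_head` builds the graph only on the first call, while setting `GGML_DISABLE_GRAPH_CACHING` forces a rebuild on every call, which is useful for comparing the two paths.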

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

💬 Conversation

👤 agray3 commented on 2024-10-19 at 19:19:21:

See https://github.com/ikawrakow/ik_llama.cpp/pull/94

👤 ikawrakow submitted a review on 2024-10-20 at 06:35:58: ✅ APPROVED