🔀 #98 - Avoid rebuild of GGML graph for each token

Author agray3
State Closed
Created 2024-10-19
Updated 2024-10-20

Description

Introduces caching of the GGML graph to avoid an unnecessary full rebuild for each token. KV cache parameters, which change with each token, are updated directly in the cached GGML graph. Graph caching can be disabled by setting the GGML_DISABLE_GRAPH_CACHING environment variable.
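
The idea can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `ToyGraph`, `ToyContext`, `build_graph`, `graph_for_token`, and `count_builds` are hypothetical names, and a single integer `kv_head` stands in for the KV cache parameters that change per token. The only name taken from the description is the `GGML_DISABLE_GRAPH_CACHING` environment variable, and the sketch assumes that setting it to any value disables caching.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stand-in for a compute graph; NOT the real ggml_cgraph.
struct ToyGraph {
    std::vector<int> nodes;  // placeholder for the graph's nodes
    int kv_head = 0;         // per-token KV cache position (illustrative)
};

// Hypothetical per-context state holding the cached graph.
struct ToyContext {
    std::unique_ptr<ToyGraph> cached;  // graph reused across tokens
    int full_builds = 0;               // number of expensive full rebuilds
};

// Assumption: the variable only needs to be set, its value is ignored.
inline bool graph_caching_disabled() {
    return std::getenv("GGML_DISABLE_GRAPH_CACHING") != nullptr;
}

// Expensive path: construct the whole graph from scratch.
inline std::unique_ptr<ToyGraph> build_graph(ToyContext & ctx, int kv_head) {
    ctx.full_builds++;
    auto g = std::make_unique<ToyGraph>();
    g->nodes.assign(64, 0);
    g->kv_head = kv_head;
    return g;
}

// Returns the graph to evaluate for the token at position kv_head.
// With caching enabled, only the KV parameter is patched in place.
inline ToyGraph & graph_for_token(ToyContext & ctx, int kv_head, bool disable_cache) {
    assert(kv_head >= 0);
    if (disable_cache || !ctx.cached) {
        ctx.cached = build_graph(ctx, kv_head);  // full rebuild
    } else {
        ctx.cached->kv_head = kv_head;           // cheap in-place update
    }
    return *ctx.cached;
}

// Helper for experimentation: full builds needed to process n_tokens.
inline int count_builds(int n_tokens, bool disable_cache) {
    ToyContext ctx;
    for (int pos = 0; pos < n_tokens; ++pos) {
        graph_for_token(ctx, pos, disable_cache);
    }
    return ctx.full_builds;
}
```

In this sketch, `count_builds(n, graph_caching_disabled())` performs one full build when the variable is unset and `n` full builds when it is set. A real implementation would presumably also need to rebuild whenever the graph's shape changes, which the sketch does not model.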


💬 Conversation

👤 agray3 commented on 2024-10-19 at 19:19:21:

See https://github.com/ikawrakow/ik_llama.cpp/pull/94


👤 ikawrakow submitted a review on 2024-10-20 at 06:35:58: APPROVED