ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-30 19:19:57 +00:00

Files

agray3 f2d315b46f Avoid rebuild of GGML graph for each token (#98)

Introduces caching of the GGML graph to avoid an unnecessary full rebuild for each token.
KV cache parameters, which change with each token, are updated directly in the cached GGML
graph. Caching can be disabled with the GGML_DISABLE_GRAPH_CACHING environment variable.

2024-10-20 08:36:16 +02:00
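
The caching idea in the commit message above can be illustrated with a minimal sketch. Everything in it except the GGML_DISABLE_GRAPH_CACHING variable name is hypothetical: `Graph`, `GraphCache`, `build_graph`, and `get_graph` are stand-ins invented for illustration and are not the ggml API or the code from this commit. The sketch also assumes that setting the variable to any value disables caching, which the commit message does not specify.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for a compute graph; NOT the ggml graph type.
struct Graph {
    int n_nodes = 0;
    int kv_head = 0;   // example of a KV cache parameter that changes per token
};

// Hypothetical cache holding the last built graph.
struct GraphCache {
    Graph graph;
    bool  valid    = false;
    int   n_builds = 0;   // counts full rebuilds, for illustration only
};

// Assumption: caching is enabled unless the variable is set to any value.
static bool graph_caching_enabled() {
    return std::getenv("GGML_DISABLE_GRAPH_CACHING") == nullptr;
}

// Hypothetical "expensive" full graph build.
static Graph build_graph(int kv_head) {
    Graph g;
    g.n_nodes = 8;
    g.kv_head = kv_head;
    return g;
}

// Returns the graph for the current token: a full rebuild when caching is
// off or nothing is cached yet, otherwise an in-place parameter update.
static const Graph & get_graph(GraphCache & cache, int kv_head, bool caching) {
    if (!caching || !cache.valid) {
        cache.graph = build_graph(kv_head);
        cache.valid = true;
        cache.n_builds++;
    } else {
        cache.graph.kv_head = kv_head;   // patch only the per-token parameter
    }
    return cache.graph;
}
```

A caller would evaluate `graph_caching_enabled()` once at startup and pass the result to `get_graph` for each token. With caching on, only the first token triggers a full build, and later tokens only update the per-token parameter.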

ggml-alloc.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.h

Avoid rebuild of GGML graph for each token (#98)

2024-10-20 08:36:16 +02:00

ggml-blas.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cann.h

Merge mainline llama.cpp (#3)