mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-31 03:29:52 +00:00

Georgi Gerganov 2743064b15 refact : fix convert script + zero out KV cache to avoid nans (#3523)

* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements

2023-10-09 14:32:17 +03:00

CMakeLists.txt

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

parallel.cpp

refact : fix convert script + zero out KV cache to avoid nans (#3523)

2023-10-09 14:32:17 +03:00

README.md

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

README.md

llama.cpp/example/parallel

Simplified simulation for serving incoming requests in parallel
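To make the idea concrete, here is a minimal sketch of what such a simulation could look like. It is not the code in parallel.cpp and does not call the llama.cpp API. It assumes a fixed number of parallel slots, a queue of pending requests, and one generated token per busy slot per batched step. The names `Request`, `SimResult` and `simulate` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical request: an id (assumed to be 0..N-1) and the number of
// tokens it needs to generate before it is complete.
struct Request {
    int id;
    int tokens_to_generate;
};

// Outcome of the toy simulation.
struct SimResult {
    int steps;                    // number of batched decode steps executed
    std::vector<int> finish_step; // finish_step[id] = step at which request id completed
};

// Toy model of parallel serving: free slots pick up pending requests, then
// every busy slot produces exactly one token per step (one "batched decode").
SimResult simulate(std::deque<Request> pending, int n_slots) {
    struct Slot {
        int id = -1;       // -1 means the slot is free
        int remaining = 0; // tokens left to generate for the current request
    };

    if (n_slots < 1) {
        n_slots = 1; // guard: at least one slot, otherwise nothing could progress
    }

    std::vector<Slot> slots(n_slots);
    const std::size_t n_total = pending.size();
    SimResult res{0, std::vector<int>(n_total, -1)};
    std::size_t n_done = 0;

    while (n_done < n_total) {
        // Hand pending requests to free slots.
        for (auto & s : slots) {
            if (s.id < 0 && !pending.empty()) {
                s.id        = pending.front().id;
                s.remaining = pending.front().tokens_to_generate;
                pending.pop_front();
            }
        }

        // One batched step: each busy slot generates one token.
        ++res.steps;
        for (auto & s : slots) {
            if (s.id < 0) {
                continue;
            }
            if (--s.remaining <= 0) {
                res.finish_step[s.id] = res.steps;
                s.id = -1; // slot becomes free for the next pending request
                ++n_done;
            }
        }
    }
    return res;
}
```

In this sketch a slot freed during one step is reused at the start of the next, so short requests do not hold up the queue behind long ones. Running three requests of 3, 1 and 2 tokens on two slots takes 3 steps, compared with 6 steps on a single slot.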