mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-13 15:30:03 +00:00

Files

Georgi Gerganov ede7949722 sampling : refactor init to use llama_sampling_params (#3696)

* sampling : refactor init to use llama_sampling_params

* llama : combine repetition, frequency and presence penalties in 1 call

* examples : remove embd-input and gptneox-wip

* sampling : rename penalty params + reduce size of "prev" vector

* sampling : add llama_sampling_print helper

* sampling : hide prev behind API and apply #3661

ggml-ci

2023-10-20 21:07:23 +03:00
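The bullet "combine repetition, frequency and presence penalties in 1 call" can be illustrated with a small standalone sketch. It does not use the llama.cpp API. `Candidate` and `apply_penalties` are hypothetical names. The formula is an assumption modeled on common practice, not a quote of the commit: divide positive logits or multiply non-positive ones by the repetition penalty, then subtract `count * penalty_freq + penalty_present`.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a candidate token and its logit.
struct Candidate {
    int   id;
    float logit;
};

// Sketch: apply repetition, frequency and presence penalties in one pass.
// `last_tokens` is the window of recently generated tokens (assumed input).
void apply_penalties(std::vector<Candidate> & candidates,
                     const std::vector<int> & last_tokens,
                     float penalty_repeat,   // 1.0f disables
                     float penalty_freq,     // 0.0f disables
                     float penalty_present)  // 0.0f disables
{
    assert(penalty_repeat > 0.0f);

    // Count how often each token occurs in the window.
    std::unordered_map<int, int> counts;
    for (int tok : last_tokens) {
        counts[tok]++;
    }

    for (Candidate & c : candidates) {
        const auto it = counts.find(c.id);
        if (it == counts.end()) {
            continue; // not seen recently: logit left untouched
        }
        const int count = it->second; // always >= 1 here

        // Repetition penalty: make the token less likely whatever the sign.
        if (c.logit <= 0.0f) {
            c.logit *= penalty_repeat;
        } else {
            c.logit /= penalty_repeat;
        }

        // Frequency penalty grows with the count; presence penalty is flat.
        c.logit -= float(count) * penalty_freq + penalty_present;
    }
}
```

Doing all three adjustments in one loop means the token counts are built once per sampling step instead of once per penalty.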

CMakeLists.txt

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00
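A custom attention mask lets tokens from several independent sequences share one batch and one KV cache. The sketch below is a simplified model under stated assumptions, not the ggml implementation. It assumes each cache cell records a position and the set of sequence ids that use it. A batch token may attend to a cell only if the cell belongs to its sequence and is not in its future. `KvCell`, `BatchToken` and `build_mask` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical KV-cache cell: a position plus the sequences sharing it.
struct KvCell {
    int           pos;
    std::set<int> seq_ids;
};

// Hypothetical token in the current batch.
struct BatchToken {
    int pos;
    int seq_id;
};

// Builds an additive mask of shape [n_tokens][n_cells]:
// 0.0f where attention is allowed, -INFINITY where it is blocked.
std::vector<std::vector<float>> build_mask(const std::vector<BatchToken> & batch,
                                           const std::vector<KvCell> & cells) {
    assert(!cells.empty());
    std::vector<std::vector<float>> mask(
        batch.size(), std::vector<float>(cells.size(), -INFINITY));

    for (std::size_t i = 0; i < batch.size(); ++i) {
        for (std::size_t j = 0; j < cells.size(); ++j) {
            const bool same_seq = cells[j].seq_ids.count(batch[i].seq_id) > 0;
            const bool causal   = cells[j].pos <= batch[i].pos;
            if (same_seq && causal) {
                mask[i][j] = 0.0f;
            }
        }
    }
    return mask;
}
```

Under this model a shared prompt can sit in cells tagged with several sequence ids, so parallel sequences reuse it without copying, and old cells can be dropped per sequence rather than swapping the whole context.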

parallel.cpp

sampling : refactor init to use llama_sampling_params (#3696)

2023-10-20 21:07:23 +03:00

README.md

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

README.md

llama.cpp/example/parallel

Simplified simulation of serving incoming requests in parallel
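The idea of serving requests in parallel can be shown with a toy model that makes no llama.cpp calls. It assumes a fixed number of client slots. Each decode step produces one token for every busy slot in a single batch. The `cont_batching` switch is an illustrative assumption: when on, a finished slot is refilled right away; when off, new requests wait until all slots are idle. `Request` and `simulate` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical request: number of tokens still to generate.
struct Request {
    int tokens_left;
};

// Returns the number of batched decode steps needed to serve `requests`
// with `n_parallel` client slots.
int simulate(std::deque<Request> requests, int n_parallel, bool cont_batching) {
    assert(n_parallel > 0);
    std::vector<Request> slots; // currently busy slots
    int steps = 0;

    while (!requests.empty() || !slots.empty()) {
        // Admit waiting requests into free slots.
        if (cont_batching || slots.empty()) {
            while ((int) slots.size() < n_parallel && !requests.empty()) {
                assert(requests.front().tokens_left > 0);
                slots.push_back(requests.front());
                requests.pop_front();
            }
        }

        // One batched decode step: every busy slot produces one token.
        ++steps;
        for (Request & r : slots) {
            --r.tokens_left;
        }

        // Release slots whose request has finished.
        std::vector<Request> still_busy;
        for (const Request & r : slots) {
            if (r.tokens_left > 0) {
                still_busy.push_back(r);
            }
        }
        slots.swap(still_busy);
    }
    return steps;
}
```

For requests of 4, 1, 1 and 1 tokens on 2 slots, refilling slots immediately takes 4 steps, waiting for all slots to drain takes 5, and a single slot takes 7.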