ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Author	SHA1	Message	Date
firecoperana	d71a3ec315	Server: refactor and rename functions (#1151 ) * Server: rename functions and refactor code rename functions refactor update slots rename params_base rename timings * change * Revert kv cache name changes * Revert 2 * fix test build error --------- Co-authored-by: firecoperana <firecoperana>	2026-01-18 08:16:57 +02:00
firecoperana	672df48ed1	server: keep logit bias unchanged when client does not set it (#1144 ) Co-authored-by: firecoperana <firecoperana>	2026-01-13 18:08:09 +02:00
hksdpc255	e1c4c4a495	Fix Anthropic Messages API (#1136 ) * server: stop processing the prompt when client disconnects implement generator-based API for task results Update httplib.h to 0.27.0 Fix embedding error Stop prompt processing when disconnected * Port upstream https://github.com/ggml-org/llama.cpp/pull/18551 * add back anthropic * Fix merge issue caused by github webui --------- Co-authored-by: firecoperana <firecoperana>	2026-01-13 08:37:29 +02:00
firecoperana	1a461525d5	server: stop processing the prompt when client disconnects (#1134 ) implement generator-based API for task results Update httplib.h to 0.27.0 Fix embedding error Stop prompt processing when disconnected Co-authored-by: firecoperana <firecoperana>	2026-01-13 07:56:59 +02:00
Kawrakow	d3e3ad40f9	Compiler warning and white space	2026-01-12 19:06:17 +02:00
firecoperana	c03ee1a4d2	server: improve speed of speculative decoding (#1119 ) * server: improve speed of speculative decoding change logs rpc: add recompute spec dec fix * Fix n_batch_size not set to context size for draft model --------- Co-authored-by: firecoperana <firecoperana>	2026-01-10 08:01:22 +02:00
dungquixote42	52ad1c6421	Implement Adaptive-P Sampler (#1100 ) * initial implementation of adaptive-p sampler * explicitly mark candidates unsorted + cleanup qualifiers * cosmetic update * reorg prototypes * lockstep with mainline * add _impl for _init + reorg * add LLAMA_API to prototypes * update sharpness to 10 * lockstep: rng seed * delete llama_sampling member in llama_sampler_adaptive_p * fix LLAMA_API return type * lockstep: rng seed cont * actually correct implementation * lockstep: sorting behavior * const -> constexpr for known constants * add missing space * fix softmax usage in adaptive p sampler * cosmetic changes * implement do-not-sort version of softmax * simpify rng seed, add static to constexpr * refactor: remove iface + use shared rng + use actually original probabilities * adaptive-p: add dedicated rng back in * fix initial max_logit + add float vector to adaptive p sampler context + stochastic sampling * adaptive-p: fuse first softmax with transformation * adaptive-p: implement binary search selection * adaptive-p: update comment	2026-01-10 07:58:53 +02:00
firecoperana	2a633c4357	server: exclude thinking tokens when finding the slot (#1079 ) refactor find slot enable by default Fix load prompt rename variables Co-authored-by: firecoperana <firecoperana>	2025-12-22 09:46:45 +01:00
firecoperana	0e91b89cd3	Refactor chat and server file (#1062 ) * Add alternative log functions * chat: fix int overflow, prevent size calculation in float/double (#17357) * chat: fix int overflow, prevent size calculation in float/double * Update common/chat.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * common : move all common_chat_parse_* to chat-parser.cpp. (#17481) # Conflicts: # common/chat.cpp * server: split server.cpp code into server/common/task/queue/context * Fix compiler warning * Clean up code * common: use native MultiByteToWideChar * move server prompt to server task * Clean code * delete utils.hpp --------- Co-authored-by: firecoperana <firecoperana> Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: DAN™ <dranger003@gmail.com>	2025-12-15 08:27:20 +01:00

9 Commits