ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-07 20:40:02 +00:00

Author	SHA1	Message	Date
dungquixote42	a903409a5e	fix adaptive p sampler rewinding too far back (#1359 ) * fix adaptive p sampler rewinding too far back * update comments * correct default value for total_weight, more comments * new variables/names * update comment for n_rewind * move null pointer check back to common_sampler_review() * refactor weighted_sum and total_weight to vector<pair>, better boundary check in llama_review_adaptive_p_impl()	2026-03-04 13:26:25 +01:00
dungquixote42	aaa545c3dc	adaptive p: collect probability before logit bias (#1314 )	2026-02-24 15:39:17 +01:00
dungquixote42	0f411b02e2	Fix adaptive p sampler bug with string ban (#1287 ) * adaptive p: upadte internal state only if not rewinding * adaptive p: conditional update for speculative decoding * adaptive p: refactor to rewind instead of update * adaptive p fix: better comments * fix rewind check * add record to handle multi-token rewind * better comment	2026-02-20 07:11:36 +01:00
firecoperana	1cb7e1bf39	spec : add self speculative decoding, ngram and refactor (#1261 ) * spec : add self speculative decoding and ngram-mod and refactor common : use common_ prefix for common library function llama : use LLAMA_TOKEN_NULL spec : add self speculative decoding (no draft model required) + refactor spec : add ngram-mod spec : various improvements ton ngram-map + docs spec : fix the check-rate logic of ngram-simple common : add common_speculative_is_compat() spec : simplify time measurement using common_time_meas refactor common_sampler_init refactor common_token_to_piece refactor and fix cur_p bug clean up * spec : remove check rate * spec: show warnings instead of abort --------- Co-authored-by: firecoperana <firecoperana> Co-authored-by: Sascha Rogmann <59577610+srogmann@users.noreply.github.com>	2026-02-13 19:04:55 +01:00
dungquixote42	b86d8024a5	Adaptive p: history update fix + temp as flag (#1213 ) * adaptive_p: fix history update + use current probability for high temp * adaptive_p: fix history update bug, update with current probability if temp is high * replace temp-as-signal with server argument * adaptive_p: rename ema_w_cur_p to updt_w_cur * delete test code	2026-02-03 07:36:12 +02:00
Kawrakow	98b30e5e81	Faster adaptive_p sampling (#1165 ) * A hopefully more efficient adaptive_p sampling * Once at it, lets fix the formatting too * More formatting * Hopefully better * This should be better * Correctly accumulate adaptive_p sampling time * AVX2	2026-01-19 16:03:09 +02:00
Kawrakow	fa58c20c42	A hopefully more efficient adaptive_p sampling (#1161 ) * A hopefully more efficient adaptive_p sampling * Once at it, lets fix the formatting too * More formatting * Correctly accumulate sampling time for adaptive_p	2026-01-19 15:01:55 +02:00
dungquixote42	6dfbef27ec	Adaptive p: bugfix + optimization + refactor (#1155 ) * adaptive-p sampler: fix zeroed orig_probs bug and refactor - Fix bug where original probabilities were captured as zero by calculating them from logits in llama_prep_adaptive_p (new). - Replace vector with unordered_map to track candidate probabilities, filtering for relevance via logit delta (16.6f). - Standardize API naming: llama_<action/verb>_<focus/name/topic>_<extra/info> - Update function signatures to follow most other samplers. * resolve merge bug * adaptive-p: revert reordering function definitions	2026-01-18 08:26:06 +02:00
dungquixote42	52ad1c6421	Implement Adaptive-P Sampler (#1100 ) * initial implementation of adaptive-p sampler * explicitly mark candidates unsorted + cleanup qualifiers * cosmetic update * reorg prototypes * lockstep with mainline * add _impl for _init + reorg * add LLAMA_API to prototypes * update sharpness to 10 * lockstep: rng seed * delete llama_sampling member in llama_sampler_adaptive_p * fix LLAMA_API return type * lockstep: rng seed cont * actually correct implementation * lockstep: sorting behavior * const -> constexpr for known constants * add missing space * fix softmax usage in adaptive p sampler * cosmetic changes * implement do-not-sort version of softmax * simpify rng seed, add static to constexpr * refactor: remove iface + use shared rng + use actually original probabilities * adaptive-p: add dedicated rng back in * fix initial max_logit + add float vector to adaptive p sampler context + stochastic sampling * adaptive-p: fuse first softmax with transformation * adaptive-p: implement binary search selection * adaptive-p: update comment	2026-01-10 07:58:53 +02:00
firecoperana	d1f92e24d3	add dry sampler (#513 ) * add dry sampler * use vocab instead of model in dry_init function * fix compile error for build test --------- Co-authored-by: firecoperana <firecoperana>	2025-06-19 10:24:53 +03:00
Kawrakow	1d28b2a9a1	Adding top-n-sigma sampler (#489 ) * Adding top-n-sigma sampler * Fix typos in XTC PR * Update README.md for main and server * More README * More README --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2025-06-03 17:35:09 +03:00
Kawrakow	accf69b126	Adding the XTC sampler (#486 ) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2025-06-03 11:32:03 +03:00
Kawrakow	0ceeb11721	Merge mainline llama.cpp (#3 ) * Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower as it is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-27 07:55:01 +02:00

13 Commits