ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-27 09:53:40 +00:00

Author	SHA1	Message	Date
Michael Hueschen	e12a06272d	nix: add cc to devShell LD_LIBRARY_PATH this fixes the error I encountered when trying to run the convert.py script in a venv: ``` $ nix develop [...]$ source .venv/bin/activate (.venv) [...]$ pip3 install -r requirements.txt <... clipped ...> [...]$ python3 ./convert.py Traceback (most recent call last): File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module> from sentencepiece import SentencePieceProcessor File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module> from . import _sentencepiece ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory ``` however, I am not sure this is the cleanest way to address this linker issue...	2024-01-24 12:39:29 +00:00
slaren	ab0c5dbd6d	llama : pre-allocate input tensors in a separate buffer (#5100 )	2024-01-24 12:48:14 +01:00
Georgi Gerganov	a4ce5bf351	metal : disable support for MUL_MAT F32 x F16	2024-01-23 15:50:56 +02:00
Kawrakow	07be9cef49	Additional KL-divergence statistics (#5081 ) * perplexity: add top-token probability * perplexity: add additional KL-divergence statistics * perplexity: a better organized KL-divergence statistics output --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-23 15:17:20 +02:00
Johannes Gäßler	fa690025e6	CUDA: more info when no device code (#5088 )	2024-01-23 13:31:56 +01:00
Georgi Gerganov	0beb2d8bf4	minor : clean-up some warnings and style (#5094 ) * minor : clean-up some warnings and style ggml-ci * ggml : add comment	2024-01-23 14:12:57 +02:00
Xuan Son Nguyen	8bb43a2380	devops : add intel oneapi dockerfile (#5068 ) Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>	2024-01-23 09:11:39 +02:00
Michael Coppola	05e68851a2	llama.vim : added api key support (#5090 ) Co-authored-by: Michael Coppola <info@michaeljcoppola.com>	2024-01-23 08:51:27 +02:00
slaren	85013d185e	llama : fix not enough space in buffer with Qwen (#5086 )	2024-01-22 23:42:41 +01:00
Kawrakow	21124f8250	KL-divergence (#5076 ) * kl-divergence: be able to save all logits to a file * Add ability to compute KL-divergence --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 16:10:14 +02:00
Reinforce-II	db23c1e61b	ggml : parallelize FP32 conversion when using BLAS (#5045 ) * make GGML_TASK_INIT phase can be run in multithread * multithreaded dequantize in mul_mat when using blas library * minor fixes * update outdated comment * fix coding style * simplify code Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-22 15:15:08 +02:00
XiaotaoChen	27a6a3d428	llava : MobileVLM support (#4954 ) * MobileVLM native implementation * delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake * move android script to example/llava directory * Fix the editor config checks --------- Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>	2024-01-22 15:09:35 +02:00
Someone Serge	7cf6f6f7e7	flake.nix: add a comment about flakes vs nix	2024-01-22 12:19:30 +00:00
Someone Serge	1ff9757668	nix: add a comment on the many nixpkgs-with-cuda instances	2024-01-22 12:19:30 +00:00
Someone Serge	f622bb7e14	nix: add a comment about makeScope	2024-01-22 12:19:30 +00:00
Someone Serge	ec81abd9a5	nix: refactor the cleanSource rules	2024-01-22 12:19:30 +00:00
Someone Serge	b9f0b6782d	workflows: nix-ci: drop the redundant "paths" filter	2024-01-22 12:19:30 +00:00
Someone Serge	0146a1a253	workflows: nix-build-aarch64: rate limit	2024-01-22 12:19:30 +00:00
Someone Serge	fbceda0636	workflows: nix-ci: rebuild on flake.lock updates	2024-01-22 12:19:30 +00:00
Kawrakow	c394fe969c	imatrix : keep intermediate imatrix results (#5077 ) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 14:18:43 +02:00
compilade	9cfd9f45ca	llama : support StableLM 2 1.6B (#5052 ) * llama : support StableLM 2 1.6B * convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}] * convert : refactor Qwen's set_vocab to use it for StableLM 2 too * nix : add tiktoken to llama-python-extra * convert : use presence of tokenizer.json to determine StableLM tokenizer loader It's a less arbitrary heuristic than the vocab size.	2024-01-22 13:21:52 +02:00
Daniel Bevenius	0244a6ceb3	finetune : print sample-start/include-sample-start (#5072 ) This commit adds `--sample-start` and `--include-sample-start` to the output from the main function in finetune.cpp. The motivation for this is that even though these are set explicitly by the user via the command line, if one forgets to set them then it is useful to have their values printed out. Otherwise it is possible to go through the whole training process before realizing that the values are not what one expected. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-22 13:11:01 +02:00
Kawrakow	27f6120aa2	llama : add Q3_K_XS (#5060 ) * Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S * Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K Together with an importance matrix, this brings perplexity for LLaMA-v2-70B below the perplexity of the former Q2_K with an 800 MB smaller quantized model size. --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 12:43:33 +02:00
bobqianic	4aac1d433e	ci : fix Windows CI by updating Intel SDE version (#5053 )	2024-01-22 10:55:05 +02:00
Shijie	a37bce0e93	llama : add more qwen2 models (#5071 )	2024-01-22 09:33:19 +02:00
iSma	3ffdaca35d	Revert LLAMA_NATIVE to OFF in flake.nix (#5066 )	2024-01-21 21:37:13 +00:00
kuronekosaiko	a727920ce6	add safetensors support to convert-lora-to-ggml.py (#5062 ) * add safetensors support to convert-lora-to-ggml.py * Update convert-lora-to-ggml.py Remove white space in line 69.	2024-01-21 17:28:14 +01:00
bobqianic	4a2cf46fe6	add `#include <string>` to unicode.h (#5051 ) Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2024-01-21 10:17:35 -05:00
Kawrakow	9a6b77d595	Add ability to evaluate multiple choice tasks (#5047 ) * TruthfulQA: 1st attempt, does not look like it is working The same implementation can be used for HellaSwag as well, so I converted a HellaSwag validation dataset to the binary format used here and tested with that. The score is only around 50, so something is not quite right. * TruthfulQA: works but the result is bad I know it works because if I convert the HellaSwag validation data to the binary format used in the truthful_qa_score() function I get the exact same result as from the hellaswag_score() function. But I guess, the questions are tricky and the way I have done the combination of question + answer is very likely not the best. The TruthfulQA validation dataset contains 817 questions, with random chance result around 19%. With this version I get 29.1% for Mistral-7B and 55.2% for Mistral-7B-Instruct-v0.2. The HF leader board results for these two models are 42.2% and 68.3%, respectively. * TruthfulQA: fix random sample * TruthfulQA: prepare tasks in parallel for large test datasets * Rename truthful_qa to multiple_choice * Make MSVC happy I had forgotten that MSVC does not make constexpr's available inside a lambda. --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-21 14:42:44 +02:00
Kawrakow	cb2341a665	Slightly faster imatrix (#5050 ) * imatrix: speedup by avoiding unnecessary allocations and copies * imatrix: add --no-ppl option to skip PPL calculations altogether --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-21 08:01:20 +02:00
Georgi Gerganov	fb68f21765	flake.lock: Update (#5054 ) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/9b19f5e77dd906cb52dade0b7bd280339d2a1f3d' (2024-01-13) → 'github:NixOS/nixpkgs/bbe7d8f876fbbe7c959c90ba2ae2852220573261' (2024-01-19) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-01-21 03:17:27 +00:00
Jared Van Bortel	4461e32e36	convert : partially revert PR #4818 (#5041 )	2024-01-20 18:14:18 -05:00
Jared Van Bortel	6cac3ffdbf	perplexity : fix MSVC build after #5020 (#5043 ) * perplexity : fix MSVC build after #5020 * try a different fix	2024-01-20 17:08:08 +02:00
slaren	ec252dfbb1	llama : run all KQV ops on the CPU with no KV offload (#5049 ) ggml-ci	2024-01-20 17:05:49 +02:00
Herman Semenov	0a093b1cbb	cmake : add support for ccache (#5002 ) * Added support ccache for speedup recompilation * cmake : option to disable ccache --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-20 10:11:31 +02:00
adel boussaken	0891e26f16	Add a dart/flutter binding to README.md (#4882 )	2024-01-20 03:05:43 -05:00
Kylin	b525ea981f	cuda : fix compile error in jetson platform (#4975 ) * cuda: fix compile error in jetson platform * cuda: update comment in ggml-cuda.cu * cuda: update ggml-cuda.cu comment	2024-01-20 09:01:46 +02:00
Uzo Nweke	48d3127e56	finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033 ) * Fix issue with alloc causing max_compute_size to be calculated * remove ggml_allocr_free as suggested in issue #4791	2024-01-19 20:20:50 +02:00
Georgi Gerganov	e4b9f2c841	imatrix : add README.md	2024-01-19 15:24:47 +02:00
Shijie	fc798e7035	llama : support upcoming Qwen2 (#5037 )	2024-01-19 13:53:13 +02:00
Georgi Gerganov	538e806ec1	py : fix flake8 lint	2024-01-19 13:52:22 +02:00
Kawrakow	8eea9170ff	winogrande: evaluate log-probs in parallel (#5036 ) This is a relatively minor performance tweak resulting in ~10% speedup on my system. Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-19 11:39:11 +02:00
chiranko	234f8b0a95	llama : add CodeShell support (#5016 ) * llama: add codeshell support * llama.cpp: fix codeshell with NeoX rope Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-19 11:07:27 +02:00
Kawrakow	cdf1a689c1	perplexity: avoid unnecessary allocations and logit copies (#5035 ) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-19 11:02:39 +02:00
Georgi Gerganov	49f0f9a5be	perplexity : faster Winogrande via batching (#5024 ) * perplexity : faster Winogrande via batching ggml-ci * perplexity : remove unused function * perplexity : only tokenize selected tasks for Winogrande	2024-01-19 10:45:06 +02:00
John	8ed186203c	llama : fix falcon arch for tied output embeddings (#4978 ) * falcon arch fix for tied output embeddings * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update llama.cpp * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-19 00:12:15 +02:00
Georgi Gerganov	8a4d0e3cf5	cmake : add ggml public headers (#5011 )	2024-01-18 23:36:07 +02:00
Xuan Son Nguyen	4d5cc65823	server : defer tasks when "slot unavailable" (#5018 ) * server: defer task when no slot is available * remove unnecessary log --------- Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>	2024-01-18 22:33:05 +02:00
slaren	8d779e531e	llama : fix mlock with no-mmap with Metal (#5025 )	2024-01-18 21:12:15 +01:00
Georgi Gerganov	d23cd25dd0	imatrix : fix assert for src0 non-cont check	2024-01-18 21:45:51 +02:00

1962 Commits