Commit Graph

2639 Commits

Author SHA1 Message Date
Georgi Gerganov
f8da188258 metal : fix kernel_norm (fixes Falcon on Metal) (#3057)
* metal : fix kernel_norm

ggml-ci

* metal : put warning in kernel_norm to not combine the loops

* metal : restore original F16 mat-vec multiplication

It works after the norm fixes

* common : don't do warm-up with more than n_batch tokens (close #3058)

ggml-ci

* metal : minor
2023-09-07 15:49:09 +03:00
Przemysław Pawełczyk
ce6bb57378 ggml : posixify madvise and pagesize (#3037)
* llama : use posix_madvise() instead of madvise() derived from BSD

sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' llama.cpp

* ggml : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD

sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml.c

* metal : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD

sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml-metal.m
2023-09-07 11:15:06 +03:00
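The commit above swaps BSD-derived calls (`madvise()`, `getpagesize()`) for their standard POSIX equivalents (`posix_madvise()`, `sysconf(_SC_PAGESIZE)`). As a minimal illustration of the same portable page-size query — not part of the patch itself — Python exposes `sysconf` through the `os` module:

```python
import os

# The commit above replaces the BSD-derived getpagesize() with the
# standard POSIX sysconf(_SC_PAGESIZE); the same portable query is
# available from Python's os module.
page_size = os.sysconf("SC_PAGE_SIZE")

# Page sizes are positive powers of two (e.g. 4096 on most x86-64
# Linux systems, 16384 on Apple Silicon).
assert page_size > 0 and page_size & (page_size - 1) == 0
```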
Georgi Gerganov
bf0b4c808d k-quants : fix zero-weight guard in Q6_K (ref #3040) 2023-09-06 12:40:57 +03:00
Kerfuffle
be4f496d09 convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 (#3023)
* convert-llama-ggmlv3-to-gguf: Try to handle files older than GGJTv3

* Better error messages for files that cannot be converted

* Add file type to GGUF output

* Rename to convert-llama-ggml-to-gguf.py

* Include original file type information in description

* Improve some informational output
2023-09-06 02:49:11 -06:00
Cebtenzzre
a76a94fe31 build : add LLAMA_METAL_NDEBUG flag (#3033) 2023-09-05 18:21:10 -04:00
Cebtenzzre
2ebfd0aa22 make : use new flag variables for recent changes (#3019) 2023-09-05 15:12:00 -04:00
Cebtenzzre
ed5a405c22 examples : replace fprintf to stdout with printf (#3017) 2023-09-05 15:10:27 -04:00
Erik Scholz
c9e735f4bf convert: fix convert.py not working with int filename_stem (#3028)
* fix implicit int to string conversion
* convert : remove an obsolete pyright comment

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-05 19:41:00 +02:00
Kawrakow
4f7048458f Guard against all weights in a super-block being zero (#3010)
* Guard against all weights in a super-block being zero

* Also guard against extremely small weights

Closes #2982 

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-05 09:55:33 +02:00
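The guard in the commit above prevents a division by a (near-)zero maximum weight when computing a block's quantization scale. A hedged scalar sketch of the idea — not the actual k-quants super-block code, which lives in `k_quants.c`:

```python
def block_scale(weights, eps=1e-8):
    # Per-block symmetric scale for 8-bit quantization, with the
    # guard the commit above adds: a block whose weights are all
    # zero (or vanishingly small) gets scale 0 instead of dividing
    # by ~0. Illustrative sketch only.
    amax = max(abs(w) for w in weights)
    if amax < eps:
        return 0.0
    return amax / 127.0  # map the block onto the int8 range

assert block_scale([0.0, 0.0, 0.0]) == 0.0
assert block_scale([1.27, -0.635]) == 1.27 / 127.0
```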
Georgi Gerganov
365578f31e llama : update logic for number of threads when using BLAS 2023-09-05 10:46:39 +03:00
Georgi Gerganov
9615d0c6b4 speculative : add grammar support (#2991)
* speculative : add grammar support

* grammars : add json_arr.gbnf

* grammar : add comments to new grammar file

* grammar : remove one nested level

* common : warm-up with 2 tokens - seems to work better

* speculative : print draft token pieces

* speculative : reuse grammar parser + better logs and comments

* speculative : avoid grammar_mem

* make : fix speculative build
2023-09-05 08:46:17 +03:00
Georgi Gerganov
5ce628ba1c py : minor 2023-09-04 22:50:50 +03:00
Georgi Gerganov
8e49675a7b build : on Mac OS enable Metal by default (#2901)
* build : on Mac OS enable Metal by default

* make : try to fix build on Linux

* make : move targets back to the top

* make : fix target clean

* llama : enable GPU inference by default with Metal

* llama : fix vocab_only logic when GPU is enabled

* common : better `n_gpu_layers` assignment

* readme : update Metal instructions

* make : fix merge conflict remnants

* gitignore : metal
2023-09-04 22:26:24 +03:00
slaren
8d85c7d12c ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) 2023-09-04 14:59:52 +02:00
Cebtenzzre
24d2622ba2 llama-bench : make cpp file non-executable (#2999) 2023-09-04 13:40:18 +03:00
Leng Yue
736e898675 make : add speculative example (#3003) 2023-09-04 13:39:57 +03:00
Aarni Koskela
d48f5c09df server : add a subtle loading animation to the edit box (#2466)
* editorconfig: add override for the server HTML (which already is 2-space indented)

* server: add a subtle loading animation to the edit box
2023-09-04 16:28:55 +08:00
Jiahao Li
b0cd5d83b3 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)
* 2x faster (rms) norm cuda kernels

* Fix code style
2023-09-04 08:53:30 +02:00
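For reference, the operation those CUDA kernels compute is RMS normalization: each element of a row is divided by the row's root mean square (no mean subtraction, unlike LayerNorm). A scalar sketch of the math, independent of the kernel optimizations themselves:

```python
import math

def rms_norm(xs, eps=1e-5):
    # Scalar reference for the RMS norm the CUDA kernels above
    # speed up: divide each element by the root mean square of
    # the row. eps guards against all-zero rows.
    mean_sq = sum(x * x for x in xs) / len(xs)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms for x in xs]

# After normalization the mean of squares is ~1.
out = rms_norm([3.0, 4.0])
assert abs(sum(x * x for x in out) / len(out) - 1.0) < 1e-4
```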
slaren
822ad0f739 ggml-alloc : use virtual memory for measurement (#2973)
* ggml-alloc : use virtual memory for measurement

* compatibility fixes for MAP_ANONYMOUS

* fallback to fixed address for systems without virtual memory
2023-09-03 20:34:09 +02:00
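The measurement pass in the commit above reserves address space instead of allocating real buffers, so pages are only committed if touched. A rough analogue (not the ggml-alloc code) is an anonymous memory mapping:

```python
import mmap

# An anonymous mapping reserves a range of address space whose
# pages are committed lazily by the OS -- a rough analogue of the
# virtual-memory measurement pass described above.
size = 1 << 20                 # 1 MiB of address space
buf = mmap.mmap(-1, size)      # fileno -1 -> anonymous mapping
assert len(buf) == size
buf.close()
```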
Georgi Gerganov
a88f9a8ca8 speculative : PoC for speeding-up inference via speculative sampling (#2926)
* speculative : initial example

* speculative : print encoding speed

* speculative : add --draft CLI arg
2023-09-03 15:12:08 +03:00
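The speculative-sampling example above pairs a cheap draft model with the target model: the draft proposes a run of tokens, and the target keeps the longest agreeing prefix. A toy greedy sketch of that loop, where `draft_next` and `target_next` are hypothetical callables (context → token), not names from the example:

```python
def speculative_step(draft_next, target_next, ctx, n_draft):
    # Toy greedy sketch of the draft/verify loop: the draft model
    # proposes n_draft tokens; the target model keeps the longest
    # agreeing prefix and replaces the first disagreement with its
    # own token.
    proposed, c = [], list(ctx)
    for _ in range(n_draft):
        tok = draft_next(c)
        proposed.append(tok)
        c.append(tok)
    accepted, c = [], list(ctx)
    for tok in proposed:
        want = target_next(c)
        if want != tok:
            accepted.append(want)  # target's correction ends the run
            break
        accepted.append(tok)
        c.append(tok)
    return accepted

# If draft and target agree, every drafted token is accepted.
same = lambda c: len(c)
assert speculative_step(same, same, [0], 3) == [1, 2, 3]
```

The speedup comes from the target model scoring the whole drafted run in one batch instead of one token at a time.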
Georgi Gerganov
9fe76e79e3 perplexity : fix ETA by warming up the model with an empty run 2023-09-03 13:43:17 +03:00
Kerfuffle
ae8e8ebe53 gguf(python): Fix special vocab handling when id < 0 (#2984) 2023-09-03 04:38:43 -06:00
Georgi Gerganov
811285a543 metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986) 2023-09-03 13:23:33 +03:00
Alon
928ab515d1 cov : disable comment in PRs (#2989) 2023-09-03 13:19:01 +03:00
opparco
7fe3095287 llama : fix bpe tokenize from byte (#2889) 2023-09-03 13:18:09 +03:00
Georgi Gerganov
89349ceb7b metal : revert 6af0bab until we fix it
This restores the generated text to be the same as before #2959
2023-09-03 12:40:56 +03:00
Alon
53afc99c41 cov : add Code Coverage and codecov.io integration (#2928)
* update .gitignore

* makefile: add coverage support (lcov, gcovr)

* add code-coverage workflow

* update code coverage workflow

* run on ubuntu 20.04

* use gcc-8

* check why the job hangs

* add env vars

* add LLAMA_CODE_COVERAGE=1 again

* - add CODECOV_TOKEN
- add missing make lcov-report

* install lcov

* update make file -pb flag

* remove unused GGML_NITER from workflows

* wrap coverage output files in COV_TARGETS
2023-09-03 11:48:49 +03:00
Wentai Zhang
b3912e82f1 opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 11:46:44 +03:00
Kawrakow
c937b6d718 metal : more optimizations (#2959)
* Very minor speedup via simd-group synchronization in f16 x f32

* Another very minor speedup on metal

* Quite significant PP speedup on metal

* Another attempt

* Minor

* Massive improvement for TG for fp16

* ~4-5% improvement for Q8_0 TG on metal

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-03 11:06:22 +03:00
kchro3
c2873512e6 swift : add support for k-quants (#2983) 2023-09-03 09:21:05 +03:00
Kerfuffle
ba445e659c convert.py : BPE fixes (#2938)
* convert.py: BPE fixes?

* Remove unnecessary conditional in addl token error handling
2023-09-03 08:52:13 +03:00
Ido S
a8b85ea614 docs : add catai to README.md (#2967) 2023-09-03 08:50:51 +03:00
momonga
b41680f397 examples : fix gpt-neox (#2943)
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
2023-09-03 08:36:28 +03:00
kchro3
f96a0722fa swift : add missing c file to Package.swift (#2978) 2023-09-03 08:27:25 +03:00
Cebtenzzre
af0127b31a make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886)
* make : remove unused -DGGML_BIG_ENDIAN

* make : put preprocessor stuff in CPPFLAGS

* make : pass Raspberry Pi arch flags to g++ as well

* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS

* make : fix inverted conditional
2023-09-03 08:26:59 +03:00
Kerfuffle
9f664f66a4 logging: Fix creating empty file even when disabled (#2966)
* logging: Fix creating empty file even when disabled

* Minor formatting fix

Co-authored-by: staviq <staviq@gmail.com>

---------

Co-authored-by: staviq <staviq@gmail.com>
2023-09-02 11:53:55 -06:00
bandoti
626da973c4 readme : update clblast instructions (#2903)
* Update Windows CLBlast instructions

* Update Windows CLBlast instructions

* Remove trailing whitespace
2023-09-02 15:53:18 +03:00
Karsten Weiss
837df7e8d2 metal : show all Metal device instances in the system (#2952)
* ggml_metal_init: Show all Metal device instances in the system

Also show the default Metal device that was picked.

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-02 15:29:09 +03:00
Jhen-Jie Hong
65702a083e k-quants : fix build on armv7 (android only) (#2920)
* k-quants : fix build on armv7

* ggml : cleanup unused arm32 specific impl

* k-quants : avoid some unused vzero / mzero define

* ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
2023-09-02 15:23:45 +03:00
Jhen-Jie Hong
beed103e3d server : avoid antiprompt in probabilities of final response (#2849) 2023-09-02 08:31:46 +08:00
Engininja2
26205b02a4 cuda : vsubss4 for older versions of ROCm/clang (#2942) 2023-09-01 23:33:19 +02:00
ZHAOKAI WANG
0e58794306 readme : quick start command fix (#2908)
* quick start command fix

* quick start win command fix
2023-09-01 17:06:44 +03:00
Kerfuffle
2ac8bf40d0 Allow quantize to only copy tensors, some other improvements (#2931)
* Allow quantize tool to only copy tensors to allow repackaging models.

* Slightly better logic when requantizing.

* Change help message to go to `stdout`.
2023-09-01 08:02:48 -06:00
Georgi Gerganov
7166fdca0f llama2c : rename function 2023-09-01 17:01:11 +03:00
Cebtenzzre
6fd8d848b3 make : use unaligned vector moves on MinGW (#2945)
Fixes #2922
2023-09-01 16:53:14 +03:00
m3ndax
03c5668102 minor : add const qualifiers (#2853)
* made the methods const

# Conflicts:
#	examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp

* made method const

* Update convert-llama2c-to-ggml.cpp

removed write_raw and write_u32

* llama2c : remove misleading const

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-01 16:47:27 +03:00
Konstantin Herud
80569510d8 docs : add java-llama.cpp to README.md (#2935) 2023-09-01 16:36:14 +03:00
Cebtenzzre
a512948d93 build : fix most gcc and clang warnings (#2861)
* fix most gcc and clang warnings

* baby-llama : remove commented opt_params_adam

* fix some MinGW warnings

* fix more MinGW warnings
2023-09-01 16:34:50 +03:00
Ben Siraphob
5b374b4cfe examples : add C grammar (#2357) 2023-09-01 16:32:14 +03:00
Tameem
fcaf9592ce ggml : add RISC-V vector intrinsics support (#2929)
* added support for RISC-V CFLAGS & native compile + cross-compile options

* Add RISC-V Vector Intrinsics Support

Added RVV intrinsics for the following:
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1
   ggml_vec_dot_q8_0_q8_0

Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

---------

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
2023-09-01 16:27:40 +03:00
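The RVV intrinsics above vectorize the quantized dot-product loops. A scalar sketch of the kind of computation involved, using the Q8_0 case for simplicity — illustrative only, with toy block sizes (the real Q8_0 blocks hold 32 int8 quants and an fp16 scale):

```python
def vec_dot_q8_0_ref(blocks_a, blocks_b):
    # Scalar reference for the loop shape the RVV intrinsics
    # vectorize (cf. ggml_vec_dot_q8_0_q8_0): each block pairs a
    # float scale with int8 quants, and the dot product is the
    # scale product times the integer dot, summed over blocks.
    total = 0.0
    for (da, qa), (db, qb) in zip(blocks_a, blocks_b):
        total += da * db * sum(x * y for x, y in zip(qa, qb))
    return total

# 0.5 * 2.0 * (1*3 + 2*4) = 11.0
assert vec_dot_q8_0_ref([(0.5, [1, 2])], [(2.0, [3, 4])]) == 11.0
```

The vector extension lets the inner integer dot run over whole lanes of quants at once instead of element by element.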