ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-02 20:48:03 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	faa13abb73	editorconfig : remove trailing spaces	2023-10-17 19:52:53 +03:00
coezbek	57fb1fe438	server : documentation of JSON return value of /completion endpoint (#3632 ) * Added documentation of JSON return value of /completion endpoint * Update examples/server/README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 19:51:02 +03:00
Georgi Gerganov	3589fdc88d	save-load-state : fix example + add ci test (#3655 ) * save-load-state : fix example (close #3606) * ci : add test for save-load-state example ggml-ci	2023-10-17 19:12:46 +03:00
ldwang	e49cde7ded	readme : add Aquila2 links (#3610 ) Signed-off-by: ldwang <ftgreat@gmail.com> Co-authored-by: ldwang <ftgreat@gmail.com>	2023-10-17 18:52:33 +03:00
staviq	71936d5fbe	tokenizer : special token handling (#3538 ) * Rewrite special token handling from #1931 * shorten param name, add st verification by type * use offsets instead of copy by substr * formatting, remove copying iterator on delete * llama : normalize code-style * swift fix * print pfx/sfx if verb, main: split pfx input sfx * dont add space when using special tokens * minor : comment + spacing --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 18:11:01 +03:00
Georgi Gerganov	37219c789f	k-quants : fix quantization ranges (#3646 )	2023-10-17 09:19:28 +03:00
Georgi Gerganov	1f9f23ea4a	llava : fix tokenization to not add bos between image embeddings and user prompt (#3645 ) * llava : fix tokenization to not add bos after system prompt * set seed --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>	2023-10-16 23:58:00 +03:00
cebtenzzre	2d9d8e7de2	MPT : support GQA for replit-code-v1.5 (#3627 )	2023-10-15 09:32:06 +03:00
M. Yusuf Sarıgöz	dd5a356c81	Honor -ngl option for Cuda offloading in llava (#3621 )	2023-10-14 04:52:44 -06:00
Daniel Bevenius	bfde2a4566	llama : remove n_threads from llama_decode_internal (#3614 ) This commit removes `n_threads` from the `llama_decode_internal` functions doc comment as it does not exist anymore. It looks like this parameter was removed in Commit `16bc66d947` ("llama.cpp : split llama_context_params into model and context params"). Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-10-13 13:33:16 +03:00
slaren	729c8f78ec	ggml : add context enumeration functions (#3605 ) finetune : fix assert failure in ggml-alloc	2023-10-13 12:23:10 +02:00
shibe2	8f78e4d46e	CLBlast: Fix matrix-vector multiplication (#3544 )	2023-10-12 21:59:47 +02:00
M. Yusuf Sarıgöz	d406725539	examples: support LLaVA v1.5 (multimodal model) (#3436 ) * WIP: start implementing LLaVA * rm scratch buf for now, will revert after cleanup * LLaVA image encoder is working. will combine with llama * Add llava inference code, but it's buggy. debugging * LLaVA is working e2e, needs to optimize memory allocation + cleanup * Use ggml_allocr + rm unnecessary code * fix: crlf -> lf * fix: new line at EoF * fix: trailing whitespace * Add readme * Update readme * Some cleanup * Are you happy editorconfig? * rm unused batch image preprocessing * rm unused import * fix: rm designated initializers * introduce pad-to-square mode for non-square images * are you happy editorconfig? * gitignore /llava * Handle cases where image file does not exist * add llava target to Makefile * add support for 13b model variant * Maybe seed is unlucky? * Check if apples are compared to apples * are you happy editorconfig? * Use temperature = 0.1 by default * command line: use gpt_params_parse() * minor * handle default n_predict * fix typo * llava : code formatting, rename files, fix compile warnings * do not use Wno-cast-qual for MSVC --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-12 18:23:18 +03:00
uint256_t	dc9c5a37a3	docs : fix typo GOMP_CPU_AFFINITY (#3597 )	2023-10-12 16:36:16 +03:00
Georgi Gerganov	7bf0bf6231	cmake : fix add_compile_options on macOS	2023-10-12 14:31:05 +03:00
Ian Scrivener	3ee11e89e1	typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592 ) fixed a typo in the MacOS Metal run doco	2023-10-12 14:10:50 +03:00
Georgi Gerganov	907b41b661	ci : check if there is enough VRAM (#3596 ) ggml-ci	2023-10-12 13:44:56 +03:00
Aarni Koskela	e47676d9e3	server : add completion mode (no chat) (#3582 )	2023-10-12 09:51:53 +03:00
Georgi Gerganov	758d0ddfca	prompts : add mnemonics.txt	2023-10-12 09:35:30 +03:00
Georgi Gerganov	a247c7af1c	server : fix kv cache management (#3588 )	2023-10-12 09:29:04 +03:00
Georgi Gerganov	47ae6b2fa3	main : fix session loading bug (#3400 )	2023-10-11 23:55:41 +03:00
Michael Coppola	132406fe03	server : add parameter -tb N, --threads-batch N (#3584 ) Co-authored-by: Michael Coppola <info@michaeljcoppola.com>	2023-10-11 22:42:22 +03:00
Kerfuffle	ecd831a6b8	common : fix mirostat state when using multiple sequences (#3543 ) * Fix mirostat state when using multiple sequences * Fix mirostat by completely refactoring sampling! * Try to fix zig build. * Export function to fetch/create default sampler states Code formatting cleanups and add some comments Silence a warning about id not being used when logging is disabled * Apply some renaming suggestions. Fix comments that were out of sync with the pull. * Use more consistant naming convention for sampling contexts	2023-10-11 22:35:46 +03:00
Georgi Gerganov	f11fd81fbd	batched : add bench tool (#3545 ) * batched : add bench tool * batched : minor fix table * batched-bench : add readme + n_kv_max is now configurable * batched-bench : init warm-up batch * batched-bench : pass custom set of PP, TG and PL * batched-bench : add mmq CLI arg	2023-10-11 21:25:33 +03:00
Zane Shannon	dcdafa74c6	examples : add batched.swift + improve CI for swift (#3562 )	2023-10-11 06:14:05 -05:00
Galunid	a637869df6	Add MPT model to supported models in README.md (#3574 )	2023-10-10 19:02:49 -04:00
goerch	4e6e75e98e	Minor improvements in GPT2 tokenizer (#3567 ) * Fixing minor bugs in bpe_gpt2_preprocess * Don't add bos token in test	2023-10-10 18:59:52 +02:00
Xingchen Song(宋星辰)	8994c485e9	readme : add bloom (#3570 )	2023-10-10 19:28:50 +03:00
Xingchen Song(宋星辰)	5f0a4ad1c2	llm : add bloom models (#3553 ) * feat: Support bloom models * fix(bloom): fix model size --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-10 17:48:21 +03:00
Jhen-Jie Hong	d49c1c5b2d	swift : improvements and fixes (#3564 ) * swift : use macOS 12 as minimum requirement * swift : add missing ggml-backend.c source * swift : add -O3 -DNDEBUG unsafe flags	2023-10-10 14:31:13 +03:00
Jan Ploski	fe2f22f1e0	llm : add MPT support (#3417 ) * CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545) * mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt * mpt : protect against "clip_qkv": null in mpt-7b * mpt : quick fix to avoid "Strange model" warning when quantizing MPT models * mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?) * mpt : standardized all tensor names to follow GGUF spec * mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code * mpt : fixed comment s/gptneox/mpt/ * mpt : remove tabs, trailing whitespace * mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt * mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252 * comment out n_past instead of marking it unused * mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"] * mpt : remove unused tokenizer_json in convert script * ggml : remove obsolete n_past assert in ggml_alibi * llama : print clam_kqv and max_alibi_bias hparams --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-10 10:50:23 +03:00
vvhg1	b1144203e3	infill. : fix tokenization (#3508 ) * infill tokens correction * serverinfill tokens correction * removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape * removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape * only rm when params.escape, rm space if possible which is added back or rm added space token * only rm when params.escape, rm space if possible which is added back or rm added space token * Revert "only rm when params.escape, rm space if possible which is added back or rm added space token" This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738. * fix interactive prompt escaping and fix server infill leading space handling * rm unnecessary bool check	2023-10-10 10:31:21 +03:00
slaren	ff8ee10bfa	ggml-alloc : fix assert in debug builds (#3555 )	2023-10-09 15:44:58 +03:00
Georgi Gerganov	2743064b15	refact : fix convert script + zero out KV cache to avoid nans (#3523 ) * refact : fix convert script + zero out KV cache to avoid nans * ggml : silu(-inf) should never happen * metal : assert various kernel requirements	2023-10-09 14:32:17 +03:00
Georgi Gerganov	6ac45c3397	metal : do not use mul_mm kernels when ne00 < 64 (#3542 )	2023-10-09 14:28:27 +03:00
Georgi Gerganov	78b3d9b796	sync : ggml (ggml-backend) (#3548 ) * sync : ggml (ggml-backend) ggml-ci * zig : add ggml-backend to the build	2023-10-08 20:19:14 +03:00
Matheus C. França	6a0de063d6	ci : add Zig CI/CD and fix build (#2996 ) * zig CI/CD and fix build Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com> * fix build_compiler * ci : remove trailing whitespace --------- Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-08 16:59:20 +03:00
Ryder Wishart	61bd777112	api_like_OAI.py : compat with Microsoft Guidance (#2746 ) Check for None in addition to empty string check in all request params Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-08 13:55:58 +03:00
arcrank	4e58bb2f8d	api_like_OAI.py : simplify function (#2796 ) Simplify function	2023-10-08 13:52:57 +03:00
Johannes Rudolph	34388801da	k-quants : fix comments about block sizing (#3499 )	2023-10-08 13:21:19 +03:00
Georgi Gerganov	cd058b6357	ci : enable on obj-c changes + fix metal build (#3540 )	2023-10-08 11:24:50 +03:00
Luo Tian	f518e5c7e3	zig : fix build by introducing train.cpp (#3539 )	2023-10-08 11:24:01 +03:00
Georgi Gerganov	d596ae762a	metal : support MTLGPUFamily < Apple7, formatting, style (#3524 ) * metal : improve decoding speed for batches of 2-16 * metal : rename kernels mul_mat_ to mul_mv_ * metal : indentations * minor * metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7	2023-10-08 10:01:53 +03:00
Kerfuffle	418c7c4e56	llama : fix missing break in Persimmon arch case statements (#3535 )	2023-10-08 08:22:17 +03:00
Kerfuffle	7b49ee2537	Fix trying to strip newline from empty prompt and cfg prompt file content (#3534 )	2023-10-07 15:31:41 -06:00
M. Yusuf Sarıgöz	fb1d64727e	gguf.py : fix CI for publishing GGUF package (#3532 ) * Fix CI for publishing GGUF package * Bump version * fix * bump version * bump version * bump version	2023-10-07 22:14:10 +03:00
Tom C	a802debeb6	py : change version of numpy requirement to 1.24.4 (#3515 ) Co-authored-by: Lyjia <me@lyjia.us>	2023-10-07 12:56:15 +03:00
cebtenzzre	80735cb7bd	quantize : fail fast on write errors (#3521 )	2023-10-07 11:41:52 +03:00
Jhen-Jie Hong	bff47ce69b	metal : support default.metallib load & reuse code for swift package (#3522 ) * metal : support load default.metallib & reuse code for swift package * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT	2023-10-07 11:40:27 +03:00
Phillip Kravtsov	afaaf1849d	llm : support Adept Persimmon 8B (#3410 ) * Produces garbage output * wip: correct tensors up to RoPE * correct tensors thru RoPE * Correct outputs through masked & softmax'd KQ * fp32 works * Rename adept->persimmon * Produces correct outputs * clean up convert scripts * remove printing logic from ggml.c * remove prints from llama.cpp & fix merge * trivial cleanups * Add offload funcs * update conversion script to directly take adept artifacts rather than .saftensors file * Fix norm eps bug * Support sqr and concat on metal, persimmon-8b-q4 runs correctly * Small changes from review * Formatting changes * Minor changes to conversion script * Remove old script * Fix editorconfig formatting * Fix build * add overlooked offload code ggml-ci	2023-10-07 10:12:43 +03:00

2639 Commits