ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-11 16:40:16 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	1b22fde79c	minor : fix trailing whitespace (#5638)	2024-02-22 13:54:03 +02:00
Georgi Gerganov	2366d08782	readme : update hot topics	2024-02-22 10:35:54 +02:00
Xuan Son Nguyen	967b99606a	server : fallback to chatml, add AlphaMonarch chat template (#5628) * server: fallback to chatml * add new chat template * server: add AlphaMonarch to test chat template * server: only check model template if there is no custom tmpl * remove TODO	2024-02-22 10:33:24 +02:00
Alexey Parfenov	5626d4898f	server : clarify some params in the docs (#5640)	2024-02-22 10:27:32 +02:00
Dat Quoc Nguyen	e204f6452d	mpt : add optional bias tensors (#5638) Update for MPT with optional bias parameters: to work with PhoGPT and SEA-LION models that were pre-trained with 'bias'.	2024-02-22 10:15:13 +02:00
slaren	1144411469	llama : fix loading models with shared tok_embd and output (#5651) ggml-ci	2024-02-22 00:42:09 +01:00
Xuan Son Nguyen	c46eadce5a	Add docs for llama_chat_apply_template (#5645) * add docs for llama_chat_apply_template * fix typo	2024-02-22 00:31:00 +01:00
slaren	f06a6c9879	llama : fix session save/load with quantized KV (#5649)	2024-02-21 22:52:39 +01:00
slaren	7bef081a42	gemma : allow offloading the output tensor (#5646)	2024-02-21 22:18:23 +01:00
Jared Van Bortel	8cdd01f57b	examples : do not assume BOS when shifting context (#5622)	2024-02-21 10:33:54 -05:00
Georgi Gerganov	f85c73f1d0	sync : ggml	2024-02-21 16:52:52 +02:00
Pierrick Hymbert	bceef5565a	server: health: fix race condition on slots data using tasks queue (#5634) * server: health: fix race condition on slots data using tasks queue * server: health: * include_slots only if slots_endpoint * fix compile warning task.target_id not initialized.	2024-02-21 15:47:48 +01:00
Ettore Di Giacinto	d8426bb13e	readme : add LocalAI to the availables UI (#5629)	2024-02-21 16:39:10 +02:00
Georgi Gerganov	0b5d2708a7	sync : ggml (#5633) * ggml : fix conv_2d batch mode (ggml/737) Co-authored-by: bssrdf <bssrdf@gmail.com> * ggml : compute forward no longer pass src tensors (ggml/729) * sync : ggml ggml-ci --------- Co-authored-by: bssrdf <merlintiger@hotmail.com> Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-02-21 16:17:10 +02:00
Georgi Gerganov	76044db92b	readme : update hot topics	2024-02-21 15:39:54 +02:00
Daniel Bevenius	d5454927e0	llava : add --skip-unknown to 1.6 convert.py (#5632) This commit adds the `--skip-unknown` option to the convert.py script and removes the saving of the updated checkpoints to avoid updating possibly checked out files. The motivation for this change is that this was done for 1.5 in Commit `fc0c8d286a` ("llava : update surgery script to not remove tensors") and makes the examples more consistent. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-21 15:36:57 +02:00
postmasters	9a92a78e47	llama : add `gemma` model (#5631) There are couple things in this architecture: 1. Shared input and output embedding parameters. 2. Key length and value length are not derived from `n_embd`. More information about the models can be found at https://ai.google.dev/gemma. GGUFs can be downloaded from https://huggingface.co/google.	2024-02-21 15:08:22 +02:00
Meng, Hengyu	2b9427c35d	[SYCL] conext add name (#5624) * [SYCL] conext add name * name should start with SYCL*	2024-02-21 17:52:06 +08:00
Kawrakow	0d012cc5d3	IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) * iq4_nl: squash commits for easier rebase * Basics (quantize, dequantize) * CUDA dequantize and dot product * Slightly faster CUDA dot product (120 t/s) * Switch to 6-bit scales * Scalar dot product * AVX2 dot product * ARM_NEON dot product * Works on metal, but still slow * Slightly better Metal dot product * Another small Metal improvement * Metal dot product is getting there * Faster CUDA dot product * Add 1/8 ffn_down layers as Q5_K when no imatrix has been provided * Report the actual bpw * Add _xs mix that is 4.05 bpw for non-MoE models * Remove IQ4_XS for now, slightly adjust kvalues_iq4nl * AVX2 dot product uses Q8_0 instead of Q8_K * Add to test-backend-ops * Minor fix * Also use use Q5_K for attn_output in MoE models * Fixes after merging latest master * Switching to blocks of 32 * AVX2 for blocks of 32 * Scaler dot product for blocks of 32 * ARM_NEON dot product for blocks of 32 * Metal kernels for blocks of 32 * Slightly faster Metal kernels * iq4_nl: Fix after merging with master * iq4_nl: another fix after merging with master * Use IQ4_NL instead of Q4_K when using k-quants is not possible * Fix typo that makes several tests fail * It was the ggml_vdotq thing missed inside the brackets --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-02-21 11:39:52 +02:00
CJ Pais	8f2443f334	server : support llava 1.6 (#5553) * server: init working 1.6 * move clip_image to header * remove commented code * remove c++ style from header * remove todo * expose llava_image_embed_make_with_clip_img * fix zig build	2024-02-20 21:07:22 +02:00
slaren	0c25ab0c1e	make : fix debug build with CUDA (#5616)	2024-02-20 20:06:17 +01:00
Daniel Bevenius	09232a43bd	llava : add explicit instructions for llava-1.6 (#5611) This commit contains a suggestion for the README.md in the llava example. The suggestion adds explicit instructions for how to convert a llava-1.6 model and run it using llava-cli. The motivation for this is that having explicit instructions similar to the 1.5 instructions will make it easier for users to try this out. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-20 19:30:27 +02:00
Xuan Son Nguyen	8410e862e7	Server: use llama_chat_apply_template (#5593) * server: use llama_chat_apply_template * server: remove trailing space * server: fix format_chat * server: fix help message Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: fix formatted_chat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-20 15:58:27 +01:00
Dane Madsen	a65baf5728	readme : update UI list (#5605) * Add maid to ui list * Specify licence	2024-02-20 12:00:23 +02:00
Haoxiang Fei	9a7e846764	metal : add build system support for embedded metal library (#5604) * add build support for embedded metal library * Update Makefile --------- Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-20 11:58:36 +02:00
Pierrick Hymbert	9c2e3206ae	server : health endpoint configurable failure on no slot (#5594)	2024-02-20 09:48:19 +02:00
AidanBeltonS	da66df57c2	Update ggml_sycl_op_mul_mat_vec_q (#5502) * Update ggml_sycl_op_mul_mat_vec_q * Apply suggestions from code review Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * revert suggestion on macro * fix bug * Add quant type GGML_TYPE_IQ1_S to unsupported * fix format --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-02-20 12:31:25 +05:30
Mathijs de Bruin	a0a954047e	nix: now that we can do so, allow MacOS to build Vulkan binaries Author: Philip Taron <philip.taron@gmail.com> Date: Tue Feb 13 20:28:02 2024 +0000	2024-02-19 14:49:49 -08:00
0cc4m	4e4b0af11b	Enable Vulkan MacOS CI	2024-02-19 14:49:49 -08:00
0cc4m	b3a7327e50	Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init()	2024-02-19 14:49:49 -08:00
0cc4m	54a487239e	Add check for VK_KHR_portability_enumeration for MoltenVK support	2024-02-19 14:49:49 -08:00
Mathijs de Bruin	bdea454f1b	Add preprocessor checks for Apple devices. Based on work by @rbourgeat in https://github.com/ggerganov/llama.cpp/pull/5322/files	2024-02-19 14:49:49 -08:00
Mathijs de Bruin	e53d9228cb	Resolve ErrorIncompatibleDriver with Vulkan on MacOS. Refs: - https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f - https://github.com/SaschaWillems/Vulkan/issues/954 - https://github.com/haasn/libplacebo/issues/128 - https://github.com/KhronosGroup/Vulkan-Samples/issues/476	2024-02-19 14:49:49 -08:00
Mathijs de Bruin	c63a50d87f	Allow for Vulkan build with Accelerate. Closes #5304	2024-02-19 14:49:49 -08:00
slaren	c547a6ab37	cuda : ignore peer access already enabled errors (#5597) * cuda : ignore peer access already enabled errors * fix hip	2024-02-19 23:40:26 +01:00
Jared Van Bortel	1f11070a5d	make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598)	2024-02-19 15:54:12 -05:00
nopperl	ef74d0a568	examples : support minItems/maxItems in JSON grammar converter (#5039) * support minLength and maxLength in JSON schema grammar converter * Update examples/json-schema-to-grammar.py --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-19 16:14:07 +02:00
Georgi Gerganov	ac2ef9177f	llava : remove extra cont (#5587)	2024-02-19 15:23:17 +02:00
slaren	9b03978212	llava : replace ggml_cpy with ggml_cont	2024-02-19 15:09:43 +02:00
Georgi Gerganov	3a1f76bfc6	sync : ggml ggml-ci	2024-02-19 15:09:43 +02:00
Georgi Gerganov	e9f1033234	ggml-alloc : apply ggml/731	2024-02-19 15:09:43 +02:00
Didzis Gosko	588bd6dfb3	metal : option to embed MSL source into compiled binary (whisper/1842) * ggml : embed Metal library source (ggml-metal.metal) into binary enable by setting WHISPER_EMBED_METAL_LIBRARY * rename the build option * rename the preprocessor directive * generate Metal library embedding assembly on-fly during build process	2024-02-19 15:09:43 +02:00
Georgi Gerganov	142d5804a2	ci : enable -Werror for CUDA builds (#5579) * cmake : pass -Werror through -Xcompiler ggml-ci * make, cmake : enable CUDA errors on warnings ggml-ci	2024-02-19 14:45:41 +02:00
Georgi Gerganov	1d3666a30a	make : fix CUDA build (#5580)	2024-02-19 13:41:51 +02:00
valiray	1fb6cbbfb1	readme : fix typo in README-sycl.md (#5353)	2024-02-19 12:37:10 +02:00
Abhilash Majumder	d6d5780ad5	cmake : remove obsolete sycl compile flags (#5581) * rm unwanted sycl compile options * fix bug * fix bug * format fix	2024-02-19 11:15:18 +02:00
Georgi Gerganov	a71d4093c0	minor : fix trailing whitespace (#5538)	2024-02-19 10:34:10 +02:00
Daniel Bevenius	5f8a7627bf	llava : avoid changing the original BakLLaVA model (#5577) This is a follup of Commit `fc0c8d286a` ("llava : update surgery script to not remove tensors") but this time the change is to the BakLLaVA specific part of the surgery script. I've been able to test this using SkunkworksAI/BakLLaVA-1 and it works as expected using the instructions in README.md. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-19 10:31:59 +02:00
NawafAlansari	df67f5441a	baby-llama : allocate graphs in ggml_context (#5573) * Fixed the baby-llama issue (see issue #4830) * minor : fix whitespaces --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-19 10:25:38 +02:00
Xuan Son Nguyen	d62719c0c7	llama : add llama_chat_apply_template() (#5538) * llama: add llama_chat_apply_template * test-chat-template: remove dedundant vector * chat_template: do not use std::string for buffer * add clarification for llama_chat_apply_template * llama_chat_apply_template: add zephyr template * llama_chat_apply_template: correct docs * llama_chat_apply_template: use term "chat" everywhere * llama_chat_apply_template: change variable name to "tmpl"	2024-02-19 10:23:37 +02:00

2639 Commits