ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-02 04:29:53 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	7ddf5857e7	main : add self-extend support (#4815) * examples : add passkey test * passkey : better prints * passkey : select pass key pos from CLI * passkey : simplify n_past logic * llama : "self-extend"-like context extension * passkey : add comment * main : add Self-Extend support * llama : add comment about llama_kv_cache_seq_div	2024-01-08 11:18:32 +02:00
Georgi Gerganov	a386b0dd63	examples : add passkey test (#3856) * examples : add passkey test * passkey : better prints * passkey : select pass key pos from CLI * passkey : simplify n_past logic * make : add passkey target * passkey : add "self-extend"-like context extension (#4810) * llama : "self-extend"-like context extension * passkey : add comment * passkey : add readme	2024-01-08 11:14:04 +02:00
Lars Grammel	9e96d6076a	readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814)	2024-01-07 22:24:11 +02:00
slaren	d513cfc4b5	llama-bench : add no-kv-offload parameter (#4812)	2024-01-07 17:59:01 +01:00
Johannes Gäßler	770ec541f9	CUDA: fixed redundant value dequantization (#4809)	2024-01-07 17:24:08 +01:00
Georgi Gerganov	ec08b3e86f	llama : remove unused vars (#4796)	2024-01-07 14:29:36 +02:00
Georgi Gerganov	3a96073b59	llama : remove redundant GQA check (#4796)	2024-01-07 11:21:53 +02:00
Alex Azarov	30df691a96	llama.swiftui : use llama.cpp as SPM package (#4804)	2024-01-07 10:20:50 +02:00
Georgi Gerganov	52b664aece	llama : print tensor meta for debugging	2024-01-07 09:51:12 +02:00
Alex Azarov	8c36aaf5a8	llama.swiftui : add visionOS target (#4805)	2024-01-07 09:46:55 +02:00
Konstantin Zhuravlyov	5391345fcc	ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787)	2024-01-07 08:52:42 +02:00
Georgi Gerganov	003f85d7ea	server : fix n_predict check (#4798)	2024-01-07 08:45:26 +02:00
Daniel Illescas Romero	34d18eff4c	llama.swiftui : use correct pointer for llama_token_eos (#4797)	2024-01-06 17:12:59 +02:00
Georgi Gerganov	33c9d849fd	examples : improve base-translate.sh script (#4783)	2024-01-06 11:40:24 +02:00
a-n-n-a-l-e-e	b52357162d	cmake : check for openblas64 (#4134) openblas v0.3.22 64-bit pkg-config file is named openblas64.pc https://github.com/OpenMathLib/OpenBLAS/issues/3790	2024-01-05 18:04:40 +02:00
Ikko Eltociear Ashimine	f4ee045ad0	flake.nix : fix typo (#4700) betwen -> between	2024-01-05 18:02:44 +02:00
Georgi Gerganov	7e27e37f26	metal : switch back to default.metallib (ggml/681) ggml-ci	2024-01-05 18:02:06 +02:00
Georgi Gerganov	d6ec7cfc70	ggml : fix q2_k bpw in comments (ggml/680)	2024-01-05 18:02:06 +02:00
Finn Voorhees	0630261a48	ggml : add error handling to graph_compute (whisper/1714)	2024-01-05 18:02:06 +02:00
Georgi Gerganov	5ffddb870b	ggml : do not sched_yield when calling BLAS (#4761) * ggml : do not sched_yield when calling BLAS ggml-ci * ggml : fix do_yield logic ggml-ci * ggml : simplify do_yield logic ggml-ci	2024-01-05 15:18:21 +02:00
Georgi Gerganov	41ced5ce3c	examples : add few-shot translation example (#4783)	2024-01-05 15:11:10 +02:00
Daniel Bevenius	0c4cb7138c	finetune : remove unused includes (#4756) This commit removes unused includes from finetune.cpp. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-04 21:45:37 +02:00
Georgi Gerganov	82e82f484d	server : send token probs for "stream == false" (#4714)	2024-01-04 19:56:33 +02:00
Johannes Gäßler	b0a9bb90f9	Print backend name on test-backend-ops failure (#4751)	2024-01-04 09:43:23 +01:00
singularity	2d08e99f47	llama.swiftui : support loading custom model from file picker (#4767) * swiftui: support load model from file picker * swiftui: remove trailing whitespace	2024-01-04 10:22:38 +02:00
Michael Coppola	85648efa9e	server : fix options in README.md (#4765) * fix examples/server/README.md * minor : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-04 10:17:09 +02:00
Georgi Gerganov	7967c42ffb	ggml : include stdlib.h before intrin.h (#4736)	2024-01-04 10:12:26 +02:00
singularity	c399a87c6b	llama.swiftui : fix build of ggml.metallib (#4754) * metal: fix metal backend init failure in swiftui * metal: build ggml.metallib instead of copy src * llama.swift : remove debug flags from metallib build --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-04 09:58:16 +02:00
Daniel Bevenius	41a287de3c	train : fix typo in overlapping-samples help msg (#4758) This commit fixes a typo in the help message for the --overlapping-samples option. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-03 19:53:40 +02:00
Ashraful Islam	59092ff962	swift : update Package.swift to use ggml as dependency (#4691) * updates the package.swift to use ggml as dependency * changes the ggml package url src to ggerganov	2024-01-03 19:30:02 +02:00
Georgi Gerganov	f2001ff46d	cuda : simplify expression Co-authored-by: slaren <slarengh@gmail.com>	2024-01-03 14:38:38 +02:00
Georgi Gerganov	09d890cb54	cuda : mark I16 and I32 ops as unsupported ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	4ebea0bdce	sync : ggml ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	514561978d	metal : add kernel_get_rows_i32 ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	74b4d9c1ed	scripts : fix sync order + metal sed	2024-01-03 14:38:38 +02:00
Guillaume Wenzek	b2cfdd2ea3	ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) * add more int ops * ggml_compute_forward_dup_bytes * add tests * PR comments * tests : minor indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-03 14:38:38 +02:00
Justin Parker	5b56760f5c	server : throw an error when `slot unavailable` (#4741)	2024-01-03 10:43:19 +02:00
Georgi Gerganov	dc7752d269	metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725) * ggml : disable fast-math for Metal (cmake build only) ggml-ci * metal : fix Metal API debug warnings * cmake : add -fno-inline for Metal build (#4545) * metal : fix API debug warnings * metal : fix compile warnings * metal : use uint64_t for strides * cmake : rename option to LLAMA_METAL_SHADER_DEBUG * metal : fix mat-vec Q8_0 kernel for BS > 1 * metal : normalize mat-vec kernel signatures * cmake : respect LLAMA_QKK_64 option * metal : fix mat-vec Q4_K kernel for QK_K == 64 * metal : optimizing ggml_mul_mat_id (wip) * metal : minor fix * metal : opt mul_mm_id	2024-01-02 21:07:47 +02:00
Phil H	421b0da133	server : add token counts to html footer (#4738) * server: add token counts to stats * server: generate hpp --------- Co-authored-by: phiharri <ph@got-root.co.uk>	2024-01-02 17:48:49 +02:00
Georgi Gerganov	af83cacf1e	llama : llama_model_desc print number of experts	2024-01-02 16:26:45 +02:00
Marcus Dunn	7ea2965198	llama : replace all API facing `int`'s with `int32_t` (#4577) * replaced all API facing `int`'s with `int32_t` * formatting and missed `int` in `llama_token_to_piece`	2024-01-02 16:15:16 +02:00
postmasters	1081e7c69c	llama : differentiate the KV dims in the attention (#4657) * Add n_key_dim and n_value_dim Some models use values that are not derived from `n_embd`. Also remove `n_embd_head` and `n_embd_gqa` because it is not clear which "head" is referred to (key or value). Fix issue #4648. * Fix `llm_build_kqv` to use `n_value_gqa` * Rebase * Rename variables * Fix llm_build_kqv to be more generic wrt n_embd_head_k * Update default values for n_embd_head_k and n_embd_head_v Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix llm_load_tensors: the asserts were not backcompat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-02 13:51:28 +02:00
Georgi Gerganov	8243feab46	editorconfig : fix whitespace and indentation #4710	2024-01-02 13:28:15 +02:00
minarchist	37b6fbf892	server : add --override-kv parameter (#4710) * Changes to server to allow metadata override * documentation * flake.nix: expose full scope in legacyPackages * flake.nix: rocm not yet supported on aarch64, so hide the output * flake.nix: expose checks * workflows: nix-ci: init; build flake outputs * workflows: nix-ci: add a job for eval * workflows: weekly `nix flake update` * workflows: nix-flakestry: drop tag filters ...and add a job for flakehub.com * workflows: nix-ci: add a qemu job for jetsons * flake.nix: suggest the binary caches * flake.lock: update to a commit recently cached by nixpkgs-cuda-ci --------- Co-authored-by: John <john@jLap.lan> Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>	2024-01-02 12:38:15 +02:00
Nam D. Tran	b8646c035d	py : re-enable mmap in convert hf (#4732) * update: awq support llama-7b model * update: change order * update: benchmark results for llama2-7b * update: mistral 7b v1 benchmark * update: support 4 models * fix: Readme * update: ready for PR * update: readme * fix: readme * update: change order import * black * format code * update: work for bot mpt and awqmpt * update: readme * Rename to llm_build_ffn_mpt_awq * Formatted other files * Fixed params count * fix: remove code * update: more detail for mpt * fix: readme * fix: readme * update: change folder architecture * fix: common.cpp * fix: readme * fix: remove ggml_repeat * update: cicd * update: cicd * uppdate: remove use_awq arg * update: readme * llama : adapt plamo to new ffn ggml-ci * fix: update torch version --------- Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io> Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-02 11:23:38 +02:00
Daniel Bevenius	ffcf2ca432	finetune: fix typo in README.md (#4733) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-02 10:16:55 +01:00
Georgi Gerganov	64ec26ed76	metal : enable shader debugging (cmake option) (#4705) * ggml : disable fast-math for Metal (cmake build only) ggml-ci * metal : fix Metal API debug warnings * cmake : add -fno-inline for Metal build (#4545) * metal : fix API debug warnings * metal : fix compile warnings * metal : use uint64_t for strides * cmake : rename option to LLAMA_METAL_SHADER_DEBUG * metal : fix mat-vec Q8_0 kernel for BS > 1 * metal : normalize mat-vec kernel signatures * cmake : respect LLAMA_QKK_64 option * metal : fix mat-vec Q4_K kernel for QK_K == 64 ggml-ci	2024-01-02 10:57:44 +02:00
Someone Serge	80f197bec8	flake.lock: update to a commit recently cached by nixpkgs-cuda-ci	2023-12-31 13:14:58 -08:00
Someone Serge	f0542c5698	flake.nix: suggest the binary caches	2023-12-31 13:14:58 -08:00
Someone Serge	5c68d6471c	workflows: nix-ci: add a qemu job for jetsons	2023-12-31 13:14:58 -08:00

2639 Commits