Commit Graph

2639 Commits

Author SHA1 Message Date
Georgi Gerganov
7ddf5857e7 main : add self-extend support (#4815)
* examples : add passkey test

* passkey : better prints

* passkey : select pass key pos from CLI

* passkey : simplify n_past logic

* llama : "self-extend"-like context extension

* passkey : add comment

* main : add Self-Extend support

* llama : add comment about llama_kv_cache_seq_div
2024-01-08 11:18:32 +02:00
Georgi Gerganov
a386b0dd63 examples : add passkey test (#3856)
* examples : add passkey test

* passkey : better prints

* passkey : select pass key pos from CLI

* passkey : simplify n_past logic

* make : add passkey target

* passkey : add "self-extend"-like context extension (#4810)

* llama : "self-extend"-like context extension

* passkey : add comment

* passkey : add readme
2024-01-08 11:14:04 +02:00
Lars Grammel
9e96d6076a readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814) 2024-01-07 22:24:11 +02:00
slaren
d513cfc4b5 llama-bench : add no-kv-offload parameter (#4812) 2024-01-07 17:59:01 +01:00
Johannes Gäßler
770ec541f9 CUDA: fixed redundant value dequantization (#4809) 2024-01-07 17:24:08 +01:00
Georgi Gerganov
ec08b3e86f llama : remove unused vars (#4796) 2024-01-07 14:29:36 +02:00
Georgi Gerganov
3a96073b59 llama : remove redundant GQA check (#4796) 2024-01-07 11:21:53 +02:00
Alex Azarov
30df691a96 llama.swiftui : use llama.cpp as SPM package (#4804) 2024-01-07 10:20:50 +02:00
Georgi Gerganov
52b664aece llama : print tensor meta for debugging 2024-01-07 09:51:12 +02:00
Alex Azarov
8c36aaf5a8 llama.swiftui : add visionOS target (#4805) 2024-01-07 09:46:55 +02:00
Konstantin Zhuravlyov
5391345fcc ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787) 2024-01-07 08:52:42 +02:00
Georgi Gerganov
003f85d7ea server : fix n_predict check (#4798) 2024-01-07 08:45:26 +02:00
Daniel Illescas Romero
34d18eff4c llama.swiftui : use correct pointer for llama_token_eos (#4797) 2024-01-06 17:12:59 +02:00
Georgi Gerganov
33c9d849fd examples : improve base-translate.sh script (#4783) 2024-01-06 11:40:24 +02:00
a-n-n-a-l-e-e
b52357162d cmake : check for openblas64 (#4134)
openblas v0.3.22 64-bit pkg-config file is named openblas64.pc
https://github.com/OpenMathLib/OpenBLAS/issues/3790
2024-01-05 18:04:40 +02:00
Ikko Eltociear Ashimine
f4ee045ad0 flake.nix : fix typo (#4700)
betwen -> between
2024-01-05 18:02:44 +02:00
Georgi Gerganov
7e27e37f26 metal : switch back to default.metallib (ggml/681)
ggml-ci
2024-01-05 18:02:06 +02:00
Georgi Gerganov
d6ec7cfc70 ggml : fix q2_k bpw in comments (ggml/680) 2024-01-05 18:02:06 +02:00
Finn Voorhees
0630261a48 ggml : add error handling to graph_compute (whisper/1714) 2024-01-05 18:02:06 +02:00
Georgi Gerganov
5ffddb870b ggml : do not sched_yield when calling BLAS (#4761)
* ggml : do not sched_yield when calling BLAS

ggml-ci

* ggml : fix do_yield logic

ggml-ci

* ggml : simplify do_yield logic

ggml-ci
2024-01-05 15:18:21 +02:00
Georgi Gerganov
41ced5ce3c examples : add few-shot translation example (#4783) 2024-01-05 15:11:10 +02:00
Daniel Bevenius
0c4cb7138c finetune : remove unused includes (#4756)
This commit removes unused includes from finetune.cpp.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-04 21:45:37 +02:00
Georgi Gerganov
82e82f484d server : send token probs for "stream == false" (#4714) 2024-01-04 19:56:33 +02:00
Johannes Gäßler
b0a9bb90f9 Print backend name on test-backend-ops failure (#4751) 2024-01-04 09:43:23 +01:00
singularity
2d08e99f47 llama.swiftui : support loading custom model from file picker (#4767)
* swiftui: support load model from file picker

* swiftui: remove trailing whitespace
2024-01-04 10:22:38 +02:00
Michael Coppola
85648efa9e server : fix options in README.md (#4765)
* fix examples/server/README.md

* minor : fix whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04 10:17:09 +02:00
Georgi Gerganov
7967c42ffb ggml : include stdlib.h before intrin.h (#4736) 2024-01-04 10:12:26 +02:00
singularity
c399a87c6b llama.swiftui : fix build of ggml.metallib (#4754)
* metal: fix metal backend init failure in swiftui

* metal: build ggml.metallib instead of copy src

* llama.swift : remove debug flags from metallib build

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04 09:58:16 +02:00
Daniel Bevenius
41a287de3c train : fix typo in overlapping-samples help msg (#4758)
This commit fixes a typo in the help message for the
--overlapping-samples option.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-03 19:53:40 +02:00
Ashraful Islam
59092ff962 swift : update Package.swift to use ggml as dependency (#4691)
* updates the package.swift to use ggml as dependency

* changes the ggml package url src to ggerganov
2024-01-03 19:30:02 +02:00
Georgi Gerganov
f2001ff46d cuda : simplify expression
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03 14:38:38 +02:00
Georgi Gerganov
09d890cb54 cuda : mark I16 and I32 ops as unsupported
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
4ebea0bdce sync : ggml
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
514561978d metal : add kernel_get_rows_i32
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
74b4d9c1ed scripts : fix sync order + metal sed 2024-01-03 14:38:38 +02:00
Guillaume Wenzek
b2cfdd2ea3 ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)
* add more int ops

* ggml_compute_forward_dup_bytes

* add tests

* PR comments

* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:38:38 +02:00
Justin Parker
5b56760f5c server : throw an error when slot unavailable (#4741) 2024-01-03 10:43:19 +02:00
Georgi Gerganov
dc7752d269 metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725)
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (#4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

* metal : optimizing ggml_mul_mat_id (wip)

* metal : minor fix

* metal : opt mul_mm_id
2024-01-02 21:07:47 +02:00
Phil H
421b0da133 server : add token counts to html footer (#4738)
* server: add token counts to stats

* server: generate hpp

---------

Co-authored-by: phiharri <ph@got-root.co.uk>
2024-01-02 17:48:49 +02:00
Georgi Gerganov
af83cacf1e llama : llama_model_desc print number of experts 2024-01-02 16:26:45 +02:00
Marcus Dunn
7ea2965198 llama : replace all API facing int's with int32_t (#4577)
* replaced all API facing `int`'s with `int32_t`

* formatting and missed `int` in `llama_token_to_piece`
2024-01-02 16:15:16 +02:00
postmasters
1081e7c69c llama : differentiate the KV dims in the attention (#4657)
* Add n_key_dim and n_value_dim

Some models use values that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa` because it is not clear
which "head" is referred to (key or value).

Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix llm_build_kqv to be more generic wrt n_embd_head_k

* Update default values for n_embd_head_k and n_embd_head_v

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix llm_load_tensors: the asserts were not backcompat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-02 13:51:28 +02:00
Georgi Gerganov
8243feab46 editorconfig : fix whitespace and indentation #4710 2024-01-02 13:28:15 +02:00
minarchist
37b6fbf892 server : add --override-kv parameter (#4710)
* Changes to server to allow metadata override

* documentation

* flake.nix: expose full scope in legacyPackages

* flake.nix: rocm not yet supported on aarch64, so hide the output

* flake.nix: expose checks

* workflows: nix-ci: init; build flake outputs

* workflows: nix-ci: add a job for eval

* workflows: weekly `nix flake update`

* workflows: nix-flakestry: drop tag filters

...and add a job for flakehub.com

* workflows: nix-ci: add a qemu job for jetsons

* flake.nix: suggest the binary caches

* flake.lock: update

to a commit recently cached by nixpkgs-cuda-ci

---------

Co-authored-by: John <john@jLap.lan>
Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
2024-01-02 12:38:15 +02:00
Nam D. Tran
b8646c035d py : re-enable mmap in convert hf (#4732)
* update: awq support llama-7b model

* update: change order

* update: benchmark results for llama2-7b

* update: mistral 7b v1 benchmark

* update: support 4 models

* fix: Readme

* update: ready for PR

* update: readme

* fix: readme

* update: change order import

* black

* format code

* update: work for both mpt and awqmpt

* update: readme

* Rename to llm_build_ffn_mpt_awq

* Formatted other files

* Fixed params count

* fix: remove code

* update: more detail for mpt

* fix: readme

* fix: readme

* update: change folder architecture

* fix: common.cpp

* fix: readme

* fix: remove ggml_repeat

* update: cicd

* update: cicd

* update: remove use_awq arg

* update: readme

* llama : adapt plamo to new ffn

ggml-ci

* fix: update torch version

---------

Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io>
Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-02 11:23:38 +02:00
Daniel Bevenius
ffcf2ca432 finetune: fix typo in README.md (#4733)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-02 10:16:55 +01:00
Georgi Gerganov
64ec26ed76 metal : enable shader debugging (cmake option) (#4705)
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (#4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

ggml-ci
2024-01-02 10:57:44 +02:00
Someone Serge
80f197bec8 flake.lock: update
to a commit recently cached by nixpkgs-cuda-ci
2023-12-31 13:14:58 -08:00
Someone Serge
f0542c5698 flake.nix: suggest the binary caches 2023-12-31 13:14:58 -08:00
Someone Serge
5c68d6471c workflows: nix-ci: add a qemu job for jetsons 2023-12-31 13:14:58 -08:00