Commit Graph

94 Commits

Author SHA1 Message Date
Olivier Chafik
6e744919ab build: generate hex dump of server assets during build (#6661)
* `build`: generate hex dumps of server assets on the fly

* build: workaround lack of -n on gnu xxd

* build: don't use xxd in cmake

* build: don't call xxd from build.zig

* build: more idiomatic hexing

* build: don't use xxd in Makefile (od hackery instead)

* build: avoid exceeding max cmd line limit in makefile hex dump

* build: hex dump assets at cmake build time (not config time)
2024-04-21 18:48:53 +01:00
slaren
b266290ee8 ggml : group all experts in a single ggml_mul_mat_id (#6505)
* ggml : group all experts in a single ggml_mul_mat_id
cuda : improve mmid row copy

* cuda : fix bin bcast with non-cont src0

* test-backend-ops : only run all mul mat tests for base types

* llama : disable moe offloading with SYCL

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-18 15:18:48 +02:00
Pierrick Hymbert
5dc0da80c8 model: support arch DbrxForCausalLM (#6515)
* model: dbrx convert to gguf
#6344

* llama: support dbrx
#6344

* doc: dbrx: add the model as supported

* scripts: get-wikitext-2 add unzip

* llama: increase maximum experts allowed

* llama: factorize moe graph implementation between grok, mixtral and dbrx


---------

Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Daniel Bevenius
63d85af088 scripts : add --outdir option to hf.sh (#6600)
* scripts : add --outdir option to hf.sh

This commit adds an option to the hf.sh script that allows the user to
specify an output directory for the downloaded file.

The motivation for this changes is that examples that use the hf.sh
script to download models from huggingface can now specify the output
directory, perhaps to the `models` directory to keep them in one place
and not clutter the root directory.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! scripts : add --outdir option to hf.sh

Fix format of the --outdir option in the usage message.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-11 16:22:47 +03:00
Georgi Gerganov
61e36e1532 sync : ggml 2024-04-09 20:29:06 +03:00
Georgi Gerganov
41c01483dd license : update copyright notice + add AUTHORS (#6405)
* license : add AUTHORS

* authors : update

* scipts : add LICENSE and gen-authors.sh to sync
2024-04-09 09:23:19 +03:00
Georgi Gerganov
2c80dc319b sync : ggml 2024-04-07 17:05:51 +03:00
Georgi Gerganov
9638aa3727 scripts : sync ggml-cuda folder 2024-04-07 16:08:12 +03:00
Georgi Gerganov
0bcde06402 sync : ggml 2024-04-06 18:27:46 +03:00
Johannes Gäßler
5a5d9cbbe2 compare-llama-bench.py: fix long hexsha args (#6424) 2024-04-01 13:30:43 +02:00
Georgi Gerganov
537fc022b8 sync : ggml (#6351)
* sync : ggml

ggml-ci

* cuda : move GGML_CUDA_DMMV constants to dmmv.cuh

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-03-29 17:45:46 +02:00
slaren
f6b28778af cuda : rename build flag to LLAMA_CUDA (#6299) 2024-03-26 01:16:01 +01:00
Johannes Gäßler
56d74c8210 lookup: complement data from context with general text statistics (#5479)
* lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens
2024-03-23 01:24:36 +01:00
Georgi Gerganov
2c0917ae95 sync : ggml 2024-03-10 20:10:46 +02:00
Georgi Gerganov
1cad57678f ggml : add ggml-common.h to deduplicate shared code (#5940)
* ggml : add ggml-common.h to shared code

ggml-ci

* scripts : update sync scripts

* sycl : reuse quantum tables

ggml-ci

* ggml : minor

* ggml : minor

* sycl : try to fix build
2024-03-09 12:47:57 +02:00
slaren
1c6fa131d3 compare-llama-bench.py : remove mul_mat_q (#5892) 2024-03-05 22:27:29 +01:00
Georgi Gerganov
96ea2cb08d sync : ggml
ggml-ci
2024-03-04 20:54:23 +02:00
Georgi Gerganov
80b5fed6d0 sync : ggml 2024-03-04 10:40:04 +02:00
Georgi Gerganov
ba72410b27 scripts : add pod-llama.sh 2024-03-02 16:54:20 +02:00
Pierrick Hymbert
5d743c9e16 llama : cleanup unused mmq flags (#5772)
* cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q

* remove: mul_mat_q in compare llama bench and usage

* update llama-bench

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-03-01 13:39:06 +02:00
Georgi Gerganov
49255c8c2e sync : ggml 2024-02-28 11:17:32 +02:00
Georgi Gerganov
dfb0d63843 sync : ggml 2024-02-22 23:21:05 +02:00
Georgi Gerganov
f85c73f1d0 sync : ggml 2024-02-21 16:52:52 +02:00
Georgi Gerganov
0b5d2708a7 sync : ggml (#5633)
* ggml : fix conv_2d batch mode (ggml/737)

Co-authored-by: bssrdf <bssrdf@gmail.com>

* ggml : compute forward no longer pass src tensors (ggml/729)

* sync : ggml

ggml-ci

---------

Co-authored-by: bssrdf <merlintiger@hotmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-02-21 16:17:10 +02:00
Georgi Gerganov
3a1f76bfc6 sync : ggml
ggml-ci
2024-02-19 15:09:43 +02:00
Jared Van Bortel
6b6f1ee292 build : pass all warning flags to nvcc via -Xcompiler (#5570)
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler
2024-02-18 16:21:52 -05:00
Georgi Gerganov
a889a4a4c6 ci : fix wikitext url + compile warnings (#5569)
ggml-ci
2024-02-18 22:39:30 +02:00
Georgi Gerganov
b6ebf5b6ff scripts : add helpers script for bench comparing commits (#5521)
* scripts : add helpers script for bench comparing commits

* scripts : detect CUDA

* set flags after checking the command line

* fix make flags

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-16 15:14:40 +02:00
Georgi Gerganov
31e77c5029 scripts : add hf.sh helper script (#5501)
* scripts : add hf.sh helper scripts

* hf : add error logs

* hf : add support for --repo and --file
2024-02-15 15:41:15 +02:00
Georgi Gerganov
0ca4e0c14c sync : ggml (#5452)
* ggml-alloc : v3 (ggml/727)

* ggml-alloc v3

ggml-ci

* fix ci

ggml-ci

* whisper : check for backend buffer allocation failures

* whisper : avoid leaks when initialization fails

* cleanup

ggml-ci

* style fixes

ggml-ci

* sync : ggml

* update llama.cpp, clip.cpp, export-lora.cpp

* update finetune.cpp, train-text-from-scratch.cpp

ggml-ci

* ggml-backend : reduce alignment to 32 to match gguf and fix mmap

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-12 09:16:06 +02:00
Georgi Gerganov
71e5730e05 scripts : update sync scripts with new backends 2024-02-10 09:53:05 +02:00
Georgi Gerganov
14d2486167 sync : ggml 2024-02-10 09:30:36 +02:00
Georgi Gerganov
120cae84ec scripts : fix typos, cleanup (#5303) 2024-02-05 09:48:03 +02:00
Нияз Гарифзянов
b8eac7a045 scripts : add non-interactive server-llm.sh (#5303)
* Update server-llm.sh

Add flag --non-interactive that allows run script without asking a permission

* Update scripts/server-llm.sh

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-05 09:43:57 +02:00
Georgi Gerganov
afd1dc6907 scripts : parse wtype in server-llm.sh (#5167)
* scripts : parse wtype in server-llm.sh

* scripts : fix check for wfile
2024-02-02 14:23:40 +02:00
Neo Zhang Jianyu
d021491248 support SYCL backend windows build (#5208)
* support SYCL backend windows build

* add windows build in CI

* add for win build CI

* correct install oneMKL

* fix install issue

* fix ci

* fix install cmd

* fix install cmd

* fix install cmd

* fix install cmd

* fix install cmd

* fix win build

* fix win build

* fix win build

* restore other CI part

* restore as base

* rm no new line

* fix no new line issue, add -j

* fix grammer issue

* allow to trigger manually, fix format issue

* fix format

* add newline

* fix format

* fix format

* fix format issuse

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-01-31 08:08:07 +05:30
Georgi Gerganov
bb6467bd41 sync : ggml (#0) 2024-01-30 16:21:57 +02:00
Georgi Gerganov
c1aa8de15f sync : ggml 2024-01-28 19:48:05 +02:00
Georgi Gerganov
32e84d74f1 sync : ggml 2024-01-27 17:00:24 +02:00
Georgi Gerganov
8289eb006a scripts : move run-with-preset.py from root to scripts folder 2024-01-26 17:09:44 +02:00
crasm
f4cc7db364 ci : add model tests + script wrapper (#4586)
* scripts : add lib.sh and lib_test.sh

* scripts : stub out new ci-run.sh script

* scripts : switch to PascalCase for functions

This looks a little odd at first, but I find it very useful as a
convention to know if a command is part of our code vs a builtin.

* scripts : add some fancy conversion from snake_case to PascalCase

* Add venv to ci/run.sh

* Revert scripts work

* scripts : add wrapper script for local use of ci/run.sh

* Simplify .gitignore for tests, clang-tidy fixes

* Label all ctest tests

* ci : ctest uses -L main

* Attempt at writing ctest_with_model

* Update test-model-load-cancel

* ci : add ctest_with_model for debug and release

ggml-ci

* Fix gg_get_model function

ggml-ci

* got stuck on CMake

* Add get_model.cpp to tests/CMakeLists.txt

ggml-ci

* Fix README.md output for ctest_with_model

ggml-ci

* workflows : use `-L main` for all ctest

ggml-ci

* Fixes

* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show warning rather than failing if model file variable is not
  set

* scripts : update usage text for ci-run.sh
2024-01-26 14:18:00 +02:00
Georgi Gerganov
8e6a95c27e scripts : add get-winogrande.sh 2024-01-18 20:45:39 +02:00
Georgi Gerganov
98642b6cb3 scritps : add helper script to get hellaswag data in txt format 2024-01-18 11:44:49 +02:00
Georgi Gerganov
af72b6c82d sync : ggml 2024-01-17 20:54:50 +02:00
Georgi Gerganov
97e013bd09 scripts : sync-ggml-am.sh option to skip commits 2024-01-14 11:08:41 +02:00
Georgi Gerganov
2e51f37f77 sync : ggml 2024-01-14 00:14:46 +02:00
Johannes Gäßler
7079775d2b compare-llama-bench: tweak output format (#4910) 2024-01-13 15:52:53 +01:00
Georgi Gerganov
211567ae1a sync : ggml 2024-01-12 22:02:43 +02:00
Georgi Gerganov
d3d92fe41c sync : ggml 2024-01-11 09:39:08 +02:00
Johannes Gäßler
3091bc1ab0 Python script to compare commits with llama-bench (#4844) 2024-01-10 01:04:33 +01:00