Georgi Gerganov
ae9bcd1d75
ggml : improve ggml_is_contiguous logic ( #7856 )
...
* ggml : improve ggml_is_contiguous logic
ggml-ci
* ggml : support more contiguous cases
ggml-ci
2024-06-12 15:24:20 +03:00
Georgi Gerganov
f196f2a2c9
server : restore numeric prompts ( #7883 )
2024-06-12 14:42:29 +03:00
Meng, Hengyu
f16b859200
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 ( #7894 )
...
In addition, this reverts a workaround we had to apply for the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
2024-06-12 19:05:35 +10:00
Patrice Ferlet
a128c6d094
Fix a typo and add Fedora 40 package to install for Vulkan ( #7794 ) [no ci]
...
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
2024-06-12 11:18:16 +10:00
k.h.lai
478f365d75
vulkan: select only one device for single gpu with multiple drivers ( #7582 )
2024-06-11 21:26:05 +02:00
0cc4m
8ba2f270e6
Update Vulkan RoPE implementation ( #7818 )
...
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00
Deven Mistry
a08dd44cb8
fix broken link in pr template ( #7880 ) [no ci]
...
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian
ea1bb2b82b
github: move PR template to .github/ root ( #7868 )
2024-06-11 17:43:41 +03:00
Johannes Gäßler
a9e26d8c45
llama-bench: more compact markdown tables ( #7879 )
2024-06-11 14:45:40 +02:00
Georgi Gerganov
99456d217a
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler
cb7240ad05
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) ( #7860 )
2024-06-11 08:26:07 +02:00
slaren
e4e6f9abea
fix CUDA CI by using a windows-2019 image ( #7861 )
...
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik
52819e6643
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
1de5991f7c
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
Jared Van Bortel
9fa0c89c0c
cmake : fix CMake requirement for CUDA ( #7821 )
2024-06-10 18:32:10 -04:00
slaren
27d373a411
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
a98d3afc28
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
968cfb9d8d
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
fc5f0f1647
CUDA: use tensor cores for MMQ ( #7676 )
...
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
8112ed6c29
use the correct SYCL context for host USM allocations ( #7777 )
...
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
cff824ecc3
flake.lock: Update ( #7838 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
ca9f39a167
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Nicolás Pérez
f6bbf78d23
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md. It explains to contributors why they should rate the complexity level of a PR, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110
36f9df2257
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
Johannes Gäßler
d4dc8e168b
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00
sasha0552
08bf59c672
convert-hf : set the model name based on cli arg, if present ( #7693 )
...
The `--model-name` argument was added a while ago but had no effect.
This commit fixes that, so the argument now sets the model name when it is present.
2024-06-09 16:39:25 +10:00
compilade
46f8d599f6
convert-hf : match model part name prefix and suffix ( #7687 )
...
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, for example when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models.
According to a report by @teleprint-me, some problem still persists, but this shall do in the meantime.
2024-06-09 12:47:25 +10:00
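The prefix-and-suffix matching described in the convert-hf commit above can be sketched roughly as follows. This is a minimal illustration, not the conversion script's code: the function name `select_part_names` and the sample directory listing are hypothetical.

```python
def select_part_names(filenames, prefix="model", suffix=".safetensors"):
    """Return the sorted file names that look like model parts.

    A name qualifies only if it starts with `prefix` AND ends with `suffix`,
    so unrelated weight files in the same directory are ignored.
    """
    return sorted(
        name
        for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


# Hypothetical directory listing, for illustration only.
files = [
    "config.json",
    "consolidated.safetensors",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "model.safetensors.index.json",
]
parts = select_part_names(files)
# parts keeps only the two "model-0000N-of-00002.safetensors" entries
```

Checking the suffix alone would also pick up consolidated.safetensors, and checking the prefix alone would pick up the index JSON, which is why both ends are tested.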
compilade
bad6961237
gguf-py : decouple adding metadata from writing in GGUFWriter ( #7827 )
...
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.
In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.
Also, GGUFWriter no longer requires the output file name until it actually writes to the file.
And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
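The decoupling described in the gguf-py commit above (collect metadata first, choose the destination and layout only when writing) might look like the toy sketch below. `ToyMetadataWriter` is a hypothetical stand-in and is NOT the gguf-py API. JSON is used here only as a placeholder serialization, not the GGUF format.

```python
import io
import json


class ToyMetadataWriter:
    """Hypothetical stand-in for illustration; not the real GGUFWriter."""

    def __init__(self):
        # No output path or stream is needed at construction time.
        self.kv = {}

    def add_key_value(self, key, value):
        # Record the pair in memory only; nothing is serialized yet.
        self.kv[key] = value

    def write(self, stream):
        # The byte layout is decided only now, once all metadata is known.
        payload = json.dumps(self.kv, sort_keys=True).encode("utf-8")
        stream.write(payload)
        return len(payload)


writer = ToyMetadataWriter()
writer.add_key_value("general.name", "demo")
writer.add_key_value("general.file_type", 1)

out = io.BytesIO()  # the destination is chosen at write time
written = writer.write(out)
```

One consequence of this shape is that the same collected metadata could be written to different destinations, or inspected without writing anything at all.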
slaren
1caca63a87
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend ( #7682 )" ( #7808 )
...
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Olivier Chafik
20e69f8dff
url: save -mu downloads to new cache location ( #7826 )
...
* url: save -mu download to new cache location
* url: fs_get_cache_file_path util
* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
sasha0552
66217bbac6
server : smart slot selection using Longest Common Prefix ( #7728 )
...
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
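The Longest Common Prefix idea named in the server commit above can be sketched as follows. This is only an illustration under stated assumptions: token IDs are plain integers, each slot's cache is a list of tokens, ties go to the first slot, and no similarity threshold is applied. The names `common_prefix_len` and `pick_slot` are hypothetical and not taken from the server code.

```python
def common_prefix_len(a, b):
    """Count how many leading elements two token sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slot_caches, prompt_tokens):
    """Return the index of the slot whose cached tokens share the longest
    prefix with the prompt, or None when there are no slots."""
    best_index, best_len = None, -1
    for i, cached in enumerate(slot_caches):
        n = common_prefix_len(cached, prompt_tokens)
        if n > best_len:  # strict comparison: the first slot wins ties
            best_index, best_len = i, n
    return best_index


# Hypothetical cached token sequences for three slots.
slots = [[1, 2, 3, 4], [1, 2, 9], [7, 8]]
chosen = pick_slot(slots, [1, 2, 3, 5])  # shared prefix lengths: 3, 2, 0
```

A prefix match counts only tokens that line up from position zero, whereas a common substring could sit anywhere in the sequence.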
slaren
b28de435a1
vulkan : reuse parent extra for views ( #7806 )
...
* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
c0600f4f5c
gguf-split : change binary multi-byte units to decimal ( #7803 )
2024-06-07 15:56:01 +03:00
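To make the difference between binary and decimal multi-byte units in the gguf-split commit above concrete, here is a small arithmetic sketch. The helper `parse_size` and its suffix tables are hypothetical; they are not the gguf-split argument parser.

```python
DECIMAL = {"K": 10**3, "M": 10**6, "G": 10**9}
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text, units=DECIMAL):
    """Convert a string such as '2G' into a byte count using `units`."""
    number, suffix = text[:-1], text[-1].upper()
    if suffix not in units:
        raise ValueError(f"unknown size suffix: {suffix!r}")
    return int(number) * units[suffix]


decimal_bytes = parse_size("2G")         # 2 * 10**9 = 2_000_000_000
binary_bytes = parse_size("2G", BINARY)  # 2 * 2**30 = 2_147_483_648
difference = binary_bytes - decimal_bytes
```

For a 2G limit the two interpretations differ by 147,483,648 bytes, which is why the choice of unit convention matters.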
intelmatt
544d23d303
cmake : fix BUILD_SHARED_LIBS=ON build ( #7784 )
...
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Johannes Gäßler
e8ac0b3518
server: update cache_prompt documentation [no ci] ( #7745 )
2024-06-07 11:15:49 +02:00
woodx
f91a2cbb62
server : do not get prompt in infill mode ( #7286 )
...
* avoid getting the prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
bd158af596
[SYCL] fix wrong softmax r2r result ( #7811 )
2024-06-07 14:28:26 +08:00
slaren
bad6ac1321
check for nans in imatrix and quantize ( #7807 )
...
* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
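The kind of nan/inf check mentioned in the imatrix and quantize commit above can be illustrated with a short sketch. It assumes the data is available as a flat sequence of Python floats. The helper name `first_non_finite` is hypothetical and does not come from the imatrix or quantize tools.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None."""
    for i, v in enumerate(values):
        if not math.isfinite(v):  # False for NaN, +inf and -inf
            return i
    return None


# Hypothetical data for illustration.
clean = [0.25, 1.5, 3.0]
dirty = [0.25, float("nan"), float("inf")]

clean_result = first_non_finite(clean)
dirty_result = first_non_finite(dirty)
```

Returning the offending index rather than a bare boolean makes it easier to report which entry is bad.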
Georgi Gerganov
4e92948760
server : fix --threads-http arg ( #7801 )
2024-06-06 19:19:59 +03:00
Georgi Gerganov
c2a2806fac
imatrix : migrate to gpt_params ( #7771 )
...
* imatrix : migrate to gpt_params
ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
00552af560
Added support for . (any character) token in grammar engine. ( #6467 )
...
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Mattheus Chediak
43ce4c2223
README minor fixes ( #7798 ) [no ci]
...
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
Olivier Chafik
bb0026f4f1
grammars: x{min,max} repetition operator ( #6640 )
...
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
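To illustrate what the x{min,max} repetition operator from the grammars commit above means, the sketch below rewrites a bounded repetition using only sequencing, `?` and `*`. It shows the semantics only and makes no claim about how the grammar parser rewrites rules internally. The function name `expand_repetition` is hypothetical.

```python
def expand_repetition(symbol, min_n, max_n=None):
    """Spell out symbol{min_n,max_n} with sequencing, '?' and '*' only.

    max_n=None stands for an open upper bound, as in x{min,}.
    Pass max_n == min_n for the exact-count form x{n}.
    """
    if min_n < 0 or (max_n is not None and max_n < min_n):
        raise ValueError("invalid repetition bounds")

    parts = [symbol] * min_n  # the mandatory copies
    if max_n is None:
        parts.append(f"{symbol}*")  # any number of extra copies
    else:
        optional = ""
        # Nest the optional copies so each one requires the previous one.
        for _ in range(max_n - min_n):
            optional = f"({symbol} {optional})?" if optional else f"({symbol})?"
        if optional:
            parts.append(optional)
    return " ".join(parts)


bounded = expand_repetition("x", 2, 4)   # "x x (x (x)?)?"
exact = expand_repetition("x", 3, 3)     # "x x x"
open_ended = expand_repetition("x", 1)   # "x x*"
```

Nesting the optional copies, instead of writing `x? x?`, gives each repetition count exactly one way to match.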
Joan Fontanals
add6ba8d05
llama : add jina v2 base code ( #7596 )
...
* feat: add changes to handle jina v2 base code
* fix: do not complicate things
* fix: fix the usage of the code model
* fix: fix comments
* fix: fix linting issues
* fix: remove ollama patches
* style : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren
283627eb48
docker : build only main and server in their images ( #7782 )
...
* add openmp lib to dockerfiles
* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren
132e99131c
docker : add openmp lib ( #7780 )
2024-06-06 08:17:21 +03:00
Galunid
e4c231c3ff
Fix encoding in python scripts ( #7733 )
2024-06-06 03:07:24 +10:00
Johannes Gäßler
4ac499892a
CUDA: refactor mmq, dmmv, mmvq ( #7716 )
...
* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits
2024-06-05 16:53:00 +02:00
Georgi Gerganov
13a467c230
ggml : refactor rope norm/neox ( #7634 )
...
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw
e2d1daa87f
readme : remove -ins ( #7759 )
...
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675
I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00