Commit Graph

2639 Commits

Author SHA1 Message Date
Georgi Gerganov
f8da188258 metal : fix kernel_norm (fixes Falcon on Metal) (#3057)
* metal : fix kernel_norm

ggml-ci

* metal : put warning in kernel_norm to not combine the loops

* metal : restore original F16 mat-vec multiplication

It works after the norm fixes

* common : don't do warm-up with more than n_batch tokens (close #3058)

ggml-ci

* metal : minor
2023-09-07 15:49:09 +03:00
Przemysław Pawełczyk
ce6bb57378 ggml : posixify madvise and pagesize (#3037)
* llama : use posix_madvise() instead of madvise() derived from BSD

sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' llama.cpp

* ggml : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD

sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml.c

* metal : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD

sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml-metal.m
2023-09-07 11:15:06 +03:00
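The commit above swaps BSD-derived calls (`madvise()`, `getpagesize()`) for their standard POSIX equivalents (`posix_madvise()`, `sysconf(_SC_PAGESIZE)`). As a minimal illustration of the same portable page-size query — not part of the patch itself — Python exposes `sysconf` through the `os` module:

```python
import os

# The commit above replaces the BSD-derived getpagesize() with the
# standard POSIX sysconf(_SC_PAGESIZE); the same portable query is
# available from Python's os module.
page_size = os.sysconf("SC_PAGE_SIZE")

# Page sizes are positive powers of two (e.g. 4096 on most x86-64
# Linux systems, 16384 on Apple Silicon).
assert page_size > 0 and page_size & (page_size - 1) == 0
```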
Georgi Gerganov
bf0b4c808d k-quants : fix zero-weight guard in Q6_K (ref #3040) 2023-09-06 12:40:57 +03:00
Kerfuffle
be4f496d09 convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 (#3023)
* convert-llama-ggmlv3-to-gguf: Try to handle files older than GGJTv3

* Better error messages for files that cannot be converted

* Add file type to GGUF output

* Rename to convert-llama-ggml-to-gguf.py

* Include original file type information in description

* Improve some informational output
2023-09-06 02:49:11 -06:00
Cebtenzzre
a76a94fe31 build : add LLAMA_METAL_NDEBUG flag (#3033) 2023-09-05 18:21:10 -04:00
Cebtenzzre
2ebfd0aa22 make : use new flag variables for recent changes (#3019) 2023-09-05 15:12:00 -04:00
Cebtenzzre
ed5a405c22 examples : replace fprintf to stdout with printf (#3017) 2023-09-05 15:10:27 -04:00
Erik Scholz
c9e735f4bf convert: fix convert.py not working with int filename_stem (#3028)
* fix implicit int to string conversion
* convert : remove an obsolete pyright comment

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-05 19:41:00 +02:00
Kawrakow
4f7048458f Guard against all weights in a super-block being zero (#3010)
* Guard against all weights in a super-block being zero

* Also guard against extremely small weights

Closes #2982 

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-05 09:55:33 +02:00
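The guard in the commit above prevents a division by a (near-)zero maximum weight when computing a block's quantization scale. A hedged scalar sketch of the idea — not the actual k-quants super-block code, which lives in `k_quants.c`:

```python
def block_scale(weights, eps=1e-8):
    # Per-block symmetric scale for 8-bit quantization, with the
    # guard the commit above adds: a block whose weights are all
    # zero (or vanishingly small) gets scale 0 instead of dividing
    # by ~0. Illustrative sketch only.
    amax = max(abs(w) for w in weights)
    if amax < eps:
        return 0.0
    return amax / 127.0  # map the block onto the int8 range

assert block_scale([0.0, 0.0, 0.0]) == 0.0
assert block_scale([1.27, -0.635]) == 1.27 / 127.0
```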
Georgi Gerganov
365578f31e llama : update logic for number of threads when using BLAS 2023-09-05 10:46:39 +03:00
Georgi Gerganov
9615d0c6b4 speculative : add grammar support (#2991)
* speculative : add grammar support

* grammars : add json_arr.gbnf

* grammar : add comments to new grammar file

* grammar : remove one nested level

* common : warm-up with 2 tokens - seems to work better

* speculative : print draft token pieces

* speculative : reuse grammar parser + better logs and comments

* speculative : avoid grammar_mem

* make : fix speculative build
2023-09-05 08:46:17 +03:00
Georgi Gerganov
5ce628ba1c py : minor 2023-09-04 22:50:50 +03:00
Georgi Gerganov
8e49675a7b build : on Mac OS enable Metal by default (#2901)
* build : on Mac OS enable Metal by default

* make : try to fix build on Linux

* make : move targets back to the top

* make : fix target clean

* llama : enable GPU inference by default with Metal

* llama : fix vocab_only logic when GPU is enabled

* common : better `n_gpu_layers` assignment

* readme : update Metal instructions

* make : fix merge conflict remnants

* gitignore : metal
2023-09-04 22:26:24 +03:00
slaren
8d85c7d12c ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) 2023-09-04 14:59:52 +02:00
Cebtenzzre
24d2622ba2 llama-bench : make cpp file non-executable (#2999) 2023-09-04 13:40:18 +03:00
Leng Yue
736e898675 make : add speculative example (#3003) 2023-09-04 13:39:57 +03:00
Aarni Koskela
d48f5c09df server : add a subtle loading animation to the edit box (#2466)
* editorconfig: add override for the server HTML (which already is 2-space indented)

* server: add a subtle loading animation to the edit box
2023-09-04 16:28:55 +08:00
Jiahao Li
b0cd5d83b3 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)
* 2x faster (rms) norm cuda kernels

* Fix code style
2023-09-04 08:53:30 +02:00
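For reference, the operation those CUDA kernels compute is RMS normalization: each element of a row is divided by the row's root mean square (no mean subtraction, unlike LayerNorm). A scalar sketch of the math, independent of the kernel optimizations themselves:

```python
import math

def rms_norm(xs, eps=1e-5):
    # Scalar reference for the RMS norm the CUDA kernels above
    # speed up: divide each element by the root mean square of
    # the row. eps guards against all-zero rows.
    mean_sq = sum(x * x for x in xs) / len(xs)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms for x in xs]

# After normalization the mean of squares is ~1.
out = rms_norm([3.0, 4.0])
assert abs(sum(x * x for x in out) / len(out) - 1.0) < 1e-4
```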
slaren
822ad0f739 ggml-alloc : use virtual memory for measurement (#2973)
* ggml-alloc : use virtual memory for measurement

* compatibility fixes for MAP_ANONYMOUS

* fallback to fixed address for systems without virtual memory
2023-09-03 20:34:09 +02:00
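The measurement pass in the commit above reserves address space instead of allocating real buffers, so pages are only committed if touched. A rough analogue (not the ggml-alloc code) is an anonymous memory mapping:

```python
import mmap

# An anonymous mapping reserves a range of address space whose
# pages are committed lazily by the OS -- a rough analogue of the
# virtual-memory measurement pass described above.
size = 1 << 20                 # 1 MiB of address space
buf = mmap.mmap(-1, size)      # fileno -1 -> anonymous mapping
assert len(buf) == size
buf.close()
```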
Georgi Gerganov
a88f9a8ca8 speculative : PoC for speeding-up inference via speculative sampling (#2926)
* speculative : initial example

* speculative : print encoding speed

* speculative : add --draft CLI arg
2023-09-03 15:12:08 +03:00
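The speculative-sampling example above pairs a cheap draft model with the target model: the draft proposes a run of tokens, and the target keeps the longest agreeing prefix. A toy greedy sketch of that loop, where `draft_next` and `target_next` are hypothetical callables (context → token), not names from the example:

```python
def speculative_step(draft_next, target_next, ctx, n_draft):
    # Toy greedy sketch of the draft/verify loop: the draft model
    # proposes n_draft tokens; the target model keeps the longest
    # agreeing prefix and replaces the first disagreement with its
    # own token.
    proposed, c = [], list(ctx)
    for _ in range(n_draft):
        tok = draft_next(c)
        proposed.append(tok)
        c.append(tok)
    accepted, c = [], list(ctx)
    for tok in proposed:
        want = target_next(c)
        if want != tok:
            accepted.append(want)  # target's correction ends the run
            break
        accepted.append(tok)
        c.append(tok)
    return accepted

# If draft and target agree, every drafted token is accepted.
same = lambda c: len(c)
assert speculative_step(same, same, [0], 3) == [1, 2, 3]
```

The speedup comes from the target model scoring the whole drafted run in one batch instead of one token at a time.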
Georgi Gerganov
9fe76e79e3 perplexity : fix ETA by warming up the model with an empty run 2023-09-03 13:43:17 +03:00
Kerfuffle
ae8e8ebe53 gguf(python): Fix special vocab handling when id < 0 (#2984) 2023-09-03 04:38:43 -06:00
Georgi Gerganov
811285a543 metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986) 2023-09-03 13:23:33 +03:00
Alon
928ab515d1 cov : disable comment in PRs (#2989) 2023-09-03 13:19:01 +03:00
opparco
7fe3095287 llama : fix bpe tokenize from byte (#2889) 2023-09-03 13:18:09 +03:00
Georgi Gerganov
89349ceb7b metal : revert 6af0bab until we fix it
This restores the generated text to be the same as before #2959
2023-09-03 12:40:56 +03:00
Alon
53afc99c41 cov : add Code Coverage and codecov.io integration (#2928)
* update .gitignore

* makefile: add coverage support (lcov, gcovr)

* add code-coverage workflow

* update code coverage workflow

* run on ubuntu 20.04

* use gcc-8

* check why the job hangs

* add env vars

* add LLAMA_CODE_COVERAGE=1 again

* - add CODECOV_TOKEN
- add missing make lcov-report

* install lcov

* update make file -pb flag

* remove unused GGML_NITER from workflows

* wrap coverage output files in COV_TARGETS
2023-09-03 11:48:49 +03:00
Wentai Zhang
b3912e82f1 opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 11:46:44 +03:00
Kawrakow
c937b6d718 metal : more optimizations (#2959)
* Very minor speedup via simd-group synchronization in f16 x f32

* Another very minor speedup on metal

* Quite significant PP speedup on metal

* Another attempt

* Minor

* Massive improvement for TG for fp16

* ~4-5% improvement for Q8_0 TG on metal

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-03 11:06:22 +03:00
kchro3
c2873512e6 swift : add support for k-quants (#2983) 2023-09-03 09:21:05 +03:00
Kerfuffle
ba445e659c convert.py : BPE fixes (#2938)
* convert.py: BPE fixes?

* Remove unnecessary conditional in addl token error handling
2023-09-03 08:52:13 +03:00
Ido S
a8b85ea614 docs : add catai to README.md (#2967) 2023-09-03 08:50:51 +03:00
momonga
b41680f397 examples : fix gpt-neox (#2943)
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
2023-09-03 08:36:28 +03:00
kchro3
f96a0722fa swift : add missing c file to Package.swift (#2978) 2023-09-03 08:27:25 +03:00
Cebtenzzre
af0127b31a make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886)
* make : remove unused -DGGML_BIG_ENDIAN

* make : put preprocessor stuff in CPPFLAGS

* make : pass Raspberry Pi arch flags to g++ as well

* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS

* make : fix inverted conditional
2023-09-03 08:26:59 +03:00
Kerfuffle
9f664f66a4 logging: Fix creating empty file even when disabled (#2966)
* logging: Fix creating empty file even when disabled

* Minor formatting fix

Co-authored-by: staviq <staviq@gmail.com>

---------

Co-authored-by: staviq <staviq@gmail.com>
2023-09-02 11:53:55 -06:00
bandoti
626da973c4 readme : update clblast instructions (#2903)
* Update Windows CLBlast instructions

* Update Windows CLBlast instructions

* Remove trailing whitespace
2023-09-02 15:53:18 +03:00
Karsten Weiss
837df7e8d2 metal : show all Metal device instances in the system (#2952)
* ggml_metal_init: Show all Metal device instances in the system

Also show the default Metal device that was picked.

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-02 15:29:09 +03:00
Jhen-Jie Hong
65702a083e k-quants : fix build on armv7 (android only) (#2920)
* k-quants : fix build on armv7

* ggml : cleanup unused arm32 specific impl

* k-quants : avoid some unused vzero / mzero define

* ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
2023-09-02 15:23:45 +03:00
Jhen-Jie Hong
beed103e3d server : avoid antiprompt in probabilities of final response (#2849) 2023-09-02 08:31:46 +08:00
Engininja2
26205b02a4 cuda : vsubss4 for older versions of ROCm/clang (#2942) 2023-09-01 23:33:19 +02:00
ZHAOKAI WANG
0e58794306 readme : quick start command fix (#2908)
* quick start command fix

* quick start win command fix
2023-09-01 17:06:44 +03:00
Kerfuffle
2ac8bf40d0 Allow quantize to only copy tensors, some other improvements (#2931)
* Allow quantize tool to only copy tensors to allow repackaging models.

* Slightly better logic when requantizing.

* Change help message to go to `stdout`.
2023-09-01 08:02:48 -06:00
Georgi Gerganov
7166fdca0f llama2c : rename function 2023-09-01 17:01:11 +03:00
Cebtenzzre
6fd8d848b3 make : use unaligned vector moves on MinGW (#2945)
Fixes #2922
2023-09-01 16:53:14 +03:00
m3ndax
03c5668102 minor : add const qualifiers (#2853)
* made the methods const

# Conflicts:
#	examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp

* made method const

* Update convert-llama2c-to-ggml.cpp

removed write_raw and write_u32

* llama2c : remove misleading const

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-01 16:47:27 +03:00
Konstantin Herud
80569510d8 docs : add java-llama.cpp to README.md (#2935) 2023-09-01 16:36:14 +03:00
Cebtenzzre
a512948d93 build : fix most gcc and clang warnings (#2861)
* fix most gcc and clang warnings

* baby-llama : remove commented opt_params_adam

* fix some MinGW warnings

* fix more MinGW warnings
2023-09-01 16:34:50 +03:00
Ben Siraphob
5b374b4cfe examples : add C grammar (#2357) 2023-09-01 16:32:14 +03:00
Tameem
fcaf9592ce ggml : add RISC-V vector intrinsics support (#2929)
* added support for RISC-V CFLAGS & native compile + cross-compile options

* Add RISC-V Vector Intrinsics Support

Added RVV intrinsics for the following:
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1
   ggml_vec_dot_q8_0_q8_0

Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

---------

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
2023-09-01 16:27:40 +03:00
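The RVV intrinsics above vectorize the quantized dot-product loops. A scalar sketch of the kind of computation involved, using the Q8_0 case for simplicity — illustrative only, with toy block sizes (the real Q8_0 blocks hold 32 int8 quants and an fp16 scale):

```python
def vec_dot_q8_0_ref(blocks_a, blocks_b):
    # Scalar reference for the loop shape the RVV intrinsics
    # vectorize (cf. ggml_vec_dot_q8_0_q8_0): each block pairs a
    # float scale with int8 quants, and the dot product is the
    # scale product times the integer dot, summed over blocks.
    total = 0.0
    for (da, qa), (db, qb) in zip(blocks_a, blocks_b):
        total += da * db * sum(x * y for x, y in zip(qa, qb))
    return total

# 0.5 * 2.0 * (1*3 + 2*4) = 11.0
assert vec_dot_q8_0_ref([(0.5, [1, 2])], [(2.0, [3, 4])]) == 11.0
```

The vector extension lets the inner integer dot run over whole lanes of quants at once instead of element by element.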