Commit Graph

501 Commits

Tomas
fa68fa5932 fix #1224 reverse prompt and multi line (#1297)
* fix reverse prompt and multi line

* Code Formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-04 03:02:30 -07:00
Georgi Gerganov
33002a8e03 ggml : vectorize Q8_0 quantization
https://github.com/ggerganov/ggml/pull/127#issuecomment-1533648531
2023-05-03 23:24:20 +03:00
khimaros
0cb9cc1e2f examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
Georgi Gerganov
aaa2ee9e37 minor : fix whitespaces (#1302) 2023-05-03 20:09:42 +03:00
Georgi Gerganov
d989735c4d minor : fix trailing whitespaces 2023-05-03 18:43:23 +03:00
KASR
2290898428 scripts : platform independent script to verify sha256 checksums (#1203)
* python script to verify the checksum of the llama models

Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability.

* Update README.md

Update the README for improved readability and to explain the usage of the Python checksum verification script.

* update the verification script

I've extended the script based on suggestions by @prusnak.

The script now checks the available RAM; if there is enough to check the file at once, it will do so. If not, the file is read in chunks.

* minor improvement

Small change so that the available RAM is checked rather than the total RAM.

* remove the part of the code that reads the file at once if enough RAM is available

Based on suggestions from @prusnak, I removed the part of the code that checks whether the user has enough RAM to read the entire model at once; the file is now always read in chunks.

* Update verify-checksum-models.py

quick fix to pass the git check
2023-05-03 18:31:28 +03:00
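
The chunked-read approach settled on above keeps memory use flat regardless of model size. A minimal sketch of the same pattern in C++ (the actual verify-checksum-models.py is Python and uses hashlib.sha256; the rolling digest below is a self-contained stand-in for a real hash state):

```cpp
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) { std::cerr << "usage: " << argv[0] << " <file>\n"; return 1; }
    std::ifstream f(argv[1], std::ios::binary);
    if (!f) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    std::vector<char> buf(1 << 20);   // 1 MiB chunks bound peak memory use
    uint64_t digest = 0;              // stand-in for a SHA-256 hash state
    while (f.read(buf.data(), buf.size()) || f.gcount() > 0) {
        for (std::streamsize i = 0; i < f.gcount(); ++i) {
            digest = digest * 131 + (uint8_t) buf[i];   // placeholder update
        }
    }
    std::cout << std::hex << digest << "\n";
    return 0;
}
```
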
CRD716
46a50dd4bf examples : various prompt and example fixes (#1298)
* fix dan.txt

* miku prompt improvements

* use common characters
2023-05-03 18:26:47 +03:00
Evan Jones
5b067c42a2 llama : only copy used KV cache in get / set state (#1272)
* llama : only copy used KV cache in get / set state

* switch to ggml for copying k, v

* avoid designated initializers
2023-05-02 22:26:13 -04:00
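
A hedged sketch of how the get/set state API is typically used, which is the path this commit makes cheaper; the function names follow llama.h of this period (llama_get_state_size, llama_copy_state_data, llama_set_state_data), but treat the exact signatures as assumptions:

```cpp
#include <cstdint>
#include <vector>
#include "llama.h"

static std::vector<uint8_t> save_state(llama_context * ctx) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx)); // worst-case bound
    const size_t written = llama_copy_state_data(ctx, buf.data());
    buf.resize(written); // after #1272: only the used part of the KV cache
    return buf;
}

static void load_state(llama_context * ctx, std::vector<uint8_t> & buf) {
    llama_set_state_data(ctx, buf.data());
}
```
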
DannyDaemonic
0f15df3c9e Process escape sequences given in prompts (#1173) 2023-05-02 18:46:20 -07:00
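
A minimal sketch of what #1173 describes: rewriting literal two-character sequences such as "\n" in a prompt string into the control characters they name (the escape set here is illustrative, not the exact one handled by the patch):

```cpp
#include <string>

static std::string process_escapes(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    for (size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            switch (in[++i]) {
                case 'n':  out += '\n'; break;
                case 't':  out += '\t'; break;
                case '\\': out += '\\'; break;
                case '\'': out += '\''; break;
                case '"':  out += '"';  break;
                default:   out += '\\'; out += in[i]; break; // keep unknown escapes as-is
            }
        } else {
            out += in[i];
        }
    }
    return out;
}
```
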
DannyDaemonic
998d3bf39e Handle signals properly on Windows (#1123) 2023-05-02 18:01:57 -07:00
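
The portability issue behind #1123, sketched under stated assumptions: on POSIX systems SIGINT can be caught with sigaction, while on Windows console events arrive via SetConsoleCtrlHandler on a separate thread, so plain signal() is not a reliable substitute. The actual handler logic of the patch may differ:

```cpp
#include <signal.h>

static volatile sig_atomic_t g_interrupted = 0;

#if defined(_WIN32)
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
static BOOL WINAPI console_ctrl_handler(DWORD type) {
    if (type == CTRL_C_EVENT) { g_interrupted = 1; return TRUE; }
    return FALSE; // let the default handler take the other events
}
static void install_interrupt_handler() {
    SetConsoleCtrlHandler(console_ctrl_handler, TRUE);
}
#else
static void sigint_handler(int) { g_interrupted = 1; }
static void install_interrupt_handler() {
    struct sigaction sa = {};
    sa.sa_handler = sigint_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, nullptr);
}
#endif
```
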
DannyDaemonic
882fa33772 Call sh on build-info.sh (#1294) 2023-05-02 17:52:35 -07:00
kuvaus
d3c3b947ab fix build-info.h for git submodules (#1289)
* make git build info work with submodules

---------

Co-authored-by: Green Sky <green@g-s.xyz>
2023-05-03 02:43:43 +02:00
slaren
e859f73be1 fix missing parameters in llama_init_from_gpt_params (#1293) 2023-05-03 01:36:45 +02:00
Ron Evans
66990fbcc9 examples : add llama_init_from_gpt_params() common function (#1290)
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 23:39:51 +03:00
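
A hedged usage sketch of the new common helper; the flow (parse gpt_params, get a llama_context back) follows the commit description, and gpt_params_parse / llama_free are the surrounding common/llama.h calls, but treat exact signatures as assumptions:

```cpp
#include "common.h"
#include "llama.h"

int main(int argc, char ** argv) {
    gpt_params params;
    if (!gpt_params_parse(argc, argv, params)) {
        return 1;
    }
    // one call replaces the per-example context setup boilerplate
    llama_context * ctx = llama_init_from_gpt_params(params);
    if (ctx == nullptr) {
        return 1;
    }
    // ... run inference ...
    llama_free(ctx);
    return 0;
}
```
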
Georgi Gerganov
922abdc9c8 llama : fix compile warnings 2023-05-02 23:09:08 +03:00
Georgi Gerganov
29b3bf28a8 ggml : fix 32-bit ARM 2023-05-02 22:14:50 +03:00
Ron Evans
ae3d46ba30 examples : improve vertical alignment of a few variables (#1286)
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 20:53:52 +03:00
Marvin Gießing
84e8f840b8 ggml : fix ppc64le build error and make cmake detect Power processors (#1284)
* Fix ppc64le build issue

* Added support to detect ppc64* processors
2023-05-02 19:42:16 +03:00
Robert Brisita
19855ef65a llama : allow 0 as a seed number. (#1275) 2023-05-02 19:23:44 +03:00
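
The shape of the fix in #1275, sketched (the actual condition in the patch may differ): treating only negative seeds as "randomize" makes 0 usable as a reproducible fixed seed.

```cpp
#include <ctime>

static int resolve_seed(int seed) {
    if (seed < 0) {              // was effectively: seed <= 0
        seed = (int) time(NULL); // a negative seed requests a random one
    }
    return seed;
}
```
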
Ron Evans
890730a4de main : switch input_noecho to input_echo to remove negation (#979)
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 19:13:26 +03:00
slaren
b828c402b3 ggml: add names to tensors (#1268)
* ggml: add names to tensors

* minor improvements to dot file formatting
2023-05-02 16:03:00 +02:00
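
A hedged sketch of the naming API added in #1268: ggml_set_name tags a tensor so it can be identified in debug output such as the dot files mentioned above. The context setup follows the usual ggml pattern; exact signatures at this commit are assumptions:

```cpp
#include "ggml.h"

int main(void) {
    struct ggml_init_params ip = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(ip);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    ggml_set_name(a, "a"); // the name appears in graph dumps / dot files

    ggml_free(ctx);
    return 0;
}
```
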
DannyDaemonic
8f29a9ed07 Add git-based build information for better issue tracking (#1232)
* Add git-based build information for better issue tracking

* macOS fix

* "build (hash)" and "CMAKE_SOURCE_DIR" changes

* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages

* Fix conditional dependency on missing target

* Broke out build-info.cmake, added find_package fallback, added build info to all examples, and added dependencies to Makefile

* 4 space indenting for cmake, attempt to clean up my mess in Makefile

* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
2023-05-01 18:23:47 +02:00
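
A hedged sketch of how the generated header is consumed; the macro names (BUILD_NUMBER, BUILD_COMMIT) are assumptions inferred from the "build (hash)" wording above, not a verified listing of build-info.h:

```cpp
#include <cstdio>
#include "build-info.h" // generated by the build system from git metadata

static void print_build_info() {
    // hypothetical macros: e.g. prints "build = 527 (8f29a9e)"
    fprintf(stderr, "build = %d (%s)\n", BUILD_NUMBER, BUILD_COMMIT);
}
```
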
slaren
589b8dfe52 cuBLAS: refactor and optimize f16 mat mul performance (#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1
2023-05-01 18:11:07 +02:00
xloem
772f8bc9bb llama : update stubs for systems without mmap and mlock (#1266)
Co-authored-by: John Doe <john.doe@example.com>
2023-05-01 15:58:51 +03:00
Kerfuffle
d9d626efba ggml : fix ggml_used_mem() (#1264) 2023-05-01 14:56:07 +03:00
Georgi Gerganov
e1103bdf91 llama : fix session load / save (#1263) 2023-05-01 14:54:59 +03:00
slaren
83109d07fd cuBLAS: fall back to pageable memory if pinned alloc fails (#1233)
* cuBLAS: fall back to pageable memory if pinned alloc fails

* cuBLAS: do not use pinned memory if env variable GGML_CUDA_NO_PINNED is set
2023-05-01 13:32:22 +02:00
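
The fallback described in #1233, sketched with real CUDA runtime calls (cudaMallocHost) but simplified logic; the env-variable override is taken from the commit message:

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

// Try page-locked (pinned) host memory, which allows fast async DMA
// transfers; fall back to pageable malloc on failure or explicit opt-out.
static void * host_alloc(size_t size, bool * pinned) {
    *pinned = false;
    if (getenv("GGML_CUDA_NO_PINNED") == NULL) {
        void * ptr = NULL;
        if (cudaMallocHost(&ptr, size) == cudaSuccess) {
            *pinned = true;
            return ptr;
        }
        // pinned allocation failed: fall through to pageable memory
    }
    return malloc(size);
}
```
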
Alex Klinkhamer
b559cfc9a4 llama : let context be const when accessing const data (#1261) 2023-05-01 10:24:20 +03:00
Georgi Gerganov
96c1b57c16 ggml : fix UB (int << 31) 2023-04-30 22:28:51 +03:00
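
The bug class fixed here, in miniature: in C, left-shifting a 1 into the sign bit of a signed int is undefined behavior, so masks that reach bit 31 must be built on an unsigned type.

```cpp
int      ub = 1  << 31; // undefined behavior: shifts into the sign bit
unsigned ok = 1u << 31; // well-defined: 0x80000000
```
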
Pavol Rusnak
db80eecbb0 build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
2023-04-30 20:48:38 +02:00
jon-chuang
90c41e9a61 common : better default number of threads (#934)
* commit

* fix

* try-catch

* apply code review

* improve

* improve

* add macos headers

* done

* remove color

* fix windows

* minor

* fix

* Apply suggestions from code review

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

* remove

* minor

* minor

---------

Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
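
A minimal sketch of the idea behind #934: on SMT ("hyper-threaded") CPUs, matmul-bound inference usually runs best with one thread per physical core, so halving the logical-core count is a reasonable portable default. The actual patch queries platform APIs to count physical cores exactly; this is only the fallback heuristic:

```cpp
#include <algorithm>
#include <thread>

static int default_n_threads() {
    const unsigned logical = std::thread::hardware_concurrency(); // may be 0
    return (int) std::max(1u, logical / 2); // assume 2 logical per physical
}
```
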
0cc4m
ec68c5101d ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0

* Work around q5_0 OpenCL issue

* Fix q8_0 dequant kernel

* Move cl kernels into ggml-opencl.c

* Use two memcpy calls for q5_0 buffer transfer
2023-04-30 21:34:52 +03:00
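
For reference, a scalar C++ sketch of q5_0 dequantization, the operation the new CLBlast kernels perform on-device. The layout follows ggml's block_q5_0 (an fp16 scale, 32 packed high bits, 16 bytes of low nibbles, offset 16); fp16-to-fp32 conversion is omitted to stay self-contained:

```cpp
#include <cstdint>

// One 32-element q5_0 block.
struct block_q5_0 {
    uint16_t d;        // fp16 scale bits (conversion omitted here)
    uint8_t  qh[4];    // one high bit per element
    uint8_t  qs[16];   // low nibbles: elements i and i+16 share a byte
};

// d is the block scale already converted to float.
static void dequant_q5_0(const block_q5_0 * b, float d, float * y) {
    const uint32_t qh = (uint32_t) b->qh[0]
                      | ((uint32_t) b->qh[1] <<  8)
                      | ((uint32_t) b->qh[2] << 16)
                      | ((uint32_t) b->qh[3] << 24);
    for (int i = 0; i < 16; ++i) {
        const int lo = b->qs[i] & 0x0F; // element i
        const int hi = b->qs[i] >>   4; // element i + 16
        const int x0 = (lo | (int) (((qh >> (i +  0)) & 1) << 4)) - 16;
        const int x1 = (hi | (int) (((qh >> (i + 16)) & 1) << 4)) - 16;
        y[i]      = x0 * d;
        y[i + 16] = x1 * d;
    }
}
```
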
Georgi Gerganov
eac5ecc6d1 ggml : add Q5 WASM SIMD + GGML_FTYPE 2023-04-30 19:07:43 +03:00
Stephan Walter
8ceb2442b7 Various fixes to mat_mul benchmark (#1253) 2023-04-30 12:32:37 +00:00
Georgi Gerganov
bbe6fce16a ggml : fix labels for GGML_OP_ALIBI 2023-04-30 10:25:46 +03:00
Georgi Gerganov
986f07b45e ggml : fix 32-bit ARM NEON 2023-04-29 21:34:23 +03:00
Georgi Gerganov
3d0b3fb66f ggml : use vzip instead of vuzp for consistency 2023-04-29 21:12:56 +03:00
Georgi Gerganov
36bf79f4ba ggml : fix visibility and unused warnings 2023-04-29 19:28:36 +03:00
Georgi Gerganov
79baa0ef36 ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) 2023-04-29 18:43:42 +03:00
Georgi Gerganov
e40aa5185e ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast

* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS

* ggml : add asserts to guard for incorrect wsize
2023-04-29 18:43:28 +03:00
Georgi Gerganov
91dc074524 build : fix reference to old llama_util.h 2023-04-29 13:53:12 +03:00
Georgi Gerganov
d483156c17 examples : fix save-load-state + rename llama-util.h 2023-04-29 13:48:11 +03:00
Georgi Gerganov
1fb89c9913 common : change default parameters to pre-#1126 (#1223) 2023-04-29 09:51:06 +03:00
Ivan Stepanov
d00422bd62 llama : new sampling algorithms (#1126)
* Sample interface, new samplers.

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

* mirostat

* Added --logit-bias and --no-penalize-nl, removed std::span

* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)

* Save and load example adjust

* Tests

* Windows build fix

* Windows test fix
2023-04-29 08:34:41 +03:00
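
Of the new samplers, the frequency and presence penalties are the easiest to state compactly. A hedged sketch following the usual definition (each candidate's logit is lowered in proportion to how often the token already appeared, plus a flat amount if it appeared at all); parameter names are illustrative:

```cpp
#include <unordered_map>
#include <vector>

static void apply_penalties(std::vector<float> & logits,
                            const std::vector<int> & last_tokens,
                            float alpha_frequency, float alpha_presence) {
    std::unordered_map<int, int> counts;
    for (int tok : last_tokens) {
        counts[tok]++;
    }
    for (const auto & kv : counts) {
        // frequency term scales with the count; presence term is flat
        logits[kv.first] -= kv.second * alpha_frequency + alpha_presence;
    }
}
```
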
slaren
796c107b37 cuBLAS: use host pinned memory and dequantize while copying (#1207)
* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase
2023-04-29 02:04:18 +02:00
Henri Vasserman
19f0b58347 cuBLAS: non-contiguous tensor support (#1215)
* Cuda: non-contiguous tensor support

* remove extra stuff

* rename

* fix error

* more fixes, now OpenBLAS and CLBlast build too

* now then?
2023-04-29 01:31:56 +02:00
Stephan Walter
4aeea780bc Remove Q4_3 which is no better than Q5 (#1218) 2023-04-28 23:10:43 +00:00
Georgi Gerganov
3a8a3891b3 readme : update hot topics 2023-04-28 21:32:52 +03:00
Georgi Gerganov
b4e4b6fde1 ggml : sync ggml (ggml_alibi) 2023-04-28 20:51:05 +03:00
CRD716
c1343a8d6b examples : add Jeopardy example (#1168)
* Basic Setup

* Prevent Results.txt from coming up

* Prefixes, Line separators, etc

* editorcheck

* introduction to give more consistent results

* Basic graph thing

* Grading, ready for testing!

* Y'all ready to get funky?

* fix column removal stuff

* missed a few
2023-04-28 19:13:33 +03:00