2639 Commits

Author SHA1 Message Date
Adithya Balaji
5cdd9ef43f readme : update with CMake and windows example (#748)
* README: Update with CMake and windows example

* README: update with code-review for cmake build
2023-04-05 17:36:12 +03:00
at8u
06bf2b5f86 examples : add Miku.sh (#724)
* Add Miku.sh to examples

* Add missing line to prompt in Miku.sh

* Add --keep param to Miku.sh

* Remove '[end_of_conversation]' line from Miku.sh

It is no longer necessary.
2023-04-05 17:32:42 +03:00
Andrew Duffy
54aaf78743 Add Accelerate/BLAS when using Swift (#765) 2023-04-05 06:44:24 -04:00
mgroeber9110
c94c876bb6 Windows: reactivate sigint handler after each Ctrl-C (#736) 2023-04-03 18:00:55 +02:00
SebastianApel
10d8b9e8b9 10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (#654)
* Performance improvement of AVX2 code
* Fixed problem with MSVC compiler
* Reviewer comments: removed double semicolon, deleted empty line 1962
2023-04-03 09:52:28 +02:00
Ivan Stepanov
c59dd952e4 Define non-positive temperature behavior (#720) 2023-04-03 02:19:04 +02:00
bsilvereagle
dd873d495a Remove torch GPU dependencies from the Docker.full image (#665)
By using `pip install torch --index-url https://download.pytorch.org/whl/cpu`
instead of `pip install torch` we can specify we want to install a CPU-only version
of PyTorch without any GPU dependencies. This reduces the size of the Docker image
from 7.32 GB to 1.62 GB.
2023-04-03 00:13:03 +02:00
Thatcher Chamberlin
01e2261e5f Add a missing step to the gpt4all instructions (#690)
`migrate-ggml-2023-03-30-pr613.py` is needed to get gpt4all running.
2023-04-02 12:48:57 +02:00
Christian Falch
35405e6856 Added api for getting/setting the kv_cache (#685)
The API provides access methods for retrieving the current memory buffer for the kv_cache and its token count.
It also contains a method for setting the kv_cache from a memory buffer.

This makes it possible to load/save history - maybe support a --cache-prompt parameter as well?

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-02 12:23:04 +02:00
Marian Cepok
b61594c857 ggml : change ne to int64_t (#626) 2023-04-02 13:21:31 +03:00
Leonardo Neumann
727a6059fe examples : add gpt4all script (#658) 2023-04-02 10:56:20 +03:00
Stephan Walter
aa6766c975 llama : do not allocate KV cache for "vocab_only == true" (#682)
Fixes sanitizer CI
2023-04-02 10:18:53 +03:00
Fabian
6285e389b3 make : use -march=native -mtune=native on x86 (#609) 2023-04-02 10:17:05 +03:00
Murilo Santana
5b61d03180 fix default params for examples/main (#697) 2023-04-02 04:41:12 +02:00
Ikko Eltociear Ashimine
bb36bca0f8 py: huggingface -> Hugging Face (#686) 2023-04-01 18:38:18 +02:00
rimoliga
34977d15c2 readme: replace termux links with homepage, play store is deprecated (#680) 2023-04-01 16:57:30 +02:00
Slaren
8060cfc838 Show error message when -f fails 2023-04-01 16:08:40 +02:00
Stephan Walter
f7ea9fa785 Enable -std= for cmake builds, fix warnings (#598) 2023-03-31 19:19:16 +00:00
slaren
770361c7a7 Optimize AVX2 ggml_vec_dot_q4_0 (#642) 2023-03-31 15:55:52 +00:00
perserk
f49b853100 Add AVX acceleration (#617)
* ggml : add AVX quantize_row_q4_0()

* ggml : add AVX ggml_vec_dot_q4_0()

* ggml : refactor AVX part of ggml_vec_dot_q4_0()

https://github.com/ggerganov/llama.cpp/pull/617#issuecomment-1489985645
2023-03-31 13:55:44 +02:00
Pavol Rusnak
6cdc182e32 py : cleanup the code
- use f-strings where possible
- drop first param of encode/decode functions since "utf-8" is the default
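
A minimal sketch of the two cleanups listed above, written as an editorial illustration rather than taken from the scripts themselves. The names `token` and `index` are made up for the example. It relies only on standard Python behavior: f-strings format the same values as `%`-formatting, and `str.encode()` / `bytes.decode()` default to UTF-8.

```python
# Illustrative values only; these names do not come from the conversion scripts.
token = "héllo"
index = 7

# f-string in place of %-style formatting: same resulting text.
old_style = "token %d: %s" % (index, token)
new_style = f"token {index}: {token}"

# encode()/decode() default to UTF-8, so the explicit argument is redundant.
explicit = token.encode("utf-8")
implicit = token.encode()
round_trip = implicit.decode()
```

Both spellings produce identical results, so the change is purely cosmetic.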
2023-03-31 10:32:01 +02:00
Pavol Rusnak
e88a8002b5 drop quantize.py (now that models are using a single file) 2023-03-31 01:07:32 +02:00
Georgi Gerganov
e19e304480 readme : update supported models 2023-03-30 22:31:54 +03:00
Justine Tunney
45f44d8945 Introduce GGML migration tool for new file format
If you deleted your old Meta LLaMA .pth files, then the
migrate-ggml-2023-03-30-pr613.py script will allow you to convert your
old ggml files into the new mmap()'able format.

See #613
2023-03-30 12:28:25 -07:00
Justine Tunney
1eaba2c35b Ensure --mlock works properly with mmap() support 2023-03-30 12:28:25 -07:00
Justine Tunney
bb3e5452e9 Make loading weights 10-100x faster
This is a breaking change that's going to give you three benefits:

1. Your inference commands should load 100x faster
2. You may be able to safely load models 2x larger
3. You can run many concurrent inference processes

This was accomplished by changing the file format so we can mmap()
weights directly into memory without having to read() or copy them
thereby ensuring the kernel can make its file cache pages directly
accessible to our inference processes; and secondly, that the file
cache pages are much less likely to get evicted (which would force
loads to hit disk) because they're no longer competing with memory
pages that were needlessly created by gigabytes of standard i/o.
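
To illustrate the difference described above, here is a small Python sketch (not the project's C++ loader) that contrasts copying a file with read() against mapping it with mmap(). The toy file layout, the helper names `write_weights`, `load_by_read`, and `load_by_mmap`, and the sample values are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

def write_weights(path, values):
    """Write float32 values in native byte order (toy format, not the ggml format)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(values)}f", *values))

def load_by_read(path):
    """Copying approach: read() duplicates the whole file into process memory."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"={len(data) // 4}f", data))

def load_by_mmap(path):
    """Mapping approach: the pages stay in the kernel file cache and are viewed in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Views are released before the mapping is closed.
            with memoryview(mapped) as raw, raw.cast("f") as floats:
                # tolist() copies the values out only so they can be compared here.
                return floats.tolist()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.bin")
    write_weights(path, [0.5, -1.25, 3.0, 8.0])
    copied = load_by_read(path)
    mapped_values = load_by_mmap(path)
```

Both loaders return the same values. The difference is that the mapped version never duplicates the file contents until something asks for a copy, and a read-only mapping can be shared by several processes.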

The new file format supports single-file models like LLaMA 7B, and
it also supports multi-file models like LLaMA 13B. Our Python tool
now merges the foo.1, foo.2, etc. files back into a single file so
that the C++ code which maps it doesn't need to reshape data every
time. That's made llama.cpp so much simpler. Much of its load code
has now been deleted.

Furthermore, this change ensures that tensors are aligned properly
on a 32-byte boundary. That opens the door to seeing if we can get
additional performance gains on some microprocessors, by using ops
that require memory alignment.
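
As a rough illustration of the 32-byte alignment mentioned above, the following sketch rounds a file offset up to the next aligned boundary. It is an assumption about how such padding can be computed in general, not a copy of the project's code. `ALIGNMENT`, `align_up`, and `padding_needed` are hypothetical names.

```python
ALIGNMENT = 32  # bytes, the boundary named in the commit message

def align_up(offset, alignment=ALIGNMENT):
    """Round offset up to the next multiple of alignment (must be a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def padding_needed(offset, alignment=ALIGNMENT):
    """Number of padding bytes to write before data that starts at offset."""
    return align_up(offset, alignment) - offset
```

For example, a tensor whose header ends at byte 40 would need 24 bytes of padding so that its data starts at byte 64.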

Lastly, note that both POSIX and Windows platforms are supported.

Fixes #91
2023-03-30 12:28:25 -07:00
Slaren
81c13359bb Initial windows support (untested) 2023-03-30 12:28:25 -07:00
Slaren
c2aa32e62f Always initialize mm_addr and mm_length in llama_model 2023-03-30 12:28:25 -07:00
Slaren
7697211099 Unmap the file in llama_free 2023-03-30 12:28:25 -07:00
Slaren
4ccd1fa7b4 Make mmap_file static 2023-03-30 12:28:25 -07:00
Slaren
ee1eb8aab0 Fix ggml_init_params in quantize 2023-03-30 12:28:25 -07:00
Slaren
4608b1ee54 Add mmap support for model files 2023-03-30 12:28:25 -07:00
Stephan Walter
f27e27c590 cmake : properly invoke CTest (#629) 2023-03-30 20:56:59 +03:00
Casey Primozic
3b78ca3c81 Remove unused variable (#607)
* It seems some new warnings were added recently that exposed this. I wrote the code that originally included this unused variable, and it is indeed not needed.
2023-03-30 17:53:35 +00:00
david raistrick
dc396d9386 make : fix darwin f16c flags check (#615)
...there was no check. Ported upstream from https://github.com/zanussbaum/gpt4all.cpp/pull/2 (I don't see any clean path for upstream patches)
2023-03-30 20:34:45 +03:00
Georgi Gerganov
46bc56c86e ggml : fix NEON signs (close #620, #622) 2023-03-30 20:27:32 +03:00
slaren
c7a5aebde4 Fix GGML_F32Cx8_STORE in AVX without F16C path (#619) 2023-03-30 11:16:30 +02:00
anzz1
357f21576e ci : re-enable AVX512 testing (Windows-MSVC) (#584)
* CI: Re-enable AVX512 testing (Windows-MSVC)

Now with 100% less base64 encoding

* plain __cpuid is enough here
2023-03-29 23:44:39 +03:00
Georgi Gerganov
7639a7c89c ggml : init time on first ggml_init() call 2023-03-29 22:15:34 +03:00
Georgi Gerganov
ed1554989a llama : fix compile warnings when reading the vocab 2023-03-29 22:13:12 +03:00
Georgi Gerganov
169c724830 ggml : add ARM_NEON dequantize_row_q4_1() 2023-03-29 22:10:01 +03:00
Georgi Gerganov
31887afce7 ggml : add ARM_NEON quantize_row_q4_1() 2023-03-29 22:03:07 +03:00
Georgi Gerganov
fe3f4493ec ggml : add ARM_NEON ggml_vec_dot_q4_1() 2023-03-29 22:03:07 +03:00
Pavol Rusnak
f5b1f5b676 rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600)
to match filenames of other converters
2023-03-29 20:09:25 +02:00
Thérence
02ddd7f6d9 Create chat-13B.bat (#592)
* Create chat-13B.bat

Same script as chat-13B.sh, but for Windows users.
Tested and working on Windows 10/11 v 22H2

* Apply suggestions from code review

---------

Co-authored-by: anzz1 <anzz1@live.com>
2023-03-29 20:21:09 +03:00
Georgi Gerganov
32d84d4876 readme : fix typos 2023-03-29 19:38:31 +03:00
Georgi Gerganov
689ed6a51e readme : add GPT4All instructions (close #588) 2023-03-29 19:37:20 +03:00
Georgi Gerganov
39c1b01a04 py : add GPT4All conversion script
For now: copy-paste
It would take too much time for me to deduplicate the Python code
2023-03-29 19:29:52 +03:00
Maël Kerbiriou
462548c4a1 llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) 2023-03-29 19:10:07 +03:00
Tobias Lütke
0f9f0fdabf add example of re-act pattern (#583)
* add example of re-act pattern

* spelling...

* fixed whitespace in reverse prompt issue
2023-03-29 10:10:24 -05:00