Commit Graph

266 Commits

Author SHA1 Message Date
Justine Tunney
45f44d8945 Introduce GGML migration tool for new file format
If you deleted your old Meta LLaMA .pth files, then the
migrate-ggml-2023-03-30-pr613.py script will allow you to convert your
old ggml files into the new mmap()'able format.

See #613
2023-03-30 12:28:25 -07:00
Justine Tunney
1eaba2c35b Ensure --mlock works properly with mmap() support 2023-03-30 12:28:25 -07:00
Justine Tunney
bb3e5452e9 Make loading weights 10-100x faster
This is a breaking change that's going to give you three benefits:

1. Your inference commands should load 100x faster
2. You may be able to safely load models 2x larger
3. You can run many concurrent inference processes

This was accomplished by changing the file format so we can mmap()
weights directly into memory without having to read() or copy them
thereby ensuring the kernel can make its file cache pages directly
accessible to our inference processes; and secondly, that the file
cache pages are much less likely to get evicted (which would force
loads to hit disk) because they're no longer competing with memory
pages that were needlessly created by gigabytes of standard i/o.

The new file format supports single-file models like LLaMA 7b, and
it also supports multi-file models like LLaMA 13B. Our Python tool
now merges the foo.1, foo.2, etc. files back into a single file so
that the C++ code which maps it doesn't need to reshape data every
time. That's made llama.cpp so much simpler. Much of its load code
has now been deleted.

Furthermore, this change ensures that tensors are aligned properly
on a 32-byte boundary. That opens the door to seeing if we can get
additional performance gains on some microprocessors, by using ops
that require memory alignment.

Lastly note that both POSIX and the Windows platform are supported

Fixes #91
2023-03-30 12:28:25 -07:00
Slaren
81c13359bb Initial windows support (untested) 2023-03-30 12:28:25 -07:00
Slaren
c2aa32e62f Always initialize mm_addr and mm_length in llama_model 2023-03-30 12:28:25 -07:00
Slaren
7697211099 Unmap the file in llama_free 2023-03-30 12:28:25 -07:00
Slaren
4ccd1fa7b4 Make mmap_file static 2023-03-30 12:28:25 -07:00
Slaren
ee1eb8aab0 Fix ggml_init_params in quantize 2023-03-30 12:28:25 -07:00
Slaren
4608b1ee54 Add mmap support for model files 2023-03-30 12:28:25 -07:00
Stephan Walter
f27e27c590 cmake : properly invoke CTest (#629) 2023-03-30 20:56:59 +03:00
Casey Primozic
3b78ca3c81 Remove unused variable (#607)
* It seems some new warning were added recently that exposed this.  I wrote the code that included this unused variable originally and it is indeed not needed.
2023-03-30 17:53:35 +00:00
david raistrick
dc396d9386 make : fix darwin f16c flags check (#615)
...there was no check.  ported upstream from https://github.com/zanussbaum/gpt4all.cpp/pull/2 (I dont see any clean path for upstream patches)
2023-03-30 20:34:45 +03:00
Georgi Gerganov
46bc56c86e ggml : fix NEON signs (close #620, #622) 2023-03-30 20:27:32 +03:00
slaren
c7a5aebde4 Fix GGML_F32Cx8_STORE in AVX without F16C path (#619) 2023-03-30 11:16:30 +02:00
anzz1
357f21576e ci : re-enable AVX512 testing (Windows-MSVC) (#584)
* CI: Re-enable AVX512 testing (Windows-MSVC)

Now with 100% less base64 encoding

* plain __cpuid is enough here
2023-03-29 23:44:39 +03:00
Georgi Gerganov
7639a7c89c ggml : init time on first ggml_init() call 2023-03-29 22:15:34 +03:00
Georgi Gerganov
ed1554989a llama : fix compile warnings when reading the vocab 2023-03-29 22:13:12 +03:00
Georgi Gerganov
169c724830 ggml : add ARM_NEON dequantize_row_q4_1() 2023-03-29 22:10:01 +03:00
Georgi Gerganov
31887afce7 ggml : add ARM_NEON quantize_row_q4_1() 2023-03-29 22:03:07 +03:00
Georgi Gerganov
fe3f4493ec ggml : add ARM_NEON ggml_vec_dot_q4_1() 2023-03-29 22:03:07 +03:00
Pavol Rusnak
f5b1f5b676 rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600)
to match filenames of other converters
2023-03-29 20:09:25 +02:00
Thérence
02ddd7f6d9 Create chat-13B.bat (#592)
* Create chat-13B.bat

Same script than chat-13B.sh, but for windows users.
Tested and working on windows 10/11 v 22H2

* Apply suggestions from code review

---------

Co-authored-by: anzz1 <anzz1@live.com>
2023-03-29 20:21:09 +03:00
Georgi Gerganov
32d84d4876 readme : fix typos 2023-03-29 19:38:31 +03:00
Georgi Gerganov
689ed6a51e readme : add GPT4All instructions (close #588) 2023-03-29 19:37:20 +03:00
Georgi Gerganov
39c1b01a04 py : add GPT4All conversion script
For now: copy-paste
Too much time for me to deduplicate the python code
2023-03-29 19:29:52 +03:00
Maël Kerbiriou
462548c4a1 llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) 2023-03-29 19:10:07 +03:00
Tobias Lütke
0f9f0fdabf add example of re-act pattern (#583)
* add example of re-act pattern

* spelling...

* fixed whitespace in reverse prompt issue
2023-03-29 10:10:24 -05:00
anzz1
22ac42c847 Fix GCC warning about binary literal (#595)
0b10101010 -> 0xAA /* 0b10101010 */
2023-03-29 13:20:07 +00:00
anzz1
2b0da79a3a Fix typo in llama.h (#593) 2023-03-29 13:19:29 +00:00
anzz1
77f02cd5d0 Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375)
* Enable Fused-Multiply-Add (FMA) instructions on MSVC

__FMA__ macro does not exist in MSVC

* Enable F16C/CVT16 vector extensions on MSVC

__F16C__ macro does not exist in MSVC, but is implied with AVX2/AVX512

* MSVC cvt intrinsics

* Add __SSE3__ macro for MSVC too because why not

even though it's not currently used for anything when AVX is defined
2023-03-28 22:44:29 +03:00
anzz1
056cb367c5 CI: fix subdirectory path globbing (#546)
- Changes in subdirectories will now be detecter properly
- (Windows-MSVC) AVX512 tests temporarily disabled
2023-03-28 22:43:25 +03:00
anzz1
651675b679 llama : fix linkage with mingw (#551)
* Revert 7e53955 (#542)

Still needs to be fixed properly

* Fix linking on mingw32
2023-03-28 21:23:09 +03:00
slaren
2fd21ada5b ggml : add AVX2 implementation of quantize_row_q4_1 (#515)
* Add AVX2 implementation of quantize_row_q4_1

* Actually use AVX2

* Make quantize_row_q4_1 static

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 21:06:03 +03:00
thement
23728d6bd2 py : add temporary script to convert old ggml files to newer version (#539)
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-28 20:55:42 +03:00
Tai Duc Nguyen
73978d1ad2 py : add capabiliy to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) 2023-03-28 20:51:29 +03:00
Stephan Walter
223cad655e ggml : refactor quantized processing functions (#509)
* Refactor quantized processing functions

* ggml : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 20:13:01 +03:00
DooWoong Lee (David)
412b42ed29 py : removed unused model variable and verified that the code functions correctly with vocab_only setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547) 2023-03-28 20:02:34 +03:00
Georgi Gerganov
d4cd9f7004 ci : make ctest verbose, hopefully we see what is wrong with the sanitizer 2023-03-28 20:01:09 +03:00
Georgi Gerganov
c4f628288b tests : free llama context at the end of the test 2023-03-28 19:51:55 +03:00
Stephan Walter
188fb59d88 all : be more strict about converting float to double (#458)
* Be more strict about converting float to double

* Test equivalence of round, SILU implementations

Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.

* Fix softmax in perplexity.cpp

* all : prefer float over double where appropriate

* perplexity : add <cmath>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 19:48:20 +03:00
Jed Fox
a9b8ceaea2 deploy : add a Package.swift for SwiftPM support (#393)
* Add a Package.swift for SwiftPM support

* Swap from exclusions to allowlist
2023-03-28 19:39:01 +03:00
Stephan Walter
884f88402f ggml : introduce structs for the q4 data blocks (#356)
* Introduce structs for the q4 data blocks

* ggml : rename quant struct variables + fix ARM_NEON

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 18:56:03 +03:00
Georgi Gerganov
eba3e4dba3 gitignore : add "embedding" 2023-03-28 18:34:35 +03:00
dotpy314
19bf52b793 Check the existence of f16_model_path_base in quantize.py (#574)
Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com>
2023-03-28 18:06:28 +03:00
slaren
9ed607fdd5 Fix usage of F16C intrinsics in AVX code (#563)
* Fix usage of F16C intrinsics in AVX code when F16C is not defined
2023-03-28 17:26:55 +03:00
anzz1
68f43a13dc main.cpp fixes, refactoring (#571)
- main: entering empty line passes back control without new input in interactive/instruct modes
- instruct mode: keep prompt fix
- instruct mode: duplicate instruct prompt fix
- refactor: move common console code from main->common
2023-03-28 17:09:55 +03:00
RJ Adriaansen
d7f5b1ac65 Add embedding example to Makefile (#540) 2023-03-28 09:11:09 +03:00
Marco Matthies
63d2de599a Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) 2023-03-27 07:55:26 +03:00
Erik Scholz
2c6eed596e ci: add debug build to sanitizer build matrix (#527) 2023-03-26 15:48:40 +00:00
Stephan Walter
180198d957 Fix undefined variables in debug build, remove unused variables (#531) 2023-03-26 15:34:02 +00:00