Tobias Lütke
0f9f0fdabf
add example of re-act pattern ( #583 )
* add example of re-act pattern
* spelling...
* fixed whitespace in reverse prompt issue
2023-03-29 10:10:24 -05:00
anzz1
22ac42c847
Fix GCC warning about binary literal ( #595 )
0b10101010 -> 0xAA /* 0b10101010 */
2023-03-29 13:20:07 +00:00
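The literal change in #595 preserves the value: binary literals are a compiler extension in C before C23, while hexadecimal literals are standard. A quick check of the equivalence, written in Python purely for illustration (the original change is in C, and the variable names here are hypothetical):

```python
# Both spellings denote the same 8-bit pattern of alternating bits.
binary_form = 0b10101010
hex_form = 0xAA

assert binary_form == hex_form == 170

# Show the two notations side by side.
print(f"{hex_form:#04x} == {binary_form:#010b}")
```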
anzz1
2b0da79a3a
Fix typo in llama.h ( #593 )
2023-03-29 13:19:29 +00:00
anzz1
77f02cd5d0
Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC ( #375 )
* Enable Fused-Multiply-Add (FMA) instructions on MSVC
__FMA__ macro does not exist in MSVC
* Enable F16C/CVT16 vector extensions on MSVC
__F16C__ macro does not exist in MSVC, but is implied with AVX2/AVX512
* MSVC cvt intrinsics
* Add __SSE3__ macro for MSVC too because why not
even though it's not currently used for anything when AVX is defined
2023-03-28 22:44:29 +03:00
anzz1
056cb367c5
CI: fix subdirectory path globbing ( #546 )
- Changes in subdirectories will now be detected properly
- (Windows-MSVC) AVX512 tests temporarily disabled
2023-03-28 22:43:25 +03:00
anzz1
651675b679
llama : fix linkage with mingw ( #551 )
* Revert 7e53955 (#542 )
Still needs to be fixed properly
* Fix linking on mingw32
2023-03-28 21:23:09 +03:00
slaren
2fd21ada5b
ggml : add AVX2 implementation of quantize_row_q4_1 ( #515 )
* Add AVX2 implementation of quantize_row_q4_1
* Actually use AVX2
* Make quantize_row_q4_1 static
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2023-03-28 21:06:03 +03:00
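For readers unfamiliar with what quantize_row_q4_1 computes, the scalar idea can be sketched as follows. This is a minimal Python illustration, not the AVX2 code from #515. It assumes 32 values per block, a per-block scale and minimum, and two 4-bit codes packed per byte; the function names and the exact layout are assumptions, and the real ggml definitions may differ.

```python
QK = 32  # assumed number of values per block


def quantize_block_q4_1(values):
    """Quantize QK floats to 4-bit codes with a per-block scale and minimum."""
    assert len(values) == QK
    lo, hi = min(values), max(values)
    d = (hi - lo) / 15.0  # 4 bits give 16 levels: 0..15
    inv_d = 1.0 / d if d else 0.0
    codes = [min(15, int(round((v - lo) * inv_d))) for v in values]
    # Pack two 4-bit codes per byte, low nibble first.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, QK, 2))
    return d, lo, packed


def dequantize_block_q4_1(d, lo, packed):
    """Expand packed 4-bit codes back to approximate floats."""
    out = []
    for byte in packed:
        out.append((byte & 0x0F) * d + lo)
        out.append((byte >> 4) * d + lo)
    return out


values = [i / 10 for i in range(QK)]
d, lo, packed = quantize_block_q4_1(values)
restored = dequantize_block_q4_1(d, lo, packed)
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

In this sketch the round-trip error per value is bounded by half the scale `d`. A vectorised implementation applies the same arithmetic to several values at once.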
thement
23728d6bd2
py : add temporary script to convert old ggml files to newer version ( #539 )
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net >
2023-03-28 20:55:42 +03:00
Tai Duc Nguyen
73978d1ad2
py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning ( #403 )
2023-03-28 20:51:29 +03:00
Stephan Walter
223cad655e
ggml : refactor quantized processing functions ( #509 )
* Refactor quantized processing functions
* ggml : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2023-03-28 20:13:01 +03:00
DooWoong Lee (David)
412b42ed29
py : remove unused model variable; verified that the code works with the vocab_only setting and with the reduced memory usage that results from deleting the no-longer-needed variable ( #547 )
2023-03-28 20:02:34 +03:00
Georgi Gerganov
d4cd9f7004
ci : make ctest verbose, hopefully we see what is wrong with the sanitizer
2023-03-28 20:01:09 +03:00
Georgi Gerganov
c4f628288b
tests : free llama context at the end of the test
2023-03-28 19:51:55 +03:00
Stephan Walter
188fb59d88
all : be more strict about converting float to double ( #458 )
* Be more strict about converting float to double
* Test equivalence of round, SILU implementations
Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.
* Fix softmax in perplexity.cpp
* all : prefer float over double where appropriate
* perplexity : add <cmath>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2023-03-28 19:48:20 +03:00
Jed Fox
a9b8ceaea2
deploy : add a Package.swift for SwiftPM support ( #393 )
* Add a Package.swift for SwiftPM support
* Swap from exclusions to allowlist
2023-03-28 19:39:01 +03:00
Stephan Walter
884f88402f
ggml : introduce structs for the q4 data blocks ( #356 )
* Introduce structs for the q4 data blocks
* ggml : rename quant struct variables + fix ARM_NEON
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2023-03-28 18:56:03 +03:00
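As a rough picture of what a q4 data block might hold, the sketch below models one block as a fixed-size record: a 32-bit float scale followed by packed 4-bit codes. It is a Python illustration under stated assumptions (32 values per block, symmetric codes offset by 8, the name `BLOCK_Q4_0`, and the helper functions are all hypothetical); the struct definitions introduced in #356 are C and may differ.

```python
import struct

QK = 32  # assumed number of values per block

# Hypothetical layout: little-endian float32 scale, then QK/2 bytes of nibbles.
BLOCK_Q4_0 = struct.Struct(f"<f{QK // 2}s")


def pack_block_q4_0(values):
    """Quantize QK floats symmetrically and serialise them as one block."""
    assert len(values) == QK
    amax = max(abs(v) for v in values)
    d = amax / 7.0  # codes span -7..7 before the +8 offset
    inv_d = 1.0 / d if d else 0.0
    codes = [int(round(v * inv_d)) + 8 for v in values]
    qs = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, QK, 2))
    return BLOCK_Q4_0.pack(d, qs)


def unpack_block_q4_0(raw):
    """Read the scale and nibbles back and return approximate floats."""
    d, qs = BLOCK_Q4_0.unpack(raw)
    out = []
    for byte in qs:
        out.append(((byte & 0x0F) - 8) * d)
        out.append(((byte >> 4) - 8) * d)
    return out


values = [(i - 16) / 4 for i in range(QK)]
raw = pack_block_q4_0(values)
restored = unpack_block_q4_0(raw)
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

Keeping the scale and the codes in one record means each block can be addressed as a unit, rather than through separate offsets into a raw byte buffer.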
Georgi Gerganov
eba3e4dba3
gitignore : add "embedding"
2023-03-28 18:34:35 +03:00
dotpy314
19bf52b793
Check the existence of f16_model_path_base in quantize.py ( #574 )
Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com >
2023-03-28 18:06:28 +03:00
slaren
9ed607fdd5
Fix usage of F16C intrinsics in AVX code ( #563 )
* Fix usage of F16C intrinsics in AVX code when F16C is not defined
2023-03-28 17:26:55 +03:00
anzz1
68f43a13dc
main.cpp fixes, refactoring ( #571 )
- main: entering empty line passes back control without new input in interactive/instruct modes
- instruct mode: keep prompt fix
- instruct mode: duplicate instruct prompt fix
- refactor: move common console code from main->common
2023-03-28 17:09:55 +03:00
RJ Adriaansen
d7f5b1ac65
Add embedding example to Makefile ( #540 )
2023-03-28 09:11:09 +03:00
Marco Matthies
63d2de599a
Fix missing ggml link in cmake for examples/* on w64-mingw32 ( #542 )
2023-03-27 07:55:26 +03:00
Erik Scholz
2c6eed596e
ci: add debug build to sanitizer build matrix ( #527 )
2023-03-26 15:48:40 +00:00
Stephan Walter
180198d957
Fix undefined variables in debug build, remove unused variables ( #531 )
2023-03-26 15:34:02 +00:00
Juan Calderon-Perez
47fc0b82b4
Add support for linux/arm64 platform during Docker Builds ( #514 )
* Add support for linux/arm64 platform
* Add platform to versioned builds
2023-03-26 14:48:42 +00:00
Stephan Walter
3b8b2c584a
Update README and comments for standalone perplexity tool ( #525 )
2023-03-26 16:14:01 +03:00
anzz1
a990294c27
[main] fix infinite generation (-n == -1) ( #523 )
2023-03-26 16:06:10 +03:00
Georgi Gerganov
3600f1d140
Add logo to README.md
2023-03-26 10:20:49 +03:00
Harald Fernengel
85e558b4ad
Exit from interactive mode if input stream is bad ( #491 )
Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z
on Windows.
2023-03-26 08:25:46 +03:00
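The behaviour described in #491 (leave the interactive loop once the input stream ends, which is what CTRL-D on Unix and CTRL-Z on Windows signal) can be sketched in Python. The function name `interactive_loop` and the callback are hypothetical; the actual change is in the C++ main program.

```python
import io


def interactive_loop(stream, handle_line):
    """Read lines until the stream ends; return how many lines were handled."""
    handled = 0
    while True:
        line = stream.readline()
        if line == "":  # readline() returns "" only at end of input
            break
        handle_line(line.rstrip("\n"))
        handled += 1
    return handled


# Simulate a user typing two lines and then sending end-of-file.
seen = []
count = interactive_loop(io.StringIO("hello\nworld\n"), seen.append)
```

Without the end-of-input check, a loop like this would keep spinning on an exhausted stream instead of returning.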
anzz1
5c63c02491
CI: Run other sanitizer builds even if one fails ( #511 )
applies only to sanitizer builds so they won't be cancelled
2023-03-26 00:13:28 +02:00
jp-x-g
9c2b80f69b
Clarify console output in convert-pth-to-ggml.py ( #512 )
"Processing part 1 of 3" instead of "Processing part 0"
2023-03-25 23:53:55 +02:00
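The message change in #512 amounts to printing a one-based counter together with the total. A minimal sketch, with a hypothetical helper name (the script's real code is not shown in this log):

```python
def progress_message(part_index, n_parts):
    """Format a zero-based part index as a one-based, human-readable message."""
    return f"Processing part {part_index + 1} of {n_parts}"


messages = [progress_message(i, 3) for i in range(3)]
for message in messages:
    print(message)
```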
anzz1
1ea6448129
CMake / CI additions ( #497 )
* CMake: Add AVX512 option
* CI: Add AVX/AVX512 builds (Windows)
(AVX512 tests can only be run when the worker happens to support it, building works anyway)
* CMake: Fix sanitizer linkage ( merged #468 )
* CI: Add sanitizer builds (Ubuntu)
* CI: Fix release tagging
(change @zendesk/action-create-release to @anzz1/action-create-release until the upstream PR "Added commitish as input" (zendesk/action-create-release#32) is merged)
2023-03-25 23:38:11 +02:00
anzz1
f8eb92869e
(Windows) Set console to UTF-8 on init ( #420 )
Sets the console codepage to 65001 (CP_UTF8) on start for both input and output, which should fix problems with UTF-8 characters.
2023-03-25 22:29:22 +02:00
Georgi Gerganov
2e01c018d2
Fix colors enabling on WIN32
2023-03-25 21:53:39 +02:00
Georgi Gerganov
9fe0e95688
If n_predict == -1, generate forever
2023-03-25 21:51:41 +02:00
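One simple way to get the "-1 means forever" behaviour is a countdown that stops only when it reaches exactly zero, so a negative start value never terminates the loop. The Python sketch below illustrates that idea; `generate`, `next_token` and `stop_token` are hypothetical names, and the actual logic lives in the C++ main program.

```python
import itertools


def generate(next_token, n_predict, stop_token=None):
    """Yield tokens until n_predict have been produced; -1 means no limit."""
    remaining = n_predict
    while remaining != 0:
        token = next_token()
        if token == stop_token:
            return
        yield token
        remaining -= 1  # from -1 this only moves further from zero


# A bounded run produces exactly n_predict tokens.
first_three = list(generate(itertools.count().__next__, 3))

# An unbounded run has to be cut off by the caller.
capped = list(itertools.islice(generate(itertools.count().__next__, -1), 1000))
```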
Georgi Gerganov
310d5d09a3
Infinite generation via context swapping ( #71 )
2023-03-25 21:36:22 +02:00
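The commit title does not spell out the swapping policy, so the following is only one plausible sketch of the general idea: once the token window is full, keep a fixed prefix and the most recent half of the remaining tokens, which frees room for further generation. The function name and the parameters `n_ctx` and `n_keep` are assumptions made for illustration.

```python
def swap_context(tokens, n_ctx, n_keep):
    """Return a shortened token window once the context is full."""
    if len(tokens) < n_ctx:
        return tokens  # still room, nothing to do
    n_left = len(tokens) - n_keep
    # Keep the first n_keep tokens and the most recent half of the rest.
    return tokens[:n_keep] + tokens[len(tokens) - n_left // 2:]


full_window = list(range(8))
swapped = swap_context(full_window, n_ctx=8, n_keep=2)
untouched = swap_context([0, 1, 2], n_ctx=8, n_keep=2)
```

Dropping the oldest non-prefix tokens trades some long-range context for the ability to keep generating past the window size.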
Georgi Gerganov
3468a153ba
Cleanup STL headers + fix embedding examples + minor stuff
2023-03-25 20:51:14 +02:00
Georgi Gerganov
9d678e17dc
Move chat scripts into "./examples"
2023-03-25 20:37:09 +02:00
slaren
4b720d5b92
Add AVX2 implementation of dequantize_row_q4_1 ( #505 )
2023-03-25 20:31:48 +02:00
Georgi Gerganov
84db7c0b8f
Overhaul the examples structure
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"
Hope I didn't break anything!
2023-03-25 20:26:40 +02:00
Georgi Gerganov
56e7297bbd
Retire the ggml_mul_mat() branch for transposed src0 ( #500 )
* Retire the ggml_mul_mat() for transposed src0
- It can always be made contiguous with ggml_cpy()
- The code is now simplified
- The results are deterministic with respect to the number of threads
* SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502 )
* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON
* Fix dequantization - forgot to interleave the quants
2023-03-25 19:47:21 +02:00
Georgi Gerganov
d2336726ee
Disable prompt verbosity by default and add option to enable ( #480 )
2023-03-25 17:17:16 +02:00
slaren
432b98793c
Add AVX2 implementation of dequantize_row_q4_0 ( #467 )
2023-03-25 17:06:49 +02:00
Georgi Gerganov
9f8548b2d5
Don't interfere with BLAS for large prompts by running only 1 thread
2023-03-25 17:03:10 +02:00
Georgi Gerganov
f6a2b1fc20
Add longer DAN prompt for testing big batch numbers
2023-03-25 16:49:09 +02:00
slaren
e66804f2d7
Add timings for the prompt evaluation ( #478 )
2023-03-25 16:34:23 +02:00
Georgi Gerganov
1c1459f073
Remove obsolete information from README
2023-03-25 16:30:32 +02:00
Georgi Gerganov
39ab880ccd
Remove obsolete assert and fix compiler warning
2023-03-25 16:22:05 +02:00
Georgi Gerganov
0bbf9a17e7
Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS
2023-03-25 16:10:14 +02:00
anzz1
f60b207880
bounds checking for input prefix ( #492 )
2023-03-25 14:42:09 +02:00