Matvey Soloviev
a30749e299
Fix token count accounting
2023-03-13 01:04:41 +01:00
Georgi Gerganov
49a8c7675b
Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.
There are some reports of illegal instruction errors.
Moved this to the vdotq_s32 branch until resolved.
2023-03-13 01:28:08 +02:00
Georgi Gerganov
c47fa0ea5e
Check for vdotq_s32 availability
2023-03-13 01:21:03 +02:00
Georgi Gerganov
c00675331e
Amend previous commit - forgot to update non-QRDMX branch
2023-03-13 01:05:24 +02:00
Georgi Gerganov
f48b7628ea
10% performance boost on ARM
2023-03-13 00:56:10 +02:00
Matvey Soloviev
fedc405b41
Fix color getting reset before prompt output done (#65)
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
2023-03-13 00:07:34 +02:00
Georgi Gerganov
c240cd1e05
Update README.md
2023-03-12 23:39:01 +02:00
Matvey Soloviev
d35528087e
Add interactive mode (#61)
* Initial work on interactive mode.
* Improve interactive mode. Make rev. prompt optional.
* Update README to explain interactive mode.
* Fix OS X build
2023-03-12 23:13:28 +02:00
Marc Köhlbrugge
8de246c2d8
Fix typo in README (#45)
2023-03-12 22:30:08 +02:00
Ben Garney
7a708ee9b0
Allow using prompt files (#59)
2023-03-12 22:28:36 +02:00
beiller
c763dc1bc2
Add back top_k (#56)
* Add back top_k
* Update utils.cpp
* Update utils.h
---------
Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 22:23:15 +02:00
Sebastián A
fde84afbed
Windows fixes (#31)
* Apply suggested fixes to build on Windows
Issue: https://github.com/ggerganov/llama.cpp/issues/22
* Remove unsupported VLAs
* MSVC: Remove features that are only available on MSVC C++20.
* Fix zero initialization of the other fields.
* Change the use of vector for stack allocations.
2023-03-12 22:15:00 +02:00
Georgi Gerganov
f6f3f1c7c1
Update README.md
2023-03-12 22:09:26 +02:00
Georgi Gerganov
1f0283048a
Add CI (#60)
2023-03-12 22:08:24 +02:00
Georgi Gerganov
85c71945cf
Revert "weights_only" arg - it causes more trouble than it helps
2023-03-12 20:59:01 +02:00
Oleksandr Nikitin
a7cf72d75e
python/pytorch compat notes (#44)
2023-03-12 14:16:33 +02:00
beiller
a63a748bba
Add repetition penalty (#20)
* Adding repeat penalization
* Update utils.h
* Update utils.cpp
* Numeric fix
Should probably still scale by temp even if penalized
* Update comments, more proper application
I see that numbers can go negative, so apply a fix from a referenced commit
* Minor formatting
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 11:27:42 +02:00
Georgi Gerganov
dc91ec5d67
Clarify meaning of hacking
2023-03-12 09:03:25 +02:00
Georgi Gerganov
95fb97b137
README: add "Supported platforms" + update hot topics
2023-03-12 08:41:54 +02:00
deepdiffuser
a9b53a036b
use weights_only in conversion script (#32)
this restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries
2023-03-12 08:36:35 +02:00
Pavol Rusnak
865eff3820
Add LICENSE (#21)
2023-03-12 08:36:03 +02:00
Georgi Gerganov
e34e3e21c4
Update README.md
2023-03-12 01:26:32 +02:00
Juraj Bednar
4cdcd39348
Fix a typo in model name (#16)
2023-03-11 19:32:20 +02:00
Georgi Gerganov
284d9be2de
Update README.md
2023-03-11 18:10:18 +02:00
Georgi Gerganov
cc0f26bef3
Add AVX2 support for x86 architectures thanks to @Const-me!
2023-03-11 18:04:25 +02:00
Georgi Gerganov
bc3184cb2d
Fix uninitialized FP16 tables on x86 (#15, #2)
2023-03-11 17:40:14 +02:00
Georgi Gerganov
5afe16962e
Bump memory buffer
2023-03-11 12:45:01 +02:00
Georgi Gerganov
35cb0d2a39
Update README.md
2023-03-11 12:31:21 +02:00
Georgi Gerganov
34bb8821d6
.gitignore models/
2023-03-11 12:27:02 +02:00
Georgi Gerganov
2d2cadab68
Update Makefile var + add comment
2023-03-11 12:27:02 +02:00
Georgi Gerganov
657074b014
Update README.md
2023-03-11 11:34:25 +02:00
Georgi Gerganov
b53c6356f3
Update README.md
2023-03-11 11:34:11 +02:00
Georgi Gerganov
a2799521b9
Support all LLaMA models + change Q4_0 quantization storage
2023-03-11 11:28:30 +02:00
Simon Willison
d4919344b1
Include Python dependencies in README (#6)
2023-03-11 07:47:26 +02:00
Georgi Gerganov
11dae511e3
Update README.md
2023-03-11 01:30:47 +02:00
Georgi Gerganov
240b0bf6ea
Update README.md
2023-03-11 01:22:58 +02:00
Georgi Gerganov
87da10c739
Update README.md
2023-03-11 01:18:10 +02:00
Jean-Michaël Celerier
12131a74cb
Add missing headers for memcpy and assert (#3)
2023-03-11 01:04:06 +02:00
Georgi Gerganov
01e3d38e1c
Update README.md
2023-03-11 00:55:22 +02:00
Georgi Gerganov
8d38e7e279
Update README.md
2023-03-11 00:51:46 +02:00
Georgi Gerganov
586e0f1f3d
Update README.md
2023-03-11 00:09:19 +02:00
Georgi Gerganov
4c7f13c170
Update README.md
2023-03-10 23:53:11 +02:00
Georgi Gerganov
8453184bb2
Fix a bug in the rope calculation
2023-03-10 23:46:57 +02:00
Georgi Gerganov
44f3a5b932
Update README.md
2023-03-10 21:52:27 +02:00
Georgi Gerganov
3cda59d04e
Final touches
2023-03-10 21:50:46 +02:00
Georgi Gerganov
b2a7bb3e19
Create README.md
2023-03-10 21:47:46 +02:00
Georgi Gerganov
4b5b86d6ee
Initial release
2023-03-10 20:56:40 +02:00