Commit Graph

32 Commits

Author SHA1 Message Date
tjohnman
815b60c690 Don't force immediate interactive without -i (#354)
* Don't force immediate interactive without -i

Sometimes we might want to use a reverse prompt but we want to let the
model generate tokens right after the initial prompt. So we don't force
user input mode if the -i flag wasn't specified and instead let it run
until we encounter the reverse prompt.

This gives use some more flexibility, since it doesn't force the user to
enter a newline if they want to let the model generate text right after
the initial prompt and only be asked for input if the reverse prompt is
encountered.

The `--interactive-first` flag is reintroduced to force the old
behavior. `-r` behaves like `-i` plus introduces a reverse prompt (it
can be specified more than once).

* Update help output.

---------

Co-authored-by: Johnman <tjohnman@github>
2023-03-22 19:16:35 +02:00
Erik Scholz
48c8ad5bcf fix perplexity after c-api refactor (#390)
* preallocate a buffer of fitting size for tokenization (utils.cpp)

* don't create a new std::string (especially here, where it's usually large)
2023-03-22 18:09:38 +02:00
Georgi Gerganov
dc2abfbdf0 When seed <= 0 - use the clock to generate one 2023-03-22 07:47:15 +02:00
Georgi Gerganov
c40b5b3d59 Introduce C-style API (#370)
* Major refactoring - introduce C-style API

* Clean up

* Add <cassert>

* Add <iterator>

* Add <algorithm> ....

* Fix timing reporting and accumulation

* Measure eval time only for single-token calls

* Change llama_tokenize return meaning
2023-03-22 07:32:36 +02:00
Fabio R. Sluzala
edb2b84e60 We could use std::unordered_map over std::map (#305)
* Improve performance by changing std::map to std::unordered_map and std::map<id, token> id_to_token; to std::vector<token> id_to_token;

* fix last commit on gpt_vocab_init add vocab.id_to_token.resize(vocab.token_to_id.size());

* Removed include <map>

* Nest struct token score inside gpt_vocab

* renamed token to tok
2023-03-21 19:21:50 +02:00
Gary Linscott
4458de7520 Compute perplexity over prompt (#270)
* Compute perplexity over prompt

* More accurate perplexity calculation - over all logits in the context window (so 512x more tokens!)

* Output all perplexitiies

* Add timing/ETA
2023-03-21 18:27:42 +02:00
Kevin Lo
f2222d00b1 Add OpenBSD support (#314) 2023-03-21 17:50:09 +02:00
anzz1
a2c6b64d8d cmdline option for custom amount of model parts (--n_parts N) (#348)
* cmdline option for custom amount of model parts (--n_parts N)

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-21 17:42:43 +02:00
Georgi Gerganov
26d3089d76 Add tokenizer test + revert to C++11 (#355)
* Add test-tokenizer-0 to do a few tokenizations - feel free to expand
* Added option to convert-pth-to-ggml.py script to dump just the vocabulary
* Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
* Added utility to load vocabulary file from previous point (temporary implementation)
* Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
* Rename gpt_vocab -> llama_vocab
* All CMake binaries go into ./bin/ now
2023-03-21 17:29:41 +02:00
Mack Straight
60d93896be sentencepiece bpe compatible tokenizer (#252)
* potential out of bounds read

* fix quantize

* style

* Update convert-pth-to-ggml.py

* mild cleanup

* don't need the space-prefixing here rn since main.cpp already does it

* new file magic + version header field

* readme notice

* missing newlines

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-03-20 03:17:23 -07:00
tjohnman
3a9d372c08 Support for multiple reverse prompts. (#299)
Co-authored-by: Johnman <>
Co-authored-by: Johnman <tjohnman@github>
2023-03-19 21:33:06 +02:00
tjohnman
eacfb91e66 Make prompt randomization optional. (#300)
Co-authored-by: Johnman <>
2023-03-19 20:36:19 +02:00
slaren
d51eb6a70f Add --ignore-eos parameter (#181)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:22:48 +02:00
Erik Scholz
30ea62e25d Command line switch to use F16 for memory_k and memory_v (refactor of #154) (#294)
* Use F16 for memory_k and memory_v

* add command line switch to use f16 instead of f32 for memory k+v

---------

Co-authored-by: Ty Everett <ty@tyweb.us>
2023-03-19 19:57:00 +02:00
Georgi Gerganov
5ff8d6f48e Drop trailing new line from file prompts (#80) 2023-03-19 19:05:04 +02:00
Georgi Gerganov
2ad520e892 Add "--instruct" argument for usage with Alpaca (#240)
Also start adding prompts in "./prompts"
2023-03-19 18:37:02 +02:00
Gary Linscott
ce7029fe88 Fix n^2 loop in tokenization (#254)
This causes long prompts to parse very slowly.
2023-03-18 11:17:19 +00:00
thement
95eab2152b Implement non-greedy tokenizer that tries to maximize token lengths (#242)
* Implement non-greedy tokenizer that tries to maximize token lengths

* Insert single space in front of the prompt

- this is to match original llama tokenizer behavior

---------

Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-17 21:05:58 +01:00
Stephan Walter
45113b2f42 Don't tell users to use a bad number of threads (#243)
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
2023-03-17 19:47:35 +02:00
Matvey Soloviev
03c8e88515 Q4_1 quantization (#193)
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
2023-03-17 06:48:39 +02:00
Justin Suess
a4d17b7096 added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:42:40 +02:00
Thomas Klausner
d3ed019b74 Add NetBSD support. (#90) 2023-03-13 18:40:54 +02:00
Matvey Soloviev
d35528087e Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
2023-03-12 23:13:28 +02:00
Ben Garney
7a708ee9b0 Allow using prompt files (#59) 2023-03-12 22:28:36 +02:00
beiller
c763dc1bc2 Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 22:23:15 +02:00
Sebastián A
fde84afbed Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
2023-03-12 22:15:00 +02:00
beiller
a63a748bba Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative so a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 11:27:42 +02:00
Georgi Gerganov
a2799521b9 Support all LLaMA models + change Q4_0 quantization storage 2023-03-11 11:28:30 +02:00
Jean-Michaël Celerier
12131a74cb Add missing headers for memcpy and assert (#3) 2023-03-11 01:04:06 +02:00
Georgi Gerganov
8453184bb2 Fix a bug in the rope calculation 2023-03-10 23:46:57 +02:00
Georgi Gerganov
3cda59d04e Final touches 2023-03-10 21:50:46 +02:00
Georgi Gerganov
4b5b86d6ee Initial release 2023-03-10 20:56:40 +02:00