ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-28 17:14:17 +00:00

Author	SHA1	Message	Date
DooWoong Lee (David)	412b42ed29	py : removed unused `model` variable and verified that the code functions correctly with `vocab_only` setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547 )	2023-03-28 20:02:34 +03:00
Georgi Gerganov	d4cd9f7004	ci : make ctest verbose, hopefully we see what is wrong with the sanitizer	2023-03-28 20:01:09 +03:00
Georgi Gerganov	c4f628288b	tests : free llama context at the end of the test	2023-03-28 19:51:55 +03:00
Stephan Walter	188fb59d88	all : be more strict about converting float to double (#458 ) * Be more strict about converting float to double * Test equivalence of round, SILU implementations Test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes. * Fix softmax in perplexity.cpp * all : prefer float over double where appropriate * perplexity : add <cmath> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 19:48:20 +03:00
Jed Fox	a9b8ceaea2	deploy : add a Package.swift for SwiftPM support (#393 ) * Add a Package.swift for SwiftPM support * Swap from exclusions to allowlist	2023-03-28 19:39:01 +03:00
Stephan Walter	884f88402f	ggml : introduce structs for the q4 data blocks (#356 ) * Introduce structs for the q4 data blocks * ggml : rename quant struct variables + fix ARM_NEON --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 18:56:03 +03:00
Georgi Gerganov	eba3e4dba3	gitignore : add "embedding"	2023-03-28 18:34:35 +03:00
dotpy314	19bf52b793	Check the existence of f16_model_path_base in quantize.py (#574 ) Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com>	2023-03-28 18:06:28 +03:00
slaren	9ed607fdd5	Fix usage of F16C intrinsics in AVX code (#563 ) * Fix usage of F16C intrinsics in AVX code when F16C is not defined	2023-03-28 17:26:55 +03:00
anzz1	68f43a13dc	main.cpp fixes, refactoring (#571 ) - main: entering empty line passes back control without new input in interactive/instruct modes - instruct mode: keep prompt fix - instruct mode: duplicate instruct prompt fix - refactor: move common console code from main->common	2023-03-28 17:09:55 +03:00
RJ Adriaansen	d7f5b1ac65	Add embedding example to Makefile (#540 )	2023-03-28 09:11:09 +03:00
Marco Matthies	63d2de599a	Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542 )	2023-03-27 07:55:26 +03:00
Erik Scholz	2c6eed596e	ci: add debug build to sanitizer build matrix (#527 )	2023-03-26 15:48:40 +00:00
Stephan Walter	180198d957	Fix undefined variables in debug build, remove unused variables (#531 )	2023-03-26 15:34:02 +00:00
Juan Calderon-Perez	47fc0b82b4	Add support for linux/arm64 platform during Docker Builds (#514 ) * Add support for linux/arm64 platform * Add platform to versioned builds	2023-03-26 14:48:42 +00:00
Stephan Walter	3b8b2c584a	Update README and comments for standalone perplexity tool (#525 )	2023-03-26 16:14:01 +03:00
anzz1	a990294c27	[main] fix infinite generation (-n == -1) (#523 )	2023-03-26 16:06:10 +03:00
Georgi Gerganov	3600f1d140	Add logo to README.md	2023-03-26 10:20:49 +03:00
Harald Fernengel	85e558b4ad	Exit from interactive mode if input stream is bad (#491 ) Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z on Windows.	2023-03-26 08:25:46 +03:00
anzz1	5c63c02491	CI: Run other sanitizer builds even if one fails (#511 ) applies only to sanitizer builds so they won't be cancelled	2023-03-26 00:13:28 +02:00
jp-x-g	9c2b80f69b	Clarify console output in convert-pth-to-ggml.py (#512 ) "Processing part 1 of 3" instead of "Processing part 0"	2023-03-25 23:53:55 +02:00
anzz1	1ea6448129	CMake / CI additions (#497 ) * CMake: Add AVX512 option * CI: Add AVX/AVX512 builds (Windows) (AVX512 tests can only be run when the worker happens to support it, building works anyway) * CMake: Fix sanitizer linkage ( merged #468 ) * CI: Add sanitizer builds (Ubuntu) * CI: Fix release tagging (change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR zendesk/action-create-release#32, "Added commitish as input", is merged)	2023-03-25 23:38:11 +02:00
anzz1	f8eb92869e	(Windows) Set console to UTF-8 on init (#420 ) Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters.	2023-03-25 22:29:22 +02:00
Georgi Gerganov	2e01c018d2	Fix colors enabling on WIN32	2023-03-25 21:53:39 +02:00
Georgi Gerganov	9fe0e95688	If n_predict == -1, generate forever	2023-03-25 21:51:41 +02:00
Georgi Gerganov	310d5d09a3	Infinite generation via context swapping (#71 )	2023-03-25 21:36:22 +02:00
Georgi Gerganov	3468a153ba	Cleanup STL headers + fix embedding examples + minor stuff	2023-03-25 20:51:14 +02:00
Georgi Gerganov	9d678e17dc	Move chat scripts into "./examples"	2023-03-25 20:37:09 +02:00
slaren	4b720d5b92	Add AVX2 implementation of dequantize_row_q4_1 (#505 )	2023-03-25 20:31:48 +02:00
Georgi Gerganov	84db7c0b8f	Overhaul the examples structure - main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something !	2023-03-25 20:26:40 +02:00
Georgi Gerganov	56e7297bbd	Retire the ggml_mul_mat() branch for transposed src0 (#500 ) * Retire the ggml_mul_mat() for transposed src0 - It can always be made contiguous with ggml_cpy() - The code is now simplified - The results are deterministic in respect to num threads * SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502) * Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON * Fix dequantization - forgot to interleave the quants	2023-03-25 19:47:21 +02:00
Georgi Gerganov	d2336726ee	Disable prompt verbosity by default and add option to enable (#480 )	2023-03-25 17:17:16 +02:00
slaren	432b98793c	Add AVX2 implementation of dequantize_row_q4_0 (#467 )	2023-03-25 17:06:49 +02:00
Georgi Gerganov	9f8548b2d5	Don't interfere with BLAS for large prompts by running only 1 thread	2023-03-25 17:03:10 +02:00
Georgi Gerganov	f6a2b1fc20	Add longer DAN prompt for testing big batch numbers	2023-03-25 16:49:09 +02:00
slaren	e66804f2d7	Add timings for the prompt evaluation (#478 )	2023-03-25 16:34:23 +02:00
Georgi Gerganov	1c1459f073	Remove obsolete information from README	2023-03-25 16:30:32 +02:00
Georgi Gerganov	39ab880ccd	Remove obsolete assert and fix compiler warning	2023-03-25 16:22:05 +02:00
Georgi Gerganov	0bbf9a17e7	Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS	2023-03-25 16:10:14 +02:00
anzz1	f60b207880	bounds checking for input prefix (#492 )	2023-03-25 14:42:09 +02:00
anzz1	e0522e5dd3	feat: '--in-prefix STRING' option (#426 ) Prefix user inputs with a string	2023-03-25 14:03:19 +02:00
Jed Fox	3261abc446	Add support for file load progress reporting callbacks (#434 ) * File load progress reporting * Move llama_progress_handler into llama_context_params * Renames * Use seekg to find file size instead * More correct load progress * Call progress callback more frequently * Fix typo	2023-03-25 07:26:28 +02:00
Doomsdayrs	27d29a069f	Add missing struct annotation (#483 ) `llama_sample_top_p_top_k` was missing the struct annotation on line 126. This causes a compiler issue when being parsed by the Kotlin C interop generator. This commit fixes the above issue by adding the struct annotation.	2023-03-25 07:21:24 +02:00
Chris Kuehl	9ba873f48c	Fix crash for 65B model with pre-allocated memory (#485 )	2023-03-25 06:38:14 +02:00
Georgi Gerganov	0965918677	Disable BLAS altogether - the bug is not just for quantized mat mul	2023-03-24 23:47:06 +02:00
Georgi Gerganov	76e580d933	Disable BLAS branch in mul_mat - seems there is a bug	2023-03-24 23:39:17 +02:00
Georgi Gerganov	ba186f7f64	Immediately start processing the prompt before user input has been provided (#476 )	2023-03-24 23:17:58 +02:00
Georgi Gerganov	92dc17b275	Reduce memory usage and allocate enough memory for largest context (#473 ) * Reduce memory usage and allocate enough memory for large contexts * Simpler scratch buffer usage * Reenable BLAS for quantized mul_mat * Fix number of layers in 30B and 65B * Fix KV cache size for F32	2023-03-24 23:17:37 +02:00
Georgi Gerganov	a1a48cfccb	Temporarily bump the memory buffer size - hopefully fix issues from `483bab2e`	2023-03-24 18:23:56 +02:00
Gary Mulder	ccf5a1b08d	Update README.md (#444 ) Added explicit bolded instructions clarifying that people need to request access to models from Facebook and never through this repo.	2023-03-24 15:23:09 +00:00

230 Commits