ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Author	SHA1	Message	Date
Pavol Rusnak	e429f5b9e0	Use fprintf for diagnostic output (#48). Keep printf only for printing model output; one can now use ./main ... 2>/dev/null to suppress any diagnostic output	2023-03-13 18:39:56 +02:00
Georgi Gerganov	c1eebc2a25	Use vdotq_s32 to improve performance (#67) * 10% performance boost on ARM * Back to original change	2023-03-13 18:36:44 +02:00
uint256_t	a81c113197	Reduce model loading time (#43) * Use buffering * Use vector * Minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:33:43 +02:00
Val Kharitonov	d35d36dff8	Fix UTF-8 handling (including colors) (#79)	2023-03-13 18:24:18 +02:00
Pavol Rusnak	b84a31d659	Add quantize script for batch quantization (#92) * Add quantize script for batch quantization * Indentation * README for new quantize.sh * Fix script name * Fix file list on Mac OS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:15:20 +02:00
Georgi Gerganov	67d50a97b4	Add initial contribution guidelines	2023-03-13 09:42:26 +02:00
Matvey Soloviev	00be0e42e4	Gate signal support on being on a unixoid system. (#74)	2023-03-13 04:08:01 +01:00
Matvey Soloviev	a30749e299	Fix token count accounting	2023-03-13 01:04:41 +01:00
Georgi Gerganov	49a8c7675b	Revert "10% performance boost on ARM" This reverts commit `113a9e83eb`. There are some reports of illegal instructions. Moved this stuff to the vdotq_s32 branch until resolved	2023-03-13 01:28:08 +02:00
Georgi Gerganov	c47fa0ea5e	Check for vdotq_s32 availability	2023-03-13 01:21:03 +02:00
Georgi Gerganov	c00675331e	Amend to previous commit - forgot to update non-QRDMX branch	2023-03-13 01:05:24 +02:00
Georgi Gerganov	f48b7628ea	10% performance boost on ARM	2023-03-13 00:56:10 +02:00
Matvey Soloviev	fedc405b41	Fix color getting reset before prompt output done (#65) (cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)	2023-03-13 00:07:34 +02:00
Georgi Gerganov	c240cd1e05	Update README.md	2023-03-12 23:39:01 +02:00
Matvey Soloviev	d35528087e	Add interactive mode (#61) * Initial work on interactive mode. * Improve interactive mode. Make rev. prompt optional. * Update README to explain interactive mode. * Fix OS X build	2023-03-12 23:13:28 +02:00
Marc Köhlbrugge	8de246c2d8	Fix typo in README (#45)	2023-03-12 22:30:08 +02:00
Ben Garney	7a708ee9b0	Allow using prompt files (#59)	2023-03-12 22:28:36 +02:00
beiller	c763dc1bc2	Add back top_k (#56) * Add back top_k * Update utils.cpp * Update utils.h --------- Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 22:23:15 +02:00
Sebastián A	fde84afbed	Windows fixes (#31) * Apply fixes suggested to build on windows Issue: https://github.com/ggerganov/llama.cpp/issues/22 * Remove unsupported VLAs * MSVC: Remove features that are only available on MSVC C++20. * Fix zero initialization of the other fields. * Change the use of vector for stack allocations.	2023-03-12 22:15:00 +02:00
Georgi Gerganov	f6f3f1c7c1	Update README.md	2023-03-12 22:09:26 +02:00
Georgi Gerganov	1f0283048a	Add CI (#60)	2023-03-12 22:08:24 +02:00
Georgi Gerganov	85c71945cf	Revert "weights_only" arg - this is causing more trouble than help	2023-03-12 20:59:01 +02:00
Oleksandr Nikitin	a7cf72d75e	python/pytorch compat notes (#44)	2023-03-12 14:16:33 +02:00
beiller	a63a748bba	Add repetition penalty (#20) * Adding repeat penalization * Update utils.h * Update utils.cpp * Numeric fix: should probably still scale by temp even if penalized * Update comments, more proper application: I see that numbers can go negative, so a fix from a referenced commit * Minor formatting --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 11:27:42 +02:00
Georgi Gerganov	dc91ec5d67	Clarify meaning of hacking	2023-03-12 09:03:25 +02:00
Georgi Gerganov	95fb97b137	README: add "Supported platforms" + update hot topics	2023-03-12 08:41:54 +02:00
deepdiffuser	a9b53a036b	use weights_only in conversion script (#32). This restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries	2023-03-12 08:36:35 +02:00
Pavol Rusnak	865eff3820	Add LICENSE (#21)	2023-03-12 08:36:03 +02:00
Georgi Gerganov	e34e3e21c4	Update README.md	2023-03-12 01:26:32 +02:00
Juraj Bednar	4cdcd39348	Fix a typo in model name (#16)	2023-03-11 19:32:20 +02:00
Georgi Gerganov	284d9be2de	Update README.md	2023-03-11 18:10:18 +02:00
Georgi Gerganov	cc0f26bef3	Add AVX2 support for x86 architectures thanks to @Const-me !	2023-03-11 18:04:25 +02:00
Georgi Gerganov	bc3184cb2d	Fix un-initialized FP16 tables on x86 (#15, #2)	2023-03-11 17:40:14 +02:00
Georgi Gerganov	5afe16962e	Bump memory buffer	2023-03-11 12:45:01 +02:00
Georgi Gerganov	35cb0d2a39	Update README.md	2023-03-11 12:31:21 +02:00
Georgi Gerganov	34bb8821d6	.gitignore models/	2023-03-11 12:27:02 +02:00
Georgi Gerganov	2d2cadab68	Update Makefile var + add comment	2023-03-11 12:27:02 +02:00
Georgi Gerganov	657074b014	Update README.md	2023-03-11 11:34:25 +02:00
Georgi Gerganov	b53c6356f3	Update README.md	2023-03-11 11:34:11 +02:00
Georgi Gerganov	a2799521b9	Support all LLaMA models + change Q4_0 quantization storage	2023-03-11 11:28:30 +02:00
Simon Willison	d4919344b1	Include Python dependencies in README (#6)	2023-03-11 07:47:26 +02:00
Georgi Gerganov	11dae511e3	Update README.md	2023-03-11 01:30:47 +02:00
Georgi Gerganov	240b0bf6ea	Update README.md	2023-03-11 01:22:58 +02:00
Georgi Gerganov	87da10c739	Update README.md	2023-03-11 01:18:10 +02:00
Jean-Michaël Celerier	12131a74cb	Add missing headers for memcpy and assert (#3)	2023-03-11 01:04:06 +02:00
Georgi Gerganov	01e3d38e1c	Update README.md	2023-03-11 00:55:22 +02:00
Georgi Gerganov	8d38e7e279	Update README.md	2023-03-11 00:51:46 +02:00
Georgi Gerganov	586e0f1f3d	Update README.md	2023-03-11 00:09:19 +02:00
Georgi Gerganov	4c7f13c170	Update README.md	2023-03-10 23:53:11 +02:00
Georgi Gerganov	8453184bb2	Fix a bug in the rope calculation	2023-03-10 23:46:57 +02:00


4104 Commits