ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Author	SHA1	Message	Date
Kawrakow	0ceeb11721	Merge mainline llama.cpp (#3) * Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower as it is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-27 07:55:01 +02:00
Justine Tunney	ce8b5426be	ggml : add llamafile sgemm (#6414) This change upstreams llamafile's cpu matrix multiplication kernels which improve image and prompt evaluation speed. For starters, Q4_0 and Q8_0 weights should go ~40% faster on CPU. The biggest benefits are with data types like f16 / f32, which process prompts 2x faster thus making them faster than quantized data types for prompt evals. This change also introduces bona fide AVX512 support since tinyBLAS is able to exploit the larger register file. For example, on my CPU llama.cpp llava-cli processes an image prompt at 305 tokens/second, using the Q4_K and Q4_0 types, which has always been faster than if we used f16 LLaVA weights, which at HEAD go 188 tokens/second. With this change, f16 LLaVA performance leap frogs to 464 tokens/second. On Intel Core i9-14900K this change improves F16 prompt perf by 5x. For example, using llama.cpp at HEAD with Mistral 7b f16 to process a 215 token prompt will go 13 tok/sec. This change has fixes making it go 52 tok/sec. It's mostly thanks to my vectorized outer product kernels but also because I added support for correctly counting the number of cores on Alderlake, so the default thread count discounts Intel's new efficiency cores. Only Linux right now can count cores. This work was sponsored by Mozilla who's given permission to change the license of this code from Apache 2.0 to MIT. To read more about what's improved, and how it works, see: https://justine.lol/matmul/	2024-04-16 21:55:30 +03:00
Steven Prichard	2ea89c4845	swift : linux support (#6590) - Package.swift now supports conditional compilation based on OS - Allows for package to be used by SPM on Non-Apple platforms Co-authored-by: Steven Prichard <steven.prichard@justeattakeaway.com>	2024-04-15 13:14:46 +03:00
Jared Van Bortel	e8b00e1e36	wpm : portable unicode tolower (#6305) Also use C locale for ispunct/isspace, and split unicode-data.cpp from unicode.cpp.	2024-03-26 17:46:21 -04:00
Georgi Gerganov	f60b94d486	llama : refactor unicode stuff (#5992) * llama : refactor unicode stuff ggml-ci * unicode : names * make : fix c++ compiler * unicode : names * unicode : straighten tables * zig : fix build * unicode : put nfd normalization behind API ggml-ci * swift : fix build * unicode : add BOM * unicode : add <cstdint> ggml-ci * unicode : pass as cpts as const ref	2024-03-11 17:47:47 +02:00
Georgi Gerganov	6858a4fa83	swift : package no longer use ggml dependency (#5465) * Revert "swift : update Package.swift to use ggml as dependency (#4691)" This reverts commit `ece9a45e8f`. * spm : add ggml headers	2024-02-12 19:54:29 +02:00
Georgi Gerganov	6936a37c2b	swift : track ggml release branch (#4867)	2024-01-11 21:58:28 +02:00
Georgi Gerganov	8b546126e4	swift : pin ggml commit + remove ggml.h from spm-headers (#4878) ggml-ci	2024-01-11 21:31:31 +02:00
Georgi Gerganov	0fe35cdf38	swift : exclude ggml-metal.metal from the package (#4822)	2024-01-08 16:40:51 +02:00
Ashraful Islam	59092ff962	swift : update Package.swift to use ggml as dependency (#4691) * updates the package.swift to use ggml as dependency * changes the ggml package url src to ggerganov	2024-01-03 19:30:02 +02:00
kchro3	970ec26a7d	swift : revert compiler checks for swift package (#4332)	2023-12-05 09:29:46 +02:00
Georgi Gerganov	91be989d92	ggml : quantization refactoring (#3833) * ggml : factor all quantization code in ggml-quants ggml-ci * ggml-quants : fix Zig and Swift builds + quantize tool ggml-ci * quantize : --pure option for disabling k-quant mixtures --------- Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>	2023-10-29 18:32:28 +02:00
Jhen-Jie Hong	d49c1c5b2d	swift : improvements and fixes (#3564) * swift : use macOS 12 as minimum requirement * swift : add missing ggml-backend.c source * swift : add -O3 -DNDEBUG unsafe flags	2023-10-10 14:31:13 +03:00
Jhen-Jie Hong	bff47ce69b	metal : support default.metallib load & reuse code for swift package (#3522) * metal : support load default.metallib & reuse code for swift package * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT	2023-10-07 11:40:27 +03:00
Jhen-Jie Hong	594db7b27d	swift : disable ACCELERATE_NEW_LAPACK (#3481)	2023-10-05 17:00:07 +03:00
Jhen-Jie Hong	f93da61e4c	swift : fix build on xcode 15 (#3387)	2023-09-29 08:25:13 +03:00
Jag Chadha	31e3b674ad	build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (#3342)	2023-09-27 18:34:32 +03:00
kchro3	1b0b14195c	metal : support for Swift (#3078) * Metal support for Swift * update * add a toggle for arm/arm64 * set minimum versions for all platforms * update to use newLibraryWithURL * bump version Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com> --------- Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>	2023-09-09 17:12:10 +08:00
kchro3	c2873512e6	swift : add support for k-quants (#2983)	2023-09-03 09:21:05 +03:00
kchro3	f96a0722fa	swift : add missing c file to Package.swift (#2978)	2023-09-03 08:27:25 +03:00
Frederik Vogel	d050d6ae67	swift : Package compile breaks due to ggml-metal.metal (#1831) * Ignore metal file in spm * Add ggml.h to spm public Headers --------- Co-authored-by: Vogel Frederik <vogel.frederik@linecorp.com>	2023-06-15 20:47:04 +03:00
Andrew Duffy	54aaf78743	Add Accelerate/BLAS when using Swift (#765)	2023-04-05 06:44:24 -04:00
Jed Fox	a9b8ceaea2	deploy : add a Package.swift for SwiftPM support (#393) * Add a Package.swift for SwiftPM support * Swap from exclusions to allowlist	2023-03-28 19:39:01 +03:00

23 Commits