ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-26 09:29:27 +00:00

Author	SHA1	Message	Date
Kawrakow	dc663fe632	Better strategy for GPU offload (#520) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2025-06-12 19:25:11 +03:00
Kawrakow	7e0ac477b8	Option to enable/disable the IQK CPU FA kernels (#429) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2025-05-17 11:21:58 +03:00
Kawrakow	f8277ced45	Compile time option to use bf16 for quants without MMQ kernels (#261) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2025-03-18 07:37:10 +01:00
Kawrakow	db7eabb111	FA: Add option to build all FA kernels (#197) Similar to the CUDA situation. It is OFF by default. If OFF, only F16, Q8_0, Q6_0, and, if the CPU provides native BF16 support, BF16 FA kernels will be included. To enable all, cmake -DGGML_IQK_FA_ALL_QUANTS=1 ... This cuts compilation time for iqk_mul_mat.cpp by almost half (45 seconds vs 81 seconds on my Ryzen-7950X). Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2025-02-09 18:59:33 +02:00
Kawrakow	d2b53228f5	Move to c++17 projectwide (#80) * Slightly better * Make the entire project c++17 --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-10-04 14:43:26 +03:00
Kawrakow	1a4cfbcc53	Merge mainline - Aug 12 2024 (#17) * Merge mainline * Fix after merge * Remove CI check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-08-12 15:14:32 +02:00
Kawrakow	0ceeb11721	Merge mainline llama.cpp (#3) * Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower as it is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-27 07:55:01 +02:00

7 Commits