Commit Graph

  • eb9e2b628a iqk_mul_mat: experimenting with zen4 (iq2_xxs) Kawrakow 2024-06-05 11:27:55 +03:00
  • 47ae12bbec iqk_mul_mat: experimenting with zen4 (iq2_xxs) Iwan Kawrakow 2024-06-05 11:27:55 +03:00
  • 2c8d3dad1f iqk_mul_mat: experimenting with zen4 (iq2_xs) Kawrakow 2024-06-05 09:38:29 +03:00
  • dc96d5484f iqk_mul_mat: experimenting with zen4 (iq2_xs) Iwan Kawrakow 2024-06-05 09:38:29 +03:00
  • 0d9027fe74 iqk_mul_mat: experimenting with zen4 (iq3_s and iq2_m) Kawrakow 2024-06-05 08:58:36 +03:00
  • cb063a2a20 iqk_mul_mat: experimenting with zen4 (iq3_s and iq2_m) Iwan Kawrakow 2024-06-05 08:58:36 +03:00
  • ed8f1fe490 iqk_mul_mat: small improvement for iq3_s Kawrakow 2024-06-04 17:24:04 +03:00
  • 61b8cc1ff6 iqk_mul_mat: small improvement for iq3_s Iwan Kawrakow 2024-06-04 17:24:04 +03:00
  • 01d55dcbf0 iqk_mul_mat: better AVX2 implementation for iq2_xxs Kawrakow 2024-06-04 16:47:55 +03:00
  • 2a72d9f978 iqk_mul_mat: better AVX2 implementation for iq2_xxs Iwan Kawrakow 2024-06-04 16:47:55 +03:00
  • d4e9e595f9 iqk_mul_mat: better AVX2 implementation for iq2_xxs Kawrakow 2024-05-30 09:43:23 +03:00
  • 3a6e3943a8 iqk_mul_mat: better AVX2 implementation for iq2_xxs Iwan Kawrakow 2024-05-30 09:43:23 +03:00
  • 41391ff4b0 iqk_mul_mat: AVX2 implementation for iq2_xxs Kawrakow 2024-05-29 19:58:02 +03:00
  • 60f050d610 iqk_mul_mat: AVX2 implementation for iq2_xxs Iwan Kawrakow 2024-05-29 19:58:02 +03:00
  • be132341f5 iqk_mul_mat: AVX2 implementation for iq2_xs Kawrakow 2024-05-29 19:05:20 +03:00
  • 309e32405f iqk_mul_mat: AVX2 implementation for iq2_xs Iwan Kawrakow 2024-05-29 19:05:20 +03:00
  • 3c448906bf iqk_mul_mat: AVX2 implementation for iq2_s Kawrakow 2024-05-29 17:27:36 +03:00
  • 8015edb3cc iqk_mul_mat: AVX2 implementation for iq2_s Iwan Kawrakow 2024-05-29 17:27:36 +03:00
  • f31200bde1 Separate templates for TG and PP for i-quants on AVX2 Kawrakow 2024-05-29 13:42:50 +03:00
  • b0071de081 Separate templates for TG and PP for i-quants on AVX2 Iwan Kawrakow 2024-05-29 13:42:50 +03:00
  • 3f90520d1f iqk_mul_mat: AVX2 implementation for iq3_xxs Kawrakow 2024-05-29 10:38:58 +03:00
  • 2c8c0d0a68 iqk_mul_mat: AVX2 implementation for iq3_xxs Iwan Kawrakow 2024-05-29 10:38:58 +03:00
  • 24ccf42a4f iqk_mul_mat: AVX2 implementation for iq3_s Kawrakow 2024-05-29 08:00:59 +03:00
  • 34befcaf67 iqk_mul_mat: AVX2 implementation for iq3_s Iwan Kawrakow 2024-05-29 08:00:59 +03:00
  • 32f20a1b9b Cleanup - Arm i-quants should be good now Kawrakow 2024-05-28 13:13:25 +02:00
  • 4f53915dcb Cleanup - Arm i-quants should be good now Iwan Kawrakow 2024-05-28 13:13:25 +02:00
  • 7235135c3e iqk_mul_mat: Arm implementation for iq3_s (llama.cpp version) Kawrakow 2024-05-28 12:10:52 +02:00
  • 4b27ade2fb iqk_mul_mat: Arm implementation for iq3_s (llama.cpp version) Iwan Kawrakow 2024-05-28 12:10:52 +02:00
  • 482dd30382 Simplify Kawrakow 2024-05-28 10:58:30 +02:00
  • 221a2c3807 Simplify Iwan Kawrakow 2024-05-28 10:58:30 +02:00
  • 6aa7ac9cd3 iqk_mul_mat: Arm implementation for iq3_xxs (llama.cpp version) Kawrakow 2024-05-28 10:36:49 +02:00
  • 7dcca6aea7 iqk_mul_mat: Arm implementation for iq3_xxs (llama.cpp version) Iwan Kawrakow 2024-05-28 10:36:49 +02:00
  • d041c81b1d iqk_mul_mat: Arm implementation for iq2_xs (llama.cpp version) Kawrakow 2024-05-28 09:52:20 +02:00
  • effa4448d6 iqk_mul_mat: Arm implementation for iq2_xs (llama.cpp version) Iwan Kawrakow 2024-05-28 09:52:20 +02:00
  • 3fe4e1b27c iqk_mul_mat: Arm implementation for iq2_s (llama.cpp version) Kawrakow 2024-05-28 08:43:09 +02:00
  • d2ee9ab95e iqk_mul_mat: Arm implementation for iq2_s (llama.cpp version) Iwan Kawrakow 2024-05-28 08:43:09 +02:00
  • 4c0920cb1b Add Q8_0 Kawrakow 2024-05-27 19:04:25 +02:00
  • 9ac9e928d5 Add Q8_0 Iwan Kawrakow 2024-05-27 19:04:25 +02:00
  • 62122c1950 Cosmetics Kawrakow 2024-05-27 15:20:05 +02:00
  • 3f996d0c70 Cosmetics Iwan Kawrakow 2024-05-27 15:20:05 +02:00
  • fb8bc26dc5 iqk_mul_mat: Arm implementation for iq2_xxs (llama.cpp version) Kawrakow 2024-05-27 13:38:26 +02:00
  • d7ab97149f iqk_mul_mat: Arm implementation for iq2_xxs (llama.cpp version) Iwan Kawrakow 2024-05-27 13:38:26 +02:00
  • a18a564e54 iqk_mul_mat: faster q3_K TG Kawrakow 2024-05-27 11:05:44 +02:00
  • b51922530f iqk_mul_mat: faster q3_K TG Iwan Kawrakow 2024-05-27 11:05:44 +02:00
  • d434b4751a iqk_mul_mat for llama.cpp Kawrakow 2024-05-27 09:51:08 +02:00
  • 19c578b413 iqk_mul_mat for llama.cpp Iwan Kawrakow 2024-05-27 09:51:08 +02:00
  • 9fa7946997 JSON Schema to GBNF integration tests (#7790) Clint Herron 2024-06-21 23:18:36 -04:00
  • c5a8d4b749 JSON Schema to GBNF integration tests (#7790) Clint Herron 2024-06-21 23:18:36 -04:00
  • d34e2e8860 vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022) k.h.lai 2024-06-21 16:28:20 +08:00
  • 557b653dc9 vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022) k.h.lai 2024-06-21 16:28:20 +08:00
  • 7ccc0cb46d ggml : AVX IQ quants (#7845) Eve 2024-06-21 05:57:36 +00:00
  • 7d5e8777ae ggml : AVX IQ quants (#7845) Eve 2024-06-21 05:57:36 +00:00
  • 46e0320612 llama : optimize long word tokenization with WPM (#8034) Georgi Gerganov 2024-06-21 08:51:28 +03:00
  • a927b0f3dd llama : optimize long word tokenization with WPM (#8034) Georgi Gerganov 2024-06-21 08:51:28 +03:00
  • a895a1b78e llama : allow pooled embeddings on any model (#7477) Douglas Hanley 2024-06-21 00:38:22 -05:00
  • 80ea089d77 llama : allow pooled embeddings on any model (#7477) Douglas Hanley 2024-06-21 00:38:22 -05:00
  • 7ab016f973 swiftui : enable stream updating (#7754) Shuichi Tsutsumi 2024-06-21 14:30:58 +09:00
  • 0e64591e82 swiftui : enable stream updating (#7754) Shuichi Tsutsumi 2024-06-21 14:30:58 +09:00
  • 4fb22fa139 requirements : Bump torch and numpy for python3.12 (#8041) Hamdoud Hakem 2024-06-20 21:01:15 +01:00
  • b1ef562bc1 requirements : Bump torch and numpy for python3.12 (#8041) Hamdoud Hakem 2024-06-20 21:01:15 +01:00
  • e767e20fc6 convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040) Hamdoud Hakem 2024-06-20 20:59:59 +01:00
  • 17b291a6a5 convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040) Hamdoud Hakem 2024-06-20 20:59:59 +01:00
  • 5b4e0a2a38 common: fix warning (#8036) Johannes Gäßler 2024-06-20 16:40:13 +02:00
  • abd894ad96 common: fix warning (#8036) Johannes Gäßler 2024-06-20 16:40:13 +02:00
  • 20a2d77aa2 [SYCL] Fix windows build and inference (#8003) luoyu-intel 2024-06-20 13:19:05 +00:00
  • de391e4c80 [SYCL] Fix windows build and inference (#8003) luoyu-intel 2024-06-20 13:19:05 +00:00
  • 24dfdbb1a3 CUDA: stream-k decomposition for MMQ (#8018) Johannes Gäßler 2024-06-20 14:39:21 +02:00
  • d50f8897a7 CUDA: stream-k decomposition for MMQ (#8018) Johannes Gäßler 2024-06-20 14:39:21 +02:00
  • 4f46967577 metal : fix ggml_metal_supports_op for BF16 (#8021) Michael de Gans 2024-06-19 22:32:01 -07:00
  • 2075a66a96 metal : fix ggml_metal_supports_op for BF16 (#8021) Michael de Gans 2024-06-19 22:32:01 -07:00
  • c7d9dd7634 server : fix smart slot selection (#8020) sasha0552 2024-06-19 23:57:10 +00:00
  • ba58993152 server : fix smart slot selection (#8020) sasha0552 2024-06-19 23:57:10 +00:00
  • 9d63d2b978 un-ignore build-info.cmake and build-info.sh (#7996) Michael de Gans 2024-06-19 13:10:42 -07:00
  • a7854743c5 un-ignore build-info.cmake and build-info.sh (#7996) Michael de Gans 2024-06-19 13:10:42 -07:00
  • 028d6b31c6 ggml : synchronize threads using barriers (#7993) slaren 2024-06-19 15:04:15 +02:00
  • 9c77ec1d74 ggml : synchronize threads using barriers (#7993) slaren 2024-06-19 15:04:15 +02:00
  • efc3d09e43 codecov : remove (#8004) Georgi Gerganov 2024-06-19 13:04:36 +03:00
  • a04a953cab codecov : remove (#8004) Georgi Gerganov 2024-06-19 13:04:36 +03:00
  • ce37982f07 [SYCL] refactor (#6408) Meng, Hengyu 2024-06-19 09:11:51 +08:00
  • 623494a478 [SYCL] refactor (#6408) Meng, Hengyu 2024-06-19 09:11:51 +08:00
  • b8114be2fd tokenizer : BPE fixes (#7530) jaime-m-p 2024-06-18 18:40:52 +02:00
  • 37bef89433 tokenizer : BPE fixes (#7530) jaime-m-p 2024-06-18 18:40:52 +02:00
  • 083d5edc87 Only use FIM middle token if it exists (#7648) Sigbjørn Skjæret 2024-06-18 14:19:45 +02:00
  • 91c188d6c2 Only use FIM middle token if it exists (#7648) Sigbjørn Skjæret 2024-06-18 14:19:45 +02:00
  • 42fc9c93d7 Fix no gcc pragma on Windows (#7751) jojorne 2024-06-18 09:18:32 -03:00
  • 84f6de17f6 Fix no gcc pragma on Windows (#7751) jojorne 2024-06-18 09:18:32 -03:00
  • 3757bf9623 Allow compiling with CUDA without CUDA runtime installed (#7989) Ulrich Drepper 2024-06-18 14:00:14 +02:00
  • 61665277af Allow compiling with CUDA without CUDA runtime installed (#7989) Ulrich Drepper 2024-06-18 14:00:14 +02:00
  • dfdd084165 chore: clean useless beam search param (#7985) Frank Mai 2024-06-18 15:11:40 +08:00
  • b96f9afb0d chore: clean useless beam search param (#7985) Frank Mai 2024-06-18 15:11:40 +08:00
  • d406a5fb51 readme : update UI list (#7943) Abheek Gulati 2024-06-17 23:57:41 -07:00
  • 1193778105 readme : update UI list (#7943) Abheek Gulati 2024-06-17 23:57:41 -07:00
  • a1a23d5f3e ggml : sync Georgi Gerganov 2024-06-18 09:50:45 +03:00
  • 5326bcceeb ggml : sync Georgi Gerganov 2024-06-18 09:50:45 +03:00
  • 1792d68172 whisper : use ggml_backend_sched (whisper/2239) Georgi Gerganov 2024-06-18 09:37:20 +03:00
  • e6ecc2be47 whisper : use ggml_backend_sched (whisper/2239) Georgi Gerganov 2024-06-18 09:37:20 +03:00
  • 89d2889200 update: support Qwen2-57B-A14B (#7835) Ștefan-Gabriel Muscalu 2024-06-17 22:08:46 +03:00
  • a94e6ff877 update: support Qwen2-57B-A14B (#7835) Ștefan-Gabriel Muscalu 2024-06-17 22:08:46 +03:00
  • 97752e393d Make updates to type cast based on compiler instead of OS (#7851) Srihari-mcw 2024-06-17 23:53:17 +05:30
  • 5b6da18750 Make updates to type cast based on compiler instead of OS (#7851) Srihari-mcw 2024-06-17 23:53:17 +05:30