Commit Graph

  • 3b169441df sync : ggml (#5452) Georgi Gerganov 2024-02-12 09:16:06 +02:00
  • 8377d606b8 CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434) Johannes Gäßler 2024-02-11 19:08:39 +01:00
  • 3bdc4cd0f5 CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434) Johannes Gäßler 2024-02-11 19:08:39 +01:00
  • 2dc6eef6da Add support for BERT embedding models (#5423) Douglas Hanley 2024-02-11 10:21:38 -06:00
  • 2891c8aa9a Add support for BERT embedding models (#5423) Douglas Hanley 2024-02-11 10:21:38 -06:00
  • 713ce99422 flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • 97a336507e flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • 68e7918281 vulkan: only use M-sized matmul on Apple GPUs (#5412) Sergio López 2024-02-11 15:12:00 +01:00
  • c88c74f967 vulkan: only use M-sized matmul on Apple GPUs (#5412) Sergio López 2024-02-11 15:12:00 +01:00
  • f2f1af418b common : use enums for sampler types (#5418) Alexey Parfenov 2024-02-11 13:43:31 +00:00
  • a803333a4e common : use enums for sampler types (#5418) Alexey Parfenov 2024-02-11 13:43:31 +00:00
  • bcf24ef69b server : allow to specify tokens as strings in logit_bias (#5003) Alexey Parfenov 2024-02-11 13:38:14 +00:00
  • 684780141a server : allow to specify tokens as strings in logit_bias (#5003) Alexey Parfenov 2024-02-11 13:38:14 +00:00
  • a77a4011c9 main : ctrl+C print timing in non-interactive mode (#3873) Georgi Gerganov 2024-02-11 15:35:50 +02:00
  • 85910c5b30 main : ctrl+C print timing in non-interactive mode (#3873) Georgi Gerganov 2024-02-11 15:35:50 +02:00
  • d6f4a7c4bc common : fix compile warning Georgi Gerganov 2024-02-11 15:33:43 +02:00
  • 139b62a839 common : fix compile warning Georgi Gerganov 2024-02-11 15:33:43 +02:00
  • 358914f9b0 ggml : fix compile warnings (unused vars) (#4966) Georgi Gerganov 2024-02-11 15:33:01 +02:00
  • 0f2411f154 ggml : fix compile warnings (unused vars) (#4966) Georgi Gerganov 2024-02-11 15:33:01 +02:00
  • d7f7308deb ggml : add mmla kernels for quantized GEMM (#4966) snadampal 2024-02-11 07:22:33 -06:00
  • a07d0fee1f ggml : add mmla kernels for quantized GEMM (#4966) snadampal 2024-02-11 07:22:33 -06:00
  • 6834564e30 lookup: add print for drafting performance (#5450) Johannes Gäßler 2024-02-11 12:44:51 +01:00
  • e4640d8fdf lookup: add print for drafting performance (#5450) Johannes Gäßler 2024-02-11 12:44:51 +01:00
  • 71993ba165 server : add llama2 chat template (#5425) Xuan Son Nguyen 2024-02-11 11:16:22 +01:00
  • 907e08c110 server : add llama2 chat template (#5425) Xuan Son Nguyen 2024-02-11 11:16:22 +01:00
  • ccd61a2337 metal : use autoreleasepool to avoid memory leaks (#5437) Ian Bull 2024-02-10 02:53:28 -08:00
  • f026f8120f metal : use autoreleasepool to avoid memory leaks (#5437) Ian Bull 2024-02-10 02:53:28 -08:00
  • 71e5730e05 scripts : update sync scripts with new backends Georgi Gerganov 2024-02-10 09:53:05 +02:00
  • cd9aea63b5 scripts : update sync scripts with new backends Georgi Gerganov 2024-02-10 09:53:05 +02:00
  • 14d2486167 sync : ggml Georgi Gerganov 2024-02-10 09:30:36 +02:00
  • 43b65f5eb8 sync : ggml Georgi Gerganov 2024-02-10 09:30:36 +02:00
  • 8b531d3640 ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +01:00
  • 4633d93af0 ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +01:00
  • 9cb8788ad8 vulkan: Set limit for task concurrency (#5427) Neuman Vong 2024-02-10 05:30:19 +11:00
  • 4b7b38bef5 vulkan: Set limit for task concurrency (#5427) Neuman Vong 2024-02-10 05:30:19 +11:00
  • 380440da2a llava : add requirements.txt and update README.md (#5428) Daniel Bevenius 2024-02-09 14:00:59 +01:00
  • e00d2a62dd llava : add requirements.txt and update README.md (#5428) Daniel Bevenius 2024-02-09 14:00:59 +01:00
  • 1f6f1c2251 server : fix prompt caching for repeated prompts (#5420) Riley Stewart 2024-02-09 02:49:49 -08:00
  • 7c777fcd5d server : fix prompt caching for repeated prompts (#5420) Riley Stewart 2024-02-09 02:49:49 -08:00
  • c7fa729d3a llama : do not cap thread count when MoE on CPU (#5419) Paul Tsochantaris 2024-02-09 10:48:06 +00:00
  • e5ca3937c6 llama : do not cap thread count when MoE on CPU (#5419) Paul Tsochantaris 2024-02-09 10:48:06 +00:00
  • 0af9b0c0bf readme : add JavaScript/Wasm repo (#5415) Marko Tasic 2024-02-09 11:17:00 +01:00
  • e4124c2477 readme : add JavaScript/Wasm repo (#5415) Marko Tasic 2024-02-09 11:17:00 +01:00
  • 5b915c46e7 ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404) Michael Podvitskiy 2024-02-09 10:56:43 +01:00
  • b2f87cb64d ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404) Michael Podvitskiy 2024-02-09 10:56:43 +01:00
  • 60e30ba4aa Fix Vulkan crash on APUs with very little device memory (#5424) 0cc4m 2024-02-09 06:52:33 +01:00
  • 44fbe34360 Fix Vulkan crash on APUs with very little device memory (#5424) 0cc4m 2024-02-09 06:52:33 +01:00
  • 788f40265d CUDA: more warps for mmvq on NVIDIA (#5394) Johannes Gäßler 2024-02-08 21:56:40 +01:00
  • 8e6a9d2de0 CUDA: more warps for mmvq on NVIDIA (#5394) Johannes Gäßler 2024-02-08 21:56:40 +01:00
  • 9fbf29ca7d llama : do not print "offloading layers" message in CPU-only builds (#5416) slaren 2024-02-08 21:33:03 +01:00
  • 41f308f58e llama : do not print "offloading layers" message in CPU-only builds (#5416) slaren 2024-02-08 21:33:03 +01:00
  • ee83074b40 Fix f16_sycl cpy call from Arc (#5411) Abhilash Majumder 2024-02-08 22:39:10 +05:30
  • 6e99f2a04f Fix f16_sycl cpy call from Arc (#5411) Abhilash Majumder 2024-02-08 22:39:10 +05:30
  • 53150b07a1 llava : add missing .py, and fix paths in README.md (#5414) Daniel Bevenius 2024-02-08 15:20:03 +01:00
  • ff4ff05c5f llava : add missing .py, and fix paths in README.md (#5414) Daniel Bevenius 2024-02-08 15:20:03 +01:00
  • 5d5a6cfa7b fix trailing whitespace (#5407) Johannes Gäßler 2024-02-08 11:36:54 +01:00
  • b7b74cef36 fix trailing whitespace (#5407) Johannes Gäßler 2024-02-08 11:36:54 +01:00
  • 26b7d77de4 llama : fix MiniCPM (#5392) runfuture 2024-02-08 18:36:19 +08:00
  • 4aa43fab56 llama : fix MiniCPM (#5392) runfuture 2024-02-08 18:36:19 +08:00
  • 648e6a96b9 llava: fix typo/formatting in README.md (#5405) Daniel Bevenius 2024-02-08 09:58:19 +01:00
  • a6e514a85f llava: fix typo/formatting in README.md (#5405) Daniel Bevenius 2024-02-08 09:58:19 +01:00
  • 5a19feb61f sampling: fix top_k <= 0 (#5388) Johannes Gäßler 2024-02-08 09:46:30 +01:00
  • 26d4efd11e sampling: fix top_k <= 0 (#5388) Johannes Gäßler 2024-02-08 09:46:30 +01:00
  • dbd41b1dc7 tests : .gitignore obj files Georgi Gerganov 2024-02-08 09:46:47 +02:00
  • 8504d2d0da tests : .gitignore obj files Georgi Gerganov 2024-02-08 09:46:47 +02:00
  • 434d94837b CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) Michael Podvitskiy 2024-02-07 22:39:23 +01:00
  • c4fbb6717c CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) Michael Podvitskiy 2024-02-07 22:39:23 +01:00
  • 92ae72f525 fix typo in readme (#5399) Ebey Abraham 2024-02-07 21:11:30 +00:00
  • 8c933b70c2 fix typo in readme (#5399) Ebey Abraham 2024-02-07 21:11:30 +00:00
  • 20781735c8 Add Ava in the list of llama.cpp UIs (#4362) Kamil Tomšík 2024-02-07 19:44:52 +01:00
  • b906596bb7 Add Ava in the list of llama.cpp UIs (#4362) Kamil Tomšík 2024-02-07 19:44:52 +01:00
  • f3a0ec2947 CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) Johannes Gäßler 2024-02-07 12:40:26 +01:00
  • aa7ab99be2 CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) Johannes Gäßler 2024-02-07 12:40:26 +01:00
  • 7312af4f6e [SYCL] update install make by w64devkit (#5297) Neo Zhang Jianyu 2024-02-07 18:16:55 +08:00
  • 10afa6f1d1 [SYCL] update install make by w64devkit (#5297) Neo Zhang Jianyu 2024-02-07 18:16:55 +08:00
  • d3fc78479e llava-cli : always tokenize special tokens (#5382) Xiao-Yong Jin 2024-02-07 02:17:25 -06:00
  • 0ef46da632 llava-cli : always tokenize special tokens (#5382) Xiao-Yong Jin 2024-02-07 02:17:25 -06:00
  • 5f99615aed Basic Vulkan Multi-GPU implementation (#5321) 0cc4m 2024-02-07 07:54:50 +01:00
  • ee1628bdfe Basic Vulkan Multi-GPU implementation (#5321) 0cc4m 2024-02-07 07:54:50 +01:00
  • e83e46c1a1 readme : modernize (#5379) Eve 2024-02-07 06:21:30 +00:00
  • ed0bf32290 readme : modernize (#5379) Eve 2024-02-07 06:21:30 +00:00
  • 7a650d834c readme : update ui list (#5354) Ben Williams 2024-02-06 22:16:48 -08:00
  • 9a697d842b readme : update ui list (#5354) Ben Williams 2024-02-06 22:16:48 -08:00
  • f64dfb1678 llama : add MiniCPM support (#5346) runfuture 2024-02-07 14:15:56 +08:00
  • 316c7faf77 llama : add MiniCPM support (#5346) runfuture 2024-02-07 14:15:56 +08:00
  • d5e1d65102 server : update /props with "total_slots" value (#5373) Justin Parker 2024-02-07 01:15:19 -05:00
  • f3e2b4fa3f server : update /props with "total_slots" value (#5373) Justin Parker 2024-02-07 01:15:19 -05:00
  • 4c1cf78b7a convert : fix TypeError on GPT-2 vocab.json (#5288) Sang-Kil Park 2024-02-07 13:28:00 +09:00
  • f68664ac24 convert : fix TypeError on GPT-2 vocab.json (#5288) Sang-Kil Park 2024-02-07 13:28:00 +09:00
  • 65b0045d18 server : remove model.json endpoint (#5371) Alexey Parfenov 2024-02-06 18:08:38 +00:00
  • 213d1439fa server : remove model.json endpoint (#5371) Alexey Parfenov 2024-02-06 18:08:38 +00:00
  • e3ee2b0879 CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) Johannes Gäßler 2024-02-06 18:43:06 +01:00
  • 17c97fb062 CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) Johannes Gäßler 2024-02-06 18:43:06 +01:00
  • afb11f7794 Update README.md (#5366) Kawrakow 2024-02-06 19:00:16 +02:00
  • b08f22c882 Update README.md (#5366) Kawrakow 2024-02-06 19:00:16 +02:00
  • bd1301d6c5 Slight quantization improvement for Q4_K and Q5_K (#5361) Kawrakow 2024-02-06 17:28:02 +02:00
  • f57fadc009 Slight quantization improvement for Q4_K and Q5_K (#5361) Kawrakow 2024-02-06 17:28:02 +02:00
  • 1675e8787c readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) BarfingLemurs 2024-02-06 09:06:48 -05:00
  • 2e9c0bd6b3 readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) BarfingLemurs 2024-02-06 09:06:48 -05:00
  • d293e063ce CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) Johannes Gäßler 2024-02-06 14:44:06 +01:00