Commit Graph

  • 27d373a411 ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • fd5ea0f897 ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • a98d3afc28 examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • c28a83902c examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • 968cfb9d8d server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • d9da0e4986 server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • fc5f0f1647 CUDA: use tensor cores for MMQ (#7676) Johannes Gäßler 2024-06-10 11:45:13 +02:00
  • 1f0dabda8d CUDA: use tensor cores for MMQ (#7676) Johannes Gäßler 2024-06-10 11:45:13 +02:00
  • 8112ed6c29 use the correct SYCL context for host USM allocations (#7777) Ben Ashbaugh 2024-06-10 02:21:31 -07:00
  • af4ae502dd use the correct SYCL context for host USM allocations (#7777) Ben Ashbaugh 2024-06-10 02:21:31 -07:00
  • cff824ecc3 flake.lock: Update (#7838) Georgi Gerganov 2024-06-10 02:04:50 +03:00
  • 10ceba354a flake.lock: Update (#7838) Georgi Gerganov 2024-06-10 02:04:50 +03:00
  • ca9f39a167 imatrix : handle partial entries (#7833) Georgi Gerganov 2024-06-09 20:19:35 +03:00
  • e95beeb1fc imatrix : handle partial entries (#7833) Georgi Gerganov 2024-06-09 20:19:35 +03:00
  • f6bbf78d23 docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) Nicolás Pérez 2024-06-09 11:24:29 -04:00
  • 57bf62ce7c docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) Nicolás Pérez 2024-06-09 11:24:29 -04:00
  • 36f9df2257 server: do not remove whitespace at the start of a completion chunk (#7830) mgroeber9110 2024-06-09 12:50:35 +02:00
  • 3e2ee44315 server: do not remove whitespace at the start of a completion chunk (#7830) mgroeber9110 2024-06-09 12:50:35 +02:00
  • d4dc8e168b CUDA: revise q8_1 data layout for mul_mat_q (#7824) Johannes Gäßler 2024-06-09 09:42:25 +02:00
  • 42b53d192f CUDA: revise q8_1 data layout for mul_mat_q (#7824) Johannes Gäßler 2024-06-09 09:42:25 +02:00
  • 08bf59c672 convert-hf : set the model name based on cli arg, if present (#7693) sasha0552 2024-06-09 06:39:25 +00:00
  • 2decf57bc6 convert-hf : set the model name based on cli arg, if present (#7693) sasha0552 2024-06-09 06:39:25 +00:00
  • 46f8d599f6 convert-hf : match model part name prefix and suffix (#7687) compilade 2024-06-08 22:47:25 -04:00
  • 5795b94182 convert-hf : match model part name prefix and suffix (#7687) compilade 2024-06-08 22:47:25 -04:00
  • bad6961237 gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) compilade 2024-06-08 22:34:29 -04:00
  • ed9f252118 gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) compilade 2024-06-08 22:34:29 -04:00
  • 1caca63a87 Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) slaren 2024-06-09 01:43:39 +02:00
  • fe1e3917cf Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) slaren 2024-06-09 01:43:39 +02:00
  • 20e69f8dff url: save -mu downloads to new cache location (#7826) Olivier Chafik 2024-06-08 20:21:08 +01:00
  • d4d915d351 url: save -mu downloads to new cache location (#7826) Olivier Chafik 2024-06-08 20:21:08 +01:00
  • 66217bbac6 server : smart slot selection using Longest Common Prefix (#7728) sasha0552 2024-06-08 07:50:31 +00:00
  • 7a16ce7db2 server : smart slot selection using Longest Common Prefix (#7728) sasha0552 2024-06-08 07:50:31 +00:00
  • b28de435a1 vulkan : reuse parent extra for views (#7806) slaren 2024-06-07 19:47:49 +02:00
  • da799b4189 vulkan : reuse parent extra for views (#7806) slaren 2024-06-07 19:47:49 +02:00
  • c0600f4f5c gguf-split : change binary multi-byte units to decimal (#7803) Christian Zhou-Zheng 2024-06-07 08:56:01 -04:00
  • c00fad71e5 gguf-split : change binary multi-byte units to decimal (#7803) Christian Zhou-Zheng 2024-06-07 08:56:01 -04:00
  • 544d23d303 cmake : fix BUILD_SHARED_LIBS=ON build (#7784) intelmatt 2024-06-07 05:15:07 -07:00
  • 27615f5ab2 cmake : fix BUILD_SHARED_LIBS=ON build (#7784) intelmatt 2024-06-07 05:15:07 -07:00
  • e8ac0b3518 server: update cache_prompt documentation [no ci] (#7745) Johannes Gäßler 2024-06-07 11:15:49 +02:00
  • 7027b27d76 server: update cache_prompt documentation [no ci] (#7745) Johannes Gäßler 2024-06-07 11:15:49 +02:00
  • f91a2cbb62 server : do not get prompt in infill mode (#7286) woodx 2024-06-07 15:09:45 +08:00
  • a5cabd7649 server : do not get prompt in infill mode (#7286) woodx 2024-06-07 15:09:45 +08:00
  • bd158af596 [SYCL] fix softmax r2r result wrong issue (#7811) pengxin99 2024-06-07 14:28:26 +08:00
  • d5c938cd77 [SYCL] fix softmax r2r result wrong issue (#7811) pengxin99 2024-06-07 14:28:26 +08:00
  • bad6ac1321 check for nans in imatrix and quantize (#7807) slaren 2024-06-07 08:01:29 +02:00
  • c9ee7118d5 check for nans in imatrix and quantize (#7807) slaren 2024-06-07 08:01:29 +02:00
  • 4e92948760 server : fix --threads-http arg (#7801) Georgi Gerganov 2024-06-06 19:19:59 +03:00
  • ee459f40f6 server : fix --threads-http arg (#7801) Georgi Gerganov 2024-06-06 19:19:59 +03:00
  • c2a2806fac imatrix : migrate to gpt_params (#7771) Georgi Gerganov 2024-06-06 16:30:58 +03:00
  • f83351f9a6 imatrix : migrate to gpt_params (#7771) Georgi Gerganov 2024-06-06 16:30:58 +03:00
  • 00552af560 Added support for . (any character) token in grammar engine. (#6467) Clint Herron 2024-06-06 06:08:52 -07:00
  • ad675e1c67 Added support for . (any character) token in grammar engine. (#6467) Clint Herron 2024-06-06 06:08:52 -07:00
  • 43ce4c2223 README minor fixes (#7798) [no ci] Mattheus Chediak 2024-06-06 09:17:54 -03:00
  • a143c04375 README minor fixes (#7798) [no ci] Mattheus Chediak 2024-06-06 09:17:54 -03:00
  • bb0026f4f1 grammars: x{min,max} repetition operator (#6640) Olivier Chafik 2024-06-06 10:07:06 +01:00
  • 55b2d0849d grammars: x{min,max} repetition operator (#6640) Olivier Chafik 2024-06-06 10:07:06 +01:00
  • add6ba8d05 llama : add jina v2 base code (#7596) Joan Fontanals 2024-06-06 09:22:41 +02:00
  • f5d7b268ec llama : add jina v2 base code (#7596) Joan Fontanals 2024-06-06 09:22:41 +02:00
  • 283627eb48 docker : build only main and server in their images (#7782) slaren 2024-06-06 07:19:49 +02:00
  • 2d08b7fbb4 docker : build only main and server in their images (#7782) slaren 2024-06-06 07:19:49 +02:00
  • 132e99131c docker : add openmp lib (#7780) slaren 2024-06-06 07:17:21 +02:00
  • d67caea0d6 docker : add openmp lib (#7780) slaren 2024-06-06 07:17:21 +02:00
  • e4c231c3ff Fix encoding in python scripts (#7733) Galunid 2024-06-05 19:07:24 +02:00
  • 7672adeec7 Fix encoding in python scripts (#7733) Galunid 2024-06-05 19:07:24 +02:00
  • 4ac499892a CUDA: refactor mmq, dmmv, mmvq (#7716) Johannes Gäßler 2024-06-05 16:53:00 +02:00
  • 7d1a378b8f CUDA: refactor mmq, dmmv, mmvq (#7716) Johannes Gäßler 2024-06-05 16:53:00 +02:00
  • 13a467c230 ggml : refactor rope norm/neox (#7634) Georgi Gerganov 2024-06-05 11:29:20 +03:00
  • 2b3389677a ggml : refactor rope norm/neox (#7634) Georgi Gerganov 2024-06-05 11:29:20 +03:00
  • e2d1daa87f readme : remove -ins (#7759) arch-btw 2024-06-04 23:40:49 -07:00
  • 9973e81c5c readme : remove -ins (#7759) arch-btw 2024-06-04 23:40:49 -07:00
  • b443dc4fa3 Fix per token atrributes bits (#7749) jaime-m-p 2024-06-05 01:26:14 +02:00
  • c90dbe026b Fix per token atrributes bits (#7749) jaime-m-p 2024-06-05 01:26:14 +02:00
  • 3a57b9a5c3 Allow number of nodes in CUDA graph to change (#7738) agray3 2024-06-04 21:06:49 +01:00
  • b90dc566c1 Allow number of nodes in CUDA graph to change (#7738) agray3 2024-06-04 21:06:49 +01:00
  • 8822dcce8d common : refactor cli arg parsing (#7675) Georgi Gerganov 2024-06-04 21:23:39 +03:00
  • 1442677f92 common : refactor cli arg parsing (#7675) Georgi Gerganov 2024-06-04 21:23:39 +03:00
  • 8de006f83e ggml : remove OpenCL (#7735) Georgi Gerganov 2024-06-04 21:23:20 +03:00
  • 554c247caf ggml : remove OpenCL (#7735) Georgi Gerganov 2024-06-04 21:23:20 +03:00
  • 515db58a33 llama : remove beam search (#7736) Georgi Gerganov 2024-06-04 21:23:05 +03:00
  • 0cd6bd3483 llama : remove beam search (#7736) Georgi Gerganov 2024-06-04 21:23:05 +03:00
  • c5b0ed9622 readme : remove obsolete Zig instructions (#7471) Georgi Gerganov 2024-06-04 19:43:01 +03:00
  • 5ca0944a15 readme : remove obsolete Zig instructions (#7471) Georgi Gerganov 2024-06-04 19:43:01 +03:00
  • 00465f7b5a llama-bench : allow using a different printer for stderr with -oe (#7722) slaren 2024-06-04 14:32:42 +02:00
  • adc9ff3841 llama-bench : allow using a different printer for stderr with -oe (#7722) slaren 2024-06-04 14:32:42 +02:00
  • 846a287022 Improve hipBLAS support in CMake (#7696) Daniele 2024-06-04 12:09:15 +00:00
  • 987d743d6b Improve hipBLAS support in CMake (#7696) Daniele 2024-06-04 12:09:15 +00:00
  • 76501c35ee refine .gitignore (#7688) zhouwg 2024-06-04 19:21:26 +08:00
  • b226c1227b refine .gitignore (#7688) zhouwg 2024-06-04 19:21:26 +08:00
  • ac02b89600 Per token attributes (#7685) jaime-m-p 2024-06-04 09:17:17 +02:00
  • 3b38d48609 Per token attributes (#7685) jaime-m-p 2024-06-04 09:17:17 +02:00
  • 9ca9da72ec ggml : prevent builds with -ffinite-math-only (#7726) Georgi Gerganov 2024-06-04 10:01:09 +03:00
  • 6d1616944d ggml : prevent builds with -ffinite-math-only (#7726) Georgi Gerganov 2024-06-04 10:01:09 +03:00
  • ccd01bc441 llama : offload to RPC in addition to other backends (#7640) Radoslav Gerganov 2024-06-03 20:03:26 +03:00
  • bde7cd3cd9 llama : offload to RPC in addition to other backends (#7640) Radoslav Gerganov 2024-06-03 20:03:26 +03:00
  • 2c040f0269 ggml : use OpenMP as a thread pool (#7606) Masaya, Kato 2024-06-04 00:14:15 +09:00
  • a5735e4426 ggml : use OpenMP as a thread pool (#7606) Masaya, Kato 2024-06-04 00:14:15 +09:00
  • 57c1334819 make: fix debug options not being applied to NVCC (#7714) Johannes Gäßler 2024-06-03 16:28:58 +02:00
  • 0b832d53ba make: fix debug options not being applied to NVCC (#7714) Johannes Gäßler 2024-06-03 16:28:58 +02:00
  • 946c648701 Vulkan Mixture of Experts (MoE) support (#7628) 0cc4m 2024-06-03 10:59:14 +02:00
  • 3d7ebf6312 Vulkan Mixture of Experts (MoE) support (#7628) 0cc4m 2024-06-03 10:59:14 +02:00