Commit Graph

  • d752327c33 Adding KodiBot to UI list (#6535) Firat 2024-04-08 00:48:29 -07:00
  • 0b06becdd7 Change Windows AMD example to release build to make inference much faster. (#6525) Mark Fairbairn 2024-04-07 19:52:19 +01:00
  • 855f54402e Change Windows AMD example to release build to make inference much faster. (#6525) Mark Fairbairn 2024-04-07 19:52:19 +01:00
  • 9bcd3315a1 flake.lock: Update (#6517) Georgi Gerganov 2024-04-07 21:25:30 +03:00
  • b909236c0b flake.lock: Update (#6517) Georgi Gerganov 2024-04-07 21:25:30 +03:00
  • 9b88711670 Add GritLM as supported models. (#6513) DAN™ 2024-04-07 13:33:59 -04:00
  • e0717e751e Add GritLM as supported models. (#6513) DAN™ 2024-04-07 13:33:59 -04:00
  • 2c80dc319b sync : ggml Georgi Gerganov 2024-04-07 17:05:51 +03:00
  • c37247796b sync : ggml Georgi Gerganov 2024-04-07 17:05:51 +03:00
  • 0f3cd80c8f ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020) Slava Primenko 2024-04-04 14:49:24 +02:00
  • f77261a7c5 ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020) Slava Primenko 2024-04-04 14:49:24 +02:00
  • 9638aa3727 scripts : sync ggml-cuda folder Georgi Gerganov 2024-04-07 16:08:12 +03:00
  • 43e8995e75 scripts : sync ggml-cuda folder Georgi Gerganov 2024-04-07 16:08:12 +03:00
  • 16daf2d791 Run make to build the project (#6457) limitedAtonement 2024-04-07 07:05:40 -04:00
  • 9472bce308 Run make to build the project (#6457) limitedAtonement 2024-04-07 07:05:40 -04:00
  • 05a887dfdd support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521) Neo Zhang Jianyu 2024-04-07 10:55:59 +08:00
  • d4f220a5cc support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521) Neo Zhang Jianyu 2024-04-07 10:55:59 +08:00
  • 0bcde06402 sync : ggml Georgi Gerganov 2024-04-06 17:43:15 +03:00
  • 54ea0698fb sync : ggml Georgi Gerganov 2024-04-06 17:43:15 +03:00
  • 7017659e90 backend : fix typo in scheduler documentation (ggml/781) Daniel Bevenius 2024-04-03 22:57:20 +02:00
  • b66aec675c backend : fix typo in scheduler documentation (ggml/781) Daniel Bevenius 2024-04-03 22:57:20 +02:00
  • 5b69ee6ce5 Tests: Added integration tests for GBNF parser (#6472) Clint Herron 2024-04-06 10:31:33 -04:00
  • 57dd02c44b Tests: Added integration tests for GBNF parser (#6472) Clint Herron 2024-04-06 10:31:33 -04:00
  • 73e01781d4 ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495) Pierrick Hymbert 2024-04-06 05:40:47 +02:00
  • 75cd4c7729 ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495) Pierrick Hymbert 2024-04-06 05:40:47 +02:00
  • 8ab3d0da25 gguf.py : add licence and version to gguf writer (#6504) Brian 2024-04-06 05:41:38 +11:00
  • a8bd14d557 gguf.py : add licence and version to gguf writer (#6504) Brian 2024-04-06 05:41:38 +11:00
  • 2f6d0324a5 readme : update UI list (#6503) Hoang Nguyen 2024-04-05 11:39:43 -07:00
  • d0f5deebf8 readme : update UI list (#6503) Hoang Nguyen 2024-04-05 11:39:43 -07:00
  • bcd1e3beaf bench : make n_batch and n_ubatch configurable in Batched bench (#6500) Ting Sun 2024-04-06 01:34:53 +07:00
  • 87e21bbacd bench : make n_batch and n_ubatch configurable in Batched bench (#6500) Ting Sun 2024-04-06 01:34:53 +07:00
  • 3c00380ea7 [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464) Ouadie EL FAROUKI 2024-04-05 14:35:06 +01:00
  • 1b496a745c [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464) Ouadie EL FAROUKI 2024-04-05 14:35:06 +01:00
  • 6285695777 readme : add Dot to UI list (#6487) alexpinel 2024-04-04 18:22:50 +01:00
  • a307375c02 readme : add Dot to UI list (#6487) alexpinel 2024-04-04 18:22:50 +01:00
  • cd082c6713 readme : fix typo (#6481) Jun Jie 2024-04-05 01:16:37 +08:00
  • b660a5729e readme : fix typo (#6481) Jun Jie 2024-04-05 01:16:37 +08:00
  • a57ef1110e server: add cURL support to server Dockerfiles (#6474) Ed Lepedus 2024-04-04 17:31:22 +01:00
  • 0a1d889e27 server: add cURL support to server Dockerfiles (#6474) Ed Lepedus 2024-04-04 17:31:22 +01:00
  • 016aa58b11 ci: exempt master branch workflows from getting cancelled (#6486) Minsoo Cheong 2024-04-05 01:30:53 +09:00
  • 7dda1b727e ci: exempt master branch workflows from getting cancelled (#6486) Minsoo Cheong 2024-04-05 01:30:53 +09:00
  • 7d32d7d775 build CI: Name artifacts (#6482) Ewout ter Hoeven 2024-04-04 17:08:55 +02:00
  • c666ba26c3 build CI: Name artifacts (#6482) Ewout ter Hoeven 2024-04-04 17:08:55 +02:00
  • deefac27f2 server: allow penalizing repetition of newlines on server webpage (#6431) Shakhar Dasgupta 2024-04-04 11:03:00 -04:00
  • 2e66913e5f server: allow penalizing repetition of newlines on server webpage (#6431) Shakhar Dasgupta 2024-04-04 11:03:00 -04:00
  • fd66566ee1 ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478) Pierrick Hymbert 2024-04-04 16:59:04 +02:00
  • 8120efee1d ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478) Pierrick Hymbert 2024-04-04 16:59:04 +02:00
  • 997c0854b4 Correct README link (#6458) limitedAtonement 2024-04-04 10:30:02 -04:00
  • a74401f0e5 Correct README link (#6458) limitedAtonement 2024-04-04 10:30:02 -04:00
  • 4cf330fb1e ci: bench: add more ftype, fix triggers and bot comment (#6466) Pierrick Hymbert 2024-04-04 11:57:58 +02:00
  • 7a2c92637a ci: bench: add more ftype, fix triggers and bot comment (#6466) Pierrick Hymbert 2024-04-04 11:57:58 +02:00
  • 6d074393e8 common: remove duplicate check for curl (#6471) Daniel Bevenius 2024-04-04 09:49:21 +02:00
  • 4bcd6b959c common: remove duplicate check for curl (#6471) Daniel Bevenius 2024-04-04 09:49:21 +02:00
  • 9179276b55 examples : add GBNF validator program (#5948) Clint Herron 2024-04-04 03:44:28 -04:00
  • 9b84ae1806 examples : add GBNF validator program (#5948) Clint Herron 2024-04-04 03:44:28 -04:00
  • af0871e8a7 server : remove obsolete --memory-f32 option Georgi Gerganov 2024-04-04 09:34:58 +03:00
  • 4399f13fb9 server : remove obsolete --memory-f32 option Georgi Gerganov 2024-04-04 09:34:58 +03:00
  • abdfc39ec8 server : add option to disable KV offload (#6468) Xiao-Yong Jin 2024-04-04 01:33:48 -05:00
  • 1a43c7254e server : add option to disable KV offload (#6468) Xiao-Yong Jin 2024-04-04 01:33:48 -05:00
  • d7a430215b convert : fix for lint error complaining of bare except (#6470) Clint Herron 2024-04-04 02:32:53 -04:00
  • 72d73af651 convert : fix for lint error complaining of bare except (#6470) Clint Herron 2024-04-04 02:32:53 -04:00
  • 7ee41c6e35 A few small fixes to server's README docs (#6428) Fattire 2024-04-03 13:22:57 -07:00
  • 5fb1574c81 A few small fixes to server's README docs (#6428) Fattire 2024-04-03 13:22:57 -07:00
  • 0f4e3af782 server : handle exception on wrong type in request (#6452) JH23X 2024-04-03 20:09:52 +02:00
  • 60cdf40cc3 server : handle exception on wrong type in request (#6452) JH23X 2024-04-03 20:09:52 +02:00
  • 919a8a7a7c llama : add SEA-LION support (#6448) bryanSwk 2024-04-04 02:05:10 +08:00
  • bb43cf7e9d llama : add SEA-LION support (#6448) bryanSwk 2024-04-04 02:05:10 +08:00
  • e9cd978f7e ci : update checkout, setup-python and upload-artifact to latest (#6456) Ewout ter Hoeven 2024-04-03 20:01:13 +02:00
  • 9f62c0173d ci : update checkout, setup-python and upload-artifact to latest (#6456) Ewout ter Hoeven 2024-04-03 20:01:13 +02:00
  • be830a3c64 server: add cURL support to server.Dockerfile (#6461) Ed Lepedus 2024-04-03 18:56:37 +01:00
  • 5d4f12e462 server: add cURL support to server.Dockerfile (#6461) Ed Lepedus 2024-04-03 18:56:37 +01:00
  • 840675869e readme : add feature-rich rust bindings (#6465) Francisco Melo 2024-04-03 18:53:37 +01:00
  • 154d4ee39c readme : add feature-rich rust bindings (#6465) Francisco Melo 2024-04-03 18:53:37 +01:00
  • d504285ccb security : create policy (#6354) Joyce 2024-04-03 14:48:07 -03:00
  • e69945d953 security : create policy (#6354) Joyce 2024-04-03 14:48:07 -03:00
  • 20b9b65e75 Missing tokenizer.model error during gguf conversion (#6443) Abhishek Gopinath K 2024-04-03 21:12:52 +05:30
  • db214fa578 Missing tokenizer.model error during gguf conversion (#6443) Abhishek Gopinath K 2024-04-03 21:12:52 +05:30
  • 17d75e9340 Add OpenChat, Alpaca, Vicuna chat templates (#6397) kaizau 2024-04-03 23:24:31 +08:00
  • 1ff4d9f3d6 Add OpenChat, Alpaca, Vicuna chat templates (#6397) kaizau 2024-04-03 23:24:31 +08:00
  • 4fd21ec901 readme : update hot topics Georgi Gerganov 2024-04-03 16:11:15 +03:00
  • 076b08649e readme : update hot topics Georgi Gerganov 2024-04-03 16:11:15 +03:00
  • 5d3839837b ggml : mul_mat_id use the same tensor for all the experts (#6387) slaren 2024-04-03 15:07:05 +02:00
  • 08a0c02060 ggml : mul_mat_id use the same tensor for all the experts (#6387) slaren 2024-04-03 15:07:05 +02:00
  • d29450b4ca [SYCL] Disable iqx on windows as WA (#6435) Meng, Hengyu 2024-04-03 10:34:40 +08:00
  • 52604860f9 [SYCL] Disable iqx on windows as WA (#6435) Meng, Hengyu 2024-04-03 10:34:40 +08:00
  • fd20ccef2e flake.lock: Update (#6402) Georgi Gerganov 2024-04-01 19:05:57 +03:00
  • f87f7b8986 flake.lock: Update (#6402) Georgi Gerganov 2024-04-01 19:05:57 +03:00
  • 5a5d9cbbe2 compare-llama-bench.py: fix long hexsha args (#6424) Johannes Gäßler 2024-04-01 13:30:43 +02:00
  • 33a5244806 compare-llama-bench.py: fix long hexsha args (#6424) Johannes Gäßler 2024-04-01 13:30:43 +02:00
  • 5f580568dd ci: server: verify deps are coherent with the commit (#6409) Pierrick Hymbert 2024-04-01 12:36:40 +02:00
  • 226e819371 ci: server: verify deps are coherent with the commit (#6409) Pierrick Hymbert 2024-04-01 12:36:40 +02:00
  • 47038dcea2 readme : update hot topics Georgi Gerganov 2024-03-31 11:56:30 +03:00
  • c50a82ce0f readme : update hot topics Georgi Gerganov 2024-03-31 11:56:30 +03:00
  • f0be4bd555 ci: bench: fix Resource not accessible by integration on PR event (#6393) Pierrick Hymbert 2024-03-30 11:36:07 +01:00
  • 37e7854c10 ci: bench: fix Resource not accessible by integration on PR event (#6393) Pierrick Hymbert 2024-03-30 11:36:07 +01:00
  • 8dbe6f877d Fedora build update (#6388) Mohammadreza Hendiani 2024-03-30 01:29:56 +03:30
  • c342d070c6 Fedora build update (#6388) Mohammadreza Hendiani 2024-03-30 01:29:56 +03:30
  • 75b580db0a split: allow --split-max-size option (#6343) Xuan Son Nguyen 2024-03-29 22:34:44 +01:00
  • f7fc5f6c6f split: allow --split-max-size option (#6343) Xuan Son Nguyen 2024-03-29 22:34:44 +01:00
  • 134e314654 Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 0cc4m 2024-03-29 17:29:21 +01:00