Commit Graph

  • 9fcea71539 switch to using localizedDescription (#7010) Kevin Gibbons 2024-04-30 08:14:02 -07:00
  • f364eb6fb5 switch to using localizedDescription (#7010) Kevin Gibbons 2024-04-30 08:14:02 -07:00
  • ba4f181b0a metal : remove deprecated error code (#7008) Georgi Gerganov 2024-04-30 15:52:21 +03:00
  • 77e15bec62 metal : remove deprecated error code (#7008) Georgi Gerganov 2024-04-30 15:52:21 +03:00
  • 1ff6ff918c metal : log more info on error (#6987) Kevin Gibbons 2024-04-30 02:34:50 -07:00
  • a68a1e7ed0 metal : log more info on error (#6987) Kevin Gibbons 2024-04-30 02:34:50 -07:00
  • 3a4c3c374d ggml : add Flash Attention (#5021) Georgi Gerganov 2024-04-30 12:16:08 +03:00
  • 9c67c2773d ggml : add Flash Attention (#5021) Georgi Gerganov 2024-04-30 12:16:08 +03:00
  • aac7c0192a convert : use utf8 encoding (#7000) Georgi Gerganov 2024-04-30 11:05:25 +03:00
  • 952d03dbea convert : use utf8 encoding (#7000) Georgi Gerganov 2024-04-30 11:05:25 +03:00
  • d8818786c9 Improve usability of --model-url & related flags (#6930) Olivier Chafik 2024-04-30 00:52:50 +01:00
  • 8843a98c2b Improve usability of --model-url & related flags (#6930) Olivier Chafik 2024-04-30 00:52:50 +01:00
  • 65a25a2a29 Extending grammar integration tests (#6644) Clint Herron 2024-04-29 14:40:14 -04:00
  • b8c1476e44 Extending grammar integration tests (#6644) Clint Herron 2024-04-29 14:40:14 -04:00
  • 3dcf005c04 main : fix typo in comment in main.cpp (#6985) Daniel Bevenius 2024-04-29 19:56:59 +02:00
  • 5539e6fdd1 main : fix typo in comment in main.cpp (#6985) Daniel Bevenius 2024-04-29 19:56:59 +02:00
  • b9688eda68 build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) Olivier Chafik 2024-04-29 17:02:45 +01:00
  • b8a7a5a90f build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) Olivier Chafik 2024-04-29 17:02:45 +01:00
  • a46abdfd89 ci : tmp disable gguf-split (#6983) Georgi Gerganov 2024-04-29 18:36:39 +03:00
  • d2c898f746 ci : tmp disable gguf-split (#6983) Georgi Gerganov 2024-04-29 18:36:39 +03:00
  • 7d3b617ce8 ggml : fix __MSC_VER -> _MSC_VER (#6977) Georgi Gerganov 2024-04-29 17:55:02 +03:00
  • 544f1f10ad ggml : fix __MSC_VER -> _MSC_VER (#6977) Georgi Gerganov 2024-04-29 17:55:02 +03:00
  • 08e50da0e4 llava-cli : multiple images (#6969) cpumaxx 2024-04-29 07:34:24 -07:00
  • ffe666572f llava-cli : multiple images (#6969) cpumaxx 2024-04-29 07:34:24 -07:00
  • 74b151c966 readme : update hot topics Georgi Gerganov 2024-04-29 17:06:19 +03:00
  • 24affa7db3 readme : update hot topics Georgi Gerganov 2024-04-29 17:06:19 +03:00
  • 820703bf9c llama : fix BPE pre-tokenization (#6920) Georgi Gerganov 2024-04-29 16:58:41 +03:00
  • f4ab2a4147 llama : fix BPE pre-tokenization (#6920) Georgi Gerganov 2024-04-29 16:58:41 +03:00
  • 4fcd38d3e2 sampling : use std::random_device{}() for default random seed (#6962) David Renshaw 2024-04-29 09:35:45 -04:00
  • 3f167476b1 sampling : use std::random_device{}() for default random seed (#6962) David Renshaw 2024-04-29 09:35:45 -04:00
  • 4b73489d69 convert : fix conversion of some BERT embedding models (#6937) Christian Zhou-Zheng 2024-04-29 09:34:41 -04:00
  • 3055a41805 convert : fix conversion of some BERT embedding models (#6937) Christian Zhou-Zheng 2024-04-29 09:34:41 -04:00
  • 0bf2a9ced7 make : change GNU make default CXX from g++ to c++ (#6966) Przemysław Pawełczyk 2024-04-29 15:08:20 +02:00
  • 577277ffd2 make : change GNU make default CXX from g++ to c++ (#6966) Przemysław Pawełczyk 2024-04-29 15:08:20 +02:00
  • 2307a7b21c ci : add building in MSYS2 environments (Windows) (#6967) Przemysław Pawełczyk 2024-04-29 14:59:47 +02:00
  • ca7f29f568 ci : add building in MSYS2 environments (Windows) (#6967) Przemysław Pawełczyk 2024-04-29 14:59:47 +02:00
  • ca998d7836 llama : fix typo LAMMAFILE -> LLAMAFILE (#6974) Johannes Gäßler 2024-04-29 14:36:22 +02:00
  • c4f708a93f llama : fix typo LAMMAFILE -> LLAMAFILE (#6974) Johannes Gäßler 2024-04-29 14:36:22 +02:00
  • 0c6b78922a Fix more int overflow during quant (PPL/CUDA). (#6563) DAN™ 2024-04-28 18:38:44 -04:00
  • e00b4a8f81 Fix more int overflow during quant (PPL/CUDA). (#6563) DAN™ 2024-04-28 18:38:44 -04:00
  • d6e3c23e2f gguf : enforce that tensor names are unique (#6905) Xuan Son Nguyen 2024-04-28 17:36:18 +02:00
  • 7bb36ccf91 gguf : enforce that tensor names are unique (#6905) Xuan Son Nguyen 2024-04-28 17:36:18 +02:00
  • 45685f7607 add device version in device list (#6959) Neo Zhang 2024-04-28 22:40:31 +08:00
  • ce023f6f2f add device version in device list (#6959) Neo Zhang 2024-04-28 22:40:31 +08:00
  • 4687dbd9ac flake.lock: Update github-actions[bot] 2024-04-28 00:18:27 +00:00
  • 6e472f58e4 flake.lock: Update github-actions[bot] 2024-04-28 00:18:27 +00:00
  • cb34841313 Replace "alternative" boolean operator in conditional compilation directive (#6949) mgroeber9110 2024-04-27 21:02:06 +02:00
  • 4dba7e8114 Replace "alternative" boolean operator in conditional compilation directive (#6949) mgroeber9110 2024-04-27 21:02:06 +02:00
  • 6feab329fe ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935) Pierrick Hymbert 2024-04-27 17:50:48 +02:00
  • b7368332e2 ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935) Pierrick Hymbert 2024-04-27 17:50:48 +02:00
  • b67fa6eaed Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) agray3 2024-04-26 19:08:30 +01:00
  • 928e0b7013 Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) agray3 2024-04-26 19:08:30 +01:00
  • ffc7d66851 quantize: add imatrix and dataset metadata in GGUF (#6658) Pierrick Hymbert 2024-04-26 20:06:33 +02:00
  • 0c4d489e29 quantize: add imatrix and dataset metadata in GGUF (#6658) Pierrick Hymbert 2024-04-26 20:06:33 +02:00
  • 275bc14206 add basic tensor data validation function (#6884) slaren 2024-04-26 18:39:58 +02:00
  • 017e6999b5 add basic tensor data validation function (#6884) slaren 2024-04-26 18:39:58 +02:00
  • 2780c476e1 gguf : fix mismatch between alloc and free functions (#6929) slaren 2024-04-26 17:07:42 +02:00
  • e2764cd7ca gguf : fix mismatch between alloc and free functions (#6929) slaren 2024-04-26 17:07:42 +02:00
  • 93ed98625b llamafile : use 64-bit integers in sgemm (#6928) Justine Tunney 2024-04-26 10:05:33 -04:00
  • 4b1c3c98b4 llamafile : use 64-bit integers in sgemm (#6928) Justine Tunney 2024-04-26 10:05:33 -04:00
  • e5ef23a472 ci: server: fix python installation (#6925) Pierrick Hymbert 2024-04-26 12:27:25 +02:00
  • bbe3c6e761 ci: server: fix python installation (#6925) Pierrick Hymbert 2024-04-26 12:27:25 +02:00
  • a82d6e2eb0 server: stop generation at n_ctx_train if n_predict is not set (#6638) Pierrick Hymbert 2024-04-26 12:15:30 +02:00
  • 7f5ff558ee server: stop generation at n_ctx_train if n_predict is not set (#6638) Pierrick Hymbert 2024-04-26 12:15:30 +02:00
  • bdd8ba3806 ci: server: fix python installation (#6922) Pierrick Hymbert 2024-04-26 11:11:51 +02:00
  • 9e4e077ec5 ci: server: fix python installation (#6922) Pierrick Hymbert 2024-04-26 11:11:51 +02:00
  • 5f103c0fef Merge pull request from GHSA-p5mv-gjc5-mwqv Georgi Gerganov 2024-04-26 10:41:53 +03:00
  • 83b72cb086 Merge pull request from GHSA-p5mv-gjc5-mwqv Georgi Gerganov 2024-04-26 10:41:53 +03:00
  • a1cc26069f ci: server: fix python installation (#6918) Pierrick Hymbert 2024-04-26 09:27:49 +02:00
  • d4a9afc100 ci: server: fix python installation (#6918) Pierrick Hymbert 2024-04-26 09:27:49 +02:00
  • f8c07fa19a ci: fix concurrency for pull_request_target (#6917) Pierrick Hymbert 2024-04-26 09:26:59 +02:00
  • 7d641c26ac ci: fix concurrency for pull_request_target (#6917) Pierrick Hymbert 2024-04-26 09:26:59 +02:00
  • 22e7fa819f bench: server add stop word for PHI-2 (#6916) Pierrick Hymbert 2024-04-26 09:26:16 +02:00
  • 5790c8dac1 bench: server add stop word for PHI-2 (#6916) Pierrick Hymbert 2024-04-26 09:26:16 +02:00
  • 63cba2da4e llava : add support for moondream vision language model (#6899) vik 2024-04-25 12:38:31 -07:00
  • 46e12c4692 llava : add support for moondream vision language model (#6899) vik 2024-04-25 12:38:31 -07:00
  • 6b8354b2d0 cmake : restore LLAMA_LLAMAFILE_DEFAULT Georgi Gerganov 2024-04-25 21:31:17 +03:00
  • dba497e0c1 cmake : restore LLAMA_LLAMAFILE_DEFAULT Georgi Gerganov 2024-04-25 21:31:17 +03:00
  • e8f5549ad2 cmake : remove obsolete ANDROID check Georgi Gerganov 2024-04-25 18:59:51 +03:00
  • fa0b4ad252 cmake : remove obsolete ANDROID check Georgi Gerganov 2024-04-25 18:59:51 +03:00
  • f36e69fd59 llama : synchronize before get/set session data (#6911) slaren 2024-04-25 17:59:03 +02:00
  • d6e1d44f16 llama : synchronize before get/set session data (#6911) slaren 2024-04-25 17:59:03 +02:00
  • cfffb05250 ci : tmp disable slow tests Georgi Gerganov 2024-04-25 17:06:27 +03:00
  • 853d06ffe2 ci : tmp disable slow tests Georgi Gerganov 2024-04-25 17:06:27 +03:00
  • f03d13c5cf readme : update model list (#6908) BarfingLemurs 2024-04-25 09:52:28 -04:00
  • 3fe0596c18 readme : update model list (#6908) BarfingLemurs 2024-04-25 09:52:28 -04:00
  • 5a827d4012 llama : check that all the tensor data is in the model file (#6885) slaren 2024-04-25 15:23:47 +02:00
  • 0ead1f1072 llama : check that all the tensor data is in the model file (#6885) slaren 2024-04-25 15:23:47 +02:00
  • 0f1efe56bb ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) Georgi Gerganov 2024-04-25 15:48:25 +03:00
  • 51543729ff ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) Georgi Gerganov 2024-04-25 15:48:25 +03:00
  • 23fd2e9915 clip : rename lerp function to avoid conflict (#6894) Daniel Bevenius 2024-04-25 14:38:14 +02:00
  • 4ab99d8d47 clip : rename lerp function to avoid conflict (#6894) Daniel Bevenius 2024-04-25 14:38:14 +02:00
  • 3b3d637fc8 ggml : fix MIN / MAX macros (#6904) Georgi Gerganov 2024-04-25 15:12:28 +03:00
  • 54770413c4 ggml : fix MIN / MAX macros (#6904) Georgi Gerganov 2024-04-25 15:12:28 +03:00
  • 4ce9087ea4 tests : minor bash stuff (#6902) Georgi Gerganov 2024-04-25 14:27:20 +03:00
  • aa750c1ede tests : minor bash stuff (#6902) Georgi Gerganov 2024-04-25 14:27:20 +03:00
  • 1d00f348a3 quantize : add '--keep-split' to quantize model into shards (#6688) jiez 2024-04-25 18:29:35 +08:00
  • 1966eb2615 quantize : add '--keep-split' to quantize model into shards (#6688) jiez 2024-04-25 18:29:35 +08:00
  • 4adf11a516 README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00
  • 784e11dea1 README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00