Commit Graph

  • 1c51f98adc cuda : print the returned error when CUDA initialization fails (#6185) slaren 2024-03-20 21:03:26 +01:00
  • f00595726e llava : update MobileVLM-README.md (#6180) Ziang Wu 2024-03-20 23:29:51 +08:00
  • 04e192af46 llava : add MobileVLM_V2 backup (#6175) Ziang Wu 2024-03-20 23:02:32 +08:00
  • 8d51f81ee4 cuda : refactor to remove global resources (#6170) slaren 2024-03-20 14:42:59 +01:00
  • 20acc5b3e9 Server: version bump for httplib and json (#6169) Xuan Son Nguyen 2024-03-20 13:30:36 +01:00
  • eb7c0903c9 gitignore : ignore curl-related files Georgi Gerganov 2024-03-20 14:17:34 +02:00
  • 11c28ed72b server : allow to override -ngl in tests (#6170) Georgi Gerganov 2024-03-20 14:14:32 +02:00
  • da44616253 Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" Georgi Gerganov 2024-03-20 13:29:49 +02:00
  • aa66bb387c llava : add a MobileVLM_V2-1.7B backup (#6152) Ziang Wu 2024-03-20 19:20:37 +08:00
  • 1d1248fb8c Server: Handle n_keep parameter in the request (#6174) Karthick 2024-03-20 16:32:34 +05:30
  • 7609cf9583 server tests : more pythonic process management; fix bare except: (#6146) Jared Van Bortel 2024-03-20 01:33:49 -04:00
  • 2742b450c1 update readme sycl for new update (#6151) Neo Zhang Jianyu 2024-03-20 11:21:41 +08:00
  • dd2cb4edd5 increase igpu cluster limit (#6159) Abhilash Majumder 2024-03-20 08:28:49 +05:30
  • 2dd32d61b7 Remove undeed header file. (#6158) DAN™ 2024-03-19 12:16:09 -04:00
  • 1b5523dc79 gguf-split: split and merge gguf per batch of tensors (#6135) Pierrick Hymbert 2024-03-19 12:05:44 +01:00
  • dbae762c3f common : disable repeat penalties by default (#6127) Georgi Gerganov 2024-03-19 10:21:54 +02:00
  • 24feb265d1 ci : exempt some labels from being tagged as stale (#6140) slaren 2024-03-19 09:06:54 +01:00
  • d332aa91b6 common : print usage on '-h' and '--help' (#6145) DAN™ 2024-03-19 01:59:36 -04:00
  • 2ee6390309 flake.lock: Update github-actions[bot] 2024-03-17 06:37:44 +00:00
  • ab78e80b7f mpt : implement backwards compatiblity with duped output tensor (#6139) Jared Van Bortel 2024-03-18 12:49:02 -04:00
  • 41eb9f687e clip : fix memory leak (#6138) Felix 2024-03-18 16:40:22 +01:00
  • d06ac1d0c1 backend : set max split inputs to GGML_MAX_SRC (#6137) slaren 2024-03-18 16:33:44 +01:00
  • c31ea012cc ci : disable stale issue messages (#6126) Georgi Gerganov 2024-03-18 13:45:38 +02:00
  • 440d6a226c ci : temporary disable sanitizer builds (#6128) Georgi Gerganov 2024-03-18 13:45:27 +02:00
  • d7d3ffdecf backend : offload large batches to GPU (#6083) slaren 2024-03-18 11:03:04 +01:00
  • 4c689ebd66 common : tidy-up argument parsing (#6105) DAN™ 2024-03-18 04:27:44 -04:00
  • 3dd6805dcf convert : add support for CamembertModel architecture (#6119) Thérence 2024-03-18 09:17:00 +01:00
  • 8295e15e30 convert : use f32 outtype for bf16 tensors (#6106) Romain D 2024-03-18 09:04:41 +01:00
  • cd0c187f9a common: llama_load_model_from_url using --model-url (#6098) Pierrick Hymbert 2024-03-17 19:12:37 +01:00
  • 1816cecf00 ci : close all stale issues at once (#6115) Georgi Gerganov 2024-03-17 19:51:57 +02:00
  • 9a6ac8119a ggml:fix finding transfer queue family index error (#6094) GainLee 2024-03-18 01:12:22 +08:00
  • be9deffbe1 ggml : add AVX512F SIMD (#6088) AmirAli Mirian 2024-03-16 11:52:02 -04:00
  • ad084cb949 gritlm : add initial README.md (#6086) Daniel Bevenius 2024-03-16 16:46:29 +01:00
  • 077033ecff readme : add wllama as a wasm binding (#6100) Xuan Son Nguyen 2024-03-16 16:42:08 +01:00
  • 3e799df80a common : refactor nested if causing error C1061 on MSVC (#6101) DAN™ 2024-03-16 11:39:15 -04:00
  • 2226be39d7 ci : close inactive issue with workflow (#6053) Pierrick Hymbert 2024-03-16 13:20:53 +01:00
  • eb9ea6d425 llama : fix Baichuan2 13B (#6092) slaren 2024-03-15 22:14:16 +01:00
  • 0b21f1b9bc llama : add support for control vectors (#5970) Theia Vogel 2024-03-15 13:43:02 -07:00
  • b13c2a285a llama : add Command-R support (#6033) Andrew Canis 2024-03-15 16:41:22 -04:00
  • 4e3f9788ba llava : change API to pure C style for Rust FFI bindgen (#6079) Ting Lou 2024-03-15 22:31:05 +08:00
  • 1c80ddb229 cuda : disable unused cudaLaunchHostFunc code (#6078) slaren 2024-03-15 13:24:03 +01:00
  • 58d73e4f6f fix set main gpu error (#6073) Neo Zhang Jianyu 2024-03-15 18:53:53 +08:00
  • 0dba2910a1 make : ggml-metal.o depends on ggml.h Georgi Gerganov 2024-03-15 11:36:50 +02:00
  • 16e7faad23 [SYCL] Fix non-intel device selection (#6042) AidanBeltonS 2024-03-15 09:26:20 +00:00
  • 9c4886ee7e gguf : add support for I64 and F64 arrays (#6062) Ondřej Čertík 2024-03-15 02:46:51 -06:00
  • e67a2e3d40 llama : add Orion chat template (#6066) Xuan Son Nguyen 2024-03-15 09:44:57 +01:00
  • 878ec8fef5 llama-bench : use random tokens to improve accuracy with mixtral (#6069) slaren 2024-03-15 09:22:24 +01:00
  • 7c4cffe96d llama : fix integer overflow during quantization (#6063) Georgi Gerganov 2024-03-14 22:58:41 +02:00
  • e290985462 gguf : fix resource leaks (#6061) Steve Grubb 2024-03-14 14:29:32 -04:00
  • 4dd654dc82 gguf-py : bump version to 0.8.0 (#6060) Ondřej Čertík 2024-03-14 11:57:31 -06:00
  • a88d7966b5 llama : support models without vocabulary (#5798) Michael Podvitskiy 2024-03-14 17:21:56 +01:00