Commit Graph

  • ae1f211ce2 cuda : refactor into multiple files (#6269) slaren 2024-03-25 13:50:23 +01:00
  • 242d1c1e90 Server: clean up OAI params parsing function (#6284) Xuan Son Nguyen 2024-03-25 09:42:17 +01:00
  • ad3a0505e3 Server: clean up OAI params parsing function (#6284) Xuan Son Nguyen 2024-03-25 09:42:17 +01:00
  • 41c513730a [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) Neo Zhang Jianyu 2024-03-25 15:52:41 +08:00
  • 95ad616cdd [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) Neo Zhang Jianyu 2024-03-25 15:52:41 +08:00
  • 60dfbb2b55 examples : add "retrieval" (#6193) Minsoo Cheong 2024-03-25 16:38:22 +09:00
  • 64e7b47c69 examples : add "retrieval" (#6193) Minsoo Cheong 2024-03-25 16:38:22 +09:00
  • 37d58c3951 ggml : support AVX512VNNI (#6280) Justine Tunney 2024-03-25 01:39:56 -04:00
  • 7733f0c760 ggml : support AVX512VNNI (#6280) Justine Tunney 2024-03-25 01:39:56 -04:00
  • b7ffeac89e Fix heap corruption from wmode out-of-bound writes on windows (#6272) Rick G 2024-03-24 14:45:56 -07:00
  • a32b77c4b2 Fix heap corruption from wmode out-of-bound writes on windows (#6272) Rick G 2024-03-24 14:45:56 -07:00
  • 35dcb40f68 imatrix : fix wname for mul_mat_id ops (#6271) Georgi Gerganov 2024-03-24 16:18:45 +02:00
  • a0e584defd imatrix : fix wname for mul_mat_id ops (#6271) Georgi Gerganov 2024-03-24 16:18:45 +02:00
  • df091faf85 Fixed lookup compilation issues on Windows (#6273) Johannes Gäßler 2024-03-24 14:21:17 +01:00
  • 7aed0ffe68 Fixed lookup compilation issues on Windows (#6273) Johannes Gäßler 2024-03-24 14:21:17 +01:00
  • 5e3a953f58 ci : close inactive issue, increase operations per run (#6270) Pierrick Hymbert 2024-03-24 09:57:06 +01:00
  • ea279d5609 ci : close inactive issue, increase operations per run (#6270) Pierrick Hymbert 2024-03-24 09:57:06 +01:00
  • 67aff3f53d sampling : deduplicated code for probability distribution access (#6240) Minsoo Cheong 2024-03-24 17:54:07 +09:00
  • 586e7bc561 sampling : deduplicated code for probability distribution access (#6240) Minsoo Cheong 2024-03-24 17:54:07 +09:00
  • 0c01a79e96 [SYCL] offload op (#6217) Meng, Hengyu 2024-03-24 12:04:25 +08:00
  • ddf6568510 [SYCL] offload op (#6217) Meng, Hengyu 2024-03-24 12:04:25 +08:00
  • 8b641b7777 Support build win release for SYCL (#6241) Neo Zhang Jianyu 2024-03-24 09:44:01 +08:00
  • d03224ac98 Support build win release for SYCL (#6241) Neo Zhang Jianyu 2024-03-24 09:44:01 +08:00
  • 5006f7ee37 use _wfopen instead of fopen on Windows (#6248) Jared Van Bortel 2024-03-23 18:48:02 -04:00
  • 94d1b3b411 use _wfopen instead of fopen on Windows (#6248) Jared Van Bortel 2024-03-23 18:48:02 -04:00
  • d8dfe9020b gitignore : gguf-split Georgi Gerganov 2024-03-23 21:35:23 +02:00
  • 95562175f8 gitignore : gguf-split Georgi Gerganov 2024-03-23 21:35:23 +02:00
  • c9d1476f75 common: llama_load_model_from_url split support (#6192) Pierrick Hymbert 2024-03-23 18:07:00 +01:00
  • f482bb2e49 common: llama_load_model_from_url split support (#6192) Pierrick Hymbert 2024-03-23 18:07:00 +01:00
  • 2edca2eefb server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254) Pierrick Hymbert 2024-03-23 18:00:38 +01:00
  • 1997577d5e server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254) Pierrick Hymbert 2024-03-23 18:00:38 +01:00
  • ba07d55780 llama : add grok-1 support (#6204) Julius Arkenberg 2024-03-23 17:41:53 +01:00
  • 476b0251b2 llama : add grok-1 support (#6204) Julius Arkenberg 2024-03-23 17:41:53 +01:00
  • b6cf5b76eb split: add gguf-split in the make build target (#6262) Pierrick Hymbert 2024-03-23 17:18:13 +01:00
  • 21cad01b6e split: add gguf-split in the make build target (#6262) Pierrick Hymbert 2024-03-23 17:18:13 +01:00
  • 6331d10b51 server: flush stdout after logging in both text and json layout (#6253) Pierrick Hymbert 2024-03-23 13:18:45 +01:00
  • 1b26aebe4d server: flush stdout after logging in both text and json layout (#6253) Pierrick Hymbert 2024-03-23 13:18:45 +01:00
  • 56d74c8210 lookup: complement data from context with general text statistics (#5479) Johannes Gäßler 2024-03-23 01:24:36 +01:00
  • 50ccaf5eac lookup: complement data from context with general text statistics (#5479) Johannes Gäßler 2024-03-23 01:24:36 +01:00
  • 2f07837d9f common : default --hf-file to --model (#6234) Georgi Gerganov 2024-03-22 21:10:39 +02:00
  • 56a00f0a2f common : default --hf-file to --model (#6234) Georgi Gerganov 2024-03-22 21:10:39 +02:00
  • dd4f991835 convert-llama2c-to-ggml : enable conversion of GQA models (#6237) fraxy-v 2024-03-22 20:49:06 +02:00
  • 92397d87a4 convert-llama2c-to-ggml : enable conversion of GQA models (#6237) fraxy-v 2024-03-22 20:49:06 +02:00
  • 26dbb0527b quantize: options for output and token embedding tensors qtype (#6239) Kawrakow 2024-03-22 19:47:14 +01:00
  • 1d0331c12a quantize: options for output and token embedding tensors qtype (#6239) Kawrakow 2024-03-22 19:47:14 +01:00
  • 9127e8276f llama_model_loader: support multiple split/shard GGUFs (#6187) Pierrick Hymbert 2024-03-22 19:00:01 +01:00
  • dba1af6129 llama_model_loader: support multiple split/shard GGUFs (#6187) Pierrick Hymbert 2024-03-22 19:00:01 +01:00
  • f22129ee26 ci: apply concurrency limit for github workflows (#6243) Minsoo Cheong 2024-03-23 02:15:06 +09:00
  • ee804f6223 ci: apply concurrency limit for github workflows (#6243) Minsoo Cheong 2024-03-23 02:15:06 +09:00
  • 4db58e1aba common : add HF arg helpers (#6234) Georgi Gerganov 2024-03-22 15:33:38 +02:00
  • 80bd33bc2c common : add HF arg helpers (#6234) Georgi Gerganov 2024-03-22 15:33:38 +02:00
  • 11222d42ba llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) Nexesenex 2024-03-22 14:32:02 +01:00
  • e80f06d2a1 llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) Nexesenex 2024-03-22 14:32:02 +01:00
  • 876b9ffb5a tests : conditional python & node json schema tests (#6207) Olivier Chafik 2024-03-22 13:09:07 +00:00
  • f77a8ffd3b tests : conditional python & node json schema tests (#6207) Olivier Chafik 2024-03-22 13:09:07 +00:00
  • f41d10fa62 json-schema-to-grammar : fix order of props + non-str const/enum (#6232) Olivier Chafik 2024-03-22 13:07:44 +00:00
  • 72114edf06 json-schema-to-grammar : fix order of props + non-str const/enum (#6232) Olivier Chafik 2024-03-22 13:07:44 +00:00
  • 7e37d39b80 cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) slaren 2024-03-22 14:05:31 +01:00
  • 2f0e81e053 cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) slaren 2024-03-22 14:05:31 +01:00
  • 17123111b0 readme : add RecurseChat to the list of UIs (#6219) Xiaoyi Chen 2024-03-22 04:29:49 -07:00
  • 29ab270e65 readme : add RecurseChat to the list of UIs (#6219) Xiaoyi Chen 2024-03-22 04:29:49 -07:00
  • 44a079e4ca server : fix n_keep always showing as 0 in response (#6211) Jan Boon 2024-03-22 19:12:05 +08:00
  • 6b8bb3a31d server : fix n_keep always showing as 0 in response (#6211) Jan Boon 2024-03-22 19:12:05 +08:00
  • b6da6123ac server : enable continuous batching by default (#6231) Georgi Gerganov 2024-03-22 13:08:28 +02:00
  • 68e210b354 server : enable continuous batching by default (#6231) Georgi Gerganov 2024-03-22 13:08:28 +02:00
  • cfd7ac5c62 metal : proper assert for mat-mat memory alignment (#6225) Georgi Gerganov 2024-03-22 11:35:53 +02:00
  • b3e94f26ba metal : proper assert for mat-mat memory alignment (#6225) Georgi Gerganov 2024-03-22 11:35:53 +02:00
  • f130fbfb15 ci : add CURL flag for the mac builds (#6214) Vaibhav Srivastav 2024-03-22 08:53:43 +01:00
  • b2075fd6a5 ci : add CURL flag for the mac builds (#6214) Vaibhav Srivastav 2024-03-22 08:53:43 +01:00
  • fb99743daa metal : pad n_ctx by 32 (#6177) Georgi Gerganov 2024-03-22 09:36:03 +02:00
  • 95d576b48e metal : pad n_ctx by 32 (#6177) Georgi Gerganov 2024-03-22 09:36:03 +02:00
  • 1c52612039 add blog link (#6222) Neo Zhang Jianyu 2024-03-22 15:19:37 +08:00
  • 59c17f02de add blog link (#6222) Neo Zhang Jianyu 2024-03-22 15:19:37 +08:00
  • 1208ae0ed5 Fix params underscore convert to dash. (#6203) DAN™ 2024-03-21 21:32:42 -04:00
  • fa046eafbc Fix params underscore convert to dash. (#6203) DAN™ 2024-03-21 21:32:42 -04:00
  • 35a2494afd server : update readme doc from slot_id to id_slot (#6213) Jan Boon 2024-03-22 06:41:24 +08:00
  • be07a03217 server : update readme doc from slot_id to id_slot (#6213) Jan Boon 2024-03-22 06:41:24 +08:00
  • ce32312a92 cuda : disable host register by default (#6206) slaren 2024-03-21 19:54:28 +01:00
  • d0a71233fb cuda : disable host register by default (#6206) slaren 2024-03-21 19:54:28 +01:00
  • 1d1132ff2b Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • f372c49ccd Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • cc77fb2e24 tests : disable system() calls (#6198) Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 924ce1dce7 tests : disable system() calls (#6198) Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 99382100cf cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • 03a8f8fafe cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • 07792051a1 ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • cfd3be76e3 ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • c5b204162d json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • 5b7b0ac8df json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • f1d2df47e9 ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • 1943c01981 ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • b6bb9be322 build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • 5e43ba8742 build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • 4c0d39d55e Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) Kawrakow 2024-03-21 08:27:57 +01:00
  • 76aa30a263 Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) Kawrakow 2024-03-21 08:27:57 +01:00
  • 259776b1fe Add nvidia and amd backends (#6157) AidanBeltonS 2024-03-21 06:10:52 +00:00
  • c5b8595e3f Add nvidia and amd backends (#6157) AidanBeltonS 2024-03-21 06:10:52 +00:00
  • c8e04d9d6c cuda : fix conflict with std::swap (#6186) slaren 2024-03-21 01:47:46 +01:00
  • 42e21c6882 cuda : fix conflict with std::swap (#6186) slaren 2024-03-21 01:47:46 +01:00
  • 4e6c5139e2 cuda : print the returned error when CUDA initialization fails (#6185) slaren 2024-03-20 21:03:26 +01:00