Commit Graph

  • f68664ac24 convert : fix TypeError on GPT-2 vocab.json (#5288) Sang-Kil Park 2024-02-07 13:28:00 +09:00
  • 65b0045d18 server : remove model.json endpoint (#5371) Alexey Parfenov 2024-02-06 18:08:38 +00:00
  • e3ee2b0879 CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) Johannes Gäßler 2024-02-06 18:43:06 +01:00
  • afb11f7794 Update README.md (#5366) Kawrakow 2024-02-06 19:00:16 +02:00
  • bd1301d6c5 Slight quantization improvement for Q4_K and Q5_K (#5361) Kawrakow 2024-02-06 17:28:02 +02:00
  • 1675e8787c readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) BarfingLemurs 2024-02-06 09:06:48 -05:00
  • d293e063ce CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) Johannes Gäßler 2024-02-06 14:44:06 +01:00
  • 0ecfdea861 server : include total "num_slots" in props endpoint (#5349) Justin Parker 2024-02-06 04:20:59 -05:00
  • 218d94cfe7 server : add dynatemp_range and dynatemp_exponent (#5352) Michael Coppola 2024-02-06 04:20:00 -05:00
  • 4d4986a984 server : various fixes for the prompt field in /completion (#5300) Niall Coates 2024-02-06 08:16:23 +00:00
  • 3ea6b587aa py : handle byte tokens in get_token_type (#5341) Georgi Gerganov 2024-02-06 07:47:22 +02:00
  • d9e98a5b79 make: Use ccache for faster compilation (#5318) Johannes Gäßler 2024-02-05 19:33:00 +01:00
  • 468c6071d7 README: updated introduction (#5343) Johannes Gäßler 2024-02-05 15:55:10 +01:00
  • 82838bcba3 ggml : make use of ggml-quants.h possible in C++ code (#5338) Kawrakow 2024-02-05 14:09:47 +02:00
  • 53bc9b90e7 ggml : avoid duplicating function calls using MIN/MAX macros (#5325) Dr. Tom Murphy VII Ph.D 2024-02-05 06:13:57 -05:00
  • eb5e5bc481 iq3_xxs: quards for the no-imatrix situation (#5334) Kawrakow 2024-02-05 12:32:27 +02:00
  • 4df81bbf3c py : fix internlm2-hf convert to gguf (#5305) Guoteng 2024-02-05 17:04:06 +08:00
  • c797312e11 iq2_xxs: tune quantization (#5320) Kawrakow 2024-02-05 10:46:06 +02:00
  • d17d1428c7 server : allow to get default generation settings for completion (#5307) Alexey Parfenov 2024-02-05 08:10:22 +00:00
  • 86fc73a5d8 common : add dynamic temperature parameters to main example cli (#5295) l3utterfly 2024-02-05 17:00:47 +09:00
  • 120cae84ec scripts : fix typos, cleanup (#5303) Georgi Gerganov 2024-02-05 09:48:03 +02:00
  • b8eac7a045 scripts : add non-interactive server-llm.sh (#5303) Нияз Гарифзянов 2024-02-05 10:43:57 +03:00
  • 268ef8b555 readme : add CodeShell models to the supported models list (#5330) chiranko 2024-02-05 15:41:38 +08:00
  • 52f7d1ac5d [SYCL] Fix cpy with dims of 3 (#5289) AidanBeltonS 2024-02-05 07:08:24 +00:00
  • ded62473c5 flake.lock: Update github-actions[bot] 2024-02-04 00:17:24 +00:00
  • cf4454fb96 Adding some imatrix tools (#5302) Kawrakow 2024-02-04 10:39:58 +02:00
  • 31ac5ee8c8 cmake : use set() for LLAMA_WIN_VER (#5298) Welby Seely 2024-02-03 23:18:51 -05:00
  • 3f5c2d9c0d make: add nvcc info print (#5310) Johannes Gäßler 2024-02-03 20:15:13 +01:00
  • dacfbb8ada make: fix nvcc optimization flags for host code (#5309) Johannes Gäßler 2024-02-03 20:14:59 +01:00
  • 0d266df204 add Vulkan support to Nix flake Martin Schwaighofer 2024-01-28 12:59:43 +01:00
  • 5ffdf03ba8 Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301) 0cc4m 2024-02-03 18:15:00 +01:00
  • 1e8c6c465e refactor : switch to emplace_back to avoid extra object (#5291) Michael Klimenko 2024-02-03 12:23:37 +01:00
  • 2ffb24965d YaRN : store rope scaling type as int32_t in memory (#5285) Jared Van Bortel 2024-02-03 06:22:06 -05:00
  • fd6bcd4d7a readme : add tenere in the ui tools list (#5284) BADR 2024-02-03 12:20:26 +01:00
  • 1e1ea9bcde Fix im2col with 32fp (#5286) AidanBeltonS 2024-02-03 08:11:37 +00:00
  • 7448bba8bc perplexity : fix KL divergence calculations on Windows (#5273) kalomaze 2024-02-02 08:15:30 -06:00
  • afd1dc6907 scripts : parse wtype in server-llm.sh (#5167) Georgi Gerganov 2024-02-02 14:23:40 +02:00
  • 2bce5a7e2c py : add check for '.attn.masked_bias' layers to GPT2model (#5281) Mirror Azure 2024-02-02 14:39:09 +03:00
  • 016fbe2996 Tidy ggml-sycl (#5261) AidanBeltonS 2024-02-02 08:39:48 +00:00
  • c64337ec66 docker : add build for SYCL, Vulkan + update readme (#5228) Xuan Son Nguyen 2024-02-02 08:56:31 +01:00
  • 922620b711 [SYCL] get MAX_MEM_ALLOC from device property (#5270) Meng, Hengyu 2024-02-02 15:54:14 +08:00
  • d78397520f [SYCL] update guide of SYCL backend (#5254) Neo Zhang Jianyu 2024-02-02 15:53:27 +08:00
  • dc1c37c0cf llama : fix memory leak in llama_batch_free (#5252) Ian Bull 2024-02-01 23:20:13 -08:00
  • 4e41469952 add --no-mmap in llama-bench (#5257) Neo Zhang Jianyu 2024-02-02 03:48:53 +08:00
  • 9027003dab Vulkan Phi Fix for AMD Proprietary Drivers (#5260) 0cc4m 2024-02-01 19:25:24 +01:00
  • ab0ea1427a cuda : fix LLAMA_CUDA_F16 (#5262) slaren 2024-02-01 18:30:17 +01:00
  • a8e064f6e5 make : generate .a library for static linking (#5205) Ali Nehzat 2024-02-02 02:18:53 +11:00
  • 8e1476c579 llama : support InternLM2 (#5184) Guoteng 2024-02-01 17:19:51 +08:00
  • 01d251cd48 Fix broken Vulkan Cmake (properly) (#5230) Eve 2024-01-31 19:21:55 +00:00
  • 2c2414f814 llama : reorder build_orion() at correct place (#5118) Georgi Gerganov 2024-01-31 18:47:10 +02:00
  • fa0642fc01 llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) Georgi Gerganov 2024-01-31 17:30:17 +02:00