Commit Graph

  • f3f28c5395 cmake : fix GGML_USE_SYCL typo (#5555) Georgi Gerganov 2024-02-18 19:17:00 +02:00
  • e75c6279d1 server : enhanced health endpoint (#5548) Pierrick Hymbert 2024-02-18 17:31:28 +01:00
  • 36376abe05 server : --n-predict option document and cap to max value (#5549) Pierrick Hymbert 2024-02-18 17:30:09 +01:00
  • 66c1968f7a server : graceful server shutdown (#5244) Daniel Hiltgen 2024-02-18 08:23:16 -08:00
  • 1dcc3fde00 common : fix ub (#5530) Georgi Gerganov 2024-02-18 18:21:52 +02:00
  • 5d3de51f97 ggml, common, examples, tests : fixed type arguments in printf (#5528) Herman Semenov 2024-02-18 16:20:12 +00:00
  • fc0c8d286a llava : update surgery script to not remove tensors (#5536) Daniel Bevenius 2024-02-18 17:19:23 +01:00
  • bd2d4e393b 1.5 bit quantization (#5453) Kawrakow 2024-02-18 18:16:55 +02:00
  • c8e0d7efeb flake.lock: Update github-actions[bot] 2024-02-18 00:17:07 +00:00
  • 8f1be0d42f ggml : add ALiBi support for ggml_soft_max_ext (#5488) Georgi Gerganov 2024-02-17 23:04:16 +02:00
  • 6e4e973b26 ci : add an option to fail on compile warning (#3952) Ananta Bastola 2024-02-17 16:03:14 -05:00
  • d250c9d61d gitignore : update for CLion IDE (#5544) clibdev 2024-02-17 18:28:37 +02:00
  • 5bf2b94dd4 cmake : fix VULKAN and ROCm builds (#5525) Georgi Gerganov 2024-02-16 19:05:56 +02:00
  • d2819d5577 scripts : add helpers script for bench comparing commits (#5521) Georgi Gerganov 2024-02-16 15:14:40 +02:00
  • 4cb0727698 llava : removed excess free(NULL) operation (#5531) Herman Semenov 2024-02-16 12:43:23 +00:00
  • 65085c713e llama : minor fixed return int value (#5529) Herman Semenov 2024-02-16 11:45:48 +00:00
  • 6dcc02d244 server : add "samplers" param to control the samplers order (#5494) Alexey Parfenov 2024-02-16 11:33:25 +00:00
  • 5f5808ca7b server : fix system prompt cli (#5516) Rőczey Barnabás 2024-02-16 11:00:56 +01:00
  • f486f6e1e5 ggml : add numa options (#5377) bmwl 2024-02-16 01:31:07 -08:00
  • 60ed04cf82 llava : fix clip-model-is-vision flag in README.md (#5509) Daniel Bevenius 2024-02-16 10:24:39 +01:00
  • 594845aab1 ci : fix BERT model download and convert Georgi Gerganov 2024-02-16 09:57:55 +02:00
  • 4524290e87 Use correct type of pooling for embedding models (#5500) Douglas Hanley 2024-02-15 11:21:49 -06:00
  • c06e45d729 clip : fix wrong loop condition Georgi Gerganov 2024-02-15 18:49:08 +02:00
  • 9060a1e9df cuda : print message when initialization fails (#5512) slaren 2024-02-15 16:49:01 +01:00
  • 9350a1cf21 scripts : add hf.sh helper script (#5501) Georgi Gerganov 2024-02-15 15:41:15 +02:00
  • 73122473ff fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487) Michaël de Vries 2024-02-15 14:14:37 +01:00
  • 0d4177126b llava : fix memory management bug (#5491) Elbios 2024-02-15 09:01:57 +01:00
  • 7930a8a6e8 llaba : hotfix for llava-1.6 image number (#5495) John 2024-02-15 08:59:18 +01:00
  • 704359e299 vulkan: Find optimal memory type but with fallback (#5381) Neuman Vong 2024-02-15 17:11:15 +11:00
  • 594fca3fef readme : fix typo (#5490) Rune 2024-02-14 16:15:49 +01:00
  • ccbb277f46 llava : update README.md (#5489) John 2024-02-14 15:49:42 +01:00
  • 8084d55440 cmake : ARM intrinsics detection for MSVC (#5401) Michael Podvitskiy 2024-02-14 11:49:01 +03:00
  • aa23412989 llava : support v1.6 (#5267) John 2024-02-14 08:38:35 +01:00
  • f5ca054855 Early return for zero size calls to get_tensor. (#5482) AT 2024-02-13 15:44:25 -06:00
  • 6c00a06692 gguf : add python reader example (#5216) John 2024-02-13 18:56:38 +01:00
  • ea9c8e1143 llama : add support for Nomic Embed (#5468) Jared Van Bortel 2024-02-13 12:03:53 -05:00
  • c4e6dd59e4 llama : allow raw byte in SPM vocabs; don't crash on nl 404 (#5478) Aarni Koskela 2024-02-13 18:18:16 +02:00
  • 037259be68 llama : make load error reporting more granular (#5477) Aarni Koskela 2024-02-13 15:24:50 +02:00
  • 263978904c finetune : rename feed-forward tensors (w1/w2/w3) (#4839) Daniel Bevenius 2024-02-13 14:15:42 +01:00
  • cf45252a7c tests : multi-thread the tokenizer tests (#5474) Georgi Gerganov 2024-02-13 15:14:22 +02:00
  • 03bf161eb6 llama : support batched embeddings (#5466) Douglas Hanley 2024-02-13 06:06:58 -06:00
  • ad014bba97 make: add error message for bad CUDA version (#5444) Johannes Gäßler 2024-02-13 12:38:37 +01:00
  • 49cc1f7d67 bert : add tests + fix quantization (#5475) Georgi Gerganov 2024-02-13 13:01:29 +02:00
  • 99b8b43d7b tests : disable moe test (#5473) Georgi Gerganov 2024-02-13 11:20:24 +02:00
  • 895407f31b ggml-quants : fix compiler warnings (shadow variable) (#5472) Kawrakow 2024-02-13 09:07:57 +02:00
  • 099afc6274 llama : fix quantization when tensors are missing (#5423) Georgi Gerganov 2024-02-12 20:14:39 +02:00
  • df334a1125 swift : package no longer use ggml dependency (#5465) Georgi Gerganov 2024-02-12 19:54:29 +02:00
  • dbd8828eb0 py : fix persimmon n_rot conversion (#5460) Lee 2024-02-13 01:29:57 +08:00
  • 43fe07c1a4 ggml-sycl: Replace 3d ops with macro (#5458) Abhilash Majumder 2024-02-12 20:22:05 +05:30
  • 4a46d2b792 llava : remove prog parameter from ArgumentParser (#5457) Daniel Bevenius 2024-02-12 09:38:44 +01:00
  • 0ca4e0c14c sync : ggml (#5452) Georgi Gerganov 2024-02-12 09:16:06 +02:00