Commit Graph

  • f1dc8c669a SimpleChat: a simple and dumb web front end for testing /chat/completions and /completions end points and try chat (#7350) HanishKVC 2024-05-22 23:23:21 +05:30
  • c37774e1ef build : remove zig (#7471) Georgi Gerganov 2024-05-22 20:05:38 +03:00
  • 43b6515153 common : normalize naming style (#7462) Georgi Gerganov 2024-05-22 20:04:20 +03:00
  • b491ac4240 CUDA: fix FA out-of-bounds writes (#7465) Johannes Gäßler 2024-05-22 17:58:25 +02:00
  • fde5560e23 phi3 : duplicate rope factors in each layer (#7447) slaren 2024-05-22 16:10:46 +02:00
  • 9e51f2e934 vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) k.h.lai 2024-05-22 20:53:21 +08:00
  • 6aed746d28 llama : add missing model type names (#7445) Justine Tunney 2024-05-22 07:08:18 -04:00
  • 5b113062d5 cuda : fix compile warning (#7454) Georgi Gerganov 2024-05-22 12:36:37 +03:00
  • 71bf04b8bd CUDA: remove incorrect precision check (#7454) Johannes Gäßler 2024-05-22 10:24:29 +02:00
  • 300da6320d cuda : fix rope + add tests (#7452) Georgi Gerganov 2024-05-22 11:01:35 +03:00
  • c1a6ad7577 llama : add phi3 128K model support (#7225) liuwei-git 2024-05-22 04:28:32 +08:00
  • 58ca88c1f3 metal : handle F16 inf values, fix FA partial offload (#7434) Georgi Gerganov 2024-05-21 23:03:42 +03:00
  • 287fa980b8 grammars: fix resampling logic regression (#7424) Olivier Chafik 2024-05-21 20:40:00 +01:00
  • 1f2bce9bc2 CUDA: fix unused warning in mmq.cu (#7442) Johannes Gäßler 2024-05-21 19:27:12 +02:00
  • 61ab7a8eb1 tests : test-tokenizer-0.sh print more info (#7402) Georgi Gerganov 2024-05-21 19:53:48 +03:00
  • e205f11bbc examples: cache hf model when --model not provided (#7353) Amir 2024-05-21 17:13:12 +03:00
  • 260949cad5 CUDA: deduplicate mmq code (#7397) Johannes Gäßler 2024-05-21 16:02:12 +02:00
  • 0dbe001317 Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) jaime-m-p 2024-05-21 14:39:48 +02:00
  • 49a32c0167 Tokenizer SPM fixes for phi-3 and llama-spm (#7375) jaime-m-p 2024-05-20 20:15:57 +02:00
  • 60faeefff0 llama : remove Persimmon (#7408) Georgi Gerganov 2024-05-20 19:35:28 +03:00
  • a2a24aec6f perplexity: update README FP16 results [no ci] (#7413) Johannes Gäßler 2024-05-20 18:15:38 +02:00
  • 9b6d4a568c rpc : track allocated buffers (#7411) Radoslav Gerganov 2024-05-20 16:36:55 +03:00
  • 31b2d6e05b server : fix temperature + disable some tests (#7409) Georgi Gerganov 2024-05-20 15:10:03 +03:00
  • 6de307daa8 [SYCL] Update SYCL upscale operation (#7321) AidanBeltonS 2024-05-20 12:08:23 +01:00
  • 65fca291a0 Update README.md (#7410) Bingan 2024-05-20 17:55:34 +08:00
  • a00e636fc5 ggml-opencl, llama: using reserve() if count already known (#7272) Herman Semenov 2024-05-20 07:33:21 +00:00
  • 0ad2755e84 ggml : add loongarch lsx and lasx support (#6454) junchao-loongson 2024-05-20 15:19:21 +08:00
  • c930c28bec server : tuning tests (#7388) Georgi Gerganov 2024-05-20 10:16:41 +03:00
  • 9cc3a7c871 server : return error on too large embedding input (#7389) Georgi Gerganov 2024-05-20 08:56:05 +03:00
  • 8a5e27cbd7 tests : fix --keep_split -> --keep-split (#7374) Georgi Gerganov 2024-05-20 08:55:09 +03:00
  • 2f4cf4d13a Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258) Srihari-mcw 2024-05-19 19:18:39 -07:00
  • cb9cf0fb9b llama : remove MPI backend (#7395) slaren 2024-05-20 01:17:03 +02:00
  • f43b1eb190 quantize : fix --keep-split check (#7374) Fred Douglas 2024-05-19 11:37:04 -05:00
  • 924913a1b7 Vulkan Embedding Fix (#7360) 0cc4m 2024-05-19 17:19:53 +02:00
  • 3265340345 ggml : fix another case of quants nans (#7387) slaren 2024-05-19 17:08:46 +02:00
  • 59a38d4847 ggml: implement quantized KV cache for FA (#7372) Johannes Gäßler 2024-05-19 16:46:13 +02:00
  • a742a54fd0 server: add test for token probs (#7347) Johannes Gäßler 2024-05-19 16:26:02 +02:00
  • 9ae757d0b5 server: fix seed being reported back (#7382) Johannes Gäßler 2024-05-19 16:06:33 +02:00
  • 753bb58afa Add StableLM2 pre-tokenizer (#7349) Anas Ahouzi 2024-05-19 14:46:46 +02:00
  • 802b614cd9 cuda : clear error after buffer allocation failure (#7376) slaren 2024-05-19 14:19:37 +02:00
  • a846498a4a labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) Brian 2024-05-19 20:51:03 +10:00
  • beb87b0aed cmake : update android comments (#7341) Georgi Gerganov 2024-05-19 11:01:01 +03:00
  • 64ae46a41c Capture CUDA logging output (#7298) fraxy-v 2024-05-19 01:44:42 +03:00
  • cc5796c0ec ci : re-enable sanitizer runs (#7358) Georgi Gerganov 2024-05-18 18:55:54 +03:00
  • ae3045fe3f android : use "ci-android" branch for CI (#7341) Georgi Gerganov 2024-05-18 13:40:39 +03:00
  • 6fe8769d65 CUDA: deduplicate FlashAttention code (#7352) Johannes Gäßler 2024-05-18 12:36:25 +02:00
  • 97cd158809 server: correct --threads documentation [no ci] (#7362) Johannes Gäßler 2024-05-18 11:10:47 +02:00
  • faf3777e1c cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) Engininja2 2024-05-18 02:05:17 -06:00
  • 1e9bede474 llama : add support for larger Granite Code Models (20B, 34B) (#7324) Steffen Röcker 2024-05-18 10:04:55 +02:00
  • 048941c1ee perplexity : ndot progress and show stats with < 100 tasks (#7348) strawberrymelonpanda 2024-05-18 00:57:08 -07:00