Commit Graph

  • ba231e8a6d issues : change label from bug to bug-unconfirmed (#3748) Georgi Gerganov 2023-10-28 15:25:33 +03:00
  • 8a2f2fea29 convert : ignore tokens if their IDs are within [0, vocab_size) (#3831) Georgi Gerganov 2023-10-28 15:25:15 +03:00
  • bd6d9e2059 llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) Kerfuffle 2023-10-28 05:54:24 -06:00
  • ee1a0ec9cb llama : add option for greedy sampling with probs (#3813) Georgi Gerganov 2023-10-28 14:23:11 +03:00
  • 177461104b common : print that one line of the syntax help *also* to standard output (#3823) Henk Poley 2023-10-28 12:16:33 +02:00
  • fdee152e4e starcoder : add GPU offloading (#3827) Georgi Gerganov 2023-10-28 12:06:08 +03:00
  • 41aee4df82 speculative : ensure draft and target model vocab matches (#3812) Kerfuffle 2023-10-27 15:40:07 -06:00
  • 6d459cbfbe llama : correctly report GGUFv3 format (#3818) cebtenzzre 2023-10-27 17:33:53 -04:00
  • c8d6a1f34a simple : fix batch handling (#3803) Thibault Terrasson 2023-10-27 16:37:41 +02:00
  • 2f9ec7e271 cuda : improve text-generation and batched decoding performance (#3776) Georgi Gerganov 2023-10-27 17:01:23 +03:00
  • 34b2a5e1ee server : do not release slot on image input (#3798) Georgi Gerganov 2023-10-26 22:53:37 +03:00
  • 6961c4bd0b batched-bench : print params at start Georgi Gerganov 2023-10-25 10:26:27 +03:00
  • cc44877486 log : disable pid in log filenames Georgi Gerganov 2023-10-25 10:09:16 +03:00
  • ad93962657 server : add parameter -tb N, --threads-batch N (#3584) (#3768) cebtenzzre 2023-10-24 16:10:43 -04:00
  • 1717521cdb server : do not block system prompt update (#3767) Georgi Gerganov 2023-10-24 23:08:20 +03:00
  • b2f7e04bd3 sync : ggml (conv ops + cuda MSVC fixes) (#3765) Georgi Gerganov 2023-10-24 21:51:20 +03:00
  • abd21fc99f cmake : add missed dependencies (#3763) John Smith 2023-10-25 01:48:45 +08:00
  • 2b4ea35e56 cuda : add batched cuBLAS GEMM for faster attention (#3749) Georgi Gerganov 2023-10-24 16:48:37 +03:00
  • daab3d7f45 Add more tokenizer tests (#3742) Galunid 2023-10-24 09:17:17 +02:00
  • 469c9addef metal : handle ggml_scale for n%4 != 0 (close #3754) Georgi Gerganov 2023-10-24 09:46:50 +03:00
  • e3932593d4 Revert "make : add optional CUDA_NATIVE_ARCH (#2482)" Georgi Gerganov 2023-10-23 23:46:05 +03:00
  • 9d02956443 issues : separate bug and enhancement template + no default title (#3748) M. Yusuf Sarıgöz 2023-10-23 22:57:16 +03:00
  • 69a6735087 Update special token handling in conversion scripts for gpt2 derived tokenizers (#3746) Galunid 2023-10-23 21:46:00 +02:00
  • 5be6c803fa llama : remove token functions with context args in favor of model (#3720) Marcus Dunn 2023-10-23 12:40:03 -07:00
  • 6336701c93 Fix baichuan convert script not detecing model (#3739) Galunid 2023-10-23 17:47:03 +02:00
  • 96981f37b1 make : add optional CUDA_NATIVE_ARCH (#2482) Alex 2023-10-22 15:56:53 -04:00
  • 438c2ca830 server : parallel decoding and multimodal (#3677) Georgi Gerganov 2023-10-22 22:53:08 +03:00
  • 9e70cc0322 Add test for MPT tokenization (#3728) goerch 2023-10-22 21:21:42 +02:00
  • 5a42a5f8e8 readme : remove unsupported node.js library (#3703) Ian Scrivener 2023-10-23 05:16:43 +11:00
  • a5e7dbd614 llama : validate special token ids are in range when loading GGUF model (#3635) Kerfuffle 2023-10-22 12:14:56 -06:00
  • d3956aea53 main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623) vvhg1 2023-10-22 20:09:51 +02:00
  • 22c69a2794 batched : add len CLI argument Georgi Gerganov 2023-10-22 08:37:20 +03:00
  • 465219b914 CLBlast: Add outer loops over src0 for broadcasting in mulmat shibe2 2023-10-12 16:01:23 +04:00
  • d1031cf49c sampling : refactor init to use llama_sampling_params (#3696) Georgi Gerganov 2023-10-20 21:07:23 +03:00
  • 8cf19d60dc gguf : support big endian platform (#3552) Qin Yue Chen 2023-10-20 06:19:40 -05:00
  • a0edf73bda server : fix uninitialized sampling context (close #3685) Georgi Gerganov 2023-10-20 13:06:10 +03:00
  • f439e506e8 ggml : fix rope + llama minor optimizations (#3560) Herman Semenov 2023-10-20 10:02:12 +00:00
  • e78f3ef24a convert : restore compat with old Falcon models (#3680) cebtenzzre 2023-10-20 01:32:08 -04:00
  • f3b25e4043 multimodal : add BakLLaVA conversion support (#3682) M. Yusuf Sarıgöz 2023-10-19 19:40:41 +03:00
  • 60abea9798 llava : avoid segfault in case of non-existent mmproj file (#3674) M. Yusuf Sarıgöz 2023-10-19 16:59:11 +03:00
  • 004797f6ac readme : update hot topics Georgi Gerganov 2023-10-18 21:44:43 +03:00
  • 4e82b2ea3f speculative : bug fixes Georgi Gerganov 2023-10-18 18:49:40 +03:00
  • 0e89203b51 speculative : add tree-based sampling example (#3624) Georgi Gerganov 2023-10-18 16:21:57 +03:00
  • c67fe68e41 metal : implement q5_0 and q5_1 kernels (#3648) Jhen-Jie Hong 2023-10-18 07:21:48 -05:00
  • 1117d06607 opencl : fix element-wise multiplication (#3656) shibe2 2023-10-18 16:09:22 +04:00
  • cb33f43a2a fix embeddings when using CUDA (#3657) slaren 2023-10-17 22:24:50 +02:00
  • e1675d133c llama : avoid fprintf in favor of LLAMA_LOG (#3538) Georgi Gerganov 2023-10-17 22:34:26 +03:00
  • 8402566a7c readme : update hot-topics & models, detail windows release in usage (#3615) BarfingLemurs 2023-10-17 14:13:21 -04:00
  • 40e5ce054f CLBlast: Fix temporary buffer size for f16 conversion (wsize) shibe2 2023-10-11 21:30:06 +04:00
  • a5e8c1d8c7 train-text-from-scratch : fix assert failure in ggml-alloc (#3618) slaren 2023-10-17 19:00:58 +02:00