Commit Graph

  • 38b16dfca6 metal : bug-fix when enable ggml-alloc (#2757) Shouzheng Liu 2023-08-24 12:27:25 -04:00
  • 8f8c28e89c convert : auto-determine model name based on dir + scripts update Georgi Gerganov 2023-08-24 19:26:19 +03:00
  • 7694adda8d Fix for main example getting stuck when -n -2 and --interactive (#2767) Kerfuffle 2023-08-24 10:11:13 -06:00
  • fea95c682d fix convert.py for codellama, add llama 34B to the list of recognized models (#2768) slaren 2023-08-24 17:44:11 +02:00
  • ef955fbd23 Tag release with build number (#2732) DannyDaemonic 2023-08-24 06:58:02 -07:00
  • d67777c202 metal : add Q8_0 support (#2763) Georgi Gerganov 2023-08-24 16:19:57 +03:00
  • c3e53b421a llama : escape all U+2581 in a string (#2750) Georgi Gerganov 2023-08-24 12:26:01 +03:00
  • 6e91a1b070 llama : fix grammar sometimes generating null char (#2756) Evan Jones 2023-08-24 00:07:13 -04:00
  • 44d5462b5c readme : fix link Georgi Gerganov 2023-08-23 23:44:19 +03:00
  • c7868b0753 minor : fix trailing whitespace Georgi Gerganov 2023-08-23 23:43:00 +03:00
  • 79da24b58c readme : update hot topics Georgi Gerganov 2023-08-23 23:41:16 +03:00
  • cf658adc83 llm : add Falcon support (#2717) Georgi Gerganov 2023-08-23 23:08:04 +03:00
  • a192860cfe minor : fix trailing whitespace Georgi Gerganov 2023-08-23 22:37:39 +03:00
  • 95385241a9 examples : restore the functionality to import llama2.c models (#2685) Olivier Chafik 2023-08-23 20:33:05 +01:00
  • 335acd2ffd fix convert-lora-to-ggml.py (#2738) slaren 2023-08-23 16:46:54 +02:00
  • 5290c38e6e main : insert bos if no tokens (#2727) klosax 2023-08-23 16:46:03 +02:00
  • cc34dbda96 gitignore : fix for windows (#2729) akawrykow 2023-08-23 07:31:34 -07:00
  • 7c2227a197 chmod : make scripts executable (#2675) Cebtenzzre 2023-08-23 10:29:09 -04:00
  • f19dca04ea devops : RPM Specs (#2723) JohnnyB 2023-08-23 15:28:22 +01:00
  • 8207214b6a Fix values shown in the quantize tool help (#2735) Kawrakow 2023-08-23 12:57:12 +03:00
  • 62959e740e Strided perplexity (#2714) Kawrakow 2023-08-23 12:56:42 +03:00
  • 7f7ddd5002 Fix ggml to gguf conversion on Windows (#2733) IgnacioFDM 2023-08-23 06:31:09 -03:00
  • b8ad1b66b2 server : allow json array in prompt or content for direct token input (#2306) Xiao-Yong Jin 2023-08-23 02:12:12 -05:00
  • f5fe98d11b docs : add grammar docs (#2701) Evan Jones 2023-08-22 21:01:57 -04:00
  • 777f42ba18 Improve handling of special tokens in GGML to GGUF converter (#2725) Kerfuffle 2023-08-22 17:39:39 -06:00
  • 46ef5b5fcf llama : fix whitespace escaping in tokenizer (#2724) goerch 2023-08-22 23:10:42 +02:00
  • c63bb1d16a CUDA: use mul_mat_q kernels by default (#2683) Johannes Gäßler 2023-08-22 22:47:05 +02:00
  • 3b6cfe7c92 convert.py : clarifying error message (#2718) Alex Petenchea 2023-08-22 21:58:16 +03:00
  • 800c9635b4 Fix CUDA softmax by subtracting max value before exp (#2665) Jiahao Li 2023-08-23 02:27:06 +08:00
  • deb7dfca4b gguf : add ftype meta info to the model (#2710) Georgi Gerganov 2023-08-22 20:05:59 +03:00
  • bac66994cf Quantization imrovements for k_quants (#2707) Kawrakow 2023-08-22 19:14:09 +03:00
  • 519c981f8b embedding : evaluate prompt in batches (#2713) slaren 2023-08-22 16:03:12 +02:00
  • 1123f7fbdf ggml-cuda : use graph allocator (#2684) slaren 2023-08-22 15:25:19 +02:00
  • ef3f333d37 ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) Georgi Gerganov 2023-08-22 14:22:08 +03:00
  • 8e4364f2af llama-bench : minor fixes (#2695) slaren 2023-08-22 09:56:03 +02:00
  • 1e3bc523d8 ggml : support CUDA's half type for aarch64(#1455) (#2670) Kylin 2023-08-22 15:14:23 +08:00
  • 14b1d7e6f7 metal : add missing barriers for mul-mat (#2699) Shouzheng Liu 2023-08-22 02:18:40 -04:00
  • 226255b44e server : fallback to default if client param is null (#2688) Jhen-Jie Hong 2023-08-22 08:32:00 +08:00
  • 930523c8e1 Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698) Kerfuffle 2023-08-21 18:01:34 -06:00
  • c8dba409e6 py : remove obsolete script Georgi Gerganov 2023-08-21 23:40:22 +03:00
  • 6381d4e110 gguf : new file format with flexible meta data (beta) (#2398) Georgi Gerganov 2023-08-21 23:07:43 +03:00
  • dadbed99e6 metal : fix synchronization in new matrix multiplication kernel (#2686) Shouzheng Liu 2023-08-21 06:59:29 -04:00
  • cb1c0727bd HellaSwag: split token evaluation into batches if needed (#2681) Kawrakow 2023-08-21 11:11:31 +03:00
  • 9e232f0234 ggml : move all type info to ggml_type_traits (#2663) slaren 2023-08-20 22:17:53 +02:00
  • 5e9ff54a67 More efficient Hellaswag implementation (#2677) Kawrakow 2023-08-20 16:44:46 +03:00
  • 1f0bccb279 server : better default prompt (#2646) Georgi Gerganov 2023-08-19 00:45:36 +03:00
  • f63564adfa server : update xxd usage for older versions compatibility (#2649) Jhen-Jie Hong 2023-08-19 05:41:32 +08:00
  • 2d8b76a110 Add link to clojure bindings to Readme. (#2659) Adrian 2023-08-18 12:39:22 -07:00
  • 7af633aec3 readme : incoming BREAKING CHANGE Georgi Gerganov 2023-08-18 17:48:31 +03:00
  • 097e121e2f llama : add benchmark example (#2626) slaren 2023-08-18 12:44:58 +02:00