Commit Graph

  • c63bb1d16a CUDA: use mul_mat_q kernels by default (#2683) Johannes Gäßler 2023-08-22 22:47:05 +02:00
  • 3b6cfe7c92 convert.py : clarifying error message (#2718) Alex Petenchea 2023-08-22 21:58:16 +03:00
  • 800c9635b4 Fix CUDA softmax by subtracting max value before exp (#2665) Jiahao Li 2023-08-23 02:27:06 +08:00
  • deb7dfca4b gguf : add ftype meta info to the model (#2710) Georgi Gerganov 2023-08-22 20:05:59 +03:00
  • bac66994cf Quantization imrovements for k_quants (#2707) Kawrakow 2023-08-22 19:14:09 +03:00
  • 519c981f8b embedding : evaluate prompt in batches (#2713) slaren 2023-08-22 16:03:12 +02:00
  • 1123f7fbdf ggml-cuda : use graph allocator (#2684) slaren 2023-08-22 15:25:19 +02:00
  • ef3f333d37 ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) Georgi Gerganov 2023-08-22 14:22:08 +03:00
  • 8e4364f2af llama-bench : minor fixes (#2695) slaren 2023-08-22 09:56:03 +02:00
  • 1e3bc523d8 ggml : support CUDA's half type for aarch64(#1455) (#2670) Kylin 2023-08-22 15:14:23 +08:00
  • 14b1d7e6f7 metal : add missing barriers for mul-mat (#2699) Shouzheng Liu 2023-08-22 02:18:40 -04:00
  • 226255b44e server : fallback to default if client param is null (#2688) Jhen-Jie Hong 2023-08-22 08:32:00 +08:00
  • 930523c8e1 Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698) Kerfuffle 2023-08-21 18:01:34 -06:00
  • c8dba409e6 py : remove obsolete script Georgi Gerganov 2023-08-21 23:40:22 +03:00
  • 6381d4e110 gguf : new file format with flexible meta data (beta) (#2398) Georgi Gerganov 2023-08-21 23:07:43 +03:00
  • dadbed99e6 metal : fix synchronization in new matrix multiplication kernel (#2686) Shouzheng Liu 2023-08-21 06:59:29 -04:00
  • cb1c0727bd HellaSwag: split token evaluation into batches if needed (#2681) Kawrakow 2023-08-21 11:11:31 +03:00
  • 9e232f0234 ggml : move all type info to ggml_type_traits (#2663) slaren 2023-08-20 22:17:53 +02:00
  • 5e9ff54a67 More efficient Hellaswag implementation (#2677) Kawrakow 2023-08-20 16:44:46 +03:00
  • 1f0bccb279 server : better default prompt (#2646) Georgi Gerganov 2023-08-19 00:45:36 +03:00
  • f63564adfa server : update xxd usage for older versions compatibility (#2649) Jhen-Jie Hong 2023-08-19 05:41:32 +08:00
  • 2d8b76a110 Add link to clojure bindings to Readme. (#2659) Adrian 2023-08-18 12:39:22 -07:00
  • 7af633aec3 readme : incoming BREAKING CHANGE Georgi Gerganov 2023-08-18 17:48:31 +03:00
  • 097e121e2f llama : add benchmark example (#2626) slaren 2023-08-18 12:44:58 +02:00
  • eaf98c2649 readme : add link to Rust bindings (#2656) mdrokz 2023-08-18 15:47:58 +05:30
  • e9b12c332e perplexity : more meaningful ETA number - 2 decimal points Georgi Gerganov 2023-08-18 12:48:55 +03:00
  • 604b8bdfa6 Fix unicode in grammars (fixes #2501) (#2553) Evan Jones 2023-08-17 19:54:44 -04:00
  • 10151bee2e server : support for saving templates in browser LocalStorage (#2486) staviq 2023-08-17 23:34:01 +00:00
  • 0992a7b8b1 README: fix LLAMA_CUDA_MMV_Y documentation (#2647) Johannes Gäßler 2023-08-17 23:57:59 +02:00
  • 6ddeefad9b [Zig] Fixing Zig build and improvements (#2554) Henri Vasserman 2023-08-17 23:11:18 +03:00
  • 8dae7ce684 Add --cfg-negative-prompt-file option for examples (#2591) Kerfuffle 2023-08-17 07:29:44 -06:00
  • a73ccf1aa3 llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) Georgi Gerganov 2023-08-17 10:47:09 +03:00
  • 7cf54e1f74 tests : adds simple llama grammar tests (#2618) drbh 2023-08-17 03:41:01 -04:00
  • a872a2b28e ggml-alloc : fix discrepency between measure&eval (#2639) Shouzheng Liu 2023-08-17 03:35:53 -04:00
  • 0919a0f73d cmake : install ggml-meta.metal if LLAMA_METAL (#2449) Kolen Cheung 2023-08-16 21:09:49 +01:00
  • ed53db86c3 metal : print error of load pipeline state (#2564) Jhen-Jie Hong 2023-08-17 04:09:03 +08:00
  • fc8ef549e5 metal : enable ggml-alloc (#2627) Shouzheng Liu 2023-08-16 16:08:28 -04:00
  • bf83bff674 metal : matrix-matrix multiplication kernel (#2615) Shouzheng Liu 2023-08-16 16:07:04 -04:00
  • b5ffb2849d scripts : add helper script to get wikitext Georgi Gerganov 2023-08-15 10:04:58 +03:00
  • 3ebb00935f server : add missing /json-schema-to-grammar.mjs (#2616) Jhen-Jie Hong 2023-08-15 06:14:14 +08:00
  • d783f7982e metal : return null instead of exit(1) (#2573) Jhen-Jie Hong 2023-08-14 21:37:39 +08:00
  • d75561df20 server : add --numa support (#2524) Cheng Shao 2023-08-14 15:36:42 +02:00
  • 348acf188c llama : add missing enum keyword in function signatures (#2610) Kamil Tomšík 2023-08-14 15:35:16 +02:00
  • 1cd06fa25e CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596) Johannes Gäßler 2023-08-14 10:41:22 +02:00
  • 2feb8934eb server : fix default grammar by use empty string in the UI (#2604) Jhen-Jie Hong 2023-08-14 16:20:17 +08:00
  • 5517d6e692 server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588) Jhen-Jie Hong 2023-08-14 15:16:54 +08:00
  • f31b539714 Enhance Windows 7 and below compatibility. (#2592) vxiiduu 2023-08-14 13:59:16 +10:00
  • ee77efea2a test : add simple grammar parsing tests (#2594) drbh 2023-08-13 10:00:48 -04:00
  • f64d44a9b9 CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590) Johannes Gäßler 2023-08-13 00:24:45 +02:00
  • b19edd54d5 Adding support for llama2.c models (#2559) byte-6174 2023-08-11 19:17:25 -04:00