Commit Graph

  • 66ad009c5e llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -04:00
  • 1481a9cf25 llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -04:00
  • 2502e9f309 ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +03:00
  • 11d902364b ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +03:00
  • bf545d13af ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +02:00
  • 7296c961d9 ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +02:00
  • afe6563a58 Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +05:00
  • 78ec543733 Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +05:00
  • 25079cd3cf Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +02:00
  • 92a6e13a31 Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +02:00
  • fbb558100f add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +08:00
  • 04aaae1d79 add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +08:00
  • e05d16c8c7 ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +00:00
  • 0b2da20538 ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +00:00
  • f9318ab76d readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +03:00
  • f9be42add0 readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +03:00
  • 8e0ec92ace ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +03:00
  • 574406dc7e ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +03:00
  • a852f545f3 Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +00:00
  • 87a6f846d3 Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +00:00
  • 8ad378c494 Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +02:00
  • ea3ad7eb60 Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +02:00
  • 9e63abecf7 quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +02:00
  • 859fee6dfb quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +02:00
  • 994f7e33aa Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +00:00
  • 4afcc37869 Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +00:00
  • 6633395f7e py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +02:00
  • 667c501334 py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +02:00
  • 3c1c300ec5 nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +02:00
  • bb98e77be7 nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +02:00
  • 17cded0679 ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +03:00
  • 7a32fcb3b2 ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +03:00
  • 5ee3976631 ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +02:00
  • dd0eabc049 ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +02:00
  • 61733608fc ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +02:00
  • 54bb60e268 ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +02:00
  • d7fad1ff4a ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +03:00
  • 8a0f8673ba ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +03:00
  • 0cad6d7c3a examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +02:00
  • 0c5692345d examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +02:00
  • be38e1d79d llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +03:00
  • 957c8ae21d llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +03:00
  • 094250bc21 examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +02:00
  • 9b0a4d4214 examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +02:00
  • c62ac69921 Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +00:00
  • 2ec83428de Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +00:00
  • 19861e9394 Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +02:00
  • e4cf982e0d Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +02:00
  • dd8f46b8d6 llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +03:00
  • c4fe84fb0d llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +03:00
  • 7de8c169ec Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +02:00
  • 1d78fecdab Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +02:00
  • 3dbec7fb22 scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +03:00
  • 284685f169 scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +03:00
  • 2e00da6d52 Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -07:00
  • edce63baa9 Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -07:00
  • 593b344f2f ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +03:00
  • ec9cdb6752 ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +03:00
  • 03a6e6f189 ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +03:00
  • e4422e299c ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +03:00
  • fe240a9faf Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +00:00
  • 53c8434398 Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +00:00
  • 0ba63e81ff readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +02:00
  • c6524f46eb readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +02:00
  • 6fde036756 A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +08:00
  • c9e2c26f41 A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +08:00
  • 3ca1110186 ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +03:00
  • 0e018fe008 ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +03:00
  • 90da0b75a3 ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +00:00
  • 857308d1e8 ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +00:00
  • 949ca5ce05 Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +00:00
  • c50b628810 Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +00:00
  • 109a6a9414 ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +02:00
  • 5f939498d5 ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +02:00
  • 74cec44258 llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +08:00
  • 36b4f7e064 llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +08:00
  • 0f230cce20 llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -04:00
  • 10f19c1121 llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -04:00
  • f9a61db2de cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +08:00
  • 7e312f165c cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +08:00
  • 887673522d ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +03:00
  • 872c365a91 ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +03:00
  • 59fab3116a ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +03:00
  • 955ef9a5d5 ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +03:00
  • 9cecc39408 ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +00:00
  • c5aa5e5777 ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +00:00
  • 81fc78220f examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -04:00
  • e9a9cb0c54 examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -04:00
  • 8158098a31 llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +02:00
  • b6e7f9b09e llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +02:00
  • 1e909c9209 Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +02:00
  • 50cb666b8a Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +02:00
  • 643c806c1a llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -05:00
  • 25d7abbd1f llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -05:00
  • ab2496878d cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +08:00
  • 018f2279f5 cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +08:00
  • f6c87bbcaa main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -07:00
  • 9411288271 main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -07:00
  • 42c156e8d7 llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +02:00
  • 8687c1f258 llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +02:00