Commit Graph

  • 1bfc153e2f ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) Kawrakow 2023-04-21 17:18:26 +02:00
  • 3d59769c3b Show perplexity ETA in hours and minutes (#1096) slaren 2023-04-21 14:57:57 +02:00
  • d40fded93e llama : fix comment for "output.weight" tensor Georgi Gerganov 2023-04-21 10:23:36 +03:00
  • 2510c1831f Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +00:00
  • 12b5900dbc ggml : sync ggml (add GPT-NeoX RoPE implementation) Georgi Gerganov 2023-04-20 23:32:59 +03:00
  • 9ff334f3c9 ggml : fix bug in ggml_compute_forward_dup_f32() Georgi Gerganov 2023-04-20 21:58:05 +03:00
  • 2005469ea1 Add Q4_3 support to cuBLAS (#1086) slaren 2023-04-20 20:49:53 +02:00
  • 8a1756abdf ggml : do not break cuBLAS build (Q4_3 is not yet implemented) Georgi Gerganov 2023-04-20 21:43:50 +03:00
  • 66aab46079 ggml : fix Q4_3 quantization Georgi Gerganov 2023-04-20 20:44:05 +03:00
  • 38de86a711 llama : multi-threaded quantization (#1075) Kawrakow 2023-04-20 19:42:27 +02:00
  • e0305ead3a ggml : add Q4_3 quantization (#1082) Georgi Gerganov 2023-04-20 20:35:53 +03:00
  • 6a9661ea5a ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) Ivan Komarov 2023-04-20 17:15:18 +02:00
  • 5addcb120c fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 源文雨 2023-04-20 21:28:43 +08:00
  • c8c2c52482 AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) Stephan Walter 2023-04-20 06:45:41 +00:00
  • 02d6988121 Improve cuBLAS performance by dequantizing on the GPU (#1065) slaren 2023-04-20 03:14:14 +02:00
  • 834695fe3a Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -05:00
  • f7d05095b4 Q4_2 quantization with rmse-optimized scale and quants (#1062) Kawrakow 2023-04-19 20:20:14 +02:00
  • 884e7d7a2b ggml : use 8-bit precision for Q4_1 intermediate results (#1047) Georgi Gerganov 2023-04-19 20:10:08 +03:00
  • 7cd5c4a3e9 readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +03:00
  • f3d4edf504 ggml : Q4 cleanup - remove 4-bit dot product code (#1061) Stephan Walter 2023-04-19 16:06:37 +00:00
  • 8944a13296 Add NVIDIA cuBLAS support (#1044) slaren 2023-04-19 11:22:45 +02:00
  • 6667401238 Multi-threaded ggml_cpy (#1035) slaren 2023-04-19 00:53:24 +02:00
  • 77a73403ca ggml : add new Q4_2 quantization (ARM only) (#1046) Georgi Gerganov 2023-04-18 23:54:57 +03:00
  • 50a8a2af97 ggml : scratch that - vmlaq_n_f32 is always better Georgi Gerganov 2023-04-18 23:11:23 +03:00
  • 4caebf6d40 gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +03:00
  • dcdd65e296 ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators Georgi Gerganov 2023-04-18 22:59:17 +03:00
  • 5ecff35151 Adding a simple program to measure speed of dot products (#1041) Kawrakow 2023-04-18 21:00:14 +02:00
  • 7faa7460f0 readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +03:00
  • 5af8e32238 ci : do not run on drafts Georgi Gerganov 2023-04-17 18:00:10 +03:00
  • 42747220b4 Do not close file after mmap (Windows version) (#1034) Ivan Komarov 2023-04-18 03:15:50 +02:00
  • e9298af389 readme : add Ruby bindings (#1029) Atsushi Tatsuma 2023-04-18 04:34:35 +09:00
  • 4ad73137a1 add 4_0 to default outfile namestr dict (#1031) Cameron 2023-04-17 11:26:23 -07:00
  • 315a95a4d3 Add LoRA support (#820) slaren 2023-04-17 17:28:55 +02:00
  • efd05648c8 llama : well-defined static initialization of complex objects (#927) Arik Poznanski 2023-04-17 17:41:53 +03:00
  • eb17a026fd quantize-stats : fix bug in --type argument Georgi Gerganov 2023-04-17 17:31:06 +03:00
  • 69b740289f ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c Georgi Gerganov 2023-04-17 16:16:23 +03:00
  • f266259ad9 Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) Ivan Komarov 2023-04-17 15:10:57 +02:00
  • 47f61aaa5f Fix: do not close file on mmap (#1017) slaren 2023-04-16 21:27:38 +02:00
  • 3173a62eb9 stdout : vertical align outputs for better readibility Georgi Gerganov 2023-04-16 13:58:48 +03:00
  • 489537e6cf examples: add missing <ctime> include for time() (#1011) Pavol Rusnak 2023-04-16 12:13:00 +02:00
  • 2d3481c721 Fix msys2 build error and warnings (#1009) nanahi 2023-04-16 17:13:42 +08:00
  • 74f5899df4 convert.py: Fix loading safetensors and ggml format on Windows (#991) comex 2023-04-15 14:53:21 -07:00
  • 2f7c8e014e Fix potential int8 overflow in non-SIMD vec_dot (#986) Stephan Walter 2023-04-15 18:28:56 +00:00
  • 0ad964631f Refactor ggml.c for future tensor types (#1001) Stephan Walter 2023-04-15 16:25:38 +00:00
  • e95b6554b4 ggml : add Q8_0 quantization for intermediate results (#951) Georgi Gerganov 2023-04-15 17:53:22 +03:00
  • aa485cee33 ggml : use posix_memalign on non-Windows env Georgi Gerganov 2023-04-15 14:25:45 +03:00
  • c12b14b77f benchmark : fix result validation in benchmark-q4_0-matmult (#987) Ivan Komarov 2023-04-15 07:51:54 +02:00
  • 106faaf297 cmake : add finding the OpenBLAS header file (#992) katsu560 2023-04-15 14:51:11 +09:00
  • c85e03d12e Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" (#982) Pavol Rusnak 2023-04-14 21:58:43 +02:00
  • 489093548c py : bump sentencepiece to 0.1.98 to support Python 3.11 (#976) Pavol Rusnak 2023-04-14 21:46:49 +02:00