Commit Graph

  • 741a6fed44 make : add train-text-from-scratch (#1850) daboe01 2023-06-15 19:42:48 +02:00
  • cf267d1c71 make : add train-text-from-scratch (#1850) daboe01 2023-06-15 19:42:48 +02:00
  • 2f41c7c27c readme : server compile flag (#1874) Srinivas Billa 2023-06-15 18:36:38 +01:00
  • 9dda13e5e1 readme : server compile flag (#1874) Srinivas Billa 2023-06-15 18:36:38 +01:00
  • 8fd9583b2c make : clean *.so files (#1857) sandyiscool 2023-06-15 23:06:06 +05:30
  • 37e257c48e make : clean *.so files (#1857) sandyiscool 2023-06-15 23:06:06 +05:30
  • 9af138537f Fix the validation of main device (#1872) Howard Su 2023-06-16 01:29:59 +08:00
  • 64cc19b4fe Fix the validation of main device (#1872) Howard Su 2023-06-16 01:29:59 +08:00
  • feaa009626 metal : parallel command buffer encoding (#1860) Georgi Gerganov 2023-06-15 20:29:48 +03:00
  • 4bfcc855ab metal : parallel command buffer encoding (#1860) Georgi Gerganov 2023-06-15 20:29:48 +03:00
  • da3f715abb Better error when using both LoRA + GPU layers (#1861) Johannes Gäßler 2023-06-15 19:06:46 +02:00
  • 6b8312e797 Better error when using both LoRA + GPU layers (#1861) Johannes Gäßler 2023-06-15 19:06:46 +02:00
  • 4557646a38 CUDA full GPU acceleration, KV cache in VRAM (#1827) Johannes Gäßler 2023-06-14 19:47:19 +02:00
  • 254a7a7a5f CUDA full GPU acceleration, KV cache in VRAM (#1827) Johannes Gäßler 2023-06-14 19:47:19 +02:00
  • 1133a63542 baby-llama : fix operator!= (#1821) 0xspringtime 2023-06-13 15:37:54 -04:00
  • 9254920265 baby-llama : fix operator!= (#1821) 0xspringtime 2023-06-13 15:37:54 -04:00
  • d49334df42 train : improved training-from-scratch example (#1652) xaedes 2023-06-13 21:04:40 +02:00
  • e32089b2c2 train : improved training-from-scratch example (#1652) xaedes 2023-06-13 21:04:40 +02:00
  • 499e22902e llama : do a warm-up eval at start for better timings (#1824) Georgi Gerganov 2023-06-13 20:20:07 +03:00
  • 2347e45e7b llama : do a warm-up eval at start for better timings (#1824) Georgi Gerganov 2023-06-13 20:20:07 +03:00
  • 5416059be8 Allow "quantizing" to f16 and f32 (#1787) Kerfuffle 2023-06-13 04:23:23 -06:00
  • 74d4cfa343 Allow "quantizing" to f16 and f32 (#1787) Kerfuffle 2023-06-13 04:23:23 -06:00
  • 44af4a2f75 Metal implementation for all k_quants (#1807) Kawrakow 2023-06-12 22:39:21 +03:00
  • 74a6d922f1 Metal implementation for all k_quants (#1807) Kawrakow 2023-06-12 22:39:21 +03:00
  • 1200071552 ci : run when changing only the CUDA sources (#1800) slaren 2023-06-12 19:12:47 +02:00
  • e4caa8da59 ci : run when changing only the CUDA sources (#1800) slaren 2023-06-12 19:12:47 +02:00
  • df687e822c Leverage mmap for offloading tensors to GPU (#1597) Howard Su 2023-06-12 20:44:16 +08:00
  • 58970a4c39 Leverage mmap for offloading tensors to GPU (#1597) Howard Su 2023-06-12 20:44:16 +08:00
  • 6c7fc1cc50 metal : fix failure to load model (#1817) Kawrakow 2023-06-12 14:31:36 +03:00
  • 8c0a10e64d metal : fix failure to load model (#1817) Kawrakow 2023-06-12 14:31:36 +03:00
  • 1ca9832378 Fix issue where interactive mode crashes when input exceeds ctx size (#1789) Kerfuffle 2023-06-11 08:19:17 -06:00
  • fa84c4b3e8 Fix issue where interactive mode crashes when input exceeds ctx size (#1789) Kerfuffle 2023-06-11 08:19:17 -06:00
  • 6b4bca53f7 Fixed WSL cuda's OOM error (#1594) Kyle Liang 2023-06-11 21:20:52 +08:00
  • 12b063f0ec Fixed WSL cuda's OOM error (#1594) Kyle Liang 2023-06-11 21:20:52 +08:00
  • eb97aa37ad Update SHA256SUMS with current hashes for models quantized using q4_0 (#1798) Ryan Landay 2023-06-11 17:38:53 +08:00
  • 31d2b5f4a4 Update SHA256SUMS with current hashes for models quantized using q4_0 (#1798) Ryan Landay 2023-06-11 17:38:53 +08:00
  • 2069d67d2b cmake : fix Metal build (close #1791) Georgi Gerganov 2023-06-10 22:56:53 +03:00
  • 4de0334f5c cmake : fix Metal build (close #1791) Georgi Gerganov 2023-06-10 22:56:53 +03:00
  • 378a3a814d k-quants : GCC12 compilation fix (#1792) Artyom Lebedev 2023-06-10 22:51:36 +03:00
  • 3f1223155a k-quants : GCC12 compilation fix (#1792) Artyom Lebedev 2023-06-10 22:51:36 +03:00
  • c6755e854f metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782) Andrei 2023-06-10 10:47:34 -04:00
  • 303f5809f1 metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782) Andrei 2023-06-10 10:47:34 -04:00
  • 5e19207f69 doc : fix wrong address of BLIS.md (#1772) Aisuko 2023-06-11 00:08:11 +10:00
  • 059e99066d doc : fix wrong address of BLIS.md (#1772) Aisuko 2023-06-11 00:08:11 +10:00
  • bbd64dc9df ggml : force no_alloc == false when creating opt tensors (close #1699) Georgi Gerganov 2023-06-10 12:06:45 +03:00
  • 17c10acfb4 ggml : force no_alloc == false when creating opt tensors (close #1699) Georgi Gerganov 2023-06-10 12:06:45 +03:00
  • a785c53962 metal : add Q4_1 implementation (#1785) Kawrakow 2023-06-10 11:28:11 +03:00
  • e9b66ee982 metal : add Q4_1 implementation (#1785) Kawrakow 2023-06-10 11:28:11 +03:00
  • efdb29c078 llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691) Kerfuffle 2023-06-10 01:59:17 -06:00
  • 4f0154b0ba llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691) Kerfuffle 2023-06-10 01:59:17 -06:00
  • 2ca4da3218 ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) Xingchen Song(宋星辰) 2023-06-10 15:49:40 +08:00
  • ef3171d162 ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) Xingchen Song(宋星辰) 2023-06-10 15:49:40 +08:00
  • 0d61b825f8 make : add SSSE3 compilation use case (#1659) rankaiyx 2023-06-10 14:41:59 +08:00
  • 555275a693 make : add SSSE3 compilation use case (#1659) rankaiyx 2023-06-10 14:41:59 +08:00
  • 6a4f97263d OpenCL: Add release memory (#1741) Robert Sung-wook Shin 2023-06-10 01:24:40 +09:00
  • 98ed165574 OpenCL: Add release memory (#1741) Robert Sung-wook Shin 2023-06-10 01:24:40 +09:00
  • f183c7331b Windows nvcc workaround (#1753) Johannes Gäßler 2023-06-09 13:58:15 +02:00
  • ae9663f188 Windows nvcc workaround (#1753) Johannes Gäßler 2023-06-09 13:58:15 +02:00
  • a7d5f8967a metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +03:00
  • b33dee282f metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +03:00
  • be9626fc1a metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -04:00
  • 92f44ff7f7 metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -04:00
  • 34bca912d4 metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +03:00
  • 245fc3c37d metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +03:00
  • e6d0170855 metal : add Q2_K implementation (#1762) Kawrakow 2023-06-08 22:28:21 +03:00
  • 72ff5282bf metal : add Q2_K implementation (#1762) Kawrakow 2023-06-08 22:28:21 +03:00
  • d9198d8e30 Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)" Georgi Gerganov 2023-06-08 20:48:14 +03:00
  • 0bf7cf1b29 Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)" Georgi Gerganov 2023-06-08 20:48:14 +03:00
  • 7364ddfbe8 ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738) le.chang 2023-06-09 00:47:56 +08:00
  • 8432d4d9f7 ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738) le.chang 2023-06-09 00:47:56 +08:00
  • b73306af21 metal : Q6_K implementation (#1752) Kawrakow 2023-06-08 19:46:22 +03:00
  • 0f291e1f65 metal : Q6_K implementation (#1752) Kawrakow 2023-06-08 19:46:22 +03:00
  • 19cceb2600 Add llama.cpp docker support for non-latin languages (#1673) qingfengfenga 2023-06-08 15:58:53 +08:00
  • 8fc8179919 Add llama.cpp docker support for non-latin languages (#1673) qingfengfenga 2023-06-08 15:58:53 +08:00
  • 9aec13530b ggml : fix fprintf warnings (#1720) Steven Roussey 2023-06-08 00:12:28 -07:00
  • b50b570ed9 ggml : fix fprintf warnings (#1720) Steven Roussey 2023-06-08 00:12:28 -07:00
  • 62fe685f4f clang-tidy : restore dot file from accidental deletion Georgi Gerganov 2023-06-08 10:09:08 +03:00
  • 53aba3f393 clang-tidy : restore dot file from accidental deletion Georgi Gerganov 2023-06-08 10:09:08 +03:00
  • a68dd950d7 metal : add Q4_K implementation (#1733) Kawrakow 2023-06-08 10:08:23 +03:00
  • 4161bdc04d metal : add Q4_K implementation (#1733) Kawrakow 2023-06-08 10:08:23 +03:00
  • 2d1609eaaf k-quants : add missing compile definition to CMakeLists (#1748) johnson442 2023-06-08 08:02:48 +01:00
  • 0035858273 k-quants : add missing compile definition to CMakeLists (#1748) johnson442 2023-06-08 08:02:48 +01:00
  • afb1e0cf32 k-quants : allow to optionally disable at compile time (#1734) Georgi Gerganov 2023-06-07 10:59:52 +03:00
  • 5c64a0952e k-quants : allow to optionally disable at compile time (#1734) Georgi Gerganov 2023-06-07 10:59:52 +03:00
  • 4e1aea2999 flake : update to support metal on m1/m2 (#1724) jacobi petrucciani 2023-06-07 00:15:31 -04:00
  • 5b57a5b726 flake : update to support metal on m1/m2 (#1724) jacobi petrucciani 2023-06-07 00:15:31 -04:00
  • e115880e31 readme : add June roadmap Georgi Gerganov 2023-06-07 07:15:08 +03:00
  • 4dc62c545d readme : add June roadmap Georgi Gerganov 2023-06-07 07:15:08 +03:00
  • 1ad7e68899 main: add the possibility to open the prompt cache read-only (#1640) Willy Tarreau 2023-06-07 04:10:17 +02:00
  • 35a84916fb main: add the possibility to open the prompt cache read-only (#1640) Willy Tarreau 2023-06-07 04:10:17 +02:00
  • fb21508b1b llama : fix vram_scratch var Georgi Gerganov 2023-06-06 22:54:39 +03:00
  • 2d7bf110ed llama : fix vram_scratch var Georgi Gerganov 2023-06-06 22:54:39 +03:00
  • 11757de5e1 llama : fix compile warnings Georgi Gerganov 2023-06-06 22:41:53 +03:00
  • 2a4e41a086 llama : fix compile warnings Georgi Gerganov 2023-06-06 22:41:53 +03:00
  • e957101084 Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) Johannes Gäßler 2023-06-06 21:33:23 +02:00
  • 17366df842 Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) Johannes Gäßler 2023-06-06 21:33:23 +02:00
  • 32ef74369c metal : add f16 support Georgi Gerganov 2023-06-06 20:16:57 +03:00
  • 44f906e853 metal : add f16 support Georgi Gerganov 2023-06-06 20:16:57 +03:00
  • 698d0096d6 Clblast fixes + enhancements to save VRAM and offload more layers (#1675) LostRuins 2023-06-07 01:00:01 +08:00
  • d5b111f53d Clblast fixes + enhancements to save VRAM and offload more layers (#1675) LostRuins 2023-06-07 01:00:01 +08:00