Commit Graph

  • 2ca4da3218 ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) Xingchen Song(宋星辰) 2023-06-10 15:49:40 +08:00
  • ef3171d162 ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) Xingchen Song(宋星辰) 2023-06-10 15:49:40 +08:00
  • 0d61b825f8 make : add SSSE3 compilation use case (#1659) rankaiyx 2023-06-10 14:41:59 +08:00
  • 555275a693 make : add SSSE3 compilation use case (#1659) rankaiyx 2023-06-10 14:41:59 +08:00
  • 6a4f97263d OpenCL: Add release memory (#1741) Robert Sung-wook Shin 2023-06-10 01:24:40 +09:00
  • 98ed165574 OpenCL: Add release memory (#1741) Robert Sung-wook Shin 2023-06-10 01:24:40 +09:00
  • f183c7331b Windows nvcc workaround (#1753) Johannes Gäßler 2023-06-09 13:58:15 +02:00
  • ae9663f188 Windows nvcc workaround (#1753) Johannes Gäßler 2023-06-09 13:58:15 +02:00
  • a7d5f8967a metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +03:00
  • b33dee282f metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +03:00
  • be9626fc1a metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -04:00
  • 92f44ff7f7 metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -04:00
  • 34bca912d4 metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +03:00
  • 245fc3c37d metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +03:00
  • e6d0170855 metal : add Q2_K implementation (#1762) Kawrakow 2023-06-08 22:28:21 +03:00
  • 72ff5282bf metal : add Q2_K implementation (#1762) Kawrakow 2023-06-08 22:28:21 +03:00
  • d9198d8e30 Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)" Georgi Gerganov 2023-06-08 20:48:14 +03:00
  • 0bf7cf1b29 Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)" Georgi Gerganov 2023-06-08 20:48:14 +03:00
  • 7364ddfbe8 ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738) le.chang 2023-06-09 00:47:56 +08:00
  • 8432d4d9f7 ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738) le.chang 2023-06-09 00:47:56 +08:00
  • b73306af21 metal : Q6_K implementation (#1752) Kawrakow 2023-06-08 19:46:22 +03:00
  • 0f291e1f65 metal : Q6_K implementation (#1752) Kawrakow 2023-06-08 19:46:22 +03:00
  • 19cceb2600 Add llama.cpp docker support for non-latin languages (#1673) qingfengfenga 2023-06-08 15:58:53 +08:00
  • 8fc8179919 Add llama.cpp docker support for non-latin languages (#1673) qingfengfenga 2023-06-08 15:58:53 +08:00
  • 9aec13530b ggml : fix fprintf warnings (#1720) Steven Roussey 2023-06-08 00:12:28 -07:00
  • b50b570ed9 ggml : fix fprintf warnings (#1720) Steven Roussey 2023-06-08 00:12:28 -07:00
  • 62fe685f4f clang-tidy : restore dot file from accidental deletion Georgi Gerganov 2023-06-08 10:09:08 +03:00
  • 53aba3f393 clang-tidy : restore dot file from accidental deletion Georgi Gerganov 2023-06-08 10:09:08 +03:00
  • a68dd950d7 metal : add Q4_K implementation (#1733) Kawrakow 2023-06-08 10:08:23 +03:00
  • 4161bdc04d metal : add Q4_K implementation (#1733) Kawrakow 2023-06-08 10:08:23 +03:00
  • 2d1609eaaf k-quants : add missing compile definition to CMakeLists (#1748) johnson442 2023-06-08 08:02:48 +01:00
  • 0035858273 k-quants : add missing compile definition to CMakeLists (#1748) johnson442 2023-06-08 08:02:48 +01:00
  • afb1e0cf32 k-quants : allow to optionally disable at compile time (#1734) Georgi Gerganov 2023-06-07 10:59:52 +03:00
  • 5c64a0952e k-quants : allow to optionally disable at compile time (#1734) Georgi Gerganov 2023-06-07 10:59:52 +03:00
  • 4e1aea2999 flake : update to support metal on m1/m2 (#1724) jacobi petrucciani 2023-06-07 00:15:31 -04:00
  • 5b57a5b726 flake : update to support metal on m1/m2 (#1724) jacobi petrucciani 2023-06-07 00:15:31 -04:00
  • e115880e31 readme : add June roadmap Georgi Gerganov 2023-06-07 07:15:08 +03:00
  • 4dc62c545d readme : add June roadmap Georgi Gerganov 2023-06-07 07:15:08 +03:00
  • 1ad7e68899 main: add the possibility to open the prompt cache read-only (#1640) Willy Tarreau 2023-06-07 04:10:17 +02:00
  • 35a84916fb main: add the possibility to open the prompt cache read-only (#1640) Willy Tarreau 2023-06-07 04:10:17 +02:00
  • fb21508b1b llama : fix vram_scratch var Georgi Gerganov 2023-06-06 22:54:39 +03:00
  • 2d7bf110ed llama : fix vram_scratch var Georgi Gerganov 2023-06-06 22:54:39 +03:00
  • 11757de5e1 llama : fix compile warnings Georgi Gerganov 2023-06-06 22:41:53 +03:00
  • 2a4e41a086 llama : fix compile warnings Georgi Gerganov 2023-06-06 22:41:53 +03:00
  • e957101084 Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) Johannes Gäßler 2023-06-06 21:33:23 +02:00
  • 17366df842 Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) Johannes Gäßler 2023-06-06 21:33:23 +02:00
  • 32ef74369c metal : add f16 support Georgi Gerganov 2023-06-06 20:16:57 +03:00
  • 44f906e853 metal : add f16 support Georgi Gerganov 2023-06-06 20:16:57 +03:00
  • 698d0096d6 Clblast fixes + enhancements to save VRAM and offload more layers (#1675) LostRuins 2023-06-07 01:00:01 +08:00
  • d5b111f53d Clblast fixes + enhancements to save VRAM and offload more layers (#1675) LostRuins 2023-06-07 01:00:01 +08:00
  • 0679702945 ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710) Georgi Gerganov 2023-06-06 10:18:03 +03:00
  • 2d43387daf ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710) Georgi Gerganov 2023-06-06 10:18:03 +03:00
  • 0aa222be28 gitignore : add .clang-tidy Georgi Gerganov 2023-06-06 09:55:10 +03:00
  • 7ad7750c5c gitignore : add .clang-tidy Georgi Gerganov 2023-06-06 09:55:10 +03:00
  • 391d82f905 llama : temporary disable Q6_K output quantization (#1711) Georgi Gerganov 2023-06-06 09:39:38 +03:00
  • 7a74dee6b4 llama : temporary disable Q6_K output quantization (#1711) Georgi Gerganov 2023-06-06 09:39:38 +03:00
  • c54f0af06b metal : add checks for buffer size (#1706) Spencer Sutton 2023-06-05 23:28:17 -04:00
  • 590250f7a9 metal : add checks for buffer size (#1706) Spencer Sutton 2023-06-05 23:28:17 -04:00
  • 739a917df4 docs : add performance troubleshoot + example benchmark documentation (#1674) Yuval Peled 2023-06-05 23:32:36 +03:00
  • f4c55d3bd7 docs : add performance troubleshoot + example benchmark documentation (#1674) Yuval Peled 2023-06-05 23:32:36 +03:00
  • ae6585dec5 readme : fix typo (#1700) Foul-Tarnished 2023-06-05 22:28:37 +02:00
  • f1465624c2 readme : fix typo (#1700) Foul-Tarnished 2023-06-05 22:28:37 +02:00
  • 38afdf54fd llama : consistently catch and throw only exceptions deriving from std::exception (#1599) mgroeber9110 2023-06-05 22:24:29 +02:00
  • c2df36d60d llama : consistently catch and throw only exceptions deriving from std::exception (#1599) mgroeber9110 2023-06-05 22:24:29 +02:00
  • 6600bfe1f7 metal : use shared buffers between CPU and GPU (#1696) kiltyj 2023-06-05 13:24:04 -07:00
  • 9d0693bce3 metal : use shared buffers between CPU and GPU (#1696) kiltyj 2023-06-05 13:24:04 -07:00
  • 3cdac3973b ggml : fix internal overflow in ggml_time_us on Windows (#1702) grahameth 2023-06-05 22:11:49 +02:00
  • efe0507632 ggml : fix internal overflow in ggml_time_us on Windows (#1702) grahameth 2023-06-05 22:11:49 +02:00
  • 1b56d3c0cd ci : disable auto tidy (#1705) Georgi Gerganov 2023-06-05 23:05:05 +03:00
  • e7fe66e670 ci : disable auto tidy (#1705) Georgi Gerganov 2023-06-05 23:05:05 +03:00
  • 083663ff7d ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) Kawrakow 2023-06-05 22:56:18 +03:00
  • 99009e72f8 ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) Kawrakow 2023-06-05 22:56:18 +03:00
  • c0bd595a01 Increase 3B scratch buffers. (#1698) Henri Vasserman 2023-06-05 13:43:08 +03:00
  • 5220a991a5 Increase 3B scratch buffers. (#1698) Henri Vasserman 2023-06-05 13:43:08 +03:00
  • dbfd086f3f llama : fix Metal KV cache sync (close #1695) Georgi Gerganov 2023-06-05 10:19:03 +03:00
  • d1f563a743 llama : fix Metal KV cache sync (close #1695) Georgi Gerganov 2023-06-05 10:19:03 +03:00
  • 595b95c725 readme : update hot topics Georgi Gerganov 2023-06-04 23:38:19 +03:00
  • 827f5eda91 readme : update hot topics Georgi Gerganov 2023-06-04 23:38:19 +03:00
  • 442db16478 llama : Metal inference (#1642) Georgi Gerganov 2023-06-04 23:34:30 +03:00
  • ecb217db4f llama : Metal inference (#1642) Georgi Gerganov 2023-06-04 23:34:30 +03:00
  • 9223b3ab53 OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) 0cc4m 2023-06-04 08:12:05 +02:00
  • dcb2ed4826 OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) 0cc4m 2023-06-04 08:12:05 +02:00
  • a65a9c322b Add info about CUDA_VISIBLE_DEVICES (#1682) Henri Vasserman 2023-06-03 16:35:20 +03:00
  • d8bd0013e8 Add info about CUDA_VISIBLE_DEVICES (#1682) Henri Vasserman 2023-06-03 16:35:20 +03:00
  • d4a5347616 Docker: change to calling convert.py (#1641) Jiří Podivín 2023-06-03 14:11:53 +02:00
  • b5c85468a3 Docker: change to calling convert.py (#1641) Jiří Podivín 2023-06-03 14:11:53 +02:00
  • 1faeb8fd7c Fix prompt cache saving and chat-persistent rollover (#1678) Evan Jones 2023-06-03 07:28:45 -04:00
  • 136476e898 Fix prompt cache saving and chat-persistent rollover (#1678) Evan Jones 2023-06-03 07:28:45 -04:00
  • ee6461be54 OpenLLaMA 3B support (#1588) Henri Vasserman 2023-05-30 21:24:22 +03:00
  • ffb06a345e OpenLLaMA 3B support (#1588) Henri Vasserman 2023-05-30 21:24:22 +03:00
  • 9bd2bdfba4 ggml : sync cgraph import / export API Georgi Gerganov 2023-05-29 19:31:44 +03:00
  • 7552ac5863 ggml : sync cgraph import / export API Georgi Gerganov 2023-05-29 19:31:44 +03:00
  • 6ae16e9c4f ggml : fix bug in ggml_alibi Georgi Gerganov 2023-05-29 19:30:49 +03:00
  • 5d1830b99d ggml : fix bug in ggml_alibi Georgi Gerganov 2023-05-29 19:30:49 +03:00
  • 5d769f35b3 Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) DannyDaemonic 2023-05-29 05:13:40 -07:00
  • 248367605e Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) DannyDaemonic 2023-05-29 05:13:40 -07:00
  • 4541c5d448 Adding git in container package dependencies (#1621) Jiří Podivín 2023-05-29 06:45:50 +02:00
  • 0e730dd23b Adding git in container package dependencies (#1621) Jiří Podivín 2023-05-29 06:45:50 +02:00
  • e7b0672c25 LLAMA_DEBUG adds debug symbols (#1617) Johannes Gäßler 2023-05-28 21:01:02 +02:00
  • 3b126f654f LLAMA_DEBUG adds debug symbols (#1617) Johannes Gäßler 2023-05-28 21:01:02 +02:00