Commit Graph

  • 45a1b07e9b flake : update flake.nix (#2270) wzy 2023-07-19 15:01:55 +08:00
  • b1f4290953 cmake : install targets (#2256) wzy 2023-07-19 15:01:11 +08:00
  • d01bccde9f ci : integrate with ggml-org/ci (#2250) Georgi Gerganov 2023-07-18 14:24:43 +03:00
  • 6cbf9dfb32 llama : shorten quantization descriptions Georgi Gerganov 2023-07-18 11:50:49 +03:00
  • 7568d1a2b2 Support dup & cont ops on CUDA (#2242) Jiahao Li 2023-07-18 01:39:29 +08:00
  • b7647436cc llama : fix t_start_sample_us initialization warning (#2238) Alex Klinkhamer 2023-07-16 14:01:45 -07:00
  • 672dda10e4 ggml : fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG (#2219) Qingyou Meng 2023-07-17 03:57:28 +08:00
  • 27ab66e437 py : turn verify-checksum-models.py into executable (#2245) Jiří Podivín 2023-07-16 21:54:47 +02:00
  • 6e7cca4047 llama : add custom RoPE (#2054) Xiao-Yong Jin 2023-07-15 06:34:16 -04:00
  • a6803cab94 flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -04:00
  • 7dabc66f3c make : use pkg-config for OpenBLAS (#2222) wzy 2023-07-15 03:05:08 +08:00
  • 7cdd30bf1f cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) Bach Le 2023-07-15 03:00:58 +08:00
  • e8035f141e ggml : fix static_assert with older compilers #2024 (#2218) Evan Miller 2023-07-14 14:55:56 -04:00
  • 7513b7b0a1 llama : add functions that work directly on model (#2197) Bach Le 2023-07-15 02:55:24 +08:00
  • de8342423d build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -07:00
  • c48c525f87 examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +08:00
  • 206e01de11 cuda : support broadcast add & mul (#2192) Jiahao Li 2023-07-15 02:38:24 +08:00
  • 4304bd3cde CUDA: mul_mat_vec_q kernels for k-quants (#2203) Johannes Gäßler 2023-07-14 19:44:08 +02:00
  • 229aab351c make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) James Reynolds 2023-07-14 11:34:40 -06:00
  • 697966680b ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) Georgi Gerganov 2023-07-14 16:36:41 +03:00
  • 27ad57a69b Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +03:00
  • 32c5411631 Revert "Support using mmap when applying LoRA (#2095)" (#2206) Howard Su 2023-07-13 21:58:25 +08:00
  • ff5d58faec Fix compile error on Windows CUDA (#2207) Howard Su 2023-07-13 21:58:09 +08:00
  • b782422a3e devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +02:00
  • 1cbf561466 metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -04:00
  • 975221e954 ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +03:00
  • 4523d10d0c ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +03:00
  • 680e6f9177 cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +03:00
  • 4e7464ef88 FP16 is supported in CM=6.0 (#2177) Howard Su 2023-07-12 20:18:40 +08:00
  • 2b5eb72e10 Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) Johannes Gäßler 2023-07-12 10:38:52 +02:00
  • f7d278faf3 ggml : revert CUDA broadcast changes from #2183 (#2191) Georgi Gerganov 2023-07-12 10:54:19 +03:00
  • 20d7740a9b ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) Georgi Gerganov 2023-07-11 22:53:34 +03:00
  • 5bf2a27718 ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) Spencer Sutton 2023-07-11 12:31:10 -04:00
  • c9c74b4e3f llama : add classifier-free guidance (#2135) Bach Le 2023-07-12 00:18:43 +08:00
  • 3ec7e596b2 docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +09:00
  • 917831c63a readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -05:00
  • 2347463201 Support using mmap when applying LoRA (#2095) Howard Su 2023-07-11 22:37:01 +08:00
  • bbef28218f Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) LostRuins 2023-07-11 22:01:08 +08:00
  • 5656d10599 mpi : add support for distributed inference via MPI (#2099) Evan Miller 2023-07-10 11:49:56 -04:00
  • 1d16309969 llama : remove "first token must be BOS" restriction (#2153) oobabooga 2023-07-09 05:59:53 -03:00
  • db4047ad5c main : escape prompt prefix/suffix (#2151) Nigel Bosch 2023-07-09 03:56:18 -05:00
  • 18780e0a5e readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -03:00
  • 3bbc1a11f0 ggml : fix buidling with Intel MKL but ask for "cblas.h" issue (#2104) (#2115) clyang 2023-07-09 16:12:20 +08:00
  • 2492a53fd0 readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +08:00
  • 64639555ff Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) Johannes Gäßler 2023-07-08 20:01:44 +02:00
  • 061f5f8d21 CUDA: add __restrict__ to mul mat vec kernels (#2140) Johannes Gäßler 2023-07-08 00:25:15 +02:00
  • 84525e7962 docker : add support for CUDA in docker (#1461) dylan 2023-07-07 11:25:25 -07:00
  • a7e20edf22 ci : switch threads to 1 (#2138) Georgi Gerganov 2023-07-07 21:23:57 +03:00
  • 1d656d6360 ggml : change ggml_graph_compute() API to not require context (#1999) Qingyou Meng 2023-07-08 00:24:01 +08:00
  • 7242140283 ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) Georgi Gerganov 2023-07-07 18:36:37 +03:00