Commit Graph

  • 2ba70388a3 Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) Kawrakow 2023-07-25 13:48:04 +03:00
  • 129d844c87 Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) Kawrakow 2023-07-25 13:48:04 +03:00
  • 8b710f6c6c server: add rms_norm_eps parameter (#2380) slaren 2023-07-25 11:36:17 +02:00
  • d5512b782b server: add rms_norm_eps parameter (#2380) slaren 2023-07-25 11:36:17 +02:00
  • 85913309c2 [Server] Escape HTML in webchat (#2368) Henri Vasserman 2023-07-25 10:27:34 +03:00
  • c798308e3a [Server] Escape HTML in webchat (#2368) Henri Vasserman 2023-07-25 10:27:34 +03:00
  • f03ca6ee8a make rms_norm_eps a parameter (#2374) slaren 2023-07-24 17:57:12 +02:00
  • 41c674161f make rms_norm_eps a parameter (#2374) slaren 2023-07-24 17:57:12 +02:00
  • 6d91bf5e54 Chat UI extras (#2366) Aarni Koskela 2023-07-24 17:54:22 +03:00
  • b3f138d058 Chat UI extras (#2366) Aarni Koskela 2023-07-24 17:54:22 +03:00
  • 0e2cd25570 ggml : sync (unary ops refactor, static-correctness) (#2370) Georgi Gerganov 2023-07-24 14:46:21 +03:00
  • 5b2b2dc6ae ggml : sync (unary ops refactor, static-correctness) (#2370) Georgi Gerganov 2023-07-24 14:46:21 +03:00
  • 9830c37087 Fix scalar version of Q5_K when QK_K = 64 (#2362) Kawrakow 2023-07-24 12:55:02 +03:00
  • 42f70cb2f6 Fix scalar version of Q5_K when QK_K = 64 (#2362) Kawrakow 2023-07-24 12:55:02 +03:00
  • 9e7ca68b90 llama : add grammar-based sampling (#1773) Evan Jones 2023-07-23 23:58:10 -04:00
  • 84e09a7d8b llama : add grammar-based sampling (#1773) Evan Jones 2023-07-23 23:58:10 -04:00
  • 61be4d0b27 Some more Q4_K and Q5_K speedup on CUDA (#2346) Kawrakow 2023-07-24 00:19:47 +03:00
  • 2f9cf974a0 Some more Q4_K and Q5_K speedup on CUDA (#2346) Kawrakow 2023-07-24 00:19:47 +03:00
  • c4a2ed79be Add gqa parameter support to the server (#2351) IgnacioFDM 2023-07-23 17:31:17 -03:00
  • 4f06592cc6 Add gqa parameter support to the server (#2351) IgnacioFDM 2023-07-23 17:31:17 -03:00
  • 118a2ced18 Fix __dp4a documentation (#2348) Johannes Gäßler 2023-07-23 17:49:06 +02:00
  • 70d26ac388 Fix __dp4a documentation (#2348) Johannes Gäßler 2023-07-23 17:49:06 +02:00
  • f65f81e723 common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347) wzy 2023-07-23 21:33:02 +08:00
  • 57921ca6db common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347) wzy 2023-07-23 21:33:02 +08:00
  • 9ab1bc29d9 fix n_tasks (#2342) slaren 2023-07-23 15:19:39 +02:00
  • 3602ac4255 fix n_tasks (#2342) slaren 2023-07-23 15:19:39 +02:00
  • 9677ba9f06 ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) slaren 2023-07-23 14:36:02 +02:00
  • 95a6c595e7 ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) slaren 2023-07-23 14:36:02 +02:00
  • 3fe463b26d llama : grouped-query attention + LLaMAv2 70B support (#2276) Georgi Gerganov 2023-07-23 15:09:47 +03:00
  • e76d630df1 llama : grouped-query attention + LLaMAv2 70B support (#2276) Georgi Gerganov 2023-07-23 15:09:47 +03:00
  • 02086139f9 llama : print help to stdout (#2338) maddes8cht 2023-07-23 13:59:48 +02:00
  • 1d0824b247 llama : print help to stdout (#2338) maddes8cht 2023-07-23 13:59:48 +02:00
  • d88eb2f671 flake : support nix build '.#opencl' (#2337) wzy 2023-07-23 19:57:02 +08:00
  • bc3ec2cdc9 flake : support nix build '.#opencl' (#2337) wzy 2023-07-23 19:57:02 +08:00
  • 9a208e606c llama : print max tensor size to stderr (#2336) Christian Demsar 2023-07-23 07:56:34 -04:00
  • a940458e48 llama : print max tensor size to stderr (#2336) Christian Demsar 2023-07-23 07:56:34 -04:00
  • 52f75c7549 make : fix CLBLAST compile support in FreeBSD (#2331) Jose Maldonado 2023-07-23 07:52:08 -04:00
  • 91171b8072 make : fix CLBLAST compile support in FreeBSD (#2331) Jose Maldonado 2023-07-23 07:52:08 -04:00
  • 574ccaa8b8 examples : simplify vim plugin (#2327) AustinMroz 2023-07-23 06:16:48 -05:00
  • 355c80f49e examples : simplify vim plugin (#2327) AustinMroz 2023-07-23 06:16:48 -05:00
  • a8354a5525 metal : support bcast add & dup & cont op (#2323) Jiahao Li 2023-07-23 19:00:37 +08:00
  • 83a00ce69b metal : support bcast add & dup & cont op (#2323) Jiahao Li 2023-07-23 19:00:37 +08:00
  • bb9446b3aa Speed up Q4_K (#2322) Kawrakow 2023-07-23 08:49:20 +03:00
  • d2a43664f9 Speed up Q4_K (#2322) Kawrakow 2023-07-23 08:49:20 +03:00
  • 0b6dbeae14 CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) Johannes Gäßler 2023-07-22 21:27:34 +02:00
  • b9b7d94fc1 CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) Johannes Gäßler 2023-07-22 21:27:34 +02:00
  • 926b090575 llama : optimize memory buffers (#2325) Georgi Gerganov 2023-07-22 21:17:57 +03:00
  • b47b8a9cfe llama : optimize memory buffers (#2325) Georgi Gerganov 2023-07-22 21:17:57 +03:00
  • 0c4de4eb95 Perplexity: Compute scores correlated to HellaSwag (#2312) klosax 2023-07-22 14:21:24 +02:00
  • b5fe67f8c6 Perplexity: Compute scores correlated to HellaSwag (#2312) klosax 2023-07-22 14:21:24 +02:00
  • 6f5d82363e examples : basic VIM plugin whoreson 2023-07-22 12:34:51 +02:00
  • 24baa54ac1 examples : basic VIM plugin whoreson 2023-07-22 12:34:51 +02:00
  • ac16f363f8 ci : fix args Georgi Gerganov 2023-07-22 12:00:56 +03:00
  • dd6c67d3cb ci : fix args Georgi Gerganov 2023-07-22 12:00:56 +03:00
  • 2fe787f8bc ci : add 7B CUDA tests (#2319) Georgi Gerganov 2023-07-22 11:48:22 +03:00
  • 5d500e8ccf ci : add 7B CUDA tests (#2319) Georgi Gerganov 2023-07-22 11:48:22 +03:00
  • d20efa0a13 examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311) Richard Roberson 2023-07-21 13:01:10 -06:00
  • 7d5f18468c examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311) Richard Roberson 2023-07-21 13:01:10 -06:00
  • 492226c4dc Custom RoPE + bettter memory management for CUDA (#2295) Kawrakow 2023-07-21 17:27:51 +03:00
  • d924522a46 Custom RoPE + bettter memory management for CUDA (#2295) Kawrakow 2023-07-21 17:27:51 +03:00
  • 326586c96e Faster Q3_K implementation on Metal (#2307) Kawrakow 2023-07-21 17:05:30 +03:00
  • 4d76a5f49b Faster Q3_K implementation on Metal (#2307) Kawrakow 2023-07-21 17:05:30 +03:00
  • 257a690f16 ggml : fix the rope fix (513f861953) Georgi Gerganov 2023-07-21 15:16:55 +03:00
  • 0db14fef06 ggml : fix the rope fix (513f861953) Georgi Gerganov 2023-07-21 15:16:55 +03:00
  • 0f8aca4e96 examples : fix typo in minigpt4.py (#2298) Ikko Eltociear Ashimine 2023-07-21 20:53:07 +09:00
  • 03e566977b examples : fix typo in minigpt4.py (#2298) Ikko Eltociear Ashimine 2023-07-21 20:53:07 +09:00
  • b3b8017933 ggml : fix rope args order + assert (#2054) Georgi Gerganov 2023-07-21 14:51:34 +03:00
  • 513f861953 ggml : fix rope args order + assert (#2054) Georgi Gerganov 2023-07-21 14:51:34 +03:00
  • 05fa6e7001 gitignore : fix final newline Georgi Gerganov 2023-07-21 14:42:41 +03:00
  • 3973b25a64 gitignore : fix final newline Georgi Gerganov 2023-07-21 14:42:41 +03:00
  • b11e6b20af llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) Guillaume "Vermeille" Sanchez 2023-07-21 12:58:36 +02:00
  • ab0e26bdfb llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) Guillaume "Vermeille" Sanchez 2023-07-21 12:58:36 +02:00
  • 29993c5483 gitignore : changes for Poetry users + chat examples (#2284) Jose Maldonado 2023-07-21 06:53:27 -04:00
  • 73643f5fb1 gitignore : changes for Poetry users + chat examples (#2284) Jose Maldonado 2023-07-21 06:53:27 -04:00
  • 5581df6a95 make : fix indentation Georgi Gerganov 2023-07-21 13:50:55 +03:00
  • a814d04f81 make : fix indentation Georgi Gerganov 2023-07-21 13:50:55 +03:00
  • 30188d71ac ci : fix MNT realpath usage (#2250) Georgi Gerganov 2023-07-21 13:48:18 +03:00
  • 4c013bb738 ci : fix MNT realpath usage (#2250) Georgi Gerganov 2023-07-21 13:48:18 +03:00
  • 89cf735f50 make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275) Sky Yan 2023-07-21 18:38:57 +08:00
  • 42c7c2e2e9 make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275) Sky Yan 2023-07-21 18:38:57 +08:00
  • 716d8e181a flake : remove intel mkl from flake.nix due to missing files (#2277) wzy 2023-07-21 18:26:34 +08:00
  • 78a3d13424 flake : remove intel mkl from flake.nix due to missing files (#2277) wzy 2023-07-21 18:26:34 +08:00
  • 5c622da7fa llama : make tensor_split ptr instead of array (#2272) Georgi Gerganov 2023-07-21 13:10:51 +03:00
  • ae178ab46b llama : make tensor_split ptr instead of array (#2272) Georgi Gerganov 2023-07-21 13:10:51 +03:00
  • 2e1c631292 make : add new target for test binaries (#2244) Jiří Podivín 2023-07-21 12:09:16 +02:00
  • 54e3bc76fe make : add new target for test binaries (#2244) Jiří Podivín 2023-07-21 12:09:16 +02:00
  • 4183277024 MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) Hatsune Miku 2023-07-21 08:13:18 +00:00
  • 019fe257bb MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) Hatsune Miku 2023-07-21 08:13:18 +00:00
  • 485d811887 Faster Q2_K on Metal (#2297) Kawrakow 2023-07-21 10:44:40 +03:00
  • e68c96f7fe Faster Q2_K on Metal (#2297) Kawrakow 2023-07-21 10:44:40 +03:00
  • 8e0505a028 make : fix embdinput library and server examples building on MSYS2 (#2235) Przemysław Pawełczyk 2023-07-21 09:42:21 +02:00
  • 9cf022a188 make : fix embdinput library and server examples building on MSYS2 (#2235) Przemysław Pawełczyk 2023-07-21 09:42:21 +02:00
  • a1b74a6521 Faster Q5_K and Q6_K on Metal (#2294) Kawrakow 2023-07-20 18:19:45 +03:00
  • e782c9e735 Faster Q5_K and Q6_K on Metal (#2294) Kawrakow 2023-07-20 18:19:45 +03:00
  • 27af8b24c2 Faster Q4_K on Metal (#2290) Kawrakow 2023-07-20 15:18:43 +03:00
  • 785829dfe8 Faster Q4_K on Metal (#2290) Kawrakow 2023-07-20 15:18:43 +03:00
  • 76998b01e3 llama : fix regression from #2000 - could not load no-mmap models Georgi Gerganov 2023-07-20 13:47:26 +03:00
  • fff0e0eafe llama : fix regression from #2000 - could not load no-mmap models Georgi Gerganov 2023-07-20 13:47:26 +03:00
  • 092b078946 metal: minor q4 optimization and reduce code size (#2248) Shouzheng Liu 2023-07-20 06:32:22 -04:00
  • 417a85a001 metal: minor q4 optimization and reduce code size (#2248) Shouzheng Liu 2023-07-20 06:32:22 -04:00