Commit Graph

  • 0f5006b5c2 llama : disable FA if KV head size do not match (#7982) Georgi Gerganov 2024-06-17 19:40:01 +03:00
  • 7c26775adb llama : disable FA if KV head size do not match (#7982) Georgi Gerganov 2024-06-17 19:40:01 +03:00
  • 367e2f7ddf Add Nix and Flox install instructions (#7899) Bryan Honof 2024-06-17 17:37:55 +02:00
  • b473e95084 Add Nix and Flox install instructions (#7899) Bryan Honof 2024-06-17 17:37:55 +02:00
  • c53c1ea4e3 sched : offload_op also requires supports_op (#7977) slaren 2024-06-17 16:51:42 +02:00
  • 99052cd227 sched : offload_op also requires supports_op (#7977) slaren 2024-06-17 16:51:42 +02:00
  • bb68ec160e fix: divide 0 exception in mamba (#7932) Frank Mai 2024-06-17 22:11:08 +08:00
  • c637fcd34d fix: divide 0 exception in mamba (#7932) Frank Mai 2024-06-17 22:11:08 +08:00
  • ba05597315 Implement non-mapped async IO for CUDA on Windows. (#7896) Markus Tavenrath 2024-06-17 16:10:15 +02:00
  • 6a2f0b3474 Implement non-mapped async IO for CUDA on Windows. (#7896) Markus Tavenrath 2024-06-17 16:10:15 +02:00
  • 1ede7e4cb8 rpc : fix load/store misaligned addresses (#7948) Georgi Gerganov 2024-06-17 11:09:20 +03:00
  • 21be9cab94 rpc : fix load/store misaligned addresses (#7948) Georgi Gerganov 2024-06-17 11:09:20 +03:00
  • 69a2e7318d gguf-dump.py: add --markdown dump output (#7853) Brian 2024-06-17 15:25:20 +10:00
  • 006167aaf6 gguf-dump.py: add --markdown dump output (#7853) Brian 2024-06-17 15:25:20 +10:00
  • e604f8d538 [SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946) Neo Zhang 2024-06-17 11:17:07 +08:00
  • df68d4fa5d [SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946) Neo Zhang 2024-06-17 11:17:07 +08:00
  • 7871c5029a Add support for sqrt on CUDA (#7953) Calvin Laurenson 2024-06-16 15:23:04 -07:00
  • 43b35e38ba Add support for sqrt on CUDA (#7953) Calvin Laurenson 2024-06-16 15:23:04 -07:00
  • 0b7ca53f1a cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) Georgi Gerganov 2024-06-11 17:39:01 +03:00
  • 19b7a836f6 cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) Georgi Gerganov 2024-06-11 17:39:01 +03:00
  • 9efdbeee59 ggml : fix and optimize ppc64le (ggml/849) Hong Bo PENG 2024-06-16 16:53:11 +08:00
  • b5fcf8ef5c ggml : fix and optimize ppc64le (ggml/849) Hong Bo PENG 2024-06-16 16:53:11 +08:00
  • 9d8799abee ggml : remove duplicate include of ggml-common.h (ggml/853) Daniel Bevenius 2024-06-16 10:51:18 +02:00
  • 398105ff43 ggml : remove duplicate include of ggml-common.h (ggml/853) Daniel Bevenius 2024-06-16 10:51:18 +02:00
  • f5e676a8ea flake.lock: Update (#7951) Georgi Gerganov 2024-06-16 19:16:21 +03:00
  • bc6c457fa3 flake.lock: Update (#7951) Georgi Gerganov 2024-06-16 19:16:21 +03:00
  • ea2d0ee0a3 unicode : avoid char32_t (#7957) Georgi Gerganov 2024-06-16 14:51:40 +03:00
  • 52399254b3 unicode : avoid char32_t (#7957) Georgi Gerganov 2024-06-16 14:51:40 +03:00
  • 5ba38be709 readme : update UI list [no ci] (#7958) hopkins385 2024-06-16 13:51:18 +02:00
  • 6fe1c62741 readme : update UI list [no ci] (#7958) hopkins385 2024-06-16 13:51:18 +02:00
  • b15a28061e ggml : fix handling of zero blocks in IQ quants (#7955) Georgi Gerganov 2024-06-16 14:50:12 +03:00
  • cddaf028ad ggml : fix handling of zero blocks in IQ quants (#7955) Georgi Gerganov 2024-06-16 14:50:12 +03:00
  • 0a673baa03 github : update pr template Georgi Gerganov 2024-06-16 10:46:51 +03:00
  • c8a82194a8 github : update pr template Georgi Gerganov 2024-06-16 10:46:51 +03:00
  • 0b35ff8340 Vulkan Shader Refactor, Memory Debugging Option (#7947) 0cc4m 2024-06-16 07:17:31 +02:00
  • 7c7836d9d4 Vulkan Shader Refactor, Memory Debugging Option (#7947) 0cc4m 2024-06-16 07:17:31 +02:00
  • e4ed322dde Add cvector-generator example (#7514) Xuan Son Nguyen 2024-06-15 18:53:40 +02:00
  • 0c7b3595b9 Add cvector-generator example (#7514) Xuan Son Nguyen 2024-06-15 18:53:40 +02:00
  • 6994c1326c [SYCL] remove global variables (#7710) Meng, Hengyu 2024-06-15 14:05:10 +08:00
  • 7b2f4a7d19 [SYCL] remove global variables (#7710) Meng, Hengyu 2024-06-15 14:05:10 +08:00
  • 7006c85155 ci : fix macos x86 build (#7940) olexiyb 2024-06-14 20:28:34 +03:00
  • f8ec8877b7 ci : fix macos x86 build (#7940) olexiyb 2024-06-14 20:28:34 +03:00
  • 8c1fed631e CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) Johannes Gäßler 2024-06-14 18:41:49 +02:00
  • 76d66ee0be CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) Johannes Gäßler 2024-06-14 18:41:49 +02:00
  • ed569d67cd metal : utilize max shared memory for mul_mat_id (#7935) Georgi Gerganov 2024-06-14 17:14:09 +03:00
  • 66ef1ceedf metal : utilize max shared memory for mul_mat_id (#7935) Georgi Gerganov 2024-06-14 17:14:09 +03:00
  • 18b650c9c3 llama-bench : fix RPC indication (#7936) Radoslav Gerganov 2024-06-14 16:47:41 +03:00
  • e65bbf606c llama-bench : fix RPC indication (#7936) Radoslav Gerganov 2024-06-14 16:47:41 +03:00
  • 703ac0fa9f llama : more checks before assuming FIM tokens (#7644) Sigbjørn Skjæret 2024-06-14 12:20:04 +02:00
  • 6fcd1331ef llama : more checks before assuming FIM tokens (#7644) Sigbjørn Skjæret 2024-06-14 12:20:04 +02:00
  • 16d99b1477 convert : add Poro-34B-chat tokenizer support (#7713) Elaine 2024-06-14 13:16:49 +03:00
  • 41b9260f18 convert : add Poro-34B-chat tokenizer support (#7713) Elaine 2024-06-14 13:16:49 +03:00
  • e0a066da74 rpc : fix ggml_backend_rpc_supports_buft() (#7918) Radoslav Gerganov 2024-06-13 15:18:44 +03:00
  • 172c825684 rpc : fix ggml_backend_rpc_supports_buft() (#7918) Radoslav Gerganov 2024-06-13 15:18:44 +03:00
  • 1f44a37b2b readme : Remove outdated instructions from README.md (#7914) [no ci] Galunid 2024-06-13 09:42:41 +02:00
  • a55eb1bf0f readme : Remove outdated instructions from README.md (#7914) [no ci] Galunid 2024-06-13 09:42:41 +02:00
  • d2ff37b278 move BLAS to a separate backend (#6210) slaren 2024-06-13 03:11:35 +02:00
  • f578b86b21 move BLAS to a separate backend (#6210) slaren 2024-06-13 03:11:35 +02:00
  • b267b997c5 build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) Olivier Chafik 2024-06-13 00:41:52 +01:00
  • 1c641e6aac build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) Olivier Chafik 2024-06-13 00:41:52 +01:00
  • c9ff309cdb CUDA: fix broken oob check for FA vec f32 kernel (#7904) Johannes Gäßler 2024-06-12 17:41:51 +02:00
  • 963552903f CUDA: fix broken oob check for FA vec f32 kernel (#7904) Johannes Gäßler 2024-06-12 17:41:51 +02:00
  • 8ac08e825d tests : add non-cont unary tests (#7857) Georgi Gerganov 2024-06-12 16:00:22 +03:00
  • a9cae48003 tests : add non-cont unary tests (#7857) Georgi Gerganov 2024-06-12 16:00:22 +03:00
  • ae9bcd1d75 ggml : improve ggml_is_contiguous logic (#7856) Georgi Gerganov 2024-06-12 15:24:20 +03:00
  • bfaa676b08 ggml : improve ggml_is_contiguous logic (#7856) Georgi Gerganov 2024-06-12 15:24:20 +03:00
  • f196f2a2c9 server : restore numeric prompts (#7883) Georgi Gerganov 2024-06-12 14:42:29 +03:00
  • 704a35b183 server : restore numeric prompts (#7883) Georgi Gerganov 2024-06-12 14:42:29 +03:00
  • f16b859200 update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) Meng, Hengyu 2024-06-12 17:05:35 +08:00
  • dcf752707d update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) Meng, Hengyu 2024-06-12 17:05:35 +08:00
  • a128c6d094 Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci] Patrice Ferlet 2024-06-12 03:18:16 +02:00
  • f2b5764beb Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci] Patrice Ferlet 2024-06-12 03:18:16 +02:00
  • 478f365d75 vulkan: select only one device for single gpu with multiple drivers (#7582) k.h.lai 2024-06-12 03:26:05 +08:00
  • 73bac2b11d vulkan: select only one device for single gpu with multiple drivers (#7582) k.h.lai 2024-06-12 03:26:05 +08:00
  • 8ba2f270e6 Update Vulkan RoPE implementation (#7818) 0cc4m 2024-06-11 21:20:29 +02:00
  • ef52d1d16a Update Vulkan RoPE implementation (#7818) 0cc4m 2024-06-11 21:20:29 +02:00
  • a08dd44cb8 fix broken link in pr template (#7880) [no ci] Deven Mistry 2024-06-11 12:18:58 -04:00
  • 14f83526cd fix broken link in pr template (#7880) [no ci] Deven Mistry 2024-06-11 12:18:58 -04:00
  • ea1bb2b82b github: move PR template to .github/ root (#7868) Brian 2024-06-12 00:43:41 +10:00
  • 6fe42d073f github: move PR template to .github/ root (#7868) Brian 2024-06-12 00:43:41 +10:00
  • a9e26d8c45 llama-bench: more compact markdown tables (#7879) Johannes Gäßler 2024-06-11 14:45:40 +02:00
  • 148995e5e5 llama-bench: more compact markdown tables (#7879) Johannes Gäßler 2024-06-11 14:45:40 +02:00
  • 99456d217a tests : check the Python version (#7872) Georgi Gerganov 2024-06-11 10:10:20 +03:00
  • 4bfe50f741 tests : check the Python version (#7872) Georgi Gerganov 2024-06-11 10:10:20 +03:00
  • cb7240ad05 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) Johannes Gäßler 2024-06-11 08:26:07 +02:00
  • bdcb8f4222 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) Johannes Gäßler 2024-06-11 08:26:07 +02:00
  • e4e6f9abea fix CUDA CI by using a windows-2019 image (#7861) slaren 2024-06-11 07:59:20 +02:00
  • c2ce6c47e4 fix CUDA CI by using a windows-2019 image (#7861) slaren 2024-06-11 07:59:20 +02:00
  • 52819e6643 json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) Olivier Chafik 2024-06-11 02:22:57 +01:00
  • b61eb9644d json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) Olivier Chafik 2024-06-11 02:22:57 +01:00
  • 1de5991f7c json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) Olivier Chafik 2024-06-11 01:00:30 +01:00
  • 396b18dfec json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) Olivier Chafik 2024-06-11 01:00:30 +01:00
  • 9fa0c89c0c cmake : fix CMake requirement for CUDA (#7821) Jared Van Bortel 2024-06-10 18:32:10 -04:00
  • 864a99e7a0 cmake : fix CMake requirement for CUDA (#7821) Jared Van Bortel 2024-06-10 18:32:10 -04:00
  • 27d373a411 ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • fd5ea0f897 ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • a98d3afc28 examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • c28a83902c examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • 968cfb9d8d server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • d9da0e4986 server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00