Commit Graph

  • eb759d3b34 kompute : implement op_getrows_f32 (#6403) woachk 2024-06-03 07:32:16 +02:00
  • 9e405b6e2e kompute : implement op_getrows_f32 (#6403) woachk 2024-06-03 07:32:16 +02:00
  • 96636899da fix bug introduced in using calloc (#7701) Dave Airlie 2024-06-03 07:59:54 +10:00
  • 3413ae2193 fix bug introduced in using calloc (#7701) Dave Airlie 2024-06-03 07:59:54 +10:00
  • 5ad7f4fc02 flake.lock: Update (#7686) Georgi Gerganov 2024-06-03 00:13:12 +03:00
  • 1669810d7c flake.lock: Update (#7686) Georgi Gerganov 2024-06-03 00:13:12 +03:00
  • 57bc9cfda5 chore : add ignore rule for generated server themes (#7689) Austin 2024-06-02 13:39:08 -04:00
  • 7c4e5b7eae chore : add ignore rule for generated server themes (#7689) Austin 2024-06-02 13:39:08 -04:00
  • 496913a2c3 [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) nickp27 2024-06-02 19:13:54 +10:00
  • 9422c5e34b [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) nickp27 2024-06-02 19:13:54 +10:00
  • 0f8cdc3051 Fix FlashAttention debug test, FP32 assert (#7684) Johannes Gäßler 2024-06-01 23:26:10 +02:00
  • e141ce624a Fix FlashAttention debug test, FP32 assert (#7684) Johannes Gäßler 2024-06-01 23:26:10 +02:00
  • e606e940c3 server : new UI (#7633) Yazan Agha-Schrader 2024-06-01 21:31:48 +02:00
  • 2e666832e6 server : new UI (#7633) Yazan Agha-Schrader 2024-06-01 21:31:48 +02:00
  • cfd5c8cc34 SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) HanishKVC 2024-06-01 21:50:18 +05:30
  • 2ac95c9d56 SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) HanishKVC 2024-06-01 21:50:18 +05:30
  • 474d8c7bac CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) Johannes Gäßler 2024-06-01 15:47:04 +02:00
  • 750f60c03e CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) Johannes Gäßler 2024-06-01 15:47:04 +02:00
  • 0ef2e997e2 CUDA: quantized KV support for FA vec (#7527) Johannes Gäßler 2024-06-01 08:44:14 +02:00
  • 9b596417af CUDA: quantized KV support for FA vec (#7527) Johannes Gäßler 2024-06-01 08:44:14 +02:00
  • d91eb8d805 server : update js (#7670) Georgi Gerganov 2024-05-31 22:23:04 +03:00
  • a323ec60af server : update js (#7670) Georgi Gerganov 2024-05-31 22:23:04 +03:00
  • ea8a7d7f25 convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) Galunid 2024-05-31 17:42:33 +02:00
  • 0515ad93f4 convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) Galunid 2024-05-31 17:42:33 +02:00
  • 4f660b95b5 scripts: update compare_llama_bench.py [no ci] (#7673) Johannes Gäßler 2024-05-31 16:26:21 +02:00
  • c8047d538f scripts: update compare_llama_bench.py [no ci] (#7673) Johannes Gäßler 2024-05-31 16:26:21 +02:00
  • c505724630 Improve HIP compatibility (#7672) Daniele 2024-05-31 14:00:29 +00:00
  • 30e238b246 Improve HIP compatibility (#7672) Daniele 2024-05-31 14:00:29 +00:00
  • 7d5ddd5e7f readme : link homebrew discussion Georgi Gerganov 2024-05-31 15:04:58 +03:00
  • 16926dff92 readme : link homebrew discussion Georgi Gerganov 2024-05-31 15:04:58 +03:00
  • 48611fbd58 ggml : fix loongson compile warnings (#7537) Georgi Gerganov 2024-05-31 14:17:10 +03:00
  • 0c27e6f62e ggml : fix loongson compile warnings (#7537) Georgi Gerganov 2024-05-31 14:17:10 +03:00
  • 149f27dcac Somehow '**' got lost (#7663) Galunid 2024-05-31 10:24:41 +02:00
  • 2e32f874e6 Somehow '**' got lost (#7663) Galunid 2024-05-31 10:24:41 +02:00
  • ea8ebdcf83 Add convert.py removal to hot topics (#7662) Galunid 2024-05-31 10:09:20 +02:00
  • 1af511fc22 Add convert.py removal to hot topics (#7662) Galunid 2024-05-31 10:09:20 +02:00
  • 44e2cef7a9 [no ci] docs: add aikit to readme (#7650) Sertaç Özercan 2024-05-30 16:57:16 -07:00
  • 0541f06296 [no ci] docs: add aikit to readme (#7650) Sertaç Özercan 2024-05-30 16:57:16 -07:00
  • cf81bcbc54 Fixed painfully slow single process builds. (#7326) JohnnyB 2024-05-30 21:32:38 +01:00
  • 9022c33646 Fixed painfully slow single process builds. (#7326) JohnnyB 2024-05-30 21:32:38 +01:00
  • 8624e77bea llama : cache llama_token_to_piece (#7587) Georgi Gerganov 2024-05-30 19:01:41 +03:00
  • 5921b8f089 llama : cache llama_token_to_piece (#7587) Georgi Gerganov 2024-05-30 19:01:41 +03:00
  • 5e5a08cb47 Fix conan badge display [no ci] (#7645) Martin Delille 2024-05-30 17:07:39 +02:00
  • 5dcdf94676 Fix conan badge display [no ci] (#7645) Martin Delille 2024-05-30 17:07:39 +02:00
  • a76d659ed1 Add brew installation instruction to README [no ci] (#7616) Manuel 2024-05-30 16:58:15 +02:00
  • 2e2340de17 Add brew installation instruction to README [no ci] (#7616) Manuel 2024-05-30 16:58:15 +02:00
  • 8ea71785cc readme : add Conan badge (#7638) Martin Delille 2024-05-30 14:52:50 +02:00
  • 7846540bd2 readme : add Conan badge (#7638) Martin Delille 2024-05-30 14:52:50 +02:00
  • 0a83967e79 github: add contact links to issues and convert question into research [no ci] (#7612) Brian 2024-05-30 21:55:36 +10:00
  • e6157f94c8 github: add contact links to issues and convert question into research [no ci] (#7612) Brian 2024-05-30 21:55:36 +10:00
  • 81e37e4303 Move convert.py to examples/convert-legacy-llama.py (#7430) Galunid 2024-05-30 13:40:00 +02:00
  • 9c4c9cc83f Move convert.py to examples/convert-legacy-llama.py (#7430) Galunid 2024-05-30 13:40:00 +02:00
  • ae41991fe1 faster avx512 exp implementation (#7551) Chris Elrod 2024-05-30 07:32:55 -04:00
  • 59b0d07766 faster avx512 exp implementation (#7551) Chris Elrod 2024-05-30 07:32:55 -04:00
  • 63c01f3329 ggml : fix loongarch build (O2 issue) (#7636) junchao-loongson 2024-05-30 17:30:10 +08:00
  • d5c05821f3 ggml : fix loongarch build (O2 issue) (#7636) junchao-loongson 2024-05-30 17:30:10 +08:00
  • 01a30eac03 README: explain parallel build [no ci] (#7618) Johannes Gäßler 2024-05-30 09:52:39 +02:00
  • 972b555ab9 README: explain parallel build [no ci] (#7618) Johannes Gäßler 2024-05-30 09:52:39 +02:00
  • 991a5632cd [SYCL] fix intel docker (#7630) Meng, Hengyu 2024-05-30 14:19:08 +08:00
  • 3854c9d07f [SYCL] fix intel docker (#7630) Meng, Hengyu 2024-05-30 14:19:08 +08:00
  • 771aed5905 gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) Galunid 2024-05-30 02:10:40 +02:00
  • eb57fee51f gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) Galunid 2024-05-30 02:10:40 +02:00
  • 6646c417ea metal : remove invalid asserts (#7617) Georgi Gerganov 2024-05-29 22:20:40 +03:00
  • 55d62262a9 metal : remove invalid asserts (#7617) Georgi Gerganov 2024-05-29 22:20:40 +03:00
  • 8b43fe0cbd metal : add missing asserts (#7617) Georgi Gerganov 2024-05-29 20:45:25 +03:00
  • 975ec63ff2 metal : add missing asserts (#7617) Georgi Gerganov 2024-05-29 20:45:25 +03:00
  • d9ca3be5b3 ggml : fix YARN + add tests + add asserts (#7617) Georgi Gerganov 2024-05-29 20:17:31 +03:00
  • fb76ec31a9 ggml : fix YARN + add tests + add asserts (#7617) Georgi Gerganov 2024-05-29 20:17:31 +03:00
  • 46e19e501f cuda : non-cont concat support (#7610) Georgi Gerganov 2024-05-29 15:38:26 +03:00
  • cce3dcffc5 cuda : non-cont concat support (#7610) Georgi Gerganov 2024-05-29 15:38:26 +03:00
  • e8e3c442d1 llama-bench : add support for the RPC backend (#7435) Radoslav Gerganov 2024-05-29 14:45:44 +03:00
  • 210d99173d llama-bench : add support for the RPC backend (#7435) Radoslav Gerganov 2024-05-29 14:45:44 +03:00
  • 5542469a75 ggml : use atomic_flag for critical section (#7598) slaren 2024-05-29 13:36:39 +02:00
  • 87bdf2a199 ggml : use atomic_flag for critical section (#7598) slaren 2024-05-29 13:36:39 +02:00
  • 556fc986b2 scripts : remove mpi remnants Georgi Gerganov 2024-05-29 14:31:18 +03:00
  • 00281b7be3 scripts : remove mpi remnants Georgi Gerganov 2024-05-29 14:31:18 +03:00
  • 89012dad24 sync : ggml Georgi Gerganov 2024-05-29 14:29:52 +03:00
  • 2ab977282b sync : ggml Georgi Gerganov 2024-05-29 14:29:52 +03:00
  • 3cddca643d ggml : restore ggml_rope_xpos_inplace (ggml/0) Georgi Gerganov 2024-05-26 18:35:23 +03:00
  • 72de268bec ggml : restore ggml_rope_xpos_inplace (ggml/0) Georgi Gerganov 2024-05-26 18:35:23 +03:00
  • 9c5826d9c2 Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) Akarshan Biswas 2024-05-29 12:23:47 +05:30
  • 0e8d8bfd6c Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) Akarshan Biswas 2024-05-29 12:23:47 +05:30
  • b1a8082b6b ggml : fix typo in ggml.c (#7603) zhouwg 2024-05-29 10:09:31 +08:00
  • 504f0c340f ggml : fix typo in ggml.c (#7603) zhouwg 2024-05-29 10:09:31 +08:00
  • 9af0e83c8a [SYCL] Align GEMM dispatch (#7566) Meng, Hengyu 2024-05-29 07:00:24 +08:00
  • b864b50ce5 [SYCL] Align GEMM dispatch (#7566) Meng, Hengyu 2024-05-29 07:00:24 +08:00
  • be8232d40f Tokenizer WPM fixes (#7500) jaime-m-p 2024-05-28 21:46:34 +02:00
  • 02c1ecad07 Tokenizer WPM fixes (#7500) jaime-m-p 2024-05-28 21:46:34 +02:00
  • 5730e70f34 sycl : fix assert (#7563) Georgi Gerganov 2024-05-28 22:22:50 +03:00
  • 6bd12ce409 sycl : fix assert (#7563) Georgi Gerganov 2024-05-28 22:22:50 +03:00
  • c65d048afb llama : support small Granite models (#7481) Giuseppe Scrivano 2024-05-28 20:49:49 +02:00
  • 5442939fcc llama : support small Granite models (#7481) Giuseppe Scrivano 2024-05-28 20:49:49 +02:00
  • ad5e69a461 vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) k.h.lai 2024-05-29 01:25:08 +08:00
  • 56411a950f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) k.h.lai 2024-05-29 01:25:08 +08:00
  • b9469762a3 rpc : resource management rework (#7562) Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • 2b737caae1 rpc : resource management rework (#7562) Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • e354ad8256 Add support for DeepseekV2ForCausalLM (#7519) fairydreaming 2024-05-28 17:07:05 +02:00
  • ee3dff6b8e Add support for DeepseekV2ForCausalLM (#7519) fairydreaming 2024-05-28 17:07:05 +02:00
  • 7833b10088 tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • edc29433fa tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00