Commit Graph

  • 231ae28f07 readme : add API changes section Georgi Gerganov 2024-03-03 12:44:03 +02:00
  • 8828f2da08 llama : allow for user specified embedding pooling type (#5849) Douglas Hanley 2024-03-03 04:40:27 -06:00
  • c1056811b9 gguf-dump : support i-quants (#5841) Nindaleth 2024-03-03 09:43:42 +01:00
  • 8d22a241f1 llama : fix llama_copy_state_data with fragmented KV cache (#5840) compilade 2024-03-03 03:41:55 -05:00
  • bfcd457754 ci : schedule slow server tests only on Release or on demand (#5839) Pierrick Hymbert 2024-03-03 09:35:23 +01:00
  • ad7bd6dd67 server : init http requests thread pool with --parallel if set (#5836) Pierrick Hymbert 2024-03-03 08:48:36 +01:00
  • e10ea9bdf1 flake.lock: Update (#5842) Georgi Gerganov 2024-03-03 06:11:31 +02:00
  • 17a641c829 server: tests: passkey challenge / self-extend with context shift demo (#5832) Pierrick Hymbert 2024-03-02 22:00:14 +01:00
  • e4fc905575 llama : add abort_callback to interrupt computation (#5409) Michael Podvitskiy 2024-03-02 20:52:25 +01:00
  • d80d3d7aa2 ggml : fix IQ3_S AVX implementation (#5834) Georgi Gerganov 2024-03-02 20:00:49 +02:00
  • b36b522137 convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • 3dfc06f9b8 convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • 7fa9f3f64a ggml : IQ3_S improvements (#5829) Kawrakow 2024-03-02 17:00:51 +02:00
  • ba72410b27 scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • 38ab231f5c llama : refactor internal quantization functions (#5830) Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 3cd45e96d1 llama : fix segfault from unknown model arch name (#5820) compilade 2024-03-02 08:42:56 -05:00
  • fd79e6d84b Support multiple GPUs (split mode) on SYCL backend (#5806) Neo Zhang Jianyu 2024-03-02 19:49:30 +08:00
  • 3d1f923ebd workflows : remove nocleanup arg for check-requirements.sh (#5826) crasm 2024-03-02 00:11:06 -05:00
  • 2d596a57fd build(nix): Introduce flake.formatter for nix fmt (#5687) Tushar 2024-03-02 04:48:26 +05:30
  • c59ecb57fc convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) nold 2024-03-01 22:51:12 +01:00
  • 34d4a453d2 llama : add StarCoder2 support (#5795) Sourab Mangrulkar 2024-03-02 01:00:46 +05:30
  • ff87db13c0 server : remove api_like_OAI.py proxy script (#5808) Georgi Gerganov 2024-03-01 20:00:58 +02:00
  • 43e139b0da ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) ddpasa 2024-03-01 18:00:00 +01:00
  • 6ab00005c0 gemma : fix bfloat16 -> float16 conversion issue (#5810) kunal-vaishnavi 2024-03-01 06:08:08 -08:00
  • 45f4f019a8 common : fix flag --logits-all to --all-logits (#5805) Miwa / Ensan 2024-03-01 22:48:56 +09:00
  • 5d743c9e16 llama : cleanup unused mmq flags (#5772) Pierrick Hymbert 2024-03-01 12:39:06 +01:00
  • 5952230c9f unicode : switch to multimap based nfd_map (#5799) Douglas Hanley 2024-03-01 03:15:36 -06:00
  • db13e52569 server: allow to override threads server pool with --threads-http (#5794) Pierrick Hymbert 2024-03-01 10:08:08 +01:00
  • f7c3753e18 ci : add Ubuntu 22 Vulkan CI run (#5789) Eve 2024-03-01 08:54:53 +00:00
  • 95555b04b7 server : fix newlines in help (#5785) Georgi Gerganov 2024-03-01 09:59:43 +02:00
  • 0f13afed93 [SYCL] Use batched mul_mat pathway (#5591) AidanBeltonS 2024-03-01 07:36:47 +00:00
  • f304cb4978 Server: normalize naming (#5779) Xuan Son Nguyen 2024-02-29 21:42:11 +01:00
  • d8b7c57702 llama : constified llama_set_state_data's src (#5774) Marcus Dunn 2024-02-29 00:17:23 -08:00
  • 03526b311c ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771) Georgi Gerganov 2024-02-28 21:44:21 +02:00
  • b872ae4e05 make portability_enumeration_ext apple only (#5757) Eve 2024-02-28 19:33:37 +00:00
  • 8c3a3dc4ca llama : remove deprecated API (#5770) Georgi Gerganov 2024-02-28 18:43:38 +02:00
  • c87a313a6f awq-py : remove (#5768) Georgi Gerganov 2024-02-28 17:36:53 +02:00
  • 49255c8c2e sync : ggml Georgi Gerganov 2024-02-28 11:17:32 +02:00
  • 527baa910d add google magika inference example (ggml/748) slaren 2024-02-25 20:41:35 +01:00
  • c18cf38511 Introduce backend GUIDs (ggml/743) UEXTM.com 2024-02-24 11:27:36 -05:00
  • 98d62bd45e server : hit Ctrl+C twice to exit (#5734) Xuan Son Nguyen 2024-02-28 09:55:37 +01:00
  • 0b7dbbbfd1 llama : fix non-quantization of expert gating tensors (#5754) compilade 2024-02-28 03:52:56 -05:00
  • c2c3700597 llama : improve BERT tokenization (#5740) Douglas Hanley 2024-02-28 02:51:11 -06:00
  • 829dc074cb readme : add link to LLaVA 1.6 models (#5758) Daniel Bevenius 2024-02-28 09:39:39 +01:00
  • 2318a4cc87 server : add "/chat/completions" alias for "/v1/...` (#5722) Jorge A 2024-02-28 01:39:15 -07:00
  • 1ef3857e88 ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760) Kawrakow 2024-02-28 10:37:02 +02:00
  • 67c31e6bc9 Attempt to fix android build (#5752) Kawrakow 2024-02-27 19:16:49 +02:00
  • a1260421bf IQ4_XS: a 4.25 bpw quantization (#5747) Kawrakow 2024-02-27 16:34:24 +02:00
  • 45bcf55705 cuda : replace remaining shfl_xor with calls to warp_reduce functions (#5744) Engininja2 2024-02-27 07:22:45 -06:00
  • 1b4c646152 ggml-quants : fix avx2 iq1_s vec_dot when compiled with gcc (#5742) Engininja2 2024-02-27 06:50:18 -06:00
  • 127296f9ad llama : fix defrag bugs + add parameter (#5735) Georgi Gerganov 2024-02-27 14:35:51 +02:00