Commit Graph

  • b9469762a3 rpc : resource management rework (#7562) Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • 2b737caae1 rpc : resource management rework (#7562) Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • e354ad8256 Add support for DeepseekV2ForCausalLM (#7519) fairydreaming 2024-05-28 17:07:05 +02:00
  • ee3dff6b8e Add support for DeepseekV2ForCausalLM (#7519) fairydreaming 2024-05-28 17:07:05 +02:00
  • 7833b10088 tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • edc29433fa tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • 364c72ddb6 llama : handle unknown utf8 bytes (#7588) Georgi Gerganov 2024-05-28 13:55:35 +03:00
  • 8b99e2aa66 llama : handle unknown utf8 bytes (#7588) Georgi Gerganov 2024-05-28 13:55:35 +03:00
  • d42f92b628 github: add refactor to issue template (#7561) Brian 2024-05-28 20:27:27 +10:00
  • 271ff3fc44 github: add refactor to issue template (#7561) Brian 2024-05-28 20:27:27 +10:00
  • d904b43cc3 [SYCL]fix ggml_sycl_mul_mat_id() to match the change of api (#7436) Neo Zhang 2024-05-28 17:53:37 +08:00
  • e2b065071c [SYCL]fix ggml_sycl_mul_mat_id() to match the change of api (#7436) Neo Zhang 2024-05-28 17:53:37 +08:00
  • 7acc48960e ggml : generalize GGML_OP_CONCAT (#7563) Georgi Gerganov 2024-05-28 11:04:19 +03:00
  • 0548a4187f ggml : generalize GGML_OP_CONCAT (#7563) Georgi Gerganov 2024-05-28 11:04:19 +03:00
  • b156a7902e server: do not remove whitespace at the start of a completion chunk (#7524) mgroeber9110 2024-05-28 06:55:51 +02:00
  • 9335b969e8 server: do not remove whitespace at the start of a completion chunk (#7524) mgroeber9110 2024-05-28 06:55:51 +02:00
  • 6e1a1dbdc7 Markdownish code block fix (#7571) Nathan Epstein 2024-05-28 00:41:14 -04:00
  • c41767154e Markdownish code block fix (#7571) Nathan Epstein 2024-05-28 00:41:14 -04:00
  • e86aa96307 llava : update clip.h (#7580) Ikko Eltociear Ashimine 2024-05-28 11:48:16 +09:00
  • 74b239b3d5 llava : update clip.h (#7580) Ikko Eltociear Ashimine 2024-05-28 11:48:16 +09:00
  • 8b277ca70f update HIP_UMA #7399 (#7414) Djip007 2024-05-28 01:40:47 +02:00
  • 852aafb163 update HIP_UMA #7399 (#7414) Djip007 2024-05-28 01:40:47 +02:00
  • 2e63949afa adding in x64 targets to cmake presets (#7574) kunnis 2024-05-27 18:40:12 -05:00
  • 0136966daf adding in x64 targets to cmake presets (#7574) kunnis 2024-05-27 18:40:12 -05:00
  • 22109fa246 make: add --device-debug to NVCC debug flags (#7542) Johannes Gäßler 2024-05-27 19:34:40 +02:00
  • 10b1e45876 make: add --device-debug to NVCC debug flags (#7542) Johannes Gäßler 2024-05-27 19:34:40 +02:00
  • c2971b1992 Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) agray3 2024-05-27 18:33:42 +01:00
  • 197c00681b Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) agray3 2024-05-27 18:33:42 +01:00
  • e7a82c1cc0 Fix q_xxs using mul_mat_q (#7459) AidanBeltonS 2024-05-27 17:34:51 +01:00
  • 95f84d5ce8 Fix q_xxs using mul_mat_q (#7459) AidanBeltonS 2024-05-27 17:34:51 +01:00
  • 79c81bf253 Add freq factors (#7495) AidanBeltonS 2024-05-27 13:34:09 +01:00
  • 5487593bc7 Add freq factors (#7495) AidanBeltonS 2024-05-27 13:34:09 +01:00
  • f81106e1ca metal : add GGML_OP_REPEAT kernels (#7557) Georgi Gerganov 2024-05-27 12:10:19 +03:00
  • 1d8fca72ae metal : add GGML_OP_REPEAT kernels (#7557) Georgi Gerganov 2024-05-27 12:10:19 +03:00
  • cd7ef83d84 metal : disable FA kernel for HS=256 (#7556) Georgi Gerganov 2024-05-27 10:38:39 +03:00
  • 62bfef5194 metal : disable FA kernel for HS=256 (#7556) Georgi Gerganov 2024-05-27 10:38:39 +03:00
  • 49da0c4246 llama : add comments about experimental flags (#7544) Georgi Gerganov 2024-05-27 09:24:13 +03:00
  • eaf6e03174 llama : add comments about experimental flags (#7544) Georgi Gerganov 2024-05-27 09:24:13 +03:00
  • 11a3ec860b github: add self sorted issue ticket forms (#7543) Brian 2024-05-27 10:54:30 +10:00
  • d6ef0e77dd github: add self sorted issue ticket forms (#7543) Brian 2024-05-27 10:54:30 +10:00
  • 426def78e0 flake.lock: Update (#7540) Georgi Gerganov 2024-05-26 18:54:56 +03:00
  • dff451cfa1 flake.lock: Update (#7540) Georgi Gerganov 2024-05-26 18:54:56 +03:00
  • 22d5b5d3c6 main: replace --no-special with --special (#7534) Brian 2024-05-27 00:10:17 +10:00
  • d298382ad9 main: replace --no-special with --special (#7534) Brian 2024-05-27 00:10:17 +10:00
  • 19a748b5eb Fix aya-23 conversion scripts (#7539) Galunid 2024-05-26 16:02:34 +02:00
  • 32a28217f4 Fix aya-23 conversion scripts (#7539) Galunid 2024-05-26 16:02:34 +02:00
  • cbfef1b8c1 llama : add Smaug 70B support (#7402) Bartowski 2024-05-26 08:28:35 -04:00
  • c429b33beb llama : add Smaug 70B support (#7402) Bartowski 2024-05-26 08:28:35 -04:00
  • 1971bfacb0 Readme: add akx/ggify to tools (#1484) Aarni Koskela 2024-05-26 15:09:42 +03:00
  • 9146d36fe7 Readme: add akx/ggify to tools (#1484) Aarni Koskela 2024-05-26 15:09:42 +03:00
  • c20c488e86 SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window (#7480) HanishKVC 2024-05-26 06:26:34 +05:30
  • b9adcbbf92 SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window (#7480) HanishKVC 2024-05-26 06:26:34 +05:30
  • 14247798cb train : change default FA argument (#7528) Georgi Gerganov 2024-05-25 15:21:30 +03:00
  • 9588f196b1 train : change default FA argument (#7528) Georgi Gerganov 2024-05-25 15:21:30 +03:00
  • 39c7118283 labeler: added Apple Metal detector (+Kompute) (#7529) Brian 2024-05-25 19:30:42 +10:00
  • 3cbd23ed88 labeler: added Apple Metal detector (+Kompute) (#7529) Brian 2024-05-25 19:30:42 +10:00
  • e8b258a8ea main : don't print special tokens with --grammar (#6923) Justine Tunney 2024-05-25 05:04:03 -04:00
  • 00c6390793 main : don't print special tokens with --grammar (#6923) Justine Tunney 2024-05-25 05:04:03 -04:00
  • 6e71889fcf ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) Masaya, Kato 2024-05-25 17:42:31 +09:00
  • faa0e6979a ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) Masaya, Kato 2024-05-25 17:42:31 +09:00
  • b878b50a18 android : module (#7502) Elton Kola 2024-05-25 04:11:33 -04:00
  • 9791f40258 android : module (#7502) Elton Kola 2024-05-25 04:11:33 -04:00
  • 1f72fc0afe fix missing slash in fs_get_cache_directory() (#7503) Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • 902184dd3a fix missing slash in fs_get_cache_directory() (#7503) Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • c37df4489c Make tokenize CLI tool have nicer command line arguments. (#6188) Mikko Juola 2024-05-24 18:14:42 -07:00
  • 57684331fc Make tokenize CLI tool have nicer command line arguments. (#6188) Mikko Juola 2024-05-24 18:14:42 -07:00
  • 2e044457a1 gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • b83bab15a5 gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • 13c0958fb9 flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • d041d2ceaa flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • 7929f977ac docker.yml: disable light-intel and server-intel test (#7515) Brian 2024-05-24 23:47:56 +10:00
  • 27891f6db0 docker.yml: disable light-intel and server-intel test (#7515) Brian 2024-05-24 23:47:56 +10:00
  • 0682aaed8d Add support for ArcticForCausalLM (#7020) fairydreaming 2024-05-24 14:31:13 +02:00
  • fbca2f27fc Add support for ArcticForCausalLM (#7020) fairydreaming 2024-05-24 14:31:13 +02:00
  • 99e4e32005 add build shared lib in win release package (#7438) Neo Zhang 2024-05-24 10:06:56 +08:00
  • 0df0aa8e43 add build shared lib in win release package (#7438) Neo Zhang 2024-05-24 10:06:56 +08:00
  • 664f7bd8b0 readme : remove trailing space (#7469) Georgi Gerganov 2024-05-23 17:43:18 +03:00
  • 74f33adf5f readme : remove trailing space (#7469) Georgi Gerganov 2024-05-23 17:43:18 +03:00
  • 6e329292af ggml : silence UB sanitizer error during iq2_xxs quantization (#0) Georgi Gerganov 2024-05-23 17:17:43 +03:00
  • 1debe72737 ggml : silence UB sanitizer error during iq2_xxs quantization (#0) Georgi Gerganov 2024-05-23 17:17:43 +03:00
  • 6ba113274e Fix phi3 chat template confusion with zephyr (#7449) Tristan Druyen 2024-05-23 16:15:15 +02:00
  • 007489e895 Fix phi3 chat template confusion with zephyr (#7449) Tristan Druyen 2024-05-23 16:15:15 +02:00
  • 872401a90a readme : add Bunny in supported models [no ci] (#7469) Raj Hammeer Singh Hada 2024-05-23 18:00:13 +05:30
  • 8b94e799df readme : add Bunny in supported models [no ci] (#7469) Raj Hammeer Singh Hada 2024-05-23 18:00:13 +05:30
  • b4b6347da9 llama : add getters for n_threads/n_threads_batch (#7464) Daniel Bevenius 2024-05-23 14:29:26 +02:00
  • 3015851c5a llama : add getters for n_threads/n_threads_batch (#7464) Daniel Bevenius 2024-05-23 14:29:26 +02:00
  • 159d3a2641 ci : use Pythia models instead of OpenLlama (#7470) Georgi Gerganov 2024-05-23 15:28:14 +03:00
  • 55ac3b7aea ci : use Pythia models instead of OpenLlama (#7470) Georgi Gerganov 2024-05-23 15:28:14 +03:00
  • 7525e9fad7 readme : add GPT-NeoX + Pythia to the list of supported models (#7491) Victor Nogueira 2024-05-23 15:12:43 +03:00
  • dacfcebd60 readme : add GPT-NeoX + Pythia to the list of supported models (#7491) Victor Nogueira 2024-05-23 15:12:43 +03:00
  • 29d6974d16 Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) fairydreaming 2024-05-23 11:49:53 +02:00
  • 9b82476ee9 Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) fairydreaming 2024-05-23 11:49:53 +02:00
  • 61f0b9e711 llama : rename n_ctx -> cache.size, less confusing (#0) Georgi Gerganov 2024-05-23 12:38:18 +03:00
  • a61a94e543 llama : rename n_ctx -> cache.size, less confusing (#0) Georgi Gerganov 2024-05-23 12:38:18 +03:00
  • 75e5b1388c labeler.yml: add embedding label detector [no ci] (#7482) Brian 2024-05-23 17:40:43 +10:00
  • 152da28ae5 labeler.yml: add embedding label detector [no ci] (#7482) Brian 2024-05-23 17:40:43 +10:00
  • 46296642ca ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) Georgi Gerganov 2024-05-23 10:00:44 +03:00
  • d48c88cbd5 ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) Georgi Gerganov 2024-05-23 10:00:44 +03:00
  • a90628d8a0 ggml : drop support for QK_K=64 (#7473) Georgi Gerganov 2024-05-23 10:00:21 +03:00
  • e84b71c2c6 ggml : drop support for QK_K=64 (#7473) Georgi Gerganov 2024-05-23 10:00:21 +03:00