Commit Graph

  • 2d00741e12 py : fix lint (#4889) Georgi Gerganov 2024-01-12 13:03:38 +02:00
  • 9d4439a38d llama : fix llm_build_k_shift to use correct n_rot (#4889) Georgi Gerganov 2024-01-12 13:01:56 +02:00
  • b8e769ee21 Importance Matrix calculation (#4861) Kawrakow 2024-01-12 06:59:57 +01:00
  • 599288bf8a server : fix infill when prompt is empty (#4833) Georgi Gerganov 2024-01-11 23:23:49 +02:00
  • d12d72c0a2 main : better name for variable n_print (#4874) Georgi Gerganov 2024-01-11 22:46:26 +02:00
  • 2e19c63956 main : disable token count by default (#4874) Georgi Gerganov 2024-01-11 22:43:05 +02:00
  • 6936a37c2b swift : track ggml release branch (#4867) Georgi Gerganov 2024-01-11 21:58:28 +02:00
  • a9d5db805b llama : restore intended k-quants mixes for MoE models (#4872) Kawrakow 2024-01-11 20:43:15 +01:00
  • 2757620588 ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) Kawrakow 2024-01-11 20:39:39 +01:00
  • 8b546126e4 swift : pin ggml commit + remove ggml.h from spm-headers (#4878) Georgi Gerganov 2024-01-11 21:31:31 +02:00
  • 026f72d14b server : implement credentialed CORS (#4514) Laura 2024-01-11 19:02:48 +01:00
  • 2fce0d62ba server : support for multiple api keys (#4864) Michael Coppola 2024-01-11 12:51:17 -05:00
  • 1760ce4a1d server : add LOG_INFO when model is successfully loaded (#4881) Behnam M 2024-01-11 12:41:39 -05:00
  • 855d77b3ce ci: nix-flake-update: new token with pr permissions (#4879) Someone 2024-01-11 17:22:34 +00:00
  • 948a870c14 main : print total token count and tokens consumed so far (#4874) pudepiedj 2024-01-11 16:14:52 +00:00
  • d08d46765f server : fix typo in model name (#4876) Isaac McFadyen 2024-01-11 09:33:26 -05:00
  • a84f27add4 metal : put encoder debug group behind a define (#4873) Paul Tsochantaris 2024-01-11 14:31:52 +00:00
  • d3d92fe41c sync : ggml Georgi Gerganov 2024-01-11 09:39:08 +02:00
  • 71cbca9e4c metal : fix deprecation warning (ggml/690) Georgi Gerganov 2024-01-11 09:34:59 +02:00
  • 1332477a98 ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) Timothy Cronin 2024-01-11 02:27:48 -05:00
  • c1cc3b7fc6 metal : wrap each operation in debug group (ggml/690) Jack Mousseau 2024-01-10 06:19:19 -08:00
  • 677469392f ggml : change GGML_MAX_NAME at compile time (ggml/682) leejet 2024-01-10 21:13:42 +08:00
  • f50c2560a7 Fix execlp call (ggml/689) Halalaluyafail3 2024-01-09 11:16:37 -05:00
  • 72025b333a fix : cuda order of synchronization when setting a buffer (ggml/679) Erik Scholz 2024-01-05 16:00:00 +01:00
  • 751a33212c server : update readme to document the new /health endpoint (#4866) Behnam M 2024-01-11 02:12:05 -05:00
  • a2e0602ac0 server : fix build + rename enums (#4870) Georgi Gerganov 2024-01-11 09:10:34 +02:00
  • fe3d53f647 server : add a /health endpoint (#4860) Behnam M 2024-01-10 14:56:05 -05:00
  • 6d07c21fb9 llama : add additional suffixes for model params (#4834) Brian 2024-01-11 01:09:53 +11:00
  • 9ff2a25fc4 llama : recognize 1B phi models (#4847) Austin 2024-01-10 08:39:09 -05:00
  • 239c336728 clip : support more quantization types (#4846) John 2024-01-10 14:37:09 +01:00
  • 3091bc1ab0 Python script to compare commits with llama-bench (#4844) Johannes Gäßler 2024-01-10 01:04:33 +01:00
  • 2786c0bee6 convert.py : fix vanilla LLaMA model conversion (#4818) Austin 2024-01-09 13:46:46 -05:00
  • acc4879612 llava-cli : don't crash if --image flag is invalid (#4835) Justine Tunney 2024-01-09 09:59:14 -08:00
  • e05036aac6 metal : improve dequantize precision to match CPU (#4836) Georgi Gerganov 2024-01-09 19:37:08 +02:00
  • 67c950b390 scripts : improve get-pg.sh (#4838) Georgi Gerganov 2024-01-09 19:20:45 +02:00
  • a1392fc017 readme : add 3rd party collama reference to UI list (#4840) iohub 2024-01-10 00:45:54 +08:00
  • 377e2df071 scripts : script to get Paul Graham essays in txt format (#4838) Georgi Gerganov 2024-01-09 16:23:05 +02:00
  • b680381cfd server : update readme about token probs (#4777) Behnam M 2024-01-09 05:02:05 -05:00
  • 2d0d38f5e0 server : add api-key flag to documentation (#4832) Zsapi 2024-01-09 10:12:43 +01:00
  • d39e6e0cad ggml : fix vld1q_s8_x4 32-bit compat (#4828) Georgi Gerganov 2024-01-09 10:42:06 +02:00
  • 635bddbf8b CUDA: faster softmax via shared memory + fp16 math (#4742) Johannes Gäßler 2024-01-09 08:58:55 +01:00
  • cfbe33b956 common : fix the short form of --grp-attn-w, not -gat (#4825) howlger 2024-01-08 20:05:53 +01:00
  • ac412c4ec1 readme : add link to SOTA models Georgi Gerganov 2024-01-08 20:25:17 +02:00
  • 7daaac5c6d SOTA 2-bit quants (#4773) Kawrakow 2024-01-08 16:02:32 +01:00
  • 0fe35cdf38 swift : exclude ggml-metal.metal from the package (#4822) Georgi Gerganov 2024-01-08 16:40:51 +02:00
  • 3e86f86432 llama.swiftui : update readme Georgi Gerganov 2024-01-08 15:57:36 +02:00
  • 7ddf5857e7 main : add self-extend support (#4815) Georgi Gerganov 2024-01-08 11:18:32 +02:00
  • a386b0dd63 examples : add passkey test (#3856) Georgi Gerganov 2024-01-08 11:14:04 +02:00
  • 9e96d6076a readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814) Lars Grammel 2024-01-07 21:24:11 +01:00
  • d513cfc4b5 llama-bench : add no-kv-offload parameter (#4812) slaren 2024-01-07 17:59:01 +01:00
  • 770ec541f9 CUDA: fixed redundant value dequantization (#4809) Johannes Gäßler 2024-01-07 17:24:08 +01:00
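A one-line-per-commit listing like the one above can be regenerated from a local clone of the repository. This is a minimal sketch, not the exact command used to produce the graph; the format string (abbreviated hash, subject, author name, ISO date) and the 10-character hash width are assumptions chosen to match the columns shown.

```shell
# Print each commit as: <abbrev hash> <subject> <author> <ISO date>
# --abbrev=10 widens %h to match the 10-character hashes above.
git log --abbrev=10 --date=iso --pretty=format:'%h %s %an %ad'
```

Adding `--graph` would restore the branch-topology drawing that the bullet markers above stand in for.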