Commit Graph

  • e0f556186b Add left recursion check: quit early instead of going into an infinite loop (#7083) Haggai Nuchi 2024-05-13 22:25:56 -07:00
  • 27f65d6267 docs: Fix typo and update description for --embeddings flag (#7026) Ryuei 2024-05-14 14:20:47 +09:00
  • ee52225067 convert-hf : support direct Q8_0 conversion (#7234) compilade 2024-05-13 14:10:51 -04:00
  • 614d3b914e llama : less KV padding when FA is off (#7257) Georgi Gerganov 2024-05-13 17:15:15 +03:00
  • 30e70334f7 llava-cli: fix base64 prompt (#7248) k.h.lai 2024-05-13 22:02:36 +08:00
  • 1c570d8bee perplexity: add BF16 vs. FP16 results (#7150) Johannes Gäßler 2024-05-13 13:03:27 +02:00
  • 948f4ec7c5 [SYCL] rm wait() (#7233) Neo Zhang 2024-05-13 18:11:26 +08:00
  • 9aa672490c llama : rename jina tokenizers to v2 (#7249) Joan Fontanals 2024-05-13 10:35:14 +02:00
  • b1f8af1886 convert.py: Outfile default name change and additional metadata support (#4858) Brian 2024-05-13 12:56:47 +10:00
  • e586ee4259 change default temperature of OAI compat API from 0 to 1 (#7226) Benjamin Findley 2024-05-12 19:40:08 -07:00
  • cbf75894d2 [SYCL] Add oneapi runtime dll files to win release package (#7241) Neo Zhang 2024-05-13 08:04:29 +08:00
  • 0d5cef78ae [SYCL] update CI with oneapi 2024.1 (#7235) Neo Zhang 2024-05-13 08:02:55 +08:00
  • dc685be466 CUDA: add FP32 FlashAttention vector kernel (#7188) Johannes Gäßler 2024-05-12 19:40:45 +02:00
  • 6f1b63606f cmake : fix version cmp (#7227) Georgi Gerganov 2024-05-12 18:30:23 +03:00
  • b228aba91a remove convert-lora-to-ggml.py (#7204) slaren 2024-05-12 02:29:33 +02:00
  • 7bd4ffb780 metal : fix warnings (skipme) (#0) Georgi Gerganov 2024-05-11 21:36:20 +03:00
  • 1622ac023f sync : ggml Georgi Gerganov 2024-05-11 21:35:05 +03:00
  • 6aeff24f8b metal : fix indent (ggml/0) Georgi Gerganov 2024-05-11 16:57:53 +03:00
  • 325756d28d ggml : resolve merge (ggml/0) Georgi Gerganov 2024-05-11 16:25:50 +03:00
  • fed0108491 Scripting & documenting debugging one test without anything else in the loop. (#7096) Josh Ramer 2024-05-11 12:26:35 -05:00
  • 72c177c1f6 fix system prompt handling (#7153) Xuan Son Nguyen 2024-05-11 17:28:10 +02:00
  • 5a419926b0 convert-hf : support bfloat16 conversion (#7158) compilade 2024-05-11 11:06:26 -04:00
  • fae9d234b6 sync : ggml Georgi Gerganov 2024-05-11 12:02:39 +03:00
  • f5ef34e428 feat: implemented sigmoid function (ggml/806) Justina Cho 2024-05-01 14:44:26 -07:00
  • ef0d5e3ec9 build: fix and ignore msvc warnings (ggml/805) Borislav Stanimirov 2024-04-25 17:24:07 +03:00
  • 3292733f95 convert : skip unaccessible HF repos (#7210) CrispStrobe 2024-05-11 10:18:35 +02:00
  • 988631335a server : free llama_batch on exit (#7212) Steve Grubb 2024-05-11 04:13:02 -04:00
  • f99e1e456e llama : lookup word in vocab before doing BPE merges (#7193) Haoxiang Fei 2024-05-11 16:12:06 +08:00
  • 5ae3426b0b server: fix reported top tokens for temperature 0 (#7203) Johannes Gäßler 2024-05-11 10:11:28 +02:00
  • b83cc3f5b3 llama : add Jina Embeddings architecture (#6826) Joan Fontanals 2024-05-11 09:46:09 +02:00
  • 9cb317f77e ggml : full ALiBi support (#7192) Georgi Gerganov 2024-05-11 10:32:41 +03:00
  • e849648888 llama-bench : add pp+tg test type (#7199) slaren 2024-05-10 18:03:54 +02:00
  • 18e437665c metal : fix flash attention kernel requirements (#7169) Georgi Gerganov 2024-05-10 18:20:10 +03:00
  • 8c660242d7 convert : print "ignore_merges" field Georgi Gerganov 2024-05-10 17:53:04 +03:00
  • 25c6e82e7a llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200) slaren 2024-05-10 14:28:01 +02:00
  • 4e3880978f Fix memory bug in grammar parser (#7194) Justine Tunney 2024-05-10 07:01:08 -04:00
  • f89fe2732c Main+: optionally allow special tokens from user in interactive mode (#7097) HanishKVC 2024-05-10 15:51:58 +05:30
  • d11afd6652 llava : fix moondream support (#7163) Andrei 2024-05-10 02:41:10 -04:00
  • 8c570c9496 Minor arithmetic improvement to mmvq wrapper kernel (#7172) Ouadie EL FAROUKI 2024-05-10 01:32:15 +01:00
  • eaf4bd8b39 eval-callback : fix conversion to float (#7184) slaren 2024-05-10 01:04:12 +02:00
  • befddd0f15 Vulkan Bugfixes and Improvements (#7084) 0cc4m 2024-05-09 20:39:54 +02:00
  • d46dbc76f8 readme : add scheduled server workflow status badge Georgi Gerganov 2024-05-09 16:40:42 +03:00
  • 0961d86604 readme : add app (#6371) l3utterfly 2024-05-09 22:32:40 +09:00
  • 43248e5594 llama3 custom regex split (#6965) jaime-m-p 2024-05-09 15:30:44 +02:00
  • a743d76a01 CUDA: generalize FP16 fattn vec kernel (#7061) Johannes Gäßler 2024-05-09 14:32:02 +02:00
  • f31ec120bc Add warning if token is invalid (#7173) Galunid 2024-05-09 14:13:05 +02:00
  • fd9f92b154 llama : update llama_timings.n_p_eval setting (#7160) Daniel Bevenius 2024-05-09 13:03:29 +02:00
  • 22842164bc gguf-py : add special token modification capability (#7166) Sigbjørn Skjæret 2024-05-09 12:56:00 +02:00
  • 4734524882 opencl : alignment size converted from bits to bytes (#7090) Albert Jin 2024-05-09 17:34:37 +08:00
  • 07cd41d096 TypoFix (#7162) Ahmet Zeer 2024-05-09 11:16:45 +03:00