Commit Graph

  • 69ff61397d llama : support models without vocabulary (#5798) Michael Podvitskiy 2024-03-14 17:21:56 +01:00
  • af8deb1449 embedding : add EOS token if not present (#899) Georgi Gerganov 2024-03-14 15:14:14 +02:00
  • 0a538d4614 gguf-py : fix dtype check (#6045) Georgi Gerganov 2024-03-14 13:32:14 +02:00
  • 01cd238406 readme : improve readme for Llava-1.6 example (#6044) Jian Liao 2024-03-14 04:18:23 -07:00
  • 0c0c0276af server: disable debug release type sanitizer, simplify trigger (#6047) Pierrick Hymbert 2024-03-14 12:15:39 +01:00
  • c4253c1ef9 llama : fix typo Georgi Gerganov 2024-03-14 13:13:06 +02:00
  • 4390300db7 llama : optimize defrag moves + fix fragmentation calculation (#6037) Michael Podvitskiy 2024-03-14 11:56:48 +01:00
  • cb903ba055 gguf-py : add support for I8, I16 and I32 (#6045) Ondřej Čertík 2024-03-14 04:40:14 -06:00
  • fdef39cfd9 ggml : designate enum vals for integer types (#6050) Georgi Gerganov 2024-03-14 12:38:37 +02:00
  • 286e9cb050 embedding : print all resulting embeddings (#899) Georgi Gerganov 2024-03-14 12:37:20 +02:00
  • faa83237eb metal : build metallib + fix embed path (#6015) Georgi Gerganov 2024-03-14 11:55:23 +02:00
  • 0d197c3a0d embedding : print cosine similarity (#899) Georgi Gerganov 2024-03-14 10:12:29 +02:00
  • d2694e37ea readme : update details about running llama in Termux on Android (#6039) Linwei Wang 2024-03-14 02:34:40 +08:00
  • 847ed47b30 readme : update API changes and hot topics Georgi Gerganov 2024-03-13 20:33:56 +02:00
  • 4a6d25766b grammar : handle missing "root" node (#6004) Clint Herron 2024-03-13 14:10:40 -04:00
  • f88d2005a4 llama : add pipeline parallelism support (#6017) slaren 2024-03-13 18:54:21 +01:00
  • dc1bd94e29 test-backend-ops : skip CPU backend by default (#6028) slaren 2024-03-13 14:58:30 +01:00
  • 714c607f32 Update get version (#6025) AidanBeltonS 2024-03-13 13:17:54 +00:00
  • 2534910086 Server: Use multi-task for embeddings endpoint (#6001) Xuan Son Nguyen 2024-03-13 11:39:11 +01:00
  • f91923eccf ci : remove tidy-review (#6021) slaren 2024-03-12 16:55:19 +01:00
  • 241e4b53bb ggml : reuse quantum structs across backends (#5943) Georgi Gerganov 2024-03-12 14:27:20 +02:00
  • fc01fdca5f ggml : fix UB in IQ2_S and IQ3_S (#6012) Georgi Gerganov 2024-03-12 13:49:55 +02:00
  • 2e3cc9fefa sycl : update IQ1_S kernels (WIP - not working!) (#5995) Georgi Gerganov 2024-03-12 11:15:05 +02:00
  • e0103d6c83 grammar : fix unnecessarily retained pointer to rules (#6003) gliptic 2024-03-11 20:59:03 +01:00
  • ab58721a77 1.5 bit: we can do even better (#5999) Kawrakow 2024-03-11 16:53:15 +01:00
  • 106423ad26 llama : more consistent names of count variables (#5994) Georgi Gerganov 2024-03-11 17:49:47 +02:00
  • f60b94d486 llama : refactor unicode stuff (#5992) Georgi Gerganov 2024-03-11 17:47:47 +02:00
  • 6e5dfeb7bc Update server docker image URLs (#5997) Jakub N 2024-03-11 14:40:42 +01:00
  • 2375a2e6d9 Server: format error to json (#5961) Xuan Son Nguyen 2024-03-11 10:56:41 +01:00
  • c63425b789 ggml, ci : Windows ARM runner and build fixes (#5979) Michael Podvitskiy 2024-03-11 10:28:51 +01:00
  • 1feee1846b server : maintain chat completion id for streaming responses (#5988) Minsoo Cheong 2024-03-11 17:09:32 +09:00
  • 40a6355a97 cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) Gilad S 2024-03-11 10:00:08 +02:00
  • bbbde69c2e llama : fix F16/F32 downcast + improve names (#5980) Georgi Gerganov 2024-03-11 09:56:47 +02:00
  • a6450d8a96 Better 1.5 bit quantization (#5971) Kawrakow 2024-03-11 07:51:49 +01:00
  • 9a7bd8975a [SYCL] Add q3_s and q1_s (#5886) Abhilash Majumder 2024-03-11 10:27:56 +05:30
  • cbf6526992 [SYCL] Add support for SYCL Nvidia target (#5738) AidanBeltonS 2024-03-11 01:13:57 +00:00
  • f345019c58 metal : move mm_id indices to shared mem (#5982) Georgi Gerganov 2024-03-10 23:12:48 +02:00
  • a803b110e1 android : fix utf8 decoding error (#5935) Dean 2024-03-11 04:03:17 +08:00
  • 15d7c286ac readme : update hot topics Georgi Gerganov 2024-03-10 20:58:26 +02:00
  • 2c0917ae95 sync : ggml Georgi Gerganov 2024-03-10 20:10:46 +02:00
  • 59a9930d5d ggml : try fix 32-bit arm compat (whisper/1938) Georgi Gerganov 2024-03-08 23:45:07 +02:00
  • 39c13a13d4 ggml : remove __constant__ specifier for CUDA tables (#5940) Georgi Gerganov 2024-03-10 20:09:24 +02:00
  • f16deb64a8 server: ci: windows build and tests (#5968) Pierrick Hymbert 2024-03-10 18:17:47 +01:00
  • 4a3dc2ed36 llama : add support for GritLM (#5959) DAN™ 2024-03-10 11:56:30 -04:00
  • 7c7879e85f grammar : verify parsed state (#5950) Clint Herron 2024-03-10 11:17:43 -04:00
  • 8a8474c4f4 nix: update flake.lock (#5969) Georgi Gerganov 2024-03-10 16:43:08 +02:00
  • 38b993cd39 server: benchmark: chat/completions scenario and other llm servers comparison (#5941) Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • 3fd87918fd server : print chat template info Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • a96d02297c perplexity : support using multiple sequences to allow larger batch sizes (#5946) slaren 2024-03-09 19:55:54 +01:00
  • 8467a534cb readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 6988d2fe43 ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) Georgi Gerganov 2024-03-09 17:36:20 +02:00