Commit Graph

  • 3aca280541 llama : add missing kv clear in llama_beam_search (#6664) David Renshaw 2024-04-14 15:24:15 -04:00
  • 1958f7e06c llama : add missing kv clear in llama_beam_search (#6664) David Renshaw 2024-04-14 15:24:15 -04:00
  • 4be3a42642 Add Command R chat template (#6650) Chao Jiang 2024-04-15 00:16:34 +08:00
  • 04fbc5f23e Add Command R chat template (#6650) Chao Jiang 2024-04-15 00:16:34 +08:00
  • f8bcf4c337 flake.lock: Update (#6669) Georgi Gerganov 2024-04-14 16:55:30 +03:00
  • f184dd9208 flake.lock: Update (#6669) Georgi Gerganov 2024-04-14 16:55:30 +03:00
  • 40eb08085b Added support for GGML_OP_CLAMP in Metal (#6662) Dave 2024-04-14 07:14:19 -04:00
  • 422c2aff1c Added support for GGML_OP_CLAMP in Metal (#6662) Dave 2024-04-14 07:14:19 -04:00
  • 10579b6e66 Fix --split-max-size (#6655) Sigbjørn Skjæret 2024-04-14 13:12:59 +02:00
  • 8800226d65 Fix --split-max-size (#6655) Sigbjørn Skjæret 2024-04-14 13:12:59 +02:00
  • 30ad7d5191 [bug fix] convert github repository_owner to lowercase (#6673) Jaemin Son 2024-04-14 20:12:36 +09:00
  • e689fc4e91 [bug fix] convert github repository_owner to lowercase (#6673) Jaemin Son 2024-04-14 20:12:36 +09:00
  • ae5d1b6888 convert : enable the --use-temp-file cli flag (#6645) James A Capozzoli 2024-04-14 04:40:18 -04:00
  • a4ec34e1cd convert : enable the --use-temp-file cli flag (#6645) James A Capozzoli 2024-04-14 04:40:18 -04:00
  • fa014c25fb fix memcpy() crash, add missed cmd in guide, fix softmax (#6622) Neo Zhang Jianyu 2024-04-14 10:42:29 +08:00
  • de17e3f745 fix memcpy() crash, add missed cmd in guide, fix softmax (#6622) Neo Zhang Jianyu 2024-04-14 10:42:29 +08:00
  • c419c245fb CUDA: fix matrix multiplication logic for tests (#6667) Johannes Gäßler 2024-04-14 00:21:55 +02:00
  • b5e7285baf CUDA: fix matrix multiplication logic for tests (#6667) Johannes Gäßler 2024-04-14 00:21:55 +02:00
  • 5dc0da80c8 model: support arch DbrxForCausalLM (#6515) Pierrick Hymbert 2024-04-13 11:33:52 +02:00
  • 4bd0f93e4a model: support arch DbrxForCausalLM (#6515) Pierrick Hymbert 2024-04-13 11:33:52 +02:00
  • 13748db703 JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) Olivier Chafik 2024-04-12 19:43:38 +01:00
  • ab9a3240a9 JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) Olivier Chafik 2024-04-12 19:43:38 +01:00
  • b7d8f29a1c metal : unify mul_mv_id kernels (#6556) slaren 2024-04-12 18:13:20 +02:00
  • fbbc030ba9 metal : unify mul_mv_id kernels (#6556) slaren 2024-04-12 18:13:20 +02:00
  • 695f2463f0 infill : add download instructions for model (#6626) Daniel Bevenius 2024-04-12 14:11:46 +02:00
  • 4cc120c744 infill : add download instructions for model (#6626) Daniel Bevenius 2024-04-12 14:11:46 +02:00
  • 237da9f48d server : coherent log output for KV cache full (#6637) Pierrick Hymbert 2024-04-12 13:49:21 +02:00
  • 24ee66ed0d server : coherent log output for KV cache full (#6637) Pierrick Hymbert 2024-04-12 13:49:21 +02:00
  • e9f7c11748 llama : add gguf_remove_key + remove split meta during quantize (#6591) jiez 2024-04-12 18:45:06 +08:00
  • 91c736015b llama : add gguf_remove_key + remove split meta during quantize (#6591) jiez 2024-04-12 18:45:06 +08:00
  • 28a569bc8e chore: Fix markdown warnings (#6625) Rene Leonhardt 2024-04-12 10:52:36 +02:00
  • 5c4d767ac0 chore: Fix markdown warnings (#6625) Rene Leonhardt 2024-04-12 10:52:36 +02:00
  • 0f1c6019d3 imatrix : remove invalid assert (#6632) Georgi Gerganov 2024-04-12 11:49:58 +03:00
  • ef21ce4ccb imatrix : remove invalid assert (#6632) Georgi Gerganov 2024-04-12 11:49:58 +03:00
  • acd4ddedcc Correct free memory and total memory. (#6630) MasterYi1024 2024-04-12 16:28:12 +08:00
  • dee7f8d692 Correct free memory and total memory. (#6630) MasterYi1024 2024-04-12 16:28:12 +08:00
  • f65f145cc4 eval-callback: use ggml_op_desc to pretty print unary operator name (#6631) Pierrick Hymbert 2024-04-12 10:26:47 +02:00
  • 81da18e71c eval-callback: use ggml_op_desc to pretty print unary operator name (#6631) Pierrick Hymbert 2024-04-12 10:26:47 +02:00
  • 774c09566b ci : disable Metal for macOS-latest-cmake-x64 (#6628) Georgi Gerganov 2024-04-12 11:15:05 +03:00
  • 9ed2737acc ci : disable Metal for macOS-latest-cmake-x64 (#6628) Georgi Gerganov 2024-04-12 11:15:05 +03:00
  • 841f92606e Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) Clint Herron 2024-04-11 21:44:50 -04:00
  • 04a5ac211e Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) Clint Herron 2024-04-11 21:44:50 -04:00
  • 971d23152f As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 (#6619) Clint Herron 2024-04-11 17:44:48 -04:00
  • f7001ccc5a As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 (#6619) Clint Herron 2024-04-11 17:44:48 -04:00
  • 6dddf1d64a Refactor Error Handling for CUDA (#6575) Nikolas 2024-04-11 21:56:29 +02:00
  • a474f50ebb Refactor Error Handling for CUDA (#6575) Nikolas 2024-04-11 21:56:29 +02:00
  • 5dfffb64e9 grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) Olivier Chafik 2024-04-11 19:47:34 +01:00
  • cbaadc9294 grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) Olivier Chafik 2024-04-11 19:47:34 +01:00
  • 82a4aa7a3a ci: download artifacts to release directory (#6612) Hugo Roussel 2024-04-11 19:52:21 +02:00
  • 1bbdaf6ecd ci: download artifacts to release directory (#6612) Hugo Roussel 2024-04-11 19:52:21 +02:00
  • 63d85af088 scripts : add --outdir option to hf.sh (#6600) Daniel Bevenius 2024-04-11 15:22:47 +02:00
  • f4183afe6a scripts : add --outdir option to hf.sh (#6600) Daniel Bevenius 2024-04-11 15:22:47 +02:00
  • b65aae6a1d eval-callback: Example how to use eval callback for debugging (#6576) Pierrick Hymbert 2024-04-11 14:51:07 +02:00
  • b804b1ef77 eval-callback: Example how to use eval callback for debugging (#6576) Pierrick Hymbert 2024-04-11 14:51:07 +02:00
  • 0a48a50efe gguf : add option to not check tensor data (#6582) Daniel Bevenius 2024-04-10 20:16:48 +02:00
  • 8228b66dbc gguf : add option to not check tensor data (#6582) Daniel Bevenius 2024-04-10 20:16:48 +02:00
  • f343ac422b minor layout improvements (#6572) Ralph Soika 2024-04-10 19:18:25 +02:00
  • b3a96f27f0 minor layout improvements (#6572) Ralph Soika 2024-04-10 19:18:25 +02:00
  • f5fab5862b llama : add model types for mixtral (#6589) slaren 2024-04-10 17:24:14 +02:00
  • 4f407a0a35 llama : add model types for mixtral (#6589) slaren 2024-04-10 17:24:14 +02:00
  • 6dcf19070c convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) slaren 2024-04-10 15:23:12 +02:00
  • 65c64dc36f convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) slaren 2024-04-10 15:23:12 +02:00
  • 7e0ca3abb2 docs : how to add a model (#6565) Pierrick Hymbert 2024-04-10 08:58:48 +02:00
  • 67fac4b95f docs : how to add a model (#6565) Pierrick Hymbert 2024-04-10 08:58:48 +02:00
  • d885d8c317 readme : fix ROCm link (#6579) Artem Zinnatullin 2024-04-10 00:49:12 -06:00
  • 29122d32ac readme : fix ROCm link (#6579) Artem Zinnatullin 2024-04-10 00:49:12 -06:00
  • 3a4635aa6f readme : update UI list (#6560) sjxx 2024-04-10 14:34:00 +08:00
  • b231b37b09 readme : update UI list (#6560) sjxx 2024-04-10 14:34:00 +08:00
  • 6c55775ebb readme: fix typo in amdgpu target name (#6573) Jiří Sejkora 2024-04-10 00:23:02 +02:00
  • ba5e134e07 readme: fix typo in amdgpu target name (#6573) Jiří Sejkora 2024-04-10 00:23:02 +02:00
  • ac727da885 BERT tokenizer fixes (#6498) Jared Van Bortel 2024-04-09 13:44:08 -04:00
  • 1b67731e18 BERT tokenizer fixes (#6498) Jared Van Bortel 2024-04-09 13:44:08 -04:00
  • 61e36e1532 sync : ggml Georgi Gerganov 2024-04-09 20:29:06 +03:00
  • c4a3a4ff47 sync : ggml Georgi Gerganov 2024-04-09 20:29:06 +03:00
  • f63aee5c16 server : detect search query to start webchat (#6554) Ed Lee 2024-04-09 01:31:47 -07:00
  • 400d5d722d server : detect search query to start webchat (#6554) Ed Lee 2024-04-09 01:31:47 -07:00
  • 78aade9c48 llama : add Command R Plus support (#6491) Carolinabanana 2024-04-09 09:16:13 +01:00
  • 5dc9dd7152 llama : add Command R Plus support (#6491) Carolinabanana 2024-04-09 09:16:13 +01:00
  • 41c01483dd license : update copyright notice + add AUTHORS (#6405) Georgi Gerganov 2024-04-09 09:23:19 +03:00
  • e11a8999b5 license : update copyright notice + add AUTHORS (#6405) Georgi Gerganov 2024-04-09 09:23:19 +03:00
  • fa18a55bd8 llama : fix attention layer count sanity check (#6550) Georgi Gerganov 2024-04-08 22:25:49 +03:00
  • cc4a95426d llama : fix attention layer count sanity check (#6550) Georgi Gerganov 2024-04-08 22:25:49 +03:00
  • 2324e00feb Comment explaining a decision (#6531) kunnis 2024-04-08 10:44:19 -05:00
  • cecd8d3c98 Comment explaining a decision (#6531) kunnis 2024-04-08 10:44:19 -05:00
  • f6d42b6366 quantize : fix precedence of cli args (#6541) Georgi Gerganov 2024-04-08 16:23:01 +03:00
  • b73e564b16 quantize : fix precedence of cli args (#6541) Georgi Gerganov 2024-04-08 16:23:01 +03:00
  • d365f350b5 llama : support negative ith in llama_get_ API (#6519) Rick G 2024-04-08 06:02:30 -07:00
  • e3c337d87c llama : support negative ith in llama_get_ API (#6519) Rick G 2024-04-08 06:02:30 -07:00
  • c7959808aa llama : save and restore kv cache for single seq id (#6341) Jan Boon 2024-04-08 20:43:30 +08:00
  • beea6e1b16 llama : save and restore kv cache for single seq id (#6341) Jan Boon 2024-04-08 20:43:30 +08:00
  • 262b8ddd2c remove row=1 cond (#6532) Abhilash Majumder 2024-04-08 13:56:01 +05:30
  • 87fb5b4234 remove row=1 cond (#6532) Abhilash Majumder 2024-04-08 13:56:01 +05:30
  • 5417e85bbb Adding KodiBot to UI list (#6535) Firat 2024-04-08 00:48:29 -07:00
  • d752327c33 Adding KodiBot to UI list (#6535) Firat 2024-04-08 00:48:29 -07:00
  • 0b06becdd7 Change Windows AMD example to release build to make inference much faster. (#6525) Mark Fairbairn 2024-04-07 19:52:19 +01:00
  • 855f54402e Change Windows AMD example to release build to make inference much faster. (#6525) Mark Fairbairn 2024-04-07 19:52:19 +01:00
  • 9bcd3315a1 flake.lock: Update (#6517) Georgi Gerganov 2024-04-07 21:25:30 +03:00
  • b909236c0b flake.lock: Update (#6517) Georgi Gerganov 2024-04-07 21:25:30 +03:00
  • 9b88711670 Add GritLM as supported models. (#6513) DAN™ 2024-04-07 13:33:59 -04:00
  • e0717e751e Add GritLM as supported models. (#6513) DAN™ 2024-04-07 13:33:59 -04:00