Commit Graph

  • 3a48d558a6 metal : replace loop of dispatch_async with dispatch_apply (#4934) Alex Azarov 2024-01-16 14:41:27 +01:00
  • b171b1b08a metal : log recommendedMaxWorkingSetSize on iOS 16+ (#4936) Alex Azarov 2024-01-16 14:33:02 +01:00
  • c53c0e8d1a examples : fix and improv docs for the grammar generator (#4909) Maximilian Winter 2024-01-16 13:10:48 +01:00
  • bcc7c68e4b ggml : introduce GGML_CALL function annotation (#4850) Justine Tunney 2024-01-16 03:16:33 -08:00
  • dd7b94d290 finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) Daniel Bevenius 2024-01-16 12:14:19 +01:00
  • d55fdc7382 speculative : threading options (#4959) stduhpf 2024-01-16 12:04:32 +01:00
  • 663c94457d pass cpu-architecture arguments only to host code (C;C++) (#4943) ngc92 2024-01-15 20:40:48 +02:00
  • af3d54c2c0 llama : apply classifier-free guidance to logits directly (#4951) David Friehs 2024-01-15 14:06:52 +01:00
  • e8fb5e8b7f awq-py : fix typo in awq-py/README.md (#4947) Victor Z. Peng 2024-01-15 04:41:46 -08:00
  • 654e0fc991 cuda : fix dequantize kernel names (#4938) Georgi Gerganov 2024-01-15 13:27:00 +02:00
  • 3a1d71c3f4 llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) Kawrakow 2024-01-15 10:09:38 +02:00
  • 0cd36d2237 CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) Kawrakow 2024-01-15 07:48:06 +02:00
  • 584e39b210 llama : fix missing quotes (#4937) David Pflug 2024-01-14 10:46:00 -05:00
  • 4629f415ed Add ability to use importance matrix for all k-quants (#4930) Kawrakow 2024-01-14 16:21:12 +02:00
  • b9d3d4307d llama : check LLAMA_TRACE env for extra logging (#4929) Georgi Gerganov 2024-01-14 13:26:53 +02:00
  • 97e013bd09 scripts : sync-ggml-am.sh option to skip commits Georgi Gerganov 2024-01-14 11:08:09 +02:00
  • ccd0d21f80 llama : use LLAMA_LOG_ macros for logging Georgi Gerganov 2024-01-14 11:03:19 +02:00
  • 1fb783d4df Fix ffn_down quantization mix for MoE models (#4927) Kawrakow 2024-01-14 10:53:39 +02:00
  • 6c92cc5bb7 metal : correctly set SIMD support flags on iOS (#4923) Alex Azarov 2024-01-14 09:44:39 +01:00
  • e77c6d2b23 llama : support WinXP build with MinGW 8.1.0 (#3419) Karthik Kumar Viswanathan 2024-01-14 00:41:44 -08:00
  • f6d2b332f6 2-bit quantizations (#4897) Kawrakow 2024-01-14 09:45:56 +02:00
  • b56c35d2ef Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B (#4906) Kawrakow 2024-01-14 09:44:30 +02:00
  • 2e51f37f77 sync : ggml Georgi Gerganov 2024-01-14 00:14:46 +02:00
  • f2426af97b ggml: cache sin/cos for RoPE (#4908) Johannes Gäßler 2024-01-13 21:41:37 +01:00
  • 2329ddc378 metal : remove old API (#4919) Georgi Gerganov 2024-01-13 20:45:45 +02:00
  • 8db1eb43ce server : fix prompt caching with system prompt (#4914) Georgi Gerganov 2024-01-13 19:31:26 +02:00
  • cc46b61450 llama : fix detokenization of non-special added-tokens (#4916) Georgi Gerganov 2024-01-13 18:47:38 +02:00
  • 1b05b8ef88 metal : disable log for loaded kernels (#4794) Georgi Gerganov 2024-01-13 18:46:37 +02:00
  • 9bee33a18c llama : minimize size used for state save/load (#4820) David Friehs 2024-01-13 17:29:43 +01:00
  • 0930ffa61d workflows: unbreak nix-build-aarch64, and split it out (#4915) Someone 2024-01-13 16:29:16 +00:00
  • cd34c8a010 main : add parameter --no-display-prompt (#4541) Yann Follet 2024-01-14 00:09:08 +08:00
  • 6bc73ad492 gguf : fix potential infinite for-loop (#4600) texmex76 2024-01-13 17:06:20 +01:00
  • ac78d391cb metal : refactor kernel loading code (#4794) Georgi Gerganov 2024-01-13 18:03:45 +02:00
  • 7079775d2b compare-llama-bench: tweak output format (#4910) Johannes Gäßler 2024-01-13 15:52:53 +01:00
  • a51d98e6ba server : fix deadlock that occurs in multi-prompt scenarios (#4905) Ziad Ben Hadj-Alouane 2024-01-13 09:20:46 -05:00
  • a4a5e25c95 server : fix crash with multimodal models without BOS token (#4904) makomk 2024-01-13 14:16:11 +00:00
  • cc5f0d2aeb convert : update phi-2 to latest HF repo (#4903) Georgi Gerganov 2024-01-13 13:44:37 +02:00
  • 211567ae1a sync : ggml Georgi Gerganov 2024-01-12 22:02:43 +02:00
  • 8a8a831cef ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) Georgi Gerganov 2024-01-12 14:02:30 +02:00
  • b103cfe655 backend_sched : fix assignments slaren 2024-01-12 20:38:34 +01:00
  • 9342050de3 examples : add pydantic models to GBNF grammar generator (#4883) Maximilian Winter 2024-01-12 20:46:45 +01:00
  • 50f828eab9 CUDA: faster q8_0 -> f16 dequantization (#4895) Johannes Gäßler 2024-01-12 20:38:54 +01:00
  • 882a16a127 llama : ggml-backend integration (#4766) slaren 2024-01-12 20:07:38 +01:00
  • ab969b80d4 llama : remove redundant assert for StableLM (#4901) Georgi Gerganov 2024-01-12 20:54:12 +02:00
  • e55262208d export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) Daniel Bevenius 2024-01-12 18:54:53 +01:00
  • 9905daaaa3 llama.swiftui : update models layout (#4826) Zay 2024-01-12 05:48:00 -07:00
  • cda6c08cb5 gitignore : imatrix Georgi Gerganov 2024-01-12 14:33:21 +02:00
  • d80c163451 CUDA: fix softmax compile for old CUDA versions (#4862) Johannes Gäßler 2024-01-12 12:30:41 +01:00
  • 060cb10825 llama : fix typo "imp_embd" -> "inp_embd" Georgi Gerganov 2024-01-12 13:10:19 +02:00
  • 55dcfe3b5a common : streamline the formatting of help (#4890) howlger 2024-01-12 12:05:32 +01:00
  • 6cd8a0d214 py : fix lint (#4889) Georgi Gerganov 2024-01-12 13:03:38 +02:00