Commit Graph

  • 3e08ae99ce convert.py: add mapping for safetensors bf16 (#1598) Aarni Koskela 2023-07-07 16:12:49 +03:00
  • 481f793acc Fix opencl by wrap #if-else-endif with \n (#2086) Howard Su 2023-07-07 11:34:18 +08:00
  • dfd9fce6d6 ggml : fix restrict usage Georgi Gerganov 2023-07-06 19:41:31 +03:00
  • 36680f6e40 convert : update for baichuan (#2081) Judd 2023-07-07 00:23:49 +08:00
  • a17a2683d8 alpaca.sh : update model file name (#2074) tslmy 2023-07-06 09:17:50 -07:00
  • 31cfbb1013 Expose generation timings from server & update completions.js (#2116) Tobias Lütke 2023-07-05 16:51:13 -04:00
  • 983b555e9d Update Server Instructions (#2113) Jesse Jojo Johnson 2023-07-05 18:03:19 +00:00
  • ec326d350c ggml : fix bug introduced in #1237 Georgi Gerganov 2023-07-05 20:44:11 +03:00
  • 1b6efeab82 tests : fix test-grad0 Georgi Gerganov 2023-07-05 20:20:05 +03:00
  • 1b107b8550 ggml : generalize quantize_fns for simpler FP16 handling (#1237) Stephan Walter 2023-07-05 16:13:06 +00:00
  • 8567c76b53 Update server instructions for web front end (#2103) Jesse Jojo Johnson 2023-07-05 15:13:35 +00:00
  • 924dd22fd3 Quantized dot products for CUDA mul mat vec (#2067) Johannes Gäßler 2023-07-05 14:19:42 +02:00
  • 051c70dcd5 llama: Don't double count the sampling time (#2107) Howard Su 2023-07-05 18:31:23 +08:00
  • 9e4475f5cf Fixed OpenCL offloading prints (#2082) Johannes Gäßler 2023-07-05 08:58:05 +02:00
  • 7f0e9a775e embd-input: Fix input embedding example unsigned int seed (#2105) Nigel Bosch 2023-07-04 18:33:33 -05:00
  • b472f3fca5 readme : add link web chat PR Georgi Gerganov 2023-07-04 22:25:22 +03:00
  • ed9a54e512 ggml : sync latest (new ops, macros, refactoring) (#2106) Georgi Gerganov 2023-07-04 21:54:11 +03:00
  • f257fd2550 Add an API example using server.cpp similar to OAI. (#2009) jwj7140 2023-07-05 03:06:12 +09:00
  • 7ee76e45af Simple webchat for server (#1998) Tobias Lütke 2023-07-04 10:05:27 -04:00
  • acc111caf9 Allow old Make to build server. (#2098) Henri Vasserman 2023-07-04 15:38:04 +03:00
  • 23c7c6fc91 Update Makefile: clean simple (#2097) ZhouYuChen 2023-07-04 20:15:16 +08:00
  • 698efad5fb CI: make the brew update temporarily optional. (#2092) Erik Scholz 2023-07-04 01:50:12 +02:00
  • 14a2cc71f6 [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +08:00
  • 1cf14ccef1 fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +03:00
  • cc45a7feb8 Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +08:00
  • 55dbb915cc [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +08:00
  • d7d2e6a0f0 server: add option to output probabilities for completion (#1962) WangHaoranRobin 2023-07-03 05:38:44 +08:00
  • 46088f7231 ggml : fix build with OpenBLAS (close #2066) Georgi Gerganov 2023-07-02 09:46:46 +03:00
  • 0bc2cdfc87 Better CUDA synchronization logic (#2057) Johannes Gäßler 2023-07-01 21:49:44 +02:00
  • befb3a3562 Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +02:00
  • b213227067 cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +02:00
  • 2f8cd979ec metal : release buffers when freeing metal context (#2062) Aaron Miller 2023-07-01 11:14:59 -07:00
  • 471aab6e4c convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +08:00
  • 463f2f4c4f llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +03:00
  • cb44dbc7de llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +08:00
  • 79f634a19d embd-input : fix returning ptr to temporary Georgi Gerganov 2023-07-01 18:46:00 +03:00
  • 04606a1599 train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +03:00
  • b1ca8f36a9 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +08:00
  • b8c8dda75f Use unsigned for random seed (#2006) Howard Su 2023-06-29 21:15:15 +08:00
  • 96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +08:00
  • d3494bb86b llama : replacing auto &kv with const auto &kv (#2041) m3ndax 2023-06-28 20:39:08 +02:00
  • 5b351e94d0 cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) Salvador E. Tropea 2023-06-28 14:27:31 -03:00
  • 6432aabb6d cuda : fix missing const qualifier in casts (#2027) Salvador E. Tropea 2023-06-28 14:26:26 -03:00
  • b922bc351b llama : remove shards weight file support (#2000) Howard Su 2023-06-28 10:13:02 -07:00
  • 7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970) Johannes Gäßler 2023-06-28 18:35:54 +02:00
  • cfa0750bc9 llama : support input embeddings directly (#1910) ningshanwutuobang 2023-06-28 23:53:37 +08:00
  • 9d23589d63 fix pthreads setaffinity usage on android (#2020) Erik Scholz 2023-06-27 19:06:33 +02:00
  • 0be54f75a6 baby-llama : fix build after ggml_rope change (#2016) Howard Su 2023-06-27 13:07:13 +08:00
  • 181e8d9755 llama : fix rope usage after ChatGLM change Georgi Gerganov 2023-06-27 00:37:13 +03:00
  • d9779021bd ggml : add support for ChatGLM RoPE Georgi Gerganov 2023-06-27 00:06:51 +03:00