Commit Graph

  • ba0c7c70ab Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 0cc4m 2024-03-29 17:29:21 +01:00
  • 537fc022b8 sync : ggml (#6351) Georgi Gerganov 2024-03-29 17:45:46 +02:00
  • d48ccf3ad4 sync : ggml (#6351) Georgi Gerganov 2024-03-29 17:45:46 +02:00
  • 7861298830 [Model] Add support for xverse (#6301) hxer7963 2024-03-29 21:37:03 +08:00
  • 069574775c [Model] Add support for xverse (#6301) hxer7963 2024-03-29 21:37:03 +08:00
  • 71ba2b4748 ci : fix BGE wget (#6383) Georgi Gerganov 2024-03-29 14:34:28 +02:00
  • cfde806eb9 ci : fix BGE wget (#6383) Georgi Gerganov 2024-03-29 14:34:28 +02:00
  • 2f09ee47af readme : add project (#6356) zhouwg 2024-03-29 15:33:46 +08:00
  • b910287954 readme : add project (#6356) zhouwg 2024-03-29 15:33:46 +08:00
  • 8ae35eee39 cmake : add explicit metal version options (#6370) Matt Clayton 2024-03-29 03:27:42 -04:00
  • 8093987090 cmake : add explicit metal version options (#6370) Matt Clayton 2024-03-29 03:27:42 -04:00
  • 87ef849926 llama : remove redundant reshape in build_kv_store (#6369) Daniel Bevenius 2024-03-29 08:23:22 +01:00
  • 057400a3fd llama : remove redundant reshape in build_kv_store (#6369) Daniel Bevenius 2024-03-29 08:23:22 +01:00
  • 88b3d00023 convert : allow conversion of Mistral HF models (#6144) Pedro Cuenca 2024-03-29 08:15:00 +01:00
  • b75c38166c convert : allow conversion of Mistral HF models (#6144) Pedro Cuenca 2024-03-29 08:15:00 +01:00
  • 18c218bbc8 readme : add notice for UI list Georgi Gerganov 2024-03-28 22:56:03 +02:00
  • bfe7dafc9c readme : add notice for UI list Georgi Gerganov 2024-03-28 22:56:03 +02:00
  • 2d5c7313cf [SYCL] Revisited & updated SYCL build documentation (#6141) Ouadie EL FAROUKI 2024-03-28 16:01:47 +00:00
  • 5106ef482c [SYCL] Revisited & updated SYCL build documentation (#6141) Ouadie EL FAROUKI 2024-03-28 16:01:47 +00:00
  • df064a721d convert : refactor vocab selection logic (#6355) Jared Van Bortel 2024-03-28 11:44:36 -04:00
  • be55134a53 convert : refactor vocab selection logic (#6355) Jared Van Bortel 2024-03-28 11:44:36 -04:00
  • dbf509a459 llava : fix MobileVLM (#6364) Ziang Wu 2024-03-28 22:33:10 +08:00
  • 66ba560256 llava : fix MobileVLM (#6364) Ziang Wu 2024-03-28 22:33:10 +08:00
  • af949dc9d7 llama : fix command-r inference when omitting outputs (#6367) compilade 2024-03-28 08:05:54 -04:00
  • 0308f5e3d7 llama : fix command-r inference when omitting outputs (#6367) compilade 2024-03-28 08:05:54 -04:00
  • e685e83d37 ci: bench: fix master not schedule, fix commit status failed on external repo (#6365) Pierrick Hymbert 2024-03-28 11:27:56 +01:00
  • 28cb9a09c4 ci: bench: fix master not schedule, fix commit status failed on external repo (#6365) Pierrick Hymbert 2024-03-28 11:27:56 +01:00
  • 49c535c478 doc: fix outdated default value of batch size (#6336) Ting Sun 2024-03-28 16:51:06 +08:00
  • cfc4d75df6 doc: fix outdated default value of batch size (#6336) Ting Sun 2024-03-28 16:51:06 +08:00
  • 7d68981aa1 server : stop gracefully on SIGTERM (#6348) Eric Zhang 2024-03-28 16:50:48 +08:00
  • 6902cb7f2e server : stop gracefully on SIGTERM (#6348) Eric Zhang 2024-03-28 16:50:48 +08:00
  • c31404122b nix: removed unnessesary indentation hutli 2024-03-27 19:17:30 +01:00
  • d2d8f38996 nix: removed unnessesary indentation hutli 2024-03-27 19:17:30 +01:00
  • 307f7a2c76 nix: moved blas availability check to package inputs so it is still overridable hutli 2024-03-27 19:14:28 +01:00
  • d39b308eaf nix: moved blas availability check to package inputs so it is still overridable hutli 2024-03-27 19:14:28 +01:00
  • 1adb187009 using blas.meta.available to check host platform hutli 2024-03-27 18:10:08 +01:00
  • c873976649 using blas.meta.available to check host platform hutli 2024-03-27 18:10:08 +01:00
  • 3e8bc9c8e8 only using explicit blas if hostPlatform is allowed hutli 2024-03-27 17:25:05 +01:00
  • dbb03e2b9c only using explicit blas if hostPlatform is allowed hutli 2024-03-27 17:25:05 +01:00
  • 65d6316f65 nix: .#windows: proper cross-compilation set-up Someone Serge 2024-03-26 16:22:42 +00:00
  • e9f17dc3bf nix: .#windows: proper cross-compilation set-up Someone Serge 2024-03-26 16:22:42 +00:00
  • 9a6d7fad9c nix: package: don't introduce the dependency on python Someone Serge 2024-03-26 16:22:07 +00:00
  • 22a462cc1f nix: package: don't introduce the dependency on python Someone Serge 2024-03-26 16:22:07 +00:00
  • b1babcfd2f nix: .#widnows: init hutli 2024-02-15 14:25:04 +01:00
  • f6a0f5c642 nix: .#widnows: init hutli 2024-02-15 14:25:04 +01:00
  • f236c6a0c9 doc: fix typo in MobileVLM-README.md (#6181) Ziang Wu 2024-03-28 12:03:30 +08:00
  • d0e2f6416b doc: fix typo in MobileVLM-README.md (#6181) Ziang Wu 2024-03-28 12:03:30 +08:00
  • 3e1a1444fb [SYCL] fix set main gpu crash (#6339) Neo Zhang Jianyu 2024-03-28 08:55:24 +08:00
  • 25f4a613c4 [SYCL] fix set main gpu crash (#6339) Neo Zhang Jianyu 2024-03-28 08:55:24 +08:00
  • 8b6cc1c8d4 server: continuous performance monitoring and PR comment (#6283) Pierrick Hymbert 2024-03-27 20:26:49 +01:00
  • a016026a3a server: continuous performance monitoring and PR comment (#6283) Pierrick Hymbert 2024-03-27 20:26:49 +01:00
  • 5becb0a105 nix: ci: dont test cuda and rocm (for now) Someone Serge 2024-03-27 16:17:46 +00:00
  • 53c7ec53d5 nix: ci: dont test cuda and rocm (for now) Someone Serge 2024-03-27 16:17:46 +00:00
  • 50d413394f ggml : fix bounds checking of zero size views (#6347) slaren 2024-03-27 15:07:50 +01:00
  • e5b89a441a ggml : fix bounds checking of zero size views (#6347) slaren 2024-03-27 15:07:50 +01:00
  • 5232cd21dc make : whitespace Georgi Gerganov 2024-03-27 15:02:49 +02:00
  • 3a0345970e make : whitespace Georgi Gerganov 2024-03-27 15:02:49 +02:00
  • 9567782446 embedding : show full embedding for single prompt (#6342) howlger 2024-03-27 12:15:44 +01:00
  • 1e13987fba embedding : show full embedding for single prompt (#6342) howlger 2024-03-27 12:15:44 +01:00
  • 031e0a5ea0 [SYCL] Fix batched impl for NVidia GPU (#6164) AidanBeltonS 2024-03-27 08:16:40 +00:00
  • e82f9e2b83 [SYCL] Fix batched impl for NVidia GPU (#6164) AidanBeltonS 2024-03-27 08:16:40 +00:00
  • fde3245a49 Make IQ1_M work for QK_K = 64 (#6327) Kawrakow 2024-03-27 08:44:27 +01:00
  • cbc8343619 Make IQ1_M work for QK_K = 64 (#6327) Kawrakow 2024-03-27 08:44:27 +01:00
  • 25ec05290c common : change --no-penalize-nl to --penalize-nl (#6334) Sigbjørn Skjæret 2024-03-27 08:23:10 +01:00
  • e562b9714b common : change --no-penalize-nl to --penalize-nl (#6334) Sigbjørn Skjæret 2024-03-27 08:23:10 +01:00
  • 12f31d9e2d llama2c : open file as binary (#6332) Georgi Gerganov 2024-03-27 09:16:02 +02:00
  • 2ab4f00d25 llama2c : open file as binary (#6332) Georgi Gerganov 2024-03-27 09:16:02 +02:00
  • 760e1835d8 readme : add php api bindings (#6326) Mateusz Charytoniuk 2024-03-27 08:08:59 +01:00
  • 1740d6dd4e readme : add php api bindings (#6326) Mateusz Charytoniuk 2024-03-27 08:08:59 +01:00
  • 2ce5de6282 server: public: use relative routes for static files (#6325) Eric Zhang 2024-03-27 13:55:29 +08:00
  • 0642b22cd1 server: public: use relative routes for static files (#6325) Eric Zhang 2024-03-27 13:55:29 +08:00
  • 7f6e5bb122 [SYCL] fix no file in win rel (#6314) Neo Zhang Jianyu 2024-03-27 09:47:06 +08:00
  • a4f569e8a3 [SYCL] fix no file in win rel (#6314) Neo Zhang Jianyu 2024-03-27 09:47:06 +08:00
  • e8b00e1e36 wpm : portable unicode tolower (#6305) Jared Van Bortel 2024-03-26 17:46:21 -04:00
  • 32c8486e1f wpm : portable unicode tolower (#6305) Jared Van Bortel 2024-03-26 17:46:21 -04:00
  • de7b41ef94 llama : greatly reduce output buffer memory usage (#6122) compilade 2024-03-26 10:46:41 -04:00
  • 557410b8f0 llama : greatly reduce output buffer memory usage (#6122) compilade 2024-03-26 10:46:41 -04:00
  • ab7258efcb IQ1_M: 1.75 bpw quantization (#6302) Kawrakow 2024-03-26 15:21:27 +01:00
  • 55c1b2a3bb IQ1_M: 1.75 bpw quantization (#6302) Kawrakow 2024-03-26 15:21:27 +01:00
  • 33594ea8cb convert-hf : fix exception in sentencepiece with added tokens (#6320) Pedro Cuenca 2024-03-26 13:32:19 +01:00
  • e097633f63 convert-hf : fix exception in sentencepiece with added tokens (#6320) Pedro Cuenca 2024-03-26 13:32:19 +01:00
  • aa1647413e quantize : be able to override metadata by key (#6321) Kawrakow 2024-03-26 13:09:30 +01:00
  • d25b1c31b0 quantize : be able to override metadata by key (#6321) Kawrakow 2024-03-26 13:09:30 +01:00
  • 9f5bfcc851 embedding : adjust n_ubatch value (#6296) Minsoo Cheong 2024-03-26 18:11:46 +09:00
  • deb7240100 embedding : adjust n_ubatch value (#6296) Minsoo Cheong 2024-03-26 18:11:46 +09:00
  • 377334f208 server : add n_discard parameter (#6300) Jan Boon 2024-03-26 16:47:43 +08:00
  • 3d032ece8e server : add n_discard parameter (#6300) Jan Boon 2024-03-26 16:47:43 +08:00
  • 38ae378324 nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118) Joseph Stahl 2024-03-25 20:51:46 -04:00
  • e190f1fca6 nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118) Joseph Stahl 2024-03-25 20:51:46 -04:00
  • f6b28778af cuda : rename build flag to LLAMA_CUDA (#6299) slaren 2024-03-26 01:16:01 +01:00
  • 280345968d cuda : rename build flag to LLAMA_CUDA (#6299) slaren 2024-03-26 01:16:01 +01:00
  • 99162be346 nix: fix blas support (#6281) Christian Kögler 2024-03-25 18:52:45 +01:00
  • b06c16ef9f nix: fix blas support (#6281) Christian Kögler 2024-03-25 18:52:45 +01:00
  • b5c444bba2 tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) Kawrakow 2024-03-25 18:33:15 +01:00
  • 1f2fd4e727 tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) Kawrakow 2024-03-25 18:33:15 +01:00
  • b6a4db803e flake.lock: Update (#6266) Georgi Gerganov 2024-03-25 17:22:27 +02:00
  • 43139cc528 flake.lock: Update (#6266) Georgi Gerganov 2024-03-25 17:22:27 +02:00
  • 2824dcd14a cuda : fix LLAMA_CUDA_F16 build (#6298) slaren 2024-03-25 15:43:22 +01:00
  • 2f34b865b6 cuda : fix LLAMA_CUDA_F16 build (#6298) slaren 2024-03-25 15:43:22 +01:00
  • e4eff12dc6 cuda : refactor into multiple files (#6269) slaren 2024-03-25 13:50:23 +01:00