Commit Graph

  • 21f660efc0 cuda : fix disabling device with --tensor-split 1,0 (#3951) Jared Van Bortel 2023-11-05 10:08:57 -05:00
  • 132d25b8a6 cuda : fix disabling device with --tensor-split 1,0 (#3951) Jared Van Bortel 2023-11-05 10:08:57 -05:00
  • ac0220874a llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) Meng Zhang 2023-11-05 04:40:08 -08:00
  • 3d48f42efc llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) Meng Zhang 2023-11-05 04:40:08 -08:00
  • a04db7ec4e cmake : MSVC instruction detection (fixed up #809) (#3923) Eve 2023-11-05 08:03:09 +00:00
  • c41ea36eaa cmake : MSVC instruction detection (fixed up #809) (#3923) Eve 2023-11-05 08:03:09 +00:00
  • 67a2970f9f ci : use intel sde when ci cpu doesn't support avx512 (#3949) Eve 2023-11-05 07:46:44 +00:00
  • a7fac013cf ci : use intel sde when ci cpu doesn't support avx512 (#3949) Eve 2023-11-05 07:46:44 +00:00
  • 3a175b6f3a cuda : revert CUDA pool stuff (#3944) slaren 2023-11-05 08:12:13 +01:00
  • 48ade94538 cuda : revert CUDA pool stuff (#3944) slaren 2023-11-05 08:12:13 +01:00
  • 477f7e68d5 gguf-py: Support 01.AI Yi models (#3943) Kerfuffle 2023-11-04 16:20:34 -06:00
  • f28af0d81a gguf-py: Support 01.AI Yi models (#3943) Kerfuffle 2023-11-04 16:20:34 -06:00
  • 4861b667af metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) Peter Sugihara 2023-11-03 12:18:18 -07:00
  • d9b33fe95b metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) Peter Sugihara 2023-11-03 12:18:18 -07:00
  • c1a2ca89f4 ggml-metal: fix yarn rope (#3937) Xiao-Yong Jin 2023-11-03 13:00:31 -05:00
  • 5ba3746171 ggml-metal: fix yarn rope (#3937) Xiao-Yong Jin 2023-11-03 13:00:31 -05:00
  • 1a7b1c3993 ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) slaren 2023-11-03 12:13:09 +01:00
  • abb77e7319 ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) slaren 2023-11-03 12:13:09 +01:00
  • 5e1949ebea speculative : change default p_accept to 0.5 + CLI args (#3919) Georgi Gerganov 2023-11-03 09:41:17 +02:00
  • 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919) Georgi Gerganov 2023-11-03 09:41:17 +02:00
  • 191d7ac243 common : YAYF (yet another YARN fix) (#3925) Georgi Gerganov 2023-11-03 09:24:00 +02:00
  • 05816027d6 common : YAYF (yet another YARN fix) (#3925) Georgi Gerganov 2023-11-03 09:24:00 +02:00
  • e9de126fe8 llama : change yarn_ext_factor placeholder to -1 (#3922) cebtenzzre 2023-11-03 02:31:58 -04:00
  • 3fdbe6b66b llama : change yarn_ext_factor placeholder to -1 (#3922) cebtenzzre 2023-11-03 02:31:58 -04:00
  • 6c6d2c2efd cuda : add ROCM aliases for CUDA pool stuff (#3918) Kerfuffle 2023-11-02 13:58:22 -06:00
  • 629f917cd6 cuda : add ROCM aliases for CUDA pool stuff (#3918) Kerfuffle 2023-11-02 13:58:22 -06:00
  • de7e65f827 cmake : fix relative path to git submodule index (#3915) Andrei 2023-11-02 15:40:31 -04:00
  • 51b2fc11f7 cmake : fix relative path to git submodule index (#3915) Andrei 2023-11-02 15:40:31 -04:00
  • 534bbd5c14 readme : add notice about #3912 Georgi Gerganov 2023-11-02 20:44:12 +02:00
  • 224e7d5b14 readme : add notice about #3912 Georgi Gerganov 2023-11-02 20:44:12 +02:00
  • 1c40a5638a cuda : fix const ptrs warning causing ROCm build issues (#3913) Georgi Gerganov 2023-11-02 20:32:11 +02:00
  • c7743fe1c1 cuda : fix const ptrs warning causing ROCm build issues (#3913) Georgi Gerganov 2023-11-02 20:32:11 +02:00
  • a0361be442 cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) Oleksii Maryshchenko 2023-11-02 18:10:39 +01:00
  • d6069051de cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) Oleksii Maryshchenko 2023-11-02 18:10:39 +01:00
  • e1df15f539 gguf : print error for GGUFv1 files (#3908) Georgi Gerganov 2023-11-02 16:22:30 +02:00
  • 4ff1046d75 gguf : print error for GGUFv1 files (#3908) Georgi Gerganov 2023-11-02 16:22:30 +02:00
  • afbc267d09 cmake : disable LLAMA_NATIVE by default (#3906) slaren 2023-11-02 13:10:33 +01:00
  • 21958bb393 cmake : disable LLAMA_NATIVE by default (#3906) slaren 2023-11-02 13:10:33 +01:00
  • 41121e703b gguf : remove special-case code for GGUFv1 (#3901) Georgi Gerganov 2023-11-02 11:20:21 +02:00
  • 2756c4fbff gguf : remove special-case code for GGUFv1 (#3901) Georgi Gerganov 2023-11-02 11:20:21 +02:00
  • 9468fa0e43 llm : prevent from 1-D tensors being GPU split (#3697) Georgi Gerganov 2023-11-02 09:54:18 +02:00
  • 1efae9b7dc llm : prevent from 1-D tensors being GPU split (#3697) Georgi Gerganov 2023-11-02 09:54:18 +02:00
  • 6c1e3443ed build : link against build info instead of compiling against it (#3879) cebtenzzre 2023-11-02 02:50:16 -04:00
  • b12fa0d1c1 build : link against build info instead of compiling against it (#3879) cebtenzzre 2023-11-02 02:50:16 -04:00
  • 570b4a235d cuda : check if this fixes Pascal card regression (#3882) Georgi Gerganov 2023-11-02 08:35:10 +02:00
  • 4d719a6d4e cuda : check if this fixes Pascal card regression (#3882) Georgi Gerganov 2023-11-02 08:35:10 +02:00
  • 3b5af5bf0e metal : fix build errors and kernel sig after #2268 (#3898) Georgi Gerganov 2023-11-02 08:33:37 +02:00
  • 183b3fac6c metal : fix build errors and kernel sig after #2268 (#3898) Georgi Gerganov 2023-11-02 08:33:37 +02:00
  • 13f4dd6529 cuda : fix RoPE after #2268 (#3897) cebtenzzre 2023-11-02 01:49:44 -04:00
  • 2fffa0d61f cuda : fix RoPE after #2268 (#3897) cebtenzzre 2023-11-02 01:49:44 -04:00
  • db24ce778d llama : fix llama_context_default_params after #2268 (#3893) cebtenzzre 2023-11-01 19:29:14 -04:00
  • 0eb332a10f llama : fix llama_context_default_params after #2268 (#3893) cebtenzzre 2023-11-01 19:29:14 -04:00
  • 8a1544cd4f ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) slaren 2023-11-01 23:10:09 +01:00
  • d02e98cde0 ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) slaren 2023-11-01 23:10:09 +01:00
  • 2bbcc7a05c llama : implement YaRN RoPE scaling (#2268) cebtenzzre 2023-11-01 18:04:33 -04:00
  • 898aeca90a llama : implement YaRN RoPE scaling (#2268) cebtenzzre 2023-11-01 18:04:33 -04:00
  • f616fee832 llm : fix llm_build_kqv taking unused tensor (benign, #3837) Georgi Gerganov 2023-11-01 23:08:30 +02:00
  • c43c2da8af llm : fix llm_build_kqv taking unused tensor (benign, #3837) Georgi Gerganov 2023-11-01 23:08:30 +02:00
  • 58564c3a54 llm : fix falcon norm after refactoring (#3837) Georgi Gerganov 2023-11-01 23:00:50 +02:00
  • 523e49b111 llm : fix falcon norm after refactoring (#3837) Georgi Gerganov 2023-11-01 23:00:50 +02:00
  • 944d61b928 metal : multi-simd softmax (#3710) Georgi Gerganov 2023-11-01 21:25:00 +02:00
  • e16b9fa4ba metal : multi-simd softmax (#3710) Georgi Gerganov 2023-11-01 21:25:00 +02:00
  • ed5d2c812d common : minor (#3715) Georgi Gerganov 2023-11-01 21:15:55 +02:00
  • ff8f9a88da common : minor (#3715) Georgi Gerganov 2023-11-01 21:15:55 +02:00
  • ee877f3a65 llm : add llm_build_context (#3881) Georgi Gerganov 2023-11-01 20:11:02 +02:00
  • 50337961a6 llm : add llm_build_context (#3881) Georgi Gerganov 2023-11-01 20:11:02 +02:00
  • 3523709a39 common : allow caller to handle help/argument exceptions (#3715) bandoti 2023-11-01 14:42:01 -03:00
  • 0e40806c1c common : allow caller to handle help/argument exceptions (#3715) bandoti 2023-11-01 14:42:01 -03:00
  • d722ed0a43 log : make generating separate log files optional (#3787) staviq 2023-11-01 15:18:27 +01:00
  • a2758d08e4 log : make generating separate log files optional (#3787) staviq 2023-11-01 15:18:27 +01:00
  • 5dee23d4fe sampling : null grammar field after reset (#3885) l3utterfly 2023-11-01 21:40:43 +08:00
  • e75dfdd31b sampling : null grammar field after reset (#3885) l3utterfly 2023-11-01 21:40:43 +08:00
  • bcfbbd0434 ggml : fix UNUSED macro (#3762) Georgi Gerganov 2023-11-01 13:50:45 +02:00
  • 9a3b4f6c86 ggml : fix UNUSED macro (#3762) Georgi Gerganov 2023-11-01 13:50:45 +02:00
  • fff7d66104 finetune : add -ngl parameter (#3762) Andrew Godfrey 2023-11-01 04:49:04 -07:00
  • 73bdcb395e finetune : add -ngl parameter (#3762) Andrew Godfrey 2023-11-01 04:49:04 -07:00
  • abf041e7c8 scripts : add server-llm.sh (#3868) Georgi Gerganov 2023-11-01 11:29:07 +02:00
  • f0e209324a scripts : add server-llm.sh (#3868) Georgi Gerganov 2023-11-01 11:29:07 +02:00
  • 4eabd661df server : re-enable completion and embedded at the same time (#3876) Adrian Hesketh 2023-11-01 09:28:28 +00:00
  • ca190bca8e server : re-enable completion and embedded at the same time (#3876) Adrian Hesketh 2023-11-01 09:28:28 +00:00
  • 420c3aee7c llama : refactor graph build code (#3837) Georgi Gerganov 2023-11-01 08:04:02 +02:00
  • 71e3718abd llama : refactor graph build code (#3837) Georgi Gerganov 2023-11-01 08:04:02 +02:00
  • 1603f191ad samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) kalomaze 2023-10-31 14:44:49 -05:00
  • 238657db23 samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) kalomaze 2023-10-31 14:44:49 -05:00
  • 093afcb68f flake.nix: fix for rocm 5.7 (#3853) Tungsten842 2023-10-31 18:24:03 +01:00
  • 07178c98e1 flake.nix: fix for rocm 5.7 (#3853) Tungsten842 2023-10-31 18:24:03 +01:00
  • 845a1edfaa ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) Georgi Gerganov 2023-10-30 19:19:15 +02:00
  • 207b51900e ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) Georgi Gerganov 2023-10-30 19:19:15 +02:00
  • 3228710867 Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) Kerfuffle 2023-10-29 11:31:40 -06:00
  • 6e08281e58 Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) Kerfuffle 2023-10-29 11:31:40 -06:00
  • 62ce7e02df make : remove unnecessary dependency on build-info.h (#3842) cebtenzzre 2023-10-29 12:33:47 -04:00
  • 2046eb4345 make : remove unnecessary dependency on build-info.h (#3842) cebtenzzre 2023-10-29 12:33:47 -04:00
  • 6db0371992 llama : fix kv shift bug (#3835) Georgi Gerganov 2023-10-29 18:32:51 +02:00
  • 71a09da301 llama : fix kv shift bug (#3835) Georgi Gerganov 2023-10-29 18:32:51 +02:00
  • 91be989d92 ggml : quantization refactoring (#3833) Georgi Gerganov 2023-10-29 18:32:28 +02:00
  • d69d777c02 ggml : quantization refactoring (#3833) Georgi Gerganov 2023-10-29 18:32:28 +02:00
  • 2546cef587 flake : update flake.lock for newer transformers version + provide extra dev shell (#3797) Erik Scholz 2023-10-28 16:41:07 +02:00
  • ff3bad83e2 flake : update flake.lock for newer transformers version + provide extra dev shell (#3797) Erik Scholz 2023-10-28 16:41:07 +02:00
  • f6d9ffa4e7 metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793) Aarni Koskela 2023-10-28 15:43:01 +03:00
  • 82a6646e02 metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793) Aarni Koskela 2023-10-28 15:43:01 +03:00