Commit Graph

  • c4f2af9ccc WIP Iwan Kawrakow 2025-08-25 06:52:33 +03:00
  • 26c0dfdbfa WIP Iwan Kawrakow 2025-08-24 20:03:55 +03:00
  • 916733144c WIP Iwan Kawrakow 2025-08-24 19:37:02 +03:00
  • 406da6670d Now also -fmoe works Iwan Kawrakow 2025-08-24 18:59:18 +03:00
  • af6b5365cc This seems to work Iwan Kawrakow 2025-08-24 16:46:41 +03:00
  • 7a68553487 Add mikupad to ik_llama as an alternative WebUI (#558) saood06 2025-08-24 08:27:29 -05:00
  • af13c9a292 Add mikupad to ik_llama as an alternative WebUI (#558) saood06 2025-08-24 08:27:29 -05:00
  • c213189c2b Fix compile without flag on systems without it installed s6/mikupad Saood Karim 2025-08-24 07:42:51 -05:00
  • 0c3a3ffee6 WIP: adding mainline mmq_id implementation Iwan Kawrakow 2025-08-24 14:57:21 +03:00
  • fcfefca139 Merge branch 'main' into s6/mikupad Saood Karim 2025-08-24 04:07:51 -05:00
  • d1042307a6 Finalize build, put support behind LLAMA_SERVER_SQLITE3 build option, and update error message to cover the case where the build option is not passed Saood Karim 2025-08-24 03:55:41 -05:00
  • d3aecc7f37 Log for debugging #721 (#722) Kawrakow 2025-08-23 15:24:34 +03:00
  • e008c0e192 Log for debugging #721 (#722) Kawrakow 2025-08-23 15:24:34 +03:00
  • 2e86b76476 Remove the 16 ik/debug_issue_721 Iwan Kawrakow 2025-08-23 14:58:14 +03:00
  • efeb5ff1df Log for debugging #721 Iwan Kawrakow 2025-08-23 14:23:28 +03:00
  • 277426f040 Sanitize importances for KT quantization ik/sanitize_importance_kt_quants Iwan Kawrakow 2025-08-23 07:43:38 +03:00
  • 8e23ecdd96 Fix more Q8_0 repacking mess on AVX2 (#719) Kawrakow 2025-08-23 09:04:51 +03:00
  • e919e89d5a Fix more Q8_0 repacking mess on AVX2 (#719) Kawrakow 2025-08-23 09:04:51 +03:00
  • 7845ae4a8d Fix more Q8_0 repacking mess on AVX2 ik/fix_q80_avx2_2 Iwan Kawrakow 2025-08-23 09:03:30 +03:00
  • 866145b2b9 Remove scary warning about incompatible model (#717) Kawrakow 2025-08-22 18:42:01 +03:00
  • 9351cc3416 Remove scary warning about incompatible model (#717) Kawrakow 2025-08-22 18:42:01 +03:00
  • 6c01bedbb1 Minor ik/remove_scary_warning Iwan Kawrakow 2025-08-22 18:21:04 +03:00
  • 9fa5956602 Remove scary warning about incompatible model Iwan Kawrakow 2025-08-22 18:15:05 +03:00
  • e962ce8c70 CUDA: faster IQ2_K, IQ2_KS, IQ2_K_R4 (#716) Kawrakow 2025-08-22 07:25:35 +03:00
  • dfa6e2b5fa CUDA: faster IQ2_K, IQ2_KS, IQ2_K_R4 (#716) Kawrakow 2025-08-22 07:25:35 +03:00
  • ca8c72ff1a AVX512+AVXVNNI GEMM implementation for quants using Q8_K for activations (#710) Kawrakow 2025-08-22 06:27:07 +03:00
  • 3b94f0a73e AVX512+AVXVNNI GEMM implementation for quants using Q8_K for activations (#710) Kawrakow 2025-08-22 06:27:07 +03:00
  • 0b448997ec Does this fix #690? (#711) Kawrakow 2025-08-21 19:17:33 +03:00
  • c8e4d6648c Does this fix #690? (#711) Kawrakow 2025-08-21 19:17:33 +03:00
  • 01eee24f0f Use bperm trick for iq2_k_r4 gemv -> ~7% gain ik/cuda_iq2k_use_bperm1 Iwan Kawrakow 2025-08-21 18:58:19 +03:00
  • 9cf9172afe Use bperm trick for iq2_k gemv -> ~3% gain Iwan Kawrakow 2025-08-21 18:44:57 +03:00
  • 353e9ab38a Use bperm trick for iq2_ks gemv -> ~7% gain Iwan Kawrakow 2025-08-21 18:29:00 +03:00
  • 693e9d1a16 Use bperm trick for iq2_k_r4 gemm -> ~3% gain Iwan Kawrakow 2025-08-21 18:00:17 +03:00
  • 1d91c16869 Use bperm trick for iq2_k gemm -> ~5% gain Iwan Kawrakow 2025-08-21 17:35:33 +03:00
  • eb488f98da Use bperm trick for iq2_ks gemm -> 7% gain Iwan Kawrakow 2025-08-21 17:08:16 +03:00
  • c5f58e0270 CUDA: faster IQ3_K, IQ3_KS, IQ3_K_R4 (#714) Kawrakow 2025-08-21 19:08:57 +03:00
  • 05cd6994c8 CUDA: faster IQ3_K, IQ3_KS, IQ3_K_R4 (#714) Kawrakow 2025-08-21 19:08:57 +03:00
  • 90379f3d51 Use bperm trick for iq3_k gemv -> 4.5% gain ik/cuda_iq3k_use_bperm1 Iwan Kawrakow 2025-08-21 15:49:00 +03:00
  • 91b20d4bec Use bperm trick for iq3_k gemv -> ~3% faster Iwan Kawrakow 2025-08-21 15:32:13 +03:00
  • 770bf5ff87 Use bperm trick for iq3_k_r4 gemv -> ~5% faster Iwan Kawrakow 2025-08-21 14:22:10 +03:00
  • fa9e69fdd6 Use bperm trick for iq3_k -> 8% PP performance gain Iwan Kawrakow 2025-08-21 14:04:18 +03:00
  • 3eaacf235e Use bperm trick for iq3_k -> 5% PP performance gain Iwan Kawrakow 2025-08-21 13:50:06 +03:00
  • 2078f269ef Use bperm trick for iq3_ks - 5% PP performance gain Iwan Kawrakow 2025-08-21 13:21:33 +03:00
  • 9f3d062ba7 CUDA: faster prompt processing for 4-bit quants (#713) Kawrakow 2025-08-21 15:57:35 +03:00
  • 78de7736e8 CUDA: faster prompt processing for 4-bit quants (#713) Kawrakow 2025-08-21 15:57:35 +03:00
  • 8d91235b0e Use get_int_from_table_16 everywhere for 4-bit quants ik/cuda_use_bperm Iwan Kawrakow 2025-08-21 11:27:39 +03:00
  • 7fe9cd9968 Use __byte_perm in get_int_from_table_16 Iwan Kawrakow 2025-08-21 10:14:59 +03:00
  • 79f34c4e1d Just always set num_rows to 16 ik/q8_k_r16 Iwan Kawrakow 2025-08-21 08:09:25 +03:00
  • f6f56db00d Another attempt ik/try_fix_690 Iwan Kawrakow 2025-08-20 18:00:25 +03:00
  • 6b82a64273 Disable "...is not marked as EOG" messages (#712) Kawrakow 2025-08-20 16:47:14 +03:00
  • 0cb6696943 Disable "...is not marked as EOG" messages (#712) Kawrakow 2025-08-20 16:47:14 +03:00
  • 0299da3ef8 Disable "...is not marked as EOG" messages ik/disable_vocab_debug Iwan Kawrakow 2025-08-20 16:45:59 +03:00
  • 062ed408e1 Does this fix #690? Iwan Kawrakow 2025-08-20 16:35:42 +03:00
  • f3edfe0f03 Fix AVX2 Iwan Kawrakow 2025-08-20 14:46:18 +03:00
  • be2d5e5fed q8_k_r16: iq5_ks, iq5_k, and iq6_k now use q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 14:28:15 +03:00
  • ab90830d25 q8_k_r16: iq4_kss, iq4_ks, and iq4_k now use q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 14:09:01 +03:00
  • 349c44cda2 q8_k_r16: iq3_ks and iq3_k now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 12:56:34 +03:00
  • f3d5d99e3a q8_k_r16: iq2_kl now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 12:44:40 +03:00
  • 84f8ae7aad q8_k_r16: iq2_ks and iq2_k now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 12:38:20 +03:00
  • eb488e7475 q8_k_r16: q2_K and q3_K now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 12:24:07 +03:00
  • 344289ab19 q8_k_r16: iq1_s and iq1_m now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 12:04:35 +03:00
  • ea27251b56 q8_k_r16: iq3_s now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 11:54:53 +03:00
  • bdb266baf1 q8_k_r16: iq3_xxs now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 11:49:35 +03:00
  • 669f774fb1 q8_k_r16: iq2_s now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 11:43:33 +03:00
  • f11612ba59 q8_k_r16: iq2_xs now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 11:35:11 +03:00
  • 0c88b498d7 q8_k_r16: iq2_xxs now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 10:21:40 +03:00
  • 8791a0e7e6 q8_k_r16: iq4_xs now uses q8_k_r16 on Zen4+ Iwan Kawrakow 2025-08-20 09:23:23 +03:00
  • 270b45a481 q8_k_r16: basics Iwan Kawrakow 2025-08-20 08:31:35 +03:00
  • 5d39c132f2 This is better ik/fix_q80_avx2_mess Iwan Kawrakow 2025-08-19 19:53:17 +03:00
  • a9eeef53f3 Fix q8_0 repacking issues on AVX2 (#708) Kawrakow 2025-08-19 19:49:58 +03:00
  • 2572d16399 Fix q8_0 repacking issues on AVX2 (#708) Kawrakow 2025-08-19 19:49:58 +03:00
  • cf87ad6923 Fix q8_0 repacking issues on AVX2 ik/fix_q80_moe_avx2 Iwan Kawrakow 2025-08-19 18:43:36 +03:00
  • bc2cce5ce1 Disable experimental code that causes issues with MSVC (#707) Kawrakow 2025-08-19 18:09:49 +03:00
  • ee85af6d26 Disable experimental code that causes issues with MSVC (#707) Kawrakow 2025-08-19 18:09:49 +03:00
  • 93e8e31943 Disable experimental code that causes issues with MSVC ik/disable_experimental_code1 Iwan Kawrakow 2025-08-19 17:11:18 +03:00
  • f98b1befdb remove curious assertions (#705) usrlocalben 2025-08-19 06:41:29 -05:00
  • 6aef3c9415 remove curious assertions (#705) usrlocalben 2025-08-19 06:41:29 -05:00
  • 2f6c613de8 Make compression not show in sidebar if extension is not loaded Saood Karim 2025-08-19 00:44:44 -05:00
  • 06bed7e01b Port universal assisted decoding to llama-server (#699) g2mt 2025-08-17 23:22:23 -07:00
  • 23fe18ce83 Port universal assisted decoding to llama-server (#699) g2mt 2025-08-17 23:22:23 -07:00
  • 41d346ac23 Fix crash ik/cpu_swa_v1 Iwan Kawrakow 2025-08-18 07:56:14 +03:00
  • e9899c0801 Set mask bounds for all supported SWA models Iwan Kawrakow 2025-08-15 19:46:13 +03:00
  • 6aaeb81c94 Compute mask bounds when creating the mask Iwan Kawrakow 2025-08-15 19:36:55 +03:00
  • 43096be033 This does the trick for PP Iwan Kawrakow 2025-08-15 16:45:54 +03:00
  • ed9504bd92 Pull in full sqlite_modern_cpp repo for the license as it is not attached to source files Saood Karim 2025-08-17 08:25:37 -05:00
  • a3b174b69a Merge remote-tracking branch 'origin/main' into s6/mikupad Saood Karim 2025-08-17 08:02:02 -05:00
  • 6b2c84b099 Revert "Better CPU prompt processing performance for SWA models (#696)" (#701) Kawrakow 2025-08-17 15:44:02 +03:00
  • a3a523009e Revert "Better CPU prompt processing performance for SWA models (#696)" (#701) Kawrakow 2025-08-17 15:44:02 +03:00
  • d0d3014cf7 Merge remote-tracking branch 'origin/main' into s6/mikupad Saood Karim 2025-08-17 07:41:45 -05:00
  • e29829eda8 Revert "Better CPU prompt processing performance for SWA models (#696)" ik/reverts Iwan Kawrakow 2025-08-17 15:32:56 +03:00
  • 1ca612375e Fix GLM-4.5 attention (#700) Kawrakow 2025-08-17 14:31:03 +03:00
  • 7d14f8ea79 Fix GLM-4.5 attention (#700) Kawrakow 2025-08-17 14:31:03 +03:00
  • d4d017766e Better CPU prompt processing performance for SWA models (#696) Kawrakow 2025-08-17 10:30:27 +03:00
  • 93a4f6089f Better CPU prompt processing performance for SWA models (#696) Kawrakow 2025-08-17 10:30:27 +03:00
  • ca22798c0d Fix GLM-4.5 attention ik/fix_glm4_attn Iwan Kawrakow 2025-08-17 09:10:20 +03:00
  • 259cbf0bde Added "usage" to server response (#695) Pavel Dudkouski 2025-08-16 22:25:14 -04:00
  • 4bf5c8184b Added "usage" to server response (#695) Pavel Dudkouski 2025-08-16 22:25:14 -04:00
  • b21baa0c3e fix merge conflict issue Saood Karim 2025-08-16 00:20:13 -05:00
  • 63eb72e0e2 Merge remote-tracking branch 'origin/main' into s6/mikupad Saood Karim 2025-08-16 00:04:07 -05:00
  • b6bc5eedad Port speculative decoding from upstream to llama-server (#645) g2mt 2025-08-15 21:26:44 -07:00