Commit Graph

  • 44114257e3 Minor delta-net tweak ik/qkvz_tweak1 Kawrakow 2026-02-28 10:10:41 +00:00
  • 0ff3a43289 Bring back #1333 and #1335 (#1340) main Kawrakow 2026-02-28 14:31:42 +01:00
  • b845b0612c Remove autoregressive and chunking ik/fused_delta_net_3a Kawrakow 2026-02-28 13:21:19 +00:00
  • de88fa04b7 Bring back fused delta net 3 Kawrakow 2026-02-28 13:07:44 +00:00
  • 1922449b2c Revert delta net 3 (#1339) Kawrakow 2026-02-28 13:12:08 +01:00
  • fe05d4ae70 Revert "Fused delta net 3 (#1333)" ik/revert_delta_net_3 Kawrakow 2026-02-28 12:09:12 +00:00
  • 18b1b23ed5 Revert "Simplify delta-net (#1335)" Kawrakow 2026-02-28 12:09:02 +00:00
  • e5fc30244c Simplify delta-net (#1335) Kawrakow 2026-02-28 11:12:19 +01:00
  • 985f80180e Minor delta-net tweak ik/qkvz_tweak Kawrakow 2026-02-28 10:10:41 +00:00
  • 702e0765b8 Update README with clarification on '_XL' models Kawrakow 2026-02-27 16:22:10 +01:00
  • 3469ffb904 Minor ik/simplify_delta_net Kawrakow 2026-02-27 15:08:18 +00:00
  • 77a727e6fe Minor Kawrakow 2026-02-27 14:42:09 +00:00
  • ae0172a6ce Simplify delta-net Kawrakow 2026-02-27 14:19:24 +00:00
  • 7b68353e09 Fused delta net 3 (#1333) Kawrakow 2026-02-27 15:02:56 +01:00
  • 3c43fe37fa Fix race ik/fused_delta_net_3 Kawrakow 2026-02-27 13:23:54 +00:00
  • 6ac4335155 Make fused delta-net the default Kawrakow 2026-02-27 11:07:37 +00:00
  • f6439e420a Minor Kawrakow 2026-02-27 09:49:26 +00:00
  • e2dbf3acc3 Remove unused stuff Kawrakow 2026-02-27 09:47:17 +00:00
  • 745dee7d4e Cleanup Kawrakow 2026-02-27 09:37:22 +00:00
  • e217c36475 Keep the state in registers Kawrakow 2026-02-27 09:29:43 +00:00
  • 06727c50be This is better than chunked Kawrakow 2026-02-27 09:01:08 +00:00
  • 1e6d36b1b4 Graph parallel for dense Qwen-3.5 models (#1331) Kawrakow 2026-02-27 07:03:25 +01:00
  • facc8fdc44 Very slightly better fused delta-net (#1330) Kawrakow 2026-02-27 07:03:09 +01:00
  • fc46213fb9 Cleanup ik/sm_graph_q35 Kawrakow 2026-02-26 15:38:37 +00:00
  • 7292e23af4 Graph parallel for idense Qwen-3.5 models Kawrakow 2026-02-26 15:27:37 +00:00
  • c115e185c1 Very slightly better fused delta-net ik/slightly_better_fdn Kawrakow 2026-02-26 14:52:13 +00:00
  • 62a7dcac5a Move the Qwen-3.5 models to the standard attention mechanism (#1329) Kawrakow 2026-02-26 15:50:51 +01:00
  • f23ff50a37 Move the Qwen-3.5 models to the standard attention mechanism ik/qwen35_std_attn Kawrakow 2026-02-26 10:29:33 +00:00
  • 757bee6238 Add special FA handling for dense Qwen3.5 (#1328) Kawrakow 2026-02-26 11:27:41 +01:00
  • 7340745572 Add special FA handling for dense Qwen3.5 ik/fattn_q35dense Kawrakow 2026-02-26 08:16:31 +00:00
  • 0aa6f7e7cd iAdding support for dense Qwen-3.5 models (#1326) Kawrakow 2026-02-26 08:51:01 +01:00
  • 33698a2072 iAdding support for dense Qwen-3.5 models ik/qwen35dense Kawrakow 2026-02-26 07:21:48 +00:00
  • 2616efa296 Fused delta net 2 (#1320) Kawrakow 2026-02-26 06:53:43 +01:00
  • 87b35dac0c Faster quantization for MoE models with many experts (#1322) Kawrakow 2026-02-26 06:52:28 +01:00
  • 3fac78c48b server: enable checkpoint for recurrent models (#1310) firecoperana 2026-02-25 23:51:18 -06:00
  • 7962e9a4b3 save checkpoint during pp fcp/recurrent_checkpoint firecoperana 2026-02-25 19:05:54 -06:00
  • 4b840a362e Faster quantization for MoE models with many experts ik/faster_moe_quantize Kawrakow 2026-02-25 17:51:40 +00:00
  • 216f44363f Fix KT quantization yet again (#1321) Kawrakow 2026-02-25 18:07:12 +01:00
  • 785da428f7 Also this one ik/fix_quantize_kt Kawrakow 2026-02-25 16:48:43 +00:00
  • 8feb02bde7 Fixes for k-quants Kawrakow 2026-02-25 15:57:38 +00:00
  • daa9a2c764 Add same 1e-16f check for all quants in iqk_uantize.cpp Kawrakow 2026-02-25 15:15:11 +00:00
  • ab3c601269 Fix KT quantization yet again Kawrakow 2026-02-25 14:20:22 +00:00
  • 233898704c server: enable checkpoint for recurrent models firecoperana 2026-02-22 15:10:56 -06:00
  • 0579a868b9 Restore per context buffer size log ik/fused_delta_net_2 Kawrakow 2026-02-25 13:26:37 +00:00
  • ef2ab07b5b Merge remote-tracking branch 'origin/main' into ik/fused_delta_net_2 Kawrakow 2026-02-25 13:19:09 +00:00
  • c77ec4b8b8 Fused delta-net (#1315) Kawrakow 2026-02-25 14:12:48 +01:00
  • a8ef7e20e7 More tweaks Kawrakow 2026-02-25 13:11:47 +00:00
  • 8af3755f32 This seems quite a bit better Kawrakow 2026-02-25 07:14:28 +00:00
  • 0bf7043a7b Display the size of the tensors overriden during the tensor loading (#1318) Nexes the Elder 2026-02-25 07:36:27 +01:00
  • 170467e835 Llama-quantize: Partial requant feature (#1313) Nexes the Elder 2026-02-25 07:25:15 +01:00
  • 0ec3e739be Don't re-apply L2 norm - it has already been done Kawrakow 2026-02-25 05:27:55 +00:00
  • b3cf43e7f3 Give some nodes a name ik/fused_delta_net Kawrakow 2026-02-24 16:46:39 +00:00
  • 1687ff88be Use eps = 1e-6 Kawrakow 2026-02-24 14:15:28 +00:00
  • d7c0104967 Change meaning of fdn from bool flag to threshold value Kawrakow 2026-02-24 12:59:07 +00:00
  • b184e84480 Much faster fused delta-net on the CPU Kawrakow 2026-02-24 12:42:06 +00:00
  • 2ef38b56df CPU optimizations Kawrakow 2026-02-24 10:41:19 +00:00
  • 7af6892dcf More CUDA fused delta net optimizations Kawrakow 2026-02-24 10:00:03 +00:00
  • fecdcd5aa1 Add -fdn to llama-bench Kawrakow 2026-02-24 09:59:25 +00:00
  • dc44a37ca2 Simplify/improve CUDA delta-net Kawrakow 2026-02-24 07:41:09 +00:00
  • 28b31a66b2 Add command line argument for fused delta net Kawrakow 2026-02-24 05:40:26 +00:00
  • a350f1b96f Revive fused delta-net Kawrakow 2026-02-23 16:37:11 +00:00
  • 68431b049a server: propagate task index to response objects for batch requests (#1303) Joshua Jolley 2026-02-24 07:39:38 -07:00
  • aaa545c3dc adaptive p: collect probability before logit bias (#1314) dungquixote42 2026-02-24 09:39:17 -05:00
  • 38ca19d828 Minor delta-net tweak (#1308) Kawrakow 2026-02-24 15:22:57 +01:00
  • 7065488135 Slightly better graph parallel for Qwen3-Next (#1307) Kawrakow 2026-02-24 15:22:30 +01:00
  • cfb6747776 llama-quantize: --dry-run option (#1309) Kawrakow 2026-02-24 15:21:52 +01:00
  • 96b8298472 Fix typo in merge-up-gate-experts argument (#1311) TheAIGuyFromAR 2026-02-24 08:13:22 -06:00
  • 1a40325265 llama-quantize: --dry-run option ik/quantize_dry_run Kawrakow 2026-02-23 15:16:10 +00:00
  • 8d401d78ee Minor delta-net tweak ik/minor_delta_tweak Kawrakow 2026-02-23 14:47:36 +00:00
  • 35da97d53e Minor ik/graph_parallel_tweak Kawrakow 2026-02-23 09:42:21 +00:00
  • fd2a70913c Make sure we pick the reduced tensor from the right GPU Kawrakow 2026-02-23 09:37:34 +00:00
  • 68bd30d99c Fix max nodes (again) (#1306) Kawrakow 2026-02-23 11:17:37 +01:00
  • bf10f9f0c2 Fix max nodes (again) ik/max_nodes_again Kawrakow 2026-02-23 12:07:07 +02:00
  • 2bb40f8c35 Fix llm_arch_is_hybrid (#1305) Kawrakow 2026-02-23 08:55:53 +01:00
  • 6fd8f27108 Fix llm_arch_is_hybrid ik/fix_hybrid_detection Kawrakow 2026-02-23 09:53:11 +02:00
  • 5dacb5355a Graph parallel for Qwen3-Next (#1292) Kawrakow 2026-02-23 07:58:00 +01:00
  • dcf50d8279 Fix tool call for Qwen3.5 (#1300) Yap Sok Ann 2026-02-23 13:54:56 +07:00
  • efc294cc39 server: fix crash from adaptive p (#1304) firecoperana 2026-02-23 00:25:52 -06:00
  • bd3cf6e2cf server: fix crash from adaptive p fcp/crash firecoperana 2026-02-22 17:45:42 -06:00
  • 89b1e2b518 Better estimate for max. nuber of compute nodes (#1296) Kawrakow 2026-02-22 18:16:49 +01:00
  • 09a88c9ae5 Add MTP decoding support for GLM-4.x MoE (#1270) Samuel Oliveira Alves 2026-02-22 14:14:39 -03:00
  • cbf7fc7e2f Update README with warning about '_XL' models from Unsloth Kawrakow 2026-02-22 07:42:17 +01:00
  • bd387a279a Add new authors to the AUTHORS file Kawrakow 2026-02-21 19:20:31 +01:00
  • 52285a5ee7 Just in case ik/max_nodes Kawrakow 2026-02-21 18:14:14 +00:00
  • d93fe37ba7 Better estimate for max. nuber of compute nodes Kawrakow 2026-02-21 18:06:24 +00:00
  • 66323b92f7 Qwen3.5-MoE: fix regenerating message error (#1295) firecoperana 2026-02-21 11:24:12 -06:00
  • 463359f96d This works, but is slower than split mode layer ik/sm_graph_q3next Kawrakow 2026-02-20 16:10:29 +00:00
  • 6c2f7ad397 WIP Kawrakow 2026-02-20 10:18:39 +00:00
  • 13c3d83ce7 Qwen3.5-MoE support (#1288) Kawrakow 2026-02-21 08:33:06 +01:00
  • 07516cec2d This appears to work ik/qwen35moe Kawrakow 2026-02-19 16:15:31 +00:00
  • 8737d3d924 WIP: loads and runs, but not correct Kawrakow 2026-02-19 06:58:36 +00:00
  • b2cb4512c5 Create parameters overview (#1269) mcm007 2026-02-20 08:20:56 +02:00
  • 0f411b02e2 Fix adaptive p sampler bug with string ban (#1287) dungquixote42 2026-02-20 01:11:36 -05:00
  • b855bf92de Fix slot prompt updating. (#1285) rkozuch 2026-02-19 18:15:49 +11:00
  • d81cde5cea Fix very low bpw missing imatrix check (#1284) Kawrakow 2026-02-19 08:15:26 +01:00
  • 51df09be8a Feat - add kimi 2.5 Vision (#1280) Samuel Oliveira Alves 2026-02-19 04:15:03 -03:00
  • 04cf685e82 Factor out delta net (#1286) Kawrakow 2026-02-18 17:16:17 +01:00
  • d2e193d711 More standard attn for Qwen3-Next ik/delta_net Kawrakow 2026-02-18 13:14:17 +00:00
  • 9452bc06d9 Use the standard FFN functions Kawrakow 2026-02-18 11:32:13 +00:00
  • 5a22dca980 WIP Kawrakow 2026-02-18 10:36:38 +00:00