Default Branch

30381fc1fc · Faster hybrid inference with shared experts (#1191) · Updated 2026-01-26 05:22:05 +00:00

Branches

598e8e7d5f · Fix build with RPC not enabled · Updated 2025-11-30 18:03:48 +00:00

4147 commits behind · 4034 commits ahead

ec45020e37 · Leave FFN partial results as f16 · Updated 2025-11-28 07:25:20 +00:00

118 commits behind · 15 commits ahead

4c4c84ba7f · Attempt to fix #1014 · Updated 2025-11-27 09:11:23 +00:00

4147 commits behind · 4028 commits ahead

0a6e650e29 · Fix llama-bench mla parameter · Updated 2025-11-27 08:30:58 +00:00

4147 commits behind · 4028 commits ahead

2339d41d2e · Change default RPC order and fix wrong RPC order in --device arg · Updated 2025-11-26 02:32:00 +00:00

123 commits behind · 1 commit ahead

43f9f342dd · Add MXFP4 to gguf-py constants · Updated 2025-11-24 14:42:33 +00:00

4147 commits behind · 4025 commits ahead

422585d726 · Enable iq4_nl KV cache on CUDA · Updated 2025-11-24 08:39:14 +00:00

4147 commits behind · 4024 commits ahead

8297d10111 · Fix q6_0 dequantize · Updated 2025-11-24 08:04:46 +00:00

4147 commits behind · 4023 commits ahead

99e0e334a5 · Disable RoPE cache · Updated 2025-11-24 06:08:07 +00:00

4147 commits behind · 4021 commits ahead

0369d2ba44 · Gigachat: CPU FA (needs 192 x 192 for MLA = 3) · Updated 2025-11-21 09:44:34 +00:00 · ikawrakow

4147 commits behind · 4018 commits ahead

2e4bfed583 · WIP: try syncing - not working yet · Updated 2025-11-20 13:30:43 +00:00

132 commits behind · 1 commit ahead

b9d25dc35b · Fix requantizing from row-interleaved quants · Updated 2025-11-20 10:45:56 +00:00

133 commits behind · 1 commit ahead

8f7dd2f06b · Make gguf-py stuff work with numpy 2.0 · Updated 2025-11-20 09:11:01 +00:00

135 commits behind · 1 commit ahead

4b731fe333 · Fix junja -> jinja · Updated 2025-11-20 08:01:21 +00:00

135 commits behind · 3 commits ahead

00259c14a7 · Also llama-bench · Updated 2025-11-19 15:14:52 +00:00

136 commits behind · 2 commits ahead

810c47fc38 · Attempt to fix #974 · Updated 2025-11-19 12:50:35 +00:00

138 commits behind · 1 commit ahead

c1d0738a1b · Make sure we can fuse Q and K RoPE for DeepSeek models · Updated 2025-11-19 12:39:34 +00:00

140 commits behind · 1 commit ahead

f514891418 · Fuse sum_rows and div with topk-moe · Updated 2025-11-19 10:14:33 +00:00

140 commits behind · 1 commit ahead

5195e38d47 · Fuse Q and K RoPE · Updated 2025-11-18 12:05:15 +00:00

142 commits behind · 1 commit ahead

a1c32c1d39 · Add usage for -vq, --validate-quants · Updated 2025-11-17 15:00:51 +00:00 · ikawrakow

4147 commits behind · 4004 commits ahead