Default Branch

30381fc1fc · Faster hybrid inference when shared experts (#1191) · Updated 2026-01-26 05:22:05 +00:00

Branches

04829ca412 · Adjust ncols for ADA_LOVELACE or better · Updated 2026-01-26 09:00:42 +00:00 · 0 behind, 2 ahead

c96ad27cd0 · server: add string ban · Updated 2026-01-25 22:04:41 +00:00 · 2 behind, 1 ahead

109686af6f · Faster hybrid inference when shared experts · Updated 2026-01-25 14:38:54 +00:00 · 2 behind, 1 ahead

aff7aa0cf6 · Add condition · Updated 2026-01-25 06:52:04 +00:00 · 3 behind, 4 ahead

6e6d105d4e · Much faster rng sampling · Updated 2026-01-24 13:41:47 +00:00 · 3 behind, 1 ahead

c663eeaca6 · Disable when the KV cache is not f16 · Updated 2026-01-24 05:03:52 +00:00 · 5 behind, 3 ahead

7f5503244e · Handle quantized cache · Updated 2026-01-23 06:47:29 +00:00 · 5 behind, 2 ahead

3a3e1638d4 · Remove llamafile remnants · Updated 2026-01-22 11:12:04 +00:00 · 6 behind, 1 ahead

32f8e6a565 · Merge remote-tracking branch 'origin/main' into ik/sm_graph_cuda_graphs · Updated 2026-01-22 10:34:11 +00:00 · 8 behind, 4 ahead

c37783b361 · Fix non-contiguous batched cuBLAS · Updated 2026-01-22 10:05:35 +00:00 · 12 behind, 1 ahead

a2fb4cefda · sweep_bench: set number of repetions · Updated 2026-01-21 08:33:42 +00:00 · 12 behind, 1 ahead

3d5b854aee · Make comments more precise when experts gating function is missing · Updated 2026-01-21 07:08:54 +00:00 · 13 behind, 1 ahead

487411b676 · This is better · Updated 2026-01-21 05:52:10 +00:00 · 14 behind, 2 ahead

a6651d017a · Change graph key · Updated 2026-01-20 15:35:53 +00:00 · 14 behind, 2 ahead

8f98961b96 · Fix build failure when OpenMP is not available · Updated 2026-01-20 11:06:25 +00:00 · 16 behind, 1 ahead

bc16202fc7 · Merge remote-tracking branch 'origin/main' into ik/topk_moe_fuse_bias · Updated 2026-01-20 10:47:11 +00:00 · 16 behind, 5 ahead

03c0629b3c · Make FA work for mla != 0 · Updated 2026-01-20 07:58:31 +00:00 · 17 behind, 3 ahead

1e240db2a0 · Couldn't look at it without fixing it. · Updated 2026-01-19 14:38:53 +00:00 · 18 behind, 3 ahead

f62e317dbe · Merge remote-tracking branch 'origin/main' into ik/adaptive_p_2 · Updated 2026-01-19 13:11:04 +00:00 · 19 behind, 8 ahead

a96e5449cc · Correctly accumulate sampling time for adaptive_p · Updated 2026-01-19 10:17:07 +00:00 · 21 behind, 4 ahead