Default Branch

30381fc1fc · Faster hybrid inference when shared experts (#1191) · Updated 2026-01-26 05:22:05 +00:00

Branches

2de3a96510 · Avoid computing the attention reduce op for cohere2 · Updated 2025-12-24 10:14:58 +00:00 · 4147 behind / 4076 ahead

172f9dad4c · WIP: fix sm layer (MoE) · Updated 2025-12-21 16:12:03 +00:00 · 75 behind / 9 ahead

706341d15b · nccl: second attempt, not working · Updated 2025-12-21 05:58:50 +00:00 · 75 behind / 7 ahead

e28148d401 · WIP · Updated 2025-12-20 06:50:58 +00:00 · 75 behind / 6 ahead

64908da772 · cuda: set device to src device before p2p copy · Updated 2025-12-17 11:43:36 +00:00 · 76 behind / 1 ahead
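
The entry above names a standard CUDA multi-GPU pattern: the calling thread's current device should be the source device when peer access is enabled and a peer-to-peer copy is issued. Below is a minimal, self-contained sketch of that general pattern; it is an illustration, not code from this repository, and the device indices and buffer size are made up. Error checking is omitted for brevity.

    // Sketch of the pattern named above: make the *source* GPU the
    // current device before enabling peer access and issuing the
    // peer-to-peer copy. Illustrative only; assumes two visible GPUs.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const int src_dev = 0, dst_dev = 1;   // assumed device indices
        const size_t nbytes = 1u << 20;

        float *src = nullptr, *dst = nullptr;

        cudaSetDevice(dst_dev);               // allocate destination buffer
        cudaMalloc(&dst, nbytes);

        cudaSetDevice(src_dev);               // allocate source buffer
        cudaMalloc(&src, nbytes);

        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, src_dev, dst_dev);
        if (can_access) {
            // Must run while src_dev is current: it grants the current
            // device access to dst_dev's memory.
            cudaDeviceEnablePeerAccess(dst_dev, 0);
        }

        // The point of the commit title: switch to the source device
        // before launching the peer copy.
        cudaSetDevice(src_dev);
        cudaMemcpyPeerAsync(dst, dst_dev, src, src_dev, nbytes, 0);
        cudaDeviceSynchronize();

        printf("copied %zu bytes from GPU %d to GPU %d\n",
               nbytes, src_dev, dst_dev);

        cudaFree(src);                        // src_dev is still current
        cudaSetDevice(dst_dev);
        cudaFree(dst);
        return 0;
    }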

0864655a72 · Disable split scheduling with tensor overrides · Updated 2025-12-17 06:38:18 +00:00 · 4147 behind / 4076 ahead

5a731064e6 · Much better TG speed with split mode "graph" · Updated 2025-12-15 13:53:35 +00:00 · 81 behind / 1 ahead

664a529332 · Use actual active number of layers when preparing splits · Updated 2025-12-14 06:41:41 +00:00 · 83 behind / 1 ahead

f81c0b7fa0 · WIP · Updated 2025-12-13 17:43:17 +00:00 · 4147 behind / 4071 ahead

d82ed383ce · Fix sync logic · Updated 2025-12-13 17:39:42 +00:00 · 4147 behind / 4063 ahead

72af525c9f · Undo sync reduction · Updated 2025-12-13 15:57:07 +00:00 · 4147 behind / 4062 ahead

082545b3f0 · Do not use split mode graph scheduling if there are tensor overrides · Updated 2025-12-12 13:36:02 +00:00 · 4147 behind / 4061 ahead

50fbde85dc · Fix overflow in offset calculation in mmq · Updated 2025-12-12 13:22:02 +00:00 · 4147 behind / 4060 ahead
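
"Fix overflow in offset calculation in mmq" points at a classic bug in code addressing very large tensors: multiplying two 32-bit ints to form an offset wraps around before the result is widened. A generic sketch of the bug and the usual fix (widen an operand before multiplying) follows; the variable names and values are illustrative, not taken from the actual mmq kernels.

    #include <cstdint>
    #include <cstdio>

    int main() {
        int row   = 70000;   // illustrative: a row index in a large matrix
        int ncols = 65536;   // illustrative: elements per row

        // Bug pattern: the product is computed in 32-bit int and wraps
        // (formally, signed overflow is undefined behavior) before it
        // is stored in the wider type.
        int64_t bad = int64_t(row * ncols);

        // Fix: widen one operand first so the multiply happens in 64 bits.
        int64_t good = int64_t(row) * ncols;

        printf("bad  = %lld\ngood = %lld\n",
               (long long) bad, (long long) good);
        return 0;
    }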

643cccd2c8 · This is better · Updated 2025-12-12 06:23:39 +00:00 · 4147 behind / 4060 ahead

ca1e7070f6 · Be able to enable or disable P2P via command line argument · Updated 2025-12-11 17:46:54 +00:00 · 4147 behind / 4058 ahead

e094f32467 · Fix #1055 · Updated 2025-12-11 13:26:41 +00:00 · 4147 behind / 4057 ahead

b41b17943d · Fix the fix · Updated 2025-12-11 07:03:52 +00:00 · 4147 behind / 4054 ahead

c953b47266 · Be able to set a max. number of GPUs to be used in split mode graph · Updated 2025-12-11 06:21:42 +00:00 · 4147 behind / 4054 ahead

b37fafdc39 · Fix llama-bench - missing buffer override comparison operator · Updated 2025-12-11 06:18:45 +00:00 · 4147 behind / 4053 ahead

b0cc63bcdf · Another attempt for sm graph · Updated 2025-12-09 19:30:06 +00:00 · 97 behind / 3 ahead