Default Branch

3a945af45d · Faster prompt processing on CUDA (#1687) · Updated 2026-04-25 07:05:23 +00:00

Branches

b147e31f5a · Reduce memory usage for FlashMLA-2 · Updated 2025-03-17 13:00:26 +00:00    ikawrakow

4437
3601

f2fb15de77 · Fix CUDA · Updated 2025-03-16 05:40:18 +00:00    ikawrakow

4437
3596

765c03d09b · FlashMLA-2: slightly smaller compute buffer size · Updated 2025-03-12 13:06:31 +00:00    ikawrakow

4437
3590

50bbc3f335 · FlashMLA(CUDA) - allow q8_0 for KV cache · Updated 2025-03-11 16:41:39 +00:00    ikawrakow

4437
3590

e0eebfd8ad · Try using fp32 for FlashMLA · Updated 2025-03-10 17:07:53 +00:00    ikawrakow

4437
3587

56921ccd49 · imatrix: wv_b <-> wkv_b · Updated 2025-03-10 13:31:22 +00:00    ikawrakow

4437
3589

cfec33848f · Guard against numerical precision issues for MLA on CUDA · Updated 2025-03-09 16:24:15 +00:00    ikawrakow

4437
3588

1a6712c0ca · This works on CUDA, but · Updated 2025-03-08 17:41:35 +00:00    ikawrakow

4437
3584

8fe22695ee · FlashMLA-2: on the CPU it now works also with q8_KV · Updated 2025-03-08 11:42:41 +00:00    ikawrakow

4437
3585

d29f8d3d40 · Add the --custom-q option to the help · Updated 2025-03-06 16:38:32 +00:00    ikawrakow

4437
3582

862b84bb28 · Cleanup · Updated 2025-03-06 12:59:14 +00:00    ikawrakow

4437
3583

c5a9bd4bf9 · CUDA FA with Dk != Dv: it works now for DeepSeek · Updated 2025-03-04 08:07:09 +00:00    ikawrakow

4437
3585

560c6ec7db · FlashMLA: that should be it for now · Updated 2025-03-03 07:20:30 +00:00    ikawrakow

4437
3585

8e612d50c1 · Add ser option to llama-bench · Updated 2025-03-01 11:25:04 +00:00    ikawrakow

4437
3579

23e080f576 · A better way to measure the cost of ggml_barrier · Updated 2025-03-01 06:28:49 +00:00    ikawrakow

4437
3577

84853b9a9b · Better concat for contiguous tensors · Updated 2025-02-28 17:32:36 +00:00    ikawrakow

4437
3579

407ca33b2a · Option to use MLA without a transposed cache · Updated 2025-02-27 08:28:54 +00:00    ikawrakow

4437
3575

a107d9664c · Cleanup · Updated 2025-02-27 06:31:29 +00:00    ikawrakow

4437
3578

78b407122f · Slightly better · Updated 2025-02-26 10:10:37 +00:00    ikawrakow

4437
3575

655981cced · Add more timing info · Updated 2025-02-25 09:54:59 +00:00    ikawrakow

4437
3576