Default Branch

30381fc1fc · Faster hybrid inference when shared experts (#1191) · Updated 2026-01-26 05:22:05 +00:00

Branches

de818b77d6 · iq3_k, iq5_k: faster quantization · Updated 2024-08-05 05:13:53 +00:00    ikawrakow

30c002d22d · iq4_k: speedup quantization by a factor of ~2 · Updated 2024-08-03 16:32:43 +00:00    ikawrakow

7b3b413fe0 · Add copyright notice · Updated 2024-07-31 13:06:32 +00:00    ikawrakow

b29f64ea70 · iq4_k: scalar dot product · Updated 2024-07-28 10:09:28 +00:00    ikawrakow

473e280500 · Fusing a mat mul op followed by scale op on the CPU · Updated 2024-07-27 07:45:56 +00:00    ikawrakow

573e5007cd · Remove check · Updated 2024-07-26 15:00:26 +00:00    ikawrakow

ccdb948329 · Offload Bitnet token embeddings to the GPU - the right way · Updated 2024-07-26 10:50:41 +00:00    ikawrakow

db6b0f6dab · Update README with the new CUDA/Metal performance · Updated 2024-07-26 07:06:22 +00:00    ikawrakow

86d94862ae · iqk_soft_max · Updated 2024-07-22 14:34:42 +00:00    ikawrakow

7024ecfeb4 · iq1bn: faster AVX2 · Updated 2024-07-17 07:17:05 +00:00    ikawrakow
