diff --git a/README.md b/README.md index f4f0ecbe..78327609 100644 --- a/README.md +++ b/README.md @@ -10,8 +10,10 @@ This repository is a fork of [llama.cpp](https://github.com/ggerganov/llama.cpp) ### Model Support -LlaMA-3-Nemotron [PR 377](https://github.com/ikawrakow/ik_llama.cpp/pull/377), Qwen3 [PR 355](https://github.com/ikawrakow/ik_llama.cpp/pull/355), GLM-4 [PR 344](https://github.com/ikawrakow/ik_llama.cpp/pull/344), Command-A [PR 341](https://github.com/ikawrakow/ik_llama.cpp/pull/341), bitnet-b1.58-2B-4T [PR 337](https://github.com/ikawrakow/ik_llama.cpp/pull/337), LLaMA-4 [PR 321](https://github.com/ikawrakow/ik_llama.cpp/pull/321), Gemma3 [PR 276](https://github.com/ikawrakow/ik_llama.cpp/pull/276), DeepSeek-V3 [PR 176](https://github.com/ikawrakow/ik_llama.cpp/pull/176) - +### Model Implementations +| LlaMA-3-Nemotron | Qwen3 | GLM-4 | Command-A | bitnet-b1.58-2B-4T | LLaMA-4 | Gemma3 | DeepSeek-V3 | +|:----------------:|:-----:|:-----:|:---------:|:------------------:|:-------:|:------:|:-----------:| +[PR 377](https://github.com/ikawrakow/ik_llama.cpp/pull/377) | [PR 355](https://github.com/ikawrakow/ik_llama.cpp/pull/355) | [PR 344](https://github.com/ikawrakow/ik_llama.cpp/pull/344) | [PR 341](https://github.com/ikawrakow/ik_llama.cpp/pull/341) | [PR 337](https://github.com/ikawrakow/ik_llama.cpp/pull/337) | [PR 321](https://github.com/ikawrakow/ik_llama.cpp/pull/321) | [PR 276](https://github.com/ikawrakow/ik_llama.cpp/pull/276) | [PR 176](https://github.com/ikawrakow/ik_llama.cpp/pull/176) | ### Quantization #### Quantization additions @@ -24,9 +26,10 @@ Information and the original CUDA implementation in [PR 113](https://github.com/ Information can be found in [Discussion 8](https://github.com/ikawrakow/ik_llama.cpp/discussions/8). -Initial implementations (Zen4, AVX2, NEON): `IQ5_KS_R4` [PR 426](https://github.com/ikawrakow/ik_llama.cpp/pull/426), `IQ5_KS` [PR 422](https://github.com/ikawrakow/ik_llama.cpp/pull/422), `IQ4_KS_R4` [PR 150](https://github.com/ikawrakow/ik_llama.cpp/pull/150), `IQ5_K_R4` [PR 149](https://github.com/ikawrakow/ik_llama.cpp/pull/149), `IQ2_K_R4` [PR 146](https://github.com/ikawrakow/ik_llama.cpp/pull/146), `IQ3_K_R4` [PR 145](https://github.com/ikawrakow/ik_llama.cpp/pull/145), `IQ4_K_R4` [PR 138](https://github.com/ikawrakow/ik_llama.cpp/pull/138), `IQ4_KSS` [PR 89](https://github.com/ikawrakow/ik_llama.cpp/pull/89), `IQ2_KS` [PR 85](https://github.com/ikawrakow/ik_llama.cpp/pull/85), `IQ4_KS` [PR 83](https://github.com/ikawrakow/ik_llama.cpp/pull/83), `IQ6_K` [PR 14](https://github.com/ikawrakow/ik_llama.cpp/pull/14), `IQ2_K, IQ3_K and IQ5_K` [PR 7](https://github.com/ikawrakow/ik_llama.cpp/pull/7), `IQ4_K` [PR 6](https://github.com/ikawrakow/ik_llama.cpp/pull/6) - -Cuda implementations: `IQ4_KS_R4` and `IQ5_KS_R4` [PR 493](https://github.com/ikawrakow/ik_llama.cpp/pull/493), `IQ1_S_R4` [PR 492](https://github.com/ikawrakow/ik_llama.cpp/pull/492), `IQ1_M_R4` [PR 494](https://github.com/ikawrakow/ik_llama.cpp/pull/494). `IQ4_KS_R4` and `IQ5_KS_R4` [PR 462](https://github.com/ikawrakow/ik_llama.cpp/pull/462), `IQ2_K_R4`, `IQ3_K_R4`, `IQ4_K_R4`, `IQ5_K_R4` [PR 461](https://github.com/ikawrakow/ik_llama.cpp/pull/461), `IQ4_K, IQ5_K, IQ6_K` [PR 417](https://github.com/ikawrakow/ik_llama.cpp/pull/417), `IQ2_KS, IQ2_K, IQ3_K` [PR 418](https://github.com/ikawrakow/ik_llama.cpp/pull/417) +| | IQ2_KS | IQ2_K (R4) | IQ3_K (R4) | IQ4_KSS | IQ4_KS (R4) | IQ4_K (R4) | IQ5_KS (R4) | IQ5_K (R4) | IQ6_K | +|---------------------|:------:|:----------:|:----------:|:-------:|:-----------:|:----------:|:-----------:|:----------:|:-----:| +| CPU | [85](https://github.com/ikawrakow/ik_llama.cpp/pull/85) | [7](https://github.com/ikawrakow/ik_llama.cpp/pull/7) ([146](https://github.com/ikawrakow/ik_llama.cpp/pull/146)) | [7](https://github.com/ikawrakow/ik_llama.cpp/pull/7) ([145](https://github.com/ikawrakow/ik_llama.cpp/pull/145)) | [89](https://github.com/ikawrakow/ik_llama.cpp/pull/89) | [83](https://github.com/ikawrakow/ik_llama.cpp/pull/83) ([150](https://github.com/ikawrakow/ik_llama.cpp/pull/150)) | [6](https://github.com/ikawrakow/ik_llama.cpp/pull/6) ([138](https://github.com/ikawrakow/ik_llama.cpp/pull/138)) | [422](https://github.com/ikawrakow/ik_llama.cpp/pull/422) ([426](https://github.com/ikawrakow/ik_llama.cpp/pull/426)) | [7](https://github.com/ikawrakow/ik_llama.cpp/pull/7) ([149](https://github.com/ikawrakow/ik_llama.cpp/pull/149)) | [14](https://github.com/ikawrakow/ik_llama.cpp/pull/14) | +| CUDA | [418](https://github.com/ikawrakow/ik_llama.cpp/pull/418) | [418](https://github.com/ikawrakow/ik_llama.cpp/pull/418) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [418](https://github.com/ikawrakow/ik_llama.cpp/pull/418) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [89](https://github.com/ikawrakow/ik_llama.cpp/pull/89) | [83](https://github.com/ikawrakow/ik_llama.cpp/pull/493) ([493](https://github.com/ikawrakow/ik_llama.cpp/pull/493), [462](https://github.com/ikawrakow/ik_llama.cpp/pull/462)) | [417](https://github.com/ikawrakow/ik_llama.cpp/pull/417) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [422](https://github.com/ikawrakow/ik_llama.cpp/pull/422) ([493](https://github.com/ikawrakow/ik_llama.cpp/pull/493), [462](https://github.com/ikawrakow/ik_llama.cpp/pull/462)) | [417](https://github.com/ikawrakow/ik_llama.cpp/pull/417) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [417](https://github.com/ikawrakow/ik_llama.cpp/pull/417) | #### Quantization improvements