mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-24 07:04:11 +00:00
Update README.md with tables
13
README.md
@@ -10,8 +10,10 @@ This repository is a fork of [llama.cpp](https://github.com/ggerganov/llama.cpp)
 
-### Model Support
 
-LlaMA-3-Nemotron [PR 377](https://github.com/ikawrakow/ik_llama.cpp/pull/377), Qwen3 [PR 355](https://github.com/ikawrakow/ik_llama.cpp/pull/355), GLM-4 [PR 344](https://github.com/ikawrakow/ik_llama.cpp/pull/344), Command-A [PR 341](https://github.com/ikawrakow/ik_llama.cpp/pull/341), bitnet-b1.58-2B-4T [PR 337](https://github.com/ikawrakow/ik_llama.cpp/pull/337), LLaMA-4 [PR 321](https://github.com/ikawrakow/ik_llama.cpp/pull/321), Gemma3 [PR 276](https://github.com/ikawrakow/ik_llama.cpp/pull/276), DeepSeek-V3 [PR 176](https://github.com/ikawrakow/ik_llama.cpp/pull/176)
 
+### Model Implementations
+| LlaMA-3-Nemotron | Qwen3 | GLM-4 | Command-A | bitnet-b1.58-2B-4T | LLaMA-4 | Gemma3 | DeepSeek-V3 |
+|:----------------:|:-----:|:-----:|:---------:|:------------------:|:-------:|:------:|:-----------:|
+[PR 377](https://github.com/ikawrakow/ik_llama.cpp/pull/377) | [PR 355](https://github.com/ikawrakow/ik_llama.cpp/pull/355) | [PR 344](https://github.com/ikawrakow/ik_llama.cpp/pull/344) | [PR 341](https://github.com/ikawrakow/ik_llama.cpp/pull/341) | [PR 337](https://github.com/ikawrakow/ik_llama.cpp/pull/337) | [PR 321](https://github.com/ikawrakow/ik_llama.cpp/pull/321) | [PR 276](https://github.com/ikawrakow/ik_llama.cpp/pull/276) | [PR 176](https://github.com/ikawrakow/ik_llama.cpp/pull/176) |
 ### Quantization
 
 #### Quantization additions
@@ -24,9 +26,10 @@ Information and the original CUDA implementation in [PR 113](https://github.com/
 
 Information can be found in [Discussion 8](https://github.com/ikawrakow/ik_llama.cpp/discussions/8).
 
-Initial implementations (Zen4, AVX2, NEON): `IQ5_KS_R4` [PR 426](https://github.com/ikawrakow/ik_llama.cpp/pull/426), `IQ5_KS` [PR 422](https://github.com/ikawrakow/ik_llama.cpp/pull/422), `IQ4_KS_R4` [PR 150](https://github.com/ikawrakow/ik_llama.cpp/pull/150), `IQ5_K_R4` [PR 149](https://github.com/ikawrakow/ik_llama.cpp/pull/149), `IQ2_K_R4` [PR 146](https://github.com/ikawrakow/ik_llama.cpp/pull/146), `IQ3_K_R4` [PR 145](https://github.com/ikawrakow/ik_llama.cpp/pull/145), `IQ4_K_R4` [PR 138](https://github.com/ikawrakow/ik_llama.cpp/pull/138), `IQ4_KSS` [PR 89](https://github.com/ikawrakow/ik_llama.cpp/pull/89), `IQ2_KS` [PR 85](https://github.com/ikawrakow/ik_llama.cpp/pull/85), `IQ4_KS` [PR 83](https://github.com/ikawrakow/ik_llama.cpp/pull/83), `IQ6_K` [PR 14](https://github.com/ikawrakow/ik_llama.cpp/pull/14), `IQ2_K, IQ3_K and IQ5_K` [PR 7](https://github.com/ikawrakow/ik_llama.cpp/pull/7), `IQ4_K` [PR 6](https://github.com/ikawrakow/ik_llama.cpp/pull/6)
-
-Cuda implementations: `IQ4_KS_R4` and `IQ5_KS_R4` [PR 493](https://github.com/ikawrakow/ik_llama.cpp/pull/493), `IQ1_S_R4` [PR 492](https://github.com/ikawrakow/ik_llama.cpp/pull/492), `IQ1_M_R4` [PR 494](https://github.com/ikawrakow/ik_llama.cpp/pull/494). `IQ4_KS_R4` and `IQ5_KS_R4` [PR 462](https://github.com/ikawrakow/ik_llama.cpp/pull/462), `IQ2_K_R4`, `IQ3_K_R4`, `IQ4_K_R4`, `IQ5_K_R4` [PR 461](https://github.com/ikawrakow/ik_llama.cpp/pull/461), `IQ4_K, IQ5_K, IQ6_K` [PR 417](https://github.com/ikawrakow/ik_llama.cpp/pull/417), `IQ2_KS, IQ2_K, IQ3_K` [PR 418](https://github.com/ikawrakow/ik_llama.cpp/pull/417)
+| | IQ2_KS | IQ2_K (R4) | IQ3_K (R4) | IQ4_KSS | IQ4_KS (R4) | IQ4_K (R4) | IQ5_KS (R4) | IQ5_K (R4) | IQ6_K |
+|---------------------|:------:|:----------:|:----------:|:-------:|:-----------:|:----------:|:-----------:|:----------:|:-----:|
+| CPU | [85](https://github.com/ikawrakow/ik_llama.cpp/pull/85) | [7](https://github.com/ikawrakow/ik_llama.cpp/pull/7) ([146](https://github.com/ikawrakow/ik_llama.cpp/pull/146)) | [7](https://github.com/ikawrakow/ik_llama.cpp/pull/7) ([145](https://github.com/ikawrakow/ik_llama.cpp/pull/145)) | [89](https://github.com/ikawrakow/ik_llama.cpp/pull/89) | [83](https://github.com/ikawrakow/ik_llama.cpp/pull/83) ([150](https://github.com/ikawrakow/ik_llama.cpp/pull/150)) | [6](https://github.com/ikawrakow/ik_llama.cpp/pull/6) ([138](https://github.com/ikawrakow/ik_llama.cpp/pull/138)) | [422](https://github.com/ikawrakow/ik_llama.cpp/pull/422) ([426](https://github.com/ikawrakow/ik_llama.cpp/pull/426)) | [7](https://github.com/ikawrakow/ik_llama.cpp/pull/7) ([149](https://github.com/ikawrakow/ik_llama.cpp/pull/149)) | [14](https://github.com/ikawrakow/ik_llama.cpp/pull/14) |
+| CUDA | [418](https://github.com/ikawrakow/ik_llama.cpp/pull/418) | [418](https://github.com/ikawrakow/ik_llama.cpp/pull/418) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [418](https://github.com/ikawrakow/ik_llama.cpp/pull/418) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [89](https://github.com/ikawrakow/ik_llama.cpp/pull/89) | [83](https://github.com/ikawrakow/ik_llama.cpp/pull/493) ([493](https://github.com/ikawrakow/ik_llama.cpp/pull/493), [462](https://github.com/ikawrakow/ik_llama.cpp/pull/462)) | [417](https://github.com/ikawrakow/ik_llama.cpp/pull/417) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [422](https://github.com/ikawrakow/ik_llama.cpp/pull/422) ([493](https://github.com/ikawrakow/ik_llama.cpp/pull/493), [462](https://github.com/ikawrakow/ik_llama.cpp/pull/462)) | [417](https://github.com/ikawrakow/ik_llama.cpp/pull/417) ([461](https://github.com/ikawrakow/ik_llama.cpp/pull/461)) | [417](https://github.com/ikawrakow/ik_llama.cpp/pull/417) |
 
 #### Quantization improvements
 