ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-24 15:14:10 +00:00

Files

Kawrakow 9e1d14f9c3 WIP GLM4.5 - this works

PP is already better than split mode layer, but TG for zero context
is kind of low - 60 vs 92 t/s. TG becomes better than split mode layer
at around 20k tokens. PP at 26k tokens is 1.55X of sm layer.

2025-11-30 18:05:13 +00:00

CMakeLists.txt

Enable and clean up compiler warnings in src (#824)

2025-10-11 16:01:13 +03:00

llama-arch.cpp

Support GigaChat3 (#995)

2025-11-24 06:55:14 +01:00

llama-arch.h

Support GigaChat3 (#995)

2025-11-24 06:55:14 +01:00

llama-build-context.cpp

WIP GLM4.5 - this works

2025-11-30 18:05:13 +00:00

llama-build-context.h

WIP GLM4.5 - runs with wrong results

2025-11-30 18:05:13 +00:00

llama-context.h

WIP

2025-11-30 18:05:12 +00:00

llama-cparams.h

Graph reuse (#947)

2025-11-14 06:58:19 +02:00

llama-grammar.cpp

Update grammar (#1023)

2025-11-30 18:45:38 +01:00

llama-grammar.h

Update grammar (#1023)

2025-11-30 18:45:38 +01:00

llama-hparams.cpp

Update mtmd to improve accuracy of M-RoPE (#993)

2025-11-29 07:27:15 +01:00

llama-hparams.h

Update mtmd to improve accuracy of M-RoPE (#993)

2025-11-29 07:27:15 +01:00

llama-impl.h

WIP

2025-11-30 18:05:12 +00:00

llama-load-tensors.cpp

WIP GLM4.5 - this works

2025-11-30 18:05:13 +00:00

llama-mmap.cpp

Enable CUDA graphs for MoE models + GPT-OSS support (#689)

2025-08-15 09:18:07 +03:00

llama-mmap.h

Enable CUDA graphs for MoE models + GPT-OSS support (#689)

2025-08-15 09:18:07 +03:00

llama-model-loader.cpp

CUDA: set compute parameters via command line arguments (#910)

2025-11-07 07:11:23 +02:00

llama-model-loader.h

Merge Q, K, V (#878)

2025-10-30 10:49:48 +02:00

llama-model.cpp

Allow distinct output tensor for Gemma models (#969)

2025-11-16 12:12:41 +02:00

llama-model.h

WIP GLM4.5 - runs with wrong results

2025-11-30 18:05:13 +00:00

llama-quantize.cpp

Fix requantizing from row-interleaved quants (#992)

2025-11-20 11:50:09 +01:00

llama-sampling.cpp

Update grammar (#1023)

2025-11-30 18:45:38 +01:00

llama-sampling.h

add dry sampler (#513)

2025-06-19 10:24:53 +03:00

llama-vocab.cpp

Update mtmd to improve accuracy of M-RoPE (#993)

2025-11-29 07:27:15 +01:00

llama-vocab.h

Update mtmd to improve accuracy of M-RoPE (#993)

2025-11-29 07:27:15 +01:00

llama.cpp

Rename split mode "row" to split mode "graph"

2025-11-30 18:05:13 +00:00

unicode-data.cpp

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

unicode-data.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

unicode.cpp

Enable CUDA graphs for MoE models + GPT-OSS support (#689)

2025-08-15 09:18:07 +03:00

unicode.h

Enable CUDA graphs for MoE models + GPT-OSS support (#689)

2025-08-15 09:18:07 +03:00