ik_llama.cpp/src at 4daff01b3902776488e06cbf8f86597d708cb928 - ik_llama.cpp - Public git mirror

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-13 15:30:03 +00:00

Kawrakow 4daff01b39 Refactor file llama.cpp (#823)

* llama_model and llama_hparams

* llama_build_context

Surprisingly small reduction in llama.cpp compile time given
the reduction in LOCs (22k -> 14k)

* LLM_TN

llama.cpp compilation: 50 s -> 33 s

* llama_quantize

* arch names

* All graph building is now in llm-build-context.cpp

* hparams loading

llama.cpp is now just 9300 LOC, but still takes 32 seconds to compile.

* We are now at 6 seconds to build the src folder

* load -> create

We are not actually loading the tensors, but just creating them.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-10-11 11:35:20 +03:00

File                     Last commit                                                 Date
CMakeLists.txt           Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-arch.cpp           Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-arch.h             Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-build-context.cpp  Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-build-context.h    Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-context.h          Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-cparams.h          Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-grammar.cpp        Tool calls support from mainline (#723)                     2025-09-01 08:38:49 +03:00
llama-grammar.h          Tool calls support from mainline (#723)                     2025-09-01 08:38:49 +03:00
llama-hparams.cpp        Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-hparams.h          Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-impl.h             Remove double definition of LLAMA_LOG_DEBUG                 2025-09-01 08:42:04 +03:00
llama-load-tensors.cpp   Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-mmap.cpp           Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00
llama-mmap.h             Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00
llama-model-loader.cpp   Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-model-loader.h     Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-model.cpp          Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-model.h            Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-quantize.cpp       Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-sampling.cpp       Tool calls support from mainline (#723)                     2025-09-01 08:38:49 +03:00
llama-sampling.h         add dry sampler (#513)                                      2025-06-19 10:24:53 +03:00
llama-vocab.cpp          Port mtmd from mainline + Qwen2/2.5-VL support (#798)       2025-09-27 08:45:29 +02:00
llama-vocab.h            model : add grok-2 support (#782)                           2025-09-23 16:31:01 +02:00
llama.cpp                Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
unicode-data.cpp         Merge mainline llama.cpp (#3)                               2024-07-27 07:55:01 +02:00
unicode-data.h           Merge mainline llama.cpp (#3)                               2024-07-27 07:55:01 +02:00
unicode.cpp              Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00
unicode.h                Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00