ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-11 08:30:19 +00:00

Kawrakow d239dabcc6 Graph parallel for Qwen-3.5-MoE (#1347)

* Graph parallel for Qwen3.5-MoE

* Add --max-gpu to llama-bench

* Fix graph reuse when not all GPUs participate in self-attention

2026-03-02 07:48:43 +01:00

baby-llama

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

batched

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

batched-bench

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

batched.swift

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

benchmark

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

convert-llama2c-to-ggml

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

cvector-generator

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

deprecation-warning

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

embedding

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

eval-callback

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

export-lora

Merge vulkan code from mainline up to commit of 6/28/2025 (#563)

2025-07-02 08:49:42 +02:00

gbnf-validator

llama : add token matching support to llama-grammar (#1220)

2026-02-03 07:57:17 +02:00

gguf

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

gguf-hash

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

gguf-split

Llama-quantize: Partial requant feature (#1313)

2026-02-25 07:25:15 +01:00

gritlm

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

imatrix

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

infill

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

jeopardy

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

llama-bench

Graph parallel for Qwen-3.5-MoE (#1347)

2026-03-02 07:48:43 +01:00

llama.android

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llama.swiftui

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llava

add dry sampler (#513)

2025-06-19 10:24:53 +03:00

lookahead

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

lookup

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

main

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

main-cmake-pkg

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

mtmd

Make vision work with Qwen-3.5 models (#1345)

2026-03-01 17:44:37 +01:00

parallel

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

passkey

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

perplexity

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

quantize

Llama-quantize: Partial requant feature (#1313)

2026-02-25 07:25:15 +01:00

quantize-stats

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

retrieval

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

rpc

Refactor chat and server file (#1062)

2025-12-15 08:27:20 +01:00

save-load-state

server: enable checkpoint for recurrent models (#1310)

2026-02-26 06:51:18 +01:00

server

server: add checkpoint tolerance and fix grammar_trigger init (#1346)

2026-03-02 07:45:32 +01:00

simple

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

speculative

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

sweep-bench

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

sycl

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

tokenize

spec : add self speculative decoding, ngram and refactor (#1261)

2026-02-13 19:04:55 +01:00

base-translate.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat-13B.bat

Create chat-13B.bat (#592)

2023-03-29 20:21:09 +03:00

chat-13B.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat-persistent.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat-vicuna.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

chat.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

CMakeLists.txt

Port mdmd from mainline + Qwen2/2.5-VL support (#798)

2025-09-27 08:45:29 +02:00

convert_legacy_llama.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

json_schema_pydantic_example.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

json_schema_to_grammar.py

Update grammar (#1023)

2025-11-30 18:45:38 +01:00

llama.vim

llama.vim : added api key support (#5090)

2024-01-23 08:51:27 +02:00

llm.vim

llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879)

2023-08-30 09:50:55 +03:00

Miku.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

pydantic_models_to_grammar_examples.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

pydantic_models_to_grammar.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

reason-act.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

regex_to_grammar.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

server_embd.py

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

server-llama2-13B.sh

build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)

2024-06-13 00:41:52 +01:00

ts-type-to-grammar.sh

JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)

2024-04-12 19:43:38 +01:00