ik_llama.cpp/src
firecoperana 49979ba9e9 llama: enable K-shift for quantized KV cache for cuda (#760)
cuda: add q8_0->f32 cpy operation (#9571)
The K-shift will fail on unsupported backends or quantization types.

Co-authored-by: Ivan <nekotekina@gmail.com>
2025-09-05 11:54:18 +02:00
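The q8_0->f32 copy is what lets the K cache be dequantized, RoPE-shifted, and written back when the cache is stored quantized. Below is a minimal sketch of such a dequantizing copy kernel, assuming ggml's q8_0 block layout (one f16 scale followed by 32 signed 8-bit quants); the kernel name, launch geometry, and the restriction to contiguous tensors are illustrative assumptions, not the actual ggml-cuda implementation.

#include <cuda_fp16.h>
#include <stdint.h>

#define QK8_0 32  // values per q8_0 block (ggml convention)

// q8_0 block: per-block f16 scale plus 32 signed 8-bit quants.
typedef struct {
    half   d;           // scale
    int8_t qs[QK8_0];   // quantized values
} block_q8_0;

// Illustrative sketch: each thread dequantizes one element as x = d * q.
// The real ggml-cuda cpy kernels differ in naming, launch geometry, and
// support for non-contiguous tensors.
__global__ void cpy_q8_0_to_f32(const block_q8_0 *src, float *dst, int64_t n) {
    const int64_t i = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const block_q8_0 *b = &src[i / QK8_0];
    dst[i] = __half2float(b->d) * (float)b->qs[i % QK8_0];
}

A host-side launch for n contiguous elements would look like cpy_q8_0_to_f32<<<(n + 255) / 256, 256>>>(src, dst, n); in the library itself this path is reached through the generic copy dispatch rather than called directly.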