mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 17:20:01 +00:00
* Fix q5_0_r4

  The issue was in the tail part. Since almost all models have tensor rows that are a multiple of 128, that part was never triggered in testing. But the gpt-oss models have an embedding size of 2880, so we end up there and trigger the bug.

* Fix q6_0_r4

  Same fix as q5_0_r4

* Fix q4_0_r8

* Fix q5_0_r4 and q6_0_r4 also on Zen4

* Fix q4_0_r8 also on Zen4

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>