ik_llama.cpp

Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-02-04 13:30:47 +00:00

Files

Kawrakow c19404bcda MoE fix for R4 quants (#170)

* Fix bug in iqk_mul_mat

I recently added the possibility to have a matrix multiplication
kernel that processes 16 columns of the right matrix per iteration.
This introduced a bug that shows up when the batch size is greater
than 16, is not a multiple of 16, and the remainder is not a multiple
of the maximum number of columns processed by the regular kernels
(so it never showed up in my testing with TG-128 and PP-512).

This commit fixes the issue.

* Make sure the number of rows per thread is a multiple of 4 for MoE as well when using _r4 quants

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-01-12 13:19:14 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Be able to re-quantize MS BitNet I2_S models (#169)

2025-01-10 18:18:04 +02:00

src

MoE fix for R4 quants (#170)

2025-01-12 13:19:14 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Move to C++17 project-wide (#80)

2024-10-04 14:43:26 +03:00