mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-20 13:14:09 +00:00
MoE fix for R4 quants (#170)
* Fix bug in iqk_mul_mat I recently added the possibility to have a matrix multiplication kernel that processes 16 columns in the right matrix per iteration. This introduced a bug that shows up when batch size is greater than 16, is not a multiple of 16, and the remainder is not a multiple of the maximum columns being processed by the regular kernels (and so, never showed up in my testing using TG-128 and PP-512). This commit fixes the issue. * Make sure rows per thread is a multiple of 4 also for MoE when using _r4 quants --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This commit is contained in:
@@ -139,6 +139,8 @@ int main(int argc, char ** argv) {
|
||||
const int n_ctx_req = is_pp_shared ? pp + pl*tg : pl*(pp + tg);
|
||||
|
||||
if (n_ctx_req > n_kv_max) {
|
||||
printf("n_ctx_req = %d is greater than n_kv_max = %d for pp = %d, tg = %d, pl = %d\n",
|
||||
n_ctx_req, n_kv_max, pp, tg, pl);
|
||||
continue;
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user