🔀 #616 - Adding IQ1_KT - 1.75 bpw SOTA quants
| Author | ikawrakow |
|---|---|
| State | ✅ Open |
| Created | 2025-07-16 |
| Updated | 2025-07-19 |
Description
With Kimi-K2 at 1 trillion parameters being the new rage of the day, my guess is that even more local inference enthusiasts will reach for very low bit-per-weight (bpw) quantized models. The state of affairs in mainline llama.cpp for very low bpw quants is not good:
- Nothing has been done to improve quantization quality since I contributed IQ1_S and IQ1_M to mainline.
- IQ1_M does not even have a CUDA quantized matrix multiplication kernel (a.k.a. MMQ), which results in disastrous prompt processing (PP) performance.

The situation is better in ik_llama.cpp performance-wise, but quantization quality improvements for the sub-2 bpw quants have been relatively minor.
Hence, this PR adds IQ1_KT, a 1.75 bpw quantization type based on an integer trellis similar to IQ2_KT, IQ3_KT and IQ4_KT. IQ1_KT uses
- Per tensor row float scales
- Blocks of 32 weights with 4-bit block scales
- Groups of 8 quants per trellis sequence, each group requiring 13 bits.
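The block layout above implies the nominal 1.75 bpw directly; a quick arithmetic check (treating the per-row scale as fp32 is my assumption, not stated above):

```python
# Nominal IQ1_KT bit budget per block of 32 weights:
# 4 groups of 8 quants, 13 bits per trellis group, plus one 4-bit block scale.
bits_per_block = 4 * 13 + 4          # 56 bits
bpw_blocks = bits_per_block / 32     # 1.75 bpw, the headline number

# The per-row float scale (assumed fp32 here) adds a small overhead;
# for a typical row size of 2048 elements:
row_size = 2048
bpw_total = bpw_blocks + 32 / row_size   # 1.765625 bpw

print(bpw_blocks, bpw_total)
```
This is why file sizes come out slightly above 1.75 bpw for the tensors quantized with IQ1_KT.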
Similar to the other *_KT quants:
- Performance is excellent on CUDA for PP and TG
- PP performance is excellent on AVX2/AVX512 and ARM_NEON
- TG performance is somewhat lower (~10-15%) than other quantization types of similar size
- TG performance is bad on ARM_NEON
As trellis quants performance is very low on Metal (at least for my 30-core M2-Max GPU), I didn't even bother to add a Metal implementation.
To illustrate the quantization quality compared to other quantization types, the next graph shows PPL(Q)/PPL(f16)-1 for LLaMA-3.1-8B-Instruct, which is notoriously hard to quantize. I have excluded the IQ1_M and IQ1_S data points as this would have extended the y-axis too much to be useful. We can see that IQ1_KT at 1.92 bpw provides nearly the same quality as IQ2_XXS at 2.13 bpw, so almost a 10% reduction in model size for comparable quantization quality. I have made the IQ2_KL data point magenta because it was also added very recently in PR #602.
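For reference, the quality metric plotted is simply the relative perplexity increase over the f16 baseline; with purely illustrative (not measured) numbers:

```python
def quant_error(ppl_q: float, ppl_f16: float) -> float:
    """Relative quantization error PPL(Q)/PPL(f16) - 1, as used on the y-axis."""
    return ppl_q / ppl_f16 - 1.0

# Hypothetical values for illustration only:
print(quant_error(ppl_q=8.2, ppl_f16=7.3))  # ~0.123, i.e. a ~12.3% PPL increase
```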
💬 Conversation
👤 ubergarm commented the 2025-07-16 at 15:50:24:
With Kimi-K2 at 1 trillion parameters being the new rage of the day, my guess is that even more local inference enthusiasts will reach for very low bit-per-weight (bpw) quantized models.
Indeed, people are asking me for sub 2bpw quants of Kimi-K2 already: https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/1#6876f91f7cf1ec76dfc9fa9e
I'm out of the office for a day or so, but will leave this IQ1_KT Kimi-K2 cooking with this recipe and see how it goes. Normally I leave ffn_down_exps slightly larger, but to get the size down I'm gonna bonk all the routed exps down to 1.75bpw.
Guessing it will finish up around ~230GiB or so, still too large to fully offload on dual RTX 6000 PRO Blackwells haha...
👈 Secret Recipe
#!/usr/bin/env bash
custom="
## Attention [0-60] (GPU)
# Only ik's fork uses this; keep it q8_0 as it's only used for PP with -mla 3
blk\..*\.attn_kv_b\.weight=q8_0
# ideally k_b and v_b are smaller than q8_0 as they are used for TG with -mla 3 (and ik's imatrix supports it)
# blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0 or iq4_nl
blk\..*\.attn_k_b\.weight=iq4_nl
# Balance of attn tensors
blk\..*\.attn_.*=iq4_kt
## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=iq4_kt
blk\..*\.ffn_(gate|up)\.weight=iq3_kt
## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=iq4_kt
blk\..*\.ffn_(gate|up)_shexp\.weight=iq3_kt
## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=iq1_kt
blk\..*\.ffn_(gate|up)_exps\.weight=iq1_kt
## Token embedding and output tensors (GPU)
token_embd\.weight=iq4_kt
output\.weight=iq5_ks
"
custom=$(
echo "$custom" | grep -v '^#' | \
sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)
numactl -N 1 -m 1 \
./build/bin/llama-quantize \
--custom-q "$custom" \
--imatrix /mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/imatrix-Kimi-K2-Instruct-Q8_0.dat \
/mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/Kimi-K2-384x15B-Instruct-safetensors-BF16-00001-of-00045.gguf \
/mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/Kimi-K2-Instruct-IQ1_KT.gguf \
IQ1_KT \
192
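For anyone adapting the recipe: the --custom-q rules are regex patterns tried against each tensor name, and (as I understand the implementation; treat this as an assumption) the first matching rule wins, which is why the specific attn_kv_b / attn_k_b rules must precede the catch-all blk\..*\.attn_.* rule. A toy model of that resolution order:

```python
import re

# A hypothetical subset of the rules above, in recipe order: (pattern, quant type).
rules = [
    (r"blk\..*\.attn_kv_b\.weight", "q8_0"),
    (r"blk\..*\.attn_k_b\.weight", "iq4_nl"),
    (r"blk\..*\.attn_.*", "iq4_kt"),
]

def pick_quant(tensor_name: str, default: str = "iq1_kt") -> str:
    # First matching rule wins (assumed to mirror llama-quantize --custom-q).
    for pattern, qtype in rules:
        if re.fullmatch(pattern, tensor_name):
            return qtype
    return default

print(pick_quant("blk.5.attn_kv_b.weight"))    # q8_0: specific rule, not the catch-all
print(pick_quant("blk.5.attn_output.weight"))  # iq4_kt: the catch-all
```
Reversing the order (catch-all first) would silently send attn_kv_b to iq4_kt instead of q8_0.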
👤 ikawrakow commented the 2025-07-16 at 19:26:04:
@Nexesenex Thanks! Added the forgotten file.
👤 Nexesenex commented the 2025-07-16 at 21:36:24:
@ikawrakow : Thanks!
constants.py could be updated as well, I guess.
👤 ubergarm commented the 2025-07-17 at 00:39:25:
Cooked a slightly larger version just for comparison. Same recipe as above, except larger iq2_kt for ffn_down_exps, so it's more like my "normal" recipes.
llm_load_print_meta: model params = 1.027 T
llm_load_print_meta: model size = 228.948 GiB (1.915 BPW)
llm_load_print_meta: repeating layers = 227.682 GiB (1.909 BPW, 1024.571 B parameters)
llm_load_print_meta: general.name = Kimi K2 Instruct Bf16 Safetensors
llama_model_loader: - type f32: 365 tensors
llama_model_loader: - type q8_0: 61 tensors
llama_model_loader: - type iq4_nl: 61 tensors
llama_model_loader: - type iq5_ks: 1 tensors
llama_model_loader: - type iq2_kt: 60 tensors
llama_model_loader: - type iq3_kt: 122 tensors
llama_model_loader: - type iq4_kt: 367 tensors
llama_model_loader: - type iq1_kt: 120 tensors
llama_print_timings: load time = 80560.40 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 1917998.73 ms / 290816 tokens ( 6.60 ms per token, 151.62 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 1936434.86 ms / 290817 tokens
Final estimate: PPL = 4.1310 +/- 0.02266
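Sanity-checking the reported size against the parameter count, 228.948 GiB over 1.027 T parameters does work out to the reported ~1.915 bits per weight:

```python
size_bytes = 228.948 * 2**30   # "model size" from llm_load_print_meta, GiB -> bytes
params = 1.027e12              # "model params" from llm_load_print_meta
bpw = size_bytes * 8 / params
print(f"{bpw:.3f}")            # ~1.915
```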
👤 magikRUKKOLA commented the 2025-07-19 at 01:30:36:
@ubergarm
Ok, I will retest the UD-IQ3_XXS.
Well, yeah, I retested the UD-IQ3_XXS from unsloth with the default settings and the results are below.
Final estimate: PPL = 3.1467 +/- 0.01596
It's possible I messed up the initial calculation due to a non-default perplexity config. So my initial value of 3.1382 seems to be incorrect. Thanks for letting me know!
export MALLOC_CONF="background_thread:true,percpu_arena:phycpu,metadata_thp:auto,dirty_decay_ms:10000,muzzy_decay_ms:60000"
export LD_PRELOAD=/usr/local/lib/libjemalloc.so
CUDA_VISIBLE_DEVICES="0,1" \
/opt/ik_llama.cpp/ik_llama.cpp/build/bin/llama-perplexity \
-f /opt/ik_llama.cpp/wiki.test.raw \
--model /opt/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00001-of-00009.gguf \
--alias unsloth/Kimi-K2-Instruct-UD-IQ3_XXS \
--ctx-size $((512)) \
-ub $((512)) \
-ctk q8_0 \
-mla 3 -fa \
-amb 512 \
-fmoe \
--n-gpu-layers 99 \
--override-tensor exps=CPU \
--parallel 1 \
--threads $(grep ^cpu\\scores /proc/cpuinfo | uniq | awk '{print $4}' | xargs -I{} echo "{}-0" | bc) \
--host 0.0.0.0 \
--port 8080 \
--lookup-cache-dynamic /mnt/data/ik_llama.kv.dump
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
main: build = 3808 (38012f72)
main: built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
main: seed = 1752881437
llama_model_loader: additional 8 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 62 key-value pairs and 1096 tensors from /opt/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00001-of-00009.gguf (version GGUF V3 (latest))
...
*** Your prompt processing speed will be crippled ***
Consider making your own ik_llama.cpp compatible model or
ask the model provider to make one for you,
==========================================================================
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.0607 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = deepseek2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 163840
llm_load_print_meta: n_merges = 163328
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 7168
llm_load_print_meta: n_layer = 61
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 64
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 192
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 12288
llm_load_print_meta: n_embd_v_gqa = 8192
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 18432
llm_load_print_meta: n_expert = 384
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = yarn
llm_load_print_meta: freq_base_train = 50000.0
llm_load_print_meta: freq_scale_train = 0.03125
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 671B
llm_load_print_meta: model ftype = IQ3_XXS - 3.0625 bpw
llm_load_print_meta: model params = 1.026 T
llm_load_print_meta: model size = 388.003 GiB (3.247 BPW)
llm_load_print_meta: repeating layers = 386.491 GiB (3.242 BPW, 1024.059 B parameters)
llm_load_print_meta: general.name = Kimi-K2-Instruct
llm_load_print_meta: BOS token = 163584 '[BOS]'
llm_load_print_meta: EOS token = 163586 '<|im_end|>'
llm_load_print_meta: PAD token = 163839 '[PAD]'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 163586 '<|im_end|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: n_layer_dense_lead = 1
llm_load_print_meta: n_lora_q = 1536
llm_load_print_meta: n_lora_kv = 512
llm_load_print_meta: n_ff_exp = 2048
llm_load_print_meta: n_expert_shared = 1
llm_load_print_meta: expert_weights_scale = 2.8
llm_load_print_meta: expert_weights_norm = 1
llm_load_print_meta: expert_gating_func = sigmoid
llm_load_print_meta: rope_yarn_log_mul = 0.1000
llm_load_tensors: ggml ctx size = 1.35 MiB
...
llm_load_tensors: offloading 61 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 62/62 layers to GPU
llm_load_tensors: CPU buffer size = 44823.65 MiB
llm_load_tensors: CPU buffer size = 47456.06 MiB
llm_load_tensors: CPU buffer size = 45899.98 MiB
llm_load_tensors: CPU buffer size = 46406.32 MiB
llm_load_tensors: CPU buffer size = 45897.95 MiB
llm_load_tensors: CPU buffer size = 45899.09 MiB
llm_load_tensors: CPU buffer size = 45903.13 MiB
llm_load_tensors: CPU buffer size = 46126.73 MiB
llm_load_tensors: CPU buffer size = 26822.94 MiB
llm_load_tensors: CPU buffer size = 630.00 MiB
llm_load_tensors: CUDA0 buffer size = 2998.56 MiB
llm_load_tensors: CUDA1 buffer size = 3632.72 MiB
....................................................................................................
============ llm_prepare_mla: need to compute 61 wkv_b tensors
...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: mla_attn = 3
llama_new_context_with_model: attn_max_b = 512
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 50000.0
llama_new_context_with_model: freq_scale = 0.03125
llama_kv_cache_init: CUDA0 KV buffer size = 37.07 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 35.87 MiB
llama_new_context_with_model: KV self size = 72.91 MiB, c^KV (q8_0): 72.91 MiB, kv^T: not used
llama_new_context_with_model: CUDA_Host output buffer size = 2.50 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=1)
llama_new_context_with_model: CUDA0 compute buffer size = 263.00 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 334.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 162.01 MiB
llama_new_context_with_model: graph nodes = 3586
llama_new_context_with_model: graph splits = 123
system_info: n_threads = 64 / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 910.573 ms
perplexity: calculating perplexity over 568 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 47.59 seconds per pass - ETA 1 hours 52.62 minutes
[1]2.4402,[2]3.2625,[3]2.7728,[4]2.7844,[5]2.4434,[6]2.2209, ... ,[565]3.1484,[566]3.1510,[567]3.1492,[568]3.1467,
Final estimate: PPL = 3.1467 +/- 0.01596
llama_print_timings: load time = 126687.38 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 6458901.47 ms / 290816 tokens ( 22.21 ms per token, 45.03 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 6468857.58 ms / 290817 tokens
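The tiny KV self size is the point of -mla 3: only the compressed c^KV (kv_lora_rank 512 plus the 64 rotary dims per token per layer, per the metadata above) is cached. A back-of-envelope check, assuming q8_0 costs 8.5 bits per element:

```python
n_ctx, n_layer = 2048, 61      # from llama_new_context_with_model / n_layer above
c_kv_dims = 512 + 64           # kv_lora_rank + rope dims
bytes_per_elem = 8.5 / 8       # q8_0: 8 bits per weight + block-scale overhead
kv_bytes = n_ctx * n_layer * c_kv_dims * bytes_per_elem
print(round(kv_bytes / 2**20, 2))  # 72.91, matching "KV self size = 72.91 MiB"
```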
👤 ThomasBaruzier commented the 2025-07-19 at 15:59:44:
Thanks Iwan and ubergarm for the amazing work! You two motivated me to try Kimi on my "mere" 128GB + 3x3090 rig.
@ubergarm, I tried using your imatrix and script to test this new quant, and I have a few questions if you don’t mind.
Here’s the script I use - basically your recipe but with blk\..*\.ffn_(gate|up)_exps\.weight at iq1_s_r4.
Script
#!/bin/bash
set -e
imatrix='/home/user/storage/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-Q8_0.imatrix'
input='/home/user/storage/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-Q8_0.gguf'
output='/home/user/nvme/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-IQ1_S.gguf'
custom="
## Attention [0-60] (GPU)
# Only ik's fork uses this; keep it q8_0 as it's only used for PP with -mla 3
blk\..*\.attn_kv_b\.weight=q8_0
# ideally k_b and v_b are smaller than q8_0 as they are used for TG with -mla 3 (and ik's imatrix supports it)
# blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0 or iq4_nl
blk\..*\.attn_k_b\.weight=iq4_nl
# Balance of attn tensors
blk\..*\.attn_.*=iq4_kt
## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=iq4_kt
blk\..*\.ffn_(gate|up)\.weight=iq3_kt
## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=iq4_kt
blk\..*\.ffn_(gate|up)_shexp\.weight=iq3_kt
## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=iq1_kt
blk\..*\.ffn_(gate|up)_exps\.weight=iq1_s_r4
## Token embedding and output tensors (GPU)
token_embd\.weight=iq4_kt
output\.weight=iq5_ks
"
if [ -f "$output" ]; then
read -p "Quant already exists: $output. Continue? (N/y): " x
[ "$x" != y ] && exit 0
rm -f "$output"
fi
get_screen() {
if [ -z "$STY" ]; then
log_path=$(readlink -f "$0")
log_path="${log_path%/*}/logs/${log_path##*/}"
log_path="${log_path%.*}.log"
screen -ls | grep -q "$screen_name" && \
echo 'Process already running.' && exit 1
echo "Launching the $screen_name screen..."
mkdir -p "${log_path%/*}"
echo '------------------------------------' >> "$log_path"
screen -mS "$screen_name" -L -Logfile "$log_path" bash "$0" "$@"
exit 0
fi
}
screen_name='ik-kimi'
get_screen "$@"
custom=$(
echo "$custom" | grep -v '^#' | \
sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)
/home/user/files/ai/llama/ik_llama.cpp/llama-quantize \
--allow-requantize \
--custom-q "$custom" \
--imatrix "$imatrix" \
"$input" "$output" \
IQ1_KT 32
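The grep/sed pipeline used in both scripts just strips comment lines and flattens the rule list into the single comma-separated string --custom-q expects; a standalone demonstration with dummy rules:

```bash
# Dummy two-rule list in the same multi-line format as the recipes above
custom="
## a comment line that grep -v '^#' will drop
token_embd\.weight=iq4_kt

output\.weight=iq5_ks
"
# sed -z treats the input as a single record, so s:\n+:,: can join the lines;
# the last two substitutions trim the stray trailing/leading commas.
custom=$(echo "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')
echo "$custom"  # token_embd\.weight=iq4_kt,output\.weight=iq5_ks
```
Note -z and -E are GNU sed extensions, so this needs GNU sed.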
- Which tensors are unnecessary for MLA 3? It seems there are a few suspicious warnings:
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to iq4_kt .. cluster_points: Oops. Cluster 4 has no points: 0 1 0 0
cluster_points: 1 out of 625 clusters dir not have any points
====== llama_model_quantize_internal: did not find weights for blk.0.attn_k_b.weight
It seems you already commented about "Oops. Cluster X has no points" in this repo, and it's apparently harmless. However, could token_embd.weight be missing because I used Q8_0 as input? Note that the Q8_0 input was made with convert_hf_to_gguf.py:
python convert_hf_to_gguf.py --outfile /home/user/storage/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-Q8_0.gguf /home/user/storage/llm/Kimi-K2-Instruct-BF16/ --outtype q8_0 --model-name Kimi-K2-Instruct --split-max-size 9999G
Full logs (so far)
Adding custom rule blk\..*\.attn_kv_b\.weight -> q8_0
Adding custom rule blk\..*\.attn_k_b\.weight -> iq4_nl
Adding custom rule blk\..*\.attn_.* -> iq4_kt
Adding custom rule blk\..*\.ffn_down\.weight -> iq4_kt
Adding custom rule blk\..*\.ffn_(gate|up)\.weight -> iq3_kt
Adding custom rule blk\..*\.ffn_down_shexp\.weight -> iq4_kt
Adding custom rule blk\..*\.ffn_(gate|up)_shexp\.weight -> iq3_kt
Adding custom rule blk\..*\.ffn_down_exps\.weight -> iq1_kt
Adding custom rule blk\..*\.ffn_(gate|up)_exps\.weight -> iq1_s_r4
Adding custom rule token_embd\.weight -> iq4_kt
Adding custom rule output\.weight -> iq5_ks
load_imatrix: imatrix dataset='ubergarm-imatrix-calibration-corpus-v02.txt'
load_imatrix: loaded 729 importance matrix entries from /home/tyra/storage/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-Q8_0.imatrix computed on 826 chunks
prepare_imatrix: have 729 importance matrix entries
main: build = 3818 (77eaa532)
main: built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
main: quantizing '/home/tyra/storage/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-Q8_0.gguf' to '/home/tyra/nvme/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-IQ1_S.gguf' as IQ1_KT using 32 threads
llama_model_loader: loaded meta data with 50 key-value pairs and 1157 tensors from /home/tyra/storage/gguf/Kimi-K2-Instruct/Kimi-K2-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Kimi-K2-Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Kimi-K2
llama_model_loader: - kv 5: general.size_label str = 384x15B
llama_model_loader: - kv 6: general.license str = other
llama_model_loader: - kv 7: general.license.name str = modified-mit
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = Kimi K2 Instruct
llama_model_loader: - kv 10: general.base_model.0.organization str = Moonshotai
llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/moonshotai/Kim...
llama_model_loader: - kv 12: general.tags arr[str,1] = ["unsloth"]
llama_model_loader: - kv 13: deepseek2.block_count u32 = 61
llama_model_loader: - kv 14: deepseek2.context_length u32 = 131072
llama_model_loader: - kv 15: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv 16: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv 17: deepseek2.attention.head_count u32 = 64
llama_model_loader: - kv 18: deepseek2.attention.head_count_kv u32 = 64
llama_model_loader: - kv 19: deepseek2.rope.freq_base f32 = 50000.000000
llama_model_loader: - kv 20: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 21: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 22: general.file_type u32 = 7
llama_model_loader: - kv 23: deepseek2.leading_dense_block_count u32 = 1
llama_model_loader: - kv 24: deepseek2.vocab_size u32 = 163840
llama_model_loader: - kv 25: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 26: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 27: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 28: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 29: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 30: deepseek2.expert_count u32 = 384
llama_model_loader: - kv 31: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 32: deepseek2.expert_weights_scale f32 = 2.827000
llama_model_loader: - kv 33: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 34: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 35: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 36: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 37: deepseek2.rope.scaling.factor f32 = 32.000000
llama_model_loader: - kv 38: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 39: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 40: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 41: tokenizer.ggml.pre str = kimi-k2
llama_model_loader: - kv 42: tokenizer.ggml.tokens arr[str,163840] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 43: tokenizer.ggml.token_type arr[i32,163840] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 44: tokenizer.ggml.merges arr[str,163328] = ["Ġ Ġ", "ĠĠ ĠĠ", "Ġ t", "i n",...
llama_model_loader: - kv 45: tokenizer.ggml.bos_token_id u32 = 163584
llama_model_loader: - kv 46: tokenizer.ggml.eos_token_id u32 = 163585
llama_model_loader: - kv 47: tokenizer.ggml.padding_token_id u32 = 163839
llama_model_loader: - kv 48: tokenizer.chat_template str = {%- if tools -%}\n <|im_system|>tool_...
llama_model_loader: - kv 49: general.quantization_version u32 = 2
llama_model_loader: - type f32: 365 tensors
llama_model_loader: - type q8_0: 792 tensors
================================ Have weights data with 729 entries
[ 1/1157] token_embd.weight - [ 7168, 163840, 1, 1], type = q8_0, Using custom type iq4_kt for tensor token_embd.weight
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to iq4_kt .. cluster_points: Oops. Cluster 4 has no points: 0 1 0 0
cluster_points: 1 out of 625 clusters dir not have any points
size = 1190.00 MiB -> 560.62 MiB
[ 2/1157] blk.0.attn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 3/1157] blk.0.ffn_down.weight - [18432, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.0.ffn_down.weight
converting to iq4_kt .. size = 133.88 MiB -> 63.03 MiB
[ 4/1157] blk.0.ffn_gate.weight - [ 7168, 18432, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.0.ffn_gate.weight
converting to iq3_kt .. size = 133.88 MiB -> 49.29 MiB
[ 5/1157] blk.0.ffn_up.weight - [ 7168, 18432, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.0.ffn_up.weight
converting to iq3_kt .. size = 133.88 MiB -> 49.29 MiB
[ 6/1157] blk.0.ffn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 7/1157] blk.0.attn_kv_a_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MB
[ 8/1157] blk.0.attn_kv_a_mqa.weight - [ 7168, 576, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.0.attn_kv_a_mqa.weight
converting to iq4_kt .. size = 4.18 MiB -> 1.97 MiB
[ 9/1157] blk.0.attn_kv_b.weight - [ 512, 16384, 1, 1], type = q8_0, Using custom type q8_0 for tensor blk.0.attn_kv_b.weight
size = 8.500 MB
[ 10/1157] blk.0.attn_k_b.weight - [ 128, 32768, 1, 1], type = q8_0, Using custom type iq4_nl for tensor blk.0.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.0.attn_k_b.weight
converting to iq4_nl .. size = 4.25 MiB -> 2.25 MiB
[ 11/1157] blk.0.attn_v_b.weight - [ 512, 8192, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.0.attn_v_b.weight
====== llama_model_quantize_internal: did not find weights for blk.0.attn_v_b.weight
converting to iq4_kt .. size = 4.25 MiB -> 2.03 MiB
[ 12/1157] blk.0.attn_output.weight - [ 8192, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.0.attn_output.weight
converting to iq4_kt .. size = 59.50 MiB -> 28.03 MiB
[ 13/1157] blk.0.attn_q_a_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 14/1157] blk.0.attn_q_a.weight - [ 7168, 1536, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.0.attn_q_a.weight
converting to iq4_kt .. size = 11.16 MiB -> 5.26 MiB
[ 15/1157] blk.0.attn_q_b.weight - [ 1536, 12288, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.0.attn_q_b.weight
converting to iq4_kt .. size = 19.12 MiB -> 9.05 MiB
[ 16/1157] blk.9.attn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 17/1157] blk.9.ffn_down_exps.weight - [ 2048, 7168, 384, 1], type = q8_0, Using custom type iq1_kt for tensor blk.9.ffn_down_exps.weight
converting to iq1_kt .. size = 5712.00 MiB -> 1186.50 MiB
[ 18/1157] blk.9.ffn_gate_exps.weight - [ 7168, 2048, 384, 1], type = q8_0, Using custom type iq1_s_r4 for tensor blk.9.ffn_gate_exps.weight
converting to iq1_s_r4 .. size = 5712.00 MiB -> 1009.50 MiB
[ 19/1157] blk.9.ffn_up_exps.weight - [ 7168, 2048, 384, 1], type = q8_0, Using custom type iq1_s_r4 for tensor blk.9.ffn_up_exps.weight
converting to iq1_s_r4 .. size = 5712.00 MiB -> 1009.50 MiB
[ 20/1157] blk.9.exp_probs_b.bias - [ 384, 1, 1, 1], type = f32, size = 0.001 MB
[ 21/1157] blk.9.ffn_gate_inp.weight - [ 7168, 384, 1, 1], type = f32, size = 10.500 MB
[ 22/1157] blk.9.ffn_down_shexp.weight - [ 2048, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.9.ffn_down_shexp.weight
converting to iq4_kt .. size = 14.88 MiB -> 7.03 MiB
[ 23/1157] blk.9.ffn_gate_shexp.weight - [ 7168, 2048, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.9.ffn_gate_shexp.weight
converting to iq3_kt .. size = 14.88 MiB -> 5.48 MiB
[ 24/1157] blk.9.ffn_up_shexp.weight - [ 7168, 2048, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.9.ffn_up_shexp.weight
converting to iq3_kt .. size = 14.88 MiB -> 5.48 MiB
[ 25/1157] blk.9.ffn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 26/1157] blk.9.attn_kv_a_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MB
[ 27/1157] blk.9.attn_kv_a_mqa.weight - [ 7168, 576, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.9.attn_kv_a_mqa.weight
converting to iq4_kt .. size = 4.18 MiB -> 1.97 MiB
[ 28/1157] blk.9.attn_kv_b.weight - [ 512, 16384, 1, 1], type = q8_0, Using custom type q8_0 for tensor blk.9.attn_kv_b.weight
size = 8.500 MB
[ 29/1157] blk.9.attn_k_b.weight - [ 128, 32768, 1, 1], type = q8_0, Using custom type iq4_nl for tensor blk.9.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.9.attn_k_b.weight
converting to iq4_nl .. size = 4.25 MiB -> 2.25 MiB
[ 30/1157] blk.9.attn_v_b.weight - [ 512, 8192, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.9.attn_v_b.weight
====== llama_model_quantize_internal: did not find weights for blk.9.attn_v_b.weight
converting to iq4_kt .. size = 4.25 MiB -> 2.03 MiB
[ 31/1157] blk.9.attn_output.weight - [ 8192, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.9.attn_output.weight
converting to iq4_kt .. size = 59.50 MiB -> 28.03 MiB
[ 32/1157] blk.9.attn_q_a_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 33/1157] blk.9.attn_q_a.weight - [ 7168, 1536, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.9.attn_q_a.weight
converting to iq4_kt .. size = 11.16 MiB -> 5.26 MiB
[ 34/1157] blk.9.attn_q_b.weight - [ 1536, 12288, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.9.attn_q_b.weight
converting to iq4_kt .. size = 19.12 MiB -> 9.05 MiB
[ 35/1157] blk.10.attn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 36/1157] blk.10.ffn_down_exps.weight - [ 2048, 7168, 384, 1], type = q8_0, Using custom type iq1_kt for tensor blk.10.ffn_down_exps.weight
converting to iq1_kt .. size = 5712.00 MiB -> 1186.50 MiB
[ 37/1157] blk.10.ffn_gate_exps.weight - [ 7168, 2048, 384, 1], type = q8_0, Using custom type iq1_s_r4 for tensor blk.10.ffn_gate_exps.weight
converting to iq1_s_r4 .. size = 5712.00 MiB -> 1009.50 MiB
[ 38/1157] blk.10.ffn_up_exps.weight - [ 7168, 2048, 384, 1], type = q8_0, Using custom type iq1_s_r4 for tensor blk.10.ffn_up_exps.weight
converting to iq1_s_r4 .. size = 5712.00 MiB -> 1009.50 MiB
[ 39/1157] blk.10.exp_probs_b.bias - [ 384, 1, 1, 1], type = f32, size = 0.001 MB
[ 40/1157] blk.10.ffn_gate_inp.weight - [ 7168, 384, 1, 1], type = f32, size = 10.500 MB
[ 41/1157] blk.10.ffn_down_shexp.weight - [ 2048, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.10.ffn_down_shexp.weight
converting to iq4_kt .. size = 14.88 MiB -> 7.03 MiB
[ 42/1157] blk.10.ffn_gate_shexp.weight - [ 7168, 2048, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.10.ffn_gate_shexp.weight
converting to iq3_kt .. size = 14.88 MiB -> 5.48 MiB
[ 43/1157] blk.10.ffn_up_shexp.weight - [ 7168, 2048, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.10.ffn_up_shexp.weight
converting to iq3_kt .. size = 14.88 MiB -> 5.48 MiB
[ 44/1157] blk.10.ffn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 45/1157] blk.10.attn_kv_a_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MB
[ 46/1157] blk.10.attn_kv_a_mqa.weight - [ 7168, 576, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.10.attn_kv_a_mqa.weight
converting to iq4_kt .. size = 4.18 MiB -> 1.97 MiB
[ 47/1157] blk.10.attn_kv_b.weight - [ 512, 16384, 1, 1], type = q8_0, Using custom type q8_0 for tensor blk.10.attn_kv_b.weight
size = 8.500 MB
[ 48/1157] blk.10.attn_k_b.weight - [ 128, 32768, 1, 1], type = q8_0, Using custom type iq4_nl for tensor blk.10.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.10.attn_k_b.weight
converting to iq4_nl .. size = 4.25 MiB -> 2.25 MiB
[ 49/1157] blk.10.attn_v_b.weight - [ 512, 8192, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.10.attn_v_b.weight
====== llama_model_quantize_internal: did not find weights for blk.10.attn_v_b.weight
converting to iq4_kt .. size = 4.25 MiB -> 2.03 MiB
[ 50/1157] blk.10.attn_output.weight - [ 8192, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.10.attn_output.weight
converting to iq4_kt .. size = 59.50 MiB -> 28.03 MiB
[ 51/1157] blk.10.attn_q_a_norm.weight - [ 1536, 1, 1, 1], type = f32, size = 0.006 MB
[ 52/1157] blk.10.attn_q_a.weight - [ 7168, 1536, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.10.attn_q_a.weight
converting to iq4_kt .. size = 11.16 MiB -> 5.26 MiB
[ 53/1157] blk.10.attn_q_b.weight - [ 1536, 12288, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.10.attn_q_b.weight
converting to iq4_kt .. size = 19.12 MiB -> 9.05 MiB
[ 54/1157] blk.11.attn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 55/1157] blk.11.ffn_down_exps.weight - [ 2048, 7168, 384, 1], type = q8_0, Using custom type iq1_kt for tensor blk.11.ffn_down_exps.weight
converting to iq1_kt .. size = 5712.00 MiB -> 1186.50 MiB
[ 56/1157] blk.11.ffn_gate_exps.weight - [ 7168, 2048, 384, 1], type = q8_0, Using custom type iq1_s_r4 for tensor blk.11.ffn_gate_exps.weight
converting to iq1_s_r4 .. size = 5712.00 MiB -> 1009.50 MiB
[ 57/1157] blk.11.ffn_up_exps.weight - [ 7168, 2048, 384, 1], type = q8_0, Using custom type iq1_s_r4 for tensor blk.11.ffn_up_exps.weight
converting to iq1_s_r4 .. size = 5712.00 MiB -> 1009.50 MiB
[ 58/1157] blk.11.exp_probs_b.bias - [ 384, 1, 1, 1], type = f32, size = 0.001 MB
[ 59/1157] blk.11.ffn_gate_inp.weight - [ 7168, 384, 1, 1], type = f32, size = 10.500 MB
[ 60/1157] blk.11.ffn_down_shexp.weight - [ 2048, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.11.ffn_down_shexp.weight
converting to iq4_kt .. size = 14.88 MiB -> 7.03 MiB
[ 61/1157] blk.11.ffn_gate_shexp.weight - [ 7168, 2048, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.11.ffn_gate_shexp.weight
converting to iq3_kt .. size = 14.88 MiB -> 5.48 MiB
[ 62/1157] blk.11.ffn_up_shexp.weight - [ 7168, 2048, 1, 1], type = q8_0, Using custom type iq3_kt for tensor blk.11.ffn_up_shexp.weight
converting to iq3_kt .. size = 14.88 MiB -> 5.48 MiB
[ 63/1157] blk.11.ffn_norm.weight - [ 7168, 1, 1, 1], type = f32, size = 0.027 MB
[ 64/1157] blk.11.attn_kv_a_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MB
[ 65/1157] blk.11.attn_kv_a_mqa.weight - [ 7168, 576, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.11.attn_kv_a_mqa.weight
converting to iq4_kt .. size = 4.18 MiB -> 1.97 MiB
[ 66/1157] blk.11.attn_kv_b.weight - [ 512, 16384, 1, 1], type = q8_0, Using custom type q8_0 for tensor blk.11.attn_kv_b.weight
size = 8.500 MB
[ 67/1157] blk.11.attn_k_b.weight - [ 128, 32768, 1, 1], type = q8_0, Using custom type iq4_nl for tensor blk.11.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.11.attn_k_b.weight
converting to iq4_nl .. size = 4.25 MiB -> 2.25 MiB
[ 68/1157] blk.11.attn_v_b.weight - [ 512, 8192, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.11.attn_v_b.weight
====== llama_model_quantize_internal: did not find weights for blk.11.attn_v_b.weight
converting to iq4_kt .. size = 4.25 MiB -> 2.03 MiB
[ 69/1157] blk.11.attn_output.weight - [ 8192, 7168, 1, 1], type = q8_0, Using custom type iq4_kt for tensor blk.11.attn_output.weight
converting to iq4_kt ..
- How much accuracy do we lose by requantizing from Q8_0 instead of BF16?
Thanks!
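Not from the PR itself, just a rough illustration of what the Q8_0-vs-BF16 question is about: Q8_0 stores blocks of 32 weights as int8 quants plus one per-block scale, so the Q8_0 round trip itself stays very close to the BF16 weights, and requantizing the already-dequantized Q8_0 values reproduces them almost exactly. A minimal NumPy sketch (ignoring that the real format stores the scale as f16):

```python
import numpy as np

def quantize_q8_0(w):
    # Q8_0-style round trip: blocks of 32 weights, one scale per block,
    # int8 quants in [-127, 127]; returns the dequantized weights.
    w = w.reshape(-1, 32)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    safe = np.where(scale == 0, 1.0, scale)          # avoid div by zero
    q = np.round(w / safe).clip(-127, 127)
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)     # stand-in for BF16 weights
w8 = quantize_q8_0(w)

# First quantization step introduces a small error...
rmse = float(np.sqrt(np.mean((w - w8) ** 2)))
# ...but requantizing from the Q8_0 values is near-lossless, since the
# block maxima (and hence the scales) are reproduced:
w8_again = quantize_q8_0(w8)
```

So in this toy setting the extra loss from a Q8_0 intermediate is tiny; whether that carries over to the low-bpw target quants is exactly the empirical question being asked.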
👤 ubergarm commented the 2025-07-19 at 16:38:41:
The warning about missing imatrix data for attn_k_b is not good.
Hrrm, I too see this for my Kimi-K2-Instruct quantize logs:
====== llama_model_quantize_internal: did not find weights for blk.5.attn_kv_b.weight
====== llama_model_quantize_internal: did not find weights for blk.5.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.5.attn_v_b.weight
Looking back at my DeepSeek quantization logs, it only has:
====== llama_model_quantize_internal: did not find weights for blk.47.attn_k_b.weight
The main difference is that for the Kimi-K2 imatrix I used -mla 1, whereas with the older DeepSeek imatrix I did not specify -mla at all?
Also, yesterday I discovered that Kimi-K2-Instruct seems very sensitive to attn/shexp/blk.0.ffn.*, or possibly just attn. I'm thinking it is because Kimi-K2 uses half the attn heads and a third of the ffn dense layers compared to DeepSeek. So going back and requantizing my recipes with full q8_0 attn/shexp/blk.0.ffn.* is improving PPL a lot for a little extra BPW.
So now I'm not sure if this is because of those architecture changes in Kimi-K2, or perhaps because my imatrix was not being properly applied to the MLA tensors? Hrmm...
I'm updating the chart and data with what I have so far up above: https://github.com/ikawrakow/ik_llama.cpp/pull/616#issuecomment-3087170346
👤 magikRUKKOLA commented the 2025-07-19 at 22:39:56:
@ubergarm
Here is my dump:
/opt/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS# find ./ -name "*gguf" | xargs -I{} gguf-dump "./{}" &> /tmp/dump.log
--- /tmp/dump2.log 2025-07-20 01:34:55.913286620 +0300
+++ /tmp/dump.log 2025-07-20 01:36:37.213790237 +0300
@@ -1,9 +1,9 @@
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00001-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00001-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
-* Dumping 64 key/value pair(s)
+* Dumping 65 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 134
- 3: UINT64 | 1 | GGUF.kv_count = 61
+ 3: UINT64 | 1 | GGUF.kv_count = 62
4: STRING | 1 | general.architecture = 'deepseek2'
5: STRING | 1 | general.type = 'model'
6: STRING | 1 | general.name = 'Kimi-K2-Instruct'
@@ -15,10 +15,10 @@
12: STRING | 1 | general.license.name = 'modified-mit'
13: STRING | 1 | general.repo_url = 'https://huggingface.co/unsloth'
14: UINT32 | 1 | general.base_model.count = 1
- 15: STRING | 1 | general.base_model.0.name = 'Kimi K2 Instruct'
+ 15: STRING | 1 | general.base_model.0.name = 'Kimi K2 Instruct BF16'
16: STRING | 1 | general.base_model.0.organization = 'Moonshotai'
- 17: STRING | 1 | general.base_model.0.repo_url = 'https://huggingface.co/moonshotai/Kimi-K2-Instruct'
- 18: [STRING] | 1 | general.tags
+ 17: STRING | 1 | general.base_model.0.repo_url = 'https://huggingface.co/moonshotai/Kimi-K2-Instruct-BF16'
+ 18: [STRING] | 11 | general.tags = ['unsloth', 'unsloth', 'unsloth', 'unsloth', 'unsloth', 'unsloth', ...]
19: UINT32 | 1 | deepseek2.block_count = 61
20: UINT32 | 1 | deepseek2.context_length = 131072
21: UINT32 | 1 | deepseek2.embedding_length = 7168
@@ -47,24 +47,25 @@
44: FLOAT32 | 1 | deepseek2.rope.scaling.factor = 32.0
45: UINT32 | 1 | deepseek2.rope.scaling.original_context_length = 4096
46: FLOAT32 | 1 | deepseek2.rope.scaling.yarn_log_multiplier = 0.10000000149011612
- 47: STRING | 1 | tokenizer.ggml.model = 'gpt2'
- 48: STRING | 1 | tokenizer.ggml.pre = 'kimi-k2'
- 49: [STRING] | 163840 | tokenizer.ggml.tokens
- 50: [INT32] | 163840 | tokenizer.ggml.token_type
- 51: [STRING] | 163328 | tokenizer.ggml.merges
- 52: UINT32 | 1 | tokenizer.ggml.bos_token_id = 163584
- 53: UINT32 | 1 | tokenizer.ggml.eos_token_id = 163585
- 54: UINT32 | 1 | tokenizer.ggml.padding_token_id = 163839
- 55: STRING | 1 | tokenizer.chat_template = '{%- if tools -%}\n <|im_system|>tool_declare<|im_middle|>{{ '
- 56: UINT32 | 1 | general.quantization_version = 2
- 57: UINT32 | 1 | general.file_type = 23
- 58: STRING | 1 | quantize.imatrix.file = 'Kimi-K2-Instruct-GGUF/imatrix_unsloth.dat'
- 59: STRING | 1 | quantize.imatrix.dataset = 'unsloth_calibration_Kimi-K2-Instruct.txt'
- 60: UINT32 | 1 | quantize.imatrix.entries_count = 667
- 61: UINT32 | 1 | quantize.imatrix.chunks_count = 714
- 62: UINT16 | 1 | split.no = 0
- 63: INT32 | 1 | split.tensors.count = 1096
- 64: UINT16 | 1 | split.count = 9
+ 47: UINT32 | 1 | tokenizer.ggml.bos_token_id = 163584
+ 48: UINT32 | 1 | tokenizer.ggml.eos_token_id = 163586
+ 49: UINT32 | 1 | tokenizer.ggml.padding_token_id = 163839
+ 50: STRING | 1 | tokenizer.chat_template = "{% if tools -%}\n {{ '<|im_system|>tool_declare<|im_mid..."
+ 51: BOOL | 1 | tokenizer.ggml.add_bos_token = False
+ 52: STRING | 1 | tokenizer.ggml.model = 'gpt2'
+ 53: STRING | 1 | tokenizer.ggml.pre = 'kimi-k2'
+ 54: [STRING] | 163840 | tokenizer.ggml.tokens = ['!', '"', '#', '$', '%', '&', ...]
+ 55: [INT32] | 163840 | tokenizer.ggml.token_type = [1, 1, 1, 1, 1, 1, ...]
+ 56: [STRING] | 163328 | tokenizer.ggml.merges = ['Ġ Ġ', 'ĠĠ ĠĠ', 'Ġ t', 'i n', 'ä ¸', 'Ġ a', ...]
+ 57: UINT32 | 1 | general.quantization_version = 2
+ 58: UINT32 | 1 | general.file_type = 23
+ 59: STRING | 1 | quantize.imatrix.file = 'Kimi-K2-Instruct-GGUF/imatrix_unsloth.dat'
+ 60: STRING | 1 | quantize.imatrix.dataset = 'unsloth_calibration_Kimi-K2-Instruct.txt'
+ 61: UINT32 | 1 | quantize.imatrix.entries_count = 667
+ 62: UINT32 | 1 | quantize.imatrix.chunks_count = 714
+ 63: UINT16 | 1 | split.no = 0
+ 64: INT32 | 1 | split.tensors.count = 1096
+ 65: UINT16 | 1 | split.count = 9
* Dumping 134 tensor(s)
1: 1174405120 | 7168, 163840, 1, 1 | Q6_K | output.weight
2: 7168 | 7168, 1, 1, 1 | F32 | output_norm.weight
@@ -200,7 +201,7 @@
132: 18874368 | 1536, 12288, 1, 1 | Q5_K | blk.7.attn_q_b.weight
133: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.7.attn_v_b.weight
134: 384 | 384, 1, 1, 1 | F32 | blk.7.exp_probs_b.bias
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00002-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00002-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
@@ -338,7 +339,7 @@
126: 384 | 384, 1, 1, 1 | F32 | blk.14.exp_probs_b.bias
127: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.14.ffn_down_exps.weight
128: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.14.ffn_down_shexp.weight
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00003-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00003-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
@@ -474,7 +475,7 @@
124: 384 | 384, 1, 1, 1 | F32 | blk.21.exp_probs_b.bias
125: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.21.ffn_down_exps.weight
126: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.21.ffn_down_shexp.weight
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00004-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00004-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
@@ -614,7 +615,7 @@
128: 2752512 | 7168, 384, 1, 1 | F32 | blk.28.ffn_gate_inp.weight
129: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.28.ffn_gate_shexp.weight
130: 7168 | 7168, 1, 1, 1 | F32 | blk.28.ffn_norm.weight
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00005-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00005-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
@@ -762,7 +763,145 @@
136: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.36.attn_q_b.weight
137: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.36.attn_v_b.weight
138: 384 | 384, 1, 1, 1 | F32 | blk.36.exp_probs_b.bias
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00007-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00006-of-00009.gguf
+* File is LITTLE endian, script is running on a LITTLE endian host.
+* Dumping 6 key/value pair(s)
+ 1: UINT32 | 1 | GGUF.version = 3
+ 2: UINT64 | 1 | GGUF.tensor_count = 128
+ 3: UINT64 | 1 | GGUF.kv_count = 3
+ 4: UINT16 | 1 | split.no = 5
+ 5: INT32 | 1 | split.tensors.count = 1096
+ 6: UINT16 | 1 | split.count = 9
+* Dumping 128 tensor(s)
+ 1: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.36.ffn_down_exps.weight
+ 2: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.36.ffn_down_shexp.weight
+ 3: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.36.ffn_gate_exps.weight
+ 4: 2752512 | 7168, 384, 1, 1 | F32 | blk.36.ffn_gate_inp.weight
+ 5: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.36.ffn_gate_shexp.weight
+ 6: 7168 | 7168, 1, 1, 1 | F32 | blk.36.ffn_norm.weight
+ 7: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.36.ffn_up_exps.weight
+ 8: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.36.ffn_up_shexp.weight
+ 9: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.37.attn_k_b.weight
+ 10: 4128768 | 7168, 576, 1, 1 | IQ4_XS | blk.37.attn_kv_a_mqa.weight
+ 11: 512 | 512, 1, 1, 1 | F32 | blk.37.attn_kv_a_norm.weight
+ 12: 7168 | 7168, 1, 1, 1 | F32 | blk.37.attn_norm.weight
+ 13: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.37.attn_output.weight
+ 14: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.37.attn_q_a.weight
+ 15: 1536 | 1536, 1, 1, 1 | F32 | blk.37.attn_q_a_norm.weight
+ 16: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.37.attn_q_b.weight
+ 17: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.37.attn_v_b.weight
+ 18: 384 | 384, 1, 1, 1 | F32 | blk.37.exp_probs_b.bias
+ 19: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.37.ffn_down_exps.weight
+ 20: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.37.ffn_down_shexp.weight
+ 21: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.37.ffn_gate_exps.weight
+ 22: 2752512 | 7168, 384, 1, 1 | F32 | blk.37.ffn_gate_inp.weight
+ 23: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.37.ffn_gate_shexp.weight
+ 24: 7168 | 7168, 1, 1, 1 | F32 | blk.37.ffn_norm.weight
+ 25: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.37.ffn_up_exps.weight
+ 26: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.37.ffn_up_shexp.weight
+ 27: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.38.attn_k_b.weight
+ 28: 4128768 | 7168, 576, 1, 1 | IQ4_XS | blk.38.attn_kv_a_mqa.weight
+ 29: 512 | 512, 1, 1, 1 | F32 | blk.38.attn_kv_a_norm.weight
+ 30: 7168 | 7168, 1, 1, 1 | F32 | blk.38.attn_norm.weight
+ 31: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.38.attn_output.weight
+ 32: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.38.attn_q_a.weight
+ 33: 1536 | 1536, 1, 1, 1 | F32 | blk.38.attn_q_a_norm.weight
+ 34: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.38.attn_q_b.weight
+ 35: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.38.attn_v_b.weight
+ 36: 384 | 384, 1, 1, 1 | F32 | blk.38.exp_probs_b.bias
+ 37: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.38.ffn_down_exps.weight
+ 38: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.38.ffn_down_shexp.weight
+ 39: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.38.ffn_gate_exps.weight
+ 40: 2752512 | 7168, 384, 1, 1 | F32 | blk.38.ffn_gate_inp.weight
+ 41: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.38.ffn_gate_shexp.weight
+ 42: 7168 | 7168, 1, 1, 1 | F32 | blk.38.ffn_norm.weight
+ 43: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.38.ffn_up_exps.weight
+ 44: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.38.ffn_up_shexp.weight
+ 45: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.39.attn_k_b.weight
+ 46: 4128768 | 7168, 576, 1, 1 | IQ4_XS | blk.39.attn_kv_a_mqa.weight
+ 47: 512 | 512, 1, 1, 1 | F32 | blk.39.attn_kv_a_norm.weight
+ 48: 7168 | 7168, 1, 1, 1 | F32 | blk.39.attn_norm.weight
+ 49: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.39.attn_output.weight
+ 50: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.39.attn_q_a.weight
+ 51: 1536 | 1536, 1, 1, 1 | F32 | blk.39.attn_q_a_norm.weight
+ 52: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.39.attn_q_b.weight
+ 53: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.39.attn_v_b.weight
+ 54: 384 | 384, 1, 1, 1 | F32 | blk.39.exp_probs_b.bias
+ 55: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.39.ffn_down_exps.weight
+ 56: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.39.ffn_down_shexp.weight
+ 57: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.39.ffn_gate_exps.weight
+ 58: 2752512 | 7168, 384, 1, 1 | F32 | blk.39.ffn_gate_inp.weight
+ 59: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.39.ffn_gate_shexp.weight
+ 60: 7168 | 7168, 1, 1, 1 | F32 | blk.39.ffn_norm.weight
+ 61: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.39.ffn_up_exps.weight
+ 62: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.39.ffn_up_shexp.weight
+ 63: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.40.attn_k_b.weight
+ 64: 4128768 | 7168, 576, 1, 1 | IQ4_XS | blk.40.attn_kv_a_mqa.weight
+ 65: 512 | 512, 1, 1, 1 | F32 | blk.40.attn_kv_a_norm.weight
+ 66: 7168 | 7168, 1, 1, 1 | F32 | blk.40.attn_norm.weight
+ 67: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.40.attn_output.weight
+ 68: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.40.attn_q_a.weight
+ 69: 1536 | 1536, 1, 1, 1 | F32 | blk.40.attn_q_a_norm.weight
+ 70: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.40.attn_q_b.weight
+ 71: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.40.attn_v_b.weight
+ 72: 384 | 384, 1, 1, 1 | F32 | blk.40.exp_probs_b.bias
+ 73: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.40.ffn_down_exps.weight
+ 74: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.40.ffn_down_shexp.weight
+ 75: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.40.ffn_gate_exps.weight
+ 76: 2752512 | 7168, 384, 1, 1 | F32 | blk.40.ffn_gate_inp.weight
+ 77: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.40.ffn_gate_shexp.weight
+ 78: 7168 | 7168, 1, 1, 1 | F32 | blk.40.ffn_norm.weight
+ 79: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.40.ffn_up_exps.weight
+ 80: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.40.ffn_up_shexp.weight
+ 81: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.41.attn_k_b.weight
+ 82: 4128768 | 7168, 576, 1, 1 | Q6_K | blk.41.attn_kv_a_mqa.weight
+ 83: 512 | 512, 1, 1, 1 | F32 | blk.41.attn_kv_a_norm.weight
+ 84: 7168 | 7168, 1, 1, 1 | F32 | blk.41.attn_norm.weight
+ 85: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.41.attn_output.weight
+ 86: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.41.attn_q_a.weight
+ 87: 1536 | 1536, 1, 1, 1 | F32 | blk.41.attn_q_a_norm.weight
+ 88: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.41.attn_q_b.weight
+ 89: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.41.attn_v_b.weight
+ 90: 384 | 384, 1, 1, 1 | F32 | blk.41.exp_probs_b.bias
+ 91: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.41.ffn_down_exps.weight
+ 92: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.41.ffn_down_shexp.weight
+ 93: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.41.ffn_gate_exps.weight
+ 94: 2752512 | 7168, 384, 1, 1 | F32 | blk.41.ffn_gate_inp.weight
+ 95: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.41.ffn_gate_shexp.weight
+ 96: 7168 | 7168, 1, 1, 1 | F32 | blk.41.ffn_norm.weight
+ 97: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.41.ffn_up_exps.weight
+ 98: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.41.ffn_up_shexp.weight
+ 99: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.42.attn_k_b.weight
+ 100: 4128768 | 7168, 576, 1, 1 | IQ4_XS | blk.42.attn_kv_a_mqa.weight
+ 101: 512 | 512, 1, 1, 1 | F32 | blk.42.attn_kv_a_norm.weight
+ 102: 7168 | 7168, 1, 1, 1 | F32 | blk.42.attn_norm.weight
+ 103: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.42.attn_output.weight
+ 104: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.42.attn_q_a.weight
+ 105: 1536 | 1536, 1, 1, 1 | F32 | blk.42.attn_q_a_norm.weight
+ 106: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.42.attn_q_b.weight
+ 107: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.42.attn_v_b.weight
+ 108: 384 | 384, 1, 1, 1 | F32 | blk.42.exp_probs_b.bias
+ 109: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.42.ffn_down_exps.weight
+ 110: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.42.ffn_down_shexp.weight
+ 111: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.42.ffn_gate_exps.weight
+ 112: 2752512 | 7168, 384, 1, 1 | F32 | blk.42.ffn_gate_inp.weight
+ 113: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.42.ffn_gate_shexp.weight
+ 114: 7168 | 7168, 1, 1, 1 | F32 | blk.42.ffn_norm.weight
+ 115: 5637144576 | 7168, 2048, 384, 1 | IQ3_XXS | blk.42.ffn_up_exps.weight
+ 116: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.42.ffn_up_shexp.weight
+ 117: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.43.attn_k_b.weight
+ 118: 4128768 | 7168, 576, 1, 1 | Q6_K | blk.43.attn_kv_a_mqa.weight
+ 119: 512 | 512, 1, 1, 1 | F32 | blk.43.attn_kv_a_norm.weight
+ 120: 7168 | 7168, 1, 1, 1 | F32 | blk.43.attn_norm.weight
+ 121: 58720256 | 8192, 7168, 1, 1 | IQ4_XS | blk.43.attn_output.weight
+ 122: 11010048 | 7168, 1536, 1, 1 | Q4_K | blk.43.attn_q_a.weight
+ 123: 1536 | 1536, 1, 1, 1 | F32 | blk.43.attn_q_a_norm.weight
+ 124: 18874368 | 1536, 12288, 1, 1 | IQ4_XS | blk.43.attn_q_b.weight
+ 125: 4194304 | 512, 128, 64, 1 | Q8_0 | blk.43.attn_v_b.weight
+ 126: 384 | 384, 1, 1, 1 | F32 | blk.43.exp_probs_b.bias
+ 127: 5637144576 | 2048, 7168, 384, 1 | IQ3_XXS | blk.43.ffn_down_exps.weight
+ 128: 14680064 | 2048, 7168, 1, 1 | IQ4_XS | blk.43.ffn_down_shexp.weight
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00007-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
@@ -902,7 +1041,7 @@
128: 2752512 | 7168, 384, 1, 1 | F32 | blk.50.ffn_gate_inp.weight
129: 14680064 | 7168, 2048, 1, 1 | IQ4_XS | blk.50.ffn_gate_shexp.weight
130: 7168 | 7168, 1, 1, 1 | F32 | blk.50.ffn_norm.weight
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00008-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00008-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
@@ -1034,7 +1173,7 @@
120: 384 | 384, 1, 1, 1 | F32 | blk.57.exp_probs_b.bias
121: 5637144576 | 2048, 7168, 384, 1 | IQ4_XS | blk.57.ffn_down_exps.weight
122: 14680064 | 2048, 7168, 1, 1 | Q6_K | blk.57.ffn_down_shexp.weight
-INFO:gguf-dump:* Loading: /mnt/data/models/unsloth/Kimi-K2-Instruct-GGUF/UD-IQ3_XXS/Kimi-K2-Instruct-UD-IQ3_XXS-00009-of-00009.gguf
+INFO:gguf-dump:* Loading: ././Kimi-K2-Instruct-UD-IQ3_XXS-00009-of-00009.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 6 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
👤 ikawrakow commented the 2025-07-20 at 08:30:26:
Hrrm, I too see this for my Kimi-K2-Instruct quantize logs:
====== llama_model_quantize_internal: did not find weights for blk.5.attn_kv_b.weight
====== llama_model_quantize_internal: did not find weights for blk.5.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.5.attn_v_b.weight
@ubergarm As discussed elsewhere, it is expected that there is no imatrix data for attn_kv_b. But no imatrix data for attn_k_b and attn_v_b is unexpected if you used -mla 1. Could you please run the imatrix tool adding --verbosity 2 to your command line? There will be a lot of output to stdout with that, so redirect to a log file and post the log here. You only need to run 1 batch so we see the names of all tensors where data is being captured.
👤 ubergarm commented the 2025-07-20 at 15:18:58:
@ThomasBaruzier
everything minus ffn gate up down is very small
Yes, I like to imagine a person with the attn/shexp/first N ffn dense layers as the head, and all the routed exps as the body. DeepSeek has a very small "head" and a very large "body". Kimi-K2 has an even smaller tiny "head" and an even larger "body" haha...
So perhaps one must be more careful when squishing that tiny "brain" lol... All metaphorical of course...
I would love to see a visualization of the relative sizes of say older llama vs deepseek vs kimi using visualization tool like https://github.com/ManimCommunity/manim/ ... too many things to do hah...
I'll test some more about that imatrix with -mla 1 vs without -mla at all and get logs once ssh is back up for the remote rigs 🤞
Also, is there a way to get the tensor types from llama-gguf? Or should I use something like gguf-py?
I didn't ever notice build/bin/llama-gguf even existed hah... Here is how I view gguf files similar to how @magikRUKKOLA is showing above:
cd ik_llama.cpp
# https://docs.astral.sh/uv/getting-started/installation/
uv venv ./venv --python 3.12 --python-preference=only-managed
source ./venv/bin/activate
uv pip install numpy==1.26.2 sentencepiece pyyaml
./gguf-py/scripts/gguf_dump.py /models/mymodel.gguf
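On the tensor-types question: once you have gguf-dump output like the diff above, the tensor lines are easy to tally programmatically too. A hedged sketch — it assumes the four-field `|`-separated tensor-line format shown above, and `tally_tensor_types` is just a hypothetical helper name:

```python
from collections import Counter

def tally_tensor_types(dump_text):
    """Count quant types from gguf-dump tensor lines, which look like:
       '1: 1174405120 | 7168, 163840, 1, 1 | Q6_K | output.weight'
    """
    counts = Counter()
    for line in dump_text.splitlines():
        parts = [p.strip() for p in line.split('|')]
        # Tensor lines have 4 fields; last field is a tensor name (no spaces),
        # which distinguishes them from the KV metadata lines.
        if len(parts) == 4 and parts[3] and ' ' not in parts[3]:
            counts[parts[2]] += 1
    return counts

sample = """\
  1: 1174405120 | 7168, 163840, 1, 1 | Q6_K | output.weight
  2: 7168 | 7168, 1, 1, 1 | F32 | output_norm.weight
  9: 4194304 | 128, 512, 64, 1 | Q8_0 | blk.37.attn_k_b.weight
"""
print(tally_tensor_types(sample))  # Counter({'Q6_K': 1, 'F32': 1, 'Q8_0': 1})
```

For doing this properly without going through text output, gguf-py's `GGUFReader` exposes each tensor's name and type directly.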
👤 ubergarm commented the 2025-07-20 at 16:08:14:
Could you please run the imatrix tool adding --verbosity 2 to your command line? There will be a lot of output to stdout with that, so redirect to a log file and post the log here. You only need to run 1 batch so we see the names of all tensors where data is being captured.
Just got access to the rig again after some storms cut short my cooking last night haha... Here are two commands and logs for imatrix on Kimi-K2: one like I did with -mla 1, and another omitting it. First full repeating layer chunk only.
👈 llama-imatrix -mla 1
model=/mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/Kimi-K2-Instruct-Q8_0.gguf
numactl --interleave=all \
./build/bin/llama-imatrix \
-m "$model" \
-f ubergarm-imatrix-calibration-corpus-v02.txt \
-o /tmp/imatrix-test.dat \
-mla 1 \
--verbosity 2 \
--ctx-size 512 \
--layer-similarity \
--numa distribute \
--threads 384 \
2>&1 | tee -a logs/imat-kimi-mla-1.log
llama_model_loader: loaded meta data with 42 key-value pairs and 1157 tensors from /mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/Kimi-K2-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Kimi K2 Instruct Bf16 Safetensors
llama_model_loader: - kv 3: general.finetune str = Instruct-safetensors
llama_model_loader: - kv 4: general.basename str = Kimi-K2
llama_model_loader: - kv 5: general.size_label str = 384x15B
llama_model_loader: - kv 6: deepseek2.block_count u32 = 61
llama_model_loader: - kv 7: deepseek2.context_length u32 = 131072
llama_model_loader: - kv 8: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv 9: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv 10: deepseek2.attention.head_count u32 = 64
llama_model_loader: - kv 11: deepseek2.attention.head_count_kv u32 = 64
llama_model_loader: - kv 12: deepseek2.rope.freq_base f32 = 50000.000000
llama_model_loader: - kv 13: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 15: general.file_type u32 = 7
llama_model_loader: - kv 16: deepseek2.leading_dense_block_count u32 = 1
llama_model_loader: - kv 17: deepseek2.vocab_size u32 = 163840
llama_model_loader: - kv 18: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 19: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 20: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 21: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 22: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 23: deepseek2.expert_count u32 = 384
llama_model_loader: - kv 24: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 25: deepseek2.expert_weights_scale f32 = 2.827000
llama_model_loader: - kv 26: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 27: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 28: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 29: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 30: deepseek2.rope.scaling.factor f32 = 32.000000
llama_model_loader: - kv 31: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 32: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = kimi-k2
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,163840] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,163840] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,163328] = ["Ġ Ġ", "ĠĠ ĠĠ", "Ġ t", "i n",...
llama_model_loader: - kv 38: tokenizer.ggml.bos_token_id u32 = 163584
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 163585
llama_model_loader: - kv 40: tokenizer.chat_template str = {% if tools -%}\n {{ '<|im_system|>...
llama_model_loader: - kv 41: general.quantization_version u32 = 2
llama_model_loader: - type f32: 365 tensors
llama_model_loader: - type q8_0: 792 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.0607 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = deepseek2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 163840
llm_load_print_meta: n_merges = 163328
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 7168
llm_load_print_meta: n_layer = 61
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 64
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 192
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 12288
llm_load_print_meta: n_embd_v_gqa = 8192
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 18432
llm_load_print_meta: n_expert = 384
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = yarn
llm_load_print_meta: freq_base_train = 50000.0
llm_load_print_meta: freq_scale_train = 0.03125
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 671B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 1.027 T
llm_load_print_meta: model size = 1016.623 GiB (8.504 BPW)
llm_load_print_meta: repeating layers = 1014.299 GiB (8.504 BPW, 1024.571 B parameters)
llm_load_print_meta: general.name = Kimi K2 Instruct Bf16 Safetensors
llm_load_print_meta: BOS token = 163584 '[BOS]'
llm_load_print_meta: EOS token = 163585 '[EOS]'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 163586 '<|im_end|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: n_layer_dense_lead = 1
llm_load_print_meta: n_lora_q = 1536
llm_load_print_meta: n_lora_kv = 512
llm_load_print_meta: n_ff_exp = 2048
llm_load_print_meta: n_expert_shared = 1
llm_load_print_meta: expert_weights_scale = 2.8
llm_load_print_meta: expert_weights_norm = 1
llm_load_print_meta: expert_gating_func = sigmoid
llm_load_print_meta: rope_yarn_log_mul = 0.1000
llm_load_tensors: ggml ctx size = 0.47 MiB
llm_load_tensors: CPU buffer size = 1041021.91 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 1
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 50000.0
llama_new_context_with_model: freq_scale = 0.03125
llama_kv_cache_init: CPU KV buffer size = 64.81 MiB
llama_new_context_with_model: KV self size = 64.81 MiB, c^KV (f16): 34.31 MiB, kv^T (f16): 30.50 MiB
llama_new_context_with_model: CPU output buffer size = 0.63 MiB
llama_new_context_with_model: CPU compute buffer size = 334.00 MiB
llama_new_context_with_model: graph nodes = 3827
llama_new_context_with_model: graph splits = 1
system_info: n_threads = 384 / 768 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
compute_imatrix: tokenizing the input ..
compute_imatrix: tokenization took 836.032 ms
compute_imatrix: computing over 826 chunks with batch_size 512
collect_imatrix[0]: blk.0.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.0.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.0.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.0.ffn_gate.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.ffn_up.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.ffn_down.weight, MUL_MAT, 18432 x 512, 0
collect_imatrix[1]: blk.1.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.1.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.1.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.1.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.1.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.2.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.2.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.2.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.2.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.2.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
[... same per-layer tensor pattern repeats for blk.3 through blk.32 ...]
collect_imatrix[1]: blk.33.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.33.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.33.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.33.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.33.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.34.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.34.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.34.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.34.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.34.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.35.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.35.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.35.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.35.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.35.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.36.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.36.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.36.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.36.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.36.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.37.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.37.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.37.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.37.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.37.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.38.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.38.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.38.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.38.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.38.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.39.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.39.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.39.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.39.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.39.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.40.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.40.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.40.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.40.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.40.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.41.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.41.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.41.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.41.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.41.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.42.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.42.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.42.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.42.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.42.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.43.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.43.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.43.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.43.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.43.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.44.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.44.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.44.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.44.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.44.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.45.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.45.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.45.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.45.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.45.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.46.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.46.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.46.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.46.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.46.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.47.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.47.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.47.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.47.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.47.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.48.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.48.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.48.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.48.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.48.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.49.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.49.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.49.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.49.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.49.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.50.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.50.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.50.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.50.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.50.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.51.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.51.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.51.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.51.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.51.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.52.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.52.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.52.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.52.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.52.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.53.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.53.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.53.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.53.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.53.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.54.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.54.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.54.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.54.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.54.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.55.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.55.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.55.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.55.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.55.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.56.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.56.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.56.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.56.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.56.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.57.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.57.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.57.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.57.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.57.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.58.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.58.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.58.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.58.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.58.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.59.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.59.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.59.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.59.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
compute_imatrix: 190.09 seconds per pass - ETA 43 hours 36.88 minutes
collect_imatrix[1]: blk.59.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.59.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.60.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.60.attn_k_b.weight (reshaped), MUL_MAT, 128 x 512, 0
collect_imatrix[1]: blk.60.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.60.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.60.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: output.weight, MUL_MAT, 7168 x 512, 0
[1]75.3007,
👈 llama-imatrix (no mla)
model=/mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/Kimi-K2-Instruct-Q8_0.gguf
numactl --interleave=all \
./build/bin/llama-imatrix \
-m "$model" \
-f ubergarm-imatrix-calibration-corpus-v02.txt \
-o /tmp/imatrix-test.dat \
--verbosity 2 \
--ctx-size 512 \
--layer-similarity \
--numa distribute \
--threads 384 \
2>&1 | tee -a logs/imat-kimi-no-mla.log
llama_model_loader: loaded meta data with 42 key-value pairs and 1157 tensors from /mnt/raid/models/ubergarm/Kimi-K2-Instruct-GGUF/Kimi-K2-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Kimi K2 Instruct Bf16 Safetensors
llama_model_loader: - kv 3: general.finetune str = Instruct-safetensors
llama_model_loader: - kv 4: general.basename str = Kimi-K2
llama_model_loader: - kv 5: general.size_label str = 384x15B
llama_model_loader: - kv 6: deepseek2.block_count u32 = 61
llama_model_loader: - kv 7: deepseek2.context_length u32 = 131072
llama_model_loader: - kv 8: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv 9: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv 10: deepseek2.attention.head_count u32 = 64
llama_model_loader: - kv 11: deepseek2.attention.head_count_kv u32 = 64
llama_model_loader: - kv 12: deepseek2.rope.freq_base f32 = 50000.000000
llama_model_loader: - kv 13: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 15: general.file_type u32 = 7
llama_model_loader: - kv 16: deepseek2.leading_dense_block_count u32 = 1
llama_model_loader: - kv 17: deepseek2.vocab_size u32 = 163840
llama_model_loader: - kv 18: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 19: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 20: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 21: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 22: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 23: deepseek2.expert_count u32 = 384
llama_model_loader: - kv 24: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 25: deepseek2.expert_weights_scale f32 = 2.827000
llama_model_loader: - kv 26: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 27: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 28: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 29: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 30: deepseek2.rope.scaling.factor f32 = 32.000000
llama_model_loader: - kv 31: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 32: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = kimi-k2
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,163840] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,163840] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,163328] = ["Ġ Ġ", "ĠĠ ĠĠ", "Ġ t", "i n",...
llama_model_loader: - kv 38: tokenizer.ggml.bos_token_id u32 = 163584
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 163585
llama_model_loader: - kv 40: tokenizer.chat_template str = {% if tools -%}\n {{ '<|im_system|>...
llama_model_loader: - kv 41: general.quantization_version u32 = 2
llama_model_loader: - type f32: 365 tensors
llama_model_loader: - type q8_0: 792 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.0607 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = deepseek2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 163840
llm_load_print_meta: n_merges = 163328
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 7168
llm_load_print_meta: n_layer = 61
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 64
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 192
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 12288
llm_load_print_meta: n_embd_v_gqa = 8192
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 18432
llm_load_print_meta: n_expert = 384
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = yarn
llm_load_print_meta: freq_base_train = 50000.0
llm_load_print_meta: freq_scale_train = 0.03125
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 671B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 1.027 T
llm_load_print_meta: model size = 1016.623 GiB (8.504 BPW)
llm_load_print_meta: repeating layers = 1014.299 GiB (8.504 BPW, 1024.571 B parameters)
llm_load_print_meta: general.name = Kimi K2 Instruct Bf16 Safetensors
llm_load_print_meta: BOS token = 163584 '[BOS]'
llm_load_print_meta: EOS token = 163585 '[EOS]'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 163586 '<|im_end|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: n_layer_dense_lead = 1
llm_load_print_meta: n_lora_q = 1536
llm_load_print_meta: n_lora_kv = 512
llm_load_print_meta: n_ff_exp = 2048
llm_load_print_meta: n_expert_shared = 1
llm_load_print_meta: expert_weights_scale = 2.8
llm_load_print_meta: expert_weights_norm = 1
llm_load_print_meta: expert_gating_func = sigmoid
llm_load_print_meta: rope_yarn_log_mul = 0.1000
llm_load_tensors: ggml ctx size = 0.47 MiB
llm_load_tensors: CPU buffer size = 1041021.91 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 50000.0
llama_new_context_with_model: freq_scale = 0.03125
llama_kv_cache_init: CPU KV buffer size = 1220.00 MiB
llama_new_context_with_model: KV self size = 1220.00 MiB, K (f16): 732.00 MiB, V (f16): 488.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.63 MiB
llama_new_context_with_model: CPU compute buffer size = 334.00 MiB
llama_new_context_with_model: graph nodes = 3766
llama_new_context_with_model: graph splits = 1
system_info: n_threads = 384 / 768 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
compute_imatrix: tokenizing the input ..
compute_imatrix: tokenization took 840.818 ms
compute_imatrix: computing over 826 chunks with batch_size 512
collect_imatrix[0]: blk.0.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.0.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.0.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.0.ffn_gate.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.ffn_up.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.0.ffn_down.weight, MUL_MAT, 18432 x 512, 0
collect_imatrix[1]: blk.1.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.1.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.1.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.1.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.1.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.1.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.2.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.2.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.2.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.2.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.2.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.2.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.3.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.3.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.3.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.3.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.3.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.3.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.3.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.3.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.3.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.3.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.3.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.3.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.4.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.4.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.4.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.4.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.4.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.4.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.4.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.4.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.4.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.4.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.4.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.4.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.5.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.5.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.5.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.5.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.5.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.5.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.5.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.5.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.5.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.5.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.5.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.5.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.6.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.6.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.6.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.6.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.6.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.6.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.6.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.6.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.6.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.6.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.6.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.6.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.7.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.7.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.7.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.7.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.7.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.7.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.7.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.7.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.7.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.7.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.7.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.7.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.8.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.8.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.8.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.8.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.8.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.8.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.8.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.8.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.8.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.8.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.8.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.8.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.9.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.9.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.9.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.9.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.9.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.9.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.9.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.9.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.9.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.9.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.9.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.9.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.10.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.10.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.10.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.10.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.10.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.10.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.10.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.10.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.10.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.10.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.10.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.10.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.11.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.11.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.11.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.11.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.11.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.11.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.11.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.11.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.11.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.11.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.11.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.11.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.12.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.12.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.12.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.12.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.12.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.12.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.12.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.12.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.12.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.12.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.12.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.12.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.13.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.13.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.13.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.13.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.13.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.13.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.13.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.13.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.13.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.13.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.13.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.13.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.14.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.14.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.14.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.14.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.14.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.14.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.14.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.14.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.14.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.14.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.14.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.14.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.15.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.15.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.15.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.15.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.15.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.15.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.15.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.15.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.15.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.15.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.15.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.15.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.16.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.16.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.16.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.16.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.16.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.16.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.16.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.16.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.16.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.16.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.16.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.16.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.17.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.17.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.17.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.17.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.17.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.17.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.17.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.17.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.17.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.17.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.17.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.17.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.18.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.18.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.18.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.18.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.18.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.18.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.18.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.18.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.18.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.18.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.18.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.18.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.19.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.19.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.19.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.19.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.19.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.19.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.19.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.19.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.19.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.19.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.19.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.19.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.20.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.20.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.20.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.20.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.20.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.20.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.20.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.20.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.20.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.20.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.20.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.20.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.21.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.21.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.21.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.21.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.21.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.21.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.21.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.21.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.21.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.21.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.21.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.21.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.22.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.22.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.22.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.22.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.22.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.22.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.22.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.22.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.22.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.22.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.22.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.22.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.23.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.23.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.23.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.23.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.23.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.23.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.23.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.23.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.23.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.23.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.23.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.23.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.24.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.24.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.24.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.24.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.24.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.24.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.24.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.24.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.24.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.24.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.24.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.24.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.25.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.25.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.25.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.25.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.25.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.25.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.25.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.25.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.25.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.25.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.25.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.25.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.26.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.26.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.26.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.26.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.26.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.26.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.26.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.26.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.26.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.26.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.26.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.26.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.27.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.27.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.27.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.27.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.27.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.27.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.27.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.27.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.27.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.27.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.27.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.27.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.28.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.28.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.28.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.28.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.28.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.28.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.28.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.28.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.28.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.28.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.28.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.28.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.29.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.29.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.29.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.29.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.29.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.29.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.29.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.29.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.29.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.29.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.29.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.29.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.30.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.30.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.30.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.30.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.30.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.30.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.30.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.30.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.30.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.30.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.30.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.30.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.31.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.31.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.31.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.31.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.31.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.31.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.31.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.31.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.31.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.31.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.31.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.31.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.32.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.32.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.32.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.32.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.32.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.32.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.32.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.32.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.32.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.32.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.32.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.32.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.33.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.33.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.33.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.33.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.33.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.33.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.34.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.34.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.34.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.34.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.34.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.34.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.35.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.35.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.35.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.35.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.35.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.35.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.36.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.36.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.36.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.36.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.36.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.36.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.37.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.37.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.37.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.37.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.37.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.37.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.38.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.38.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.38.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.38.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.38.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.38.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.39.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.39.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.39.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.39.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.39.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.39.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.40.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.40.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.40.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.40.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.40.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.40.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.41.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.41.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.41.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.41.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.41.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.41.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.42.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.42.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.42.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.42.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.42.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.42.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.43.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.43.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.43.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.43.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.43.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.43.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.44.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.44.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.44.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.44.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.44.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.44.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.45.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.45.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.45.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.45.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.45.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.45.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.46.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.46.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.46.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.46.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.46.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.46.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.47.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.47.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.47.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.47.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.47.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.47.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.48.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.48.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.48.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.48.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.48.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.48.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.49.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.49.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.49.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.49.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.49.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.49.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.50.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.50.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.50.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.50.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.50.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.50.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.51.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.51.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.51.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.51.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.51.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.51.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.52.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.52.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.52.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.52.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.52.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.52.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.53.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.53.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.53.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.53.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.53.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.53.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.54.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.54.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.54.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.54.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.54.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.54.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.55.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.55.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.55.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.55.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.55.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.55.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.56.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.56.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.56.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.56.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.56.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.56.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.57.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.57.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.57.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.57.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.57.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.57.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.58.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.58.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.58.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.58.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.58.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.58.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.59.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.59.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.59.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.59.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
compute_imatrix: 22.24 seconds per pass - ETA 5 hours 6.18 minutes
collect_imatrix[1]: blk.59.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.59.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.59.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: blk.60.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[1]: blk.60.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[1]: blk.60.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[1]: blk.60.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[1]: blk.60.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[1]: output.weight, MUL_MAT, 7168 x 512, 0
[1]75.2142,
collect_imatrix[1]: blk.0.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.0.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[2]: blk.0.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.0.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[2]: blk.0.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[2]: blk.0.ffn_gate.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.0.ffn_up.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.0.ffn_down.weight, MUL_MAT, 18432 x 512, 0
collect_imatrix[2]: blk.1.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.1.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[2]: blk.1.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.1.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[2]: blk.1.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[2]: blk.1.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.1.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[2]: blk.1.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[2]: blk.1.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[2]: blk.1.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.1.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.1.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[2]: blk.2.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.2.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[2]: blk.2.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.2.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[2]: blk.2.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[2]: blk.2.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.2.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[2]: blk.2.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[2]: blk.2.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[2]: blk.2.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.2.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.2.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[2]: blk.3.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.3.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[2]: blk.3.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.3.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[2]: blk.3.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[2]: blk.3.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.3.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[2]: blk.3.ffn_up_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[2]: blk.3.ffn_down_exps.weight, MUL_MAT_ID, 2048 x 512, 0
collect_imatrix[2]: blk.3.ffn_gate_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.3.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.3.ffn_down_shexp.weight, MUL_MAT, 2048 x 512, 0
collect_imatrix[2]: blk.4.attn_q_a.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.4.attn_q_b.weight, MUL_MAT, 1536 x 512, 0
collect_imatrix[2]: blk.4.attn_kv_a_mqa.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.4.attn_kv_b.weight, MUL_MAT, 512 x 512, 0
collect_imatrix[2]: blk.4.attn_output.weight, MUL_MAT, 8192 x 512, 0
collect_imatrix[2]: blk.4.ffn_gate_inp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[2]: blk.4.ffn_gate_exps.weight, MUL_MAT_ID, 71
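As an aside, logs like the one above can be tallied mechanically to see how many of the collected tensors are routed-expert matmuls (`MUL_MAT_ID`) versus dense ones (`MUL_MAT`). A small sketch over a few sample lines (the log snippet embedded below is copied from the output above):

```python
import re

# Match lines of the form:
#   collect_imatrix[N]: <tensor>, MUL_MAT or MUL_MAT_ID, <n> x <m>, 0
LINE = re.compile(r"collect_imatrix\[\d+\]: (\S+), (MUL_MAT(?:_ID)?), (\d+) x (\d+)")

log = """\
collect_imatrix[1]: blk.60.ffn_gate_exps.weight, MUL_MAT_ID, 7168 x 512, 0
collect_imatrix[1]: blk.60.ffn_up_shexp.weight, MUL_MAT, 7168 x 512, 0
collect_imatrix[1]: output.weight, MUL_MAT, 7168 x 512, 0
"""

counts: dict[str, int] = {}
for m in LINE.finditer(log):
    op = m.group(2)
    counts[op] = counts.get(op, 0) + 1
print(counts)  # {'MUL_MAT_ID': 1, 'MUL_MAT': 2}
```

Running this over the full log makes it easy to see that the routed experts dominate the collected activations, which matters when deciding which tensors get the low-bpw treatment.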
👤 ThomasBaruzier commented on 2025-07-20 at 16:59:15:
> Yes, I like to imagine a person with the attn/shexp/first N ffn dense layers as their "head", and all the routed exps as their "body". DeepSeek has a very small "head" and a very large "body". Kimi-K2 has an even smaller tiny "head" and an even larger "body" haha...
Funny analogy haha. I guess we could try using Q8_K_R8 for these tensors if one wanted pure CPU inference. I wonder how fast that would go. For CUDA, I guess the best bet would be Q8_0 or Q6_K? Or maybe lower quants could still be fine if the PPL bump was due to missing tensor data in the imatrix?
> I didn't ever notice build/bin/llama-gguf even existed hah... Here is how I view gguf files similar to how @magikRUKKOLA is showing above
Thanks, I will check it out
👤 ikawrakow commented on 2025-07-20 at 17:23:02:
> I guess the best bet could be Q8_0 or Q6_K
Q8_0 will be faster for PP, Q6_K for TG. As Q6_K is not the fastest quantization type on CUDA, you may want to try Q6_0 - a highly overlooked quant - to get the best of both worlds.
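For reference, the bits-per-weight cost of these candidates can be worked out from their block layouts. The bytes-per-block figures below are assumptions taken from the ggml quant definitions (verify against `ggml-common.h` in your tree); a quick sketch:

```python
# (bytes per block, weights per block), assumed from the ggml block layouts:
#   q8_0: fp16 scale + 32 int8             -> 34 bytes / 32 weights
#   q6_0: fp16 scale + 32 x 6 bits         -> 26 bytes / 32 weights
#   q6_k: super-block of 256 (ql+qh+scales+d) -> 210 bytes / 256 weights
QUANTS = {
    "q8_0": (34, 32),
    "q6_0": (26, 32),
    "q6_k": (210, 256),
}

def bpw(name: str) -> float:
    nbytes, nweights = QUANTS[name]
    return 8.0 * nbytes / nweights

for name in QUANTS:
    print(f"{name}: {bpw(name):.4f} bpw")
# q8_0: 8.5000, q6_0: 6.5000, q6_k: 6.5625
```

So Q6_0 is marginally smaller than Q6_K while (per the comment above) trading some quantization quality for simpler, faster CUDA kernels.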
👤 ubergarm commented on 2025-07-20 at 17:34:37:
> I wonder how fast that would go.
I have some preliminary llama-sweep-bench with my original recipe Kimi-K2 quants on CPU only backend using the experimental AVX512 PR (on AMD Zen 5 CPU): https://github.com/ikawrakow/ik_llama.cpp/pull/612#issuecomment-3076539817
I plan to get at least one a/b sweep-bench comparison of my Kimi-K2 v0.1 original recipe vs the v0.2 full q8_0 attn/shexp/blk.0.ffn.* on this same rig today, and might release the updated quants if the speed hit is not too bad given the improvement in perplexity.
Of course I'll probably want to try a v0.3 recipe eventually after sorting out the MLA imatrix business 😅 ... Fortunately hf doesn't charge for the public storage 💰 🪦 🤗 ...