mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2026-03-14 18:37:23 +00:00
* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration but when K_STEP=32 (for GemmKernel224Int4SmallKGroup), this causes buffer overflow. BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration. * perf(k2-moe): optimize memory allocation with pooled buffers - Replace per-expert buffer allocation with shared memory pools - Dynamically assign buffer slices based on activated experts - Add group_size inference from scale tensor shape in amx.py * delete kimi k2 forward test * add TODO comment for pool_count_ calculation