ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-05-11 08:20:21 +00:00

Files

Oql 8139c092bf Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)

* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat

The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration,
but when K_STEP=32 (as in GemmKernel224Int4SmallKGroup) this overflows the buffer.

BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.
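To make the overflow concrete, here is a minimal C++ sketch under stated assumptions: activations are int8, the destination buffer holds exactly m * k bytes, and it is packed in K_STEP-wide column blocks. The names `pack_a` and `bytes_written` are hypothetical; this is an illustration, not the kt-kernel implementation of `from_mat`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical packer (assumed layout, not the real BufferA code): copy an
// int8 row-major m x k matrix into dst as K_STEP-wide column blocks.
// dst holds exactly m * k bytes.
template <std::size_t K_STEP>
void pack_a(const std::int8_t* src, std::size_t m, std::size_t k,
            std::vector<std::int8_t>& dst) {
  assert(k % K_STEP == 0 && "k must be a multiple of K_STEP");
  dst.assign(m * k, static_cast<std::int8_t>(0));
  std::size_t off = 0;
  for (std::size_t kb = 0; kb < k; kb += K_STEP) {
    for (std::size_t row = 0; row < m; ++row) {
      // Copy exactly K_STEP bytes per iteration. A hard-coded 64-byte copy
      // here would run past the end of dst whenever K_STEP == 32.
      std::memcpy(dst.data() + off, src + row * k + kb, K_STEP);
      off += K_STEP;
    }
  }
}

// Bytes a packer would write if it copied `bytes_per_iter` bytes for every
// (row, K_STEP block) pair; compare against the m * k buffer capacity.
constexpr std::size_t bytes_written(std::size_t m, std::size_t k,
                                    std::size_t k_step,
                                    std::size_t bytes_per_iter) {
  return m * (k / k_step) * bytes_per_iter;
}
```

With m = 4 and k = 64 the buffer holds 256 bytes, but a packer copying 64 bytes per K_STEP=32 iteration would write `bytes_written(4, 64, 32, 64)`, i.e. 512 bytes, which is the kind of overrun the fix addresses by writing 32 bytes per iteration.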

* perf(k2-moe): optimize memory allocation with pooled buffers

- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py
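The pooling idea in the first two bullets can be sketched as one shared arena carved into per-expert slices by a prefix sum over routed token counts. This is a minimal sketch assuming each routed token needs `row_bytes` of scratch space per expert; `ExpertBufferPool`, `Slice`, and `assign` are hypothetical names, not the kt-kernel API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slice descriptor into the shared arena.
struct Slice {
  std::size_t offset;  // byte offset into the arena
  std::size_t size;    // bytes reserved for this expert
};

// Hypothetical pool: one arena shared by all experts instead of a
// worst-case buffer per expert.
class ExpertBufferPool {
 public:
  explicit ExpertBufferPool(std::size_t row_bytes) : row_bytes_(row_bytes) {}

  // tokens_per_expert[e] = number of tokens routed to expert e in this batch.
  // Inactive experts (zero tokens) receive a zero-sized slice.
  std::vector<Slice> assign(const std::vector<std::size_t>& tokens_per_expert) {
    std::vector<Slice> slices(tokens_per_expert.size(), Slice{0, 0});
    std::size_t offset = 0;
    for (std::size_t e = 0; e < tokens_per_expert.size(); ++e) {
      const std::size_t bytes = tokens_per_expert[e] * row_bytes_;
      slices[e] = Slice{offset, bytes};
      offset += bytes;
    }
    // Grow only when this batch needs more than any earlier batch did.
    if (offset > arena_.size()) arena_.resize(offset);
    return slices;
  }

  // Pointer to a slice; invalidated if a later assign() grows the arena.
  std::uint8_t* data(const Slice& s) { return arena_.data() + s.offset; }
  std::size_t capacity() const { return arena_.size(); }

 private:
  std::size_t row_bytes_;
  std::vector<std::uint8_t> arena_;
};
```

Under these assumptions, a fixed per-expert scheme sized for the worst case would reserve num_experts * max_tokens * row_bytes up front, while the pool only grows to the largest total of routed rows seen so far. For example, routing {3, 0, 2, 0} tokens with 16-byte rows gives slices at offsets 0, 48, 48, 80 and an 80-byte arena.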

* delete kimi k2 forward test

* add TODO comment for pool_count_ calculation

2025-12-08 20:19:07 +08:00

amx

Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)

2025-12-08 20:19:07 +08:00

kvcache

update kt-kernel

2025-11-03 15:19:52 +08:00

llamafile

[feat](moe_kernel): add amd blis support (int8) (#1600)

2025-11-27 12:08:53 +08:00

moe_kernel

[feat](moe_kernel): add amd blis support (int8) (#1600)