ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-03-14 18:37:23 +00:00

Files

Oql 8139c092bf Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)

* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat

The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration,
but when K_STEP=32 (as in GemmKernel224Int4SmallKGroup) this overflows the buffer.

BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.
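
The arithmetic behind the overflow can be sketched as follows. This is an illustration only, not the kernel code: it assumes the destination holds one byte per K element (k bytes per row), and the helper names bytes_written and overflows are hypothetical.

```python
def bytes_written(k: int, k_step: int, bytes_per_iter: int) -> int:
    """Total bytes a from_mat-style loop writes for one row of length k."""
    if k % k_step != 0:
        raise ValueError("k must be a multiple of k_step")
    return (k // k_step) * bytes_per_iter


def overflows(k: int, k_step: int, bytes_per_iter: int) -> bool:
    # Assumption: the destination buffer holds one byte per K element,
    # so its capacity for a single row is k bytes.
    capacity = k
    return bytes_written(k, k_step, bytes_per_iter) > capacity


# K_STEP=64 with 64-byte writes fits the buffer exactly.
print(overflows(256, 64, 64))
# K_STEP=32 with 64-byte writes produces twice the capacity.
print(overflows(256, 32, 64))
# K_STEP=32 with 32-byte writes (the fix described above) fits again.
print(overflows(256, 32, 32))
```

Under that assumption, halving K_STEP doubles the number of iterations, so the write size per iteration has to be halved as well to stay within the same buffer.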

* perf(k2-moe): optimize memory allocation with pooled buffers

- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py
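
A minimal sketch of the two ideas in the list above, written in plain Python rather than the actual C++/AMX code. Everything here is an assumption made for illustration: the function names assign_slices and infer_group_size are hypothetical, the pool is modelled as byte offsets only, and the scale tensor is assumed to have one entry per quantization group along its last dimension.

```python
from typing import Dict, List, Tuple


def assign_slices(
    tokens_per_expert: List[int], bytes_per_token: int
) -> Tuple[Dict[int, Tuple[int, int]], int]:
    """Give each activated expert a contiguous (offset, size) slice of one pool.

    Experts that received no tokens get no slice, so the pool only needs to
    cover the tokens actually routed in this step.
    """
    slices: Dict[int, Tuple[int, int]] = {}
    offset = 0
    for expert_id, n_tokens in enumerate(tokens_per_expert):
        if n_tokens == 0:
            continue  # inactive expert: no memory reserved
        size = n_tokens * bytes_per_token
        slices[expert_id] = (offset, size)
        offset += size
    return slices, offset  # offset is now the total pool size in use


def infer_group_size(k: int, scale_shape: Tuple[int, ...]) -> int:
    """Infer the quantization group size from the scale tensor's shape.

    Assumption: the last dimension of the scale tensor is k // group_size.
    """
    num_groups = scale_shape[-1]
    if num_groups <= 0 or k % num_groups != 0:
        raise ValueError("scale shape is not compatible with k")
    return k // num_groups


slices, used = assign_slices([0, 3, 0, 2], bytes_per_token=4)
print(slices, used)
print(infer_group_size(4096, (8, 2048, 32)))
```

Compared with one fixed-size buffer per expert, the shared pool scales with the number of routed tokens rather than with the total number of experts, which is where the memory saving would come from under these assumptions.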

* delete kimi k2 forward test

* add TODO comment for pool_count_ calculation

2025-12-08 20:19:07 +08:00

utils

Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)

2025-12-08 20:19:07 +08:00

__init__.py

Fix kt-kernel for new wrapper (#1588)

2025-11-10 21:47:34 +08:00

experts_base.py

fix(llamafile): resolve deferred experts data race and update README (#1646)

2025-11-26 23:19:37 +08:00

experts.py

[Feature] Add avx-based kimi-k2 support (#1656)

2025-12-02 16:01:07 +08:00