ktransformers/kt-kernel/operators
Oql 8139c092bf Reduce CPU memory usage during large chunk prefill (Fixes #1676) (#1683)
* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat

The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration,
but when K_STEP=32 (as in GemmKernel224Int4SmallKGroup) this overflows the buffer.

BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.
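
The overflow described above can be checked with byte arithmetic. The following is a minimal sketch, not the kernel code: it assumes the packed row buffer holds one byte per K element, which the commit message does not state, and the names `bytes_written` and `overflows` are hypothetical.

```python
def bytes_written(k: int, k_step: int, bytes_per_iter: int) -> int:
    """Total bytes a from_mat-style loop writes for one row of length k."""
    iterations = k // k_step  # one write per K_STEP block
    return iterations * bytes_per_iter


def overflows(k: int, k_step: int, bytes_per_iter: int,
              bytes_per_elem: int = 1) -> bool:
    """True when the loop writes past a row buffer sized for k elements.

    Assumption: the buffer is sized as k * bytes_per_elem (one byte per
    element by default); the real layout may differ.
    """
    capacity = k * bytes_per_elem
    return bytes_written(k, k_step, bytes_per_iter) > capacity


if __name__ == "__main__":
    k = 128
    print(overflows(k, k_step=64, bytes_per_iter=64))  # 64-byte writes, K_STEP=64
    print(overflows(k, k_step=32, bytes_per_iter=64))  # 64-byte writes, K_STEP=32
    print(overflows(k, k_step=32, bytes_per_iter=32))  # 32-byte writes, K_STEP=32
```

Under these assumptions, halving K_STEP doubles the number of iterations, so keeping the 64-byte write doubles the bytes written for the same row, while a 32-byte write keeps the total equal to the row size.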

* perf(k2-moe): optimize memory allocation with pooled buffers

- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py
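
The idea behind the pooled buffers and the group_size inference can be sketched as follows. This is an illustration under stated assumptions, not the code in this commit: `assign_pool_slices` and `infer_group_size` are hypothetical names, the pool is modeled as a flat byte range, and the scale tensor is assumed to carry one scale per quantization group along its last dimension.

```python
from typing import Dict, List, Tuple


def assign_pool_slices(tokens_per_expert: List[int], bytes_per_token: int,
                       pool_size: int) -> Dict[int, Tuple[int, int]]:
    """Give each activated expert an (offset, size) slice of one shared pool.

    Experts with zero routed tokens receive no slice, so the pool only
    needs to cover the tokens actually routed in this batch.
    """
    slices: Dict[int, Tuple[int, int]] = {}
    offset = 0
    for expert_id, n_tokens in enumerate(tokens_per_expert):
        if n_tokens == 0:
            continue  # inactive expert: no buffer space consumed
        size = n_tokens * bytes_per_token
        if offset + size > pool_size:
            raise ValueError("shared pool is too small for this batch")
        slices[expert_id] = (offset, size)
        offset += size
    return slices


def infer_group_size(hidden_size: int, scale_shape: Tuple[int, ...]) -> int:
    """Infer the quantization group size from the scale tensor's shape.

    Assumption: the last dimension of the scale tensor is the number of
    groups along the hidden dimension.
    """
    n_groups = scale_shape[-1]
    if n_groups <= 0 or hidden_size % n_groups != 0:
        raise ValueError("scale shape does not evenly divide hidden_size")
    return hidden_size // n_groups


if __name__ == "__main__":
    print(assign_pool_slices([0, 3, 0, 1], bytes_per_token=16, pool_size=64))
    print(infer_group_size(7168, (2048, 224)))
```

In this model the pool scales with the number of routed tokens rather than with the number of experts times a worst-case token count, which is the kind of saving the commit title points at for large chunk prefill.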

* delete kimi k2 forward test

* add TODO comment for pool_count_ calculation
2025-12-08 20:19:07 +08:00