* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat
The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration
but when K_STEP=32 (for GemmKernel224Int4SmallKGroup), this causes buffer overflow.
BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.
* perf(k2-moe): optimize memory allocation with pooled buffers
- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py
* delete kimi k2 forward test
* add TODO comment for pool_count_ calculation
* support Kimi-K2-Thinking original weight
fix amx kernel bug
* update k2 avx kernel.
* feat: add CPUInfer write buffer task
* [feat]: add kimi k2 cpu write buffer support
- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture
* [fix]: correct write_weight_scale_to_buffer expert offset calculation
Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.
Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* [feat]: add write buffer wrapper
* [fix] fix comment
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
* [feat]: kt-kernel: Add resume arg to CPU weight conversion
* [docs]: kt-kernel: Document resume arg for CPU weight conversion
* [fix]: kt-kernel: Only print resume layer if in use
* [fix]: kt-kernel: Don't log skipped layers when using resume_layer
* [feat]: update kt-kernel hooks and add contribution guide
* [docs]: add contributing guide
* [style]: format the python file and cpp file in kt-kernel
* update README for kt-kernel
* style: format C++ and Python code in kt-kernel
- Format C++ files: task_queue, ext_bindings, and MoE operators
- Format Python utility modules: amx, llamafile, and loader
- Improve code readability and consistency