ErvinXie
71f683acec
Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill
* Update Kimi-K2-Thinking.md
* Create Kimi-K2-Thinking-Native.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking-Native.md
* [perf] optimize K2 MoE weight loading with per-expert pointers
- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
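
A minimal sketch of the per-expert pointer idea described above, not the code from this commit. It assumes each expert's weights live in their own buffer and uses Python's `ctypes` in place of the C++ worker pool; `make_expert`, `expert_pointer_array`, and `copy_tp_slice` are hypothetical names introduced only for illustration. Rather than stacking all experts into one contiguous block, it gathers one address per expert and copies only the slice a tensor-parallel rank needs.

```python
import ctypes

FLOAT_SIZE = ctypes.sizeof(ctypes.c_float)

def make_expert(values):
    """Allocate one expert's weights in its own float32 buffer (stand-in for a tensor)."""
    return (ctypes.c_float * len(values))(*values)

def expert_pointer_array(experts):
    """Collect one address per expert; no weight data is copied here."""
    ptrs = (ctypes.c_void_p * len(experts))()
    for i, buf in enumerate(experts):
        ptrs[i] = ctypes.addressof(buf)
    return ptrs

def copy_tp_slice(ptrs, n_experts, row_len, tp_rank, tp_size):
    """Copy this rank's slice of every expert into one destination buffer."""
    slice_len = row_len // tp_size
    nbytes = slice_len * FLOAT_SIZE
    dst = (ctypes.c_float * (n_experts * slice_len))()
    for e in range(n_experts):
        # Source: start of expert e plus this rank's offset within it.
        src_addr = ptrs[e] + tp_rank * nbytes
        dst_addr = ctypes.addressof(dst) + e * nbytes
        ctypes.memmove(dst_addr, src_addr, nbytes)
    return list(dst)

# The expert buffers must stay alive while their addresses are in use.
experts = [make_expert([0, 1, 2, 3]), make_expert([10, 11, 12, 13])]
ptrs = expert_pointer_array(experts)
rank1 = copy_tp_slice(ptrs, n_experts=2, row_len=4, tp_rank=1, tp_size=2)
```

In this sketch the per-expert copies are independent of each other, which is what would let a worker pool run them in parallel.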
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00