ktransformers/kt-kernel/python
Benjamin F f05b4009f3 [fix](kt-kernel): fix double mem used by safetensor loader (#1997)
Release the SafeTensor mmap loader singleton after each layer's
load_weights() completes. The C++ engine already holds a deep copy
(cpu_infer.sync() guarantees this), so releasing the mmap handles is
safe. The next layer recreates the loader on demand.
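
The release-and-recreate pattern described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual kt-kernel code: `MmapLoader`, `load_layer`, and `demo` are hypothetical names, and the `bytes` copy stands in for the deep copy that the C++ engine holds after cpu_infer.sync().

```python
import mmap
import os
import tempfile


class MmapLoader:
    """Hypothetical stand-in for a SafeTensor mmap loader singleton."""

    _instance = None

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    @classmethod
    def get(cls, path):
        # Recreate the loader on demand if a previous layer released it.
        if cls._instance is None:
            cls._instance = cls(path)
        return cls._instance

    @classmethod
    def release(cls):
        # Close the mapping and file handle so the mapped pages can be reclaimed.
        if cls._instance is not None:
            cls._instance._map.close()
            cls._instance._file.close()
            cls._instance = None

    def read(self, offset, size):
        # Slicing an mmap returns a new bytes object, i.e. an independent copy.
        return self._map[offset:offset + size]


def load_layer(path, offset, size):
    """Copy one layer's bytes out of the mapping, then release the loader."""
    loader = MmapLoader.get(path)
    try:
        weights = loader.read(offset, size)  # assumed analogue of the engine's deep copy
    finally:
        MmapLoader.release()
    return weights


def demo():
    # Write a tiny fake weight file holding two 8-byte "layers".
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(range(16)))
        path = f.name
    try:
        layers = [load_layer(path, i * 8, 8) for i in range(2)]
    finally:
        os.remove(path)
    return layers, MmapLoader._instance is None
```

Because each layer's data is copied before the mapping is closed, the copies stay valid after release, and only one layer's mapping is alive at any time.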

This halves peak memory usage during model loading (e.g. DSv3.2:
1.2T -> 613G).

Based on #1966 by @poryfly, adapted to the v0.6.2.post3 codebase
(adds MXFP4 support missing from the original PR).

Co-authored-by: xiongchenhui <xiongchenhui@hisense.com>
2026-05-11 12:00:30 +08:00