ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-06-06 15:54:02 +00:00

Author	SHA1	Message	Date
mrhaoxx	250e4fe52e	merge: integrate origin/main into sft branch Resolve conflicts: - experts.py: keep SFT mode dispatch, add main's numa_nodes param - experts_base.py: merge numa_nodes into shared _get_cpu_infer - convert_cpu_weights.py: keep SFT version (per-layer shard, tmpfs, batched FP8, backward weights) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 22:40:07 +08:00
mrhaoxx	a789729923	align sft branch with main: revert worker_pool, strip sft_timer, fix inference defaults - Revert worker_pool.cpp/.h to main (remove RDTSC timer, Chrome Trace, sft_timer namespace, ITT API, extended do_work_stealing_job API) - Strip all sft_timer instrumentation from sft-only files (sft_moe.hpp, moe-sft-tp.hpp, avx_kernels.hpp) - Restore pin_memory=True in KExpertsCPUBuffer (inference path) - Restore fused tensor transpose logic in convert_cpu_weights.py (main layout) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 17:39:56 +08:00
mrhaoxx	a98d544833	merge: integrate origin/main into sft branch Resolved 6 conflicts: - CMakeLists.txt: keep cpptrace + debug flag, accept flexible build type - worker_pool.cpp: keep SFT profiling + main's block=1 spin fix - ext_bindings.cpp: keep both SFT MOE bindings and AVX2/BF16/FP8 bindings - common.hpp: keep gpu_experts_mask + SFT backward weight fields - __init__.py: export both generate_gpu_experts_masks and AMXSFTMoEWrapper - experts.py: gpu_experts_mask for inference, num_gpu_experts for SFT, new methods	2026-04-08 23:19:28 +08:00
mrhaoxx	f36699affd	feat(sft): AMX MoE SFT backend with LoRA support Complete SFT (Supervised Fine-Tuning) backend for MoE models using AMX SIMD: Core C++ implementation: - sft_moe.hpp: Forward/backward with LoRA fused operations (~5500 lines) - moe-sft-tp.hpp: Tensor-parallel wrapper for multi-NUMA - amx/moe-sft-tp.hpp: AMX-specific TP implementation - avx_kernels.hpp: AVX512 SIMD kernels for LoRA GEMM - amx_kernels.hpp: AMX tile kernels for Panel5 rank-outer optimization - worker_pool: RDTSC profiling, Chrome trace output, SFT timer infrastructure - ext_bindings.cpp: SFT MOE pybind bindings (BF16/INT8/INT4 + SkipLoRA variants) Python sft/ submodule (kt_kernel.sft): - base.py: BaseSFTMoEWrapper with buffer management (template method pattern) - amx.py: AMXSFTMoEWrapper (weight loading, C++ task construction) - autograd.py: KTMoEFunction (torch.autograd.Function for distributed training) - layer.py: KTMoELayerWrapper (nn.Module replacing HF MoE layers) - arch.py: MOEArchConfig (Qwen3/DeepSeek/Mixtral architecture detection) - weights.py: Expert weight extraction and checkpoint loading - lora.py: PEFT LoRA adaptation (view buffers, grad buffers, save/load adapter) - wrapper.py: wrap_moe_layers_with_kt_wrapper, load_kt_model, build_kt_device_map - config.py: KTConfig dataclass (DeepSpeed-style opaque config passthrough) - dist_utils.py: Distributed gather/scatter, checkpoint-phase detection Design decisions: - Rank-0-only expert pattern: only rank 0 holds C++ wrapper and expert weights - DeepSpeed-style integration: accelerate keeps only KTransformersPlugin (framework interaction fields), all logic in kt_kernel.sft - Inference isolation: importing kt_kernel does not load sft/ submodule - Old field name compatibility: _get_kt_config() converts kt_xxx→xxx automatically Verified: Qwen3-235B-A22B 4GPU AMXBF16 training, loss converges normally.	2026-04-08 23:11:00 +08:00
Doctor Shotgun	24cd4fc055	feat(kt-kernel): Add utility script to merge loose layer weights to safetensors (#1886) * Add utility script to merge loose layer weights to safetensors * Send warnings and errors to stderr * Fix expert index parsing for MOE_INT4 and MOE_INT8	2026-03-31 10:41:07 +08:00
alin899992	9c18b60556	feat: CPU weight conversion for GLM-5 and MiniMax-M2.5 (#1853) * Support for GLM-5 and Minimax-M2.5 Add CPU weight conversion support for GLM-5 and Minimax-M2.5 * fix: remove overly restrictive MiniMax condition and deduplicate code - Remove `args.input_type == "fp8"` from MiniMaxConverter selection so bf16/fp16 MiniMax models no longer fall through to OnlineQuantConverter (which doesn't handle w1/w2/w3 naming and would fail). - Remove OnlineQuantConverter._find_expert_layers() which is identical to the inherited ConverterBase._find_expert_layers(). - Remove redundant expert_key_filter assignment (same as base default). --------- Co-authored-by: ErvinXie <ervinxie@foxmail.com>	2026-03-31 10:39:48 +08:00
Jianwei Dong	027832c590	[feat](kt-kernel): CPU-GPU experts sched (#1796)	2026-01-16 17:01:15 +08:00
Jiaqi Liao	46b0f36980	[feat](kt-kernel): Fix CPU instruction set variants for build & install (#1746) * [feat]: Enhance CPU feature detection and support for AVX512 extensions - Added cmake/DetectCPU.cmake for automatic CPU feature detection. - Updated CMakeLists.txt to include auto-detection logic for AVX512 features. - Modified install.sh to include new AVX512_VBMI option for FP8 MoE. - Enhanced _cpu_detect.py to support progressive matching of CPU variants. - Created scripts/check_cpu_features.py for manual CPU feature checks. - Updated setup.py to reflect changes in CPU variant building and environment variables. * [fix](kt-kernel): Add conditional inclusion of FP8 MoE for AVX512 BF16 support * [chore](kt-kernel): update project version to 0.5.0 in CMakeLists.txt and version.py	2025-12-24 18:57:45 +08:00
mrhaoxx	503295fc88	[feat](kt-kernel): refactor convert_cpu_weights.py to support conversion for GLM-4.6V (#1687) Signed-off-by: mrhaoxx <mr.haoxx@gmail.com>	2025-12-09 14:24:41 +08:00
Jianwei Dong	fd78fe520a	fix(scripts): resolve OOM when converting gpu weights and update README (#1640)	2025-12-01 14:15:14 +08:00
mrhaoxx	637c49c83f	[feat](kt-kernel): support qwen3-vl weights convert (#1648)	2025-11-27 22:29:09 +08:00
ZiWei Yuan	1374b98ee5	[feat](moe_kernel): add amd blis support (int8) (#1600) * [feat]: init amd adaption * [feat]: add blis support * [fix]: fix setup and moe kernel wrapper * [fix](setup.py): support rebuild with cache and import kt_kernel works fine * [feat]: add moe_kernel converter for amd and implement the load method(haven't tested yet) * [feat](moe_kernel/moe.hpp): delete unused memory when using save * [fix](moe_kernel): update PLAIN for pack * [fix](moe_kernel): rm printf debug * [fix](moe_kernel): skip gpu experts * [fix](moe_kernel/moe.hpp): update include memory path * [feat](moe_kernel/moe.hpp): support expert deferral * [feat]: finish amd --------- Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>	2025-11-27 12:08:53 +08:00
Jianwei Dong	51745a9ea1	add ci (#1642)	2025-11-25 20:52:08 +08:00
DocShotgun	e72a4fb880	[feat](kt-kernel): Add resume arg to CPU weight conversion (#1630) * [feat]: kt-kernel: Add resume arg to CPU weight conversion * [docs]: kt-kernel: Document resume arg for CPU weight conversion * [fix]: kt-kernel: Only print resume layer if in use * [fix]: kt-kernel: Don't log skipped layers when using resume_layer	2025-11-22 12:00:15 +08:00
ZiWei Yuan	aef6672dd8	[docs]: add contributing guide and add hooks install (#1613) * [feat]: update kt-kernel hooks and add contribution guide * [docs]: add contributing guide * [style]: format the python file and cpp file in kt-kernel	2025-11-15 18:26:49 +08:00
Jiaqi Liao	13b8ddecd9	AMXMoEWrapper -> KTMoEWrapper (#1604) fix import KTMoEWrapper	2025-11-12 16:34:54 +08:00
Oql	34c71ba8bf	Merge pull request #1568 from kvcache-ai/add_bf16_scripts add convert_moe_to_bf16.py	2025-11-07 17:55:38 +08:00
ouqingliang	a18f007d45	add convert_moe_to_bf16.py	2025-11-07 09:53:19 +00:00
Peilin Li	d939e56646	add the convert from fp8 to bf16 for Kimi-K2 model	2025-11-06 17:20:28 +08:00
ovowei	f854d03bd7	update kt-kernel	2025-11-03 15:19:52 +08:00
ovowei	28d8663374	fix	2025-10-22 18:14:34 +08:00
Atream	4c5fcf9774	add kt-kernel	2025-10-12 05:13:00 +00:00

22 Commits