diff --git a/README.md b/README.md
index 666d2c5..cf13969 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@
 KTransformers is a research project focused on efficient inference and fine-tuning of large language models through CPU-GPU heterogeneous computing. The project has evolved into **two core modules**: [kt-kernel](./kt-kernel/) and [kt-sft](./kt-sft/).
 
 ## 🔥 Updates
-
+* **Dec 22, 2025**: Support RL-DPO fine-tuning with LLaMA-Factory. ([Tutorial](./doc/en/SFT/DPO_tutorial.md))
 * **Dec 5, 2025**: Support Native Kimi-K2-Thinking inference ([Tutorial](./doc/en/Kimi-K2-Thinking-Native.md))
 * **Nov 6, 2025**: Support Kimi-K2-Thinking inference ([Tutorial](./doc/en/Kimi-K2-Thinking.md)) and fine-tune ([Tutorial](./doc/en/SFT_Installation_Guide_KimiK2.md))
 * **Nov 4, 2025**: KTransformers Fine-Tuning × LLaMA-Factory Integration. ([Tutorial](./doc/en/KTransformers-Fine-Tuning_User-Guide.md))
diff --git a/doc/SUMMARY.md b/doc/SUMMARY.md
index af80672..8b8750a 100644
--- a/doc/SUMMARY.md
+++ b/doc/SUMMARY.md
@@ -7,8 +7,9 @@
 # Tutorial
 - [kt-sft part](en/SFT/README.md)
-  - [kt-sft developer tech notes](en/SFT/KTransformers-Fine-Tuning_Developer-Technical-Notes.md)
   - [Injection Tutorial](en/SFT/injection_tutorial.md)
+  - [kt-sft developer tech notes](en/SFT/KTransformers-Fine-Tuning_Developer-Technical-Notes.md)
+  - [DPO tutorial](en/SFT/DPO_tutorial.md)
diff --git a/doc/en/DPO_tutorial.md b/doc/en/SFT/DPO_tutorial.md
similarity index 97%
rename from doc/en/DPO_tutorial.md
rename to doc/en/SFT/DPO_tutorial.md
index 396e0e6..8238ce0 100644
--- a/doc/en/DPO_tutorial.md
+++ b/doc/en/SFT/DPO_tutorial.md
@@ -61,7 +61,7 @@
 pip install custom_flashinfer/
 
 ## Prepare Models
 
-We uses `deepseek-ai/DeepSeek-V2-Lite` as an example here. You can replace it with other models such as Kimi K2.
+We use `deepseek-ai/DeepSeek-V2-Lite` as an example here. You can replace it with other models such as Kimi K2.
 
 ## How to start
diff --git a/kt-sft/README.md b/kt-sft/README.md
index 22f233f..e94e84f 100644
--- a/kt-sft/README.md
+++ b/kt-sft/README.md
@@ -191,6 +191,8 @@
 cpu_infer: 32
 chunk_size: 8192
 ```
+We now also support RL-DPO training with the KTransformers backend. See the [DPO Tutorial](../doc/en/SFT/DPO_tutorial.md) for details.
+
 `kt_optimize_rule` controls **placement strategy**. See also [ktransformers/optimize_rules](https://github.com/kvcache-ai/ktransformers/tree/main/ktransformers/optimize/optimize_rules). Naming hints (`*` = wildcard):
 
 | Pattern | Meaning |