mirror of https://github.com/kvcache-ai/ktransformers.git
synced 2026-04-29 18:51:15 +00:00

[docs]: add contributing guide and add hooks install (#1613)

* [feat]: update kt-kernel hooks and add contribution guide
* [docs]: add contributing guide
* [style]: format the python file and cpp file in kt-kernel

.github/CONTRIBUTING.md (vendored, new file, 139 lines)

@@ -0,0 +1,139 @@
## Before You Commit

Your commit message must follow [Conventional Commits](https://www.conventionalcommits.org/) and your code must be formatted. The Git hooks do most of this work automatically.

### Tool Requirements

You need a recent `clang-format` (>= 18). In a conda environment you can install it with:

```shell
conda install -c conda-forge clang-format=18
```

If you previously configured the build with an older version, remove the build directory and reconfigure:

```shell
rm -rf kt-kernel/build
```

Install `black` for Python formatting:

```shell
conda install black
```

### Install the Hooks

```shell
bash kt-kernel/scripts/install-git-hooks.sh
# or just configure kt-kernel with cmake, which installs the hooks as well
cmake -S kt-kernel -B kt-kernel/build
```

If you need to run the formatters manually:

```shell
cmake -S kt-kernel -B kt-kernel/build
cmake --build kt-kernel/build --target format
```

## Developer Note

Formatting and commit message rules are enforced by Git hooks. After installing `clang-format` and `black`, just commit normally—the hooks will run formatting for you.

> [!NOTE]
> If formatting modifies files, the commit is aborted after staging those changes. Review them and run `git commit` again. Repeat until no further formatting changes appear.

---

### Conventional Commit Regex (Reference)

The commit-msg hook enforces this pattern:

```text
regex='^\[(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert|wip)\](\([^\)]+\))?(!)?: .+'
```

Meaning:

* `[type]` required — one of feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert|wip
* Optional scope: `(scope)` — any characters except `)`
* Optional breaking-change marker: `!` right after the type or scope
* Separator: `: ` (colon + space)
* Subject: free text (at least one character)

Examples:

```text
[feat]: add adaptive batching
[fix(parser)]: handle empty token list
[docs]!: update API section for breaking rename
```

You can bypass the hooks locally (not recommended) with:

```shell
git commit --no-verify
```
## Reminder Before Committing

Commit messages must follow the Conventional Commits specification (https://www.conventionalcommits.org/), and code must meet the formatting requirements. The Git hooks already take care of most of this work.

### Software Requirements

A recent `clang-format` (>= 18) is required; in a conda environment, install it with:

```shell
conda install -c conda-forge clang-format=18
```

If you previously configured with an older version, delete the build directory and reconfigure:

```shell
rm -rf kt-kernel/build
```

Install `black` for formatting Python files:

```shell
conda install black
```

### Installing the Hooks

```shell
bash kt-kernel/scripts/install-git-hooks.sh
# or just configure kt-kernel with cmake
cmake -S kt-kernel -B kt-kernel/build
```

If you need to format manually:

```shell
cmake -S kt-kernel -B kt-kernel/build
cmake --build kt-kernel/build --target format
```

## Developer Notes

This repository uses Git hooks to run code formatting and commit-message checks automatically. Simply install `clang-format` and `black`, then commit as usual; the hooks will format your code for you.

> [!NOTE]
> If formatting modifies files, the hook aborts the commit after staging those changes. Review the modifications and run `git commit` again; repeat until no further formatting changes appear.

### Commit Message Regex (Reference)

The hook checks commit messages against the following regex:

```text
regex='^\[(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert|wip)\](\([^\)]+\))?(!)?: .+'
```

Meaning:

* `[type]` required: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert|wip
* Optional scope: `(scope)`, which must not contain a closing parenthesis
* Optional breaking-change marker: `!`
* Separator: colon + space (`: `)
* Subject: at least one character

Examples:

```text
[feat]: add adaptive batching
[fix(tokenizer)]: handle empty token list
[docs]!: update API documentation (contains breaking changes)
```

Skipping the hooks (not recommended; emergencies only):

```shell
git commit --no-verify
```
The pre-commit hook script is made monorepo-aware:

```diff
@@ -1,10 +1,12 @@
 #!/usr/bin/bash
-# Pre-commit hook: run clang-format via CMake 'format' target and Black for Python before allowing commit.
-# If formatting makes changes, stage them and abort so user can review.
+# Pre-commit hook: run clang-format via kt-kernel's CMake 'format' target and Black for Python
+# before allowing commit. If formatting makes changes, stage them and abort so user can review.
 set -euo pipefail
 
 REPO_ROOT="$(git rev-parse --show-toplevel)"
-BUILD_DIR="$REPO_ROOT/build"
+# kt-kernel project directory within the monorepo
+KERNEL_DIR="$REPO_ROOT/kt-kernel"
+BUILD_DIR="$KERNEL_DIR/build"
 FORMAT_TARGET="format"
 CLANG_FORMAT_BIN="${CLANG_FORMAT_BIN:-clang-format}"
 BLACK_BIN="${BLACK_BIN:-black}"
@@ -20,10 +22,10 @@ if ! command -v "$BLACK_BIN" >/dev/null 2>&1; then
   echo "[pre-commit] black not found (looked for $BLACK_BIN). Skipping Python format." >&2
 fi
 
-# Configure build directory if missing (quiet)
-if [ ! -d "$BUILD_DIR" ] || [ ! -f "$BUILD_DIR/Makefile" ] && [ ! -f "$BUILD_DIR/build.ninja" ]; then
-  echo "[pre-commit] configuring project (cmake) ..." >&2
-  cmake -S "$REPO_ROOT" -B "$BUILD_DIR" >/dev/null
+# Configure kt-kernel build directory if missing (quiet)
+if [ ! -d "$BUILD_DIR" ] || { [ ! -f "$BUILD_DIR/Makefile" ] && [ ! -f "$BUILD_DIR/build.ninja" ]; }; then
+  echo "[pre-commit] configuring kt-kernel (cmake) ..." >&2
+  cmake -S "$KERNEL_DIR" -B "$BUILD_DIR" >/dev/null
 fi
 
 # Run format target (prefer ninja if present)
@@ -38,15 +40,18 @@ fi
 
 # Run black on staged python files (or entire repo if you prefer)
 if command -v "$BLACK_BIN" >/dev/null 2>&1; then
-  # Get staged python files; if none, skip
-  PY_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.py$' || true)
-  if [ -n "$PY_FILES" ]; then
-    echo "[pre-commit] running black on staged python files..." >&2
-    $BLACK_BIN $PY_FILES
-  else
-    # Optionally format all python files; comment out if not desired
-    # $BLACK_BIN "$REPO_ROOT"
-    :
+  # Run black only on kt-kernel's python and scripts directories
+  BLACK_PATHS=""
+  if [ -d "$KERNEL_DIR/python" ]; then
+    BLACK_PATHS="$BLACK_PATHS $KERNEL_DIR/python"
+  fi
+  if [ -d "$KERNEL_DIR/scripts" ]; then
+    BLACK_PATHS="$BLACK_PATHS $KERNEL_DIR/scripts"
+  fi
+  if [ -n "$BLACK_PATHS" ]; then
+    echo "[pre-commit] running black on:$BLACK_PATHS" >&2
+    # shellcheck disable=SC2086
+    $BLACK_BIN $BLACK_PATHS
   fi
 fi
 
```
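The abort-on-change behavior the hook implements (stage the reformatted files, then exit non-zero so the user can review) reduces to a small decision. A sketch in Python; `hook_exit_code` is a hypothetical helper for illustration, not part of the repository:

```python
def hook_exit_code(files_changed_by_format: list[str]) -> int:
    """Decision logic of the pre-commit hook (sketch).

    Abort the commit (non-zero exit) when formatting touched any file,
    so the user can review the newly staged changes and commit again.
    """
    if files_changed_by_format:
        for path in files_changed_by_format:
            print(f"[pre-commit] reformatted: {path}")
        return 1  # abort: formatting changed files
    return 0  # nothing changed; allow the commit


assert hook_exit_code([]) == 0
assert hook_exit_code(["kt-kernel/python/example.py"]) == 1
```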
kt-kernel's CMake configuration now locates the enclosing git worktree before installing the hooks:

```diff
@@ -79,27 +79,40 @@ if(USE_CONDA_TOOLCHAIN)
   set(CMAKE_INSTALL_RPATH_USE_LINK_PATH OFF)
 endif()
 
-## Ensure git hooks are installed when configuring the project
-# If this is a git working copy and the installer exists, run it and fail the CMake configure
-# when installation fails. If no .git directory is present (e.g. source tarball), skip.
-if(EXISTS "${CMAKE_SOURCE_DIR}/.git" AND IS_DIRECTORY "${CMAKE_SOURCE_DIR}/.git")
-  if(EXISTS "${CMAKE_SOURCE_DIR}/scripts/install-git-hooks.sh")
-    message(STATUS "Detected .git; installing git hooks using scripts/install-git-hooks.sh")
-    execute_process(
-      COMMAND sh "${CMAKE_SOURCE_DIR}/scripts/install-git-hooks.sh"
-      WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
-      RESULT_VARIABLE _INSTALL_GIT_HOOKS_RESULT
-      OUTPUT_VARIABLE _INSTALL_GIT_HOOKS_OUT
-      ERROR_VARIABLE _INSTALL_GIT_HOOKS_ERR
+## Ensure git hooks are installed when configuring the project (monorepo-aware)
+# If we are inside a git worktree (repo root is outside kt-kernel now), invoke the installer
+# which will link kt-kernel/.githooks into the top-level .git/hooks. Otherwise, skip.
+find_program(GIT_BIN git)
+if(GIT_BIN)
+  execute_process(
+    COMMAND "${GIT_BIN}" rev-parse --show-toplevel
+    WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
+    OUTPUT_VARIABLE _GIT_TOP
+    RESULT_VARIABLE _GIT_RV
+    OUTPUT_STRIP_TRAILING_WHITESPACE
+    ERROR_QUIET
+  )
+  if(_GIT_RV EQUAL 0 AND EXISTS "${_GIT_TOP}/.git" AND IS_DIRECTORY "${_GIT_TOP}/.git")
+    if(EXISTS "${CMAKE_SOURCE_DIR}/scripts/install-git-hooks.sh")
+      message(STATUS "Detected git worktree at ${_GIT_TOP}; installing hooks from kt-kernel/.githooks")
+      execute_process(
+        COMMAND sh "${CMAKE_SOURCE_DIR}/scripts/install-git-hooks.sh"
+        WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
+        RESULT_VARIABLE _INSTALL_GIT_HOOKS_RESULT
+        OUTPUT_VARIABLE _INSTALL_GIT_HOOKS_OUT
+        ERROR_VARIABLE _INSTALL_GIT_HOOKS_ERR
     )
     if(NOT _INSTALL_GIT_HOOKS_RESULT EQUAL 0)
       message(FATAL_ERROR "Installing git hooks failed (exit ${_INSTALL_GIT_HOOKS_RESULT}).\nOutput:\n${_INSTALL_GIT_HOOKS_OUT}\nError:\n${_INSTALL_GIT_HOOKS_ERR}")
+      endif()
+    else()
+      message(FATAL_ERROR "Required script 'scripts/install-git-hooks.sh' not found in kt-kernel; cannot install hooks.")
     endif()
   else()
-    message(FATAL_ERROR "Repository appears to be a git repo but required script 'scripts/install-git-hooks.sh' was not found. Please ensure hooks installer is present.")
+    message(STATUS "No git worktree detected; skipping git hooks installation")
   endif()
 else()
-  message(STATUS "No .git directory found; skipping git hooks installation")
+  message(STATUS "git not found; skipping git hooks installation")
 endif()
 
 set(CMAKE_CXX_STANDARD 20)
```
Include-order fixes in kt-kernel C++ sources:

```diff
@@ -9,10 +9,10 @@
 **/
 #include "shared_mem_buffer.h"
 
+#include <errno.h>
 #include <numa.h>
 
 #include <cstdio>
-#include <errno.h>
 
 size_t MemoryRequest::total_size() {
   size_t total = 0;
```
```diff
@@ -7,6 +7,7 @@
 #include <cstdio>
 #include <type_traits>
 
+#include "../cpu_backend/shared_mem_buffer.h"
 #include "common.hpp"
 
 // Forward declaration for Llamafile backend type checking
```
```diff
@@ -13,11 +13,11 @@
 #include <vector>
 
 #include "../common.hpp"
-#include "../cpu_backend/shared_mem_buffer.h"
 #include "../moe-tp.hpp"
 #include "api/common.h"
 #include "api/mat_kernel.h"
 #include "llama.cpp/ggml.h"
 
 template <class T, bool PLAIN = true>
 class MOE_KERNEL_TP
 #ifdef FORWARD_TIME_PROFILE
```
Black formatting of the kt-kernel converter script (`convert_cpu_weights.py`):

```diff
@@ -22,7 +22,6 @@ import triton
 import triton.language as tl
 
 
-
 Q_BITS = 4
 STORAGE_BITS = 32
 PACK_NUM = STORAGE_BITS // Q_BITS
@@ -31,6 +30,7 @@ NUMA_NUM = 2
 REVERSE_AWQ_PACK_ORDER = [0, 4, 1, 5, 2, 6, 3, 7]
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
+
 @triton.jit
 def weight_dequant_kernel(x_ptr, s_ptr, y_ptr, M, N, BLOCK_SIZE: tl.constexpr):
     pid_m = tl.program_id(axis=0)
@@ -51,10 +51,11 @@ def weight_dequant(x: torch.Tensor, s: torch.Tensor, block_size: int = 128) -> t
     assert x.dim() == 2 and s.dim() == 2
     M, N = x.size()
     y = torch.empty_like(x, dtype=torch.get_default_dtype())
-    grid = lambda meta: (triton.cdiv(M, meta['BLOCK_SIZE']), triton.cdiv(N, meta['BLOCK_SIZE']))
+    grid = lambda meta: (triton.cdiv(M, meta["BLOCK_SIZE"]), triton.cdiv(N, meta["BLOCK_SIZE"]))
     weight_dequant_kernel[grid](x, s, y, M, N, BLOCK_SIZE=block_size)
     return y
 
 
+
 def load_model_config(input_path: str, input_type: str = None) -> Dict:
     """Load model configuration from config.json
@@ -297,7 +298,6 @@ class ConverterBase:
         handle = self.file_handle_map[file]
         return handle.get_tensor(key)
 
-
     # layers_id -> list[experts_id]
     def _find_expert_layers(self) -> Dict[int, List[int]]:
         """Find all layers and experts in the model"""
@@ -517,7 +517,9 @@ class OnlineQuantConverter(ConverterBase):
         quant_method: str = "int4",
         merge_to_safetensor: bool = True,
     ):
-        super().__init__(input_path, output_path, model_config, cpuinfer_threads, threadpool_count, input_type, merge_to_safetensor)
+        super().__init__(
+            input_path, output_path, model_config, cpuinfer_threads, threadpool_count, input_type, merge_to_safetensor
+        )
         self.quant_method = quant_method
 
         # For FP8, get block size from model_config
@@ -569,11 +571,11 @@ class OnlineQuantConverter(ConverterBase):
         if not os.path.exists(file_path):
             raise FileNotFoundError(f"File not found: {file_path}")
 
-        with open(file_path, 'rb') as f:
+        with open(file_path, "rb") as f:
             binary_data = f.read()
 
         # Determine dtype based on file name
-        if 'scale' in file_path:
+        if "scale" in file_path:
             # Scale tensors are typically float32
             np_array = np.frombuffer(binary_data, dtype=np.float32)
         else:
@@ -616,22 +618,12 @@ class OnlineQuantConverter(ConverterBase):
         # Iterate through all experts
         for expert_id in range(self.num_experts):
             # For each projection (down, gate, up)
-            proj_mappings = [
-                ('down', 'ffn_down_exps'),
-                ('gate', 'ffn_gate_exps'),
-                ('up', 'ffn_up_exps')
-            ]
+            proj_mappings = [("down", "ffn_down_exps"), ("gate", "ffn_gate_exps"), ("up", "ffn_up_exps")]
 
             for proj_name, proj_key in proj_mappings:
                 # Build file patterns
-                quant_pattern = os.path.join(
-                    numa_folder,
-                    f'{amx_method}_{proj_name}_{expert_id}_*Byte_quant_.kt'
-                )
-                scale_pattern = os.path.join(
-                    numa_folder,
-                    f'{amx_method}_{proj_name}_{expert_id}_*Byte_scale_.kt'
-                )
+                quant_pattern = os.path.join(numa_folder, f"{amx_method}_{proj_name}_{expert_id}_*Byte_quant_.kt")
+                scale_pattern = os.path.join(numa_folder, f"{amx_method}_{proj_name}_{expert_id}_*Byte_scale_.kt")
 
                 # Find files using glob
                 quant_files = glob.glob(quant_pattern)
@@ -705,18 +697,18 @@ class OnlineQuantConverter(ConverterBase):
                     raise KeyError(f"Missing down weight_scale_inv for layer {layer_idx}, expert {expert_id}")
 
                 # Load FP8 weights and scales
-                gate_fp8 = self._load_tensor(gate_key).to('cuda')
-                up_fp8 = self._load_tensor(up_key).to('cuda')
-                down_fp8 = self._load_tensor(down_key).to('cuda')
+                gate_fp8 = self._load_tensor(gate_key).to("cuda")
+                up_fp8 = self._load_tensor(up_key).to("cuda")
+                down_fp8 = self._load_tensor(down_key).to("cuda")
 
-                gate_scale_inv = self._load_tensor(gate_scale_key).to('cuda')
-                up_scale_inv = self._load_tensor(up_scale_key).to('cuda')
-                down_scale_inv = self._load_tensor(down_scale_key).to('cuda')
+                gate_scale_inv = self._load_tensor(gate_scale_key).to("cuda")
+                up_scale_inv = self._load_tensor(up_scale_key).to("cuda")
+                down_scale_inv = self._load_tensor(down_scale_key).to("cuda")
 
                 # Dequantize FP8 to BF16 using block-wise scaling
-                gate_weight = weight_dequant(gate_fp8, gate_scale_inv).to('cpu').to(torch.bfloat16).contiguous()
-                up_weight = weight_dequant(up_fp8, up_scale_inv).to('cpu').to(torch.bfloat16).contiguous()
-                down_weight = weight_dequant(down_fp8, down_scale_inv).to('cpu').to(torch.bfloat16).contiguous()
+                gate_weight = weight_dequant(gate_fp8, gate_scale_inv).to("cpu").to(torch.bfloat16).contiguous()
+                up_weight = weight_dequant(up_fp8, up_scale_inv).to("cpu").to(torch.bfloat16).contiguous()
+                down_weight = weight_dequant(down_fp8, down_scale_inv).to("cpu").to(torch.bfloat16).contiguous()
 
             elif self.input_type == "fp16":
                 # Load FP16 and convert to BF16
@@ -804,6 +796,7 @@ class OnlineQuantConverter(ConverterBase):
         print(f"  Keeping layer folder structure at {self.output_path}/_layer_{layer_idx}")
         return {}
 
+
 """
 Example usage(test passed):
 python convert_cpu_weights.py --input-path /mnt/data3/models/DeepSeek-R1-0528/ --input-type fp8 --output /mnt/data3/models/DeepSeek-R1-0528-INT4-test --quant-method int4 --cpuinfer-threads 60 --threadpool-count 2
@@ -811,6 +804,7 @@ python convert_cpu_weights.py --input-path /mnt/data3/models/DeepSeek-R1-0528/ -
 python convert_cpu_weights.py --input-path /mnt/data2/models/Qwen3-Next-80B-A3B-Instruct --input-type bf16 --output /mnt/data2/models/Qwen3-Next-80B-A3B-Instruct-INT4-test --quant-method int4 --cpuinfer-threads 60 --threadpool-count 2
 """
 
+
 def main():
     parser = argparse.ArgumentParser(description="Convert SafeTensors to column major 1D format")
     parser.add_argument("--input-path", "-i", required=True, help="Input directory with safetensors")
@@ -873,12 +867,25 @@ def main():
 
     if quant_method == "awq":
         converter = AWQToColumnMajorConverter(
-            args.input_path, args.output, model_config, args.cpuinfer_threads, args.threadpool_count, input_type=None, merge_to_safetensor=merge_to_safetensor
+            args.input_path,
+            args.output,
+            model_config,
+            args.cpuinfer_threads,
+            args.threadpool_count,
+            input_type=None,
+            merge_to_safetensor=merge_to_safetensor,
         )
     elif quant_method in ["int4", "int8"] and args.input_type in ["fp8", "fp16", "bf16"]:
         # Use OnlineQuantConverter for both INT4 and INT8 quantization
         converter = OnlineQuantConverter(
-            args.input_path, args.output, model_config, args.cpuinfer_threads, args.threadpool_count, args.input_type, quant_method, merge_to_safetensor
+            args.input_path,
+            args.output,
+            model_config,
+            args.cpuinfer_threads,
+            args.threadpool_count,
+            args.input_type,
+            quant_method,
+            merge_to_safetensor,
+        )
         )
     else:
         raise ValueError(
```
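The `weight_dequant` Triton kernel touched above applies one scale per `block_size`-square tile of the weight matrix. A plain NumPy sketch of the same block-wise scaling, on CPU; `block_dequant` is a hypothetical helper for illustration, and the shape of `s` (one scale per tile, `(ceil(M/bs), ceil(N/bs))`) is an assumption:

```python
import numpy as np


def block_dequant(x: np.ndarray, s: np.ndarray, block_size: int = 128) -> np.ndarray:
    """Multiply each block_size x block_size tile of x by its per-block scale.

    CPU sketch of what the Triton weight_dequant kernel computes;
    assumes x is (M, N) and s is (ceil(M/bs), ceil(N/bs)).
    """
    M, N = x.shape
    y = np.empty_like(x, dtype=np.float32)
    for bi in range(s.shape[0]):
        for bj in range(s.shape[1]):
            rows = slice(bi * block_size, min((bi + 1) * block_size, M))
            cols = slice(bj * block_size, min((bj + 1) * block_size, N))
            y[rows, cols] = x[rows, cols] * s[bi, bj]
    return y


# 256x256 matrix of ones with a 2x2 grid of 128-wide blocks:
x = np.ones((256, 256), dtype=np.float32)
s = np.array([[2.0, 3.0], [4.0, 5.0]], dtype=np.float32)
y = block_dequant(x, s)
assert y[0, 0] == 2.0 and y[0, 255] == 3.0
assert y[255, 0] == 4.0 and y[255, 255] == 5.0
```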
Black formatting of the quantization script:

```diff
@@ -34,63 +34,42 @@ from datasets import load_dataset
 
 def parse_args():
     parser = argparse.ArgumentParser(description="Quantize MoE models with selective quantization")
 
     # Required arguments
-    parser.add_argument(
-        "--model_id",
-        type=str,
-        required=True,
-        help="Path to the input model directory"
-    )
-    parser.add_argument(
-        "--output_dir",
-        type=str,
-        required=True,
-        help="Path to save the quantized model"
-    )
+    parser.add_argument("--model_id", type=str, required=True, help="Path to the input model directory")
+    parser.add_argument("--output_dir", type=str, required=True, help="Path to save the quantized model")
 
     # Optional arguments
     parser.add_argument(
         "--quant_type",
         type=str,
         choices=["W4A16", "W8A16"],
         default="W8A16",
-        help="Quantization type: W4A16 (GPTQ4) or W8A16 (GPTQ8). Default: W8A16"
+        help="Quantization type: W4A16 (GPTQ4) or W8A16 (GPTQ8). Default: W8A16",
     )
     parser.add_argument(
-        "--num_calibration_samples",
-        type=int,
-        default=512,
-        help="Number of calibration samples. Default: 512"
+        "--num_calibration_samples", type=int, default=512, help="Number of calibration samples. Default: 512"
     )
     parser.add_argument(
-        "--max_sequence_length",
-        type=int,
-        default=2048,
-        help="Maximum sequence length for calibration. Default: 2048"
+        "--max_sequence_length", type=int, default=2048, help="Maximum sequence length for calibration. Default: 2048"
     )
     parser.add_argument(
         "--dampening_frac",
         type=float,
         default=0.1,
-        help="Dampening fraction to mitigate quantization noise. Default: 0.1"
+        help="Dampening fraction to mitigate quantization noise. Default: 0.1",
     )
     parser.add_argument(
         "--dataset",
         type=str,
         default="HuggingFaceH4/ultrachat_200k",
-        help="Dataset for calibration. Default: HuggingFaceH4/ultrachat_200k"
+        help="Dataset for calibration. Default: HuggingFaceH4/ultrachat_200k",
     )
     parser.add_argument(
-        "--dataset_split",
-        type=str,
-        default="train_sft",
-        help="Dataset split to use. Default: train_sft"
+        "--dataset_split", type=str, default="train_sft", help="Dataset split to use. Default: train_sft"
     )
     parser.add_argument(
-        "--force_cpu",
-        action="store_true",
-        help="Force all computations to CPU (sets CUDA_VISIBLE_DEVICES='')"
+        "--force_cpu", action="store_true", help="Force all computations to CPU (sets CUDA_VISIBLE_DEVICES='')"
     )
     parser.add_argument(
         "--ignore_patterns",
@@ -103,29 +82,22 @@ def parse_args():
             r"re:.*\.shared_expert\..*$",
             r"re:.*\.shared_experts\..*$",
             r"re:.*\.mlp\.shared_expert_gate$",
-            r"re:.*\.linear_attn\..*$"
+            r"re:.*\.linear_attn\..*$",
         ],
-        help="Regex patterns for layers to ignore during quantization"
+        help="Regex patterns for layers to ignore during quantization",
     )
     parser.add_argument(
         "--torch_dtype",
         type=str,
         choices=["bfloat16", "float16", "float32"],
         default="bfloat16",
-        help="PyTorch dtype for model loading. Default: bfloat16"
+        help="PyTorch dtype for model loading. Default: bfloat16",
     )
     parser.add_argument(
-        "--trust_remote_code",
-        action="store_true",
-        help="Allow loading of remote code (required for some models)"
+        "--trust_remote_code", action="store_true", help="Allow loading of remote code (required for some models)"
     )
-    parser.add_argument(
-        "--random_seed",
-        type=int,
-        default=42,
-        help="Random seed for dataset shuffling. Default: 42"
-    )
+    parser.add_argument("--random_seed", type=int, default=42, help="Random seed for dataset shuffling. Default: 42")
 
     return parser.parse_args()
 
 
@@ -152,11 +124,7 @@ def get_torch_dtype(dtype_str):
     Returns:
         torch.dtype: Corresponding PyTorch dtype
     """
-    dtype_map = {
-        "bfloat16": torch.bfloat16,
-        "float16": torch.float16,
-        "float32": torch.float32
-    }
+    dtype_map = {"bfloat16": torch.bfloat16, "float16": torch.float16, "float32": torch.float32}
     return dtype_map[dtype_str]
 
 
@@ -176,18 +144,18 @@ def check_dense_layers_and_update_ignore(model_id, ignore_patterns, trust_remote
        Updated ignore_patterns list with dense layer patterns added
     """
     print("🔍 Checking model configuration for dense layers...")
 
     try:
         # Load model configuration
         config = AutoConfig.from_pretrained(model_id, trust_remote_code=trust_remote_code)
 
         # Check if the model has first_k_dense_replace parameter
-        first_k_dense_replace = getattr(config, 'first_k_dense_replace', None)
+        first_k_dense_replace = getattr(config, "first_k_dense_replace", None)
 
         if first_k_dense_replace is not None and first_k_dense_replace > 0:
             print(f"✅ Found dense layers configuration: first_k_dense_replace = {first_k_dense_replace}")
             print(f"   Adding first {first_k_dense_replace} layers to ignore list...")
 
             # Create regex pattern for dense layers (layers 0 to first_k_dense_replace-1)
             if first_k_dense_replace == 1:
                 dense_pattern = r"re:model\.layers\.0\.mlp\..*$"
@@ -195,18 +163,18 @@ def check_dense_layers_and_update_ignore(model_id, ignore_patterns, trust_remote
                 # For multiple layers, use range pattern
                 layer_range = f"[0-{first_k_dense_replace-1}]"
                 dense_pattern = f"re:model\\.layers\\.{layer_range}\\.mlp\\..*$"
 
             # Add the dense layer pattern to ignore list
             updated_ignore_patterns = ignore_patterns + [dense_pattern]
 
             print(f"   Dense layer pattern added: {dense_pattern}")
             print(f"   This will ignore MLP components in layers 0-{first_k_dense_replace-1}")
 
             return updated_ignore_patterns
         else:
             print("ℹ️  No dense layers detected (first_k_dense_replace not found or is 0)")
             return ignore_patterns
 
     except Exception as e:
         print(f"⚠️  Warning: Could not check model config for dense layers: {e}")
         print("   Proceeding with original ignore patterns...")
@@ -246,11 +214,7 @@ def load_and_prepare_dataset(dataset_name, dataset_split, num_samples, max_lengt
     # Tokenize the data
     def tokenize(sample):
         return tokenizer(
-            sample["text"],
-            padding=False,
-            max_length=max_length,
-            truncation=True,
-            add_special_tokens=False
+            sample["text"], padding=False, max_length=max_length, truncation=True, add_special_tokens=False
         )
 
     ds = ds.map(tokenize, remove_columns=ds.column_names)
@@ -291,9 +255,7 @@ def main():
     # 0) Check for dense layers and update ignore patterns
     # Dense layers in the first few layers should not be quantized
```
|
# Dense layers in the first few layers should not be quantized
|
||||||
updated_ignore_patterns = check_dense_layers_and_update_ignore(
|
updated_ignore_patterns = check_dense_layers_and_update_ignore(
|
||||||
args.model_id,
|
args.model_id, args.ignore_patterns, args.trust_remote_code
|
||||||
args.ignore_patterns,
|
|
||||||
args.trust_remote_code
|
|
||||||
)
|
)
|
||||||
|
|
||||||
# --------------------------------------------------------------------
|
# --------------------------------------------------------------------
|
||||||
@@ -302,13 +264,9 @@ def main():
|
|||||||
print("🔍 Inferring device map...")
|
print("🔍 Inferring device map...")
|
||||||
with init_empty_weights():
|
with init_empty_weights():
|
||||||
dummy = AutoModelForCausalLM.from_pretrained(
|
dummy = AutoModelForCausalLM.from_pretrained(
|
||||||
args.model_id,
|
args.model_id, torch_dtype=torch_dtype, trust_remote_code=args.trust_remote_code
|
||||||
torch_dtype=torch_dtype,
|
|
||||||
trust_remote_code=args.trust_remote_code
|
|
||||||
)
|
|
||||||
device_map = infer_auto_device_map(
|
|
||||||
dummy, no_split_module_classes=dummy._no_split_modules
|
|
||||||
)
|
)
|
||||||
|
device_map = infer_auto_device_map(dummy, no_split_module_classes=dummy._no_split_modules)
|
||||||
del dummy
|
del dummy
|
||||||
|
|
||||||
# Force all modules to CPU for quantization
|
# Force all modules to CPU for quantization
|
||||||
@@ -335,7 +293,7 @@ def main():
|
|||||||
args.num_calibration_samples,
|
args.num_calibration_samples,
|
||||||
args.max_sequence_length,
|
args.max_sequence_length,
|
||||||
tokenizer,
|
tokenizer,
|
||||||
args.random_seed
|
args.random_seed,
|
||||||
)
|
)
|
||||||
|
|
||||||
# --------------------------------------------------------------------
|
# --------------------------------------------------------------------
|
||||||
@@ -373,4 +331,4 @@ def main():
|
|||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
main()
|
main()
|
||||||
|
|||||||
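The ignore-pattern construction in the hunks above can be sketched as a standalone function. The pattern strings come from the diff; the function name and the `re:`-prefix stripping are illustrative assumptions:

```python
import re

# Illustrative sketch of the dense-layer ignore-pattern logic (not the repo's API).
def dense_ignore_pattern(first_k_dense_replace: int) -> str:
    if first_k_dense_replace == 1:
        return r"re:model\.layers\.0\.mlp\..*$"
    # For multiple layers, use a character-class range over the layer index
    layer_range = f"[0-{first_k_dense_replace - 1}]"
    return f"re:model\\.layers\\.{layer_range}\\.mlp\\..*$"

pattern = dense_ignore_pattern(3)
# Strip the "re:" marker (assumed here to be a quantizer-side convention) and compile.
regex = re.compile(pattern[len("re:"):])

assert regex.match("model.layers.0.mlp.gate_proj.weight")
assert regex.match("model.layers.2.mlp.experts.0.up_proj.weight")
assert not regex.match("model.layers.3.mlp.gate_proj.weight")
```

Note that the `[0-{k-1}]` character class only covers single-digit layer indices, which is fine for the small `first_k_dense_replace` values (typically 1–3) this code targets.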
```diff
@@ -9,6 +9,7 @@ from safetensors.torch import load_file, save_file
 
 import gc
 
+
 def weight_dequant_cpu(x: torch.Tensor, s: torch.Tensor, block_size: int = 128) -> torch.Tensor:
     assert x.dim() == 2 and s.dim() == 2, "Expect 2D tensors for x and s"
     M, N = x.shape
@@ -27,6 +28,7 @@ def weight_dequant_cpu(x: torch.Tensor, s: torch.Tensor, block_size: int = 128)
             y[m0:m1, n0:n1] = sub.to(torch.bfloat16)
     return y
 
+
 def main(fp8_path, bf16_path):
     torch.set_default_dtype(torch.bfloat16)
     os.makedirs(bf16_path, exist_ok=True)
@@ -34,7 +36,7 @@ def main(fp8_path, bf16_path):
     with open(model_index_file, "r") as f:
         model_index = json.load(f)
     weight_map = model_index["weight_map"]
 
     loaded_files = {}
     fp8_weight_names = []
 
@@ -51,7 +53,7 @@ def main(fp8_path, bf16_path):
         file_name = os.path.basename(safetensor_file)
         current_state_dict = load_file(safetensor_file, device="cpu")
         loaded_files[file_name] = current_state_dict
 
         new_state_dict = {}
         for weight_name, weight in current_state_dict.items():
             if weight_name.endswith("_scale_inv"):
@@ -67,17 +69,17 @@ def main(fp8_path, bf16_path):
                     new_state_dict[weight_name] = weight
             else:
                 new_state_dict[weight_name] = weight
 
         new_safetensor_file = os.path.join(bf16_path, file_name)
         save_file(new_state_dict, new_safetensor_file)
 
         if len(loaded_files) > 2:
             oldest_file = next(iter(loaded_files))
             del loaded_files[oldest_file]
             gc.collect()
             if torch.cuda.is_available():
                 torch.cuda.empty_cache()
 
     new_model_index_file = os.path.join(bf16_path, "model.safetensors.index.json")
     for weight_name in fp8_weight_names:
         scale_inv_name = f"{weight_name}_scale_inv"
@@ -87,9 +89,10 @@ def main(fp8_path, bf16_path):
         json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)
     print(f"Finish, Result in: {bf16_path}")
 
+
 if __name__ == "__main__":
     parser = ArgumentParser()
     parser.add_argument("--input-fp8-hf-path", type=str, required=True, help="Kimi-K2 FP8 model")
     parser.add_argument("--output-bf16-hf-path", type=str, required=True, help="BF16 model (After convert)")
     args = parser.parse_args()
     main(args.input_fp8_hf_path, args.output_bf16_hf_path)
```
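`weight_dequant_cpu` above multiplies each `block_size × block_size` tile of the FP8 weight by its per-block scale. A minimal pure-Python sketch of that tiling, with nested lists standing in for torch tensors and a toy block size (both assumptions for clarity):

```python
# Toy sketch of block-wise dequantization: one scale per tile of the weight matrix.
def weight_dequant(x, s, block_size=2):
    M, N = len(x), len(x[0])
    y = [[0.0] * N for _ in range(M)]
    for bi, m0 in enumerate(range(0, M, block_size)):
        for bj, n0 in enumerate(range(0, N, block_size)):
            scale = s[bi][bj]  # scale for tile (bi, bj)
            for m in range(m0, min(m0 + block_size, M)):
                for n in range(n0, min(n0 + block_size, N)):
                    y[m][n] = x[m][n] * scale
    return y

x = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
s = [[0.5, 2.0]]  # one scale per 2x2 block
y = weight_dequant(x, s)
# → [[0.5, 1.0, 6.0, 8.0], [2.5, 3.0, 14.0, 16.0]]
```

The real function does the same per-tile multiply with torch slicing (`y[m0:m1, n0:n1] = sub.to(torch.bfloat16)`) instead of an explicit inner loop.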
```diff
@@ -48,9 +48,7 @@ def _dequantize_tensor(
     if scales.numel() == weight.numel():
         scales = scales.reshape_as(weight)
     else:
-        raise ValueError(
-            f"Scale shape {scales.shape} incompatible with weight shape {weight.shape}"
-        )
+        raise ValueError(f"Scale shape {scales.shape} incompatible with weight shape {weight.shape}")
     bf16 = (weight.to(torch.float32) * scales).to(torch.bfloat16)
     return bf16.contiguous()
 
@@ -128,9 +126,7 @@ def convert_file(
 
     os.makedirs(os.path.dirname(output_path), exist_ok=True)
     save_file(tensors, output_path)
-    print(
-        f"[done] wrote {output_path} (converted={stats['converted']}, skipped={stats['skipped']})"
-    )
+    print(f"[done] wrote {output_path} (converted={stats['converted']}, skipped={stats['skipped']})")
 
 
 def parse_args() -> argparse.Namespace:
@@ -174,9 +170,7 @@ def main():
         targets = [os.path.join(model_dir, fname) for fname in args.files]
     else:
         targets = [
-            os.path.join(model_dir, name)
-            for name in sorted(os.listdir(model_dir))
-            if name.endswith(".safetensors")
+            os.path.join(model_dir, name) for name in sorted(os.listdir(model_dir)) if name.endswith(".safetensors")
         ]
 
     if not targets:
```
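The `_dequantize_tensor` path above reshapes a flat per-element scale tensor to the weight's shape, multiplies in float32, and rejects mismatched scale counts. Sketched with plain lists in place of tensors (the function name and list representation are assumptions for illustration):

```python
# Illustrative sketch of per-element dequantization with a flat scale vector.
def dequantize(weight, scales_flat):
    rows, cols = len(weight), len(weight[0])
    if len(scales_flat) != rows * cols:
        # Mirrors the ValueError path for incompatible scale shapes.
        raise ValueError(f"Scale count {len(scales_flat)} incompatible with weight shape {rows}x{cols}")
    # reshape_as: index the flat scales in row-major order, same shape as weight.
    return [[weight[r][c] * scales_flat[r * cols + c] for c in range(cols)]
            for r in range(rows)]

out = dequantize([[1, 2], [3, 4]], [0.5, 0.5, 2.0, 2.0])
# → [[0.5, 1.0], [6.0, 8.0]]
```

In the real code the multiply happens in `torch.float32` and the result is cast to `torch.bfloat16` before being made contiguous.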
```diff
@@ -1,24 +1,29 @@
 #!/usr/bin/env sh
-# Install git hooks from .githooks into .git/hooks by creating symlinks (or copying if symlink fails).
+# Install git hooks from kt-kernel/.githooks into the monorepo's .git/hooks by
+# creating symlinks (or copying if symlink fails).
 
 set -eu
 
+# This script lives in kt-kernel/scripts/, so REPO_ROOT = kt-kernel
 REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
-GIT_DIR="$REPO_ROOT/.git"
 HOOKS_SRC="$REPO_ROOT/.githooks"
 
+# Detect the top-level Git worktree (the monorepo root: ktransformers)
+GIT_TOP="$(git rev-parse --show-toplevel 2>/dev/null || true)"
+if [ -z "$GIT_TOP" ] || [ ! -d "$GIT_TOP/.git" ]; then
+    echo "[install-git-hooks] Not inside a git worktree; skipping hooks installation." >&2
+    exit 0
+fi
+
+GIT_DIR="$GIT_TOP/.git"
 HOOKS_DEST="$GIT_DIR/hooks"
 
-if [ ! -d "$GIT_DIR" ]; then
-    echo "Not a git repository (no .git directory) at $REPO_ROOT" >&2
-    exit 1
-fi
-
 if [ ! -d "$HOOKS_SRC" ]; then
-    echo "No .githooks directory found at $HOOKS_SRC" >&2
+    echo "[install-git-hooks] No .githooks directory found at $HOOKS_SRC" >&2
     exit 1
 fi
 
-echo "Installing git hooks from $HOOKS_SRC to $HOOKS_DEST"
+echo "[install-git-hooks] Installing git hooks from $HOOKS_SRC to $HOOKS_DEST (repo: $GIT_TOP)"
 
 # Ensure all source hook files are executable so that even if copied (not symlinked) they run.
 for src_hook in "$HOOKS_SRC"/*; do
@@ -49,4 +54,4 @@ for hook in "$HOOKS_SRC"/*; do
     fi
 done
 
-echo "Done. Hooks installed."
+echo "[install-git-hooks] Done. Hooks installed."
```
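The new worktree detection can be exercised in isolation. A throwaway sketch using a scratch repository (assumes `git` is on `PATH`; paths are temporary, not the repo's real layout):

```shell
#!/usr/bin/env sh
set -eu

# Create a scratch repository to demonstrate `git rev-parse --show-toplevel`.
tmp="$(mktemp -d)"
git init -q "$tmp/repo"
mkdir -p "$tmp/repo/nested/dir"
cd "$tmp/repo/nested/dir"

# From any nested directory, rev-parse resolves the top-level worktree;
# outside a worktree it fails, which the `|| true` turns into an empty string.
GIT_TOP="$(git rev-parse --show-toplevel 2>/dev/null || true)"
if [ -n "$GIT_TOP" ] && [ -d "$GIT_TOP/.git" ]; then
    echo "hooks would be installed into $GIT_TOP/.git/hooks"
else
    echo "[install-git-hooks] Not inside a git worktree; skipping." >&2
fi
```

This is why the updated installer works from the kt-kernel subdirectory of the monorepo: it no longer assumes `.git` sits next to `REPO_ROOT`, but asks git for the real top level.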