769 Commits

Author SHA1 Message Date
Azure-Tang
d98433c2d1 update git action env, add USE_BALANCE_SERVE=1 2025-04-01 12:58:28 +00:00
dongjw
5c7ed7b579 fix top_p = 0 bug 2025-04-01 20:38:33 +08:00
Azure-Tang
aeabd783b0 update git action env, add BALANCE_SERVE=1 2025-04-01 11:21:55 +00:00
Azure-Tang
31677181c3 Fix ktransformers-server flashinfer wrapper position arg issue;
Fix db position issue
2025-04-01 07:30:23 +00:00
Azure-Tang
203b853c75 rm KMoEGateDeepSeekV3, fall back to KMoEGate 2025-04-01 07:13:05 +00:00
Azure-Tang
3a5330b215 Merge branch 'main' into work-concurrent 2025-04-01 06:48:19 +00:00
fishingfly
7549ff335a fix: refine backend error message to include ROCM_HOME
Signed-off-by: fishingfly <zhoyuzf@163.com>
2025-04-01 10:50:38 +08:00
Atream
80c5cbecdd add nlohmann 2025-04-01 10:38:45 +08:00
Atream
9360d1e3c8 add submodules 2025-03-31 23:20:29 +08:00
Atream
25cee5810e add balance-serve, support concurrence 2025-03-31 22:55:32 +08:00
Atream
8d0292aa44 refactor folders 2025-03-31 22:45:37 +08:00
Yuhao Tsui
84164f584c Update completions.py 2025-03-26 15:39:46 +08:00
Yuhao Tsui
52fa671c10 Merge branch 'kvcache-ai:main' into main 2025-03-26 11:06:00 +08:00
Atream
f142f4dff3 Merge pull request #956 from kvcache-ai/Atream-patch-7
Update README.md
2025-03-22 12:14:48 +08:00
Atream
d4c6c2bb02 Update README.md 2025-03-22 12:14:36 +08:00
Aubrey Li
a12e8ab46e yaml: fix Marlin AssertionError
Marlin quantized linear only supports GPU device, when change generate_op
to "KLinearMarlin", generate_device need to be changed to "cuda" accordingly.

Fixes: e5b001d76f ("Update readme; Format code; Add example yaml.")
2025-03-21 23:58:20 +08:00
Aubrey Li
f4d52d1f0c Restore CPU offloading capability 2025-03-21 10:04:31 +08:00
Jiaqi Liao
05f6cede37 Merge pull request #943 from SkqLiao/main
fix benchmark params for human eval benchmark
2025-03-20 18:49:34 +08:00
SkqLiao
6d4626a5d9 fix params 2025-03-20 18:48:51 +08:00
Atream
ddd35d5be9 Merge pull request #940 from kvcache-ai/Atream-patch-6
Update gate.py
2025-03-20 14:54:20 +08:00
Atream
633af5d235 Update gate.py 2025-03-20 14:54:01 +08:00
SkqLiao
8cc4df980e use DeepSeek V3 instead of R1 for benchmarking 2025-03-20 11:59:03 +08:00
Jiaqi Liao
32a91c78c1 Merge pull request #935 from SkqLiao/main
Fix benchmarking slow issue on self-hosted actions
2025-03-20 10:14:37 +08:00
SkqLiao
e7d7d2705c rename CI/CD 2025-03-20 10:11:24 +08:00
SkqLiao
19c824f9d0 change cpu-infer due to actual cpu cores on self-hosted server. 2025-03-20 10:10:52 +08:00
Jiaqi Liao
649489dc67 Merge pull request #931 from SkqLiao/main
Add Human Eval Benchmark Test for CI/CD
2025-03-19 21:35:24 +08:00
SkqLiao
bad334fa5b fix path 2025-03-19 21:28:58 +08:00
SkqLiao
bc369b256c add CI/CD for human eval score benchmarking 2025-03-19 21:25:21 +08:00
Atream
8be56a0190 Merge pull request #927 from kvcache-ai/fix-gate-precision
Update gate.py
2025-03-19 16:16:31 +08:00
Atream
b453333f60 Update gate.py 2025-03-19 16:14:54 +08:00
Atream
6ca233cca3 Merge pull request #926 from kvcache-ai/Atream-patch-5
Update gate.py
2025-03-19 12:17:09 +08:00
Atream
44599229cd Update gate.py 2025-03-19 12:16:48 +08:00
Atream
aa8f985f85 Merge pull request #925 from kvcache-ai/fix-gate-compile
fix-gate-compile
2025-03-19 11:44:41 +08:00
Atream
114995355b fix-gate-compile 2025-03-19 11:27:18 +08:00
ZiWei Yuan
e788248364 Merge pull request #916 from kvcache-ai/patch_v0.2.3post2
📝 fix typo ktransformer->ktransformers
2025-03-17 17:55:30 +08:00
liam
4748a912e2 📝 fix typo ktransformer->ktransformers 2025-03-17 17:54:00 +08:00
Atream
8b51b0f058 Merge pull request #915 from kvcache-ai/Atream-patch-4
Atream patch 4
2025-03-17 17:05:39 +08:00
Atream
167506b779 Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml 2025-03-17 17:05:01 +08:00
Atream
c9a0c44213 Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml 2025-03-17 17:03:52 +08:00
Atream
3aee0fa099 Merge pull request #913 from kvcache-ai/Atream-patch-3
Add files via upload
2025-03-17 17:00:28 +08:00
Atream
094ac8f3a4 Add files via upload 2025-03-17 16:59:57 +08:00
ZiWei Yuan
8a8311cb04 Merge pull request #911 from kvcache-ai/patch_v0.2.3post2
🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml
2025-03-17 15:09:11 +08:00
liam
19f058ec9e 🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml 2025-03-17 15:08:12 +08:00
Azure
0e93a09d67 Merge pull request #906 from Azure-Tang/main
[Fix] Fix rocm example yaml
2025-03-16 10:27:59 +08:00
Azure-Tang
85c32fdd10 Fix rocm example yaml 2025-03-15 22:27:02 -04:00
Azure
63604cac59 Merge pull request #904 from Azure-Tang/main
[fix]Fix rocm compilation
2025-03-16 00:36:16 +08:00
Azure-Tang
4a31237346 fix rocm compilation 2025-03-15 12:34:03 -04:00
Atream
c51818c39a Merge pull request #902 from kvcache-ai/rollback-triton-prefill
rollback-triton-prefill
tag: v0.2.3post2
2025-03-15 23:09:30 +08:00
Atream
3934b9dfc1 rollback-triton-prefill 2025-03-15 14:21:21 +00:00
ZiWei Yuan
bda9cf15e7 Merge pull request #899 from kvcache-ai/develop-0.2.3post2
fix readme path
2025-03-15 19:20:52 +08:00