Commit Graph

769 Commits

Author SHA1 Message Date
moonshadow-25
9781d1e6f4 iq1s core 2025-03-01 21:48:25 +08:00
godrosev
93c5b75716 rem 2025-03-01 21:25:18 +08:00
godrosev
e6349eb240 iq1s 2025-03-01 21:00:11 +08:00
Atream
761de49843 Merge pull request #751 from kvcache-ai/Atream-patch-2
Update DeepseekR1_V3_tutorial.md
2025-03-01 19:57:00 +08:00
Atream
735873a32a Update DeepseekR1_V3_tutorial.md 2025-03-01 19:56:46 +08:00
Atream
bd33a59ecf Merge pull request #750 from kvcache-ai/feat-chunk-prefill-flashinfer
Support chunk prefill. Support 139K context for DeepSeek-R1 139K with in 24G VRAM.
2025-03-01 19:50:52 +08:00
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
ZiWei Yuan
511958d49c Merge pull request #743 from KMSorSMS/main
fix cache_lens bug in server and rm test prompt.txt
2025-03-01 00:17:53 +08:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
Atream
494469d4c5 Merge pull request #722 from ZhangShuaiyi/remove_unused
Delete duplicate code
2025-02-28 15:04:21 +08:00
liam
71f4599dee 📝 rm test_prompt 2025-02-28 11:44:49 +08:00
ZiWei Yuan
1264f9407b Merge pull request #732 from KMSorSMS/main
fox docker build
2025-02-28 11:28:06 +08:00
liam
a0e7afa432 fox docker build 2025-02-28 11:25:34 +08:00
Azure
add415124f Merge pull request #731 from Azure-Tang/update-template
[fix] Fix template name
2025-02-28 11:19:52 +08:00
Azure
bc52969918 fix name 2025-02-28 03:17:33 +00:00
Azure
0439cb36d4 Merge pull request #730 from Azure-Tang/update-template
[UPDATE] Update ZH/EN issue template
2025-02-28 11:10:29 +08:00
Azure
31b01f5b99 update ZH/EN template 2025-02-28 03:09:06 +00:00
Shuaiyi
a34a25d5cc Delete unused code 2025-02-27 13:18:19 +00:00
wang jiahao
7a19f3b781 Merge pull request #721 from kvcache-ai/fix_temperature
fix temperature
2025-02-27 21:01:21 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
Atream
85e2cc7bf4 Merge pull request #719 from kvcache-ai/fix-use-generation-json
use generation config from json file in official repo
2025-02-27 19:49:41 +08:00
Atream
e645d84794 use generation config from json file in official repo 2025-02-27 11:48:34 +00:00
wang jiahao
5e3c6b4f97 Merge pull request #644 from wtdcode/temperature_top_p_from_request
Allow temperature and top_p from /v1/chat/completions
2025-02-27 18:13:13 +08:00
lazymio
b121ca4df8 Fix according to upstream changes 2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11 Merge branch 'main' into temperature_top_p_from_request 2025-02-27 18:08:55 +08:00
Azure
1f28f75f55 Merge pull request #717 from kvcache-ai/issue-template
Update issue templates
2025-02-27 18:02:34 +08:00
Azure
c61805dd0a Update issue templates 2025-02-27 17:53:27 +08:00
Atream
50c691297f Merge pull request #622 from akemimadoka/fix-msvc
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-27 17:42:00 +08:00
Atream
0422152cf3 Merge pull request #670 from akemimadoka/fix-win
Fix RuntimeError on Windows caused by integer overflow in np.prod
2025-02-27 17:40:27 +08:00
Atream
798e1d0cfa Merge pull request #532 from xv44586/fix-sse-formatting
fix: fix SSE formatting
2025-02-27 12:19:23 +08:00
Atream
f403cde6d4 Merge pull request #650 from ceerRep/main
feat: basic api key support
2025-02-27 12:16:53 +08:00
Atream
1d5d5faef6 Merge pull request #626 from cyhasuka/main
Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
2025-02-27 12:13:10 +08:00
Atream
8db6a4d402 Merge branch 'main' into main 2025-02-27 12:12:32 +08:00
wang jiahao
3c8c580580 Merge pull request #691 from swu-hyk/ollama_api_chat
feat:implementation of chat routing for Ollama
2025-02-27 11:17:48 +08:00
Azure
ca93cf7548 Merge pull request #702 from Azure-Tang/update-readme
[UPDATE] Update documents.
2025-02-26 23:45:24 +08:00
Azure
c05ebb74b1 Update fp8 doc; Update install.md broken link 2025-02-26 15:43:08 +00:00
Atream
3ebe17eb63 Merge pull request #699 from kvcache-ai/Atream-patch-1
Update DeepseekR1_V3_tutorial.md
2025-02-26 22:04:45 +08:00
Atream
369f4d917d Update DeepseekR1_V3_tutorial.md 2025-02-26 22:04:29 +08:00
Atream
9650893adc Merge pull request #697 from kvcache-ai/fix-yaml
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
2025-02-26 21:54:01 +08:00
Atream
90eb87b3fc Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml 2025-02-26 21:53:50 +08:00
swu-hyk
ec7e912fee modify 2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25 implementation of chat routing for Ollama 2025-02-26 17:05:00 +08:00
Chen Hongtao
9660b2cc1e Merge pull request #685 from vproxy-tools/main
fix numa cpu distribution
2025-02-26 15:35:19 +08:00
ZiWei Yuan
e7ebb26370 Merge pull request #684 from KMSorSMS/main
fix dockerfile in devcontainer and fix expert torch
2025-02-26 15:06:51 +08:00
liam
ffb86c66e3 fix experts torch 2025-02-26 15:04:40 +08:00
liam
de082f141c fix cd error 2025-02-26 14:54:47 +08:00
wkgcass
b2bff17775 fix numa cpu distribution
The numa node location would be calculated based on the total number
of worker threads.
So we should always use the actual number of threads instead of using a min() op.
2025-02-26 14:49:57 +08:00
akemimadoka
8817777e11 Fix RuntimeError on Windows caused by integer overflow in np.prod 2025-02-26 03:50:12 +08:00
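
Note on 8817777e11 (PR #670): the np.prod overflow named in this commit can be illustrated with a minimal sketch. It assumes the cause was a 32-bit default integer, which is what NumPy releases before 2.0 use on Windows. The log does not show the actual patch, so the shape and the helper names below are hypothetical, and the 32-bit accumulator is forced explicitly so the effect is reproducible on any platform.

```python
import math

import numpy as np


def element_count_int32(shape):
    """Multiply the dimensions with a 32-bit accumulator.

    Assumption: this mimics np.prod on a platform whose default
    integer is 32 bits wide (Windows with NumPy < 2.0).
    """
    dims = np.array(shape, dtype=np.int32)
    # Integer reductions wrap around silently instead of raising.
    return int(np.prod(dims, dtype=np.int32))


def element_count_safe(shape):
    """Multiply the dimensions with Python's arbitrary-precision ints."""
    return math.prod(int(d) for d in shape)


# Hypothetical tensor shape with 2**32 elements, chosen only to
# trigger the wrap-around; it is not taken from the repository.
shape = (65536, 65536)

wrapped = element_count_int32(shape)
exact = element_count_safe(shape)

print("32-bit product:", wrapped)
print("exact product: ", exact)
```

Passing dtype=np.int64 to np.prod is another way to avoid the wrap-around. Which approach the actual fix took is not stated in the log.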