ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-05-18 19:39:45 +00:00

Author	SHA1	Message	Date
moonshadow-25	9781d1e6f4	iq1s core	2025-03-01 21:48:25 +08:00
godrosev	93c5b75716	rem	2025-03-01 21:25:18 +08:00
godrosev	e6349eb240	iq1s	2025-03-01 21:00:11 +08:00
Atream	761de49843	Merge pull request #751 from kvcache-ai/Atream-patch-2: Update DeepseekR1_V3_tutorial.md	2025-03-01 19:57:00 +08:00
Atream	735873a32a	Update DeepseekR1_V3_tutorial.md	2025-03-01 19:56:46 +08:00
Atream	bd33a59ecf	Merge pull request #750 from kvcache-ai/feat-chunk-prefill-flashinfer: Support chunk prefill. Support 139K context for DeepSeek-R1 139K with in 24G VRAM.	2025-03-01 19:50:52 +08:00
Atream	fa03ea48dd	Merge branch 'main' into feat-chunk-prefill-flashinfer	2025-03-01 11:35:09 +00:00
Atream	f35e8d41d8	support chunk prefill, support 139K context for 24G VRAM	2025-03-01 11:28:25 +00:00
ZiWei Yuan	511958d49c	Merge pull request #743 from KMSorSMS/main: fix cache_lens bug in server and rm test prompt.txt	2025-03-01 00:17:53 +08:00
liam	80e0536fb0	Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main	2025-03-01 00:12:21 +08:00
liam	8ddc990668	⚡ fix server cache lens	2025-03-01 00:09:57 +08:00
Atream	494469d4c5	Merge pull request #722 from ZhangShuaiyi/remove_unused: Delete duplicate code	2025-02-28 15:04:21 +08:00
liam	71f4599dee	📝 rm test_prompt	2025-02-28 11:44:49 +08:00
ZiWei Yuan	1264f9407b	Merge pull request #732 from KMSorSMS/main: ⚡ fox docker build	2025-02-28 11:28:06 +08:00
liam	a0e7afa432	⚡ fox docker build	2025-02-28 11:25:34 +08:00
Azure	add415124f	Merge pull request #731 from Azure-Tang/update-template: [fix] Fix template name	2025-02-28 11:19:52 +08:00
Azure	bc52969918	fix name	2025-02-28 03:17:33 +00:00
Azure	0439cb36d4	Merge pull request #730 from Azure-Tang/update-template: [UPDATE] Update ZH/EN issue template	2025-02-28 11:10:29 +08:00
Azure	31b01f5b99	update ZH/EN template	2025-02-28 03:09:06 +00:00
Shuaiyi	a34a25d5cc	Delete unused code	2025-02-27 13:18:19 +00:00
wang jiahao	7a19f3b781	Merge pull request #721 from kvcache-ai/fix_temperature: fix temperature	2025-02-27 21:01:21 +08:00
qiyuxinlin	22df52e94e	fix temperature	2025-02-27 21:00:44 +08:00
Atream	85e2cc7bf4	Merge pull request #719 from kvcache-ai/fix-use-generation-json: use generation config from json file in official repo	2025-02-27 19:49:41 +08:00
Atream	e645d84794	use generation config from json file in official repo	2025-02-27 11:48:34 +00:00
wang jiahao	5e3c6b4f97	Merge pull request #644 from wtdcode/temperature_top_p_from_request: Allow temperature and top_p from /v1/chat/completions	2025-02-27 18:13:13 +08:00
lazymio	b121ca4df8	Fix according to upstream changes	2025-02-27 18:11:35 +08:00
wang jiahao	26f7b4af11	Merge branch 'main' into temperature_top_p_from_request	2025-02-27 18:08:55 +08:00
Azure	1f28f75f55	Merge pull request #717 from kvcache-ai/issue-template: Update issue templates	2025-02-27 18:02:34 +08:00
Azure	c61805dd0a	Update issue templates	2025-02-27 17:53:27 +08:00
Atream	50c691297f	Merge pull request #622 from akemimadoka/fix-msvc: Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC	2025-02-27 17:42:00 +08:00
Atream	0422152cf3	Merge pull request #670 from akemimadoka/fix-win: Fix RuntimeError on Windows caused by integer overflow in np.prod	2025-02-27 17:40:27 +08:00
Atream	798e1d0cfa	Merge pull request #532 from xv44586/fix-sse-formatting: fix: fix SSE formatting	2025-02-27 12:19:23 +08:00
Atream	f403cde6d4	Merge pull request #650 from ceerRep/main: feat: basic api key support	2025-02-27 12:16:53 +08:00
Atream	1d5d5faef6	Merge pull request #626 from cyhasuka/main: Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM	2025-02-27 12:13:10 +08:00
Atream	8db6a4d402	Merge branch 'main' into main	2025-02-27 12:12:32 +08:00
wang jiahao	3c8c580580	Merge pull request #691 from swu-hyk/ollama_api_chat: feat:implementation of chat routing for Ollama	2025-02-27 11:17:48 +08:00
Azure	ca93cf7548	Merge pull request #702 from Azure-Tang/update-readme: [UPDATE] Update documents.	2025-02-26 23:45:24 +08:00
Azure	c05ebb74b1	Update fp8 doc; Update install.md broken link	2025-02-26 15:43:08 +00:00
Atream	3ebe17eb63	Merge pull request #699 from kvcache-ai/Atream-patch-1: Update DeepseekR1_V3_tutorial.md	2025-02-26 22:04:45 +08:00
Atream	369f4d917d	Update DeepseekR1_V3_tutorial.md	2025-02-26 22:04:29 +08:00
Atream	9650893adc	Merge pull request #697 from kvcache-ai/fix-yaml: Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml	2025-02-26 21:54:01 +08:00
Atream	90eb87b3fc	Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml	2025-02-26 21:53:50 +08:00
swu-hyk	ec7e912fee	modify	2025-02-26 19:21:30 +08:00
swu-hyk	68e7df3a25	implementation of chat routing for Ollama	2025-02-26 17:05:00 +08:00
Chen Hongtao	9660b2cc1e	Merge pull request #685 from vproxy-tools/main: fix numa cpu distribution	2025-02-26 15:35:19 +08:00
ZiWei Yuan	e7ebb26370	Merge pull request #684 from KMSorSMS/main: fix dockerfile in devcontainer and fix expert torch	2025-02-26 15:06:51 +08:00
liam	ffb86c66e3	⚡ fix experts torch	2025-02-26 15:04:40 +08:00
liam	de082f141c	⚡ fix cd error	2025-02-26 14:54:47 +08:00
wkgcass	b2bff17775	fix numa cpu distribution: The numa node location would be calculated based on the total number of worker threads. So we should always use the actual number of threads instead of using a min() op.	2025-02-26 14:49:57 +08:00
akemimadoka	8817777e11	Fix RuntimeError on Windows caused by integer overflow in np.prod	2025-02-26 03:50:12 +08:00

769 Commits