ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-05-19 03:49:23 +00:00

Author	SHA1	Message	Date
Azure-Tang	d98433c2d1	update git action env, add USE_BALANCE_SERVE=1	2025-04-01 12:58:28 +00:00
dongjw	5c7ed7b579	fix top_p = 0 bug	2025-04-01 20:38:33 +08:00
Azure-Tang	aeabd783b0	update git action env, add BALANCE_SERVE=1	2025-04-01 11:21:55 +00:00
Azure-Tang	31677181c3	Fix ktransformers-server flashinfer wrapper position arg issue; Fix db position issue	2025-04-01 07:30:23 +00:00
Azure-Tang	203b853c75	rm KMoEGateDeepSeekV3, fall back to KMoEGate	2025-04-01 07:13:05 +00:00
Azure-Tang	3a5330b215	Merge branch 'main' into work-concurrent	2025-04-01 06:48:19 +00:00
fishingfly	7549ff335a	fix: refine backend error message to include ROCM_HOME Signed-off-by: fishingfly <zhoyuzf@163.com>	2025-04-01 10:50:38 +08:00
Atream	80c5cbecdd	add nlohmann	2025-04-01 10:38:45 +08:00
Atream	9360d1e3c8	add submodules	2025-03-31 23:20:29 +08:00
Atream	25cee5810e	add balance-serve, support concurrency	2025-03-31 22:55:32 +08:00
Atream	8d0292aa44	refactor folders	2025-03-31 22:45:37 +08:00
Yuhao Tsui	84164f584c	Update completions.py	2025-03-26 15:39:46 +08:00
Yuhao Tsui	52fa671c10	Merge branch 'kvcache-ai:main' into main	2025-03-26 11:06:00 +08:00
Atream	f142f4dff3	Merge pull request #956 from kvcache-ai/Atream-patch-7 Update README.md	2025-03-22 12:14:48 +08:00
Atream	d4c6c2bb02	Update README.md	2025-03-22 12:14:36 +08:00
Aubrey Li	a12e8ab46e	yaml: fix Marlin AssertionError. Marlin quantized linear only supports GPU devices, so when generate_op is changed to "KLinearMarlin", generate_device needs to be changed to "cuda" accordingly. Fixes: `e5b001d76f` ("Update readme; Format code; Add example yaml.")	2025-03-21 23:58:20 +08:00
Aubrey Li	f4d52d1f0c	Restore CPU offloading capability	2025-03-21 10:04:31 +08:00
Jiaqi Liao	05f6cede37	Merge pull request #943 from SkqLiao/main fix benchmark params for human eval benchmark	2025-03-20 18:49:34 +08:00
SkqLiao	6d4626a5d9	fix params	2025-03-20 18:48:51 +08:00
Atream	ddd35d5be9	Merge pull request #940 from kvcache-ai/Atream-patch-6 Update gate.py	2025-03-20 14:54:20 +08:00
Atream	633af5d235	Update gate.py	2025-03-20 14:54:01 +08:00
SkqLiao	8cc4df980e	use DeepSeek V3 instead of R1 for benchmarking	2025-03-20 11:59:03 +08:00
Jiaqi Liao	32a91c78c1	Merge pull request #935 from SkqLiao/main Fix benchmarking slow issue on self-hosted actions	2025-03-20 10:14:37 +08:00
SkqLiao	e7d7d2705c	rename CI/CD	2025-03-20 10:11:24 +08:00
SkqLiao	19c824f9d0	change cpu-infer due to actual cpu cores on self-hosted server.	2025-03-20 10:10:52 +08:00
Jiaqi Liao	649489dc67	Merge pull request #931 from SkqLiao/main Add Human Eval Benchmark Test for CI/CD	2025-03-19 21:35:24 +08:00
SkqLiao	bad334fa5b	fix path	2025-03-19 21:28:58 +08:00
SkqLiao	bc369b256c	add CI/CD for human eval score benchmarking	2025-03-19 21:25:21 +08:00
Atream	8be56a0190	Merge pull request #927 from kvcache-ai/fix-gate-precision Update gate.py	2025-03-19 16:16:31 +08:00
Atream	b453333f60	Update gate.py	2025-03-19 16:14:54 +08:00
Atream	6ca233cca3	Merge pull request #926 from kvcache-ai/Atream-patch-5 Update gate.py	2025-03-19 12:17:09 +08:00
Atream	44599229cd	Update gate.py	2025-03-19 12:16:48 +08:00
Atream	aa8f985f85	Merge pull request #925 from kvcache-ai/fix-gate-compile fix-gate-compile	2025-03-19 11:44:41 +08:00
Atream	114995355b	fix-gate-compile	2025-03-19 11:27:18 +08:00
ZiWei Yuan	e788248364	Merge pull request #916 from kvcache-ai/patch_v0.2.3post2 📝 fix typo ktransformer->ktransformers	2025-03-17 17:55:30 +08:00
liam	4748a912e2	📝 fix typo ktransformer->ktransformers	2025-03-17 17:54:00 +08:00
Atream	8b51b0f058	Merge pull request #915 from kvcache-ai/Atream-patch-4 Atream patch 4	2025-03-17 17:05:39 +08:00
Atream	167506b779	Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml	2025-03-17 17:05:01 +08:00
Atream	c9a0c44213	Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml	2025-03-17 17:03:52 +08:00
Atream	3aee0fa099	Merge pull request #913 from kvcache-ai/Atream-patch-3 Add files via upload	2025-03-17 17:00:28 +08:00
Atream	094ac8f3a4	Add files via upload	2025-03-17 16:59:57 +08:00
ZiWei Yuan	8a8311cb04	Merge pull request #911 from kvcache-ai/patch_v0.2.3post2 🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml	2025-03-17 15:09:11 +08:00
liam	19f058ec9e	🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml	2025-03-17 15:08:12 +08:00
Azure	0e93a09d67	Merge pull request #906 from Azure-Tang/main [Fix] Fix rocm example yaml	2025-03-16 10:27:59 +08:00
Azure-Tang	85c32fdd10	Fix rocm example yaml	2025-03-15 22:27:02 -04:00
Azure	63604cac59	Merge pull request #904 from Azure-Tang/main [fix]Fix rocm compilation	2025-03-16 00:36:16 +08:00
Azure-Tang	4a31237346	fix rocm compilation	2025-03-15 12:34:03 -04:00
Atream	c51818c39a	Merge pull request #902 from kvcache-ai/rollback-triton-prefill rollback-triton-prefill v0.2.3post2	2025-03-15 23:09:30 +08:00
Atream	3934b9dfc1	rollback-triton-prefill	2025-03-15 14:21:21 +00:00
ZiWei Yuan	bda9cf15e7	Merge pull request #899 from kvcache-ai/develop-0.2.3post2 ⚡ fix readme path	2025-03-15 19:20:52 +08:00


769 Commits