mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2026-05-20 20:38:56 +00:00
V4-Flash MXFP4 full-GPU prefill fallback now works: - Previously crashed all TP schedulers with StopIteration/AttributeError whenever --kt-gpu-prefill-token-threshold was low enough to actually fire (path was hardcoded for FP8/INT4 layouts). - Now detects MXFP4, re-runs the V4 swizzle on the 256-expert gpu_layer, caches the load across prefill chunks. - Measured on 8x RTX 5090 (threshold=1024, chunked=1024): 16k input -> 2011 tok/s, 65k -> 2798, 262k -> 2154 prefill TPS.
7 lines
123 B
Python
7 lines
123 B
Python
"""
|
|
KTransformers version information.
|
|
Shared across the top-level package and kt-kernel.
|
|
"""
|
|
|
|
__version__ = "0.6.2.post1"
|