ktransformers/version.py at c465557c23c2d8d5c8441cf9b06d2be03aa42dc2 - ktransformers - Public git mirror

kvcache-ai/ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-05-20 20:38:56 +00:00


Benjamin F d7b5b49a3e [release]: 0.6.2.post1

V4-Flash MXFP4 full-GPU prefill fallback now works:
- Previously crashed all TP schedulers with StopIteration/AttributeError
  whenever --kt-gpu-prefill-token-threshold was low enough to actually
  fire (path was hardcoded for FP8/INT4 layouts).
- Now detects MXFP4, re-runs the V4 swizzle on the 256-expert gpu_layer,
  and caches the load across prefill chunks.
- Measured on 8x RTX 5090 (threshold=1024, chunked=1024), prefill TPS:
  16k input -> 2011 tok/s, 65k -> 2798 tok/s, 262k -> 2154 tok/s.

2026-05-03 21:07:23 +08:00

7 lines

123 B

Python


"""
KTransformers version information.
Shared across the top-level package and kt-kernel.
"""
__version__ = "0.6.2.post1"
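Because the docstring says this file is shared across the top-level package and kt-kernel, a build script may want to read the version string without importing either package. The following is a minimal sketch of one common way to do that, not the project's actual build logic: `read_version` and `split_post_release` are hypothetical helper names, and the parser handles only plain `X.Y.Z` and `X.Y.Z.postN` strings rather than the full PEP 440 grammar.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; they are not part of ktransformers.
_VERSION_RE = re.compile(
    r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE
)


def read_version(path):
    """Return the __version__ string assigned in the file at `path`.

    The file is read as text and never imported, so no package
    dependencies need to be installed.
    """
    text = Path(path).read_text(encoding="utf-8")
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no __version__ assignment found in {path}")
    return match.group(1)


def split_post_release(version):
    """Split 'X.Y.Z' or 'X.Y.Z.postN' into (release_tuple, post_number).

    post_number is None when the version has no '.postN' segment.
    Other PEP 440 forms (pre-releases, dev releases, local versions)
    are deliberately rejected to keep the sketch small.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.post(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    post = int(match.group(2)) if match.group(2) is not None else None
    return release, post
```

Under these assumptions, `split_post_release("0.6.2.post1")` separates the release numbers `(0, 6, 2)` from the post-release number `1`, which is enough to compare two such versions as tuples.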