ktransformers/version.py
Benjamin F d7b5b49a3e [release]: 0.6.2.post1
V4-Flash MXFP4 full-GPU prefill fallback now works:
- Previously crashed all TP schedulers with StopIteration/AttributeError
  whenever --kt-gpu-prefill-token-threshold was low enough to actually
  fire (path was hardcoded for FP8/INT4 layouts).
- Now detects MXFP4, re-runs the V4 swizzle on the 256-expert gpu_layer,
  and caches the load across prefill chunks.
- Measured on 8x RTX 5090 (threshold=1024, chunked=1024), prefill throughput:
  16k input -> 2011 tok/s, 65k -> 2798 tok/s, 262k -> 2154 tok/s.
2026-05-03 21:07:23 +08:00

"""
KTransformers version information.
Shared across the top-level package and kt-kernel.
"""
__version__ = "0.6.2.post1"
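
Because the version string carries a `.post1` suffix, code that compares versions as plain strings or as three integers can mis-order it. The sketch below shows one way a caller might turn such a string into a sortable tuple. `parse_version` and `_VERSION_RE` are hypothetical names used for illustration and are not part of KTransformers; the sketch only handles the `X.Y.Z` and `X.Y.Z.postN` forms, and assumes the PEP 440 ordering in which a post-release sorts after its base release.

```python
import re

# Copied from the file above so the example is self-contained.
__version__ = "0.6.2.post1"

# Hypothetical pattern: accepts "X.Y.Z" with an optional ".postN" suffix only.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?$")


def parse_version(version: str) -> tuple[int, int, int, int]:
    """Return (major, minor, patch, post) for a supported version string."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, post = match.groups()
    # A release without a post segment gets -1 so that it sorts before
    # ".post0" and every later post-release of the same base version.
    post_number = int(post) if post is not None else -1
    return int(major), int(minor), int(patch), post_number


# Example check: is the installed version at least the 0.6.2 base release?
is_recent_enough = parse_version(__version__) >= parse_version("0.6.2")
```

For anything beyond this narrow format (pre-releases, dev releases, local versions), a full PEP 440 parser is a safer choice than a hand-written pattern.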