* Exposed draft model args for speculative decoding * Exposed int8 cache, dummy models, and no flash attention * Resolved CUDA 11.8 dependency issue