From 0c0139d9cf7169b010664011306cbba72d850e82 Mon Sep 17 00:00:00 2001
From: turboderp <11859846+turboderp@users.noreply.github.com>
Date: Sat, 1 Nov 2025 19:40:58 +0100
Subject: [PATCH] Update docs

---
 doc/convert.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/convert.md b/doc/convert.md
index 69d82f2..815f951 100644
--- a/doc/convert.md
+++ b/doc/convert.md
@@ -33,6 +33,8 @@ does.
 
 - **-dr / --device_ratios *list***: Ratio as comma-separated list. Determines how the encoding workload is distributed when using multiple devices. This is useful if using GPUs with dissimilar compute performance, to prevent slower GPUs from becoming bottlenecks. Ratios are relative, i.e. `1,1,3` is the same ratio as `3,3,9`.
 
+- **-pm / --parallel_mode**: Fully parallelize quantization across multiple GPUs when possible. By default, multi-GPU quantization works by splitting the trellis encoding workload across multiple devices. For models with many small tensors (especially MoE models) this is inefficient since the resulting tile slices end up being too small for efficient batched encoding. This mode prefers distributing one linear layer to each GPU at a time, allowing larger encoding batches and more overall throughput. This mode is still somewhat experimental but will likely become the default soon.
+
 #### Debug stuff (ignore these)
 
 - **-lcpi / --last_checkpoint_index *int***: If specified, don't save checkpoints after this module index.
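
The two multi-GPU strategies described in this patch can be illustrated with a small, self-contained Python sketch. This is not the converter's implementation: `parse_ratios`, `split_tiles` and `assign_layers` are hypothetical helper names, and the round-robin layer assignment is an assumption made only to show the idea. It demonstrates why `1,1,3` and `3,3,9` are the same ratio, how a per-tensor split leaves each device with a small slice, and how whole-layer assignment keeps each encoding batch intact.

```python
from fractions import Fraction


def parse_ratios(arg: str) -> list[Fraction]:
    """Parse a comma-separated ratio list into shares that sum to 1."""
    values = [Fraction(part.strip()) for part in arg.split(",")]
    if any(v <= 0 for v in values):
        raise ValueError("ratios must be positive")
    total = sum(values)
    return [v / total for v in values]


def split_tiles(num_tiles: int, shares: list[Fraction]) -> list[int]:
    """Split-workload style: divide one tensor's tiles across all devices."""
    counts = [int(num_tiles * s) for s in shares]
    # Hand out the tiles lost to rounding down, largest share first.
    remainder = num_tiles - sum(counts)
    order = sorted(range(len(shares)), key=lambda i: shares[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def assign_layers(layers: list[str], num_devices: int) -> list[list[str]]:
    """Parallel style (assumed round-robin): each device gets whole layers."""
    buckets: list[list[str]] = [[] for _ in range(num_devices)]
    for i, name in enumerate(layers):
        buckets[i % num_devices].append(name)
    return buckets


if __name__ == "__main__":
    shares = parse_ratios("1,1,3")
    # Ratios are relative, so scaling every entry changes nothing.
    print(shares == parse_ratios("3,3,9"))
    # A small tensor with 10 tiles leaves two devices with only 2 tiles each.
    print(split_tiles(10, shares))
    # Whole-layer assignment keeps every layer's tiles together on one device.
    print(assign_layers(["q_proj", "k_proj", "v_proj", "o_proj"], 2))
```

With many small tensors, the per-device counts returned by `split_tiles` shrink quickly, which is the inefficiency the new option is meant to avoid.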