15 Commits

Author SHA1 Message Date
Brayden Zhong
6a9b09847c CUTLASS NVFP4 GEMM improvement of SM120 (#21314) 2026-04-01 09:04:34 +08:00
Артем Савкин
27071e0a43 [NPU] Update quantization&CI documentation (#21100)
Co-authored-by: Tamir Baydasov <41994229+TamirBaydasov@users.noreply.github.com>
2026-03-28 21:42:21 +03:00
Mook
23c191afb6 fix(docs): correct quantization documentation (#20301) (#20619) 2026-03-15 12:33:12 -04:00
Brayden Zhong
591e61245a [Doc] Add smal table for GEMM backends (#20213) 2026-03-09 22:19:57 -07:00
Bruce Changlong Xu
feda2b11c4 [AMD] Add AWQ AMD CI coverage and quantization platform compatibility docs (#19550) 2026-03-04 19:50:55 -08:00
Zack Yu
54589a2f2d docs: expand and update modelopt documentation (#18479)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 23:09:52 +00:00
fxmarty-amd
5af84c8af5 [AMD][Quantization] Add int4fp8_moe online quantization on ROCm (#7392)
Co-authored-by: Dehua Tang <dehtang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: YC Tseng <yctseng@amd.com>
2026-01-14 01:44:40 -08:00
Zhiyu
f6423b626c Rename TensorRT Model Optimizer to Model Optimizer (#14455)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-12-07 13:18:20 -08:00
b8zhong
88d1bab537 add doc for quantized kv cache (#14348)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
2025-12-04 13:01:05 -08:00
赵晨阳
c56fc42430 Update quantization.md with new model resources (#13677) 2025-11-20 15:50:16 -08:00
Weiwei
caa4819bfc Add support for AutoRound quantized models (#10153) 2025-10-27 18:17:29 +08:00
Zhiyu
80b2b3207a Enable native ModelOpt quantization support (3/3) (#10154)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-10-21 21:44:29 -07:00
Yineng Zhang
b7d1f17b8d Revert "enable auto-round quantization model (#6226)" (#10148) 2025-09-07 22:31:11 -07:00
Weiwei
c8295d2353 enable auto-round quantization model (#6226)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
2025-09-07 22:05:35 -07:00
Lianmin Zheng
2449a0afe2 Refactor the docs (#9031) 2025-08-10 19:49:45 -07:00