# Depth Anything for Semantic Segmentation
We use our Depth Anything pre-trained ViT-L encoder as the backbone for fine-tuning downstream semantic segmentation models; a sketch of how the encoder plugs in is given below.
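To make the setup concrete, here is a hypothetical MMSegmentation-style config fragment showing the Depth Anything ViT-L encoder (registered as `DINOv2`; see Installation below) used as the backbone of a standard encoder-decoder segmentor. All type names and field values are illustrative placeholders, not this repo's actual settings:

```python
# Illustrative config fragment only: the real configs ship in
# config/depth_anything, and all fields below are assumptions.
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='DINOv2',  # Depth Anything pre-trained ViT-L encoder
        init_cfg=dict(type='Pretrained',
                      checkpoint='path/to/depth_anything_vitl.pth'),
    ),
    decode_head=dict(type='Mask2FormerHead'),  # placeholder decode head
)
```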
## Performance
### Cityscapes
Note that our results are obtained without Mapillary pre-training.
| Method | Encoder | mIoU (single-scale) | mIoU (multi-scale) |
|---|---|---|---|
| SegFormer | MiT-B5 | 82.4 | 84.0 |
| Mask2Former | Swin-L | 83.3 | 84.3 |
| OneFormer | Swin-L | 83.0 | 84.4 |
| OneFormer | ConvNeXt-XL | 83.6 | 84.6 |
| DDP | ConvNeXt-L | 83.2 | 83.9 |
| Ours | ViT-L | 84.8 | 86.2 |
### ADE20K
| Method | Encoder | mIoU |
|---|---|---|
| SegFormer | MiT-B5 | 51.0 |
| Mask2Former | Swin-L | 56.4 |
| UperNet | BEiT-L | 56.3 |
| ViT-Adapter | BEiT-L | 58.3 |
| OneFormer | Swin-L | 57.4 |
| OneFormer | ConvNeXt-XL | 57.4 |
| Ours | ViT-L | 59.4 |
## Pre-trained models
## Installation
Please refer to [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) for installation instructions.
After installation:
- move our `config/depth_anything` directory into MMSegmentation's config directory
- move our `dinov2.py` into MMSegmentation's `mmseg/models/backbones` directory
- register `DINOv2` in `mmseg/models/backbones/__init__.py`, as sketched below
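A minimal sketch of the registration step, assuming the backbone class defined in `dinov2.py` is named `DINOv2` and that MMSegmentation discovers backbones exported from this package:

```python
# mmseg/models/backbones/__init__.py (illustrative excerpt): keep the
# file's existing imports and extend __all__ rather than replacing it.
from .dinov2 import DINOv2  # assumes the class in dinov2.py is named DINOv2

__all__ = ['DINOv2']  # in practice, append to the existing __all__ list
```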
For training or inference with our pre-trained models, please follow the MMSegmentation instructions; a minimal inference sketch follows.
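For reference, a minimal inference sketch assuming MMSegmentation 1.x, whose Python API exposes `init_model` and `inference_model`; the config and checkpoint paths are placeholders, not this repo's actual file names:

```python
# Minimal inference sketch, assuming MMSegmentation 1.x; config and
# checkpoint paths below are placeholders, not this repo's file names.
from mmseg.apis import inference_model, init_model

config_file = 'configs/depth_anything/cityscapes_vitl.py'  # hypothetical name
checkpoint_file = 'depth_anything_vitl_semseg.pth'         # hypothetical name

model = init_model(config_file, checkpoint_file, device='cuda:0')
result = inference_model(model, 'demo.png')  # per-pixel class predictions
```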