# Depth Anything for Semantic Segmentation

We fine-tune downstream semantic segmentation models using our Depth Anything pre-trained ViT-L encoder as the backbone.


## Performance

### Cityscapes

Note that our results are obtained *without* Mapillary pre-training.

| Method | Encoder | mIoU (single-scale) | mIoU (multi-scale) |
|:-:|:-:|:-:|:-:|
| SegFormer | MiT-B5 | 82.4 | 84.0 |
| Mask2Former | Swin-L | 83.3 | 84.3 |
| OneFormer | Swin-L | 83.0 | 84.4 |
| OneFormer | ConvNeXt-XL | 83.6 | 84.6 |
| DDP | ConvNeXt-L | 83.2 | 83.9 |
| **Ours** | ViT-L | **84.8** | **86.2** |


### ADE20K

| Method | Encoder | mIoU |
|:-:|:-:|:-:|
| SegFormer | MiT-B5 | 51.0 |
| Mask2Former | Swin-L | 56.4 |
| UperNet | BEiT-L | 56.3 |
| ViT-Adapter | BEiT-L | 58.3 |
| OneFormer | Swin-L | 57.4 |
| OneFormer | ConvNeXt-XL | 57.4 |
| **Ours** | ViT-L | **59.4** |


## Pre-trained models

- [Cityscapes-ViT-L-mIoU-86.4](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints_semseg/cityscapes_vitl_mIoU_86.4.pth)
- [ADE20K-ViT-L-mIoU-59.4](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints_semseg/ade20k_vitl_mIoU_59.4.pth)
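
The links above open the Hugging Face file viewer rather than the file itself. As a minimal sketch for scripted downloads, the snippet below rewrites a `/blob/` link into the corresponding `/resolve/` link, which Hugging Face serves as a direct download. The helper names `to_download_url` and `download_checkpoint` are hypothetical and not part of this repository.

```python
from urllib.request import urlretrieve


def to_download_url(blob_url):
    """Turn a Hugging Face file-viewer URL into a direct-download URL.

    Assumption: the link follows the usual `.../blob/<revision>/<path>` layout.
    """
    return blob_url.replace("/blob/", "/resolve/", 1)


def download_checkpoint(blob_url, dest):
    """Download the checkpoint behind `blob_url` to the local path `dest`."""
    urlretrieve(to_download_url(blob_url), dest)
    return dest
```

For example, `download_checkpoint(<link above>, "ade20k_vitl.pth")` would save the ADE20K checkpoint to the current directory.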


## Installation

Please refer to [MMSegmentation](https://github.com/open-mmlab/mmsegmentation/blob/main/docs/en/get_started.md#installation) for instructions.

After installation:
- move our [config/depth_anything](./config/depth_anything/) into mmseg's [configs](https://github.com/open-mmlab/mmsegmentation/tree/main/configs) directory
- move our [dinov2.py](./dinov2.py) into mmseg's [backbones](https://github.com/open-mmlab/mmsegmentation/tree/main/mmseg/models/backbones) directory
- register `DINOv2` in mmseg's [`models/backbones/__init__.py`](https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/backbones/__init__.py) by importing it there
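
The three steps above can be sketched as a small script. This is only an illustration under stated assumptions: `install_into_mmseg` is a hypothetical helper, the MMSegmentation checkout is assumed to be a source install, the backbone class in `dinov2.py` is assumed to be named `DINOv2`, and `__all__` in the backbones `__init__.py` is assumed to be a list. It copies files instead of moving them so that this directory stays intact.

```python
import shutil
from pathlib import Path


def install_into_mmseg(semseg_dir, mmseg_root):
    """Copy the Depth Anything configs and backbone into an mmseg checkout."""
    semseg_dir, mmseg_root = Path(semseg_dir), Path(mmseg_root)

    # Step 1: configs go under mmseg's configs/ directory.
    shutil.copytree(
        semseg_dir / "config" / "depth_anything",
        mmseg_root / "configs" / "depth_anything",
        dirs_exist_ok=True,  # requires Python 3.8+
    )

    # Step 2: the backbone file goes next to mmseg's other backbones.
    backbones = mmseg_root / "mmseg" / "models" / "backbones"
    shutil.copy2(semseg_dir / "dinov2.py", backbones / "dinov2.py")

    # Step 3: import the backbone in __init__.py so it gets registered.
    # Assumption: the class is called DINOv2 and __all__ is a list.
    init_file = backbones / "__init__.py"
    import_line = "from .dinov2 import DINOv2"
    text = init_file.read_text()
    if import_line not in text:  # keep the edit idempotent
        text = text.rstrip("\n") + f"\n{import_line}\n__all__ += ['DINOv2']\n"
        init_file.write_text(text)
```

Calling `install_into_mmseg(".", "/path/to/mmsegmentation")` from this directory would apply all three steps; running it twice does not duplicate the import.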


For training or inference with our pre-trained models, please refer to the MMSegmentation [instructions](https://github.com/open-mmlab/mmsegmentation/blob/main/docs/en/user_guides/4_train_test.md).
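
As a rough sketch of what those instructions amount to, the helpers below assemble the command lines for MMSegmentation's `tools/train.py` and `tools/test.py` scripts, to be run from the root of the MMSegmentation checkout. The helper names are hypothetical, and the config and checkpoint paths are placeholders to be replaced with a real file from `configs/depth_anything` and a downloaded checkpoint.

```python
import shlex


def train_command(config, work_dir=None):
    """Build the command for fine-tuning with mmseg's tools/train.py."""
    cmd = ["python", "tools/train.py", str(config)]
    if work_dir is not None:
        # --work-dir selects where logs and checkpoints are written.
        cmd += ["--work-dir", str(work_dir)]
    return cmd


def eval_command(config, checkpoint):
    """Build the command for evaluating a checkpoint with tools/test.py."""
    return ["python", "tools/test.py", str(config), str(checkpoint)]


def as_shell(cmd):
    """Render a command list as a single shell-safe string."""
    return shlex.join(cmd)
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)`, or printed with `as_shell` and pasted into a terminal.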